Week 02: SQL – Intermediate Concepts Summary

SLOs for Week 02

At the end of this unit, students will be able to…

  1. Perform data aggregation using GROUP BY and HAVING.
  2. Write and execute INNER, LEFT-OUTER, RIGHT, and FULL JOINs.
  3. Construct subqueries for complex queries.
  4. Implement SQL functions and expressions.
  5. Use SQL best practices for readable and efficient queries.
%%capture
%load_ext sql
%sql sqlite:///dbs/w02/library.db
%config SqlMagic.style = '_DEPRECATED_DEFAULT'

🧮 1. SQL Aggregation & Grouping (with SQLite Notes)

SQL lets you summarize data using aggregate functions, and group it with GROUP BY. This is especially helpful for statistics, reports, and dashboards.

🎯 Common Aggregate Functions

Function Description Example
COUNT() Number of rows COUNT(*)
SUM() Total of values SUM(amount)
AVG() Mean average AVG(score)
MIN() Smallest value MIN(age)
MAX() Largest value MAX(salary)
Note

SQLite supports all these functions.

🧩 Basic Grouping Example

Q: How many students are in each major?

SELECT major, COUNT(*) AS student_count
FROM students
GROUP BY major;
Note

✅ SQLite allows this even if you’re selecting columns not in the GROUP BY clause or not inside an aggregate function. It will return one arbitrary value from each group for such columns.

Example (SQLite accepts this):

SELECT major, name
FROM students
GROUP BY major;
Note

This works in SQLite — it will return one name for each major, but which name is returned is undefined. ❌ PostgreSQL (and other standards-compliant SQL engines) will reject this query with an error unless name is also in the GROUP BY or wrapped in an aggregate like MIN(name).

🔍 Filtering Groups with HAVING

Use HAVING to filter after grouping.

Q: Show only majors with more than 10 students:

SELECT major, COUNT(*) AS student_count
FROM students
GROUP BY major
HAVING COUNT(*) > 10;
Note

ℹ️ In SQLite, you can refer to the alias (student_count) in HAVING. In PostgreSQL, you’d need to repeat the expression: HAVING COUNT(*) > 10.

📊 Combining Aggregates

Q: Show average GPA per department — only if it’s above 3.5:

SELECT department, AVG(gpa) AS avg_gpa
FROM students
GROUP BY department
HAVING AVG(gpa) > 3.5;

🔗 Grouping by Expressions

You can group by computed values like substrings or date parts.

Q: Count students by admission year (from text-based dates):

SELECT SUBSTR(admission_date, 1, 4) AS year, COUNT(*) AS num_students
FROM students
GROUP BY year;

The SUBSTR() function extracts a substring from a given string.

SUBSTR(string, start, length)
  • string: the text to extract from
  • start: the position to start (1-based index)
  • length (optional): how many characters to return
-- Get the first 4 characters of a date
SELECT SUBSTR('2024-09-15', 1, 4);  -- Returns '2024'

-- Get the first letter of a name
SELECT SUBSTR(name, 1, 1) FROM students;

📌 If length is omitted, SQLite returns the rest of the string from start.

Note

✅ Works well in SQLite (which stores dates as text: YYYY-MM-DD). ❗ In PostgreSQL, use: EXTRACT(YEAR FROM admission_date).

🧠 Tips for Grouping Queries

  • Use GROUP BY with aggregate functions.
  • Use HAVING to filter groups (not rows).
  • Use WHERE for filtering before grouping.
  • Prefer column names or expressions in GROUP BY, not just positions (e.g., avoid GROUP BY 1).
  • Be cautious using non-aggregated, non-grouped columns in SQLite — it’s allowed, but not portable or predictable.