%%capture
%load_ext sql
%sql sqlite:///dbs/w02/library.db
%config SqlMagic.style = '_DEPRECATED_DEFAULT'
Week 02: SQL – Intermediate Concepts Summary
SLOs for Week 02
At the end of this unit, students will be able to…
- Perform data aggregation using GROUP BY and HAVING.
- Write and execute INNER, LEFT-OUTER, RIGHT, and FULL JOINs.
- Construct subqueries for complex queries.
- Implement SQL functions and expressions.
- Use SQL best practices for readable and efficient queries.
🧮 1. SQL Aggregation & Grouping (with SQLite Notes)
SQL lets you summarize data using aggregate functions, and group it with GROUP BY. This is especially helpful for statistics, reports, and dashboards.
🎯 Common Aggregate Functions
Function | Description | Example |
---|---|---|
COUNT() |
Number of rows | COUNT(*) |
SUM() |
Total of values | SUM(amount) |
AVG() |
Mean average | AVG(score) |
MIN() |
Smallest value | MIN(age) |
MAX() |
Largest value | MAX(salary) |
✅ SQLite supports all these functions.
🧩 Basic Grouping Example
Q: How many students are in each major?
SELECT major, COUNT(*) AS student_count
FROM students
GROUP BY major;
✅ SQLite allows this even if you’re selecting columns not in the GROUP BY clause or not inside an aggregate function. It will return one arbitrary value from each group for such columns.
Example (SQLite accepts this):
SELECT major, name
FROM students
GROUP BY major;
This works in SQLite — it will return one name for each major, but which name is returned is undefined. ❌ PostgreSQL (and other standards-compliant SQL engines) will reject this query with an error unless name is also in the GROUP BY or wrapped in an aggregate like MIN(name).
🔍 Filtering Groups with HAVING
Use HAVING to filter after grouping.
Q: Show only majors with more than 10 students:
SELECT major, COUNT(*) AS student_count
FROM students
GROUP BY major
HAVING COUNT(*) > 10;
ℹ️ In SQLite, you can refer to the alias (student_count) in HAVING. In PostgreSQL, you’d need to repeat the expression: HAVING COUNT(*) > 10.
📊 Combining Aggregates
Q: Show average GPA per department — only if it’s above 3.5:
SELECT department, AVG(gpa) AS avg_gpa
FROM students
GROUP BY department
HAVING AVG(gpa) > 3.5;
🔗 Grouping by Expressions
You can group by computed values like substrings or date parts.
Q: Count students by admission year (from text-based dates):
SELECT SUBSTR(admission_date, 1, 4) AS year, COUNT(*) AS num_students
FROM students
GROUP BY year;
The SUBSTR() function extracts a substring from a given string.
SUBSTR(string, start, length)
- string: the text to extract from
- start: the position to start (1-based index)
- length (optional): how many characters to return
-- Get the first 4 characters of a date
SELECT SUBSTR('2024-09-15', 1, 4); -- Returns '2024'
-- Get the first letter of a name
SELECT SUBSTR(name, 1, 1) FROM students;
📌 If length is omitted, SQLite returns the rest of the string from start.
✅ Works well in SQLite (which stores dates as text: YYYY-MM-DD). ❗ In PostgreSQL, use: EXTRACT(YEAR FROM admission_date).
🧠 Tips for Grouping Queries
- Use GROUP BY with aggregate functions.
- Use HAVING to filter groups (not rows).
- Use WHERE for filtering before grouping.
- Prefer column names or expressions in GROUP BY, not just positions (e.g., avoid GROUP BY 1).
- Be cautious using non-aggregated, non-grouped columns in SQLite — it’s allowed, but not portable or predictable.