Week 03: SLO 06 – Optimize SQL Queries for Better Performance

Author

Eric Araujo

🎯 Why It Matters

Well-optimized SQL queries:

Run faster
Reduce memory and CPU usage
Improve scalability for large datasets

Even in SQLite, which is lightweight and file-based, writing efficient queries is essential, especially when working with large tables or complex joins.

🔧 Key Techniques & Examples

1. Use SELECT Only What You Need

Avoid SELECT * — only select necessary columns.

-- Bad
SELECT * FROM orders;

-- Good
SELECT order_id, order_date FROM orders;

🔍 In other DBMSs, SELECT * can even prevent index-only scans (e.g., PostgreSQL). In DBMSs like PostgreSQL, an index-only scan is a powerful optimization. It allows the database to read data directly from the index without having to access the full table (also called the heap). This is much faster, especially on large tables.

But for this to work, two things must be true:

All the required columns must be in the index
There must be no need to access the actual table rows

SQLite does not support index-only scans in the same way PostgreSQL does. So in SQLite, this specific optimization doesn’t apply — but it’s still a good habit to avoid SELECT * for performance and clarity.

2. Filter Early with WHERE

Apply WHERE clauses to reduce the number of rows processed.

-- Bad
SELECT customer_id FROM orders;

-- Good
SELECT customer_id FROM orders WHERE order_date > '2024-01-01';

In SQLite, filtering helps avoid unnecessary disk reads since it uses B-Trees for indexes.

3. Use Indexes Wisely

Indexes speed up WHERE, JOIN, and ORDER BY. But avoid over-indexing!

-- Create index on order_date for filtering
CREATE INDEX idx_orders_order_date ON orders(order_date);

In PostgreSQL, you can use EXPLAIN to see if indexes are used. In SQLite, use EXPLAIN QUERY PLAN.

An index is like a sorted list that a database uses to quickly find data in a table — similar to an index in a book.

Instead of scanning every row (called a full table scan), the database looks in the index to jump directly to the relevant rows, which is much faster.

📌 Key Points about Indexes

Speeds up WHERE, JOIN, and ORDER BY queries.
Uses extra disk space.
Too many indexes = slower inserts/updates (because all indexes need updating).
Best used on columns frequently used in filtering or joining.

4. Avoid Functions on Indexed Columns

Using functions on columns disables index use.

-- Bad (no index use)
SELECT * FROM orders WHERE strftime('%Y', order_date) = '2024';

-- Good
SELECT * FROM orders WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';

5. Use EXISTS Instead of IN for Subqueries

-- Slower for large subqueries
SELECT name FROM customers 
WHERE customer_id IN (SELECT customer_id FROM orders);

-- Faster with EXISTS
SELECT name FROM customers c
WHERE EXISTS (
  SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);

This query retrieves the names of customers who have at least one order.

In SQLite, both are often converted internally, but EXISTS is still preferable with correlated subqueries.

The subquery

SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id

says: “Is there at least one order where orders.customer_id = this customer’s ID?”. If yes, the customer is included in the result.

Important

SELECT 1 just returns a dummy value — it could be SELECT * or SELECT 'x'; it doesn’t matter because EXISTS only cares about the existence of at least one matching row.

This is often faster than:

SELECT name FROM customers
WHERE customer_id IN (SELECT customer_id FROM orders);

Because EXISTS can stop searching early (as soon as it finds a match), while IN may need to collect all values first, depending on the DBMS.

6. Use JOINs Instead of Subqueries (when appropriate)

-- Subquery
SELECT name FROM customers 
WHERE customer_id IN (
  SELECT customer_id FROM orders WHERE total > 100
);

-- JOIN (may perform better)
SELECT DISTINCT c.name 
FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
WHERE o.total > 100;

In MySQL and PostgreSQL, query planners can optimize JOINs more effectively.

7. Limit Result Set with LIMIT

Always add a LIMIT for preview or pagination.

SELECT * FROM logs ORDER BY created_at DESC LIMIT 100;

🔍 SQLite-Specific Tips

A. Use ANALYZE to collect statistics and help the query planner:

ANALYZE;

ANALYZE is a command that collects statistics about the contents of tables and indexes. These statistics help SQLite’s query planner make better decisions when optimizing queries.

You can also analyze a specific table or index:

ANALYZE orders;

📈 What It Does

Stores the data distribution and row counts in a system table called sqlite_stat1.
Helps SQLite decide:
- Which index to use
- Whether to use an index at all
- The best join order

📌 When to Use

After creating new indexes
After loading a large amount of data
After major updates or deletes

It does not run automatically — you should run it manually when needed.

B. Use EXPLAIN QUERY PLAN to understand what SQLite is doing:

EXPLAIN QUERY PLAN SELECT * FROM orders WHERE order_date > '2024-01-01';

EXPLAIN is a command that shows how SQLite will execute a SQL statement, step by step. It’s used to understand and debug query performance.

🔍 Two Versions

1. EXPLAIN

Shows low-level virtual machine instructions (for advanced debugging).

EXPLAIN SELECT * FROM orders;

Output: Virtual opcodes (not beginner-friendly)
Use if you’re deep into SQLite internals

2. EXPLAIN QUERY PLAN

The one you’ll use most. It gives a high-level overview of the query plan.

EXPLAIN QUERY PLAN 
SELECT * FROM orders WHERE order_date > '2024-01-01';

Tells you:

Which index (if any) is used
Whether it does a full table scan
The join order

📌 When to Use

To check if indexes are being used
To spot slow queries doing full scans
Before/after ANALYZE to see improvements

Here is an example of the output you might see.

🆚 Comparisons to Other DBMSs

Optimization Tip	SQLite	PostgreSQL	MySQL
Index Support	B-tree only	B-tree, GiST, GIN, BRIN	B-tree, Full-text
Query Plan Tool	`EXPLAIN QUERY PLAN`	`EXPLAIN (ANALYZE, BUFFERS)`	`EXPLAIN`
Parallel Query Support	❌ No	✅ Yes	✅ Limited
Optimizer Complexity	Simple	Advanced (cost-based)	Moderate