Week 02: SQL – ANALYZE

Author

Eric Araujo

Published

April 1, 2025

Code
%%capture
%load_ext sql
%sql sqlite:///dbs/w02/analyze-example.db
%config SqlMagic.style = '_DEPRECATED_DEFAULT'

1. Create a Table and Insert Data

Code
%%sql

DROP TABLE IF EXISTS orders;

CREATE TABLE orders (
  order_id INTEGER PRIMARY KEY,
  customer_id INTEGER,
  order_date TEXT,
  total REAL
);

-- Insert 10,000 rows
WITH RECURSIVE cnt(x) AS (
  SELECT 1
  UNION ALL
  SELECT x + 1 FROM cnt WHERE x < 10000
)
INSERT INTO orders (customer_id, order_date, total)
SELECT 
  ABS(RANDOM() % 100), 
  date('2024-01-01', '+' || (RANDOM() % 365) || ' days'), 
  ROUND(RANDOM() % 500, 2)
FROM cnt;
 * sqlite:///dbs/w02/analyze-example.db
Done.
Done.
Done.
[]

This generates 10,000 synthetic rows with:

  • customer_id between 0 and 99
  • order_date in the year 2024
  • total up to 500

2. Create an Index

Code
%%sql
CREATE INDEX idx_orders_order_date ON orders(order_date);
 * sqlite:///dbs/w02/analyze-example.db
Done.
[]

3. Run a Query and Check the Plan

Code
%%sql
EXPLAIN QUERY PLAN
SELECT * FROM orders WHERE order_date > '2024-06-01';
 * sqlite:///dbs/w02/analyze-example.db
Done.
id parent notused detail
3 0 205 SEARCH orders USING INDEX idx_orders_order_date (order_date>?)

Before ANALYZE, SQLite might not use the index if it doesn’t know how selective order_date is.

In our case, the query is optimized and using the index. Let’s see what the output means.

Part Meaning
SEARCH orders SQLite is accessing the orders table using a search operation.
USING INDEX idx_orders_order_date It is using the idx_orders_order_date index to speed up the lookup.
(order_date>?) The filter condition is order_date > ? (where ? is the input value).

This shows that SQLite (1) is not scanning the full table and (2) is using the index you created to efficiently jump to rows matching the condition.

Had the query said something like:

SCAN TABLE orders

…that would mean a full table scan — which is slower, especially on large datasets.

4. Run ANALYZE

This updates the sqlite_stat1 table with stats about the distribution of values in the table/indexes.

Code
%%sql
ANALYZE;
 * sqlite:///dbs/w02/analyze-example.db
Done.
[]
Code
## 5. Run the Same Query Again
Code
%%sql
EXPLAIN QUERY PLAN
SELECT * FROM orders WHERE order_date > '2024-06-01';
 * sqlite:///dbs/w02/analyze-example.db
Done.
id parent notused detail
3 0 140 SEARCH orders USING INDEX idx_orders_order_date (order_date>?)

After ANALYZE, you’ll likely see that the index is now being used, especially if the planner sees it as more efficient.