system-design.io

Query Optimization and Execution Plans

The GPS Analogy: A Bridge to Understanding

Imagine you are driving from Milwaukee to Chicago for a job interview. You open Google Maps and type in the destination. In milliseconds, the app doesn’t just find a route — it evaluates dozens of them. It weighs the interstate against city roads, checks for construction, estimates fuel cost, factors in the time of day, and then serves you the optimal path — not just any path.

Now imagine a second version of Maps that ignores all traffic data. It picks the route based purely on a fixed rulebook: “Always take the road with the most lanes.” Sometimes that works fine. Sometimes you sit in traffic for two hours.

This is exactly the difference between a cost-based query optimizer and a rule-based query optimizer in a relational database. And the “turn-by-turn directions” the GPS ultimately generates? That’s your execution plan.

Every time you fire a SQL query at a database like PostgreSQL, MySQL, or SQL Server, there’s a small but brilliant engine working behind the scenes — the Query Optimizer — doing exactly what Google Maps does: evaluating multiple strategies and selecting the fastest, cheapest path to your data.

By the end of this article, you will not only understand how that engine works, but you’ll be able to read its decisions, challenge them, and influence them to write dramatically faster queries.

Chapter 1: What Actually Happens When You Run a Query?

Before we talk about optimization, we need to understand the full journey a SQL query takes from the moment you press “Run” to the moment rows appear on your screen.

Think of this journey like a manuscript going through a publishing house. The raw text (your SQL) passes through several editorial departments — each one transforming it — before it becomes a printed book (your results).

  ┌──────────────────────────────────────────────────────────────────────┐
  │                     THE SQL QUERY PIPELINE                           │
  └──────────────────────────────────────────────────────────────────────┘

  ┌─────────┐    ┌─────────────┐    ┌───────────────┐    ┌──────────────┐
  │  Raw    │───▶│   Parser    │───▶│  Query Rewriter│───▶│  Optimizer   │
  │  SQL    │    │  (Syntax    │    │  (Logical      │    │  (Plan       │
  │  Text   │    │  Check)     │    │   Transform)   │    │  Generator)  │
  └─────────┘    └─────────────┘    └───────────────┘    └──────┬───────┘
                                                                 │
                                                                 ▼
                                                         ┌──────────────┐
                                                         │   Executor   │
                                                         │  (Physical   │
                                                         │  Operations) │
                                                         └──────┬───────┘
                                                                │
                                                                ▼
                                                         ┌──────────────┐
                                                         │   Results    │
                                                         └──────────────┘

In Figure 1, notice the four distinct stages a query passes through. The optimizer sits squarely in the middle — it receives a logically valid query tree and must decide how to physically execute it.

Let’s walk through each stage:

Stage 1: The Parser

The parser reads your raw SQL string and checks it for syntactic correctness. It’s analogous to a spell-checker — it doesn’t care what you mean, only whether you’ve used the language correctly. The output is a parse tree, a structured, in-memory representation of your query’s clauses.

-- If you write this:
SELECT name FROM users WEHRE id = 1;
--                     ^^^^^ Typo! Parser throws an error immediately.

The parser never touches the disk. It works entirely in memory against the text itself.

Stage 2: The Query Rewriter

The rewriter takes the parse tree and applies logical transformations before any planning happens. This is where views get expanded into their underlying SELECT statements, where IN subqueries sometimes get rewritten as EXISTS, and where permission checks happen.

Think of this stage as a translator who converts shorthand into full prose before it goes to the editor.

Stage 3: The Optimizer (The Star of Our Show)

This is where the magic happens. The optimizer takes the logically rewritten query tree and asks: “What is the cheapest physical way to execute this?”

It generates multiple candidate execution plans and estimates the cost of each one — in terms of CPU cycles, I/O operations, and memory usage — then selects the plan with the lowest estimated cost.

Stage 4: The Executor

The executor is the worker on the floor. It takes the chosen plan and carries it out, step by step, retrieving pages from disk, joining rows, sorting, and aggregating. The executor doesn’t think — it just follows instructions.

Chapter 2: The Query Optimizer — Two Schools of Thought

The optimizer’s core job is simple to state and complex to execute: find the best execution plan without spending more time searching for it than you’d save by running it.

Historically, databases have approached this problem in two fundamentally different ways.

Rule-Based Optimization (RBO): The Checklist Approach

The earliest query optimizers followed a rigid playbook. Given a query, the optimizer would run through a checklist of fixed rules — like a recipe — and produce a plan by applying them in order, without ever looking at how the data was actually distributed.

Common rules included:

“If an index exists on the WHERE clause column, use it.”
“Always filter rows before joining tables.”
“If a unique index is available, prefer it over a non-unique one.”

This approach is predictable and fast to plan — the optimizer doesn’t need to run statistics queries. Oracle’s original optimizer (pre-Oracle 7) used this approach exclusively.

The fatal flaw: Rules are blind to data. Imagine a table orders where 99% of all rows have status = 'completed'. A rule-based optimizer might happily use an index on status because “use indexes when available.” In reality, a full table scan would be dramatically faster since nearly every row matches anyway. The RBO doesn’t know what it doesn’t look at.

Cost-Based Optimization (CBO): The GPS Approach

Modern databases — PostgreSQL, MySQL 8+, SQL Server, Oracle 10g+ — use cost-based optimization. Instead of following rules blindly, the CBO:

Generates multiple candidate plans (sometimes thousands for complex queries)
Estimates the cardinality (expected row count) for each operation using statistical metadata
Calculates a cost score for each plan using a cost model

Selects the lowest-cost plan

  ┌────────────────────────────────────────────────────────────────────┐
  │              COST-BASED OPTIMIZER DECISION PROCESS                 │
  └────────────────────────────────────────────────────────────────────┘

    Query Tree
        │
        ▼
  ┌─────────────────┐
  │  Plan Generator │ ──── generates ────▶  Plan A  (cost: 450)
  │                 │                       Plan B  (cost: 120) ◀── WINNER
  │                 │                       Plan C  (cost: 780)
  └────────┬────────┘
        │ uses
        ▼
  ┌────────────────────────────────────┐
  │         Statistics Catalog         │
  │  ┌────────────┐  ┌──────────────┐  │
  │  │ Row Counts │  │ Histograms   │  │
  │  │ per table  │  │ per column   │  │
  │  └────────────┘  └──────────────┘  │
  │  ┌────────────┐  ┌──────────────┐  │
  │  │ Index Info │  │ Cardinality  │  │
  │  │            │  │ Estimates    │  │
  │  └────────────┘  └──────────────┘  │
  └────────────────────────────────────┘

In Figure 2, the optimizer is only as good as its statistics. When statistics are stale (not recently updated), the cost estimates can be wildly wrong — leading the optimizer to choose a bad plan with complete confidence.

The cost model typically boils down to a formula roughly like:

\[\text{Plan Cost} = (n_{\text{pages}} \times C_{\text{io}}) + (n_{\text{tuples}} \times C_{\text{cpu}})\]

Where $C_{\text{io}}$ is the estimated cost per disk page read and $C_{\text{cpu}}$ is the cost per tuple processed. The exact values are tunable constants in most databases.

Chapter 3: Execution Plans — Reading the Map

An execution plan is the optimizer’s final decision, materialized as a tree of physical operations. Each node in the tree is called a plan node or operator, and each one represents one discrete physical action: scan a table, build a hash table, sort rows, etc.

Getting the Plan: EXPLAIN and EXPLAIN ANALYZE

In PostgreSQL (our reference database), you can reveal a query’s execution plan with EXPLAIN:

import psycopg2

conn = psycopg2.connect("dbname=mydb user=postgres")
cur = conn.cursor()

# Get the estimated plan (does NOT run the query)
cur.execute("EXPLAIN SELECT * FROM orders WHERE customer_id = 42")
plan = cur.fetchall()
for row in plan:
    print(row)

# Get the actual plan (RUNS the query, shows real stats)
cur.execute("EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42")
actual_plan = cur.fetchall()
for row in actual_plan:
    print(row)

The difference is crucial: [web:7]

EXPLAIN — Shows the estimated plan without running the query. Safe for production.
EXPLAIN ANALYZE — Actually executes the query and shows real row counts and timings alongside estimates. Use with caution on mutation queries (UPDATE, DELETE) in production — wrap in a BEGIN/ROLLBACK if needed.

Anatomy of a Plan Output

Here’s a real example plan for a simple join query:

EXPLAIN ANALYZE
SELECT c.name, COUNT(o.id)
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.order_date > '2024-01-01'
GROUP BY c.name
ORDER BY COUNT(o.id) DESC;

Sort  (cost=1200.45..1202.95 rows=1000 width=40)
      (actual time=45.2..45.8 rows=982 loops=1)
  Sort Key: (count(o.id)) DESC
  ->  HashAggregate  (cost=1145.00..1155.00 rows=1000 width=40)
        (actual time=44.1..44.9 rows=982 loops=1)
        Group Key: c.name
        ->  Hash Join  (cost=450.00..1100.00 rows=9000 width=32)
              (actual time=12.3..40.2 rows=9144 loops=1)
              Hash Cond: (o.customer_id = c.id)
              ->  Seq Scan on orders  (cost=0.00..520.00 rows=9000 width=16)
                    (actual time=0.05..15.3 rows=9144 loops=1)
                    Filter: (order_date > '2024-01-01')
              ->  Hash  (cost=250.00..250.00 rows=16000 width=24)
                    (actual time=11.8..11.8 rows=16000 loops=1)
                    ->  Seq Scan on customers  (cost=0.00..250.00 rows=16000 width=24)
                          (actual time=0.02..6.4 rows=16000 loops=1)

Now let’s decode this. The plan reads bottom-up, inside-out — just like evaluating a nested function call. The deepest indented nodes execute first.

  ┌──────────────────────────────────────────────────────────────────────┐
  │               EXECUTION PLAN TREE (reads bottom-up)                  │
  └──────────────────────────────────────────────────────────────────────┘

                       ┌──────────┐
                       │   Sort   │  ← Step 5: Sort final results
                       └────┬─────┘
                            │
                    ┌───────▼────────┐
                    │ HashAggregate  │  ← Step 4: GROUP BY + COUNT()
                    └───────┬────────┘
                            │
                   ┌────────▼─────────┐
                   │    Hash Join     │  ← Step 3: Join customers + orders
                   └──┬───────────┬───┘
                      │           │
              ┌───────▼──┐   ┌────▼──────┐
              │ Seq Scan │   │   Hash    │  ← Step 2: Build hash table
              │  orders  │   └────┬──────┘
              │(filtered)│        │
              └──────────┘   ┌────▼──────┐
                             │ Seq Scan  │  ← Step 1: Scan all customers
                             │ customers │
                             └───────────┘

In Figure 3, the execution flows upward. Notice how we scan both tables first (Steps 1 and 2), then join them (Step 3), then aggregate (Step 4), then sort (Step 5). Each arrow represents rows flowing upward from child to parent.

Each plan node shows two key numbers in the format (cost=start..total rows=N width=W):

Field	Meaning
`cost=0.00..520.00`	Startup cost .. Total cost (in arbitrary units)
`rows=9000`	Estimated number of rows output
`width=16`	Estimated average row size in bytes
`actual time=0.05..15.3`	Real start time .. Real end time (ms) — only with ANALYZE
`loops=1`	How many times this node was executed

The most important flag to watch: when estimated rows diverge wildly from actual rows, your statistics are stale and the optimizer made decisions based on wrong assumptions.

Chapter 4: The Core Operators — A Field Guide

Understanding an execution plan means recognizing its building blocks. Let’s categorize the major operators by their behavioral intent.

Category 1: Table Access Methods

These are how the database reads raw data from storage. Think of these as your starting ingredients.

Sequential Scan (Seq Scan)

The database reads every single page of the table, in order, from start to finish. This is the “brute force” approach.

  Table Pages on Disk:
  ┌────┬────┬────┬────┬────┬────┬────┬────┐
  │ P1 │ P2 │ P3 │ P4 │ P5 │ P6 │ P7 │ P8 │
  └────┴────┴────┴────┴────┴────┴────┴────┘
    ──────────────────────────────────────▶
                  Reads ALL pages

  Cost: O(N) where N = number of pages

When is a Seq Scan good? When you’re fetching a large fraction (typically > 5-10%) of the table. For small tables, it’s almost always the right choice since index lookups have overhead.

Index Scan

The database traverses a B-tree index to find specific row pointers, then fetches those rows from the heap (main table storage).

  B-Tree Index                    Heap (Table)
  ┌─────────────┐                ┌────────────────┐
  │   Root Node │                │ Page 1         │
  │   [50|100]  │                │ ... row id=42  │◀──┐
  └──┬──────┬───┘                │ ...            │   │
     │      │                    └────────────────┘   │
  ┌──▼─┐  ┌─▼──┐    pointer      ┌────────────────┐   │
  │Leaf│  │Leaf│ ───────────────▶│ Page 7         │   │
  │ 42 │  │ 98 │                 │ ... row id=98  │◀──┘
  └────┘  └────┘                 └────────────────┘

  Cost: O(log N) for index + O(K) for heap fetches
  where K = number of matching rows

In Figure 5, each leaf of the B-tree points to a specific row location on disk. For small result sets, this is dramatically faster than scanning every page.

Index Only Scan

The best case: all the columns your query needs exist inside the index itself. The database never touches the heap at all.

# This query can be satisfied by an index-only scan
# IF an index exists on (customer_id, order_date)
cur.execute("""
    SELECT customer_id, order_date
    FROM orders
    WHERE customer_id = 42
""")
# The index already has both columns — no heap visit needed!

Bitmap Index Scan + Bitmap Heap Scan

This is the hybrid approach — a middle ground between Index Scan and Seq Scan. PostgreSQL first scans the index to build a bitmap (a memory structure marking which heap pages contain matching rows), then reads only those heap pages in physical order. This avoids random I/O.

  Phase 1: Scan Index, Build Bitmap
  ┌──────────────────────────────────┐
  │ Bitmap (in memory)               │
  │ Page 1: ✓  Page 2: ✗  Page 3: ✓ │
  │ Page 4: ✓  Page 5: ✗  Page 6: ✗ │
  └──────────────────────────────────┘

  Phase 2: Read only marked pages in order
  ┌────┐      ┌────┐      ┌────┐
  │ P1 │─────▶│ P3 │─────▶│ P4 │   (sequential access pattern!)
  └────┘      └────┘      └────┘

Category 2: Join Algorithms

Joins are the most performance-sensitive part of most queries. The optimizer must choose between three physical join strategies.

Nested Loop Join

The simplest conceptually: for each row in the “outer” table, scan the “inner” table looking for matches.

# Conceptual Python equivalent:
results = []
for outer_row in table_A:          # outer loop
    for inner_row in table_B:      # inner loop
        if outer_row.id == inner_row.a_id:
            results.append((outer_row, inner_row))

  Outer Table (small)    Inner Table (large, with index)
  ┌────────────────┐     ┌────────────────────────────┐
  │ Row 1 ─────────┼────▶│ Index Lookup → matching row │
  │ Row 2 ─────────┼────▶│ Index Lookup → matching row │
  │ Row 3 ─────────┼────▶│ Index Lookup → matching row │
  └────────────────┘     └────────────────────────────┘

  Best when: outer table is small AND inner has an index
  Cost: O(M × log N)  where M = outer rows, N = inner rows

Hash Join

Build a hash table from the smaller table (“build side”), then probe it with every row from the larger table (“probe side”).

# Conceptual Python equivalent:
hash_table = {}
for row in smaller_table:                    # Build phase
    key = row['id']
    hash_table[key] = row

results = []
for row in larger_table:                     # Probe phase
    key = row['foreign_key']
    if key in hash_table:
        results.append((row, hash_table[key]))

This is the workhorse for large, un-indexed joins. The initial build phase has cost, but probing is O(1) per row. [web:7]

Merge Join

Requires both inputs to be sorted on the join key. It then performs a single linear pass through both sorted lists simultaneously — like merging two sorted card decks.

  Table A (sorted by id):    Table B (sorted by a_id):
  ┌───┐                      ┌───┐
  │ 1 │──────────────────────│ 1 │  ← Match! Emit row.
  │ 2 │──────────────────────│ 2 │  ← Match! Emit row.
  │ 3 │     ┐                │ 4 │  ← A=3 < B=4, advance A.
  │ 5 │     └───────────────▶│ 4 │  ← A=5 > B=4, advance B.
  │ 5 │──────────────────────│ 5 │  ← Match! Emit row.
  └───┘                      └───┘

  Best when: both inputs are already sorted (e.g., from index scans)
  Cost: O(M log M + N log N) if sorting needed, O(M + N) if pre-sorted

Join Strategy Selection Summary

Strategy	Best For	Requires	Memory
Nested Loop	Small outer, indexed inner	Index on join key	Low
Hash Join	Large, unsorted tables	Nothing	Medium-High
Merge Join	Pre-sorted or already indexed	Sort order	Low

Category 3: Aggregation and Sorting Operators

HashAggregate

Groups rows using an in-memory hash table. Fast when the number of distinct groups fits in memory (work_mem in PostgreSQL).

GroupAggregate

Operates on pre-sorted input. If rows are already sorted by the GROUP BY key (from an index or a prior Sort node), GroupAggregate processes them in a single linear pass without building a hash table.

Sort

Sorts rows by one or more keys. If the data fits in work_mem, it’s an in-memory quicksort. If not, it spills to disk — a condition you’ll see flagged as “Batches: N” in the plan output when N > 1.

Chapter 5: Statistics — The Optimizer’s Crystal Ball

Here’s the central insight that every senior engineer understands: the optimizer is only as smart as its statistics. Bad statistics → bad cost estimates → bad plan choice → slow queries. The optimizer cannot improve its reasoning beyond the quality of data it receives.

What Statistics Are Collected

PostgreSQL’s pg_statistic catalog stores, for every column:

n_distinct — Estimated number of distinct values (cardinality)
null_frac — Fraction of null values
most_common_vals — The N most frequently occurring values
most_common_freqs — The frequency of each common value
histogram_bounds — Bucket boundaries for non-common values (a histogram)

  Column: order_status
  ┌─────────────────────────────────────────────────────┐
  │  Most Common Values:                                 │
  │  'completed' → 72%                                   │
  │  'pending'   → 18%                                   │
  │  'cancelled' → 8%                                    │
  │  'refunded'  → 2%                                    │
  └─────────────────────────────────────────────────────┘

  The optimizer uses this to estimate:
  WHERE status = 'completed'  → ~72% of rows (Seq Scan likely)
  WHERE status = 'refunded'   → ~2% of rows  (Index Scan likely)

In Figure 9, notice that the same column with different filter values can lead to completely different plan choices. This is the power of histogram-based statistics.

Updating Statistics

In PostgreSQL, ANALYZE collects fresh statistics:

import psycopg2

conn = psycopg2.connect("dbname=mydb user=postgres")
conn.autocommit = True
cur = conn.cursor()

# Analyze a specific table
cur.execute("ANALYZE orders;")

# Analyze entire database
cur.execute("ANALYZE;")

PostgreSQL also runs autovacuum (which includes auto-analyze) in the background, but on heavily updated tables, manual ANALYZE after bulk operations is a best practice.

The Cardinality Estimation Problem

Cardinality estimation — predicting how many rows an operation will return — is one of the hardest problems in database engineering. When you have a multi-column filter like WHERE status = 'completed' AND region = 'West', the optimizer typically assumes statistical independence between columns:

\[\text{selectivity}_{\text{combined}} = \text{selectivity}_{\text{status}} \times \text{selectivity}_{\text{region}}\]

If status and region are correlated (e.g., the West region disproportionately has ‘pending’ orders), this assumption breaks down. PostgreSQL 10+ introduced extended statistics to address this:

# Create extended stats for correlated columns
cur.execute("""
    CREATE STATISTICS order_stats (dependencies)
    ON status, region
    FROM orders;
""")
cur.execute("ANALYZE orders;")

Chapter 6: Common Optimization Techniques

Armed with our understanding of how plans work, let’s look at the primary levers we can pull to improve query performance.

Technique 1: Strategic Indexing

Indexes are the single highest-leverage optimization tool. But indexes are not free — each index adds write overhead and storage cost. The goal is targeted, surgical indexing.

Composite Indexes and Column Order

For a composite index, column order matters enormously. The rule is: equality predicates first, then range predicates.

# Query pattern we want to optimize:
# WHERE status = 'completed' AND order_date > '2024-01-01'

# GOOD: equality column first
cur.execute("""
    CREATE INDEX idx_orders_status_date
    ON orders (status, order_date);
""")

# BAD: range column first — the index is far less useful
# because the range on order_date prevents efficient use
# of the status part of the index for this specific query
cur.execute("""
    CREATE INDEX idx_orders_date_status
    ON orders (order_date, status);
""")

Partial Indexes

When you frequently query a subset of data, a partial index covers only that subset — smaller, faster, and more maintainable:

# If 90% of queries only look at 'pending' orders:
cur.execute("""
    CREATE INDEX idx_pending_orders
    ON orders (created_at)
    WHERE status = 'pending';
""")
# This index is tiny compared to a full-table index
# but serves the most common query pattern perfectly.

Covering Indexes

Add frequently-accessed non-filter columns to enable Index Only Scans:

# Query: SELECT id, name FROM customers WHERE email = 'x@y.com'
# Regular index: must visit heap for 'name'
# Covering index: heap visit eliminated

cur.execute("""
    CREATE INDEX idx_customers_email_covering
    ON customers (email)
    INCLUDE (name, id);
""")

Technique 2: Rewriting Queries for Optimizer Friendliness

Sometimes the way we write SQL inadvertently confuses the optimizer or prevents index usage.

The OR trap:

# BAD: OR on different columns often prevents index usage
cur.execute("""
    SELECT * FROM orders
    WHERE customer_id = 42 OR support_id = 42;
""")

# GOOD: UNION ALL — each branch can use its own index
cur.execute("""
    SELECT * FROM orders WHERE customer_id = 42
    UNION ALL
    SELECT * FROM orders WHERE support_id = 42;
""")

Functions on indexed columns:

# BAD: Function on the column defeats the index!
# The optimizer cannot use an index on order_date
# when you wrap it in a function.
cur.execute("""
    SELECT * FROM orders
    WHERE EXTRACT(YEAR FROM order_date) = 2024;
""")

# GOOD: Rewrite as a range query — index-friendly
cur.execute("""
    SELECT * FROM orders
    WHERE order_date >= '2024-01-01'
      AND order_date < '2025-01-01';
""")

Avoiding SELECT *:

# BAD: SELECT * forces the executor to read all columns
# and prevents Index Only Scans
cur.execute("SELECT * FROM customers WHERE email = 'x@y.com'")

# GOOD: Project only what you need
cur.execute("SELECT id, name FROM customers WHERE email = 'x@y.com'")

Technique 3: Forcing Plan Choices (When the Optimizer Is Wrong)

Sometimes, especially when statistics are hard to keep current, the optimizer chooses a suboptimal plan. Most databases offer escape hatches.

In PostgreSQL, you can disable specific join types or scan methods at the session level:

# Temporarily disable hash joins to force merge or nested loop
cur.execute("SET enable_hashjoin = OFF;")
cur.execute("SET enable_seqscan = OFF;")

# Run your query
cur.execute("EXPLAIN ANALYZE SELECT ...")

# Always re-enable! This is session-scoped but be careful.
cur.execute("SET enable_hashjoin = ON;")
cur.execute("SET enable_seqscan = ON;")

⚠️ Caution: Disabling planner methods is a diagnostic tool, not a permanent fix. If you need the optimizer to consistently make a different choice, the right solution is to improve statistics, add an index, or rewrite the query.

Technique 4: Managing work_mem

The work_mem setting controls how much memory each sort or hash operation can use before spilling to disk. Spill-to-disk events are catastrophic for performance — often 10-100x slower than in-memory operations.

# Check if a sort or hash operation spilled to disk
# Look for "Batches: N" where N > 1 in EXPLAIN ANALYZE output

# Increase work_mem for a specific heavy query
cur.execute("SET work_mem = '256MB';")
cur.execute("EXPLAIN ANALYZE SELECT ... ORDER BY ...")
cur.execute("SET work_mem = '4MB';")  # Reset to default

# Note: work_mem applies PER SORT OPERATION, not per query
# A query with 3 sorts uses up to 3 × work_mem simultaneously

Chapter 7: Advanced Concepts — The N+1 Problem and CTEs

The N+1 Query Problem

One of the most common performance anti-patterns in application code is the N+1 query problem. It doesn’t appear in a single execution plan — because it’s the result of many separate queries, each individually fast, combined catastrophically.

The Pattern:

# Naive ORM-style code — produces N+1 queries
cur.execute("SELECT id FROM customers WHERE active = true")
customers = cur.fetchall()                   # Query 1

orders_by_customer = {}
for customer in customers:                   # N more queries!
    cur.execute(
        "SELECT * FROM orders WHERE customer_id = %s",
        (customer,)
    )
    orders_by_customer[customer] = cur.fetchall()

If there are 1,000 active customers, this fires 1,001 queries. The solution is a single JOIN:

# GOOD: Single query with a JOIN
cur.execute("""
    SELECT c.id, c.name, o.id as order_id, o.total
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE c.active = true
""")
results = cur.fetchall()

One execution plan. One round trip. Dramatically faster.

CTEs: Optimization Fence or Optimization Aid?

Common Table Expressions (CTEs, the WITH clause) are powerful organizational tools, but their optimization behavior is often misunderstood.

In PostgreSQL prior to version 12, CTEs were optimization fences — the planner treated each CTE as a materialized subquery, executing it fully before the outer query could push predicates into it:

# In PostgreSQL < 12, this CTE would scan ALL orders
# before filtering — the WHERE in the outer query
# couldn't be "pushed down" into the CTE
cur.execute("""
    WITH recent_orders AS (
        SELECT * FROM orders  -- scanned ALL rows
    )
    SELECT * FROM recent_orders
    WHERE order_date > '2024-01-01';  -- filter too late!
""")

From PostgreSQL 12 onward, non-recursive CTEs that are referenced only once are inlined by default, removing this barrier. But explicitly marking a CTE as MATERIALIZED or NOT MATERIALIZED gives you fine-grained control:

# Force materialization (pre-12 behavior) — useful when
# the CTE result is used multiple times and recomputation is expensive
cur.execute("""
    WITH MATERIALIZED expensive_calc AS (
        SELECT customer_id, SUM(total) as lifetime_value
        FROM orders
        GROUP BY customer_id
    )
    SELECT c.name, ec.lifetime_value
    FROM customers c
    JOIN expensive_calc ec ON ec.customer_id = c.id
    WHERE ec.lifetime_value > 10000;
""")

Chapter 8: Interview Patterns and Practical Diagnostics

You now have the conceptual framework. Let’s map this to what actually appears in technical interviews.

The Canonical Diagnosis Workflow

When asked “How would you investigate a slow query?”, walk through this sequence:

  ┌───────────────────────────────────────────────────────────┐
  │            SLOW QUERY DIAGNOSIS FLOWCHART                 │
  └───────────────────────────────────────────────────────────┘

  ┌─────────────────┐
  │  EXPLAIN ANALYZE │
  │  the query       │
  └────────┬─────────┘
           │
           ▼
  ┌─────────────────────────────────────────────────┐
  │  Are estimated rows ≈ actual rows?               │
  │  NO ─────────────────────────────▶ Run ANALYZE   │
  │  YES                                             │
  └────────┬────────────────────────────────────────┘
           │
           ▼
  ┌──────────────────────────────────────────────────┐
  │  Are there Seq Scans on large tables?             │
  │  YES ─────────────────────────────▶ Add Index    │
  │  NO                                              │
  └────────┬─────────────────────────────────────────┘
           │
           ▼
  ┌──────────────────────────────────────────────────┐
  │  Are there Sort nodes with Batches > 1?           │
  │  YES ─────────────────────────────▶ Increase     │
  │  NO                                 work_mem     │
  └────────┬─────────────────────────────────────────┘
           │
           ▼
  ┌──────────────────────────────────────────────────┐
  │  Any Hash Join on huge tables?                    │
  │  Consider composite indexes for Merge/NL Join    │
  └──────────────────────────────────────────────────┘

Red Flags in an Execution Plan

Memorize these as interview talking points:

Seq Scan on a large table with a WHERE clause → Missing index
Estimated rows = 1, Actual rows = 50,000 → Stale statistics; run ANALYZE
Nested Loop with huge outer table → Should likely be Hash Join; check work_mem or join cardinality
Sort with Batches: 4 → Sorting spilled to disk; increase work_mem
“rows removed by filter: N” in Index Scan → Index is partially useful but not selective enough; consider a partial or composite index
Function on indexed column in filter → Function defeating index; rewrite predicate

Classic Interview Question: Why Is This Query Slow?

# Interviewer gives you this query and says it's slow:
cur.execute("""
    SELECT u.name, COUNT(p.id) as post_count
    FROM users u
    LEFT JOIN posts p ON p.user_id = u.id
    WHERE LOWER(u.email) = 'alice@example.com'
    GROUP BY u.name;
""")

Your answer: The LOWER() function on u.email defeats any index on the email column. The fix is either:

# Option 1: Create a functional index
cur.execute("""
    CREATE INDEX idx_users_email_lower
    ON users (LOWER(email));
""")

# Option 2: Rewrite the query to avoid the function
cur.execute("""
    SELECT u.name, COUNT(p.id) as post_count
    FROM users u
    LEFT JOIN posts p ON p.user_id = u.id
    WHERE u.email = 'alice@example.com'  -- assume emails stored in lowercase
    GROUP BY u.name;
""")

Chapter 9: Putting It All Together — A Worked Example

Let’s walk through a complete real-world optimization scenario from start to finish.

Setup: An e-commerce platform is experiencing slow report generation. The offending query:

# SLOW: Report of top 100 customers by revenue in the last 90 days
cur.execute("""
    SELECT
        c.id,
        c.name,
        c.email,
        SUM(o.total_amount) AS revenue
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    WHERE o.created_at >= NOW() - INTERVAL '90 days'
      AND o.status = 'completed'
    GROUP BY c.id, c.name, c.email
    ORDER BY revenue DESC
    LIMIT 100;
""")

Step 1 — Run EXPLAIN ANALYZE:

Sort  (cost=85000..85000 rows=100 width=64)
      (actual time=4821.3..4821.4 rows=100 loops=1)
  ->  HashAggregate  (cost=72000..73000 rows=50000 width=64)
        (actual time=4750.2..4820.1 rows=48392 loops=1)
        ->  Hash Join  (cost=1200..55000 rows=2200000 width=48)
              (actual time=45.2..3100.4 rows=2314892 loops=1)
              Hash Cond: (o.customer_id = c.id)
              ->  Seq Scan on orders  (cost=0..45000 rows=2200000)
                    (actual time=0.1..1200.3 rows=2314892 loops=1)
                    Filter: (created_at >= NOW()-'90 days'
                             AND status='completed')
                    Rows Removed by Filter: 4821033
              ->  Hash on customers  (cost=800..800 rows=32000)
                    (actual time=40.1..40.1 rows=32000 loops=1)

Step 2 — Diagnose:

The Seq Scan on orders reads 7.1 million rows and discards 4.8 million. Massive waste.
There is no index on (created_at, status) or (status, created_at).
Estimated rows (2.2M) ≈ actual rows (2.3M) — statistics are fine.

Step 3 — Fix:

# Create a composite index: equality filter (status) first,
# then range filter (created_at)
cur.execute("""
    CREATE INDEX idx_orders_status_created
    ON orders (status, created_at)
    INCLUDE (customer_id, total_amount);
""")
# The INCLUDE adds our needed columns for a potential index-only scan

cur.execute("ANALYZE orders;")

Step 4 — Verify:

Sort  (cost=12000..12003 rows=100 width=64)
      (actual time=312.4..312.5 rows=100 loops=1)
  ->  HashAggregate  (cost=9800..10300 rows=50000 width=64)
        (actual time=295.1..311.8 rows=48392 loops=1)
        ->  Hash Join  (cost=1200..7500 rows=92000 width=48)
              (actual time=45.2..212.3 rows=2314892 loops=1)
              ->  Index Scan on orders  (cost=0.56..4800 rows=92000)
                    (actual time=0.09..98.2 rows=92000 loops=1)
                    Index Cond: (status='completed'
                                 AND created_at >= NOW()-'90 days')

The query went from 4.8 seconds → 312 ms — a 15x speedup — by adding a single well-designed index. The key insight: the Seq Scan that was reading 7.1 million rows is now an Index Scan reading only 92,000.

Quick Reference: Interview Cheat Sheet

Scenario	Root Cause	Fix
Seq Scan on large table	Missing index	`CREATE INDEX` on filter columns
Index not used	Function wrapping column	Functional index or rewrite predicate
Estimated ≠ Actual rows	Stale statistics	`ANALYZE table_name`
Sort “Batches: N” (N>1)	work_mem too low	`SET work_mem = '256MB'`
Slow JOIN	No index on join key	Add index on FK column
N+1 queries	Application-level loop	Rewrite as single JOIN
CTE not optimized well	Optimization fence (pre-PG12)	Use `NOT MATERIALIZED` or inline
Hash Join on tiny tables	Stats misleading optimizer	Run `ANALYZE`, adjust `statistics` target

What We Covered

We traveled the full arc from the GPS analogy through the four stages of query processing, decoded the two optimizer philosophies, learned to read execution plan trees, catalogued every major operator type, explored how statistics fuel the optimizer’s decisions, and practiced a complete optimization workflow end-to-end.

The recurring theme: the optimizer is a probabilistic reasoner working from imperfect statistics and a finite search budget. Your job as an engineer is not to fight it — it’s to give it better information (statistics, indexes, schema design) so it can make better decisions automatically.

Interview Mindset: When asked about query performance, always follow the evidence: get the plan first, then diagnose, then fix. Never guess and add indexes blindly. The execution plan is the ground truth.