loke.dev

3 Indexing Patterns for Recursive CTEs That Scale to Millions of Rows

Bypass the performance wall of hierarchical queries by mastering depth-first search ordering and materialized path strategies for deep tree structures.

7 min read

I remember watching a production dashboard crawl to a halt because someone decided to build a "simple" threaded comment system using a standard recursive CTE. It worked perfectly with 10,000 rows. At two million, the database spent more time thrashing its cache than actually returning rows.

Recursive Common Table Expressions (CTEs) are the go-to solution for hierarchical data in SQL, but they have a nasty habit of hitting a performance wall. When you're dealing with deep trees—think file systems, multi-level marketing structures, or complex bill-of-materials—the way you index your tables dictates whether your query takes 10 milliseconds or 10 seconds.

The problem isn't the recursion itself; it's how the database engine handles the "recursive member" (the part of the query that references the CTE name). For every level of depth, the engine performs a join. If those joins aren't indexed perfectly, you're essentially performing a sequential scan for every single level of your tree.
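To make the anatomy concrete, here is the shape of the query we're optimizing, written against the `nodes` table used throughout this article. The anchor runs once; the recursive member's join fires once per level of the tree:

```sql
WITH RECURSIVE subtree AS (
    -- Anchor: runs once, seeding the working table with the root nodes
    SELECT id, parent_id, node_name
    FROM nodes
    WHERE parent_id IS NULL

    UNION ALL

    -- Recursive member: joins the table against the previous
    -- iteration's rows -- this join executes once per tree level
    SELECT t.id, t.parent_id, t.node_name
    FROM nodes t
    JOIN subtree s ON t.parent_id = s.id
)
SELECT * FROM subtree;
```

Every pattern below is about making that inner join as cheap as possible, or eliminating it entirely.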

Here are three indexing patterns that actually scale when your hierarchy hits the millions.

1. The Covered Adjacency Index

Most people start with a basic adjacency list: id and parent_id. It’s simple, it’s normalized, and it’s usually the source of the first performance bottleneck.

When you run a recursive CTE, the engine starts with the "anchor" (the root nodes) and then repeatedly joins the table against the results of the previous iteration. The join condition is almost always t.parent_id = cte.id.

If you only have an index on id, finding a node's children means scanning the whole table. If you only have an index on parent_id, the join itself is fast, but the engine still has to jump back to the heap for every other column the query needs.

The Pattern: Composite Covering Index

To make this scale, you need a composite index that covers the join and includes the primary identifier.

-- The standard setup
CREATE TABLE nodes (
    id SERIAL PRIMARY KEY,
    parent_id INT REFERENCES nodes(id),
    node_name TEXT,
    created_at TIMESTAMP
);

-- The Performance Killer: Just indexing the parent_id
CREATE INDEX idx_nodes_parent_id ON nodes(parent_id);

-- The Scalable Pattern: A covering index
CREATE INDEX idx_nodes_hierarchy_flow 
ON nodes (parent_id, id) 
INCLUDE (node_name);

Why this works:
By using (parent_id, id), you are providing the database with a pre-sorted map of every parent's children. The INCLUDE clause (available in PostgreSQL and SQL Server) adds the payload data directly to the index leaf nodes. This allows for an Index Only Scan. The database never has to look at the actual table rows during the recursion—it just stays within the index's B-Tree structure.

When I’ve implemented this on multi-million row tables, the EXPLAIN ANALYZE output usually shifts from "Index Scan" to "Index Only Scan," often cutting execution time by 60-70%.

2. Materialized Path with Specialized GiST Indexing

Sometimes, the "one-join-per-level" nature of recursive CTEs is the bottleneck itself. If your tree is 50 levels deep, you’re doing 50 joins. There is no way around that math with an adjacency list.

This is where the Materialized Path pattern comes in. Instead of just storing the parent, you store the entire lineage as a string or an array (e.g., 1.5.12.42).

In PostgreSQL, the ltree extension is the gold standard for this. It provides a specialized data type and index types that make recursive-style queries nearly instantaneous without using a WITH RECURSIVE clause at all.

The Pattern: ltree and GiST

First, you enable the extension and structure your table to store the path.

CREATE EXTENSION IF NOT EXISTS ltree;

CREATE TABLE category_tree (
    id SERIAL PRIMARY KEY,
    path ltree,
    name TEXT
);

-- Populate with some data
-- Path represents: root.electronics.audio
INSERT INTO category_tree (path, name) VALUES ('1', 'Electronics');
INSERT INTO category_tree (path, name) VALUES ('1.2', 'Audio');
INSERT INTO category_tree (path, name) VALUES ('1.2.3', 'Headphones');

-- The Secret Sauce: The GiST Index
CREATE INDEX idx_category_path_gist ON category_tree USING GIST (path);

Querying the whole subtree:
Instead of a complex CTE, you use the descendant operator <@ (the left-hand path is a descendant of, or equal to, the right-hand path).

-- Find 'Audio' (path '1.2') and all of its descendants
SELECT * FROM category_tree 
WHERE path <@ '1.2';

Why this scales:
The GiST (Generalized Search Tree) index is specifically designed to handle "is-contained-in" or "overlaps" logic. Unlike a B-Tree, which is great for "equals" or "greater than," GiST can quickly prune entire branches of the tree during the search.

The kicker here is that your query time becomes proportional to the size of the *result set*, not the size of the *entire table*. I’ve seen this pattern handle hierarchies with 10 million nodes where fetching a 1,000-node subtree takes under 5ms.
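The same GiST index also accelerates ltree's pattern-matching operator ~ and the ancestor operator @>, which covers the other common tree queries without recursion (a sketch against the category_tree table above):

```sql
-- Direct children of 'Audio': paths matching '1.2' plus exactly one more label
SELECT * FROM category_tree
WHERE path ~ '1.2.*{1}';

-- Ancestors of 'Headphones' (path '1.2.3'), including itself
SELECT * FROM category_tree
WHERE path @> '1.2.3';
```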

3. The Depth-First Search (DFS) Sort Key Pattern

One thing developers often forget is that recursive CTEs typically return rows in breadth-first order (the SQL standard guarantees no order at all, but that's how the iterative working-table evaluation falls out in practice). To get a proper visual tree, where children follow their parents immediately, you often see people using ORDER BY on an array of IDs built up during the recursion.

At scale, sorting that array is a nightmare. It consumes massive amounts of work_mem.

The solution is to calculate a sort key *during the insert/update* or via a highly optimized index that mimics the DFS order. But if you must do it in the CTE, you need to minimize the memory footprint of that sort key.

The Pattern: Binary Path Encoding

Instead of building an array of integers (which is bulky), use a binary or byte-string path.

WITH RECURSIVE tree_view AS (
    -- Anchor member: seed each root with a fixed-width 4-byte,
    -- big-endian encoding of its id. (A single id % 256 byte would
    -- collide once ids pass 255, corrupting sibling order at scale.)
    SELECT 
        id, 
        parent_id, 
        node_name,
        int4send(id) AS sort_path
    FROM nodes
    WHERE parent_id IS NULL

    UNION ALL

    -- Recursive member: append this node's 4 bytes to the parent's path
    SELECT 
        t.id, 
        t.parent_id, 
        t.node_name,
        p.sort_path || int4send(t.id)
    FROM nodes t
    JOIN tree_view p ON t.parent_id = p.id
)
SELECT * FROM tree_view ORDER BY sort_path;
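If you're on PostgreSQL 14 or newer, the SQL-standard SEARCH clause builds a depth-first sort key for you, so you don't have to hand-roll a byte path at all (a sketch against the same nodes table):

```sql
WITH RECURSIVE tree_view AS (
    SELECT id, parent_id, node_name
    FROM nodes
    WHERE parent_id IS NULL
    UNION ALL
    SELECT t.id, t.parent_id, t.node_name
    FROM nodes t
    JOIN tree_view p ON t.parent_id = p.id
) SEARCH DEPTH FIRST BY id SET ordercol
SELECT id, parent_id, node_name
FROM tree_view
ORDER BY ordercol;
```

Under the hood Postgres builds ordercol as an array of row values, so the materialize-then-sort cost doesn't go away; the win is correctness and readability, not memory.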

Wait, what's the catch?
The "gotcha" here is that ORDER BY on a recursive CTE usually forces the database to materialize the entire result set before sorting. If you're only trying to show the first 50 rows of a 1-million-row tree, the database is still doing all the work for the 999,950 rows you aren't seeing.

To fix this, you combine Pattern 1 (the covering index) with a depth limit or a paginated anchor.

If you are scaling to millions of rows, you should almost never be running a "naked" recursive CTE that tries to traverse the whole table. You should always be starting from a specific root ID.

-- Optimized for a single branch
WITH RECURSIVE branch AS (
    SELECT id, parent_id, node_name, 1 as depth
    FROM nodes
    WHERE id = 500 -- Start at a specific point
    
    UNION ALL
    
    SELECT t.id, t.parent_id, t.node_name, p.depth + 1
    FROM nodes t
    INNER JOIN branch p ON t.parent_id = p.id
    WHERE p.depth < 10 -- Safety valve for deep trees
)
SELECT * FROM branch;

The "Hidden" Bottleneck: work_mem

If you're running recursive CTEs against millions of rows in PostgreSQL, the default work_mem (usually 4MB) will kill you.

Each iteration of the recursion stores its temporary results in memory. If that result set exceeds work_mem, the database starts writing to disk (temporary files). Disk I/O is orders of magnitude slower than RAM.
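A quick way to catch this in the act is temp-file logging (log_temp_files is a standard PostgreSQL setting):

```sql
-- Log every temporary file this session creates, regardless of size
SET log_temp_files = 0;

-- Re-run the recursive query, then check the server log for
-- "temporary file" entries -- any hit means the working set
-- exceeded work_mem and spilled to disk
```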

I once cut a recursive query from 12 seconds down to 800ms just by increasing work_mem for that specific session:

BEGIN;
SET LOCAL work_mem = '256MB'; -- SET LOCAL only takes effect inside a transaction
-- Run your recursive query here
COMMIT; -- work_mem reverts automatically when the transaction ends

Don't do this globally, or your database will run out of memory when 100 users hit it at once. But for specific, heavy-duty hierarchical reports, it's a valid lever to pull.

Choosing the Right Pattern

If you're trying to decide which of these to use, here is my rule of thumb:

1. Use the Covered Adjacency Index if your hierarchy is shallow (under 10-15 levels) and your primary goal is maintaining a strictly normalized schema with easy inserts.
2. Use `ltree` (Materialized Path) if you have a read-heavy application where you constantly need to query subtrees at any depth. It is significantly faster than any CTE you can write, but it comes at the cost of managing the path column on every insert/move.
3. Use the Closure Table (a pattern I didn't detail here, but worth researching) if your hierarchy is "messy"—meaning nodes have multiple parents or you need to perform complex many-to-many tree operations.
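On the maintenance cost mentioned in point 2: a subtree move under ltree is a single UPDATE. A sketch, assuming the category_tree table from earlier and a hypothetical new parent whose path is '4':

```sql
-- Re-parent the 'Audio' subtree ('1.2', currently under '1') beneath '4':
-- strip the old prefix, prepend the new one. '1.2' becomes '4.2',
-- '1.2.3' becomes '4.2.3', and so on down the subtree.
UPDATE category_tree
SET path = '4'::ltree || subpath(path, nlevel('1'::ltree))
WHERE path <@ '1.2';
```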

Recursive CTEs are powerful, but they are not "set it and forget it" features. You have to think about how the pointer moves through the B-Tree at every step of the recursion. If you don't give the database a covered index to walk on, it's going to spend all its time looking for the next branch instead of climbing the tree.