loke.dev

Skip the Leading Column

How Postgres 15+ finally solved the 'leading-column' requirement for composite index performance using Index Skip Scans.

· 4 min read

Most developers treat the order of columns in a composite index like a sacred blood oath, but Postgres 15 essentially turned that rule into a suggestion. For decades, the rule was simple: if you have a multi-column index on (a, b), your query *must* include column a in the WHERE clause, or the database will usually ignore your index and fall back to a slow, painful sequential scan.

That era is over. Thanks to Index Skip Scans, Postgres can now jump through your indexes even when you ignore the leading column. It’s not magic, and it’s not always the right choice, but it changes how we think about schema design.

The Old Guard: Why we were stuck

To understand why this is a big deal, we have to look at how B-Tree indexes actually sit on the disk. Think of a composite index on (tenant_id, created_at) like a physical phonebook. It's sorted by Last Name (tenant_id), then First Name (created_at).

If I ask you to find every "John" in the book, but I don't give you a last name, you're stuck reading every single page. That's a sequential scan. Historically, Postgres was just as stubborn.

-- The classic setup
CREATE TABLE events (
    id SERIAL PRIMARY KEY,
    org_id INT NOT NULL,
    status TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT now()
);

-- We create a composite index
CREATE INDEX idx_org_status ON events (org_id, status);

-- Old Postgres (pre-15) would ignore the index here:
SELECT * FROM events WHERE status = 'failed';

In Postgres 14 and below, that query typically falls back to a sequential scan. You'd be forced to create a second index just for status, paying extra storage and slowing down every INSERT.
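Pre-15, the standard workaround was a dedicated single-column index (a sketch; the index name is mine):

```sql
-- Pre-15 workaround (sketch): a dedicated index just for status queries.
-- The query gets fast, but every INSERT now maintains two indexes.
CREATE INDEX idx_status ON events (status);

-- The planner can now use idx_status directly:
SELECT * FROM events WHERE status = 'failed';
```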

How the Skip Scan changes the game

Index Skip Scans allow the query planner to say: "Okay, I don't know the org_id, but I know it's the first part of my index. I'll just grab the first org_id, search for the status within that group, and then jump (skip) to the next unique org_id and repeat."

Instead of scanning the whole table, it performs a series of smaller, targeted index lookups.
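Conceptually, the planner's work is equivalent to running one targeted lookup per distinct leading value, something like this hand-rolled version (a sketch to illustrate the idea, not what the executor literally runs):

```sql
-- What a skip scan effectively does, written out by hand.
-- Each statement is a cheap B-Tree descent on (org_id, status).
SELECT * FROM events WHERE org_id = 0 AND status = 'failed';
SELECT * FROM events WHERE org_id = 1 AND status = 'failed';
-- ... one jump per distinct org_id ...
SELECT * FROM events WHERE org_id = 99 AND status = 'failed';
```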

Seeing it in action

Let's build a real-world scenario. Imagine a multi-tenant app with a modest number of organizations (we'll use 100), where each organization only has a handful of statuses.

-- Let's populate some dummy data
INSERT INTO events (org_id, status)
SELECT 
    gs % 100, -- 100 unique organizations (low cardinality)
    CASE WHEN gs % 10 = 0 THEN 'failed' ELSE 'success' END
FROM generate_series(1, 1000000) gs;

-- Make sure stats are fresh
VACUUM ANALYZE events;

Now, we run a query filtering only by the second column:

EXPLAIN ANALYZE
SELECT * FROM events WHERE status = 'failed';

If you’re on Postgres 15 or 16, you’ll likely see something like this in the output:

Index Scan using idx_org_status on events (cost=0.42..1542.40 rows=100000 width=12)
  Index Cond: (status = 'failed'::text)

Wait, it used the index? Yes. Even though org_id isn't in the query, Postgres "skipped" across the 100 distinct organization IDs to find the 'failed' records.

The Cardinality Catch

Before you go deleting all your single-column indexes, there is a massive catch: cardinality.

The Skip Scan is brilliant when the leading column has a small number of distinct values (low cardinality). If you have 50 org_ids and 1,000,000 rows, jumping 50 times is extremely fast.

However, if your leading column is a UUID or a timestamp where every single row is unique, the Skip Scan becomes a disaster. Jumping a million times is actually slower than just reading the table from start to finish.

The Golden Rule for Skip Scans:
* Leading column has few unique values? Skip scan is a hero.
* Leading column is unique/high cardinality? You still need a dedicated index on that second column.
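As a rule of thumb, you can check the leading column's cardinality before relying on a skip scan (column names are from the example schema above):

```sql
-- How many distinct leading values would the scan have to jump across?
SELECT COUNT(DISTINCT org_id) AS distinct_orgs,
       COUNT(*)               AS total_rows
FROM events;

-- ~100 distinct values across 1M rows: skip scan territory.
-- If distinct_orgs were close to total_rows, keep a dedicated index:
-- CREATE INDEX idx_status ON events (status);
```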

Why this matters for your Schema

I used to obsess over index ordering. "Do I put the most selective column first? Or the one I use in every query?"

Postgres 15+ gives us a bit of breathing room. It allows us to lean more heavily on broad, composite indexes. You can now design your indexes to support your primary "power queries" (like tenant_id + created_at) while knowing that those same indexes can assist "admin queries" (like searching all recent events across all tenants) without needing extra disk space for a second index.
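For example, a single composite index can plausibly serve both query shapes (a sketch against the example schema; the index name is mine):

```sql
-- One index for the tenant-scoped "power query"...
CREATE INDEX idx_org_time ON events (org_id, created_at);

-- ...which uses the leading column as before:
SELECT * FROM events
WHERE org_id = 42
ORDER BY created_at DESC
LIMIT 20;

-- ...and for the cross-tenant "admin query", which can now skip
-- across org_id values instead of needing its own index:
SELECT * FROM events
WHERE created_at > now() - interval '1 hour';
```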

One last thing: The "fake" skip scan

If you're on an older version of Postgres and can't upgrade yet, you can actually "fake" a skip scan manually using a Recursive CTE, but it's a headache to write. Postgres 15 just took that clever manual trick and baked it directly into the engine.
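For reference, here's roughly what that manual trick looks like: a "loose index scan" built from a recursive CTE (a sketch against the example schema):

```sql
-- Manually "skip" across distinct org_id values (pre-15 trick).
WITH RECURSIVE orgs AS (
    SELECT MIN(org_id) AS org_id FROM events          -- first distinct value
    UNION ALL
    SELECT (SELECT MIN(org_id) FROM events
            WHERE org_id > o.org_id)                  -- jump to the next one
    FROM orgs o
    WHERE o.org_id IS NOT NULL
)
SELECT e.*
FROM orgs o
JOIN events e
  ON e.org_id = o.org_id AND e.status = 'failed';     -- targeted lookup per org
```

Every step here is a small B-Tree descent on (org_id, status), which is exactly the jumping behavior the engine now does for you.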

If you haven't checked your query plans since upgrading, try running an EXPLAIN on those queries you thought were "un-indexable." You might find that Postgres is being much smarter than you gave it credit for.