loke.dev

The 'OR' Clause Tax: Why Your Postgres Filters Are Silently Forcing Sequential Scans

Learn why the Postgres query planner often penalizes OR logic by abandoning indexes and how a simple shift to query splitting can restore your sub-millisecond throughput.


I once watched a dashboard time out for twenty seconds because someone added a single OR filter to a report. It was a simple request—"find my orders or orders assigned to my team"—but it brought a high-traffic production database to its knees. The indexes were there, the statistics were fresh, but Postgres decided that reading the entire table from disk was faster than using the tools we gave it.

This is the "OR Clause Tax." It’s a performance penalty that feels like a betrayal. You’ve done the right thing by indexing your columns, but the moment you introduce OR logic across different attributes, the Postgres query planner often throws its hands up and reverts to a Sequential Scan.

To understand why this happens—and how to stop it—we need to look at how the planner thinks.

The Illusion of Efficiency

Let’s set the stage with a standard schema. Imagine an audit_logs table. We have millions of rows, and we frequently query based on the user_id who performed an action or the target_id that was affected.

CREATE TABLE audit_logs (
    id SERIAL PRIMARY KEY,
    user_id INT NOT NULL,
    target_id INT NOT NULL,
    action_type TEXT NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_audit_logs_user_id ON audit_logs(user_id);
CREATE INDEX idx_audit_logs_target_id ON audit_logs(target_id);

-- Generate 2 million rows of dummy data
INSERT INTO audit_logs (user_id, target_id, action_type)
SELECT 
    (random() * 10000)::int, 
    (random() * 10000)::int, 
    'update'
FROM generate_series(1, 2000000);

Individually, these indexes are fast. If you query for a specific user_id, Postgres will perform an Index Scan, and you'll get your results in a few milliseconds.

-- Fast!
SELECT * FROM audit_logs WHERE user_id = 42;

But the moment you want to see any activity *related* to a specific ID—whether they were the actor or the target—the tax man comes to collect.

-- The problematic query
EXPLAIN ANALYZE
SELECT * FROM audit_logs 
WHERE user_id = 42 OR target_id = 42;

On a large enough table, you’ll likely see something like this in your plan:
Seq Scan on audit_logs (cost=0.00..43405.00 rows=400 width=32) (actual time=15.234..180.450 rows=380 loops=1)

Wait. Why a Seq Scan? We have indexes on both columns. Why didn't Postgres just look up 42 in the user_id index, look up 42 in the target_id index, and combine the results?

Why the Planner Falters

Postgres is a cost-based optimizer. It estimates the "cost" of different execution paths and picks the cheapest one.

When you use AND, Postgres can use an index to find a small subset of rows and then filter them. But with OR, Postgres has to find rows that satisfy *either* condition. This creates a logical "expansion."

The engine has three main choices:
1. The BitmapOr Scan: It scans the user_id index and builds a "map" of which pages in the table contain matching rows. Then it does the same for the target_id index. It performs a bitwise OR on these maps and finally fetches the data from the heap.
2. The Sequential Scan: It says, "Checking these two indexes and merging them is a lot of work. I'll just read the whole table and check every row against the criteria."
3. The Index Scan (rare for OR): It picks one index and applies the other condition as a row filter, but this only works if the indexed condition is extremely restrictive.

The problem is that even the bitmap path isn't free. While more efficient than a sequential scan, it's significantly slower than a direct Index Scan because it involves multiple index passes, CPU work to merge the bitmaps, and a recheck over the fetched heap pages. As your table grows or your data distribution becomes less predictable, the planner's cost estimate for the bitmap approach often exceeds the cost of just reading the whole table, especially if the data is likely to be in the filesystem cache.
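For comparison, here is roughly what the BitmapOr path looks like when the planner does choose it. The plan shape is real; the costs and row counts below are illustrative:

EXPLAIN ANALYZE
SELECT * FROM audit_logs
WHERE user_id = 42 OR target_id = 42;

Bitmap Heap Scan on audit_logs (cost=9.18..1480.21 rows=400 width=32)
  Recheck Cond: ((user_id = 42) OR (target_id = 42))
  ->  BitmapOr (cost=9.18..9.18 rows=400 width=0)
        ->  Bitmap Index Scan on idx_audit_logs_user_id
              Index Cond: (user_id = 42)
        ->  Bitmap Index Scan on idx_audit_logs_target_id
              Index Cond: (target_id = 42)

Note the Recheck Cond and the heap fetch on top of the two index scans: that is where the extra work lives compared to a plain Index Scan.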

The "UNION ALL" Pattern: Taking Back Control

The most reliable way to avoid the OR clause tax is to stop using OR across different columns. Instead, you can treat the query as two separate requests and join the results.

In SQL, this is done via UNION or UNION ALL.

-- The optimized version
SELECT * FROM audit_logs WHERE user_id = 42
UNION ALL
SELECT * FROM audit_logs WHERE target_id = 42 AND user_id <> 42;

*(Note: We use UNION ALL with a guard clause user_id <> 42 to avoid duplicates while maintaining speed. A plain UNION would remove duplicates automatically, but it adds a costly de-duplication step that often negates the performance gains.)*

Let’s look at the EXPLAIN for this:

Append (cost=0.43..16.90 rows=400 width=32)
  -> Index Scan using idx_audit_logs_user_id on audit_logs (cost=0.43..8.45 rows=200 width=32)
        Index Cond: (user_id = 42)
  -> Index Scan using idx_audit_logs_target_id on audit_logs (cost=0.43..8.45 rows=200 width=32)
        Index Cond: (target_id = 42)
        Filter: (user_id <> 42)

By splitting the query, we’ve forced Postgres to use the indexes. Each part of the UNION ALL is an independent, highly optimized index lookup. The Append operation is nearly instantaneous. You’ve just turned a 180ms sequential scan into a 0.5ms index operation.
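For contrast, swap UNION ALL for a plain UNION and the planner stacks a de-duplication node on top of the Append. The exact node varies by version and settings; this shape is illustrative:

HashAggregate
  Group Key: audit_logs.id, audit_logs.user_id, audit_logs.target_id, audit_logs.action_type, audit_logs.created_at
  ->  Append
        ->  Index Scan using idx_audit_logs_user_id on audit_logs
        ->  Index Scan using idx_audit_logs_target_id on audit_logs

That extra hashing pass over every returned row is the cost the guard clause lets you skip.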

When IN is Better Than OR

It’s important to distinguish between OR across *different* columns and OR on the *same* column.

If you are doing this:
WHERE user_id = 1 OR user_id = 2 OR user_id = 3

Postgres handles this well: write it as an IN list and the planner executes a single index scan that iterates over the constants. Recent versions (Postgres 17 and later) can even transform same-column OR chains into an equivalent ANY expression automatically. The "Tax" specifically applies when you are bridging different indexes or complex expressions where the engine cannot easily find a single path to the data.
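For example, rewriting the chain as an IN list typically yields a single index scan over the constants (plan abbreviated):

SELECT * FROM audit_logs WHERE user_id IN (1, 2, 3);

Index Scan using idx_audit_logs_user_id on audit_logs
  Index Cond: (user_id = ANY ('{1,2,3}'::integer[]))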

The GIN Index Alternative

If your application relies heavily on querying multiple columns with OR logic (common in search features), you might want to look beyond B-tree indexes.

A Generalized Inverted Index (GIN) is often associated with full-text search or JSONB, but it can be used for multi-column filtering via the btree_gin extension.

CREATE EXTENSION btree_gin;
CREATE INDEX idx_audit_logs_combined_gin ON audit_logs USING GIN (user_id, target_id);

A GIN index can handle multiple conditions in a single scan. However, GIN indexes are significantly slower to update, which hurts write-heavy workloads, and they are typically larger than B-trees. They are a specialized tool—don't reach for them unless the UNION ALL pattern becomes too messy to maintain in your application code.
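If you do go this route, the original OR query can be satisfied from the single combined index. What you will typically see is still a BitmapOr, but with both arms hitting the same GIN index instead of two separate B-trees (plan illustrative):

EXPLAIN ANALYZE
SELECT * FROM audit_logs
WHERE user_id = 42 OR target_id = 42;

Bitmap Heap Scan on audit_logs
  Recheck Cond: ((user_id = 42) OR (target_id = 42))
  ->  BitmapOr
        ->  Bitmap Index Scan on idx_audit_logs_combined_gin
              Index Cond: (user_id = 42)
        ->  Bitmap Index Scan on idx_audit_logs_combined_gin
              Index Cond: (target_id = 42)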

Hidden Traps: Functions and Coalesce

The OR clause tax often hides inside seemingly innocuous functions. One of the most common performance killers is the "Optional Filter" pattern in stored procedures or dynamic SQL:

-- Don't do this
SELECT * FROM audit_logs
WHERE (p_user_id IS NULL OR user_id = p_user_id)
  AND (p_target_id IS NULL OR target_id = p_target_id);

This is a nightmare for the query planner. When it builds a generic plan, it doesn't know whether p_user_id will be NULL, so it cannot reliably commit to an index. It usually settles for a plan that is safe for every parameter combination, which means a sequential scan.

If you find yourself writing this, you are better off building the SQL string dynamically in your application logic to only include the filters that are actually present.
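If the logic has to live in the database, the same idea works in PL/pgSQL: assemble only the predicates that are actually present and EXECUTE the result. This is a sketch; the function name and parameters are illustrative:

CREATE FUNCTION search_audit_logs(p_user_id INT, p_target_id INT)
RETURNS SETOF audit_logs AS $$
DECLARE
    v_sql TEXT := 'SELECT * FROM audit_logs WHERE true';
BEGIN
    -- Append each filter only when its parameter is provided,
    -- so the planner sees a plain indexable condition.
    IF p_user_id IS NOT NULL THEN
        v_sql := v_sql || format(' AND user_id = %L', p_user_id);
    END IF;
    IF p_target_id IS NOT NULL THEN
        v_sql := v_sql || format(' AND target_id = %L', p_target_id);
    END IF;
    RETURN QUERY EXECUTE v_sql;
END;
$$ LANGUAGE plpgsql;

Each EXECUTE gets a fresh plan for exactly the filters it contains, so the index choice is never hostage to a "maybe NULL" parameter.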

The Case for Partial Indexes

Sometimes the OR tax is unavoidable because you are filtering on a "status" or a "flag" alongside a foreign key.

Consider this: SELECT * FROM tasks WHERE (assigned_to_id = 123 OR status = 'URGENT').

If "URGENT" tasks represent only 1% of your data, a Partial Index can bypass the need for a complex OR strategy:

CREATE INDEX idx_urgent_tasks ON tasks(id) WHERE status = 'URGENT';

Now, your query can utilize a UNION ALL where one side uses the standard assigned_to_id index and the other side uses the tiny, highly efficient partial index.
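Concretely, that split might look like this (using the hypothetical tasks schema from above):

SELECT * FROM tasks WHERE assigned_to_id = 123
UNION ALL
SELECT * FROM tasks WHERE status = 'URGENT' AND assigned_to_id <> 123;

The guard clause on the second branch plays the same de-duplication role as in the earlier audit_logs example.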

The "Tax" is a Cost Estimate Error

Why doesn't Postgres just automatically convert OR to UNION?

Because it’s not always faster. If your OR condition covers 80% of the table, a sequential scan *is* actually the most efficient way to get the data. The overhead of managing index lookups and merging results is only worth it when the result set is small (high selectivity).

The problem is that Postgres’s cost model assumes that random I/O (index lookups) is much more expensive than sequential I/O. In the era of NVMe SSDs, the gap between sequential and random access has narrowed significantly.

You can try to tune this by adjusting the random_page_cost setting, either globally in postgresql.conf or per session:

-- Default is usually 4.0. Setting it closer to 1.1 or 1.0 (for SSDs) 
-- encourages the planner to use indexes more aggressively.
SET random_page_cost = 1.1;

This can sometimes "fix" the OR issue by making the Bitmap Or Scan look cheaper to the planner. But be careful: changing this globally affects every query in your system.
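If you just want to experiment, you can scope the override to a single transaction with SET LOCAL instead of touching the global configuration:

BEGIN;
SET LOCAL random_page_cost = 1.1;
EXPLAIN ANALYZE
SELECT * FROM audit_logs
WHERE user_id = 42 OR target_id = 42;
COMMIT;

The setting reverts automatically at COMMIT or ROLLBACK, so nothing else on the server is affected.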

Practical Steps for Refactoring

If you suspect you're paying the OR tax, here is the workflow to fix it:

1. Run `EXPLAIN (ANALYZE, BUFFERS)`: Look for Seq Scan. If you see one on a query that should be fast, look at the filter.
2. Is the `OR` on the same column? If yes, use IN (...).
3. Is the `OR` on different columns? Try refactoring to UNION ALL.
4. Check for Null-checks: Avoid (col = val OR col IS NULL). Use COALESCE or dynamic SQL instead.
5. Benchmark the difference: Don't just assume. Sometimes the Bitmap scan *is* the right choice. Use BUFFERS to see how many data blocks are actually being read from disk versus memory.
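The diagnostic in steps 1 and 5 looks like this; the Buffers line is the part worth reading closely (numbers illustrative):

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM audit_logs
WHERE user_id = 42 OR target_id = 42;

-- Look for a line like:
--   Buffers: shared hit=120 read=10430
-- "hit" blocks came from shared memory; "read" blocks came from the
-- OS cache or disk. A Seq Scan with a huge read count is the tax in action.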

Summary

The OR clause is a logical tool, not a performance tool. While it’s fine for small datasets or filtering on a single column, it becomes a liability in high-scale Postgres environments. By understanding that the planner prefers the predictability of AND and UNION, you can structure your queries to work with the engine rather than against it.

Don't let your queries suffer in silence with sequential scans. If you see an OR bridging two different indexes, it’s time to split the bill with a UNION ALL. Your throughput—and your on-call engineers—will thank you.