loke.dev

Defaulting to a Crawl

The difference between 'RANGE' and 'ROWS' in your Postgres window functions is often the difference between a sub-millisecond query and a production-stalling scan.


Postgres is a masterpiece of engineering, but its default window function behavior is a trap designed to make your production server cry. Most developers reach for window functions to calculate running totals or moving averages, write a quick OVER (ORDER BY created_at), and call it a day.

What they don't realize is that by omitting a few specific keywords, they’ve opted into a "logical" window frame that forces Postgres to do significantly more work than necessary. In a small dev environment, it’s invisible. In a table with millions of rows? It’s the difference between a sub-second response and a query that hangs until the heat death of the universe.

The Invisible Default

When you write a window function in Postgres, you usually do something like this:

SELECT 
    id, 
    amount, 
    SUM(amount) OVER (ORDER BY created_at) as running_total
FROM transactions;

It looks innocent. But per the SQL standard, if you provide an ORDER BY but omit a frame clause, Postgres defaults to:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

The keyword there is `RANGE`, and it is a performance killer.

RANGE vs. ROWS: The Technical Brawl

The difference between RANGE and ROWS is how Postgres defines "peers."

* `ROWS` is physical. It counts actual rows. "Current row" means exactly the row the engine is currently looking at.
* `RANGE` is logical. It works on the values in your ORDER BY clause. A frame ending at "current row" actually extends through every row that shares the current row's ORDER BY value (its "peers").

If you have a column with a lot of duplicate values—like a created_at timestamp that only tracks dates, or a status ID—RANGE has to do extra work to look ahead and see if the next rows are peers. Even if your values are unique (like a high-precision TIMESTAMPTZ), Postgres still spends CPU cycles checking for peers, because it can't know the values are unique; the only way to tell it to stop checking is to switch the frame to ROWS.
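The peer-group distinction is easy to see outside Postgres too. Here's a minimal sketch using Python's built-in sqlite3 module (SQLite implements the same SQL-standard frame semantics): with duplicate ORDER BY values, the default RANGE frame reports the whole peer group's sum for every peer, while an explicit ROWS frame gives a true row-by-row running total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tx (day INTEGER, amount INTEGER);
    INSERT INTO tx VALUES (1, 10), (1, 20), (2, 5);
""")

rows = conn.execute("""
    SELECT day,
           amount,
           -- ORDER BY with no frame clause: the implicit RANGE frame
           SUM(amount) OVER (ORDER BY day) AS range_total,
           -- explicit physical frame
           SUM(amount) OVER (ORDER BY day
                             ROWS BETWEEN UNBOUNDED PRECEDING
                                      AND CURRENT ROW) AS rows_total
    FROM tx
    ORDER BY day
""").fetchall()

for day, amount, range_total, rows_total in rows:
    print(day, amount, range_total, rows_total)
# Both day-1 rows are peers, so range_total is 30 for each of them;
# rows_total is a genuine running total that only reaches 30 on the
# second day-1 row.
```

The day-2 row shows both frames agreeing again (35), because once you're past the peer group the cumulative sum is the same either way.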

Watching it Fail in Real Time

Let's look at a practical example. Suppose we have a ledger table with a million rows.

-- Set up a dummy table with one million rows
CREATE TABLE ledger AS 
SELECT 
    id, 
    (random() * 100)::int AS val, 
    (now() - (random() * interval '1 year'))::date AS entry_date
FROM generate_series(1, 1000000) AS id;

CREATE INDEX idx_ledger_date ON ledger(entry_date);

If we run a running total using the default RANGE behavior:

-- The "Default" (Slow) Way
EXPLAIN ANALYZE
SELECT 
    entry_date, 
    SUM(val) OVER (ORDER BY entry_date) 
FROM ledger;

On a large dataset, Postgres has to keep track of the "peer groups" for every single date. Because we used date (which has many duplicates per day), the overhead is massive.

Now, look at the `ROWS` version:

-- The Optimized (Fast) Way
EXPLAIN ANALYZE
SELECT 
    entry_date, 
    SUM(val) OVER (ORDER BY entry_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
FROM ledger;

By adding ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, you are telling Postgres: "Just add the current row to the previous sum. Don't look ahead. Don't check for peers. Just keep moving."

In my testing, the ROWS version can easily be 5x to 10x faster on large datasets. The execution plan stays much leaner because the engine can process the window in a single pass without buffering peer groups.

When should you actually use RANGE?

I'll be honest: almost never.

You use RANGE when it is mathematically vital that duplicate values are treated as a single unit. For example, if you are calculating interest on an account balance and you want all transactions that happened at the exact same timestamp to be summed together before the interest is applied.

In 99% of web applications, you just want the running total of the rows as they appear. ROWS gives you that.

The "Moving Average" Gotcha

RANGE becomes even more dangerous when you do sliding windows. If you want a 7-day moving average, you might be tempted to use RANGE.

-- A 7-day window using RANGE (the current day plus the 6 days before it)
SELECT 
    entry_date,
    AVG(val) OVER (
        ORDER BY entry_date 
        RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
    )
FROM ledger;

This is actually one of the few places RANGE is "smart"—it understands intervals. However, if your data is dense, this is still slower than a ROWS frame. If you're guaranteed exactly one row per day, ROWS BETWEEN 6 PRECEDING AND CURRENT ROW gives the same answer and will smoke the RANGE version in a speed test.
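That equivalence is easy to sanity-check. A minimal sketch, again in Python's sqlite3 (SQLite has no interval type, so integer day numbers stand in for dates; numeric RANGE offsets need SQLite 3.28+): with exactly one row per day, the value-based RANGE frame and the row-count ROWS frame cover the same 7 rows and produce identical averages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ledger (day INTEGER, val REAL)")
conn.executemany("INSERT INTO ledger VALUES (?, ?)",
                 [(d, d * 1.5) for d in range(1, 31)])  # dense: exactly one row per day

pairs = conn.execute("""
    SELECT AVG(val) OVER (ORDER BY day
                          RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) AS by_range,
           AVG(val) OVER (ORDER BY day
                          ROWS  BETWEEN 6 PRECEDING AND CURRENT ROW) AS by_rows
    FROM ledger
    ORDER BY day
""").fetchall()

# With one row per day, "days d-6 through d" (RANGE) and "the 6 previous
# rows plus this one" (ROWS) are the same window, so every pair matches.
print(all(abs(a - b) < 1e-9 for a, b in pairs))  # → True
```

The moment you can have gaps, or multiple rows per day, the two frames diverge—that's exactly the case where you have to decide which semantics you actually want before reaching for the faster one.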

A Better Habit

I’ve started treating OVER (ORDER BY ...) as a code smell. It’s an incomplete thought. It’s a default that assumes you have infinite CPU cycles to spend on peer-group validation.

If you want your queries to stay snappy as your data grows, make your window frames explicit. Whenever you write a window function, your fingers should instinctively type:

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

It’s wordy, it’s annoying to remember, and it’s a bit of a mouthful. But it's the difference between a query that finishes before you blink and one that triggers a 2:00 AM PagerDuty alert because the database CPU is pinned at 100%.

Stop letting Postgres crawl by default. Tell it exactly how to walk.