
4 Architectural Patterns for Scaling Postgres Materialized Views Beyond the CRON Job

Master high-frequency data aggregation by moving beyond the full table lock of basic materialized view refreshes.


I once watched a production dashboard freeze for forty seconds because a "simple" materialized view refresh decided to hog every available I/O cycle on our primary. It felt like watching a slow-motion car crash where the car is my database and the wall is the user’s patience. We’ve all been there—relying on a CRON job to keep things snappy, only to realize that as data grows, the default REFRESH MATERIALIZED VIEW becomes a liability rather than a feature.

Postgres materialized views are fantastic until they aren't. They give you the speed of a pre-computed table with the convenience of a view, but the management of those views is where most of us stumble. If you’re still running a basic REFRESH every hour and hoping for the best, it’s time to look at some more robust patterns.

1. The CONCURRENTLY Dance (and its hidden costs)

Most people discover REFRESH MATERIALIZED VIEW CONCURRENTLY and think their problems are solved. It allows you to refresh the view without taking an exclusive lock, meaning your users can keep reading while the update happens.

But there’s a catch: you need a unique index on the materialized view for this to work.

-- First, you need a unique index
CREATE UNIQUE INDEX idx_sales_summary_id ON sales_summary (region_id, report_date);

-- Then, you can refresh without blocking readers
REFRESH MATERIALIZED VIEW CONCURRENTLY sales_summary;

The Gotcha: While it doesn't block readers, it’s significantly slower than a standard refresh and creates a massive amount of WAL (Write Ahead Log) traffic. I’ve seen concurrent refreshes take 3x longer than standard ones because Postgres has to do a lot of internal diffing to update only the changed rows. If your table is huge and you're refreshing frequently, your disk I/O might scream for mercy.
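If you want hard numbers before committing to CONCURRENTLY, you can measure how much WAL a single refresh generates. Here's a rough sketch using psql's \gset and the built-in LSN functions (available since Postgres 10); run it on a quiet system or the numbers will include unrelated traffic:

-- Snapshot the WAL position, run the refresh, then diff
SELECT pg_current_wal_lsn() AS before_lsn \gset

REFRESH MATERIALIZED VIEW CONCURRENTLY sales_summary;

SELECT pg_size_pretty(
    pg_wal_lsn_diff(pg_current_wal_lsn(), :'before_lsn')
) AS wal_generated;

Run the same snippet against a plain REFRESH and compare the two figures before deciding which mode your disks can live with.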

2. The Atomic Shadow Table Swap

If your materialized view logic is complex or takes a long time to calculate, even a concurrent refresh might lag too much. This is where the "Shadow Table" pattern shines. Instead of refreshing the view, you build a brand new table behind the scenes and swap it in using a transaction.

This is the "old school" way, but it’s incredibly reliable because the swap is nearly instantaneous.

BEGIN;

-- 1. Create a new table with the same structure
CREATE TABLE sales_summary_new AS
SELECT region_id, SUM(amount) AS total_sales
FROM sales
GROUP BY region_id;

-- 2. Re-create any necessary indexes on the new table
CREATE INDEX ON sales_summary_new (region_id);

-- 3. Drop the old object and rename the new table into place
DROP MATERIALIZED VIEW IF EXISTS sales_summary;
-- (after the first swap it's a plain table, so from then on:
--  DROP TABLE IF EXISTS sales_summary;)

ALTER TABLE sales_summary_new RENAME TO sales_summary;

COMMIT;

Why do this? Because it gives you total control. You can run data validation checks on sales_summary_new *before* you commit the swap. If the data looks like garbage (e.g., a count returns zero when it shouldn't), you can roll back the transaction and keep the old data live.
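A minimal sketch of that safety valve: a DO block run inside the same transaction, before the DROP, that aborts the swap if the fresh table looks wrong. (The empty-table check here is just an example; substitute whatever invariant matters for your data.)

-- Run inside the swap transaction, before dropping the old table
DO $$
DECLARE
    n BIGINT;
BEGIN
    SELECT count(*) INTO n FROM sales_summary_new;
    IF n = 0 THEN
        -- Raising aborts the transaction, so the old data stays live
        RAISE EXCEPTION 'sales_summary_new is empty; aborting swap';
    END IF;
END $$;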

3. The "State Machine" Trigger (Incremental Updates)

Let’s be honest: recalculating 10 million rows because 5 rows changed is a waste of CPU cycles. If you need near real-time updates without the overhead of a full refresh, you can mimic a materialized view using a standard table and triggers.

I call this the "Poor Man's Incremental Refresh."

CREATE TABLE sales_stats (
    region_id INT PRIMARY KEY,
    total_sales NUMERIC DEFAULT 0
);

CREATE OR REPLACE FUNCTION update_sales_stats()
RETURNS TRIGGER AS $$
BEGIN
    IF (TG_OP = 'INSERT') THEN
        INSERT INTO sales_stats (region_id, total_sales)
        VALUES (NEW.region_id, NEW.amount)
        ON CONFLICT (region_id)
        DO UPDATE SET total_sales = sales_stats.total_sales + EXCLUDED.total_sales;
    ELSIF (TG_OP = 'UPDATE') THEN
        -- Assumes region_id never changes on UPDATE
        UPDATE sales_stats
        SET total_sales = total_sales - OLD.amount + NEW.amount
        WHERE region_id = NEW.region_id;
    ELSIF (TG_OP = 'DELETE') THEN
        UPDATE sales_stats
        SET total_sales = total_sales - OLD.amount
        WHERE region_id = OLD.region_id;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_refresh_sales_stats
AFTER INSERT OR UPDATE OR DELETE ON sales
FOR EACH ROW EXECUTE FUNCTION update_sales_stats();

Warning: Triggers add latency to every INSERT or UPDATE on your source table. If you're doing high-volume bulk loads, this will slow you down. But for low-to-medium write volume where you need sub-second accuracy on aggregates, this pattern is a lifesaver.
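If bulk loads are your pain point, one mitigation is a statement-level trigger with a transition table (Postgres 10+), which aggregates each batch once instead of paying the per-row cost. A sketch, assuming the same sales and sales_stats tables as above:

-- Replaces the per-row trigger for INSERTs; don't run both or
-- you'll double-count.
CREATE OR REPLACE FUNCTION update_sales_stats_bulk()
RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO sales_stats (region_id, total_sales)
    SELECT region_id, SUM(amount)
    FROM new_rows
    GROUP BY region_id
    ON CONFLICT (region_id)
    DO UPDATE SET total_sales = sales_stats.total_sales + EXCLUDED.total_sales;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_refresh_sales_stats_bulk
AFTER INSERT ON sales
REFERENCING NEW TABLE AS new_rows
FOR EACH STATEMENT EXECUTE FUNCTION update_sales_stats_bulk();

A million-row COPY now costs one aggregate query instead of a million trigger invocations.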

4. Continuous Aggregates (The TimescaleDB Shortcut)

If you find yourself constantly fighting with materialized view refreshes, you might be using the wrong tool for the job. Extensions like TimescaleDB introduced a concept called "Continuous Aggregates."

They look and act like materialized views but are managed by a background worker that only refreshes the "buckets" of data that have actually changed. You don't have to write the refresh logic; you just define a policy.

-- Creating a continuous aggregate in TimescaleDB
CREATE MATERIALIZED VIEW weather_summary
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', observation_time) AS bucket,
       avg(temperature) AS avg_temp
FROM weather_data
GROUP BY bucket;

-- Refresh automatically every 30 minutes, covering only
-- data between 24 hours and 1 hour old
SELECT add_continuous_aggregate_policy('weather_summary',
    start_offset => INTERVAL '24 hours',
    end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '30 minutes');

This is the "gold standard" if you're dealing with time-series data. It avoids the "all or nothing" problem of native Postgres materialized views by tracking which time intervals are "dirty" and only recalculating those.
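You can also recompute a window by hand when you need to, via TimescaleDB's refresh_continuous_aggregate procedure, for example after backfilling historical rows outside the policy's window:

-- Manually recompute just the last day of buckets
CALL refresh_continuous_aggregate('weather_summary',
    NOW() - INTERVAL '24 hours', NOW());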

Which one should you choose?

- Use `CONCURRENTLY` if your data is small-to-medium and you have a clear unique key. It's the "low effort" win.
- Use the Table Swap if your refresh logic is expensive and you want a "safety valve" to validate data before it goes live.
- Use Triggers if you need the data to be 100% accurate up to the last millisecond and can afford a slight hit to write performance.
- Use Continuous Aggregates (TimescaleDB) if you're dealing with massive amounts of time-stamped data and you're tired of reinventing the wheel.

Scaling Postgres is often about knowing when to stop asking the database to do things "the easy way" and start telling it exactly how you want your data managed. Move beyond the CRON job—your users (and your I/O wait times) will thank you.