
What Nobody Tells You About Postgres TOAST: The Hidden Performance Tax of Large JSONB Columns

Explore the invisible storage mechanism that Postgres uses to handle oversized columns and why it can silently triple your disk I/O while destroying query latency.

· 9 min read

Imagine you’ve just inserted a 50KB JSON blob into a jsonb column. To your application, it’s just one row. To Postgres, that row is now a fragmented puzzle spread across different files on the disk. Postgres stores rows in fixed 8KB pages, and a single tuple can never span pages. When a row grows past the TOAST threshold—roughly 2KB by default—Postgres doesn't throw an error. Instead, it invokes TOAST (The Oversized-Attribute Storage Technique): it compresses your JSON, slices the result into roughly 2KB chunks, and moves them to a secondary "shadow" table.

This mechanism is the only reason Postgres can handle columns up to 1GB, but it is far from free. While TOAST keeps your main heap tables lean, it introduces a massive hidden tax on CPU and I/O that most developers only discover when their p99 latency starts spiking for seemingly simple SELECT queries.

The 2KB Trigger: When Your Data Goes "Off-Heap"

Postgres organizes data in 8KB pages. Within those pages, it wants to fit as many rows as possible to maximize cache hits. If a row is too wide, it blocks other rows from fitting into the same page, forcing the engine to do more disk I/O to read the same amount of data.

To prevent this "row bloating," Postgres sets a threshold (TOAST_TUPLE_THRESHOLD, typically 2KB). If a row's size exceeds this, Postgres attempts to compress its largest variable-length fields. If the row is still over the limit after compression, those fields get moved out-of-line—TOASTed.

Let's see how this plays out in practice. We'll create a table and watch when the TOAST engine kicks in.

CREATE TABLE user_profiles (
    id SERIAL PRIMARY KEY,
    metadata JSONB
);

-- Insert a small JSON object (Under 2KB)
INSERT INTO user_profiles (metadata) 
VALUES ('{"name": "Alice", "role": "admin"}');

-- Insert a large JSON object (~40KB of poorly compressible data).
-- Note: something like repeat('x', 50000) would compress to almost
-- nothing and stay inline, so we generate random-looking values instead.
INSERT INTO user_profiles (metadata)
SELECT jsonb_object_agg(i::text, md5(i::text))
FROM generate_series(1, 1000) AS i;

To see the storage impact, we can use the pg_column_size function. This function reveals the actual size of the data as it's stored on disk, versus its logical size.

SELECT 
    id, 
    pg_column_size(metadata) as storage_size,
    octet_length(metadata::text) as logical_size
FROM user_profiles;

For the small row, storage_size and logical_size will be nearly identical. For the large row, pg_column_size follows the TOAST pointer and reports the compressed on-disk size, so storage_size will be significantly smaller if the data was highly compressible. The pointer left behind in the main table's page is only 18 bytes—something you can confirm with a page-inspection extension like pageinspect.

The Storage Strategies: PLAIN vs. EXTENDED

Every column in Postgres has a "storage strategy." You can check yours by querying pg_attribute.

1. PLAIN: Used for fixed-length types (like integer). It never TOASTs.
2. EXTENDED: The default for jsonb. It allows both compression and out-of-line storage.
3. EXTERNAL: Allows out-of-line storage but forbids compression.
4. MAIN: Allows compression but tries to keep the data in the main table. It only TOASTs as a last resort.

SELECT attname, attstorage 
FROM pg_attribute 
WHERE attrelid = 'user_profiles'::regclass AND attname = 'metadata';

If you see x (Extended), your large JSONB blobs are being compressed using PGLZ (or LZ4 in newer versions) and then moved to a TOAST table.
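If your payloads arrive already compressed (pre-gzipped blobs, for instance), spending CPU on PGLZ gains nothing. As a sketch, you could switch the column to EXTERNAL so Postgres skips its own compression and goes straight to out-of-line storage—this only affects values written after the change:

```sql
-- Sketch: skip TOAST compression for data that won't compress anyway.
-- Applies only to newly written values, not existing rows.
ALTER TABLE user_profiles
    ALTER COLUMN metadata SET STORAGE EXTERNAL;

-- Verify: attstorage should now show 'e'
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'user_profiles'::regclass AND attname = 'metadata';
```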

Why "SELECT *" Is Now Your Worst Enemy

In a normal table, if you run SELECT id FROM user_profiles, Postgres reads the page and grabs the ID. If you SELECT metadata FROM user_profiles, and that metadata is TOASTed, Postgres has to:
1. Read the main table page to get the TOAST pointer.
2. Look up the TOAST table by its OID.
3. Fetch all the 2KB chunks associated with that pointer.
4. Reassemble the chunks in memory.
5. Decompress the data.

This is the De-TOASTing Tax. If you have a table with 10 columns and one of them is a 100KB JSONB field, a SELECT * query is effectively performing dozens of extra random I/O operations and burning CPU cycles on decompression for *every single row*.

I’ve seen production systems where a "small" dashboard query fetching 100 rows went from 5ms to 500ms simply because someone added a large history_logs JSONB column and used a generic ORM that defaults to SELECT *.
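The fix is usually as simple as listing only the columns a query actually needs, so the TOAST pointer is never followed. A sketch against our example table:

```sql
-- Expensive: de-TOASTs the large metadata blob for every row
SELECT * FROM user_profiles;

-- Cheap: never touches the TOAST table
SELECT id FROM user_profiles;

-- Caution: extracting a single key still de-TOASTs the whole value,
-- which is why promoting hot keys to real columns is the bigger win
SELECT metadata->>'role' FROM user_profiles;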

The Partial Update Myth

One of the biggest misconceptions about jsonb is that Postgres can "partially update" a field. If you have a 5MB JSON blob and you update one single key using the || operator or jsonb_set, Postgres does not surgically change those bytes on disk.

Because Postgres uses MVCC (Multi-Version Concurrency Control), it must create a new version of the row. For TOASTed data, if the data itself changes, Postgres must write a completely new set of TOAST chunks.

-- This looks like a small update, but it rewrites the entire 50KB TOAST entry
UPDATE user_profiles 
SET metadata = metadata || '{"last_updated": "now"}'::jsonb 
WHERE id = 2;

This creates massive write amplification. Your 1KB logical update results in 50KB+ of new data written to the TOAST table and eventually 50KB of dead tuples for the VACUUM process to clean up later. If you are updating large JSONB columns frequently, your VACUUM worker will struggle to keep up, leading to table bloat.
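You can watch this churn build up through the statistics collector. A sketch using pg_stat_user_tables:

```sql
-- How many dead tuples has the update churn left behind,
-- and when did autovacuum last catch up?
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'user_profiles';
```

A steadily climbing n_dead_tup between autovacuum runs is the signature of TOAST write amplification.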

Finding the Shadow Tables

Postgres hides TOAST tables in a specific schema called pg_toast. You won't see them with a standard \dt in psql. To find the TOAST table associated with your user_profiles table, you need to query pg_class.

SELECT 
    relname as table_name, 
    reltoastrelid::regclass as toast_table_name
FROM pg_class 
WHERE relname = 'user_profiles';

Once you have the name (it usually looks like pg_toast.pg_toast_16384), you can check its size. Often, you'll find that the TOAST table is 10x or 100x larger than the main table.

SELECT pg_size_pretty(pg_total_relation_size('pg_toast.pg_toast_16384'));
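Rather than hardcoding the OID, you can compare the heap and TOAST sizes for every table in one pass; a sketch:

```sql
-- Main heap size vs. TOAST size for every table that has a TOAST table
SELECT c.relname AS table_name,
       pg_size_pretty(pg_relation_size(c.oid)) AS heap_size,
       pg_size_pretty(pg_total_relation_size(c.reltoastrelid)) AS toast_size
FROM pg_class c
WHERE c.relkind = 'r'
  AND c.reltoastrelid <> 0
ORDER BY pg_total_relation_size(c.reltoastrelid) DESC;
```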

The Indexing Trap: High-Cost GIN Indexes

Indexing jsonb is a common way to maintain performance, usually via GIN (Generalized Inverted Index). However, when your JSONB is large enough to be TOASTed, building and maintaining these indexes becomes a bottleneck.

A GIN index on a large JSONB blob extracts every key and value into the index. If your TOASTed JSON contains thousands of keys, every INSERT or UPDATE must de-TOAST the value, parse the keys, and update the index tree.

Pro-tip: If you only care about a few specific keys within a large JSON blob, never index the whole column. Use an expression index instead:

-- Good: Indexing only the relevant key
CREATE INDEX idx_user_role ON user_profiles ((metadata->>'role'));

-- Expensive: Indexing the entire massive blob
CREATE INDEX idx_user_metadata_full ON user_profiles USING GIN (metadata);
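If you genuinely need to index the whole document, the jsonb_path_ops operator class is a middle ground: it supports only containment (@>) queries, but the resulting GIN index is typically much smaller and cheaper to maintain than the default jsonb_ops. A sketch:

```sql
-- Smaller, cheaper GIN index: supports only @> containment queries
CREATE INDEX idx_user_metadata_path
    ON user_profiles USING GIN (metadata jsonb_path_ops);

-- This query can use the index:
SELECT id FROM user_profiles WHERE metadata @> '{"role": "admin"}';
```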

Compression: LZ4 vs. PGLZ

Before Postgres 14, you were stuck with pglz, which is decent but can be slow and CPU-intensive for large payloads. Postgres 14 introduced support for lz4, which is significantly faster for both compression and decompression.

If you are running on Postgres 14+ and you have large JSONB columns, you should check if you're using lz4. It's not the default unless the default_toast_compression setting has been changed (it ships as pglz). You can change it per column like this:

ALTER TABLE user_profiles ALTER COLUMN metadata SET COMPRESSION lz4;

*Note: Changing the compression setting only affects newly inserted or updated values. Existing values keep their old compression method until they are physically rewritten—for example, by an UPDATE or a dump-and-restore.*
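On Postgres 14+ you can verify which method each stored value actually uses with pg_column_compression:

```sql
-- NULL means the value was small enough to be stored uncompressed
SELECT id, pg_column_compression(metadata) AS method
FROM user_profiles;
```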

When TOAST Makes Sense (and When It Doesn't)

I’ve spent a lot of time debugging "slow Postgres" only to find that the culprit was a "Schema-less" design that got out of hand. TOAST is a brilliant engineering solution for an edge case, but it shouldn't be your primary storage strategy.

Use TOASTed JSONB when:
- The data is rarely accessed (e.g., audit logs, raw API snapshots).
- You are mostly doing "Point Lookups" by ID and fetching the whole blob.
- The schema is truly volatile and cannot be normalized.

Avoid TOASTed JSONB when:
- You frequently aggregate over the data (e.g., SUM, AVG inside the JSON).
- You are constantly updating small portions of the JSON.
- You have high-frequency SELECT * queries.

Strategies to Mitigate the TOAST Tax

If you find your application is choking on large JSONB storage, you have a few architectural paths.

1. Vertical Normalization

If 90% of your queries only need 3 fields from the JSON, pull those fields out into real columns. Postgres handles TEXT and INTEGER columns much more efficiently. You can keep the "junk drawer" JSON column for the remaining 10% of the data.
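On Postgres 12+, generated columns make this extraction painless: Postgres keeps the real column in sync with the JSON automatically. A sketch, reusing the 'role' key from our earlier example:

```sql
-- Promote a hot JSON key to a real, indexable column
ALTER TABLE user_profiles
    ADD COLUMN role text GENERATED ALWAYS AS (metadata->>'role') STORED;

CREATE INDEX idx_user_profiles_role ON user_profiles (role);
```

Queries that filter on role now never touch the TOAST table at all.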

2. The "Pre-rendered" Summary

If your JSON is a massive nested object used for a UI component, consider storing a smaller "summary" JSONB column that contains only the data needed for the initial page load. Store the "Full" version in a separate table or a cold-storage bucket like S3.

3. Forced MAIN Storage

If your JSON blobs are consistently around 3KB-5KB, you can try to force Postgres to keep them in the main table to avoid the pointer overhead.

ALTER TABLE user_profiles ALTER COLUMN metadata SET STORAGE MAIN;

This tells Postgres: "Try really hard to compress this and keep it in the main page. Only move it to TOAST if the page is literally full." This can speed up reads because it avoids the secondary table lookup, but it will increase "row bloat" and might slow down sequential scans of the main table.

4. Out-of-Database Compression

If your data is truly massive (e.g., 500KB+ JSON objects), sometimes it's better to compress the data in your application (using Zstd or Gzip) and store it as a BYTEA column.

Wait, what's the catch? If you do this, you lose the ability to use Postgres JSON functions. You can't query WHERE metadata->>'role' = 'admin' if the data is a compressed binary blob. This is a trade-off: maximum performance vs. database-level queryability.

Monitoring the Tax in Real-Time

To truly understand if TOAST is killing you, you need to look at wait events. If pg_stat_activity shows sessions stuck in the IO / DataFileRead wait event on files that correspond to your TOAST tables, you're I/O bound by de-toasting.

You can also use EXPLAIN (ANALYZE, BUFFERS) to see how many "shared hit" or "read" blocks are coming from the TOAST table.

EXPLAIN (ANALYZE, BUFFERS)
SELECT metadata FROM user_profiles WHERE id = 12345;

In the output, look for the Buffers line. If you see a high number of blocks being read for a single row lookup, that’s the TOAST engine working overtime to reassemble your JSON.

Summary: Respect the Threshold

The TOAST engine is a silent partner. Most of the time, it handles the messy reality of large data without you having to think about it. But when you treat a relational database like a document store, you eventually hit the physical limits of the 8KB page.

Large JSONB columns are convenient until they aren't. By understanding the 2KB trigger, the cost of de-toasting, and the reality of write amplification, you can build Postgres schemas that scale without the invisible performance tax. Keep your "hot" data in real columns, keep your JSON blobs lean, and if you must go large, do it with your eyes open to the I/O costs.