Anatomy of a Column-Aligned Schema: Saving Space in the Postgres Storage Engine

I spent a long afternoon a few years ago staring at two Postgres tables that should have been identical. They had the same row count, the same indexes, and held the exact same data. Yet, one was consistently 15% larger than the other. I checked for dead tuples. I ran VACUUM FULL. I checked for index bloat. Nothing changed. It felt like the database was gaslighting me.

The culprit turned out to be something I’d never considered: the physical order of the columns.

In Postgres, the way you sequence your CREATE TABLE statement isn't just a matter of readability or organizational preference. It is a physical blueprint for how data is laid out on the disk. Because of how CPUs access memory, Postgres inserts invisible "padding" bytes between your columns to ensure data aligns with specific memory boundaries. If you’re not careful, you end up paying a "padding tax" on every single row in your database.

The CPU’s Obsession with Alignment

To understand why Postgres wastes space, we have to look at how modern CPUs read data. A CPU doesn't grab a single byte from RAM at a time; it typically grabs chunks (usually 4 or 8 bytes).

If you have an 8-byte integer (a bigint) that starts at an odd memory address—say, byte 3—the CPU might have to execute two separate memory reads to fetch that single number. To prevent this performance hit, compilers and databases use alignment. They ensure that an 8-byte value starts at a memory address that is a multiple of 8.

Postgres follows these same rules. Most types in Postgres require an alignment of 1, 2, 4, or 8 bytes.

- 1 byte: boolean, char, int1 (if using extensions)
- 2 bytes: smallint
- 4 bytes: integer, real, date
- 8 bytes: bigint, timestamp, double precision, numeric (can vary, but often aligns to 8)

When you define a column that requires 8-byte alignment immediately after a column that only takes up 1 byte, Postgres doesn't want to start that big value at an unaligned address. Instead, it leaves 7 bytes of empty, useless space—padding—to push the next value to the next 8-byte boundary.

Seeing the Bloat in Action

Let's look at a concrete example. We’ll create two tables. They have the exact same columns, but we’ll shuffle the order.

-- The "Naive" Layout: Mixed sizes
CREATE TABLE bad_layout (
    id bigint,            -- 8 bytes
    is_active boolean,    -- 1 byte
    created_at timestamp, -- 8 bytes (Needs 8-byte alignment)
    status_code int2,     -- 2 bytes
    updated_at timestamp, -- 8 bytes (Needs 8-byte alignment)
    is_deleted boolean    -- 1 byte
);

-- The "Optimized" Layout: Grouped by size (descending)
CREATE TABLE good_layout (
    created_at timestamp, -- 8 bytes
    updated_at timestamp, -- 8 bytes
    id bigint,            -- 8 bytes
    status_code int2,     -- 2 bytes
    is_active boolean,    -- 1 byte
    is_deleted boolean    -- 1 byte
);

If we insert 1 million rows into both, the difference is striking.

INSERT INTO bad_layout (id, is_active, created_at, status_code, updated_at, is_deleted)
SELECT 
    g, 
    true, 
    now(), 
    1, 
    now(), 
    false 
FROM generate_series(1, 1000000) g;

INSERT INTO good_layout (created_at, updated_at, id, status_code, is_active, is_deleted)
SELECT 
    now(), 
    now(), 
    g, 
    1, 
    true, 
    false 
FROM generate_series(1, 1000000) g;

Now, let's check the table sizes:

SELECT 
    relname AS table_name, 
    pg_size_pretty(pg_relation_size(relid)) AS data_size
FROM pg_stat_user_tables 
WHERE relname IN ('bad_layout', 'good_layout');

In most environments, bad_layout will clock in at roughly 64 MB, while good_layout will be around 48 MB. That’s a 25% reduction in size just by moving columns around.

Why? In bad_layout, the boolean (1 byte) followed by a timestamp (8 bytes) forces 7 bytes of padding. This happens twice. Then the int2 (2 bytes) followed by another timestamp forces 6 bytes of padding. Per row, we are wasting 20+ bytes. Multiply that by a billion rows, and you’re talking about gigabytes of wasted SSD space and, more importantly, wasted RAM in your buffer cache.

The Anatomy of a Tuple

Every row (or "tuple") in Postgres has a header. This header is typically 23 bytes long.

Because of the 8-byte alignment rule, the actual data of the row must start at a multiple of 8. So, that 23-byte header is followed by 1 byte of padding to make it 24 bytes before the first column even starts.

If your first column is a bigint (8 bytes), it fits perfectly at byte 24.
But if your first column is a boolean (1 byte), it occupies byte 24. If the *next* column is a bigint, it can't start at byte 25. It has to wait until byte 32.

Here is the mental model for bad_layout (simplified):
1. Header: 23 bytes
2. Padding: 1 byte (to reach 24)
3. ID (bigint): 8 bytes (24 to 32)
4. is_active (bool): 1 byte (32 to 33)
5. PADDING: 7 bytes (to reach 40)
6. created_at (timestamp): 8 bytes (40 to 48)
7. status_code (int2): 2 bytes (48 to 50)
8. PADDING: 6 bytes (to reach 56)
9. updated_at (timestamp): 8 bytes (56 to 64)
10. is_deleted (bool): 1 byte (64 to 65)
11. Final Padding: 7 bytes (to make the whole tuple a multiple of 8)
Total: 72 bytes per row.

Now look at good_layout:
1. Header: 23 bytes
2. Padding: 1 byte (to 24)
3. created_at: 8 bytes (24 to 32)
4. updated_at: 8 bytes (32 to 40)
5. id: 8 bytes (40 to 48)
6. status_code: 2 bytes (48 to 50)
7. is_active: 1 byte (50 to 51)
8. is_deleted: 1 byte (51 to 52)
9. Final Padding: 4 bytes (to reach 56)
Total: 56 bytes per row.

The savings accumulate because we "packed" the smaller types together at the end, filling the gaps that would otherwise be empty.

The Strategy: Sorting for Success

The general rule of thumb for optimizing Postgres storage is simple: Sort your columns by alignment requirements in descending order.

1. Fixed-length 8-byte types: bigint, timestamp, double precision.
2. Fixed-length 4-byte types: integer, date, float4.
3. Fixed-length 2-byte types: smallint.
4. Fixed-length 1-byte types: boolean, char(1).
5. Variable-length types: text, varchar, jsonb, bytea.

Wait, why put the variable-length types at the end?

Variable-length types (often called "varlenas" in the source code) have a different overhead. They typically have a 1 or 4-byte header. Postgres handles them somewhat differently, and they don't have the same strict 8-byte alignment requirements as a bigint. By pushing them to the end, you ensure they don't cause padding gaps between your fixed-width columns.

What About NULLs?

A common misconception is that NULL values take up the full space of the column type. They don't. Postgres uses a "null bitmap" in the tuple header to track which columns are null. If a column is null, it typically consumes zero bytes of space within the data section of the tuple.

However, the presence of the null bitmap itself can change the size of the header. If you have 8 or fewer columns, the header is 23 bytes. If you have more, the bitmap grows. Interestingly, the padding rules still apply to the header + bitmap combination. If your null bitmap pushes your header size to 27 bytes, you still need 5 bytes of padding to reach that 32-byte boundary for your first 8-byte column.

Inspecting Your Own Tables

You don't have to guess if your tables are bloated. Postgres provides an extension called pageinspect that lets you look at the raw bytes, but a simpler way is using the pg_column_size function.

Try running this on one of your existing rows:

SELECT 
    pg_column_size(t.*) as total_row_size,
    (SELECT sum(pg_column_size(a)) FROM unnest(array[t.*]) a) as summed_column_size
FROM your_table_name t
LIMIT 1;

If total_row_size is significantly larger than summed_column_size plus the 24-byte header, you have found padding bloat.

The Practical Reality: Is It Worth It?

I’ll be honest: if your table has 10,000 rows, this doesn't matter. Don't waste your time refactoring a small configuration table because you’re worried about 40KB of padding.

However, this becomes a critical design decision in two scenarios:
1. High-volume tables: If you are logging events, tracking financial transactions, or storing sensor data at a rate of millions of rows per day, 20% bloat is massive. It affects disk IO, backup speed, and how much of your index can fit in memory.
2. Wide tables: If you have a table with 100 columns and you’ve peppered booleans and smallints randomly between timestamps, your bloat could be significantly higher than 20%.

One major "gotcha" to keep in mind: Postgres does not let you reorder columns easily.

If you run ALTER TABLE ... ADD COLUMN, Postgres always appends that column to the end of the physical layout. You cannot "insert" a column into the middle of the schema to fix the alignment. To reorder a table, you effectively have to:
1. Create a new table with the correct order.
2. Copy the data over (INSERT INTO ... SELECT).
3. Drop the old table and rename the new one.

This is a heavy operation on a production database. It requires an exclusive lock and doubles the disk space during the migration. This is why it’s so important to get the column order right at the design phase.

A Note on Readability vs. Optimization

There is a valid argument that good_layout is harder to read. We usually like to see id first, then created_at, then the important data, then the flags like is_active at the end. The optimized layout feels "scrambled."

My compromise is usually this: Use the optimized physical layout for the table, but use a View for human interaction.

CREATE TABLE orders (
    -- Optimized order for storage
    created_at timestamp,
    total_amount numeric,
    user_id bigint,
    status_id int2,
    is_shipped boolean
);

CREATE VIEW orders_v AS 
SELECT 
    user_id,
    total_amount,
    status_id,
    is_shipped,
    created_at
FROM orders;

This gives you the best of both worlds: a clean, logical interface for your developers and a tightly packed, efficient binary storage format for the engine.

Summary of the "Golden Order"

If you want to minimize padding without overthinking it, follow this template for every new table you create:

1. Primary Key / IDs (Usually bigint or uuid, 8 or 16 bytes)
2. Timestamps (timestamptz is 8 bytes)
3. Numeric/Floats (Fixed width)
4. Integers (int4 - 4 bytes)
5. Short Integers (int2 - 2 bytes)
6. Booleans / Chars (1 byte)
7. JSONB / Text / Varchar (Variable length)

By placing variable-length types at the very end, you ensure that any future ALTER TABLE ... ADD COLUMN (which adds columns to the end) won't accidentally introduce new padding gaps between your existing fixed-width columns and the new one.

It’s a rare win in the world of database performance: a change that costs nothing in terms of query complexity but pays dividends in storage efficiency and cache performance for the entire life of the application. Next time you're writing a migration, take thirty seconds to sort your columns. Your future DBA will thank you.