
Stop Adding `NOT NULL` Columns: How a Single Migration Silently Kills Your Postgres Availability
Adding a non-nullable column to a massive table seems trivial, but the hidden Access Exclusive lock can paralyze your production database for minutes.
Have you ever triggered a "simple" database migration in staging, saw it finish in 50ms, and then watched in horror as your production site ground to a complete halt for five minutes?
It’s a rite of passage for many backend engineers. You're adding a NOT NULL column to a table with ten million rows. On your local machine, it’s instant. On production, it’s a site-wide outage. Postgres is usually your best friend, but when it comes to schema changes on massive tables, it can be a bit of a jealous lover—it wants exclusive attention, and it will block everyone else until it gets it.
The "Lock of Death"
When you run an ALTER TABLE command to add a column, Postgres needs an AccessExclusiveLock. This is the heaviest lock in the Postgres arsenal. It doesn't just stop other people from changing the schema; it stops everything.
While that migration is running, no one can SELECT, INSERT, UPDATE, or DELETE from that table. They just sit there, hanging, waiting for the lock to release.
But wait, it gets worse. Postgres uses a first-in, first-out queue for locks. If your migration is waiting for a long-running SELECT query to finish so it can grab that AccessExclusiveLock, every other query that arrives *after* your migration will also be stuck in line. You’ve effectively paralyzed your database before the migration even starts.
The Postgres 11 "Fixed It" Myth
If you're using Postgres 11 or newer, you might think you're safe. After all, PG 11 introduced a feature where adding a column with a DEFAULT value no longer requires rewriting the entire table.
-- This is fast in Postgres 11+
ALTER TABLE users ADD COLUMN logic_tier text DEFAULT 'free';This is great, but it doesn't solve the "Queue of Death" problem. Even if the migration itself takes 10ms, if it gets stuck behind a 30-second analytical query, your entire web app is still down for 30 seconds.
And if you try to add a NOT NULL column *without* a default to a table that already has data? Postgres will throw an error anyway because it doesn't know what to put in those rows.
The Zero-Downtime Playbook
If you want to keep your site alive while modifying massive tables, you have to stop thinking of "adding a column" as a single step. It’s a dance. Here is the safest way to do it.
1. The "Insurance Policy" (Set a Timeout)
Never run a migration on a production database without a lock timeout. This prevents your migration from sitting in the queue for too long and blocking everyone else.
-- If I can't get the lock in 2 seconds, abort the migration
SET lock_timeout = '2s';
ALTER TABLE orders ADD COLUMN internal_note text;2. Add the Column as Nullable
Instead of forcing NOT NULL immediately, add the column as nullable. This happens almost instantly (metadata change only).
-- Step 1: Add it as nullable
ALTER TABLE users ADD COLUMN phone_number text;3. Backfill Data in Batches
If you need a default value or need to move data into this new column, do not run a single massive UPDATE statement. That will bloat your table and lock rows for ages. Do it in chunks.
-- Step 2: Backfill (pseudo-code/script logic)
-- UPDATE users SET phone_number = 'N/A' WHERE id BETWEEN 1 AND 10000;
-- Sleep for 100ms
-- UPDATE users SET phone_number = 'N/A' WHERE id BETWEEN 10001 AND 20000;4. Add the Constraint as NOT VALID
Now we want to enforce the NOT NULL rule. But a standard ALTER TABLE ... SET NOT NULL requires a full table scan to verify that no nulls exist. On a 100GB table, that's a long time to hold an exclusive lock.
Instead, we use a check constraint.
-- Step 3: Add the constraint, but tell Postgres not to check existing rows yet
ALTER TABLE users
ADD CONSTRAINT phone_not_null
CHECK (phone_number IS NOT NULL) NOT VALID;The NOT VALID flag is the magic ingredient. Postgres will enforce the rule for all *new* or *updated* rows, but it won't check the old ones yet. The lock is released immediately.
5. The Final Validation
Finally, we tell Postgres to validate the constraint. This still requires a full table scan, but here's the kicker: it only requires a ShareUpdateExclusiveLock. This lock does not block reads or writes!
-- Step 4: Validate it without blocking the world
ALTER TABLE users VALIDATE CONSTRAINT phone_not_null;Once this finishes, you can safely "real" NOT NULL the column if you want, or just leave the check constraint as is (it's functionally equivalent for most use cases).
The "Gotcha" with Indexes
If you're adding a NOT NULL column because you plan to index it, remember that CREATE INDEX also locks the table. Always use CREATE INDEX CONCURRENTLY. It takes longer to build, but it doesn't kick your users off the site.
-- Don't do this:
-- CREATE INDEX idx_user_phone ON users(phone_number);
-- Do this:
CREATE INDEX CONCURRENTLY idx_user_phone ON users(phone_number);Wrapping Up
Database migrations are one of the few places where "moving fast and breaking things" actually means "everyone has to stop working because the API is 504ing."
Next time you're about to ship a NOT NULL column:
1. Check your table size. If it’s small, you’re fine. If it’s millions of rows, be careful.
2. Use `lock_timeout`.
3. Add nullable -> Backfill -> Validate.
It’s more typing, sure. But it’s a lot less stressful than explaining to your boss why the database spent ten minutes ignoring customers to check a bunch of empty columns.


