loke.dev
Header image for Randomness Is a Database Anti-Pattern: The Engineering Case for Sequential UUIDs

Randomness Is a Database Anti-Pattern: The Engineering Case for Sequential UUIDs

A deep dive into B-Tree fragmentation and why the move from random UUIDv4 to sequential UUIDv7 is the highest-ROI database optimization you can make.

· 8 min read

I remember watching a Postgres production dashboard where the INSERT latency looked like a jagged mountain range. Every few minutes, a spike would hit, latency would triple, and our disk I/O would peg at 100% before settling back down to a "normal" that was slowly getting worse every week. We were using random UUIDv4s for everything—after all, that’s what the cool kids in distributed systems were doing—but our database was dying under the weight of its own indexes.

The culprit wasn't our query logic or a lack of RAM. It was randomness. Specifically, the way random data interacts with the most fundamental data structure in the database world: the B-Tree.

The Lie of "Scale-Out" Randomness

We are taught that UUIDs are the solution to the problems of auto-incrementing integers. And in a vacuum, that's true. If you have five microservices generating IDs simultaneously, you can't have them all asking a single SQL sequence for the "next" number. You need decentralization. UUIDv4, which is essentially 122 bits of pure, unadulterated entropy, gives you that. You can generate a trillion of them on a disconnected laptop in the middle of the woods and be reasonably sure you won't have a collision.

But while UUIDv4 is great for the *application*, it is a nightmare for the *storage engine*.

Most relational databases (PostgreSQL, MySQL, SQL Server) use B-Trees for their primary keys. B-Trees are designed to keep data sorted so that lookups are fast ($O(\log n)$). When you use an auto-incrementing integer, you are always appending to the "right" side of the tree. The database fills up a page of data, seals it, and moves to the next one. It’s clean, it’s orderly, and it’s fast.

When you use a random UUID, you are essentially telling the database to insert a record into a random location in a massive, sorted list.

The Anatomy of a Page Split

To understand why this matters, we have to look at how data actually sits on the disk. Databases don't write records one by one; they write them in "pages" (usually 8KB or 16KB).

Imagine a B-Tree index that is already 50GB. It’s too big to fit in RAM (the Buffer Pool), so most of it lives on the NVMe drive. When you insert a random UUID, the database has to:

1. Find the page where that specific UUID *should* live to maintain the sorted order.
2. Pull that page from disk into RAM.
3. Insert the record.
4. Write the page back to disk.

But what happens when that 8KB page is already full? This is where the Page Split happens. The database has to create a new page, move half of the records from the old page to the new one, and then insert your new record.

Because your IDs are random, these splits happen everywhere. You aren't just adding to the end; you are constantly shattering existing pages. This leads to two catastrophic problems:

1. Index Fragmentation: Your pages end up being only 50-70% full on average. You are literally wasting 30% of your expensive SSD space and RAM just to store "emptiness" created by splits.
2. I/O Thrashing: Since the next UUID could go *anywhere* in the 50GB index, the database is constantly evicting "hot" pages from RAM to make room for a page it needs to update. Your cache hit ratio plummets.

Enter UUIDv7: The Best of Both Worlds

The industry finally realized that we were shooting ourselves in the foot. The solution isn't to go back to integers (which suck for distributed systems) but to make UUIDs "lexicographically sortable."

The IETF RFC 9562 introduced UUIDv7. It replaces the randomness at the start of the ID with a Unix timestamp (in milliseconds).

A UUIDv7 looks like this:
018c3b7b-231a-7b3d-bf22-f9435b6f3c09

The first 48 bits are the timestamp. This means that every UUIDv7 generated "now" will be greater than any UUIDv7 generated "one millisecond ago." To the B-Tree, this looks almost exactly like an auto-incrementing integer. Inserts happen at the end of the index. Page splits become rare. The database stays happy.

What does the data look like?

Let's look at the binary layout of a UUIDv7 compared to v4:

UUIDv4:
[ 122 bits of randomness ] 
Result: Total entropy. No order.

UUIDv7:
[ 48-bit Timestamp ] [ 12 bits of "ver/rand" ] [ 62 bits of randomness ]
Result: Time-ordered, but still unique.

Implementing UUIDv7 in the Real World

If you're using a modern language, you don't need to wait for your database to "support" UUIDv7. A UUID is just a 16-byte (128-bit) buffer. You can generate the bytes in your application and send them to the database as a standard UUID type.

Example: Python Implementation

While there are libraries like uuid6, here is a simplified look at how a v7 is constructed:

import time
import os
import secrets

def generate_uuidv7():
    # 48-bit timestamp (milliseconds since epoch)
    ms = int(time.time() * 1000)
    
    # 10 bytes of randomness
    rand = secrets.token_bytes(10)
    
    # Construct the uuid bytes
    # This is a simplification; you need to set the version (7) 
    # and variant bits correctly.
    uuid_bytes = ms.to_bytes(6, 'big') + rand
    
    # Set version to 7 (bits 48-51)
    # Set variant to 2 (bits 64-65)
    # ... (bit manipulation logic) ...
    
    return uuid_bytes

In a production environment, you should use a battle-tested library. For Python, uuid7 is the go-to:

from uuid_extensions import uuid7

# Generate a sortable UUID
user_id = uuid7()
print(user_id)
# Output: 018c3b8a-1a2b-7c3d-8e4f-506162636465

Example: Node.js (TypeScript)

In the Node ecosystem, uuidv7 is the standard package.

import { v7 as uuidv7 } from 'uuid';

const orderId = uuidv7();
console.log(orderId); 
// Every new ID will be greater than the last

// Persistence in TypeORM or Prisma:
await db.order.create({
  data: {
    id: uuidv7(),
    amount: 100.00
  }
});

The Performance Impact: A Hypothetical Benchmark

Imagine we are inserting 10 million rows into a PostgreSQL table with a primary key index.

| Metric | UUIDv4 (Random) | UUIDv7 (Sequential) |
| :--- | :--- | :--- |
| Insert Rate (avg) | 8,000 req/s | 25,000 req/s |
| Index Size | 640 MB | 410 MB |
| Disk Write IOPS | High (random writes) | Low (sequential writes) |
| Buffer Pool Efficiency | Low (random pages) | High (only "tail" pages) |

Why is the index size smaller? Because of the Fill Factor. When you insert sequentially, Postgres can fill each index page nearly to 100%. With random inserts, due to constant splitting and re-balancing, pages often stay around 60-70% full. You are paying for 30% more storage and memory than you actually need.

Migrating from v4 to v7

You might be thinking: *"I already have 500 million rows with UUIDv4. Am I stuck?"*

The beauty of UUIDs is that they are just bytes. UUIDv7 is backward compatible with the UUID data type in Postgres, MySQL, and SQL Server. You can start writing UUIDv7s into the same column today.

However, there is a catch: The Hybrid Index Problem.

If you start inserting v7s into an index full of v4s, the v7s will all cluster together at the "end" of the index (since their timestamps make them "larger" than most v4s). This actually helps! Your new inserts will become efficient, but your old data remains fragmented.

To truly fix the fragmentation, you would eventually need to rewrite the table:

-- In PostgreSQL, this is a heavy operation (locks the table)
-- But it will re-sort everything and compact the pages
CLUSTER my_table USING my_table_pkey;

The "Security" Argument (and why it's usually wrong)

A common pushback against sequential IDs is security. "If my IDs are sequential, a competitor can scrape my site by just incrementing the ID!" or "People can tell how many orders I have by looking at the timestamp in the ID!"

Let's break that down:

1. Insecurity through enumeration: This applies to integers (1, 2, 3...). UUIDv7 is sortable, but it is not predictable. The last 62 bits are still random. You cannot guess the "next" UUIDv7 because there are trillions of possibilities within a single millisecond.
2. Information Leakage: Yes, a UUIDv7 leaks the *creation time* of the record. If your business model relies on the creation time of a user account being a state secret, don't use UUIDv7. But for 99% of us, created_at is a public or semi-public field anyway.

If you *really* need to hide the timestamp but want the performance, you can use something like TSID or ULID, though UUIDv7 is now the official standard.

Database-Specific Gotchas

PostgreSQL

Postgres doesn't have a built-in uuid_generate_v7() function yet (as of v16), though it's being discussed for v17. For now, you can use a simple PL/pgSQL function or generate them in your app.

Actually, generating them in the app is better. It moves the CPU overhead of ID generation from your single database primary to your many, horizontally-scaled application servers.

MySQL (InnoDB)

MySQL users benefit the *most* from UUIDv7. InnoDB uses a Clustered Index, meaning the actual table data is stored in the primary key B-Tree. Random UUIDv4s in MySQL don't just fragment the index; they fragment the entire table. Moving to UUIDv7 in MySQL often results in a 2x-5x performance gain for write-heavy workloads.

The Engineering Verdict

We often look for complex ways to optimize databases: sharding, read replicas, caching layers, tuning work_mem. But sometimes, the biggest win is simply stopping the database from doing unnecessary work.

Randomness is an anti-pattern for B-Trees. By switching to UUIDv7, you align your application's needs (distributed uniqueness) with the database's strengths (sequential writes). It is quite literally the highest ROI change you can make to a growing system.

Stop using UUIDv4 for primary keys. Your SSDs, your latency metrics, and your future self will thank you.

Summary for the Scanners:

- The Problem: UUIDv4 is random. B-Trees (database indexes) hate randomness. It causes page splits, disk thrashing, and bloated indexes.
- The Solution: UUIDv7. It’s a 128-bit UUID with a 48-bit timestamp at the start.
- The Benefit: It sorts like an integer but stays unique like a UUID. You get faster inserts and better memory usage.
- The Tradeoff: It leaks the creation time of the ID. That's usually it.