loke.dev

A Subtle Reclamation of the Empty Page

Why deleting your data doesn't actually shrink your database file, and the performance trade-offs of reclaiming that 'ghost' space.

8 min read

I remember the first time I built a small logging utility using SQLite. I had accidentally let a debug loop run overnight, bloating the database to a chunky 4GB. After a quick DELETE FROM logs WHERE level = 'DEBUG';, I checked the file size, expecting it to snap back to a few megabytes. It didn't. The file stayed at 4GB. I thought I’d corrupted the header or that the OS was lying to me. It turns out, SQLite was just being efficient in a way that felt remarkably counter-intuitive.

The realization that DELETE is not TRUNCATE is a rite of passage for anyone working with embedded databases. SQLite doesn't give space back to the operating system just because you asked it to forget some data. It holds onto that space like a hoarder, waiting for the next time you have something to store.

The Physics of the Page

To understand why your database file is "lying" to you about its size, you have to look at how SQLite views the world. To the OS, a database is just a file—a stream of bytes. To SQLite, that file is a grid of fixed-size blocks called pages.

When you create a table, SQLite organizes your data into B-Trees. Each node of that B-Tree lives on a page (usually 4KB by default). When you delete a row, SQLite doesn't shift every subsequent byte in the file forward to fill the gap. That would be an architectural nightmare involving massive I/O overhead. Instead, it simply marks the space within that page as "unallocated" or moves the entire page to a specialized structure called the Free List.

The Free List is essentially a bucket of "ghost space." When you perform a subsequent INSERT, SQLite checks the Free List before asking the OS for more disk space. It’s a recycling program.

Here is a quick way to see this in action using Python. We'll create a massive table, delete it, and watch the file size remain stagnant.

import sqlite3
import os

db_name = "ghost_space.db"

# Start from a clean slate so repeated runs behave the same
if os.path.exists(db_name):
    os.remove(db_name)

def get_size():
    return os.path.getsize(db_name) / (1024 * 1024)

# 1. Create a DB and fill it with junk
conn = sqlite3.connect(db_name)
c = conn.cursor()
c.execute("CREATE TABLE big_data (content TEXT)")
# Generate 50,000 rows of ~2KB each via a recursive CTE
c.execute("""
    WITH RECURSIVE cnt(x) AS (
        SELECT 1 UNION ALL SELECT x + 1 FROM cnt WHERE x < 50000
    )
    INSERT INTO big_data
    SELECT hex(randomblob(1000)) FROM cnt
""")
conn.commit()

print(f"Size after insert: {get_size():.2f} MB")

# 2. Delete the data
c.execute("DELETE FROM big_data")
conn.commit()

print(f"Size after delete: {get_size():.2f} MB")

# 3. Check the free list count
free_pages = c.execute("PRAGMA freelist_count").fetchone()[0]
page_size = c.execute("PRAGMA page_size").fetchone()[0]
ghost_mb = (free_pages * page_size) / (1024 * 1024)

print(f"Free pages: {free_pages}")
print(f"Reclaimable 'Ghost' Space: {ghost_mb:.2f} MB")

conn.close()

When you run this, you’ll see the file size stay exactly the same after the DELETE, but the freelist_count will skyrocket.
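You can also confirm the recycling half of the story: delete everything, insert a similar amount of data again, and the file should not grow past its previous high-water mark because the new rows land on free-list pages. A minimal sketch (the file name is illustrative):

```python
import os
import sqlite3

db = "recycle_demo.db"
if os.path.exists(db):
    os.remove(db)

conn = sqlite3.connect(db)
c = conn.cursor()
c.execute("CREATE TABLE t (content TEXT)")

def fill():
    # ~1,000 rows of ~2KB each
    c.executemany("INSERT INTO t VALUES (?)", [("x" * 2000,)] * 1000)
    conn.commit()

fill()
high_water_mark = os.path.getsize(db)

c.execute("DELETE FROM t")  # pages move to the free list
conn.commit()

fill()  # new rows reuse free-list pages instead of extending the file
size_after_refill = os.path.getsize(db)

print(high_water_mark, size_after_refill)
conn.close()
```

The second fill is satisfied almost entirely from recycled pages, so the two sizes come out essentially identical.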

The Vacuum: Forced Reclamation

If you absolutely need that space back—perhaps because you're running on an embedded device with limited flash storage—you have to use the VACUUM command.

VACUUM is not a simple "cleanup" operation. It is a full-blown reconstruction. SQLite copies all the *active* data into a temporary database file (compacting it in the process), then copies the result back over the original file, so the original path and any open file handles remain valid.

-- Reclaim all possible space
VACUUM;

While this sounds like a win-win, it comes with heavy costs:

1. The 2X Problem: Because VACUUM creates a copy, you need enough free disk space to hold a second copy of your database. If you have a 10GB database and only 2GB of free disk space, VACUUM will fail.
2. I/O Intensity: You are essentially re-writing your entire database. This is a heavy operation that locks the database for the duration of the process.
3. Invalidated Rowids: If you haven't defined an explicit INTEGER PRIMARY KEY, SQLite uses internal rowids. VACUUM can change these, which might break things if you’ve stored those IDs elsewhere.
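The rowid caveat is easy to see concretely. This sketch uses an in-memory database and a table that deliberately lacks an explicit INTEGER PRIMARY KEY:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# No explicit INTEGER PRIMARY KEY, so SQLite manages internal rowids
c.execute("CREATE TABLE notes (body TEXT)")
c.executemany("INSERT INTO notes VALUES (?)", [(f"note {i}",) for i in range(5)])
c.execute("DELETE FROM notes WHERE rowid = 3")  # leave a gap: 1, 2, 4, 5
conn.commit()

before = [row[0] for row in c.execute("SELECT rowid FROM notes ORDER BY rowid")]
conn.execute("VACUUM")
after = [row[0] for row in c.execute("SELECT rowid FROM notes ORDER BY rowid")]

print("before:", before)  # [1, 2, 4, 5]
print("after: ", after)   # rowids may be renumbered contiguously after VACUUM
conn.close()
```

If you had stored `3` or `5` anywhere as a foreign reference, it now points at the wrong row (or nothing at all).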

Automation via Auto-Vacuum

SQLite provides a middle ground: PRAGMA auto_vacuum. This isn't a "set it and forget it" performance booster; it’s a structural decision you usually have to make when the database is first created.

There are three modes:
- NONE (0): The default. Space is only reclaimed via manual VACUUM.
- FULL (1): On every commit, free pages are moved to the end of the file and the file is truncated, so it shrinks automatically.
- INCREMENTAL (2): Pages are moved to the free list, but the file doesn't shrink until you manually trigger it via PRAGMA incremental_vacuum.

To enable it on a new database:

PRAGMA auto_vacuum = FULL;
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);

The Catch: auto_vacuum doesn't come for free. To make it work, SQLite has to store extra "pointer map" pages to track which child pages belong to which parents. This adds about 1% overhead to the file size and slightly slows down writes. Furthermore, auto_vacuum only moves pages to the end of the file so they can be truncated; it doesn't defragment the internal pages like a full VACUUM does.

Why You Might Actually Want the Ghost Space

There’s a reason auto_vacuum is off by default. In many production scenarios, reclaiming space is actually a performance anti-pattern.

Imagine a high-frequency logging system. You delete logs older than 30 days every night. If you VACUUM every night, you are shrinking the file, only for the OS to have to re-allocate blocks on the disk the next morning as new logs come in.

File system allocation is expensive. It involves syscalls, updating file system metadata, and potentially fragmenting the file on the physical platter or SSD. By keeping the "ghost space" inside the SQLite file, you are essentially pre-allocating space. The next INSERT will be faster because SQLite already owns the bytes it needs.

Fragmentation: The Hidden Tax

There is another type of "empty" space that isn't captured by the Free List count: internal fragmentation.

Even if a page is in use, it might only be 50% full. This happens frequently with B-Trees when you perform random inserts and deletes. A page splits, leaving two pages half-full. SQLite won't necessarily merge them back together immediately.

If your database has high internal fragmentation, your cache becomes less efficient. You end up loading 4KB of data into memory just to read 2KB of actual information. This is where a periodic VACUUM (or the newer VACUUM INTO) becomes a maintenance task rather than a disk-space-saving hack.

-- SQLite 3.27.0+ allows vacuuming into a separate file.
-- This is great for backups that are also compacted.
VACUUM INTO 'compact_backup.db';
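From Python, the same command works with a bound parameter for the destination path (file names here are illustrative; VACUUM INTO requires SQLite 3.27+, which ships with recent Python builds):

```python
import os
import sqlite3

src = "bloated.db"
dst = "compact_backup.db"
for f in (src, dst):
    if os.path.exists(f):
        os.remove(f)

conn = sqlite3.connect(src)
c = conn.cursor()
c.execute("CREATE TABLE t (data BLOB)")
for _ in range(500):
    c.execute("INSERT INTO t VALUES (randomblob(4000))")
conn.commit()

c.execute("DELETE FROM t WHERE rowid % 2 = 0")  # free roughly half the space
conn.commit()

# Write a compacted copy; the bloated source file is left untouched
c.execute("VACUUM INTO ?", (dst,))
conn.close()

print(os.path.getsize(src), os.path.getsize(dst))
```

The source keeps its ghost space (and its performance buffer), while the backup comes out roughly half the size.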

Analyzing the Damage

Before you decide to reclaim space, you should know if it's worth it. SQLite provides a tool (often a separate binary) called sqlite3_analyzer. If you run it against your database, it generates a detailed report on where the "slack" is.

# Run this in your terminal
sqlite3_analyzer my_database.db

The output will tell you:
- Percentage of the file that is the Free List.
- The average "payload" of your pages.
- How much space is wasted by fragmentation.

If your "Fragmented Space" is under 10% and your disk isn't screaming for room, leave it alone.

The WAL Mode Factor

If you are using Write-Ahead Logging (PRAGMA journal_mode=WAL;), the file size issue gets even more complex. In WAL mode, writes are appended to a separate -wal file. The main .db file doesn't even see the new data until a "checkpoint" occurs.

If you see your database size ballooning despite VACUUM, check your WAL file. Long-running read transactions can prevent checkpoints from finishing, causing the WAL file to grow indefinitely.

-- Force a checkpoint to merge WAL data into the main DB file
PRAGMA wal_checkpoint(TRUNCATE);
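Monitoring this is straightforward, since the WAL lives in a sibling file named `<database>-wal`. A sketch that writes enough to stay under the default auto-checkpoint threshold, then truncates the WAL explicitly (file name illustrative):

```python
import os
import sqlite3

db = "wal_demo.db"
for f in (db, db + "-wal", db + "-shm"):
    if os.path.exists(f):
        os.remove(f)

conn = sqlite3.connect(db)
c = conn.cursor()
c.execute("PRAGMA journal_mode=WAL")
c.execute("CREATE TABLE t (data BLOB)")
for _ in range(200):
    c.execute("INSERT INTO t VALUES (randomblob(4000))")
conn.commit()

wal_before = os.path.getsize(db + "-wal")

# Merge WAL contents into the main file and truncate the -wal file
c.execute("PRAGMA wal_checkpoint(TRUNCATE)")
wal_after = os.path.getsize(db + "-wal")

print(wal_before, wal_after)
conn.close()
```

In a real application you would graph the `-wal` size over time; a file that only ever grows is the signature of a reader holding a transaction open.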

Strategic Reclamation

So, how should you handle the empty page? Here is the philosophy I've landed on after years of managing SQLite-backed applications:

1. Don't Fear the Ghost: In most server-side or desktop applications, a few hundred megabytes of unallocated space is a feature, not a bug. It’s your "performance buffer" for future writes.
2. Vacuum on Major Events: If you perform a one-time migration that deletes 50% of your data, that is the time to run a manual VACUUM.
3. Use `VACUUM INTO` for Backups: Instead of backing up a bloated, fragmented file, use VACUUM INTO to create a pristine, compacted copy as your backup. You get a smaller backup file and a "free" defrag.
4. Avoid `auto_vacuum` on SSDs: The constant moving of pages and pointer map updates can increase write amplification. Unless you are on a very space-constrained system (like an IoT sensor), the trade-off usually isn't worth it.
5. Page Size Matters: If you’re storing large BLOBs, increasing your page_size to 8KB or 16KB (before the DB is created) can reduce fragmentation and make the "empty page" problem less pronounced.
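The page-size point in the list above has the same "decide at creation time" constraint as auto_vacuum. A minimal sketch (file name illustrative):

```python
import os
import sqlite3

db = "blobs.db"
if os.path.exists(db):
    os.remove(db)

conn = sqlite3.connect(db)
# page_size only takes effect if set before the database is initialized
# (or by running VACUUM afterwards)
conn.execute("PRAGMA page_size = 8192")
conn.execute("CREATE TABLE files (data BLOB)")
actual = conn.execute("PRAGMA page_size").fetchone()[0]
print(actual)
conn.close()
```

On an existing database the pragma is silently ignored until the next VACUUM rebuilds the file with the new page size.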

Practical Code: The Maintenance Script

If you're building an app and want to be a good citizen, don't VACUUM on every close. Instead, check the "waste ratio" and only clean up when it passes a threshold.

import sqlite3

def maintenance_check(db_path, waste_threshold=0.2):
    """
    Vacuums the database if the free list exceeds a percentage of the total size.
    """
    conn = sqlite3.connect(db_path)
    c = conn.cursor()

    free_pages = c.execute("PRAGMA freelist_count").fetchone()[0]
    total_pages = c.execute("PRAGMA page_count").fetchone()[0]

    if total_pages > 0:
        ratio = free_pages / total_pages
        if ratio > waste_threshold:
            print(f"Waste ratio {ratio:.2%} exceeded threshold. Vacuuming...")
            # VACUUM cannot run inside an open transaction and commits itself
            c.execute("VACUUM")
        else:
            print(f"Waste ratio {ratio:.2%} is acceptable.")

    conn.close()

Wrapping Up

Space in a database is a fluid concept. The du command tells you how much the OS has surrendered to the database engine, but only the database engine knows how much of that is actually "alive."

Reclaiming the empty page is a subtle art. It’s a balance between being a good neighbor to the rest of the file system and providing a high-performance, low-latency environment for your data. In most cases, the best thing you can do for your database is to let it keep its "ghosts"—they are much faster to overwrite than the cold, hard vacuum of unallocated disk space.