
The 20GB WAL File That Shouldn't Exist: How I Finally Tamed SQLite Checkpoint Starvation
A deep dive into why long-lived read transactions silently prevent SQLite from truncating its Write-Ahead Log, leading to runaway disk usage and performance degradation.
Have you ever looked at your production server's disk usage and noticed a sidecar file that is ten times larger than the actual database it’s supposed to be supporting?
It starts as a minor curiosity. You have a 2GB SQLite database, a reasonable size for a medium-scale application. But sitting right next to it is a file ending in -wal that has ballooned to 20GB. You restart the application, and the file stays. You run a VACUUM, and the file laughs at you.
This is the phenomenon known as Checkpoint Starvation. It is the silent killer of SQLite performance and the primary reason why "Lite" sometimes feels like a misnomer when your monitoring alerts start screaming about disk pressure.
The Magic and the Curse of WAL Mode
To understand why that file is 20GB, we have to talk about why we use Write-Ahead Logging (WAL) in the first place.
Back in the old days, SQLite used a "rollback journal." If you wanted to write to the database, you locked the whole thing. Readers had to wait for writers; writers had to wait for readers. It was safe, but it didn't scale for web traffic.
WAL changed the game. When you enable PRAGMA journal_mode=WAL;, SQLite stops overwriting the database file directly. Instead, it appends new changes to a separate -wal file. This is brilliant because:
1. Readers don't block writers.
2. Writers don't block readers.
3. Disk I/O is sequential, which is much faster.
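Enabling WAL takes a single pragma, and the statement itself returns the mode now in effect, which makes for an easy sanity check. A minimal sketch using Python's built-in sqlite3 module (the example.db filename is just a placeholder):

```python
import sqlite3

conn = sqlite3.connect("example.db")
# The pragma returns the journal mode actually in effect;
# "wal" confirms the switch succeeded.
mode = conn.execute("PRAGMA journal_mode=WAL;").fetchone()[0]
print(mode)  # → wal
conn.close()
```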
But there is a catch. The WAL file is essentially a "todo list" of changes. At some point, those changes need to be folded back into the main .db file so the WAL can be cleared. This process is called checkpointing.
If the checkpointing process fails to finish, the todo list just keeps growing. And growing. And growing.
Why Checkpoints Fail: The "Oldest Reader" Problem
SQLite is incredibly protective of your data. It guarantees that a reader will always see a consistent snapshot of the database from the moment their transaction started.
Imagine this scenario:
1. Reader A starts a long-running query at 12:00 PM. It is looking at the state of the database as of "WAL Frame #100."
2. Writer B comes along and writes 5,000 new rows. The WAL file grows to Frame #5100.
3. SQLite tries to run an automatic checkpoint to move frames #1 through #5100 into the main database file.
Here is the problem: SQLite can only checkpoint frames up to #100, and it cannot truncate or reset the WAL file as long as Reader A is still active. Why? Because Reader A reads its snapshot partly from the WAL (frames #1 through #100) and partly from the untouched pages of the main database file; copying newer frames over those pages, or resetting the WAL, would destroy its consistent view of the 12:00 PM snapshot.
If Reader A stays open for three hours, and your application is writing 100MB of data per minute, your WAL file will grow by 18GB in that window. SQLite cannot "wrap around" and reuse the beginning of the file because Reader A is effectively standing in the doorway.
Identifying the Starvation
In my case, the 20GB file was caused by a "zombie" read transaction. We had a reporting dashboard that fired off a complex SELECT statement. Due to a bug in our connection pooling logic, the connection wasn't being returned to the pool, and the transaction was never explicitly closed.
You can see this happening in real-time. If you have access to the database, run this:
```sql
PRAGMA wal_checkpoint(PASSIVE);
```

The output is a single row of three integers:
1. busy: 1 if the checkpoint was blocked by another connection, 0 otherwise.
2. log: the total number of frames in the WAL.
3. checkpointed: the number of frames successfully moved to the main DB.
If checkpointed equals log but your file size isn't shrinking, or if checkpointed stays well below log despite repeated attempts, you have a "pinned" reader.
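Wrapped in a small helper, the diagnosis looks like this (a sketch; diagnose_wal is a name I made up, but the (busy, log, checkpointed) result columns are what the pragma actually returns):

```python
import sqlite3

def diagnose_wal(db_path):
    """Run a passive checkpoint and report how much of the WAL is movable."""
    conn = sqlite3.connect(db_path)
    try:
        # PRAGMA wal_checkpoint returns one row: (busy, log, checkpointed)
        busy, log, checkpointed = conn.execute(
            "PRAGMA wal_checkpoint(PASSIVE);"
        ).fetchone()
        print(f"busy={busy} log={log} checkpointed={checkpointed}")
        if log > 0 and checkpointed < log:
            print("Frames are pinned -- a long-lived reader is likely holding a snapshot.")
    finally:
        conn.close()
```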
Practical Code: The Starvation Trap
Let’s look at how easy it is to trigger this in Python. This script creates a writer that constantly pumps data and a reader that holds a transaction open.
```python
import sqlite3
import time
import threading
import os

DB_NAME = "starvation_test.db"

def setup_db():
    if os.path.exists(DB_NAME):
        os.remove(DB_NAME)
    conn = sqlite3.connect(DB_NAME)
    conn.execute("PRAGMA journal_mode=WAL;")
    conn.execute("CREATE TABLE data (val TEXT);")
    conn.close()

def long_reader():
    # This simulates a slow report or a leaked connection
    conn = sqlite3.connect(DB_NAME)
    print("[Reader] Opening long-lived transaction...")
    conn.execute("BEGIN TRANSACTION;")
    conn.execute("SELECT COUNT(*) FROM data;").fetchone()
    # Stay open for 30 seconds, blocking WAL truncation
    time.sleep(30)
    print("[Reader] Closing transaction.")
    conn.rollback()
    conn.close()

def fast_writer():
    conn = sqlite3.connect(DB_NAME)
    for i in range(1000):
        conn.execute("INSERT INTO data VALUES (?);", ("x" * 1024,))  # 1KB per write
        conn.commit()
        if i % 100 == 0:
            # Show WAL size
            wal_size = os.path.getsize(DB_NAME + "-wal") / (1024 * 1024)
            print(f"[Writer] WAL size: {wal_size:.2f} MB")
    conn.close()

setup_db()
t1 = threading.Thread(target=long_reader)
t2 = threading.Thread(target=fast_writer)
t1.start()
time.sleep(1)  # Ensure reader starts first
t2.start()
t1.join()
t2.join()
```

If you run this, you'll see the WAL size climb steadily. Even if the writer finishes its work, the WAL file won't shrink until that long_reader finishes its sleep and closes the connection.
How to Fix It (and prevent it from happening again)
Fixing a 20GB WAL file isn't just about deleting the file (please don't do that while the DB is open; you'll likely corrupt your data). It’s about managing your database's lifecycle.
1. The journal_size_limit Pragma
This is your first line of defense. By default, SQLite doesn't have a limit on how large the WAL can grow. You can tell SQLite to proactively truncate the WAL once it reaches a certain size.
```sql
PRAGMA journal_size_limit = 67108864; -- 64MB
```

When a checkpoint happens, if the WAL is larger than this limit, SQLite will physically truncate the file. This doesn't *guarantee* the file won't grow (if a reader is still blocking it), but it ensures that as soon as the reader clears, the disk space is reclaimed.
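Since this pragma applies per connection, it pays to route every open through one factory that sets it. A sketch (connect_capped is a name I made up):

```python
import sqlite3

def connect_capped(db_path):
    """Open a connection with WAL mode and a 64MB cap on the journal file.

    journal_size_limit is a per-connection setting, so every code path
    that opens the database should go through a factory like this.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL;")
    conn.execute("PRAGMA journal_size_limit = 67108864;")  # 64MB
    return conn
```

Querying the pragma with no argument returns the current limit, so you can verify the setting took effect.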
2. Aggressive Checkpointing
By default, SQLite runs an automatic checkpoint every 1000 pages. You can trigger a more aggressive one manually.
There are actually four modes for wal_checkpoint:
- PASSIVE: (the default) Checkpoints as much as it can without blocking. If a reader is in the way, it just skips those frames.
- FULL: Waits for any writer to finish, then checkpoints every frame in the WAL. It does not reset the file for reuse.
- RESTART: Like FULL, but it additionally waits until no reader is still using the WAL, so the next writer can start reusing the file from the beginning. It doesn't block new readers, but it stays active longer.
- TRUNCATE: The nuclear option. Like RESTART, but it then truncates the WAL file to zero bytes. Warning: it waits on writers and existing readers, so other threads/processes accessing the DB can stall.
In a background maintenance task, you might run:
```python
def maintenance_task(db_path):
    conn = sqlite3.connect(db_path)
    # This will block until all other connections are done with their
    # current transactions, then it will shrink the WAL to zero.
    conn.execute("PRAGMA wal_checkpoint(TRUNCATE);")
    conn.close()
```

3. Application-Level Timeouts
Long-lived transactions are usually a bug, not a feature. If you're using a web framework like Django or Flask, ensure you aren't opening a transaction at the start of a request and keeping it open while you call a slow external API.
If you use Python's sqlite3 module, remember that it defaults to an "implicit transaction" mode that can be confusing. Using connection.execute("BEGIN") and connection.commit() explicitly is often safer than relying on the library to guess when you're done.
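Concretely, opening the connection with isolation_level=None puts it in autocommit mode, so nothing stays open unless you explicitly began it. A minimal sketch (the explicit.db filename and events table are just for illustration):

```python
import sqlite3

# isolation_level=None disables the stdlib's implicit transaction
# management: no transaction is open unless you BEGIN one yourself.
conn = sqlite3.connect("explicit.db", isolation_level=None)
conn.execute("PRAGMA journal_mode=WAL;")
conn.execute("CREATE TABLE IF NOT EXISTS events (msg TEXT);")

conn.execute("BEGIN;")
try:
    conn.execute("INSERT INTO events VALUES ('hello');")
    conn.execute("COMMIT;")
except Exception:
    conn.execute("ROLLBACK;")  # never leave the transaction dangling
    raise
finally:
    conn.close()
```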
4. Busy Handlers and WAL Autocheckpoints
If you have a high-write volume, the default 1000-page autocheckpoint might be too infrequent. You can lower it:
```sql
PRAGMA wal_autocheckpoint = 100; -- Checkpoint every 100 pages
```

This makes checkpoints happen more often, which keeps the WAL file smaller and prevents the "huge burst of I/O" that happens when a 1GB WAL finally gets merged.
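The threshold is queryable, so you can confirm it took effect on a given connection. A quick sketch (tuned.db is a placeholder name):

```python
import sqlite3

conn = sqlite3.connect("tuned.db")
conn.execute("PRAGMA journal_mode=WAL;")
# Lower the autocheckpoint threshold from the default 1000 pages to 100.
conn.execute("PRAGMA wal_autocheckpoint = 100;")
# With no argument, the pragma reports the current threshold.
threshold = conn.execute("PRAGMA wal_autocheckpoint;").fetchone()[0]
print(threshold)  # → 100
conn.close()
```

Like journal_size_limit, this is a per-connection setting, so it belongs in whatever factory function opens your connections.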
The Edge Case: Read-Only Connections
One thing that tripped me up was the assumption that "Read-Only" connections were harmless. They aren't.
Even if a connection is opened with mode=ro (read-only), it still participates in the WAL coordination. A read-only connection that forgets to close its transaction will prevent a read-write connection from checkpointing the WAL.
In microservices architectures, it’s common to have a "Sidecar" container that reads the SQLite DB for metrics or logs. If that sidecar hangs, your main application's WAL file will explode. Always set a busy_timeout and ensure your readers are as ephemeral as possible.
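A sidecar-style reader should look something like this sketch (metrics.db and the data table are stand-ins; the setup block exists only so the read-only open has something to read):

```python
import sqlite3

# Setup: create a small WAL-mode database for the reader below.
rw = sqlite3.connect("metrics.db")
rw.execute("PRAGMA journal_mode=WAL;")
rw.execute("CREATE TABLE IF NOT EXISTS data (val TEXT);")
rw.close()

# A read-only connection still takes a WAL read-mark, so an idle one
# pins the log exactly like a writer's snapshot would.
ro = sqlite3.connect("file:metrics.db?mode=ro", uri=True)
ro.execute("PRAGMA busy_timeout = 5000;")  # wait up to 5s on contention
count = ro.execute("SELECT COUNT(*) FROM data;").fetchone()[0]
ro.close()  # close promptly: even read-only snapshots block checkpoints
```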
Better Monitoring
You can't fix what you can't see. Most standard "Disk Space" alerts are too coarse to catch this before it becomes a problem. I now include the WAL size in our application metrics.
In Go, it looks something like this:
```go
func GetWALSize(dbPath string) int64 {
	info, err := os.Stat(dbPath + "-wal")
	if err != nil {
		return 0
	}
	return info.Size()
}
```

If WALSize > (MainDBSize * 2), something is wrong. Usually, it's a developer who left a database GUI tool (like DBeaver or TablePlus) open on their local machine with an uncommitted transaction while connected to a production replica.
Final Thoughts
SQLite's WAL mode is an engineering masterpiece that allows it to punch way above its weight class. But it's not a "set it and forget it" feature for high-traffic applications.
If you're seeing runaway disk usage:
1. Check for unclosed transactions (the usual suspect).
2. Set a journal_size_limit.
3. Use PRAGMA wal_checkpoint(PASSIVE) to diagnose how many frames are stuck.
4. Ensure your background tasks aren't holding the database hostage.
The 20GB WAL file wasn't a bug in SQLite; it was a bug in how we respected the checkpoint. Once we tightened our connection lifecycles and set a 64MB size limit, the "starvation" ended. Our disk usage stabilized, and surprisingly, the application got faster because it was no longer wading through a massive WAL index just to perform a simple read.


