
The 24-Bit Guess: How I Finally Reconciled Redis LRU With My Mystery Cache Misses
Redis doesn't actually track the least recently used keys—it just takes a very fast, very efficient, and surprisingly probabilistic 24-bit guess.
Most developers believe that when they set maxmemory-policy allkeys-lru in Redis, the engine maintains a perfect, sorted list of every key in the database, meticulously moving items to the front every time they are touched. This is a lie. If Redis actually did this, the performance overhead of every GET request would skyrocket because every read would necessitate a write operation to update the linked list pointers.
Instead, Redis uses a probabilistic approximation that relies on a tiny 24-bit field and a bit of "good enough" randomness.
I learned this the hard way when a production cluster started evicting "hot" session keys while leaving dormant, weeks-old metadata keys untouched. On paper, it shouldn't have happened. In practice, we were falling victim to the statistical variance of the 24-bit guess.
The Architecture of a "Good Enough" Guess
To understand why your cache might be lying to you, you have to look at the redisObject definition in the C source code. Every value stored in Redis is wrapped in this structure:
```c
typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least frequently used). */
    int refcount;
    void *ptr;
} robj;
```

Notice unsigned lru:LRU_BITS. In modern Redis, LRU_BITS is 24. That is all the space Redis allocates to track the "age" of a key.
When you access a key, Redis updates this 24-bit field with the current value of a global clock (1-second resolution in the default build). It doesn't move the key in a list; it just stamps it with a number.
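To make the distinction concrete, here is a toy sketch of stamp-on-access. This is not Redis code, just the idea: a read does no list surgery, it only overwrites a per-key stamp taken from a shared clock (a logical counter here, standing in for lru_clock).

```python
import itertools

# Toy model of stamp-on-access (illustrative, not Redis internals).
class StampCache:
    def __init__(self):
        self.clock = itertools.count()  # stands in for the global lru_clock
        self.data = {}                  # key -> value
        self.stamp = {}                 # key -> clock value at last access

    def set(self, key, value):
        self.data[key] = value
        self.stamp[key] = next(self.clock)

    def get(self, key):
        # No reordering of any structure: an O(1) stamp overwrite.
        self.stamp[key] = next(self.clock)
        return self.data[key]

c = StampCache()
c.set("a", 1)
c.set("b", 2)
c.get("a")  # "a" is restamped and is now "younger" than "b"
assert c.stamp["a"] > c.stamp["b"]
```

The catch: nothing in this model stays sorted, so finding the oldest key requires a scan or a sample. That is exactly the trade Redis makes.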
The Eviction Loop
When Redis hits its maxmemory limit, it doesn't scan the whole database to find the absolute oldest key. That would be an $O(N)$ operation that would block the event loop for seconds. Instead, the algorithm follows this logic:
1. Pick $N$ keys at random (where $N$ is defined by maxmemory-samples).
2. Look at the 24-bit LRU timestamps of those $N$ keys.
3. Evict the key with the oldest timestamp.
4. If memory is still over the limit, repeat.
This is why your "old" keys might persist while "newer" keys vanish. If your oldest key isn't lucky enough to be picked in the random sample, it survives.
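You can quantify that luck without a Redis server at all. The sketch below (my own simulation, not Redis code) assigns every key a random age, applies the sampled-eviction rule, and reports the age percentile of the victim; 100% would mean the globally oldest key was evicted every time.

```python
import random

def sampled_evict(ages, samples):
    """One eviction decision: sample `samples` keys at random and evict the
    oldest among them -- the rest of the keyspace is never examined."""
    candidates = random.sample(range(len(ages)), samples)
    return max(candidates, key=lambda k: ages[k])

def mean_victim_percentile(samples, trials=500, num_keys=500):
    """Average age percentile of the evicted key (1.0 = the true oldest)."""
    total = 0.0
    for _ in range(trials):
        ages = [random.random() for _ in range(num_keys)]
        victim = sampled_evict(ages, samples)
        total += sum(a <= ages[victim] for a in ages) / num_keys
    return total / trials

for n in (3, 5, 10):
    print(f"maxmemory-samples {n}: victim older than "
          f"~{mean_victim_percentile(n):.0%} of keys on average")
```

With a sample of 5, the victim is typically only older than about 80% of the keyspace; with 10, the figure climbs past 90%. The remaining gap is exactly where "mystery misses" live.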
Simulating the "Mystery Miss"
To see this in action, I wrote a script to flood a small Redis instance with a maxmemory of 10MB. We can watch how the eviction behavior changes based on how many samples Redis takes.
```python
import redis
import time

# Connect to a local test instance
r = redis.Redis(host='localhost', port=6379, db=0)

# Configure Redis for a small memory footprint and LRU
r.config_set('maxmemory', '10mb')
r.config_set('maxmemory-policy', 'allkeys-lru')
r.config_set('maxmemory-samples', '5')  # The default

def populate_cache(num_keys):
    print(f"Inserting {num_keys} keys...")
    for i in range(num_keys):
        # We'll make the keys roughly 1KB each
        r.set(f"key:{i}", "x" * 1000)
        if i % 1000 == 0:
            print(f"Inserted {i}")

def check_survivors(num_keys):
    found = 0
    for i in range(num_keys):
        if r.exists(f"key:{i}"):
            found += 1
    return found

# 1. Fill it up
populate_cache(15000)  # ~15MB of payload, well past the 10MB cap

# 2. Wait past the 1-second LRU clock resolution, then access the
#    first 100 keys to make them "recent"
time.sleep(2)
print("Touching the first 100 keys...")
for i in range(100):
    r.get(f"key:{i}")

# 3. Flood it with more keys to force evictions
print("Flooding with more data...")
for i in range(20000, 25000):
    r.set(f"key:{i}", "y" * 1000)

# 4. See if our "hot" keys (0-99) survived
survived = [i for i in range(100) if r.exists(f"key:{i}")]
print(f"Hot keys remaining: {len(survived)}/100")
```

If you drop maxmemory-samples to 3, you will likely find that several of your "recently accessed" keys were evicted. Bump it to 10 and the accuracy improves significantly, at a slight CPU cost.
The 24-Bit Overflow and the Idle Clock
The 24-bit field creates an interesting technical constraint. If you track time in seconds, 24 bits only gives you about 194 days before the counter wraps around to zero.
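The arithmetic behind that figure, assuming the default 1-second tick:

```python
LRU_BITS = 24
LRU_CLOCK_RESOLUTION_MS = 1000  # one tick per second in the default build

wrap_seconds = (2 ** LRU_BITS) * LRU_CLOCK_RESOLUTION_MS // 1000
print(wrap_seconds / 86400)     # ~194.2 days until the 24-bit clock wraps
```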
Redis handles this by using a "virtual" clock. The server.lruclock integer is refreshed periodically by the server cron (roughly every 100 ms at the default hz setting) rather than on every command. When Redis calculates the "idletime" of a key, it does something like this:
```c
unsigned long long estimateObjectIdleTime(robj *o) {
    unsigned long long lruclock = getLRUClock();
    if (lruclock >= o->lru) {
        return (lruclock - o->lru) * LRU_CLOCK_RESOLUTION;
    } else {
        /* Handle the wrap-around case */
        return (lruclock + (LRU_CLOCK_MAX - o->lru)) * LRU_CLOCK_RESOLUTION;
    }
}
```

The resolution (LRU_CLOCK_RESOLUTION) is 1 second in standard builds. This means that two keys accessed within the same second are effectively the same age to Redis. If you have a high-throughput system where milliseconds matter for eviction priority, Redis LRU is too blunt an instrument.
Why My Sessions Were Dying
In my case, the problem wasn't just the sampling—it was the Idle Time.
We had two types of data:
1. Sessions: Small, frequently accessed, all touched within the last 5 minutes.
2. Report Cache: Large (200KB+), accessed once an hour, but there were *thousands* of them.
The eviction loop only ever compares the handful of keys in each random sample. With thousands of report keys in the keyspace, most samples did contain a cold report key, but not all of them. Every so often a sample came up with no report keys at all, and in that round the "oldest" key in the sample was whatever happened to be there.
Even though the session key had an idletime of 10 seconds and the report key had an idletime of 3,000 seconds, if the random sample of 5 keys contained 4 brand-new keys and 1 session key (and zero report keys), the session key was toast.
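The odds of that unlucky draw are easy to estimate: if cold keys make up a fraction p of the keyspace, a sample of n keys misses all of them with probability roughly (1 - p)^n, and that only needs to happen once per unlucky session key. The numbers below are illustrative, not from our production cluster.

```python
def p_sample_misses_cold(cold_fraction, samples):
    """Chance a random eviction sample contains zero cold keys
    (sampling with replacement, close enough for large keyspaces)."""
    return (1 - cold_fraction) ** samples

# Illustrative numbers: suppose 30% of the keyspace is cold report data.
for n in (3, 5, 10):
    print(f"samples={n}: {p_sample_misses_cold(0.30, n):.1%} "
          f"of eviction rounds see no cold key at all")
```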
The Solution: Tuning the Sample Size
The fix for us was a combination of increasing the sample size and moving to a more granular eviction policy.
```
# Increase the accuracy of the LRU approximation
CONFIG SET maxmemory-samples 10
```

By increasing the sample from 5 to 10, the approximation becomes nearly indistinguishable from true LRU, at the cost of roughly 1-2% more CPU during eviction bursts. For most modern servers, that's a trade-off you should make every day.
The Evolution: The Eviction Pool
It’s worth noting that since Redis 3.0, the "guess" isn't as blind as it used to be. Redis maintains a Pool of Eviction Candidates.
When the algorithm samples $N$ keys, it doesn't just pick one and throw the rest away. It populates a sorted pool of "good" eviction candidates. It tries to keep the "best" (oldest) keys in this pool. In the next iteration, if it finds a key that is older than the ones in the pool, it swaps them.
This essentially gives Redis "memory" of previous samples. This is why the 24-bit guess works as well as it does today. Even with a sample size of 5, the pool makes the behavior much closer to a global LRU than a pure random sample would allow.
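A minimal sketch of the pool idea, in the shape of the algorithm only (the real code in evict.c uses a fixed array of pool entries and live idle times; the keyspace and sizes below are made up):

```python
import random

POOL_SIZE = 16  # matches the pool size in evict.c

def evict_one(idle_times, pool, samples=5):
    """One eviction round: sample keys, fold them into a small pool of the
    idlest candidates seen so far, then evict the idlest pooled key."""
    for key in random.sample(list(idle_times), samples):
        pool[key] = idle_times[key]
    # Keep only the POOL_SIZE idlest candidates -- the pool's "memory".
    for key, _ in sorted(pool.items(), key=lambda kv: kv[1])[:-POOL_SIZE]:
        del pool[key]
    victim = max(pool, key=pool.get)  # idlest key we have ever seen
    del pool[victim]
    del idle_times[victim]
    return victim

# Toy keyspace: key i has idle time i, so high numbers are the coldest keys.
keys = {f"key:{i}": i for i in range(1000)}
pool = {}
victims = [int(evict_one(keys, pool).split(":")[1]) for _ in range(50)]
print(f"mean victim idle rank: {sum(victims) / len(victims):.0f} / 999")
```

Because the pool carries the best candidates from round to round, the victims cluster much closer to the truly idlest keys than a memoryless best-of-5 pick would.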
When LRU is the Wrong Choice (The LFU Pivot)
Sometimes, the "guess" fails because LRU itself is the wrong strategy. If you have a key that is accessed once every hour but must stay in cache, and a key that was accessed 100 times in the last minute but will never be accessed again, LRU will keep the latter and kill the former.
Redis 4.0 introduced LFU (Least Frequently Used). Interestingly, it recycles those same 24 bits in the redisObject.
- 16 bits store the time of the last counter decrement, in minutes (used to decay the counter over time).
- 8 bits are used for a logarithmic counter of accesses.
If your cache misses feel "wrong," you might want to try:
```
CONFIG SET maxmemory-policy allkeys-lfu
```

Using 8 bits to count frequency seems impossible (max value 255), but Redis uses a logarithmic counter. An 8-bit counter with the default lfu-log-factor of 10 can represent on the order of 1 million accesses.
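The trick is a probabilistic increment: the higher the counter already is, the smaller the chance that one more access bumps it. This sketch mirrors the published LFULogIncr logic (LFU_INIT_VAL is 5 in the Redis source):

```python
import random

LFU_INIT_VAL = 5  # new keys start with counter = 5 in Redis

def lfu_log_incr(counter, lfu_log_factor=10):
    """Probabilistic 8-bit counter: the higher it already is, the smaller
    the chance that one more access increments it."""
    if counter == 255:
        return 255
    baseval = max(counter - LFU_INIT_VAL, 0)
    p = 1.0 / (baseval * lfu_log_factor + 1)
    return counter + 1 if random.random() < p else counter

# Feed the counter a million accesses and watch it crawl toward 255.
c = LFU_INIT_VAL
checkpoints = {}
for i in range(1, 1_000_001):
    c = lfu_log_incr(c)
    if i in (100, 10_000, 1_000_000):
        checkpoints[i] = c
print(checkpoints)
```

Run it and you'll see the counter sit in the low double digits after 100 hits, then take hundreds of thousands more to saturate at 255, which is how 8 bits stretch to cover a million accesses.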
Monitoring Your Evictions
Don't guess; look at the telemetry. Use redis-cli to check how much work your eviction loop is doing:
```
redis-cli info stats | grep evicted_keys
```

If evicted_keys is climbing rapidly alongside keyspace_misses while your total_commands_processed stays steady, your 24-bit guess is likely failing you.
Another indispensable tool is OBJECT IDLETIME. If you suspect a key should have been evicted but wasn't, check its internal clock:
```
# Returns the seconds since the key was last touched
redis-cli OBJECT IDLETIME my_mystery_key
```

If you see a key with an idletime of 86400 (one day) while your memory is full and you're getting misses on new keys, your maxmemory-samples is too low.
Practical Takeaways for Your Config
1. Default isn't always best: maxmemory-samples 5 is designed for 2015-era hardware. If you're on a modern cloud instance, set it to 10.
2. Beware of "Allkeys": If you have a mix of volatile keys (with TTLs) and permanent keys, use volatile-lru. This limits the "24-bit guess" to keys you've actually marked as disposable.
3. Monitor the resolution: Remember that the LRU clock typically only has 1-second precision. If your application logic depends on sub-second eviction priority, you need to handle that at the application layer, not the database layer.
4. Use LFU for uneven workloads: If you have "long-tail" data that is rarely but consistently accessed, LRU will fail you. LFU's 24-bit split is often much more effective for protecting valuable but infrequently accessed data.
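Put together, an illustrative starting point in redis.conf (the 2gb limit is a placeholder; size maxmemory for your own workload):

```
maxmemory 2gb
maxmemory-policy volatile-lru    # or allkeys-lfu for long-tail workloads
maxmemory-samples 10             # tighter approximation, small CPU cost
```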
Redis is a masterpiece of engineering because it chooses "fast and slightly wrong" over "slow and perfectly right." The 24-bit guess is a perfect example of that philosophy. Once you stop expecting it to be a perfect linked list, you can finally start tuning it for the reality of your data.