loke.dev
Header image for 3 Hardware-Level Invariants of the Write Barrier: Why Your Garbage Collector Is Secretly Stealing Your CPU Cycles

3 Hardware-Level Invariants of the Write Barrier: Why Your Garbage Collector Is Secretly Stealing Your CPU Cycles

Every time you update an object property, your runtime might be injecting hidden assembly instructions to keep the garbage collector from losing track of the heap.

· 8 min read

I was looking at the assembly output for a simple JavaScript property assignment the other day, and I realized we’ve been lied to about the cost of a store instruction.

We’re taught that obj.x = y is a fundamental, almost atomic operation. In a low-level language like C, it usually is—a single STR or MOV instruction that shoves a value into a memory address. But in a high-performance managed runtime like V8, that single line of code is a liar. It’s actually a gateway to a complex, hidden ritual called the Write Barrier.

If you’ve ever wondered why your Node.js or Chrome process hits a performance ceiling despite your logic being "O(n)", the culprit is often the invisible assembly injected by the compiler to keep the Garbage Collector (GC) from losing its mind. These barriers are the tax we pay for memory safety, and they are governed by three hardware-level invariants that are constantly stealing your CPU cycles.

The Hidden Tax of the Assignment Operator

Before we dive into the invariants, let's look at what actually happens under the hood. When you execute this in JavaScript:

function updateUser(user, profile) {
  user.activeProfile = profile;
}

You might expect the generated machine code to look like this:

; Idealized (but wrong) assembly
MOV [RAX + 24], RBX  ; Store profile pointer into user object
RET

In reality, V8’s TurboFan or Maglev compilers generate something much more bloated. It looks more like this:

; Simplified V8-style Write Barrier
MOV [RAX + 24], RBX      ; The actual store
; --- START OF WRITE BARRIER ---
TEST RBX, 1              ; Is the value a Smi (small integer)?
JZ end                   ; If it's an integer, we don't care about GC
CMP [RBX - 1], 0x4000    ; Is the value in the "Young Generation"?
JNE end
CMP [RAX - 1], 0x8000    ; Is the 'user' object in the "Old Generation"?
JNE end
CALL RecordWrite         ; THE EXPENSIVE PART: Log this cross-gen pointer
end:
RET

That "expensive part" is the write barrier. It’s a snippet of code that runs every single time you store an object reference into another object. It’s there to protect three specific invariants.

---

1. The Generational Invariant: The "Old-to-Young" Problem

Most modern GCs are generational. They split the heap into a Young Generation (where new objects live and die fast) and an Old Generation (where long-lived objects reside).

The core rule of a generational GC is that we want to perform "Minor GCs" (Scavenges) only on the Young Generation. To do this, we need to know every pointer that points *into* the Young Generation. Some of those pointers come from the stack, but some come from objects in the Old Generation.

The Invariant

*An Old Generation object must never point to a Young Generation object without being recorded in a "Remembered Set".*

If we didn't have this invariant, the GC would have to scan the entire 2GB+ Old Generation heap just to see if anything is still using a 16MB Young Generation object. That would turn a 2ms pause into a 200ms pause.

The Hardware Reality: Card Marking

To keep track of these pointers, V8 uses Card Marking. It treats the entire heap as a giant deck of cards (usually 512 bytes each). When you write a pointer, the write barrier calculates which "card" the host object lives on and marks it as "dirty" in a separate bitmap.

// A simplified conceptual C++ version of a Write Barrier in a runtime
void WriteBarrier(HeapObject* host, Object* value) {
  if (value->IsHeapObject() && host->IsOldGeneration() && value->IsYoungGeneration()) {
      // Hardware-level bit manipulation to mark the 'card'
      size_t card_index = (reinterpret_cast<uintptr_t>(host)) >> 9; // 512-byte cards
      RememberedSet::DirtyCardTable[card_index] = 1;
  }
}

Why this steals your cycles:
This isn't just a few extra instructions. It’s a cache killer. Every time you update a property, you aren't just writing to the object; you're writing to the "card table" in a completely different area of memory. This causes cache line contention. If you're updating objects in a loop, you're constantly bouncing between the heap data and the card table metadata.

---

2. The Tri-color Invariant: Concurrent Marking

The days of "Stop-the-World" GCs are mostly over. V8 uses a concurrent marker that walks the object graph while your code is still running. To do this safely, it uses a Tri-color marking scheme:

* White: Not yet visited by the GC.
* Grey: Visited, but its children haven't been scanned.
* Black: Visited and all children have been scanned.

The Invariant

*A Black object must never point to a White object.*

If your code (the "mutator") creates a link from a Black object to a White object and then deletes the only other pointer to that White object, the GC will assume the White object is garbage and delete it. Congratulations, you now have a use-after-free bug in your high-level language.

The Hardware Reality: The "Dijkstra" Barrier

To prevent this, the write barrier acts as a "shifter." If you try to store a White object into a Black object, the barrier intercepts the write and immediately turns the White object Grey (marking it for scanning).

// The Mutator's hidden logic
function onPropertyWrite(target, value) {
  if (GC.isMarking() && isBlack(target) && isWhite(value)) {
    markGrey(value); // Force the GC to notice this object
  }
}

The Cost of "Concurrent" Marking:
Even when the GC isn't running a full cycle, these checks often remain active. The CPU’s branch predictor usually gets very good at guessing that isMarking() is false. However, when the GC *is* active, the branch predictor starts failing, and the "instruction pipeline" stalls. You aren't just paying for the instructions; you're paying for the pipeline flushes every time the GC state changes.

---

3. The Compaction Invariant: The Moving Target

V8 doesn't just delete objects; it moves them. This is called Compaction or Evacuation. Moving objects eliminates memory fragmentation, but it introduces a nightmare: if Object A points to Object B, and the GC moves Object B to a new address, Object A’s pointer is now dangling into the abyss.

The Invariant

*During an evacuation phase, every pointer to a moved object must be updated before any code can access it.*

The Hardware Reality: Store Buffers and Barriers

When V8 moves an object, it leaves a "forwarding pointer" at the old location. But updating every single reference in the heap is slow. V8 uses a Store Buffer to batch these updates.

The write barrier ensures that if you are writing to an object that is part of an "evacuation candidate" page, that write is logged with extreme precision.

// Low-level V8 snippet (conceptual)
void RecordWrite(Address host, Address slot, Address value) {
  if (Page::FromAddress(value)->InEvacuationCandidate()) {
    // This is the "heavy" write barrier
    StoreBuffer::Insert(slot); 
  }
}

Why this steals your cycles:
This specific invariant is why Array performance in JavaScript can be so inconsistent. Consider this code:

const largeArray = new Array(1000000).fill(null);

// Scenario A: Updating with a primitive
for (let i = 0; i < largeArray.length; i++) {
  largeArray[i] = i; // Fast: No write barrier needed for Smis
}

// Scenario B: Updating with objects
for (let i = 0; i < largeArray.length; i++) {
  largeArray[i] = { data: i }; // Slow: Full write barrier + potential store buffer insertion
}

In Scenario B, every single iteration of the loop triggers the write barrier logic. If largeArray has lived long enough to be in the Old Generation, and your new { data: i } objects are in the Young Generation, you are hitting the Generational Invariant tax every single time.

---

The "Write Barrier" Penalty in Real Numbers

To see the impact, I ran a small benchmark in Node.js. I compared updating an array of integers (no write barrier) vs. an array of objects (full write barrier).

const Benchmark = require('benchmark');
const suite = new Benchmark.Suite;

const size = 10000;
const oldGenArray = new Array(size).fill({}); // Prematurely old
const youngGenObj = { a: 1 };

suite.add('Smi Store', function() {
  const arr = new Array(size);
  for (let i = 0; i < size; i++) {
    arr[i] = i; // No write barrier
  }
})
.add('Object Store (Cross-Gen)', function() {
  for (let i = 0; i < size; i++) {
    oldGenArray[i] = { val: i }; // Massive write barrier pressure
  }
})
.on('cycle', (event) => console.log(String(event.target)))
.run();

On my machine (M2 Pro), the Smi Store is nearly 5x faster than the Object Store.

While some of that is allocation overhead, a significant chunk is the CPU constantly branching and checking the card table. If you look at the perf output on Linux, you’ll see the RecordWrite built-in function taking up 3-5% of total CPU time in object-heavy applications.

How to Stop the Theft (Or at Least Lower the Tax)

We can't disable write barriers—without them, the GC would crash. But we can write code that respects the hardware.

1. Prefer Primitive Arrays for Hot Loops

If you are doing heavy data processing, use TypedArrays (like Float64Array or Uint32Array). Since these only hold primitives, V8 completely elides the write barrier. There are no pointers to track, so there's no tax to pay.

2. Initialize Objects Completely

Try to avoid adding properties to objects after they’ve "aged." If an object is created and then immediately populated, it all happens in the Young Generation. The write barrier is still checked, but it fails early (young-to-young pointers are often ignored or handled cheaply).

If you take a long-lived object and suddenly start stuffing it with new, short-lived objects, you’re hitting the most expensive path of the generational barrier.

3. Watch out for "Hidden Classes" and Re-allocation

In V8, when an object changes its "shape" (Hidden Class), it might be re-allocated. If this happens to a large object in the Old Generation, you might be triggering an avalanche of write barriers as the runtime tries to move all the existing properties to the new structure.

// Bad: Adding properties late
function process(heavyObject) {
  // heavyObject is already in Old Generation
  heavyObject.tempData = { foo: 'bar' }; // Write Barrier Tax!
}

The Philosophical Trade-off

The write barrier is a fascinating piece of engineering. It represents a shift in complexity: we moved the burden of memory management from the deallocation phase (manual free()) to the mutation phase (assignment).

Every time you write a pointer, you are doing a tiny bit of the GC's work. You are "paying it forward" so that when the GC finally does run, it can finish in 1ms instead of 100ms.

It's a secret theft, yes. But in the world of modern web and server-side applications, it's the only reason we can have 60fps animations and low-latency APIs while still enjoying the luxury of not caring about pointers. Just remember: obj.x = y isn't free. Your CPU is checking the cards.