loke.dev

4 Memory-Layout Patterns That Will Finally Rescue Your Node.js FFI Performance

Moving data between JavaScript and Rust or C++ is faster than ever, provided you stop treating the FFI boundary like a standard function call and start thinking in shared memory.


I used to think that moving a heavy computation from Node.js to Rust or C++ was a guaranteed win. I’d spent a weekend rewriting a physics engine in Rust, expecting a 10x speedup, only to find the "optimized" version was actually 20% slower than the original JavaScript. It was maddening. I had the fastest language in the world on one side and a highly optimized JIT on the other, but the bridge between them was a toll road that charged me every time a byte tried to cross.

The problem wasn't the code on either side. It was how I was moving the data. I was stringifying objects into JSON or copying massive TypedArrays every single frame. If you treat the Foreign Function Interface (FFI) like a standard function call, you’ve already lost.

To get real performance, you have to stop thinking about "passing variables" and start thinking about shared memory layouts. Here are four patterns that will help you stop the bleeding and actually get the performance you were promised.

---

1. The "Pre-allocated Slab" (Buffer Reuse)

The most common mistake is allocating memory inside your hot loop. If you're calling a native function 10,000 times a second and creating a new Buffer or Uint8Array for each call, you’re spending more time in the Garbage Collector (GC) than in your actual logic.

V8’s GC is smart, but it’s not a psychic. It doesn't know that the memory you just allocated is only needed for the duration of a single C++ call.

The Pattern

Instead of letting the native side return a new buffer, create one large "Slab" on the JavaScript side during initialization. Pass that buffer into your native function and let the native side write directly into it.

// DON'T DO THIS
for (let i = 0; i < 1000; i++) {
  const data = nativeAddon.getProcessedData(input); // Allocates a new Buffer every time
  handle(data);
}

// DO THIS
const slab = Buffer.allocUnsafe(1024 * 1024); // 1MB pre-allocated
for (let i = 0; i < 1000; i++) {
  const bytesWritten = nativeAddon.processIntoSlab(input, slab);
  // Read from the same memory space without new allocations
  const view = slab.subarray(0, bytesWritten); 
  handle(view);
}

On the Rust side (using napi-rs), you’re just getting a pointer to that existing memory:

#[napi]
fn process_into_slab(input: Vec<f64>, mut slab: Buffer) -> u32 {
    // `slab` is a view over the JS-owned buffer; no allocation happens here
    let slab_slice: &mut [u8] = slab.as_mut();
    let result: Vec<u8> = perform_calc(input); // your actual computation
    slab_slice[..result.len()].copy_from_slice(&result);
    result.len() as u32
}

By reusing the same memory address, you eliminate the allocation overhead and keep the CPU cache "warm."
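You can see the key property in plain JavaScript, no addon required: subarray hands you a view over the slab's memory, not a copy. This sketch simulates the native write with a stand-in function (fakeProcessIntoSlab is hypothetical) and proves the view and the slab share bytes:

```javascript
// A pre-allocated slab, as above
const slab = Buffer.allocUnsafe(16);
slab.fill(0);

// Simulate a native function writing 4 bytes into the slab
function fakeProcessIntoSlab(slab) {
  slab[0] = 0xde;
  slab[1] = 0xad;
  slab[2] = 0xbe;
  slab[3] = 0xef;
  return 4; // bytes written
}

const bytesWritten = fakeProcessIntoSlab(slab);
const view = slab.subarray(0, bytesWritten);

// The view shares memory with the slab: no copy was made
view[0] = 0x00;
console.log(slab[0]); // 0 — writing through the view mutated the slab
```

If subarray copied, the last line would print 0xde; because it's a view, the write through `view` is visible in `slab` immediately.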

---

2. The Packed C-Struct (Binary Layouts)

JavaScript objects are incredibly flexible, but they are memory-layout nightmares. An array of objects like [{x: 1, y: 2}] is stored as an array of pointers to objects, which are themselves dictionaries in memory. When you pass this to C++, the FFI layer has to "traverse" that entire graph, converting JS types to C types. This is the "Serialization Tax."

The Pattern

If you need to pass complex data, don't pass objects. Define a strict binary layout—a Struct-in-a-Buffer. Use a DataView or a TypedArray to pack your data into a flat memory space that matches a C struct.

Imagine we are passing 3D particle data:

// JS Side: Packing a "struct" into a Float32Array
const particleCount = 1000;
const stride = 3; // x, y, z
const buffer = new Float32Array(particleCount * stride);

function updateParticle(id, x, y, z) {
  const offset = id * stride;
  buffer[offset] = x;
  buffer[offset + 1] = y;
  buffer[offset + 2] = z;
}

// Pass the whole underlying buffer at once
nativeAddon.processParticles(buffer.buffer);

On the C++ side, you don't iterate through a JS Array. You cast that pointer to your struct:

struct Particle {
    float x, y, z;
};

napi_value ProcessParticles(napi_env env, napi_callback_info info) {
    // ... get `data` and `byteLength` via napi_get_arraybuffer_info ...
    Particle* particles = reinterpret_cast<Particle*>(data);
    size_t particleCount = byteLength / sizeof(Particle);

    for (size_t i = 0; i < particleCount; i++) {
        particles[i].x += 0.1f; // High-speed access!
    }
    return nullptr;
}

This pattern is why tools like FlatBuffers or Protocol Buffers are so popular in high-performance systems. You’re essentially telling the computer: "Stop guessing what my data looks like. Here is exactly where the bits are."

---

3. The "Ring Buffer" for High-Frequency Events

Sometimes you have a stream of data—like logs, sensor readings, or audio samples—that needs to move between Node.js and Native code constantly. Creating a bridge for every single event is too expensive because of the "FFI Trampoline" effect. Every time you cross from JS to C++, the engine has to save the current state and set up a new execution context.

The Pattern

Instead of calling a function for every event, use a Shared Ring Buffer (or Circular Buffer). You create a single shared memory area and two pointers: a head and a tail.

1. The Native side writes data to the tail and increments it.
2. The JS side reads from the head and increments it.
3. If the tail catches up to the head, the buffer is full and the writer must wait (or drop data).

This allows you to "batch" your processing without the latency of an actual batching system.

// JS Side
const sharedBuffer = new SharedArrayBuffer(1024 * 64);
const state = new Int32Array(new SharedArrayBuffer(8)); // [head, tail]
const data = new Uint8Array(sharedBuffer);

function poll() {
  const head = Atomics.load(state, 0);
  const tail = Atomics.load(state, 1);
  if (head !== tail) {
    // Process everything between head and tail.
    // (A real implementation must also handle the wrap-around case
    // where the tail has looped back past the end of the buffer.)
    process(data.subarray(head, tail));
    Atomics.store(state, 0, tail); // Update head
  }
  setImmediate(poll);
}

The magic here is Atomics. This ensures that even if Node.js and your C++ thread are accessing the same memory simultaneously, they won't corrupt the pointers. You’ve successfully moved from "Function Call" communication to "Memory State" communication.
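The head/tail protocol can be exercised entirely in JavaScript to see the mechanics. In this sketch a single thread plays both producer and consumer (the native writer is simulated), and the modular arithmetic handles wrap-around:

```javascript
const SIZE = 8;
const state = new Int32Array(new SharedArrayBuffer(8)); // [head, tail]
const data = new Uint8Array(new SharedArrayBuffer(SIZE));

// Producer: write one byte at the tail, if there is room
function push(byte) {
  const head = Atomics.load(state, 0);
  const tail = Atomics.load(state, 1);
  if ((tail + 1) % SIZE === head) return false; // full
  data[tail] = byte;
  Atomics.store(state, 1, (tail + 1) % SIZE);
  return true;
}

// Consumer: read one byte from the head, if any
function pop() {
  const head = Atomics.load(state, 0);
  const tail = Atomics.load(state, 1);
  if (head === tail) return null; // empty
  const byte = data[head];
  Atomics.store(state, 0, (head + 1) % SIZE);
  return byte;
}

push(1); push(2); push(3);
console.log(pop(), pop(), pop(), pop()); // 1 2 3 null
```

Note the ownership split: the producer only ever writes the tail pointer, the consumer only ever writes the head pointer. That's what makes this safe across threads without locks.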

---

4. The "Orphaned Buffer" (Memory Ownership Hand-off)

There are cases where your native code generates a massive amount of data—like an image buffer or a database result set—and you need it in Node.js. Usually, you’d copy this data into a Node.js Buffer.

But if the data is large (say, 50MB), that copy will take 10-20ms. In a high-speed application, 20ms is an eternity.

The Pattern

Use External Buffers. You can tell Node.js to create a Buffer that points to memory *already allocated* by your C++ or Rust code. You aren't copying the data; you're just handing Node.js the "address" of that data.

The catch? You have to tell Node.js how to free that memory when the JS object is garbage collected; otherwise you have a massive memory leak.

In C++ (N-API):

void FinalizeCallback(napi_env env, void* finalize_data, void* finalize_hint) {
    // This runs when JS is done with the buffer
    free(finalize_data); 
}

napi_value GetLargeData(napi_env env, napi_callback_info info) {
    void* data = malloc(1024 * 1024 * 100); // 100MB
    // Fill 'data' with something...

    napi_value buffer;
    napi_create_external_arraybuffer(
        env, 
        data, 
        1024 * 1024 * 100, 
        FinalizeCallback, 
        NULL, 
        &buffer
    );
    return buffer;
}

This is the fastest way to move data into Node.js. It’s a Zero-Copy handoff. The JS side sees a standard ArrayBuffer, but it's actually looking at raw system memory managed by your native code.

---

The "Gotchas" of Shared Memory

Before you go off and refactor everything into shared buffers, there are three things that usually trip people up. I learned these the hard way.

1. Memory Alignment

Native CPUs (especially ARM) hate it when you try to read a 4-byte integer from an address that isn't a multiple of 4. If you’re packing your buffers (Pattern 2), always ensure your fields are aligned. If you have a uint8 followed by a uint32, add three "padding" bytes of zero between them. If you don't, your performance will tank—or your app will just crash with a Bus Error.
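A DataView makes that padding explicit. This sketch packs the uint8-then-uint32 layout from the paragraph above the way a C compiler would lay out `struct { uint8_t flag; uint32_t value; }`: 8 bytes per record, not 5:

```javascript
// C layout: struct { uint8_t flag; uint32_t value; } -> 8 bytes, not 5
const STRIDE = 8; // 1 byte flag + 3 bytes padding + 4 bytes value
const count = 2;
const buf = new ArrayBuffer(STRIDE * count);
const view = new DataView(buf);

function writeRecord(i, flag, value) {
  const base = i * STRIDE;
  view.setUint8(base, flag);
  // bytes base+1..base+3 stay zero: padding so 'value' lands on a
  // 4-byte boundary
  view.setUint32(base + 4, value, true); // little-endian, matching x86/ARM
}

writeRecord(0, 1, 0xdeadbeef);
writeRecord(1, 0, 42);

// 'value' sits at a 4-byte boundary, so native code can read it
// directly as a uint32_t with no misaligned access
console.log(view.getUint32(4, true).toString(16)); // deadbeef
```

The explicit `true` for little-endian matters: DataView defaults to big-endian, while virtually every CPU you'll target reads the other way around.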

2. The V8 Memory Pressure

V8 tracks how much memory it’s using to decide when to run the Garbage Collector. When you use "External Buffers" (Pattern 4), V8 only sees the small JS object, not the 100MB of C++ memory it points to.

If you don't notify V8 of this "External Memory," the GC might never run, and your app will balloon until the OS kills it. Use napi_adjust_external_memory to tell Node: "Hey, I'm holding 100MB of invisible stuff, please take that into account."
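You can watch this accounting from the JS side: Node reports V8-external allocations (including ArrayBuffers and Buffers) in process.memoryUsage().external. Memory your addon allocates behind V8's back won't show up here until it calls napi_adjust_external_memory:

```javascript
const before = process.memoryUsage().external;

// Node allocates Buffer memory outside the V8 heap and counts it
// in the `external` figure
const big = Buffer.alloc(50 * 1024 * 1024); // 50MB, kept alive by reference

const after = process.memoryUsage().external;
console.log(`external grew by ~${((after - before) / 1024 / 1024).toFixed(0)}MB`);
```

If you run this, you'll see `external` jump by roughly the Buffer's size. A raw malloc in your addon produces no such jump, which is exactly the blind spot the paragraph above describes.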

3. Strings are the Enemy

If you take one thing away from this: FFI hates strings.
In V8, strings are usually UTF-16. In Rust/C++, they are usually UTF-8. Every time you pass a string, the engine has to iterate through every character to transcode it. If you can, use Uint8Array (Buffer) everywhere and only decode it to a string on the JS side when you absolutely have to (like for a final UI render).
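A common way to dodge the transcode on the JS side: encode once to UTF-8 bytes with TextEncoder, ship the Uint8Array across the boundary, and only decode back to a string when something actually needs text:

```javascript
const encoder = new TextEncoder(); // always UTF-8
const decoder = new TextDecoder('utf-8');

// Encode once; ship raw bytes across the FFI boundary
const bytes = encoder.encode('héllo'); // Uint8Array of UTF-8 bytes

// ... pass `bytes` (or its .buffer) to the native side, which can
// use it directly as a Rust &str / C char* without transcoding ...

// Decode only when you need a string again, e.g. for a UI render
const roundTripped = decoder.decode(bytes);
console.log(bytes.length, roundTripped); // 6 'héllo'
```

Note the length: five characters become six bytes, because 'é' takes two bytes in UTF-8. That per-character variability is exactly why transcoding can't be skipped or vectorized away when you pass raw JS strings.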

Which pattern should you use?

I generally follow this hierarchy:
1. Small, infrequent data? Just pass it normally. Don't over-engineer.
2. Large data moving from JS to Native? Use Pattern 1 (Slabs).
3. Complex objects/arrays? Use Pattern 2 (Packed Structs).
4. Streaming data? Use Pattern 3 (Ring Buffers).
5. Huge data moving from Native to JS? Use Pattern 4 (External Buffers).

FFI doesn't have to be slow. The bottleneck isn't the language—it's the overhead of moving data across the boundary. If you change your mindset from "calling functions" to "aligning memory," you'll find that Node.js is capable of performance levels that most developers think are impossible.

Now go delete those JSON.stringify calls in your hot loops. Your CPU will thank you.