
Stop Allocating Inside the AudioWorkletProcessor: How to Build a Lock-Free Ring Buffer for Zero-Jitter Web Audio
A deep dive into why standard JavaScript memory management is incompatible with the real-time thread and how to bypass the Garbage Collector using SharedArrayBuffers and Atomics.
The moment you call new Float32Array(128) inside your AudioWorkletProcessor.process method, you’ve already lost. It might not click or pop today, and it might run fine on your high-end development machine, but you have effectively handed a loaded gun to the JavaScript Garbage Collector (GC). In the high-priority world of real-time audio, the GC is a silent killer.
At a standard sample rate of 44.1kHz, the process() function is called once per 128-sample render quantum. That gives you about 2.9 milliseconds to complete all your computations and return true. If the browser decides that *this* specific millisecond is the right time to scavenge memory and pause execution, you miss your deadline. The buffer goes empty, the hardware coughs, and your user hears a glitch.
To build pro-grade web audio—synths, DAWs, or complex effects—you must treat the AudioWorkletProcessor as a "no-allocation zone." This means no new objects, no array resizing, and definitely no postMessage for high-frequency data. Instead, we use a lock-free ring buffer built on SharedArrayBuffer and Atomics.
The Anatomy of the Memory Problem
In a standard AudioWorklet, developers often try to pass audio data from the main thread like this:
// DON'T DO THIS
this.port.onmessage = (event) => {
this.latestBuffer = event.data; // Allocation and pressure on the heap
};

This is problematic for two reasons. First, postMessage involves data cloning or ownership transfer, which incurs overhead. Second, it encourages a reactive programming style in which the Worklet is at the mercy of the main thread's event loop.
To achieve zero-jitter, the AudioWorklet needs to be a passive consumer. It shouldn't wait for data; the data should already be waiting in a shared memory space.
SharedArrayBuffer: The Common Ground
SharedArrayBuffer (SAB) allows you to map the same chunk of raw binary memory to both the main thread (or a Worker) and the AudioWorklet. Changes made on one side are visible on the other almost instantly, without any cloning.
However, shared memory introduces a race condition. If the main thread is writing a sine wave while the AudioWorklet is reading it, the Worklet might read a "torn" frame—half of the new data and half of the old. We solve this with a Ring Buffer (or Circular Buffer) and Atomics.
The State Management
We need three pieces of data in our shared memory:
1. A Write Index: Where the producer is currently writing.
2. A Read Index: Where the consumer is currently reading.
3. The Storage: The actual Float32Array holding the audio samples.
We store the indices in a Uint32Array and the samples in a Float32Array, both backed by the same SharedArrayBuffer or two separate ones. Using Atomics ensures that when we update the Write Index, the change is "published" across threads with proper memory barriers.
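As a minimal sketch, the layout looks like this: an 8-byte header holding the two indices, followed by the sample storage, all in one SharedArrayBuffer. The concrete numbers here are illustrative.

```javascript
// Layout: [ writeIndex (u32) | readIndex (u32) | samples (f32 * capacity) ]
const capacity = 1024; // power of two
const sab = new SharedArrayBuffer(8 + capacity * Float32Array.BYTES_PER_ELEMENT);

const state = new Uint32Array(sab, 0, 2);   // indices, touched only via Atomics
const samples = new Float32Array(sab, 8, capacity);

// Publishing a write: fill the samples first, THEN advance the index.
// The Atomics.store is the point at which the data becomes visible
// to the consumer on the other thread.
samples[0] = 0.5;
Atomics.store(state, 0, 1);
```

The write order matters: the consumer only ever looks at samples below the published write index, so data stored before the index update can never be read half-written.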
Building the Lock-Free Ring Buffer
Let's build a SharedRingBuffer class. We'll use a power-of-two size for the buffer because it allows us to use bitwise masks instead of the expensive modulo (%) operator for wrapping indices.
class SharedRingBuffer {
constructor(sharedBuffer, type = Float32Array) {
this.capacity = (sharedBuffer.byteLength - 8) / type.BYTES_PER_ELEMENT;
// We reserve the first 8 bytes for two 32-bit indices (read/write)
this.state = new Uint32Array(sharedBuffer, 0, 2);
this.data = new type(sharedBuffer, 8, this.capacity);
// Mask for power-of-two wrapping
this.mask = this.capacity - 1;
}
static calculateByteLength(capacity, type = Float32Array) {
// Ensure capacity is power of two for the mask trick
if ((capacity & (capacity - 1)) !== 0) {
throw new Error("Capacity must be a power of two.");
}
return 8 + (capacity * type.BYTES_PER_ELEMENT);
}
push(elements) {
const writeIndex = Atomics.load(this.state, 0);
const readIndex = Atomics.load(this.state, 1);
// >>> 0 keeps the difference valid once the u32 indices wrap past 2^32
const available = this.capacity - ((writeIndex - readIndex) >>> 0);
if (available < elements.length) return 0; // Not enough free space
for (let i = 0; i < elements.length; i++) {
this.data[(writeIndex + i) & this.mask] = elements[i];
}
Atomics.store(this.state, 0, writeIndex + elements.length);
return elements.length;
}
pull(outBuffer) {
const writeIndex = Atomics.load(this.state, 0);
const readIndex = Atomics.load(this.state, 1);
// >>> 0 keeps the difference valid once the u32 indices wrap past 2^32
const available = (writeIndex - readIndex) >>> 0;
if (available === 0) return 0; // Buffer empty
const count = Math.min(available, outBuffer.length);
for (let i = 0; i < count; i++) {
outBuffer[i] = this.data[(readIndex + i) & this.mask];
}
Atomics.store(this.state, 1, readIndex + count);
return count;
}
}

Why Atomics.load and Atomics.store?
You might wonder why we don't just use this.state[0]. JavaScript engines and CPUs are free to cache or reorder plain reads and writes, so a change made on the Main Thread might not be promptly visible to the AudioWorklet thread. Atomics operations are sequentially consistent: they guarantee atomic access and act as memory barriers that publish changes across threads.
Crucially, we use Single-Producer, Single-Consumer (SPSC) logic here. As long as only the Main Thread calls push and only the Worklet calls pull, we don't need heavy locks or mutexes.
Implementing the AudioWorkletProcessor
Now, let's see how this looks inside the processor. Notice the complete absence of new or map or filter. We pre-allocate a local renderBuffer once in the constructor and reuse it forever.
// processor.js
class RingBufferProcessor extends AudioWorkletProcessor {
constructor(options) {
super();
const sharedBuffer = options.processorOptions.sharedBuffer;
this.ringBuffer = new SharedRingBuffer(sharedBuffer);
// Pre-allocate a small array to extract data from the ring
this.renderBuffer = new Float32Array(128);
}
process(inputs, outputs, parameters) {
const output = outputs[0];
const outputChannel = output[0]; // Simplified to mono
// Pull exactly enough samples for one render quantum (128 samples)
const readCount = this.ringBuffer.pull(this.renderBuffer);
if (readCount < outputChannel.length) {
// Buffer underrun: keep the samples we did get, zero-fill the rest
// (pull already consumed them from the ring, so don't discard them)
for (let i = 0; i < readCount; i++) outputChannel[i] = this.renderBuffer[i];
outputChannel.fill(0, readCount);
} else {
// Copy the data to the output hardware buffer
outputChannel.set(this.renderBuffer);
}
return true;
}
}
registerProcessor('ring-buffer-processor', RingBufferProcessor);

The Main Thread Side
On the main thread, we initialize the SharedArrayBuffer. A common pitfall here is security: SharedArrayBuffer requires Cross-Origin Isolation. Your server must send these headers:
- Cross-Origin-Opener-Policy: same-origin
- Cross-Origin-Embedder-Policy: require-corp
Without these, window.SharedArrayBuffer will be undefined.
async function setupAudio() {
const audioCtx = new AudioContext();
// 8192 samples capacity (approx 185ms at 44.1kHz)
const size = SharedRingBuffer.calculateByteLength(8192);
const sab = new SharedArrayBuffer(size);
const ringBuffer = new SharedRingBuffer(sab);
await audioCtx.audioWorklet.addModule('processor.js');
const node = new AudioWorkletNode(audioCtx, 'ring-buffer-processor', {
processorOptions: { sharedBuffer: sab }
});
node.connect(audioCtx.destination);
// Example: Pushing data from a worker or fetch stream
function produceAudio(pcmData) {
ringBuffer.push(pcmData);
}
}

The "Modulo" Gotcha and Power-of-Two
In the SharedRingBuffer code, you’ll notice (index & this.mask). If your buffer size is 1024 (binary 10000000000), the mask is 1023 (binary 01111111111).
Performing a bitwise AND is significantly faster than the % operator. In a loop that runs thousands of times per second, these micro-optimizations prevent the CPU from spiking, which in turn prevents the OS from down-throttling your process—another hidden cause of audio glitches.
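For instance, wrapping an arbitrary index with the mask gives exactly the same result as the modulo, without the division:

```javascript
const capacity = 1024;      // 0b10000000000
const mask = capacity - 1;  // 0b01111111111

// For power-of-two sizes, AND-ing with the mask keeps only the low bits,
// which is exactly what modulo by the capacity does.
const index = 1500;
console.log(index & mask);      // 476
console.log(index % capacity);  // 476
```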
Handling Underruns Gracefully
What happens when pull returns 0? In the example above, we zero-fill the buffer. In a real application, you might want to perform a short cross-fade to silence to avoid a DC offset "click."
However, the real goal of a Ring Buffer is to provide a "jitter buffer." By keeping the Ring Buffer roughly half-full, you create a safety margin. With the 8192-sample buffer above, half-full is about 93ms of audio: if the main thread's event loop hangs for 50ms because of a heavy DOM reflow, the AudioWorklet keeps pulling from the buffered data and the user hears nothing wrong.
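A sketch of the cross-fade idea: taper the last few valid samples toward zero before the silence begins. The function name and the 32-sample default are illustrative; tune the fade length by ear.

```javascript
// Apply a linear fade to the tail of the valid samples, then zero the rest,
// so an underrun decays smoothly instead of stepping abruptly to silence.
function fadeOutTail(buffer, validCount, fadeLength = 32) {
  const start = Math.max(0, validCount - fadeLength);
  for (let i = start; i < validCount; i++) {
    buffer[i] *= (validCount - i) / (validCount - start); // gain ramps 1 -> ~0
  }
  buffer.fill(0, validCount); // silence after the valid region
}
```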
When Should You Still Use postMessage?
Is postMessage always evil? No. Use it for:
1. UI Updates: Sending visualization data (FFT results) back to the main thread at 60fps.
2. Configuration: Changing a filter type or loading a new impulse response.
3. One-off Events: Triggering a "playback finished" callback.
Basically, if the data rate is measured in "times per second" rather than "samples per second," postMessage is fine. If you are moving raw audio, stick to the SharedArrayBuffer.
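A sketch of that rule inside a processor: count render quanta and only post every Nth one. The helper name and the factor of 6 are illustrative choices, not part of any API.

```javascript
// Gate postMessage down to roughly UI rate. At 44.1kHz, process() runs
// about 344 times per second (44100 / 128); posting every 6th quantum
// yields ~57 messages per second, close to a 60fps display.
function makePostGate(everyNQuanta) {
  let count = 0;
  return () => ++count % everyNQuanta === 0;
}

const shouldPost = makePostGate(6);
// In process(): if (shouldPost()) this.port.postMessage({ rms: lastRms });
```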
Summary of the Zero-Jitter Rules
1. Pre-allocate everything. Create your Float32Arrays in the constructor of the AudioWorkletProcessor, never in process().
2. Use SharedArrayBuffer for the heavy lifting.
3. Wrap indices with Atomics. Don't trust the CPU cache when it comes to the Read and Write pointers.
4. Size to powers of two. It makes the math faster and the code cleaner.
5. Monitor the "Fill Level." If your Ring Buffer is consistently empty, your producer is too slow. If it's consistently full, your buffer is too small or your producer is too aggressive.
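Given the state layout used throughout this article (write index in slot 0, read index in slot 1), a fill-level probe is a one-liner; sampling it once a second from the main thread is plenty. The helper name is illustrative.

```javascript
// Fraction of the ring currently holding unread samples (0 = empty, 1 = full).
// >>> 0 keeps the subtraction correct even after the u32 indices wrap.
function fillLevel(state, capacity) {
  return ((Atomics.load(state, 0) - Atomics.load(state, 1)) >>> 0) / capacity;
}

// Example with a detached state array: 512 unread samples in a 1024 ring.
const probeState = new Uint32Array(new SharedArrayBuffer(8));
Atomics.store(probeState, 0, 512); // write index
Atomics.store(probeState, 1, 0);   // read index
console.log(fillLevel(probeState, 1024)); // 0.5
```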
Building audio tools on the web requires a mindset shift from "flexible JavaScript" to "rigid systems programming." By removing the Garbage Collector from the equation, you transform the Web Audio API from a toy into a professional DSP platform.