loke.dev

Why Does Your Node.js 'Cluster' Still Distribute Traffic Unevenly?

A technical deep dive into why the Linux kernel's scheduling and Node’s round-robin logic often fail to balance the load, leaving single workers overwhelmed while others sit idle.

5 min read

Ever looked at htop during a stress test only to find one worker process screaming at 100% CPU while the other seven are practically vibing in idle mode? It’s a frustrating rite of passage for Node.js developers. You followed the documentation, you used the cluster module to spawn a worker for every CPU core, and yet, your traffic distribution looks less like a balanced scale and more like a high-school group project where one person does all the work.

The "Magic" of the Cluster Module

By default, Node.js is single-threaded. To utilize multi-core systems, we use the cluster module to fork multiple instances of our application. On paper, it looks like this:

const cluster = require('node:cluster');
const http = require('node:http');
const numCPUs = require('node:os').availableParallelism();

if (cluster.isPrimary) {
  console.log(`Primary ${process.pid} is running`);

  // Fork workers.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker) => {
    console.log(`worker ${worker.process.pid} died`);
  });
} else {
  // Workers can share any TCP connection
  // In this case it is an HTTP server
  http.createServer((req, res) => {
    res.writeHead(200);
    res.end('hello world\n');
  }).listen(8080);

  console.log(`Worker ${process.pid} started`);
}

You run this, hit it with autocannon or ab, and... the load is lopsided. Why?

The Ghost in the Kernel

Historically, Node.js left the distribution of incoming connections entirely up to the operating system. The primary process would create the listening socket and pass its file descriptor to the workers, each of which accepted connections on it directly. The kernel then decided which process got each new connection.

The problem? The Linux kernel isn't trying to be "fair" in terms of request counts; it’s trying to be efficient.

The kernel often uses a "last man standing" or "warm cache" approach. If Worker A just finished a task, its memory and cache are already "hot." The kernel might decide it’s cheaper to give the next connection to Worker A again rather than waking up Worker B from a sleep state. This leads to worker starvation, where one or two processes hog all the work while others sit on the bench.

Node's Round-Robin (and why it isn't always the default)

To fix this, Node.js introduced a Round-Robin (RR) scheduling policy in version 0.12. In this mode, the primary process actually listens on the port, accepts the new connection, and then hands it off to a worker in a circular fashion.
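The hand-off itself is nothing fancy: conceptually, it's just a rotating index over the worker pool. Here's a minimal sketch of the idea (an illustration, not Node's actual internals):

```javascript
// Toy model of round-robin hand-off: a rotating index over the workers.
// This is a sketch of the concept, not Node's internal implementation.
function makeRoundRobin(workers) {
  let next = 0;
  return () => workers[next++ % workers.length];
}

const pick = makeRoundRobin(['worker-1', 'worker-2', 'worker-3']);
console.log(pick(), pick(), pick(), pick());
// worker-1 worker-2 worker-3 worker-1
```

Every new connection gets the next worker in the circle, regardless of what that worker is currently doing. Keep that in mind; it matters later.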

You can check or set this policy manually:

const cluster = require('node:cluster');

// Round-robin is the default everywhere except Windows.
// Set this before the first worker is forked:
cluster.schedulingPolicy = cluster.SCHED_RR;

// If you want the OS to handle it instead (not recommended for most):
// cluster.schedulingPolicy = cluster.SCHED_NONE;

Even with SCHED_RR enabled, you might still see unevenness. Why? Because connections are not requests.

The "Keep-Alive" Trap

This is the big one. If you're testing your app using a tool that utilizes Keep-Alive (which most modern browsers and load testers do), a single TCP connection stays open for multiple HTTP requests.

Node's Round-Robin distributes new TCP connections. Once a connection is handed off to Worker 3, every subsequent request sent over that specific connection stays with Worker 3 until the socket closes.

If you have a client (like a load balancer or a heavy-duty API consumer) that opens 4 persistent connections and sends 10,000 requests over them, and you have 8 workers, 4 of those workers will be doing 100% of the work. The other 4 will be at 0%.
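You can sanity-check that arithmetic without spinning up a server. A toy model, assuming connections are assigned round-robin and every request stays pinned to its connection's worker:

```javascript
// Toy model of the keep-alive trap: round-robin distributes *connections*,
// and every request on a connection is pinned to the worker that got it.
const workers = 8;
const connections = 4;
const requestsPerConnection = 10000;

const perWorker = new Array(workers).fill(0);
for (let c = 0; c < connections; c++) {
  perWorker[c % workers] += requestsPerConnection; // pinned for the socket's lifetime
}

console.log(perWorker);
// [ 10000, 10000, 10000, 10000, 0, 0, 0, 0 ]
```

Half the cluster never sees a single request, and htop agrees.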

Long-Running Tasks and "Stickiness"

If your workers perform tasks of varying complexity, Round-Robin starts to fail. Imagine this scenario:
1. Worker 1 gets a request that takes 2 seconds to compute (heavy crypto or image processing).
2. Worker 2 gets a request that takes 10ms.
3. Round-robin continues to feed both workers equally.

Worker 1's queue starts backing up because it's still chewing on that first heavy request, while Worker 2 is breezing through. Node's built-in cluster module doesn't know how "busy" a worker is; it just knows whose turn it is. It's a blind hand-off.

How to actually fix it

If you're hitting these bottlenecks, you have a few paths forward:

1. The "Better Load Balancer" Approach

Stop relying on the Node.js cluster module for high-stakes balancing. Use a battle-tested reverse proxy like NGINX or HAProxy in front of your Node processes. These tools have much more sophisticated balancing algorithms, such as least_conn, which sends traffic to the backend with the fewest active connections.
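As a sketch, a least_conn setup in NGINX might look like this (the ports and upstream name are placeholders, assuming four Node processes started separately on 3000-3003):

```nginx
# Hypothetical upstream of four Node processes on separate ports.
# least_conn picks the backend with the fewest active connections,
# so a worker stuck on a slow request stops receiving new traffic.
upstream node_app {
    least_conn;
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
    server 127.0.0.1:3002;
    server 127.0.0.1:3003;
}

server {
    listen 80;
    location / {
        proxy_pass http://node_app;
    }
}
```

The trade-off: you give up the cluster module's single-port convenience and run one plain Node process per port instead.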

2. PM2 (The "Easy" Way)

If you aren't using PM2 yet, you probably should. It abstracts the cluster module and handles some of the wonkiness for you.

pm2 start app.js -i max

While it still uses the cluster module under the hood, it provides better visibility into which workers are flapping or struggling.

3. Move to a Shared Work Queue

If your "tasks" are heavy, don't pass the task via the HTTP request itself. Instead, use a worker pool pattern with something like BullMQ or RabbitMQ.

// Instead of doing the heavy lifting in the HTTP worker,
// enqueue a job and respond immediately (BullMQ shown;
// the queue connects to Redis on localhost by default):
const { Queue } = require('bullmq');
const imageQueue = new Queue('images');

app.post('/process-image', async (req, res) => {
  await imageQueue.add('resize', { data: req.body }); // job name + payload
  res.status(202).send('Processing...');
});

This way, workers "pull" work when they are actually free, rather than having work "pushed" onto them when they might already be drowning.
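The difference is easy to see with a toy makespan simulation: blind round-robin pre-assigns tasks by turn, while a pull model lets whichever worker frees up first grab the next task. The durations below are made up, but the shape of the result is the point:

```javascript
// Hypothetical task durations (in seconds): a couple of heavy tasks among light ones.
const tasks = [5, 1, 1, 1, 4, 1];

// Blind round-robin: pre-assign tasks by turn, ignoring how busy workers are.
function roundRobinFinish(tasks, workerCount) {
  const load = new Array(workerCount).fill(0);
  tasks.forEach((t, i) => { load[i % workerCount] += t; });
  return Math.max(...load); // time until the slowest worker is done
}

// Pull model: the earliest-free worker takes the next task off the shared queue.
function pullFinish(tasks, workerCount) {
  const freeAt = new Array(workerCount).fill(0);
  for (const t of tasks) {
    const i = freeAt.indexOf(Math.min(...freeAt));
    freeAt[i] += t;
  }
  return Math.max(...freeAt);
}

console.log(roundRobinFinish(tasks, 2)); // 10
console.log(pullFinish(tasks, 2));       // 7
```

With round-robin, one worker eats both heavy tasks and finishes at 10 seconds while its sibling idles after 3; with pulling, the work levels out on its own.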

The Bottom Line

Node's cluster module is a great tool for a quick performance boost, but it's a blunt instrument. It distributes connections, not workload. If your load is uneven, check your Keep-Alive headers, look at your task duration variance, and remember: the Linux kernel cares about efficiency, not fairness.

If you need real fairness, you need a load balancer that can see how much work each process is actually doing.