4 Hard Truths About Scaling WebSockets (That I Learned the Expensive Way)

Real-time updates are easy on localhost, but keeping 10,000 persistent connections alive without crashing your load balancer is a different game entirely.

4 min read

Everything looks like magic when it’s just you and a single Node process running on a MacBook Pro. You open three browser tabs, send a message, and see it pop up instantly across all of them. You feel like a god of concurrency. Then you deploy to a production cluster, hit 5,000 concurrent users, and the whole thing starts acting like a haunted house—messages disappear, connections flap every thirty seconds, and your CPU usage looks like a mountain range.

I spent six months fixing a real-time system that was "working fine" until it wasn't. Here is what they don't tell you in the "Hello World" Socket.io tutorials.

1. Your Load Balancer is probably killing your connections

Most developers treat their load balancer (LB) as a "set it and forget it" piece of infrastructure. With standard REST APIs, that works. With WebSockets, your LB is frequently your worst enemy.

The problem is the Idle Timeout. AWS ALBs, NGINX, and Cloudflare all have default timeouts. If no data passes through the socket for, say, 60 seconds, the LB assumes the connection is dead and silently snips it. The client thinks it's connected; the server thinks it's connected; but the wire is cut.

The Fix: You need aggressive heartbeats (pings/pongs). Don't rely on the TCP layer to do this for you.

// A simple server-side heartbeat in Node.js using the 'ws' library
const http = require('http')
const WebSocket = require('ws')

const server = http.createServer()
const wss = new WebSocket.Server({ server })

wss.on('connection', (ws) => {
  ws.isAlive = true
  ws.on('pong', () => {
    ws.isAlive = true
  })
})

const interval = setInterval(() => {
  wss.clients.forEach((ws) => {
    // No pong since the last pulse? Assume the wire was cut and drop the socket.
    if (ws.isAlive === false) return ws.terminate()

    ws.isAlive = false
    ws.ping()
  })
}, 30000) // 30 second pulse

wss.on('close', () => clearInterval(interval))

server.listen(8080)
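On the client side, browsers answer WebSocket pings automatically, so the server loop above is usually enough. If your client is another Node process using the `ws` package, a small watchdog that tracks the server's pings is a cheap safety net. This is only a sketch: `reconnect()` is a hypothetical helper, and the timings assume the 30-second pulse above.

const WebSocket = require('ws')

const ws = new WebSocket('wss://example.com/socket')

let lastPing = Date.now()
ws.on('ping', () => {
  lastPing = Date.now() // the 'ws' library replies with a pong automatically
})

// If we haven't seen a ping in 45 seconds, assume the LB cut the wire
const watchdog = setInterval(() => {
  if (Date.now() - lastPing > 45000) {
    clearInterval(watchdog)
    ws.terminate()
    reconnect() // hypothetical: re-dial, ideally with backoff
  }
}, 5000)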

2. "Sticky Sessions" aren't optional

If you’re running more than one server instance (and if you aren't, why are you reading a post about scaling?), you will run into the Handshake Trap.

The WebSocket protocol starts as an HTTP request that gets "Upgraded." With Socket.io's default transports, the client even makes a few plain HTTP polling requests before the upgrade happens. If your Load Balancer sends the initial handshake to Server A but routes a later request in the same session to Server B, Server B has never heard of that session and the connection fails immediately. I've seen teams waste weeks trying to debug "intermittent connection drops" that were actually just the LB doing its job (load balancing) too well.

You _must_ enable session affinity (sticky sessions) based on IP or a cookie. It feels "anti-cloud" to tie a client to a specific server, but for the duration of a WebSocket handshake, it’s non-negotiable.
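If NGINX is the thing in front of your app servers, the rough shape looks like this. The hostnames and the /socket.io/ path are placeholders for your own setup; `ip_hash` pins each client IP to one upstream, and the Upgrade headers let the handshake through:

upstream websocket_backend {
  ip_hash; # session affinity: same client IP -> same app server
  server app1.internal:3000;
  server app2.internal:3000;
}

server {
  listen 80;

  location /socket.io/ {
    proxy_pass http://websocket_backend;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
  }
}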

3. Redis is your new Source of Truth

In a stateless REST world, servers don't need to know about each other. In a stateful WebSocket world, Server A has a socket for User 1, and Server B has a socket for User 2. If User 1 sends a message to User 2, Server A has no idea how to find them.

You need a Pub/Sub layer to bridge the gap. Redis is the industry standard here, but it introduces a new failure point. If your Redis instance spikes in latency, your entire real-time engine chokes.

If you're using Socket.io, the adapter makes this easy, but don't treat it as a black box:

const { createServer } = require('http')
const { Server } = require('socket.io')
const { createClient } = require('redis')
const { createAdapter } = require('@socket.io/redis-adapter')

const httpServer = createServer()
const io = new Server(httpServer)
const pubClient = createClient({ url: 'redis://localhost:6379' })
const subClient = pubClient.duplicate()

// This allows messages to hop between physical server nodes
Promise.all([pubClient.connect(), subClient.connect()]).then(() => {
  io.adapter(createAdapter(pubClient, subClient))
})

The Gotcha: Watch your Redis memory. Every "room" or "channel" you create consumes RAM. If you create a unique room for every single user and never clean them up, you'll be paged at 3:00 AM because Redis ran out of heap space.
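Beyond rooms, anything you write to Redis yourself (presence, per-user metadata) should carry a TTL so forgotten entries expire on their own. A minimal sketch against the pubClient from above, using a hypothetical presence:* key scheme:

// Hypothetical presence tracking: the EX option makes Redis drop the key after an hour,
// so a missed cleanup can't pile up memory forever
async function trackPresence(userId, socketId) {
  await pubClient.set(`presence:${userId}`, socketId, { EX: 3600 })
}

io.on('connection', (socket) => {
  // How you derive userId is up to your auth layer; handshake.auth is one common spot
  trackPresence(socket.handshake.auth.userId, socket.id)
})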

4. The "Thundering Herd" will crush your database

This is the one that actually cost us the most money. Imagine you have 10,000 people connected to a live sports app. A goal is scored. You push a WebSocket message to all 10,000 clients saying "Hey, something happened!"

What do those 10,000 clients do? They all immediately hit your /api/get-latest-scores endpoint at the exact same millisecond.

Your database, which usually handles 100 requests per second, is suddenly hit with 10,000 concurrent queries. It falls over. The connections drop. The clients auto-reconnect, which triggers _another_ load of auth queries. It’s a death spiral.

The Solution:

  1. Push the data, don't notify. Instead of sending a "data changed" ping, send the actual updated JSON payload in the WebSocket message itself.
  2. Jitter. If the clients _must_ fetch data, add a random delay (0-500ms) on the client side before they fire the request (see the sketch below). It sounds stupid, but it flattens the spike enough to save your backend.
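A minimal sketch of both ideas, assuming the Socket.io setup from earlier; the 'score-update' event name and the renderScores()/fetchLatestScores() helpers are made up for illustration:

// Server: push the actual payload instead of a "something changed" ping
io.emit('score-update', { matchId: 42, home: 1, away: 0 })

// Client: render from the push, and if you really must refetch, jitter it by 0-500ms
socket.on('score-update', (payload) => {
  renderScores(payload) // usually this is all you need

  const jitterMs = Math.random() * 500
  setTimeout(() => fetchLatestScores(), jitterMs)
})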

Summary

Scaling WebSockets isn't really about the code inside your on('message') handler. It's about infrastructure plumbing. You have to manage the "state" that you spent years trying to get rid of in your REST APIs.

Keep your heartbeats frequent, your sessions sticky, your Redis instances beefy, and for the love of everything holy, don't let your clients DDoS your own database.