
Why Does Your 'Graceful' Node.js Shutdown Still Result in Dropped Connections?
Handling SIGTERM is only half the battle: discover why keep-alive connections and database pools often sabotage your production deployments despite 'perfect' shutdown logic.
You’ve spent hours perfecting your Kubernetes probes and wrapping your server in a SIGTERM handler, yet every time you push a new image, your monitoring alerts scream about 502 Bad Gateways. It’s infuriating when "graceful" feels like a lie told by the Node.js documentation.
Most of us start with the standard "textbook" shutdown pattern. It looks something like this:
```javascript
process.on('SIGTERM', () => {
  console.log('Received SIGTERM, shutting down gracefully');
  server.close(() => {
    console.log('Closed out remaining connections');
    process.exit(0);
  });
});
```

On paper, this is great. server.close() stops the server from accepting *new* connections and waits for existing ones to finish. But in the real world, especially in high-traffic environments or inside a container orchestrator, this code is basically a "Close Door" button on an elevator that isn't actually wired to anything.
The Keep-Alive Trap
The biggest culprit is HTTP keep-alive. To save the overhead of a TCP handshake for every single request, modern browsers and load balancers keep sockets open; in HTTP/1.1, persistent connections are the default behavior.
When you call server.close(), Node.js stops accepting *new* sockets, but it won't force-close existing ones that are currently idle. If a client has a socket open and just finished a request, that socket stays open. Node is essentially waiting for a guest who has already finished their meal but refuses to leave the table.
If you’re on Node.js v18.2.0 or later, you finally have a native way to kick them out:
```javascript
process.on('SIGTERM', () => {
  // 1. Stop accepting new connections
  server.close(() => {
    console.log('All connections closed.');
    process.exit(0);
  });

  // 2. Force-close IDLE connections immediately.
  // This is the secret sauce. It doesn't kill active requests,
  // but it closes the keep-alive sockets that are just sitting there.
  if (typeof server.closeIdleConnections === 'function') {
    server.closeIdleConnections();
  }

  // 3. Set a hard timeout so we don't hang forever
  setTimeout(() => {
    console.error('Could not close connections in time, forceful shutdown');
    process.exit(1);
  }, 30000);
});
```

The Kubernetes "Propagation Delay"
If you're running in Kubernetes, the problem might not even be your code—it’s physics (or at least network latency).
When a Pod is marked for termination, two things happen in parallel, with no guaranteed ordering:
1. The Pod is sent a SIGTERM.
2. The Endpoint controller removes the Pod’s IP from the Service's list.
There is a window of a few seconds where your app has received the signal to stop, but the load balancer or Ingress still thinks it can send traffic your way. If your code immediately calls server.close(), those incoming requests fail with an ECONNREFUSED.
The fix is counter-intuitive: Wait before you shut down.
```javascript
process.on('SIGTERM', async () => {
  console.log('SIGTERM received. Waiting for K8s propagation...');

  // Wait 5-10 seconds before actually closing the server.
  // This gives the Load Balancer time to realize we're leaving.
  await new Promise(resolve => setTimeout(resolve, 10000));

  server.close(() => {
    // ... cleanup logic
  });
});
```

Database Pools: The Silent Saboteurs
I've seen so many "perfect" shutdown scripts that close the HTTP server and then immediately call process.exit(0), leaving database connections hanging in the breeze. Or worse, closing the database pool while a final, slow HTTP request is still trying to query it.
You need to orchestrate the exit. It’s like a choreographed dance:
1. Stop taking new requests.
2. Wait for the Load Balancer to look away.
3. Close the server.
4. Then drain your database pools.
Here is a more robust pattern I like to use:
```javascript
const shutdown = async (signal) => {
  console.log(`Received ${signal}. Starting graceful shutdown...`);

  // 1. Give the infrastructure time to update routes
  await new Promise(r => setTimeout(r, 5000));

  // 2. Stop accepting new connections
  server.close(async (err) => {
    if (err) {
      console.error('Error during server close:', err);
      process.exit(1);
    }
    try {
      // 3. Close DB connections AFTER the server stops processing
      console.log('Closing database pool...');
      await db.pool.end();
      console.log('Shutdown complete.');
      process.exit(0);
    } catch (dbErr) {
      console.error('Error closing database:', dbErr);
      process.exit(1);
    }
  });

  // Optional: kick the idle keep-alives (Node >= 18.2.0)
  server.closeIdleConnections?.();

  // Safety net: never hang longer than 30 seconds
  setTimeout(() => {
    console.error('Could not close connections in time, forceful shutdown');
    process.exit(1);
  }, 30000);
};

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));
```

Don't Forget the Health Checks
If your application has a /healthz or /ready endpoint, make sure it starts returning a 503 Service Unavailable as soon as the SIGTERM is received. While Kubernetes relies on the Endpoint controller, many other load balancers rely purely on polling your health check.
If you keep returning 200 OK while you're trying to shut down, you're basically shouting "I'm open for business!" while you're locking the front door.
Summary
A truly graceful shutdown in Node.js requires more than just listening for a signal. You have to actively manage idle sockets, account for network propagation delays, and ensure your resource pools (DB, Redis, etc.) are the last things to turn off the lights.
If you aren't handling closeIdleConnections() and the K8s propagation delay, your "graceful" shutdown is mostly just wishful thinking. Try adding a small delay and an idle-flush to your next deployment—your logs (and your SREs) will thank you.


