
The Day I Drained My Connection Pool: Why Node.js fetch Doesn't Respect Your Global Timeout

A deep dive into the internal mechanics of Node.js fetch and why your AbortSignal might be leaving 'zombie' sockets open in your connection pool.


Have you ever watched a monitoring dashboard and seen your connection pool usage climb like a mountain climber with no intention of coming down, despite your code having "strict" timeouts?

Last Tuesday, I lived that nightmare. I thought I was being a responsible developer. I used the native Node.js fetch API, I implemented AbortSignal.timeout(), and I walked away to get a coffee, confident that my microservice was bulletproof. Ten minutes later, the service was gasping for air, throwing 504s, and refusing to open a single new TCP connection.

It turns out that in the world of Node.js, "aborting a request" and "cleaning up a connection" are two very different things.

The Lie We Tell Ourselves

We’ve been trained to think that AbortSignal is the ultimate "stop" button. It feels like magic. You pass AbortSignal.timeout() a millisecond value, and if the server doesn't respond in time, the promise rejects and we move on with our lives.

Here is what most of us are writing:

async function fetchWithTimeout(url) {
  try {
    // 5 seconds. Surely this is enough protection?
    const response = await fetch(url, { signal: AbortSignal.timeout(5000) });
    return await response.json();
  } catch (err) {
    // Note: AbortSignal.timeout() rejects with a 'TimeoutError',
    // not 'AbortError' — an easy check to get wrong.
    if (err.name === 'TimeoutError') {
      console.error("Request timed out! We're safe, right?");
    }
    throw err;
  }
}

On the surface, this works. The await throws, your catch block logs the error, and your user gets a "Something went wrong" message. But under the hood, a zombie is being born.

Why the Socket Doesn't Die

Node’s global fetch is powered by Undici, a high-performance HTTP/1.1 client. Undici manages a "Dispatcher"—basically a pool of connections. When you call fetch, it grabs a socket from the pool, uses it, and then puts it back.

When you trigger an AbortSignal, you are telling the Promise to stop waiting. You are *not* necessarily telling the underlying TCP socket to immediately close and vanish.

If the server has already started sending data when the timeout hits, the socket is still busy receiving that data. Because the fetch promise has already rejected, there’s no code left to "consume" the rest of that response body. The socket gets stuck in a state where it’s waiting to finish a conversation that your code has already walked away from.

If you do this 100 times, you have 100 "zombie" sockets. Your pool is now drained.
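You don't have to take this on faith. Node ships an (experimental) API, process.getActiveResourcesInfo(), available since Node 17.3, that lists live handles by type — TCP sockets show up as 'TCPSocketWrap'. A minimal sketch of a leak counter built on it:

```javascript
// Quick-and-dirty leak detector: counts live TCP socket handles using
// the experimental process.getActiveResourcesInfo() API (Node 17.3+).
function countTcpHandles() {
  return process.getActiveResourcesInfo()
    .filter((name) => name === 'TCPSocketWrap')
    .length;
}

// Log the count every second; if it only ever climbs, you're leaking:
// setInterval(() => console.log(`open TCP handles: ${countTcpHandles()}`), 1000);
```

If the number climbs on every timed-out request and never comes back down, you're looking at exactly the zombie-socket problem described above.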

Watching the Pool Drain

Let’s look at how this happens in a real-ish scenario. Imagine you’re calling a slow API in a loop.

import { Agent, setGlobalDispatcher } from 'undici';

// Let's limit the pool to 10 connections to see the failure faster
const agent = new Agent({ connections: 10 });
setGlobalDispatcher(agent);

async function leakConnections() {
  for (let i = 0; i < 50; i++) {
    try {
      // We timeout after 100ms, but the server takes 2000ms
      await fetch('http://localhost:3001/super-slow-api', { 
        signal: AbortSignal.timeout(100) 
      });
    } catch (e) {
      console.log(`Request ${i} timed out.`);
    }
  }
}

In this case, somewhere around the 11th request your app will simply hang. It’s not that the server is down; it’s that your Node process is waiting for a free socket that will never come back.
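If you want to reproduce this yourself, the slow endpoint from the snippet above is easy to fake with node:http. This is just a sketch — the port and 2000ms delay are the values assumed in the example, nothing more:

```javascript
import { createServer } from 'node:http';

// A deliberately slow server: it waits `delayMs` before responding,
// so any client timeout shorter than that will always fire.
function startSlowServer(port = 3001, delayMs = 2000) {
  const server = createServer((req, res) => {
    setTimeout(() => res.end('finally!'), delayMs);
  });
  server.listen(port);
  return server;
}

startSlowServer();
```

Run this in one terminal, the leaky loop in another, and watch the loop stall once the pool is exhausted.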

The Fix: Taking Control of the Dispatcher

To stop the bleeding, you need to be explicit about how connections are handled. You can't just rely on the global defaults if you're doing high-volume work.

One way to mitigate this is to configure the Agent to be more aggressive about cleaning up. If you set a bodyTimeout and a headersTimeout at the dispatcher level, Undici will actually kill the underlying socket when those limits are hit, rather than just rejecting the high-level promise.

import { Agent, setGlobalDispatcher } from 'undici';

const dispatcher = new Agent({
  connections: 50,
  // This is the key: tell the socket to die if it's taking too long
  bodyTimeout: 5000, 
  headersTimeout: 5000,
});

setGlobalDispatcher(dispatcher);

But Wait, There's a "Proper" Way

If you want to be a true professional about it, you should ensure the response body is always handled, even in a timeout scenario—though this is tricky with the fetch API because once the signal is aborted, the response object is often unreachable.

The real "gotcha" is that AbortSignal.timeout() is a blunt instrument. A better pattern if you are using undici directly (via request) is to use the onInfo or onHeaders callbacks, but since we're talking about the global fetch, we have to play by its rules.

Here is my "Defense in Depth" strategy for Node fetch:

1. Don't use the default Agent for production-critical external calls. Create a named Agent with a fixed connection limit so you don't accidentally take down your whole process.
2. Set `keepAliveTimeout` low if you’re talking to unreliable peers.
3. Consume the body. If you somehow get a response object before a timeout hits, always release it with await response.text() or await response.body.cancel() in your cleanup path.

async function robustFetch(url) {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), 5000);

  let response;
  try {
    response = await fetch(url, { signal: controller.signal });
    return await response.json();
  } catch (err) {
    // If we have a response but failed during json() parsing (or the abort
    // fired mid-body), we MUST cancel the body to release the socket.
    // cancel() can itself reject if the stream is already locked, so
    // swallow that secondary error.
    if (response?.body) {
      await response.body.cancel().catch(() => {});
    }
    throw err;
  } finally {
    clearTimeout(timeout);
  }
}

The Moral of the Story

Node.js fetch is a beautiful addition to the ecosystem, but it's wearing a mask. Underneath that browser-standard syntax is a complex TCP pooling engine.

The next time you're debugging a "hanging" Node.js service, don't just look at your CPU and RAM. Check your open file descriptors. Check your socket count. You might find that your "safe" timeouts are actually the things killing your app.

Write code that doesn't just stop waiting, but actually stops the work. Your connection pool will thank you.