loke.dev

Stop Trusting `dns.lookup()`: The Hidden Threadpool Tax That’s Killing Your Node.js Throughput

Uncover the synchronous secret hidden inside Node’s default networking and how to switch to non-blocking resolution before your threadpool reaches its breaking point.


Your Node.js application is lying to you about being non-blocking. We’ve all bought into the "single-threaded event loop" gospel, believing that as long as we avoid fs.readFileSync or heavy JSON parsing, our throughput will scale linearly. But there is a silent, synchronous tax hidden inside the very core of Node’s networking module. If you are making outgoing HTTP requests—to an API, a database, or a microservice—you are likely bottlenecking your entire server on a threadpool designed for legacy system calls.

The culprit is dns.lookup(). It is the default method Node.js uses to resolve hostnames, and it is a performance trap.

The Architect’s Oversight

When you call https.get('https://api.internal.service'), Node.js doesn't magically know where that server lives. It needs to resolve the hostname to an IP address. Under the hood, Node calls the dns.lookup() function.

On the surface, dns.lookup() looks like any other asynchronous Node function. It takes a callback or returns a promise. But internally, it’s a wrapper around a synchronous C function called getaddrinfo(3).

The problem? getaddrinfo is a blocking call. Because Node’s event loop cannot afford to stop and wait for a DNS response, it offloads this work to the libuv threadpool. This is the same threadpool used for file I/O (fs), certain cryptographic functions (crypto), and heavy compression (zlib).

By default, the size of this threadpool is four.

If you have a high-traffic application making dozens of concurrent outgoing requests, those requests aren't actually running in parallel. They are queuing up, waiting for one of those four threads to become available. If a DNS server is slow to respond, your entire application’s throughput tanks—not because the CPU is busy, but because the threadpool is exhausted.

Visualizing the Bottleneck

Let's look at how this manifests in a real script. I’ve seen developers scratch their heads over why their app slows down during peak hours despite low CPU usage. Often, it looks something like this:

const dns = require('dns');

const start = Date.now();

// We are going to simulate 10 concurrent lookups
for (let i = 0; i < 10; i++) {
  dns.lookup('google.com', (err, address) => {
    console.log(`Lookup ${i+1} finished in ${Date.now() - start}ms`);
  });
}

If you run this, you might notice a "stair-step" pattern in the timing. The first four finish relatively quickly. The next four wait. The final two wait even longer. You’ve just hit the threadpool ceiling.

Now, imagine your app is also doing disk I/O. Because fs shares this same pool, your file reads are now waiting behind DNS resolutions for a public API. This is "invisible" latency that won't show up in your event loop lag metrics, but it will absolutely crush your P99 response times.

The Alternative: dns.resolve()

Node.js actually provides two different ways to handle DNS. While dns.lookup() uses the OS-level getaddrinfo, the dns.resolve() family of functions uses a library called c-ares.

The difference is night and day. c-ares does not use the threadpool. It implements DNS resolution directly over the network using non-blocking sockets. It is truly asynchronous.

Here is how the code changes:

const dns = require('dns').promises;

async function checkPerformance() {
  const start = Date.now();
  
  // dns.resolve4 specifically looks for IPv4 addresses
  const resolutions = Array.from({ length: 10 }).map(() => 
    dns.resolve4('google.com')
  );

  await Promise.all(resolutions);
  console.log(`All resolutions finished in ${Date.now() - start}ms`);
}

checkPerformance();

When you use resolve4(), you can fire off 1,000 requests, and Node will handle them all on the event loop without ever touching the libuv threads. Your throughput is now limited by your network bandwidth and the DNS server, not an arbitrary internal thread count.

The Catch (There’s Always a Catch)

If dns.resolve() is so much better, why isn't it the default?

It comes down to how your operating system defines a "hostname." The dns.lookup() method behaves exactly like your web browser or ping. It checks your /etc/hosts file (or C:\Windows\System32\drivers\etc\hosts), it handles local mDNS (like my-laptop.local), and it follows the configuration in /etc/resolv.conf.

dns.resolve(), because it bypasses the OS system calls, completely ignores your local hosts file.

If your infrastructure relies on /etc/hosts for service discovery (which is common in some legacy or Docker setups), switching to dns.resolve() will break your application. It won't be able to find db.internal because that record only exists in your local config, not on the actual DNS server.

Fixing the Global Agent

Most of us aren't calling dns.lookup() directly. We’re using axios, got, node-fetch, or the native https module. These all use dns.lookup() by default because they want to respect your OS configuration.

To fix this at the source without losing the ability to scale, you should provide a custom lookup function to your HTTP Agent.

Here is a practical example of how to force a library like axios or the native https module to use non-blocking resolution:

const https = require('https');
const dns = require('dns');

// A custom lookup function that uses dns.resolve under the hood
// A custom lookup function that uses dns.resolve under the hood
const asyncLookup = (hostname, options, callback) => {
  // We use resolve4 for IPv4, but you could add logic for IPv6 (resolve6)
  dns.resolve4(hostname, (err, addresses) => {
    if (err) {
      // Fall back to the standard lookup if resolve fails
      // (useful for local hostnames in /etc/hosts)
      return dns.lookup(hostname, options, callback);
    }

    // Callers that set { all: true } expect an array of
    // { address, family } objects rather than a single string
    if (options && options.all) {
      return callback(null, addresses.map((address) => ({ address, family: 4 })));
    }

    // dns.resolve returns an array; dns.lookup expects a single string
    // and the family (4 or 6)
    callback(null, addresses[0], 4);
  });
};

const agent = new https.Agent({
  lookup: asyncLookup,
  keepAlive: true // Always use keep-alive for high throughput!
});

// Use this agent in your requests
https.get('https://example.com', { agent }, (res) => {
  // ... handle response
});

In this pattern, we try the non-blocking resolve4 first. If it fails (perhaps because the hostname is in /etc/hosts), we fall back to the standard dns.lookup. This gives you the best of both worlds: high performance for external APIs and compatibility for internal networking.

Increasing the Threadpool Size

If you’re uncomfortable rewriting your DNS logic, or if you have dozens of dependencies that you can't easily configure, there is a "brute force" fix. You can increase the size of the libuv threadpool.

By setting the UV_THREADPOOL_SIZE environment variable, you can expand the pool from the default of 4 up to 1024 (older libuv versions, prior to Node 12.5, capped it at 128).

# Set this before starting your Node process
export UV_THREADPOOL_SIZE=64
node server.js

Warning: This is a band-aid. Each thread carries overhead. If you're running on a tiny container with 0.5 CPU cores, spinning up 64 threads will cause significant context-switching overhead. It fixes the DNS queuing problem but introduces a different kind of performance tax. I generally recommend increasing this to 8 or 16 as a baseline for production apps, but relying on it to solve DNS issues is like buying a bigger bucket to deal with a leaky pipe.
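A cheap safeguard is to log the effective pool size at startup. This is a sketch: libuv reads the variable once, before the first threadpool task is queued, so it must be set in the environment before the process starts and cannot be changed at runtime.

```javascript
// Default is 4 when UV_THREADPOOL_SIZE is unset or invalid
const poolSize = parseInt(process.env.UV_THREADPOOL_SIZE, 10) || 4;
console.log(`libuv threadpool size: ${poolSize}`);

if (poolSize < 8) {
  console.warn('Threadpool may be undersized for heavy fs/dns/crypto workloads');
}
```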

The DNS Caching Problem

Node.js does not cache DNS lookups. At all.

Every time you call https.get('https://api.stripe.com'), Node performs a fresh DNS resolution. If you are making 100 requests per second to the same domain, you are performing 100 DNS lookups per second. Even with dns.resolve(), this is a massive waste of resources and adds unnecessary latency (the "DNS RTT").

Your OS may cache for you, but only if a local caching daemon such as nscd or systemd-resolved is running; many Linux servers and containers ship without one, so every getaddrinfo call ends up going over the network.

To truly optimize throughput, you should implement DNS caching at the application level. Libraries like dnscache can wrap the dns module and prevent redundant lookups:

const dns = require('dns');
const dnscache = require('dnscache')({
  "enable": true,
  "ttl": 300,
  "cachesize": 1000
});

// Now, calls to dns.lookup (and by extension, http.get) 
// will be cached for 5 minutes.

When you combine dnscache with a custom http.Agent that uses keepAlive: true, your networking stack becomes significantly more resilient. keepAlive reuses existing TCP connections, bypassing DNS entirely for subsequent requests. dnscache ensures that when a new connection *is* needed, it doesn't hit the threadpool or the network.

Summary of Action Items

If you are running a Node.js service that makes frequent outbound requests, here is your checklist for escaping the threadpool tax:

1. Audit your outgoing calls: Are you using the default http.Agent? If so, you're using dns.lookup().
2. Monitor Threadpool Usage: Instrument your app with timing logs or an APM tool to see if your fs operations slow down when outbound traffic spikes.
3. Use `dns.resolve()` where possible: If you don't rely on local hosts files for your API targets, switch to resolve4 or resolve6.
4. Implement an async lookup fallback: Use the custom agent pattern shown above to keep the flexibility of lookup with the speed of resolve.
5. Enable DNS Caching: Don't pay the resolution tax more than once every few minutes.
6. Increase `UV_THREADPOOL_SIZE`: At least to 8 or 16, to give your app some breathing room for other background tasks.

Stop letting a legacy blocking C interface dictate the throughput of your modern asynchronous application. Node.js is powerful, but it requires you to understand which parts of the "non-blocking" promise are actually true. DNS resolution is the crack in the foundation. Patch it before your traffic grows.