
What Nobody Tells You About Nagle’s Algorithm: Why Your Small JSON Payloads Are Secretly Waiting 40ms to Leave the Server

Discover how a 40-year-old networking optimization is silently sabotaging the responsiveness of your modern real-time APIs and how a single line of code can fix it.

· 9 min read

Imagine you are looking at a Prometheus dashboard, and you see a bizarre, flat line in your p99 latency. Your microservice is doing almost nothing—just returning a small JSON object { "status": "ok" }—yet, consistently, some requests take exactly 40 milliseconds. Not 38ms, not 42ms, but a crisp, suspicious 40ms. You’ve checked the garbage collector, the database query is sub-millisecond, and the CPU is idling. You are likely a victim of Nagle’s Algorithm, a 40-year-old networking optimization designed for a world of 300-baud modems that is now quietly strangling modern, low-latency APIs.

The Congestion of 1984

In the early 1980s, the internet was a fragile web of low-bandwidth links. John Nagle, an engineer at Ford Aerospace, noticed a problem: the "Small Packet Syndrome."

Every TCP packet has a fixed overhead. You have a 20-byte TCP header and a 20-byte IP header. If you send a single character of data—say, a keystroke in a Telnet session—you are putting 41 bytes on the wire to deliver 1 byte of information: 40 bytes of overhead per byte of payload, a 4000% overhead. On a congested network, thousands of these tiny packets would clog the pipes, leading to what Nagle called "congestion collapse."

His solution, codified in RFC 896, was elegant and simple:

If there is unacknowledged data already in flight, the sender should buffer any new small outgoing packets until it receives an acknowledgment (ACK) for the previous data, or until it has enough data to fill a full-sized packet (the Maximum Segment Size, or MSS).
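The rule can be sketched in a few lines of Python (a deliberate simplification for illustration; the real kernel logic also interacts with retransmission and window state):

```python
def nagle_should_send(segment_len, mss, unacked_bytes):
    """Simplified sketch of the RFC 896 rule -- not actual kernel code."""
    if segment_len >= mss:
        return True   # a full-sized segment always goes out immediately
    if unacked_bytes == 0:
        return True   # nothing in flight, so a small segment may go too
    return False      # small segment + unacked data: buffer until an ACK

print(nagle_should_send(1460, 1460, 300))  # full MSS segment -> True
print(nagle_should_send(25, 1460, 0))      # small, nothing in flight -> True
print(nagle_should_send(25, 1460, 300))    # small, data in flight -> False
```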

For decades, this was the hero of the internet. It turned a stream of tiny, inefficient packets into a few large, efficient ones. But in 2024, when we are sending JSON payloads over gigabit fiber, Nagle’s Algorithm has become the "hidden boss" of tail latency.

The Collision: Nagle meets Delayed ACK

If Nagle’s Algorithm was the only player, it wouldn't be so bad. It would just wait for an ACK and move on. The real disaster happens when Nagle’s Algorithm interacts with another optimization: TCP Delayed ACK.

Delayed ACK is an optimization on the *receiver's* side. To save bandwidth, the receiver doesn't send an ACK immediately after getting a packet. Instead, it waits to see if it has any data to send back to the sender (so it can "piggyback" the ACK on a data packet) or if another packet arrives (so it can ACK two packets at once).

The timeout for this delay? In many Linux kernels, it’s 40ms. In Windows, it can be up to 200ms.

Here is the "Deadly Deadlock" scenario:
1. Client sends a small request to the server.
2. Server processes it and generates a small JSON response.
3. Server's TCP stack sees Nagle is enabled. It thinks: *"I have a small packet here, but I haven't received an ACK for the last thing I sent yet. I'll buffer this small JSON."*
4. Client's TCP stack received the request earlier but is using Delayed ACK. It thinks: *"I got the data, but I'll wait 40ms to see if I have something to send back before I ACK it."*

Both sides sit there, staring at each other. The server won't send the data until it gets an ACK. The client won't send an ACK until its timer expires or more data arrives. After 40ms, the client finally gives up, sends the ACK, and the server finally releases the JSON.

Watching the Delay in Real-Time

To see this in action, you don't need a complex observability suite. You can replicate it with a basic TCP socket script. Below is a Python example where we simulate a client-server interaction that triggers this exact latency spike.

The Victim Server (Python)

import socket

# Create a standard TCP/IP socket
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Allow quick restarts of the demo without "Address already in use"
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(('localhost', 8080))
server.listen(1)

print("Server listening on port 8080...")

while True:
    conn, addr = server.accept()
    # Receive a small request
    data = conn.recv(1024)
    if data:
        # Simulate some logic
        # We send two small writes to trigger Nagle's buffering
        conn.sendall(b'{ "status":')
        # This second write will likely be buffered if Nagle is ON
        conn.sendall(b' "ok" }')
    conn.close()

The Client (Python)

import socket
import time

def make_request():
    start = time.perf_counter()
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(('localhost', 8080))
    
    client.sendall(b'GET / HTTP/1.1\r\n\r\n')
    
    response = b""
    while True:
        chunk = client.recv(4096)
        if not chunk:
            break
        response += chunk
    
    end = time.perf_counter()
    print(f"Request took: {(end - start) * 1000:.2f}ms")
    client.close()

for _ in range(5):
    make_request()

If you run this on a system where Nagle's Algorithm is the default (which is almost everywhere) and the kernel's Delayed ACK kicks in, you'll see those ~40ms latencies appear.

The One-Line Fix: TCP_NODELAY

The fix for this is a socket option called TCP_NODELAY. When you enable this, you are effectively telling the kernel: "Disable Nagle’s Algorithm. I don't care about packet overhead; I want this data on the wire *now*."

In modern high-performance environments—Redis, Memcached, Nginx, Node.js—this is almost always enabled by default. However, if you are writing custom socket code, or using an older library, you might be accidentally running with Nagle active.
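Disabling Nagle in Python

In Python, where our demo scripts live, the fix is a single setsockopt call on the socket (it can be set before or after connecting):

```python
import socket

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Disable Nagle's Algorithm: small writes go on the wire immediately
client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# client.connect(('localhost', 8080))  # then use the socket as normal
```

Drop that one line into the client from the demo above and the 40ms plateau disappears.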

Disabling Nagle in Node.js

Recent Node.js versions disable Nagle by default for http server sockets, but a raw net socket still has Nagle enabled by default, so you need to opt out yourself:

const net = require('net');

const client = net.createConnection({ port: 8080 }, () => {
  // Disable Nagle's Algorithm
  client.setNoDelay(true);
  
  client.write('Small JSON payload');
});

Disabling Nagle in Go

Go's net package enables TCP_NODELAY by default on TCP connections, so in most programs Nagle is already off. But if you're building a custom dialer or a transport that tweaks socket options, it's worth being explicit.

package main

import (
    "log"
    "net"
)

func main() {
    conn, err := net.Dial("tcp", "localhost:8080")
    if err != nil {
        log.Fatal(err)
    }

    tcpConn := conn.(*net.TCPConn)
    // SetNoDelay controls whether the operating system should delay packet
    // transmission in hopes of sending fewer packets (Nagle's algorithm).
    // Go's default is true (no delay).
    tcpConn.SetNoDelay(true)
}

Disabling Nagle in Rust

In the Rust ecosystem, especially with tokio or std::net, you control this on the TcpStream.

use std::net::TcpStream;

fn main() -> std::io::Result<()> {
    let stream = TcpStream::connect("127.0.0.1:8080")?;
    
    // Disable Nagle
    stream.set_nodelay(true)?;
    
    Ok(())
}

Why isn't this fixed at the OS level?

You might wonder why, in 2024, the Linux kernel still has Nagle’s Algorithm enabled by default. The reason is that the kernel doesn't know what kind of traffic you are sending.

If you are a background backup service (like rsync) or an SSH session where you are typing slowly, Nagle is actually your friend. It reduces the number of packets the CPU has to process and lowers the load on the network. The kernel prioritizes throughput and efficiency unless the application explicitly asks for low latency via TCP_NODELAY.

The "Write-Write-Read" Pattern

One of the most common ways developers accidentally trigger the Nagle/Delayed-ACK trap is the Write-Write-Read pattern.

I’ve seen this happen frequently in custom database drivers. The code looks like this:

1. write(header)
2. write(body)
3. read(response)

The first write (the header) goes out immediately because there is no unacknowledged data. The second write (the body) is small, and there is now unacknowledged data (the header) in flight. Nagle’s Algorithm kicks in and buffers the body. The server won't see the body, so it won't process the request, so it won't send a response. The client waits... the server waits... 40ms passes... the Delayed ACK timer expires... the header is ACKed... the body is finally sent.

How to avoid this without `TCP_NODELAY`?
Gather your data into a single buffer and make one write call. If you send the header and body together, Nagle doesn't have a chance to split them and buffer the second part.

# Bad: two small writes trigger Nagle's buffering
sock.sendall(header)
sock.sendall(payload)

# Better: one write, nothing for Nagle to hold back
sock.sendall(header + payload)
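If the header and payload live in separate buffers and you'd rather not pay for the concatenation copy, socket.sendmsg can gather multiple buffers into a single send. A small sketch using a local socket pair to stand in for a real connection (sendmsg availability varies by platform; this assumes Linux or macOS):

```python
import socket

# A connected socket pair stands in for a real client/server connection
a, b = socket.socketpair()

header = b'{ "status":'
payload = b' "ok" }'

# One syscall, one logical write: no intermediate header+payload copy
a.sendmsg([header, payload])

data = b.recv(1024)
print(data)  # b'{ "status": "ok" }'
```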

However, even with single writes, TCP_NODELAY is still usually the correct choice for modern web services.

Is there a downside?

Disabling Nagle's Algorithm isn't a "free" performance boost without trade-offs.

1. Increased Packet Count: You will send more packets. If you are sending thousands of tiny updates per second (like a mouse-tracking coordinate stream), disabling Nagle could significantly increase the number of packets your network interface and your router have to handle.
2. Increased CPU Usage: Every packet requires an interrupt and processing through the networking stack. More packets = more CPU cycles spent on networking.
3. Congestion Risk: On extremely crowded or poor-quality networks (like satellite links or heavily congested public Wi-Fi), disabling Nagle can contribute to the "congestion collapse" it was designed to prevent.

For the vast majority of backend-to-backend communication (microservices in a data center), these trade-offs are negligible compared to the benefit of consistent, low-latency responses.

The Edge Case: TCP_CORK

There is another, even more aggressive cousin to Nagle called TCP_CORK. While Nagle says "wait if there's an ACK pending," TCP_CORK says "wait until the buffer is full or I manually uncork it, no matter what."

This is used in web servers like Apache or Nginx when serving files. They "cork" the socket, write the HTTP headers, use sendfile() to pour the file content into the kernel's buffer, and then "uncork" it. This ensures the headers and the start of the file are sent in the same packet, maximizing MTU (Maximum Transmission Unit) usage.
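On Linux, the cork/uncork dance looks like this in Python (TCP_CORK is a Linux-only socket option, and this is a sketch of the pattern, not Nginx's actual code):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Cork: the kernel buffers all writes until a segment is full
# or we explicitly uncork
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
corked = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CORK)

# ... connect, write the HTTP headers, os.sendfile() the body ...

# Uncork: flush everything buffered, coalesced into full-sized packets
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)
```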

If you are seeing delays and you’ve already checked TCP_NODELAY, check if some middle-layer or proxy is using TCP_CORK and failing to "uncork" the socket in a timely manner.

Real-World Impact: A Case Study

I once worked on a real-time bidding engine where the p99 latency was around 45ms. In that industry, 45ms is an eternity; you've already lost the auction.

After digging through the code, we found a legacy logging client that was sending a "heartbeat" to a centralized collector over a standard TCP socket. It was doing a small write for the timestamp and a second write for the status code. Nagle was enabled. This tiny, background logging task was causing the entire request-response cycle of the main engine to stall while the kernel waited for the heartbeat's ACK.

By adding socket.setNoDelay(true), the p99 dropped from 45ms to 3ms. One line of code.

How to check your own systems

If you suspect you have a Nagle problem, there are three ways to confirm it:

1. The Histogram Check: Look at your latency distribution. Is there a suspicious hump at 40ms? Does it disappear when you increase the size of your payloads?
2. Wireshark/Tcpdump: This is the "gold standard." Capture a trace and look for the time gap between a PSH (Push) flag and the subsequent ACK. If the ACK arrives exactly 40ms after the data, and no more data was sent in between, you’ve found it.
3. Strace: On Linux, you can trace the system calls of your process. Look for the setsockopt calls.
```bash
strace -e setsockopt -p <pid>
```
If you don't see TCP_NODELAY being set, the default is "off" (meaning Nagle is "on").
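From inside your own process, you can also just ask the socket directly: getsockopt reads back 0 when Nagle is on and a non-zero value once TCP_NODELAY has been set.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# A fresh socket has Nagle on, so TCP_NODELAY reads back as 0
before = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)

sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
after = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)

print(before, after)  # 0, then non-zero
```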

Summary

Nagle’s Algorithm is a relic of a different era. While it was a masterpiece of engineering in 1984, it is often a silent performance killer in 2024. For modern APIs, especially those using JSON, Protobuf, or small gRPC messages, the bandwidth savings of Nagle are irrelevant compared to the latency penalties.

The checklist for developers:
- If you are building a real-time API, disable Nagle (TCP_NODELAY = 1).
- If you are writing a database driver, disable Nagle.
- If you must keep Nagle on, avoid the "Write-Write-Read" pattern; buffer your writes in the application layer and send them as one chunk.
- Always check your p99 for that tell-tale 40ms signature.

Don't let a 40-year-old algorithm decide your API's performance. Take control of your socket options and keep your packets moving.