
The Network RTT Is a Lie: Using the TCP_INFO Struct to Audit Your Kernel-Level Socket Health
High-level timers are blind to kernel-level congestion, but querying the TCP_INFO struct allows your application to audit its own network health in real-time.
Your application’s latency metrics are lying to you. Every time you wrap a curl call or a database query in a stopwatch timer, you aren't measuring network speed; you're measuring the sum of your own application's overhead, kernel scheduling, and the network’s actual performance. By the time your application code realizes a packet has been delayed, the kernel has already known about it for several milliseconds, tried to fix it, and likely failed.
If you are relying on application-layer Round Trip Time (RTT) to diagnose network health, you are looking at a filtered, distorted version of reality. To see what’s actually happening, you have to go deeper—down into the tcp_info struct living in the Linux kernel.
The Blind Spot of High-Level Timers
When we measure RTT at the application level—say, in Python or Go—we usually do something like this:
import time
import requests
start = time.perf_counter()
requests.get("https://api.example.com")
end = time.perf_counter()
print(f"Latency: {end - start}s")

This is easy to write, but it’s fundamentally flawed for fine-grained debugging. This timer includes:
1. The time it takes for your language runtime to schedule the thread.
2. The time spent in the kernel's networking stack before the first bit hits the wire.
3. The actual transit time.
4. The remote server’s processing time (which is the biggest variable).
5. The time it takes for the response to climb back up the stack to your app.
If your "latency" spikes, you have no idea if the network is congested, if the remote CPU is pegged, or if your local machine is experiencing context-switch hell. The kernel, however, keeps a meticulous ledger for every single socket. It knows exactly how many packets were retransmitted, the smoothed RTT (SRTT), and the size of the congestion window.
Entering the Kernel: The tcp_info Struct
In Linux, the kernel maintains a struct tcp_info for every TCP connection. This struct is the "Source of Truth." It's defined in /usr/include/linux/tcp.h and it contains fields that most developers never see, but SREs at places like Cloudflare or Google live by.
Here is what the simplified version looks like:
struct tcp_info {
    __u8  tcpi_state;
    __u8  tcpi_ca_state;
    __u32 tcpi_rto;           /* Retransmission timeout */
    __u32 tcpi_rtt;           /* Smoothed RTT in microseconds */
    __u32 tcpi_rttvar;        /* RTT variance */
    __u32 tcpi_snd_cwnd;      /* Sending congestion window */
    __u32 tcpi_total_retrans; /* Total retransmits for the lifetime of the socket */
    /* ... many more fields */
};

By querying this struct directly from your application, you can differentiate between "the network is slow" and "the remote server is taking a long time to think."
How to Extract the Truth (C Implementation)
To get this data, you use the getsockopt system call with the TCP_INFO flag. Here is a practical example in C that opens a connection and then audits its own health.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <unistd.h>
void print_socket_stats(int sockfd) {
    struct tcp_info info;
    socklen_t len = sizeof(info);

    if (getsockopt(sockfd, IPPROTO_TCP, TCP_INFO, &info, &len) == -1) {
        perror("getsockopt");
        return;
    }

    // tcpi_rtt is in microseconds
    printf("\n--- Kernel-Level Socket Health ---\n");
    printf("Smoothed RTT:      %u us\n", info.tcpi_rtt);
    printf("RTT Variance:      %u us\n", info.tcpi_rttvar);
    printf("Total Retransmits: %u\n", info.tcpi_total_retrans);
    printf("Congestion Window: %u\n", info.tcpi_snd_cwnd);
    printf("Unacked Packets:   %u\n", info.tcpi_unacked);
    printf("Lost Packets:      %u\n", info.tcpi_lost);
}
int main() {
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0) {
        perror("socket");
        return -1;
    }

    struct sockaddr_in serv_addr;
    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(80);
    inet_pton(AF_INET, "1.1.1.1", &serv_addr.sin_addr);

    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        perror("Connection Failed");
        close(sockfd);
        return -1;
    }

    // Send some data to populate the stats
    const char *msg = "GET / HTTP/1.1\r\nHost: 1.1.1.1\r\n\r\n";
    send(sockfd, msg, strlen(msg), 0);

    // Give the kernel a moment to receive an ACK and update RTT
    usleep(100000);

    print_socket_stats(sockfd);
    close(sockfd);
    return 0;
}

Why this matters
In the code above, info.tcpi_rtt is the kernel's smoothed measurement of how long it takes for a packet to be acknowledged by the remote TCP stack. This value is stripped of application-layer scheduling noise. If your tcpi_total_retrans starts incrementing while your tcpi_rtt remains stable, you know you have packet loss (likely a bad switch or cable), but the path itself isn't necessarily congested.
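That comparison between retransmits and RTT can be sketched as a small helper. This is a rough heuristic, not a production classifier: the field names mirror the tcp_info fields discussed above, but the thresholds and diagnosis strings are illustrative and would need tuning per workload.

```go
package main

import "fmt"

// KernelStats holds just the tcp_info fields the comparison needs.
// Values are taken from two successive getsockopt(TCP_INFO) snapshots.
type KernelStats struct {
	RttMicros    uint32 // tcpi_rtt
	RttvarMicros uint32 // tcpi_rttvar
	TotalRetrans uint32 // tcpi_total_retrans
}

// diagnose compares two snapshots: rising retransmits with a stable RTT
// points at loss on the link itself, while a rising RTT without loss
// points at queueing or a slow peer.
func diagnose(prev, curr KernelStats) string {
	retransDelta := curr.TotalRetrans - prev.TotalRetrans
	// "Grew" means the new RTT left the old RTT's variance band.
	rttGrew := curr.RttMicros > prev.RttMicros+2*prev.RttvarMicros

	switch {
	case retransDelta > 0 && !rttGrew:
		return "packet loss on a stable path (suspect cable/switch)"
	case retransDelta > 0 && rttGrew:
		return "congestion: queues building and packets dropping"
	case rttGrew:
		return "latency rising without loss (queueing or slow peer)"
	default:
		return "healthy"
	}
}

func main() {
	prev := KernelStats{RttMicros: 12000, RttvarMicros: 1500, TotalRetrans: 4}
	curr := KernelStats{RttMicros: 12500, RttvarMicros: 1600, TotalRetrans: 9}
	// Retransmits jumped by 5 but RTT stayed inside the variance band:
	fmt.Println(diagnose(prev, curr)) // prints "packet loss on a stable path (suspect cable/switch)"
}
```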
Implementing in Go: Real-World Usage
Most of us aren't writing raw C for our web services. Fortunately, Go makes it relatively easy to drop down into the syscall layer to grab this info. If you're running a high-throughput microservice, you can use a "middleware" style approach to log socket health when a request takes longer than expected.
package main

import (
    "fmt"
    "net"
    "os"
    "syscall"
    "unsafe"
)

// We need to mirror the kernel's struct for the fields we care about.
// Note: This is Linux specific!
type TCPInfo struct {
    State       uint8
    CAState     uint8
    Retransmits uint8
    Probes      uint8
    Backoff     uint8
    Options     uint8
    _           [2]byte // padding
    Rto         uint32
    Ato         uint32
    SndMss      uint32
    RcvMss      uint32
    Unacked     uint32
    Sacked      uint32
    Lost        uint32
    Retrans     uint32
    Fackets     uint32
    /* Times are in msecs */
    LastDataSent uint32
    LastAckSent  uint32
    LastDataRecv uint32
    LastAckRecv  uint32
    /* Metrics */
    Pmtu        uint32
    RcvSsthresh uint32
    Rtt         uint32
    Rttvar      uint32
    SndSsthresh uint32
    SndCwnd     uint32
    Advmss      uint32
    Reordering  uint32
}
func getTCPInfo(conn *net.TCPConn) (*TCPInfo, error) {
    raw, err := conn.SyscallConn()
    if err != nil {
        return nil, err
    }

    var info TCPInfo
    var innerErr error

    err = raw.Control(func(fd uintptr) {
        infoLen := uint32(unsafe.Sizeof(info))
        _, _, errno := syscall.Syscall6(
            syscall.SYS_GETSOCKOPT,
            fd,
            syscall.IPPROTO_TCP,
            syscall.TCP_INFO,
            uintptr(unsafe.Pointer(&info)),
            uintptr(unsafe.Pointer(&infoLen)),
            0,
        )
        if errno != 0 {
            innerErr = error(errno)
        }
    })
    if err != nil {
        return nil, err
    }
    if innerErr != nil {
        return nil, innerErr
    }
    return &info, nil
}
func main() {
    addr, err := net.ResolveTCPAddr("tcp", "google.com:80")
    if err != nil {
        fmt.Fprintf(os.Stderr, "resolve: %v\n", err)
        return
    }
    conn, err := net.DialTCP("tcp", nil, addr)
    if err != nil {
        fmt.Fprintf(os.Stderr, "dial: %v\n", err)
        return
    }
    defer conn.Close()

    conn.Write([]byte("GET / HTTP/1.0\r\n\r\n"))

    info, err := getTCPInfo(conn)
    if err != nil {
        fmt.Fprintf(os.Stderr, "Error: %v\n", err)
        return
    }
    fmt.Printf("Kernel RTT: %dms\n", info.Rtt/1000)
    fmt.Printf("Congestion Window: %d\n", info.SndCwnd)
}

The "Silent Killer": tcpi_retrans and tcpi_lost
In a standard application, you usually don't know a packet was lost until the entire request times out or takes significantly longer. But the kernel knows immediately when it has to retransmit.
If you are seeing 500ms response times from a database, check your tcpi_retrans. If it’s high, it doesn't matter how much you optimize your SQL queries; your network layer is flapping.
One of the most valuable use cases for TCP_INFO is Dynamic Load Balancing. Imagine a load balancer that doesn't just look at the number of active connections, but actually queries the kernel to see which upstream backend has the lowest tcpi_rtt and the fewest retransmissions. You can route around "gray failures"—nodes that are still alive but are experiencing NIC issues or are connected to a flaky Top-of-Rack switch.
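As a toy sketch of that routing idea: score each backend by its kernel RTT plus a penalty per retransmission, then pick the lowest score. The 10ms-per-retransmit penalty is a made-up constant for illustration, not an established formula, and the addresses are placeholders.

```go
package main

import "fmt"

// Backend pairs an upstream address with stats pulled from a probe
// connection's tcp_info snapshot.
type Backend struct {
	Addr         string
	RttMicros    uint32 // tcpi_rtt
	TotalRetrans uint32 // tcpi_total_retrans
}

// score treats every retransmit as if it cost an extra 10ms of RTT,
// so a "gray failure" node with a good RTT but a climbing retransmit
// counter scores worse than a clean, slightly slower peer.
func score(b Backend) uint64 {
	return uint64(b.RttMicros) + uint64(b.TotalRetrans)*10_000
}

// pickBackend returns the backend with the lowest combined score.
func pickBackend(backends []Backend) Backend {
	best := backends[0]
	for _, b := range backends[1:] {
		if score(b) < score(best) {
			best = b
		}
	}
	return best
}

func main() {
	backends := []Backend{
		{"10.0.0.1:8080", 900, 40}, // low RTT but flaky NIC
		{"10.0.0.2:8080", 1500, 0}, // slightly slower, clean
	}
	fmt.Println(pickBackend(backends).Addr) // prints 10.0.0.2:8080
}
```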
Deciphering the Congestion Window (tcpi_snd_cwnd)
If you've ever wondered why a 10Gbps link is only giving you 100Mbps of throughput, the answer is usually found in tcpi_snd_cwnd.
TCP uses a "Congestion Window" to determine how many unacknowledged packets can be in flight. If the kernel detects loss, it slashes this window to prevent further congestion. By monitoring tcpi_snd_cwnd, your application can detect that the network path is "narrowing" before the actual throughput drops significantly.
For example, if you are building a video streaming service, you could use TCP_INFO to proactively downsample the bitstream if you see the tcpi_snd_cwnd shrinking, rather than waiting for the player's buffer to empty.
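The link between the window and throughput is the classic back-of-envelope bound: at most cwnd packets of snd_mss bytes can be in flight per round trip. A quick sketch of that arithmetic (the input values are illustrative):

```go
package main

import "fmt"

// cwndThroughputBps estimates the ceiling TCP can push on a path:
// cwnd packets of sndMss bytes per RTT, converted to bits per second.
// This is the classic cwnd*MSS/RTT approximation, not an exact model.
func cwndThroughputBps(sndCwnd, sndMss, rttMicros uint32) float64 {
	bytesPerRtt := float64(sndCwnd) * float64(sndMss)
	rttSeconds := float64(rttMicros) / 1e6
	return bytesPerRtt * 8 / rttSeconds
}

func main() {
	// A window of 10 packets, 1448-byte MSS, 40ms RTT caps you at ~2.9 Mbps
	// regardless of how fat the physical link is.
	bps := cwndThroughputBps(10, 1448, 40_000)
	fmt.Printf("ceiling: %.1f Mbps\n", bps/1e6) // prints "ceiling: 2.9 Mbps"
}
```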
The Catch: It's Not Universal
Before you go and rewrite your entire monitoring stack, there are a few "gotchas" that I've learned the hard way:
1. Platform Locked: TCP_INFO is a Linux-ism. While BSD (and macOS) have similar concepts (like TCP_CONNECTION_INFO), the struct members and units of measurement are different. If you’re writing cross-platform code, you’ll need a lot of #ifdef or build tags.
2. Snapshot in Time: getsockopt provides a snapshot. If a burst of retransmissions happens and then stops, you might miss it if you don't poll frequently enough. However, fields like tcpi_total_retrans are cumulative, which helps mitigate this.
3. Kernel Versions: Over the years, the tcp_info struct has grown. If your code is compiled against a modern kernel header but runs on an ancient 3.x kernel, you might get truncated data or errors. Always check the size of the returned struct.
4. The "Smoothed" RTT: tcpi_rtt is an average. A single massive spike might be averaged out. If you need to see every single jitter peak, you need to look at eBPF (but that's a whole other blog post).
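For the struct-size gotcha, the check is cheap: getsockopt writes the number of bytes it actually filled back into the length argument, so compare that to the size your local definition expects. A hedged sketch (the byte counts are illustrative; in the Go example above, the expected size would come from unsafe.Sizeof on the TCPInfo struct):

```go
package main

import "fmt"

// guardTCPInfoLen rejects a tcp_info snapshot that an older kernel
// returned truncated: "returned" is the length getsockopt wrote back,
// "want" is the size of the struct definition we compiled against.
func guardTCPInfoLen(returned, want uint32) error {
	if returned < want {
		return fmt.Errorf("tcp_info truncated: kernel wrote %d bytes, local struct expects %d", returned, want)
	}
	return nil
}

func main() {
	fmt.Println(guardTCPInfoLen(104, 104)) // full struct on a modern kernel: <nil>
	fmt.Println(guardTCPInfoLen(88, 104))  // older kernel: truncation error
}
```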
Moving Beyond Blind Monitoring
We spend so much time instrumenting our code with OpenTelemetry and Prometheus, yet we treat the network as a black box that just works until it doesn't.
Querying TCP_INFO isn't something you need for every simple CRUD app. But if you're building high-performance systems, real-time data pipelines, or distributed databases, stop guessing. Stop using Stopwatch.Start(). Ask the kernel; it's already done the math for you.
When the network RTT is a lie, the tcp_info struct is the only way to find the truth.


