
Your Service Mesh is a Double-Hop Tax: The Engineering Case for Sidecarless Networking

Examining the architectural shift from per-pod sidecars to node-level eBPF networking, and how it addresses the persistent memory and latency penalties of modern microservices.


I remember staring at a Datadog dashboard at 3 AM, tracking a p99 latency spike that shouldn't have existed. The application logs showed sub-10ms response times, yet the end-user was seeing 400ms of lag. We eventually found the culprit: a misconfigured Envoy sidecar that was fighting with a legacy iptables rule, trapped in a loop of context switches.

For years, the sidecar pattern—deploying a proxy container like Envoy alongside every application container—has been the gold standard for service meshes like Istio and Linkerd. It gave us mTLS, observability, and fine-grained traffic control without changing a line of application code. But as our clusters scaled from dozens to thousands of pods, the "sidecar tax" became a line item we could no longer ignore.

The industry is now shifting toward "sidecarless" networking, leveraging eBPF (Extended Berkeley Packet Filter) to move mesh logic from the pod into the kernel. It’s not just a trend; it’s an architectural correction.

The Geometry of the Double-Hop

To understand why we're moving away from sidecars, you have to look at the path a single packet takes. In a traditional sidecar mesh, a request doesn't just go from Service A to Service B. It undergoes a grueling multi-stage transit.

1. App A sends a packet to the loopback interface.
2. The Kernel intercepts it via iptables or IPVS and redirects it to Sidecar A (User Space).
3. Sidecar A processes the packet (mTLS, retries, etc.) and sends it back to the Kernel.
4. The Kernel sends it across the wire to the destination node.
5. The Kernel on Node B intercepts the incoming packet and redirects it to Sidecar B (User Space).
6. Sidecar B terminates the TLS, checks permissions, and sends it back to the Kernel.
7. The Kernel finally delivers it to App B.

Each transition between "Kernel Space" and "User Space" is a context switch. Each hop adds microseconds. When you have a deep microservice call chain (Service A -> B -> C -> D), these "micro-delays" compound into a massive tax on your tail latency.
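To see how this compounds, here's a toy calculation. The per-traversal cost is an illustrative assumption, not a measurement:

```shell
# Toy model of sidecar overhead on a serial call chain.
# per_proxy_ms is an assumed cost per user-space proxy traversal.
awk 'BEGIN {
  per_proxy_ms = 2               # assumed: one sidecar traversal (context switches + parsing)
  serial_calls = 3               # A -> B -> C -> D is three serial calls
  traversals = 2 * serial_calls  # each call crosses a sidecar on both ends
  printf "added tail latency: ~%d ms\n", traversals * per_proxy_ms
}'
```

Swap in your own measured per-traversal cost; the point is the multiplier, not the absolute number.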

The Memory Leak in Your Architecture

Beyond latency, there’s the resource cost. If you allocate 50MB of RAM and 0.1 CPU cores to an Envoy sidecar, that seems negligible. But if you're running 500 microservices with 3 replicas each, you are suddenly spending 75GB of RAM and 150 cores just to run your network.
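The math is easy to sanity-check. A quick back-of-envelope using the numbers above (50MB and 0.1 cores per sidecar are assumptions; tune them to your own Envoy requests and limits):

```shell
# Back-of-envelope sidecar overhead; per-sidecar numbers are assumptions
SERVICES=500
REPLICAS=3
MB_PER_SIDECAR=50
PODS=$((SERVICES * REPLICAS))
echo "sidecar pods: $PODS"                                 # 1500
echo "sidecar RAM:  $((PODS * MB_PER_SIDECAR / 1000)) GB"  # 75 GB
awk -v pods="$PODS" 'BEGIN { printf "sidecar CPU:  %g cores\n", pods * 0.1 }'  # 150 cores
```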

In a "bin-packing" scenario where you’re trying to maximize node density, these sidecars are dead weight. They compete with your actual business logic for L3 cache and memory bandwidth.

The eBPF Alternative: Networking at the Speed of the Kernel

eBPF allows us to run sandboxed programs inside the Linux kernel without changing kernel source code or loading modules. For networking, this means we can intercept traffic right at the socket layer—before it ever enters the full TCP/IP stack—and route it directly to its destination.

Instead of a "Double-Hop," a sidecarless architecture using something like Cilium looks like this:

1. App A sends a packet.
2. An eBPF program at the socket layer recognizes the destination.
3. If it’s on the same node, it shuffles the data directly to App B’s socket.
4. If it’s off-node, it handles the encryption and routing directly.

No iptables redirects. No jumping in and out of user-space proxies for every single packet.
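You can verify what the datapath is actually doing on a Cilium cluster. This sketch assumes the `cilium` CLI is installed and your kubeconfig points at the cluster; exact output varies by version:

```shell
# Overall agent health, plus whether kube-proxy replacement and
# socket-level load balancing are active
cilium status

# The agent's view of its datapath configuration
cilium config view | grep -i bpf
```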

Code Example: Bypassing the Stack with Cilium

While you rarely write the eBPF yourself (tools like Cilium do it for you), it's helpful to see how we define policies that act at the kernel level rather than through a proxy.

Here is a CiliumNetworkPolicy that enforces Layer 7 visibility and security without a sidecar:

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "secure-api-access"
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: my-api-service
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/v1/public/.*"

In an Istio sidecar world, a per-pod Envoy proxy would have to parse this HTTP request. With Cilium, the kernel enforces the L3/L4 rules directly, and only flows that need HTTP inspection are redirected to a shared node-level proxy. Either way, the traffic is vetted before it ever hits the application’s network buffer—and there is no per-pod sidecar.
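If Hubble (Cilium's observability layer) is enabled, you can watch the policy verdicts for this service live. This assumes the `hubble` CLI and a reachable Hubble relay; flag behavior may differ across versions:

```shell
# HTTP-level flows reaching my-api-service, with policy verdicts
hubble observe --namespace production --to-pod my-api-service --protocol http

# Only the traffic the policy rejected
hubble observe --namespace production --verdict DROPPED
```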

The "Ambient" Compromise: Istio's New Direction

It would be unfair to say sidecars are dead. They are excellent at complex Layer 7 logic—things like header manipulation, sophisticated retries, and WAF-like features. eBPF is great at Layer 3/4 (IPs and ports) but is poorly suited to heavy Layer 7 processing: the kernel's verifier constraints make full HTTP parsing impractical to do entirely in eBPF.

This led to the creation of Istio Ambient Mesh. Instead of a sidecar in every pod, it splits the work into two layers:

1. ztunnel (Zero Trust Tunnel): A node-level component (running as a DaemonSet) that handles mTLS and L4 telemetry using eBPF or a lightweight proxy.
2. Waypoint Proxy: A dedicated, per-namespace Envoy instance that handles the heavy L7 lifting only when needed.
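Trying Ambient out is deliberately low-friction. A sketch, assuming `istioctl` is installed and you're on a recent Istio release:

```shell
# Install Istio with the ambient profile (no sidecar injection webhook)
istioctl install --set profile=ambient -y

# Opt a namespace into ambient mode; ztunnel starts capturing its traffic
kubectl label namespace default istio.io/dataplane-mode=ambient
```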

Comparing the Architectures

| Feature | Sidecar Mesh (Envoy) | Sidecarless (eBPF/Cilium) | Ambient Mesh |
| :--- | :--- | :--- | :--- |
| Complexity | High (Injectors, Init Containers) | Low (Kernel Level) | Moderate |
| Latency | Highest (Double-Hop) | Lowest | Medium |
| Resource Usage | High (n * pods) | Low (per Node) | Variable |
| L7 Capabilities | Excellent | Limited | Excellent |

Practical Implementation: From Sidecars to Sidecarless

If you are currently running Istio or Linkerd and want to move toward a sidecarless model, you don't necessarily have to rip everything out. The transition usually starts with the CNI (Container Network Interface).

1. Migrating the Data Plane

If you’re using Cilium, you can enable "kube-proxy replacement." This removes the massive iptables chains that kube-proxy maintains for service routing.
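If Cilium was installed via Helm, this is a values change. A sketch, assuming the release is named `cilium` in `kube-system`; on Cilium versions before 1.14 the value was `kubeProxyReplacement=strict`, and `API_SERVER_IP`/`API_SERVER_PORT` are placeholders for your control-plane endpoint, which Cilium needs once kube-proxy is gone:

```shell
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=${API_SERVER_IP} \
  --set k8sServicePort=${API_SERVER_PORT}
```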

Check your current iptables bloat with:

# Warning: This might be a long output on a busy node
iptables -t nat -L -n | wc -l

In many production clusters, this number can be in the thousands. Replacing this with eBPF hash maps makes lookups O(1) regardless of how many services you have.
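You can inspect the eBPF side of that comparison too. Assuming a Cilium install (the in-agent debug binary is `cilium-dbg` on recent releases, plain `cilium` on older ones):

```shell
# The eBPF service/backend map that replaces those iptables chains
kubectl -n kube-system exec ds/cilium -- cilium-dbg bpf lb list

# Or, on a node with root access, list the pinned maps directly
bpftool map show | grep -i cilium
```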

2. Enabling mTLS without the Sidecar

One of the main reasons people use sidecars is for "free" mTLS. With Cilium and eBPF, you can implement transparent encryption using IPsec or WireGuard at the node level.

To enable WireGuard encryption in Cilium:

helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set encryption.enabled=true \
  --set encryption.type=wireguard

Once enabled, the kernel handles the encryption of all traffic between nodes. The pods don't even know it's happening, and you don't need an Envoy sidecar to "wrap" the traffic in TLS.
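Trust, but verify. Assuming a Cilium DaemonSet named `cilium` in `kube-system` (again, `cilium-dbg` vs `cilium` depends on the release):

```shell
# The agent reports its encryption mode in its status output
kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep -i encryption

# With WireGuard enabled, each node carries a cilium_wg0 interface
kubectl -n kube-system exec ds/cilium -- ip link show cilium_wg0
```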

The Catch: Where eBPF Hits a Wall

I’m an advocate for sidecarless, but it isn’t a silver bullet. There are "gotchas" that can bite you if you move too fast.

1. The Kernel Dependency:
Sidecars are portable. They run on almost any Linux distro because they live in user space. eBPF is deeply tied to the kernel version. If you’re running an old version of RHEL or an outdated Amazon Linux AMI, you might not have the helper functions required for advanced sidecarless features.

2. Observability Gaps:
While eBPF gives you great metrics on packet flow, it’s harder to get "Distributed Tracing" headers (like B3 or W3C Trace Context) injected into your application traffic without a proxy. If your organization relies heavily on Jaeger or Honeycomb for deep spans, you might still need a Waypoint proxy or some application-level instrumentation.

3. The "Shared Responsibility" Problem:
In a sidecar model, if a proxy crashes, it only takes down that one pod. If a node-level eBPF agent or a shared ztunnel has a bug, it can impact every single pod on that node. The blast radius is inherently larger.

Example: Monitoring the "Tax" Yourself

If you want to see the difference in your own environment, you can use a tool like fortio to measure the latency overhead.

Run a test against a service without a sidecar:

kubectl run load-test --image=fortio/fortio -- load -qps 500 -t 60s -quiet http://my-service:8080

Then, inject the sidecar and run it again:

# If using Istio
kubectl label namespace default istio-injection=enabled
kubectl rollout restart deployment my-service
# The first load-test pod still exists; remove it before re-running
kubectl delete pod load-test
kubectl run load-test --image=fortio/fortio -- load -qps 500 -t 60s -quiet http://my-service:8080

Watch the p95 and p99 results. In high-concurrency environments, you’ll often see a 2ms to 10ms delta. That might not sound like much, but in a microservice graph with 10 serial calls, you've just added 100ms to your user's wait time.

Why the "Double-Hop" is Becoming Untenable

As we move toward "Platform Engineering," the goal is to make the infrastructure invisible. The sidecar is the opposite of invisible—it's an intrusive, resource-hungry guest that requires custom configuration, security patching, and lifecycle management.

We're seeing a bifurcation in the market:
1. The Core Mesh: Moving into the kernel via eBPF. This handles identity, encryption, and basic routing.
2. The App Mesh: Moving into shared, per-node or per-namespace proxies (like Waypoints) for complex business logic.

The "Double-Hop Tax" was a necessary evil during the early days of Kubernetes because the kernel wasn't ready to handle the complexity of service discovery and mTLS. That's no longer the case.

Final Thoughts: Should You Switch?

Don't migrate just because it's the new shiny thing. If your cluster is small and your latency requirements aren't strict, the sidecar model is well-understood and easy to debug with standard tools like tcpdump and curl.

However, if you are:
- Spending more than 10% of your cluster resources on sidecars.
- Struggling with p99 latencies that don't match your app logs.
- Fighting with iptables deadlocks or complex routing rules.

Then it's time to look at the kernel. The engineering case for sidecarless networking isn't just about saving RAM; it's about simplifying the network stack to be what it was always meant to be: a transparent utility, not a tax.

The most efficient way to process a packet is to process it once. Moving that logic into the kernel via eBPF finally makes that possible. The sidecar was a brilliant bridge to get us here, but the bridge is starting to feel like a bottleneck. It’s time to take the direct route.