loke.dev

The End of Deserialization

Transforming data from the network into your heap is a hidden bottleneck you can no longer afford in high-throughput systems.

· 7 min read

I remember sitting in a windowless server room three years ago, staring at a flame graph that made no sense. We were running a high-frequency telemetry service, and despite having 32-core machines and 100Gbps NICs, our throughput was embarrassing. The culprit wasn't the database or the business logic; it was a single function call, json.Unmarshal, consuming 70% of our CPU cycles just to turn bytes into something the language could understand.

That was the moment I realized we've been building systems on a lie. We treat deserialization as a "necessary cost" of networking, but in high-throughput environments, it’s a tax that’s bankrupting our performance.

The Tax We Forgot We Were Paying

When you send a piece of data (say, a JSON object) over the wire, you're sending a stream of UTF-8 text. When it arrives at your application, your CPU has to:

1. Scan the entire buffer to find keys and values.
2. Validate that the syntax is correct.
3. Allocate new memory on the heap for every string, map, and nested object.
4. Convert types (e.g., string "123.45" to a float64).
5. Copy the data from the network buffer into those new heap locations.

This is the "Deserialization Tax." In a world of 1Gbps connections, we could afford it. But as we move toward 100Gbps and 400Gbps networking, the CPU can no longer keep up with the sheer volume of "parsing." The network is now faster than your ability to call new Object().

The Lie of the "Object"

We love objects because they’re easy to work with. We like user.ID and order.Items[0]. But the CPU doesn't care about your object hierarchy. It cares about memory addresses and cache lines.

Traditional deserialization takes a nice, contiguous block of bytes from the network card and explodes it into a thousand tiny fragments scattered across your RAM. This "pointer chasing" is a performance killer. Every time you follow a pointer to a nested object, you risk a cache miss, which is orders of magnitude slower than reading sequential memory.

Zero-Copy: The Philosophy of Laziness

The "End of Deserialization" isn't about finding a faster parser; it's about not parsing at all. This is known as Zero-Copy Deserialization.

The idea is simple: the format of the data on the wire should be exactly the same as the format of the data in memory. If I want to read the 5th element of an array, I shouldn't have to parse the first four. I should just calculate an offset—base_address + (index * element_size)—and read it.

Practical Example: The JSON Way (The Slow Way)

In Go, a standard JSON approach looks like this:

type SensorData struct {
    ID       int       `json:"id"`
    Readings []float64 `json:"readings"`
}

// Every time this runs, memory is allocated and strings are parsed.
func handleRequest(payload []byte) {
    var data SensorData
    err := json.Unmarshal(payload, &data) // The Bottleneck
    if err != nil {
        return
    }
    fmt.Println(data.Readings[0])
}

Every call to json.Unmarshal allocates on the heap, and those allocations are exactly what the garbage collector (GC) has to clean up later. If you're doing this 100,000 times a second, your GC will be working harder than your application.

Practical Example: The FlatBuffers Way (The Zero-Copy Way)

Tools like FlatBuffers or Cap'n Proto change the game. Instead of parsing, you map the byte buffer directly.

Here is a simplified schema (sensor.fbs):

table SensorData {
  id:int;
  readings:[double];
}
root_type SensorData;

And the Go code to read it:

import (
    "fmt"

    "mygeneratedcode/schema" // generated by flatc; depends on github.com/google/flatbuffers/go
)

func handleRequest(payload []byte) {
    // No Unmarshal! We just wrap the existing byte slice.
    data := schema.GetRootAsSensorData(payload, 0)

    // This access is just an offset calculation into the buffer.
    // No new memory is allocated.
    val := data.Readings(0)
    fmt.Println(val)
}

In this example, GetRootAsSensorData doesn't copy data. It just says, "Okay, I know where the fields are located in this byte array." When you call Readings(0), it looks at the offset defined in the buffer and reads the bits directly. No heap memory is allocated beyond the byte slice you already had.

The Mechanical Sympathy of Memory Mapping

To truly eliminate the bottleneck, we have to talk about mmap.

In high-throughput systems, you often need to read massive datasets that don't fit in RAM, or you need to share data between processes without the overhead of IPC (Inter-Process Communication).

By using a zero-copy format like FlatBuffers or SBE (Simple Binary Encoding), you can mmap a file directly into your process's address space. The OS handles loading the pages of the file into memory as you access them. Because the data on disk is already in the "in-memory" format, you don't "load" the file. You just start reading it.

// C++ Example: Accessing a memory-mapped FlatBuffer
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>

int fd = open("sensor_data.bin", O_RDONLY);
struct stat st;
fstat(fd, &st);

// Map the file into memory (read-only)
char* addr = (char*)mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

// Use the data directly from the page cache
auto sensor_data = GetSensorData(addr);
double val = sensor_data->readings()->Get(0);

// No "parsing" happened. The CPU just read the bits.

Why isn't everyone doing this?

If zero-copy is so much faster, why are we still using JSON and Protobuf? (Note: Protobuf is a hybrid: it's binary, but it still requires a full "unpacking" step; features like arena allocation reduce the allocation cost without removing the parse itself.)

1. Ergonomics: Zero-copy libraries are harder to use. You often have to use a code generator, and you can't just console.log a buffer to see what's inside.
2. Schema Rigidity: JSON is "schemaless" (sort of). You can add a field without breaking old clients. Zero-copy formats require strict schemas and careful versioning to maintain offset compatibility.
3. The "Good Enough" Trap: For most web apps, JSON is fine. If you're only handling 100 requests per second, the 2ms spent parsing doesn't matter. But we aren't talking about "most web apps." We're talking about systems where microseconds are the difference between a profit and a loss, or a stable cluster and a cascading failure.

The Protobuf Problem

People often group Protobuf with zero-copy because it’s binary. This is a mistake. Standard Protobuf uses variable-length encoding (varints). To find the second field in a message, the parser has to read the first field to know how many bytes long it is. You must parse a Protobuf message to use it.

In contrast, zero-copy formats use fixed offsets or a "vtable" at the start of the message. This allows for random access.

| Feature | JSON | Protobuf | FlatBuffers / SBE |
| :--- | :--- | :--- | :--- |
| Human Readable | Yes | No | No |
| Allocation Free | No | No | Yes |
| Random Access | No | No | Yes |
| Parsing Required | High | Medium | None |

Where the Bottleneck Shifts

When you eliminate deserialization, you'll find the bottleneck moves to the next logical place: The Kernel.

If you're using standard read() or recv() calls, the data is copied from the NIC, into kernel space, and then into user space. To go "Full Zero-Copy," you'll eventually look at things like DPDK (Data Plane Development Kit) or io_uring in Linux, which allow the application to pull data directly from the network card's buffers.

Designing for the Future

If you are building a system today that expects to handle high-throughput telemetry, financial data, or real-time gaming, you should assume that parsing is a bug.

Here’s how to transition:

1. Identify the Hot Path: Use a profiler (like pprof in Go or perf in Linux). If your top functions are memcpy, malloc, or string parsing, you are a candidate for zero-copy.
2. Schema First: Stop treating your data as "blobs." Define it. Even if you stay with JSON, using a schema helps. If you move to FlatBuffers or Cap'n Proto, the schema is your source of truth.
3. Lazy Evaluation: Only parse what you need. If you have a large object but only need one field, don't deserialize the whole thing.

The Verdict

The era of "string-based" everything is ending for infrastructure. We've optimized our databases, our compilers, and our networks, but we’re still throwing away half our CPU power because we like the convenience of strings.

The "End of Deserialization" is really just the beginning of Mechanical Sympathy. It’s about building software that respects how the hardware actually works. It's not the easiest path—your code will be more complex, and your debugging sessions will involve more hex dumps—but the performance gains are no longer optional. They are the new baseline.