loke.dev
Header image for 3 Mathematical Guarantees You Need to Understand Before Building a CRDT

3 Mathematical Guarantees You Need to Understand Before Building a CRDT

A deep dive into how commutativity, associativity, and idempotency solve the hardest problems in distributed state without a central authority.

· 7 min read

I remember sitting in a windowless office in 2016, staring at a database that had somehow decided my user's name was both "Alice" and "Bob" at the same exact time. We were trying to build a collaborative editor, and our "sync logic" was essentially a giant pile of if statements and prayers, which—as it turns out—is not a valid strategy for distributed systems.

If you are moving away from a single-source-of-truth architecture toward something decentralized, offline-first, or globally distributed, you will eventually bump into CRDTs (Conflict-free Replicated Data Types). They feel like magic when you first see them: two users edit the same document offline, reconnect, and their changes merge perfectly without a central server deciding who "won."

But CRDTs aren't magic; they are just data structures that obey a very specific set of algebraic rules. If your data structure doesn't satisfy three specific mathematical properties—Commutativity, Associativity, and Idempotency—it isn't a CRDT. It’s just a bug waiting to happen.

Let's break down why these three properties are the only things standing between you and total data corruption.

The Foundation: The Join-Semilattice

Before we hit the "Big Three," we need to talk about the Lattice. In the world of CRDTs (specifically state-based ones, often called CvRDTs), we treat our data as a "join-semilattice."

Think of it as a directed graph where any two points have a "least upper bound"—a shared point they can both move toward that represents the union of their information. When we "merge" two states, we are effectively moving up the lattice to that shared point.

For this merge operation to work reliably across a messy network, the merge function (let's call it merge(a, b)) must behave predictably. That’s where the math comes in.

---

1. Idempotency: The "I Already Said That" Guarantee

In a distributed system, "exactly-once" delivery is a myth. Packets get dropped, retried, and duplicated. If your merge logic isn't idempotent, a flaky Wi-Fi connection will eventually corrupt your state.

The Math: $f(x, x) = x$

Idempotency means that applying the same operation multiple times has the same effect as applying it once. If I receive the same state update from a peer five times, the result of my local state should be identical to receiving it once.

Why this breaks naive counters

Imagine you’re building a simple "Like" counter. A naive implementation might look like this:

// WRONG: This is not idempotent
let likes = 0;

function handleUpdate(increment) {
    likes += increment;
}

// If the network retries the "increment by 1" packet:
handleUpdate(1);
handleUpdate(1); // Error! likes is now 2, but should be 1.

To make this idempotent, we can't just send increments. We have to send a state that represents the "maximum" information we have. A common way to do this in CRDTs is using a G-Counter (Grow-only Counter), where we track increments per node ID.

// BETTER: Idempotent state merge
class GCounter {
  constructor(id) {
    this.id = id;
    this.state = {}; // Mapping of nodeID -> count
  }

  increment() {
    this.state[this.id] = (this.state[this.id] || 0) + 1;
  }

  merge(other) {
    const allKeys = new Set([...Object.keys(this.state), ...Object.keys(other.state)]);
    for (const key of allKeys) {
      // Idempotency: taking the max ensures multiple merges don't "double count"
      this.state[key] = Math.max(this.state[key] || 0, other.state[key] || 0);
    }
  }
}

const myCounter = new GCounter('node-a');
myCounter.increment(); // { 'node-a': 1 }

const incomingState = { 'node-a': 1 };
myCounter.merge(incomingState);
myCounter.merge(incomingState); // Still { 'node-a': 1 }. Safe!

By using Math.max, we ensure that no matter how many times we receive the same state, we never double-count the same increment.

---

2. Commutativity: Order Doesn't Matter

The internet is not a line; it’s a web. If Alice sends update A and Bob sends update B, some nodes will see A then B, while others will see B then A.

The Math: $f(a, b) = f(b, a)$

Commutativity guarantees that the order in which we receive updates doesn't change the final outcome. This is the hardest property to maintain when dealing with operations that feel inherently sequential, like text editing.

The trap of "Last Write Wins" (LWW)

A lot of developers try to achieve commutativity by using timestamps. While technically a "Last Write Wins" register is a CRDT, it's a dangerous one because it relies on wall-clock synchronization, which is notoriously unreliable in distributed systems.

class LWWRegister {
  constructor(value, timestamp) {
    this.value = value;
    this.timestamp = timestamp;
  }

  merge(other) {
    // Commutative: result is the same regardless of who is 'this' and who is 'other'
    if (other.timestamp > this.timestamp) {
      this.value = other.value;
      this.timestamp = other.timestamp;
    }
  }
}

This is commutative because merge(A, B) results in the same state as merge(B, A). However, the "edge case" here is clock skew. If Node A’s clock is 5 minutes behind Node B, Node A’s updates might be permanently ignored, even if they happened "later" in real human time.

When building more complex types (like lists), commutativity usually requires giving every element a unique, sortable identifier that doesn't change.

---

3. Associativity: Grouping Doesn't Matter

Associativity is often the "forgotten" property, but it's what allows CRDTs to scale to complex network topologies (like mesh networks or star-topologies with intermediate relays).

The Math: $f(f(a, b), c) = f(a, f(b, c))$

In plain English: it doesn't matter if I merge Alice and Bob's work first, then merge the result with Charlie—or if I merge Bob and Charlie first and then bring Alice in. The final result is identical.

Why you need this for Batching

If you’re building a system where a central "hub" collects updates from various edge nodes before pushing them to a backup server, you are relying on associativity.

Imagine a sync scenario:
1. Phone syncs with Laptop.
2. Laptop syncs with Cloud.
3. Phone eventually syncs with Cloud.

Without associativity, the Cloud’s final state would depend on whether it heard from the Laptop or the Phone first.

Practical Example: A Set-Based CRDT (OR-Set)

Let's look at an Observed-Remove Set. This is a structure where we can add and remove items. To make it work, we track "additions" and "removals" (tombstones) separately.

class ORSet {
  constructor() {
    this.items = new Map(); // element -> Set of unique IDs (tags)
    this.tombstones = new Set(); // Set of unique IDs that were deleted
  }

  add(element) {
    const tag = Math.random().toString(36).substring(2);
    if (!this.items.has(element)) this.items.set(element, new Set());
    this.items.get(element).add(tag);
  }

  remove(element) {
    const tags = this.items.get(element);
    if (tags) {
      for (let tag of tags) {
        this.tombstones.add(tag);
      }
    }
  }

  merge(other) {
    // Merge additions
    for (const [element, tags] of other.items) {
      if (!this.items.has(element)) this.items.set(element, new Set());
      for (let tag of tags) {
        this.items.get(element).add(tag);
      }
    }
    // Merge tombstones
    for (const tag of other.tombstones) {
      this.tombstones.add(tag);
    }
  }

  get value() {
    const result = new Set();
    for (const [element, tags] of this.items) {
      // An item is only present if it has tags NOT in the tombstone set
      for (let tag of tags) {
        if (!this.tombstones.has(tag)) {
          result.add(element);
          break;
        }
      }
    }
    return result;
  }
}

This merge function is Associative. If you have three different sets (A, B, and C) and you merge them in any grouping, the logic of combining items and tombstones using Set unions remains stable.

---

The "Gotcha": The Cost of Deletion (Tombstones)

You might have noticed something in the ORSet code above: the tombstones set only ever grows.

This is the hidden tax of CRDTs. To maintain Idempotency and Commutativity, you can't simply "delete" data. If you delete a record and a delayed packet arrives later saying "Hey, update this record!", how do you know the record was deleted on purpose rather than just not existing yet?

You don't. So, you keep a tombstone—a marker that says "this ID used to exist, but it's gone now."

If you are building a production CRDT, you must have a strategy for Garbage Collection. This usually involves:
1. Vector Clocks: Tracking which nodes have seen which updates.
2. Pruning: Deleting tombstones once you are 100% certain every node in the cluster has acknowledged the deletion.

Without this, your CRDT state will grow linearly until your application crashes.

Why Bother With All This?

It sounds like a lot of work. Why not just use a database with transactions?

The reason is the CAP Theorem. You cannot have Consistency, Availability, and Partition Tolerance simultaneously. Standard databases favor Consistency. But if you’re building a mobile app that needs to work in a subway tunnel, you are choosing Availability and Partition Tolerance.

When you choose AP, you give up "Strong Consistency" (where everyone sees the same thing at the exact same time) for Strong Eventual Consistency.

Strong Eventual Consistency is the promise that:
* As long as all nodes eventually receive the same updates (in any order)...
* They will all arrive at the exact same state...
* Without any manual conflict resolution.

Those three mathematical properties—Commutativity, Associativity, and Idempotency—are what make that promise possible.

Putting it Together

If you’re about to write your first merge() function, run it through this mental checklist:

1. Check Idempotency: If I call merge(A, A), does A change? (It shouldn't).
2. Check Commutativity: Does merge(A, B) result in the same state as merge(B, A)?
3. Check Associativity: Does merge(merge(A, B), C) result in the same state as merge(A, merge(B, C))?

If you can say "yes" to all three, you aren't just writing code; you're building a resilient distributed system. If not, you’re just building a very complicated way to corrupt a database.

Start small. Start with a GCounter or a Max-Register. Once you feel the power of a merge that "just works" regardless of network chaos, you'll never want to go back to manual if (conflict) { ... } logic again.