loke.dev

The Night My UI Traveled Back in Time: How I Finally Mastered Causal Consistency for Local-First Apps

Why simple timestamps are a recipe for data corruption in collaborative apps and how I traded them for the mathematical certainty of version vectors.

Imagine two users, Alice and Bob, collaborating on a shared notepad in a local-first application. Alice writes "The weather is," and a second later, Bob—seeing Alice's text—appends "sunny." In a perfectly synchronized world, the state is "The weather is sunny." But in the chaotic reality of distributed systems, Alice’s laptop goes through a tunnel, her "The weather is" packet is delayed, and Bob’s "sunny" reaches the server or other peers first. If your system relies on simple system timestamps to order these events, Bob’s update might be discarded because its "parent" state doesn't exist yet, or worse, Alice’s delayed packet might overwrite Bob’s later contribution because Alice’s system clock happened to be three seconds fast.

This isn't just a sync lag issue; it’s a fundamental violation of causal consistency. When we build local-first apps, we aren't just syncing data; we are synchronizing human intent. If Bob's action was caused by seeing Alice's action, that relationship must be preserved regardless of when the packets physically arrive at their destination.

The Fatal Flaw of Wall-Clock Timestamps

Most developers start with Date.now() or new Date().toISOString(). It feels intuitive. However, in a distributed environment, the system clock is a liar. Between clock drift, NTP corrections, leap seconds, and users manually changing their device time, you cannot guarantee that t1 < t2 implies that event 1 happened before event 2.

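Even if clocks were perfectly synchronized, network latency is unpredictable, so updates can arrive in a different order from the one in which they were made, and a timestamp says nothing about what its author had actually seen. Look at this common "Last Write Wins" (LWW) implementation that fails silently: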

type Register<T> = {
  value: T;
  timestamp: number;
};

function merge<T>(local: Register<T>, remote: Register<T>): Register<T> {
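  // The classic LWW trap: it trusts wall clocks, and on a timestamp tie
  // each replica keeps its own value, so replicas can disagree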
  if (remote.timestamp > local.timestamp) {
    return remote;
  }
  return local;
}

// Scenario:
// Alice (clock reads 1002) sets value to "A"
// Bob (clock reads 1000) sets value to "B" after seeing "A"
// Because Alice's clock is fast, Bob's "B" (the intentional successor)
// loses the comparison on every replica, whatever order the writes arrive in.

In this snippet, Bob's intent is ignored because his hardware clock is lagging behind Alice's. This is how "time travel" bugs happen: the UI reflects a state that shouldn't exist yet or reverts to an older state because a "newer" timestamp arrived from a "faster" clock.
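To make the failure concrete, here is a minimal sketch that replays the scenario. The names LwwRegister and lwwMerge are stand-ins I am assuming for the register and merge function above, and the timestamps are the illustrative clock readings from the comment.

```typescript
type LwwRegister<T> = { value: T; timestamp: number };

// Same rule as the merge above: the higher wall-clock timestamp wins.
function lwwMerge<T>(local: LwwRegister<T>, remote: LwwRegister<T>): LwwRegister<T> {
  return remote.timestamp > local.timestamp ? remote : local;
}

// Alice writes "A" while her clock runs ahead and reads 1002.
const aliceWrite: LwwRegister<string> = { value: "A", timestamp: 1002 };
// Bob sees "A" and replaces it with "B", but his clock only reads 1000.
const bobWrite: LwwRegister<string> = { value: "B", timestamp: 1000 };

// A replica that received Alice's write first, then Bob's.
const aliceFirst = lwwMerge(aliceWrite, bobWrite);
// A replica that received Bob's write first, then Alice's.
const bobFirst = lwwMerge(bobWrite, aliceWrite);

console.log(aliceFirst.value, bobFirst.value); // both replicas keep "A"
```

Both replicas agree with each other, which is why the bug is silent: they converge on the wrong answer, and Bob's deliberate follow-up edit is gone.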

Logical Clocks: Ordering Without a Watch

To fix this, we have to stop looking at the sun and start looking at the sequence of events. Leslie Lamport introduced the Lamport Clock to solve this. Instead of milliseconds, we use a simple integer.

1. Each process maintains a counter initialized to 0.
2. Increment the counter before performing an operation.
3. When sending a message, include the counter.
4. When receiving a message, set your local counter to max(local_counter, message_counter) + 1.

class LamportClock {
  private counter: number = 0;

  tick() {
    this.counter++;
    return this.counter;
  }

  witness(remoteValue: number) {
    this.counter = Math.max(this.counter, remoteValue) + 1;
  }

  get time() {
    return this.counter;
  }
}

Lamport clocks give us an ordering that respects causality. If event A happened before event B, whether on the same device or because B's device had already received A, then A's clock will be less than B's. If Bob sees Alice's message and then replies, Bob's witness() call ensures his clock is strictly greater than Alice's.

But Lamport clocks have a weakness: if clockA < clockB, we don't actually know if A caused B. They might have happened simultaneously on different devices that hadn't talked to each other yet. To achieve true causal consistency—where we can distinguish between "A happened before B" and "A and B happened concurrently"—we need Version Vectors.
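A small sketch shows both the guarantee and the weakness. LamportCounter is a compact stand-in I am assuming for the class above, and Carol is a hypothetical third user who never talks to anyone.

```typescript
class LamportCounter {
  private counter = 0;

  // Local event: bump the counter and return the new value.
  tick(): number {
    this.counter += 1;
    return this.counter;
  }

  // Message received: jump past whatever the sender had reached.
  witness(remoteValue: number): number {
    this.counter = Math.max(this.counter, remoteValue) + 1;
    return this.counter;
  }
}

const aliceClock = new LamportCounter();
const bobClock = new LamportCounter();
const carolClock = new LamportCounter();

// Causal chain: Alice writes, Bob receives her message, then Bob replies.
const aliceEvent = aliceClock.tick(); // 1
bobClock.witness(aliceEvent);         // Bob's counter becomes 2
const bobReply = bobClock.tick();     // 3

// Carol is offline and makes two local edits of her own.
carolClock.tick();                    // 1
const carolEvent = carolClock.tick(); // 2

// aliceEvent < bobReply: guaranteed, because Bob's reply depended on Alice.
// aliceEvent < carolEvent: also true, yet Carol never saw Alice's write.
console.log(aliceEvent < bobReply, aliceEvent < carolEvent);
```

The two comparisons look identical from the outside, which is exactly the problem: a smaller counter is necessary for "A caused B" but does not prove it.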

Mastering Version Vectors

A Version Vector is essentially a map of every "node" (device or user) in the system and the latest sequence number seen from that node. It represents a "frontier" of knowledge.

If Alice has a vector {Alice: 5, Bob: 2}, she is saying: "I have seen 5 operations from myself and 2 from Bob."

If she receives a message from Bob with the vector {Alice: 5, Bob: 3}, she knows this message is the *immediate next step* after what she already knows. If she receives {Alice: 5, Bob: 4}, she knows she missed a message (Bob's 3rd operation) and must buffer the new one until the missing piece arrives.
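The check Alice performs can be sketched as a small function. This assumes the message carries the sender's vector as it stood after counting the new operation (as in the example above); classifyDelivery and its return labels are names I made up for illustration.

```typescript
type Vec = Record<string, number>;
type Delivery = "deliver" | "buffer" | "duplicate";

// `known` is the receiver's vector; `stamp` is the vector attached to a
// message from `sender`, taken after the sender counted this operation.
function classifyDelivery(known: Vec, stamp: Vec, sender: string): Delivery {
  const seenFromSender = known[sender] ?? 0;
  const stamped = stamp[sender] ?? 0;

  if (stamped <= seenFromSender) return "duplicate"; // already applied
  if (stamped > seenFromSender + 1) return "buffer";  // gap in the sender's sequence

  // It is the sender's next op, but the sender may have seen ops from
  // other nodes that have not reached us yet.
  for (const node of Object.keys(stamp)) {
    if (node !== sender && (stamp[node] ?? 0) > (known[node] ?? 0)) return "buffer";
  }
  return "deliver";
}

const aliceKnows: Vec = { Alice: 5, Bob: 2 };

const nextStep = classifyDelivery(aliceKnows, { Alice: 5, Bob: 3 }, "Bob");            // "deliver"
const missedOne = classifyDelivery(aliceKnows, { Alice: 5, Bob: 4 }, "Bob");           // "buffer"
const seenAlready = classifyDelivery(aliceKnows, { Alice: 5, Bob: 2 }, "Bob");         // "duplicate"
const needsCarol = classifyDelivery(aliceKnows, { Alice: 5, Bob: 3, Carol: 1 }, "Bob"); // "buffer"
```

The last case is the one that is easy to forget: Bob's message is his next in sequence, but it was written after he saw something from Carol that Alice has not received.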

Here is a functional implementation of a Version Vector in TypeScript:

type VectorMap = Record<string, number>;

class VersionVector {
  private versions: VectorMap = {};

  constructor(initial?: VectorMap) {
    this.versions = initial ? { ...initial } : {};
  }

  // Increment the version for a specific node (e.g., on local change)
  increment(nodeId: string): void {
    this.versions[nodeId] = (this.versions[nodeId] || 0) + 1;
  }

  // Update vector after seeing another vector
  merge(remote: VersionVector): void {
    for (const [nodeId, version] of Object.entries(remote.versions)) {
      this.versions[nodeId] = Math.max(this.versions[nodeId] || 0, version);
    }
  }

  // Check the relationship between two vectors
  static compare(a: VersionVector, b: VersionVector): 'BEFORE' | 'AFTER' | 'CONCURRENT' | 'EQUAL' {
    let aHasGreater = false;
    let bHasGreater = false;

    const allKeys = new Set([...Object.keys(a.versions), ...Object.keys(b.versions)]);

    for (const key of allKeys) {
      const va = a.versions[key] || 0;
      const vb = b.versions[key] || 0;

      if (va > vb) aHasGreater = true;
      if (vb > va) bHasGreater = true;
    }

    if (aHasGreater && bHasGreater) return 'CONCURRENT';
    if (aHasGreater) return 'AFTER';
    if (bHasGreater) return 'BEFORE';
    return 'EQUAL';
  }

  getSnapshot(): VectorMap {
    return { ...this.versions };
  }
}
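To get a feel for the four outcomes, here is a standalone sketch of the same comparison over plain maps. relate is a hypothetical helper that mirrors the logic of VersionVector.compare so the example can run on its own.

```typescript
type Relation = "BEFORE" | "AFTER" | "CONCURRENT" | "EQUAL";

// Mirrors VersionVector.compare: a missing entry counts as 0.
function relate(a: Record<string, number>, b: Record<string, number>): Relation {
  let aAhead = false;
  let bAhead = false;

  // Duplicated keys are harmless here, so a plain concatenation is enough.
  for (const key of [...Object.keys(a), ...Object.keys(b)]) {
    const va = a[key] ?? 0;
    const vb = b[key] ?? 0;
    if (va > vb) aAhead = true;
    if (vb > va) bAhead = true;
  }

  if (aAhead && bAhead) return "CONCURRENT";
  if (aAhead) return "AFTER";
  if (bAhead) return "BEFORE";
  return "EQUAL";
}

// Bob's side has one more op from Bob: the left vector is strictly older.
const older = relate({ Alice: 5, Bob: 2 }, { Alice: 5, Bob: 3 }); // "BEFORE"

// Each side has something the other lacks: neither could have caused the other.
const forked = relate({ Alice: 6, Bob: 2 }, { Alice: 5, Bob: 3 }); // "CONCURRENT"

// An explicit 0 and a missing entry mean the same thing.
const same = relate({ Alice: 5 }, { Alice: 5, Bob: 0 }); // "EQUAL"
```

The 'CONCURRENT' result is the piece of information that neither wall-clock timestamps nor Lamport clocks can give you.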

Applying Causality to Local-First Operations

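To make use of this, every operation (or "op") in your system needs to carry the Version Vector of the state it was created upon. This is the "causal context." Note that the context is captured before the op itself is counted, so Bob's third operation carries Bob: 2 in its context (a vector stamped after the increment, as in the Alice example earlier, would read Bob: 3).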

Imagine a document store. Instead of just sending the new text, we send an operation object:

interface Operation {
  id: string;
  nodeId: string;
  type: 'INSERT' | 'DELETE' | 'EDIT';
  data: any;
  // This is the state of the world when the user performed the action
  context: VectorMap; 
}

When Bob receives Alice's operation, he checks the context. If his current local Version Vector matches or exceeds the context provided in the operation, he can safely apply it. If the operation’s context contains a version number higher than what Bob has for a particular node, he knows he’s seeing a "message from the future"—an operation that depends on data he hasn't received yet.

This allows us to build a Causal Buffer.

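class CausalDispatcher {
  private localVector: VersionVector = new VersionVector();
  private buffer: Operation[] = [];

  constructor(private onApply: (op: Operation) => void) {}

  processRemoteOp(op: Operation) {
    // Networks redeliver; applying an op twice would corrupt the vector
    if (this.isDuplicate(op)) return;

    if (this.isReady(op.context)) {
      this.applyOp(op);
      this.checkBuffer();
    } else {
      this.buffer.push(op);
    }
  }

  // The context is the sender's vector *before* the op, so if we have already
  // counted more ops from that node than the context shows, we applied this one
  private isDuplicate(op: Operation): boolean {
    const seen = this.localVector.getSnapshot()[op.nodeId] || 0;
    return (op.context[op.nodeId] || 0) < seen;
  }

  private isReady(context: VectorMap): boolean {
    const local = this.localVector.getSnapshot();

    for (const [nodeId, version] of Object.entries(context)) {
      // We must have at least seen the version specified in the context
      if ((local[nodeId] || 0) < version) {
        return false;
      }
    }
    return true;
  }

  private applyOp(op: Operation) {
    this.onApply(op);
    // After applying, we update our local vector to include this op
    this.localVector.increment(op.nodeId);
  }

  private checkBuffer() {
    // Applying one op can unblock others, so keep scanning until a full
    // pass over the buffer applies nothing
    let progressed = true;
    while (progressed) {
      progressed = false;
      for (let i = 0; i < this.buffer.length; i++) {
        const op = this.buffer[i];
        if (this.isDuplicate(op)) {
          this.buffer.splice(i--, 1);
        } else if (this.isReady(op.context)) {
          this.buffer.splice(i--, 1);
          this.applyOp(op);
          progressed = true;
        }
      }
    }
  }
}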
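The dispatcher only covers the receiving side. On the sending side, the important detail is the order of two steps: snapshot the context first, then count the new op. Here is a minimal sketch of that; LocalAuthor, LocalOp, and the "nodeId:sequence" id format are assumptions of mine for illustration, not part of the code above.

```typescript
type CausalContext = Record<string, number>;

interface LocalOp {
  id: string;
  nodeId: string;
  data: unknown;
  context: CausalContext; // what the author had seen *before* this op
}

class LocalAuthor {
  private nodeId: string;
  private seen: CausalContext = {};

  constructor(nodeId: string) {
    this.nodeId = nodeId;
  }

  createOp(data: unknown): LocalOp {
    // 1. Snapshot first: the context must not include the op being created.
    const context = { ...this.seen };
    // 2. Then count the op in our own entry.
    const seq = (this.seen[this.nodeId] ?? 0) + 1;
    this.seen[this.nodeId] = seq;
    return { id: `${this.nodeId}:${seq}`, nodeId: this.nodeId, data, context };
  }

  // Record that a remote op has been applied locally.
  observe(op: LocalOp): void {
    this.seen[op.nodeId] = (this.seen[op.nodeId] ?? 0) + 1;
  }
}

const alice = new LocalAuthor("alice");
const bob = new LocalAuthor("bob");

const first = alice.createOp("The weather is"); // context: {}
bob.observe(first);                             // Bob has now seen alice: 1
const reply = bob.createOp("sunny");            // context: { alice: 1 }
```

Because reply.context records alice: 1, any peer that receives "sunny" before "The weather is" can tell it has to wait.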

The "Concurrent" Headache

What happens when VersionVector.compare(a, b) returns 'CONCURRENT'?

This is the fork in the road. Causal consistency ensures that if A caused B, everyone sees A before B. But it does *not* tell you how to order two events that happened independently (e.g., Alice and Bob both edited the same sentence while offline).

To handle concurrency without a central server "deciding" who was right, you have two main paths:

1. Conflict-free Replicated Data Types (CRDTs): These are data structures (such as an LWW-Element-Set, or the document types provided by libraries like Automerge and Yjs) whose merge rules are mathematically guaranteed to converge: no matter what order concurrent operations are received in, every node ends up with the same state.
2. Deterministic Tie-breaking: If two operations are concurrent, you use a secondary sort key, like a lexicographical comparison of the node IDs. It’s arbitrary, but as long as it’s consistent across all devices, the UI won't flicker or diverge.

function tieBreaker(opA: Operation, opB: Operation): number {
  const relation = VersionVector.compare(
    new VersionVector(opA.context), 
    new VersionVector(opB.context)
  );

  if (relation === 'AFTER') return 1;
  if (relation === 'BEFORE') return -1;

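  // Contexts are concurrent or equal, so neither op saw the other:
  // use Node ID as a stable tie-breaker.
  // Caveat: this settles one pair at a time. Mixing a partial order with a
  // tie-break is not transitive in general, so don't use it to sort a whole list.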
  if (opA.nodeId > opB.nodeId) return 1;
  if (opA.nodeId < opB.nodeId) return -1;
  
  return 0;
}
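One caveat with a pairwise rule like this: combining a partial order with a node-ID tie-break can produce cycles (A before B, B before C, C before A), so it is not a safe comparator for sorting a whole list of ops. If you need a single total order, one option is to sort by a number that always grows along causal chains. The sketch below uses the sum of the context entries for that; StampedOp, causalDepth, and totalOrder are hypothetical names, and this is one possible approach rather than the only one.

```typescript
interface StampedOp {
  id: string;
  nodeId: string;
  context: Record<string, number>;
}

// How many ops the author had seen. If one context is causally before
// another, every entry is <= and at least one is smaller, so its sum is smaller.
function causalDepth(op: StampedOp): number {
  let total = 0;
  for (const key of Object.keys(op.context)) total += op.context[key] ?? 0;
  return total;
}

// Total order: causal depth first, then node ID, then op id as a last resort.
function totalOrder(a: StampedOp, b: StampedOp): number {
  const byDepth = causalDepth(a) - causalDepth(b);
  if (byDepth !== 0) return byDepth;
  if (a.nodeId !== b.nodeId) return a.nodeId < b.nodeId ? -1 : 1;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

// Three ops that form a cycle under the pairwise rule.
const opA: StampedOp = { id: "A", nodeId: "c", context: { p: 1 } };
const opB: StampedOp = { id: "B", nodeId: "a", context: { p: 2 } };
const opC: StampedOp = { id: "C", nodeId: "b", context: { q: 1 } };

// Whatever order they arrive in, every device sorts them the same way.
const sortedOnce = [opA, opB, opC].sort(totalOrder).map((op) => op.id).join(",");
const sortedAgain = [opC, opB, opA].sort(totalOrder).map((op) => op.id).join(",");
console.log(sortedOnce, sortedAgain); // C,A,B on both
```

The trade-off is that concurrent ops are still ordered arbitrarily; the sum only guarantees that an op never sorts ahead of something it depended on.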

Why Bother?

You might think, "This is a lot of code just to avoid a 100ms clock drift." But in the local-first world, users can be offline for days.

Imagine Alice goes offline on Friday, makes 50 edits, and syncs on Monday. Meanwhile, Bob made 10 edits on Saturday. Without Version Vectors, Alice's sync on Monday might look like a massive dump of data that overwrites everything Bob did, or vice versa. By tracking causality, the system can say: "I see Bob did X, and Alice did Y. They didn't know about each other's work, so I will merge them."

The Edge Cases: Pruning the Vector

One thing I learned the hard way: Version Vectors grow. Every node that has ever written gets a permanent entry, so in a system with thousands of ephemeral users, that VectorMap behaves like a memory leak.

In a production environment, you have to implement vector pruning: identifying nodes that haven't been active for a long time and folding their history into a common baseline. Variants such as Dotted Version Vectors are also designed to keep this causal metadata compact. If you’re building a small-group collaborative app (like a document editor for a team), the standard Version Vector is fine. If you’re building a global-scale social network, you’ll need to look into other approaches, such as Hybrid Logical Clocks (HLC), which keep timestamps at a fixed size but, like Lamport clocks, cannot detect concurrency, or Merkle Search Trees for efficiently reconciling large sets of operations between peers.

Moving Beyond the "Time" Illusion

Mastering causal consistency was the moment my local-first apps stopped feeling like "web apps with a cache" and started feeling like robust, professional software. It’s the difference between a UI that "glitches" and a UI that "reasons."

When you stop trusting the system clock and start tracking the flow of information through Version Vectors, you aren't just fixing bugs. You are building a system that respects the reality of distributed human collaboration. You are ensuring that when Bob says "sunny" because Alice said "The weather is," that connection is preserved for eternity—or at least until the next git push.