What is Vector Clocks & CRDTs in System Design?

Consistencymedium

Vector Clocks & CRDTs

Vector clocks track causality between events in distributed systems. CRDTs (Conflict-free Replicated Data Types) are data structures designed to merge concurrent updates without conflicts.

Memory anchor

Vector clocks = each friend stamps their letter with 'I've seen messages 1-5 from Alice, 1-3 from Bob.' CRDTs = LEGO bricks designed so no matter what order you snap them together, you get the same castle.

Expected depth

Vector clocks: each node maintains a counter per node in the system. On send, increment own counter and attach the vector. On receive, merge by taking the max of each element. If vector A is strictly less than vector B in all positions, A happened-before B. If neither dominates, the events are concurrent — a conflict resolution strategy is needed. CRDTs: data types where all concurrent updates can be merged commutatively and associatively. Examples: G-Counter (grow-only, merge by max), PN-Counter (add/subtract, two G-Counters), OR-Set (add/remove sets with add-wins semantics), LWW-Register (last write wins by timestamp).

Deep — senior internals

Vector clocks scale poorly with the number of nodes (O(N) metadata per message). Dotted version vectors (used by Riak) reduce this overhead. DynamoDB originally used vector clocks but moved to LWW with a 'last writer wins' timestamp due to client complexity in resolving conflicts. CRDTs are the principled solution to automatic conflict resolution. They're used in: Redis (CRDT-based data types in Redis Enterprise), Riak, collaborative editing (Google Docs uses operational transformation — a related concept), distributed shopping carts (add-wins OR-Set). The key CRDT insight: if all operations commute and are idempotent, replica state can be merged in any order and always converge. The tradeoff: CRDT semantics don't always match business intent (a shopping cart OR-Set never forgets an add — a removed item might reappear after a network partition heal).

🎤Interview-ready answer

I'd use CRDTs for data that naturally supports merge semantics: counters (view counts, like counts), sets (tags, reactions), and presence indicators. For a shopping cart, I'd use an OR-Set CRDT so concurrent adds from multiple devices are preserved. For like counts, a PN-Counter. For data that doesn't naturally merge (e.g., a user's shipping address), LWW with explicit conflict detection is simpler — surface the conflict to the user rather than silently picking a winner.

⚠Common trap

Assuming CRDTs eliminate all consistency concerns. CRDTs guarantee eventual convergence of the data structure state, but the business semantics of the merged state may still be wrong. An OR-Set shopping cart that preserves all adds from a network partition may show a doubled quantity. Application-level validation is still needed.