Day 18
Clocks, Ordering & Causality
Physical clocks lie in distributed systems. Learn Lamport timestamps, vector clocks, and how Google Spanner's TrueTime solves global ordering without a central coordinator.
Tracing Lamport Timestamps Through a 3-Node Message Exchange
Three nodes (A, B, C) exchange messages in the following sequence. Each node starts with a local clock of 0. Trace the Lamport timestamp algorithm (on send: increment local clock; on receive: local = max(local, received) + 1) through this sequence: (1) A sends message m1 to B at A's clock=1. (2) B receives m1 and sends m2 to C. (3) C receives m2. (4) A sends m3 to C (independently, no synchronization with step 2). (5) C receives m3.
Tasks
- Trace the exact Lamport timestamp on each event: A sends m1, B receives m1, B sends m2, C receives m2, A sends m3, C receives m3. Show each node's clock value at each step.
- After all events, what is C's final local clock value? Is event "C receives m2" guaranteed to have a lower timestamp than "C receives m3"? Why or why not?
- Explain the key limitation of Lamport clocks: if event X has timestamp 10 and event Y has timestamp 15, can you conclude X happened-before Y? What can you conclude?
- Give a concrete distributed systems example where relying on Lamport timestamps alone for ordering would produce an incorrect result, and state what stronger mechanism (vector clocks or consensus) would be needed instead.
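The trace in the tasks above can be checked with a minimal sketch of the algorithm (the class and method names here are illustrative, not from any library):

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def send(self):
        # On send: increment the local clock and stamp the message with it.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # On receive: local = max(local, received) + 1.
        self.time = max(self.time, msg_time) + 1
        return self.time

A, B, C = LamportClock(), LamportClock(), LamportClock()

m1 = A.send()                        # A sends m1: A's clock = 1
print("B recv m1:", B.receive(m1))   # B = max(0, 1) + 1 = 2
m2 = B.send()                        # B sends m2: B's clock = 3
print("C recv m2:", C.receive(m2))   # C = max(0, 3) + 1 = 4
m3 = A.send()                        # A sends m3: A's clock = 2 (independent of B/C)
print("C recv m3:", C.receive(m3))   # C = max(4, 2) + 1 = 5
```

Note that m3 carries timestamp 2 even though it is received after m2 (timestamp 3): a smaller Lamport timestamp does not imply the event happened earlier.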
Vector Clock Conflict Detection: Concurrent vs Causal Events
A distributed key-value store replicates data across 3 nodes (N1, N2, N3). Each value carries a vector clock [n1, n2, n3] representing how many updates each node has seen. The store uses vector clocks to detect whether two versions of the same key are causally related (one happened-before the other) or concurrent (neither happened-before the other, indicating a conflict). Given four version vectors for key "user:42:profile", determine their relationships.
Tasks
- Given versions VA=[3,1,0], VB=[2,2,0], VC=[3,2,0], VD=[1,0,2]. For each pair (VA vs VB, VA vs VC, VB vs VC, VA vs VD), determine: is one a causal ancestor of the other, or are they concurrent? Show your comparison logic.
- Explain the happened-before rule for vector clocks: VC_A < VC_B if and only if every component of VC_A is ≤ the corresponding component of VC_B and at least one component is strictly less. Apply this to each pair above.
- For each concurrent pair you identified, explain what the distributed store must do: store both versions (siblings), present them to the application for conflict resolution, or apply a deterministic merge (LWW, add-wins)?
- Explain why Amazon's Dynamo (as described in the Dynamo paper) stores all concurrent versions as siblings rather than resolving automatically, and what the application must do to reconcile them during a read.
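The componentwise happened-before rule can be sketched as a small comparison helper (the function name `compare` and its return strings are assumptions for illustration):

```python
def compare(a, b):
    """Classify two vector clocks: 'a<b', 'b<a', 'equal', or 'concurrent'."""
    a_le_b = all(x <= y for x, y in zip(a, b))
    b_le_a = all(y <= x for x, y in zip(a, b))
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "a<b"        # a is a causal ancestor of b
    if b_le_a:
        return "b<a"        # b is a causal ancestor of a
    return "concurrent"     # neither dominates: conflicting siblings

VA, VB, VC, VD = [3, 1, 0], [2, 2, 0], [3, 2, 0], [1, 0, 2]
for name, (x, y) in {"VA vs VB": (VA, VB), "VA vs VC": (VA, VC),
                     "VB vs VC": (VB, VC), "VA vs VD": (VA, VD)}.items():
    print(name, "->", compare(x, y))
```

For example, VA=[3,1,0] vs VB=[2,2,0] is concurrent: VA wins on the first component but loses on the second, so neither dominates.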
Google Spanner's TrueTime: Externally Consistent Transactions
Google Spanner achieves externally consistent (strictly serializable) distributed transactions across datacenters worldwide without a single coordinator. It accomplishes this using TrueTime, an API that returns a time interval [earliest, latest] rather than a single point in time, with a guaranteed uncertainty bound (epsilon) of typically 1-7 ms. This uncertainty bound is the key to Spanner's commit-wait protocol. Understand how TrueTime makes global ordering possible.
Tasks
- Explain what "external consistency" means in terms of transaction ordering: if transaction T1 commits before T2 begins (in real time), T2's commit timestamp must be greater than T1's. Why is this stronger than serializability alone?
- Describe the TrueTime commit-wait protocol: when a Spanner transaction is ready to commit, it receives a timestamp s = TT.now().latest. It then waits until TT.now().earliest > s before releasing the commit. Why does this wait guarantee external consistency?
- Calculate the performance cost: if epsilon (max TrueTime uncertainty) is 7ms, how much latency does commit-wait add to every write transaction? How does Google reduce epsilon to minimize this penalty?
- Explain the role of GPS receivers and atomic clocks in Spanner's data centers โ why are these used instead of NTP synchronization, and what happens to Spanner's correctness if the atomic clock fails in one datacenter?
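A toy simulation of commit-wait, assuming a fixed epsilon in place of Spanner's real GPS/atomic-clock time masters (the names `tt_now` and `commit_wait` are illustrative, not Spanner's API):

```python
import time

EPSILON = 0.007  # assumed maximum clock uncertainty: 7 ms

def tt_now():
    """TrueTime-style interval [earliest, latest] around the local clock.
    A real deployment derives the bound from time-master infrastructure;
    here it is simply a fixed constant."""
    t = time.monotonic()
    return (t - EPSILON, t + EPSILON)

def commit_wait():
    """Pick s = TT.now().latest, then spin until TT.now().earliest > s.
    Once the wait ends, s is guaranteed to be in the past on every node
    whose clock error is within epsilon."""
    s = tt_now()[1]
    while tt_now()[0] <= s:
        time.sleep(0.0005)
    return s

start = time.monotonic()
s = commit_wait()
waited = time.monotonic() - start
print(f"commit timestamp s = {s:.4f}, waited {waited * 1000:.1f} ms")
```

The observed wait is roughly 2 * epsilon: s sits epsilon ahead of the local clock, and the loop exits only once the clock's earliest bound (another epsilon behind) has passed s.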
Distributed Changelog Preserving Causal Order Across 5 Datacenters
A global social platform maintains a distributed activity changelog (user posts, likes, comments) across 5 datacenters (US, EU, APAC-East, APAC-West, SA). Users can write to their nearest datacenter. The business requirement is that causally related events must appear in causal order in the changelog โ if a user posts a comment replying to a post, the original post must appear before the reply in every datacenter's changelog, even if the reply was written in a different datacenter 50ms after the post. Total write volume: 2 million events/minute.
Tasks
- Explain why physical timestamps alone fail this requirement: give a concrete scenario where a reply has a lower physical timestamp than the post it replies to (hint: clock skew between datacenters).
- Design a causal metadata propagation scheme: when a user reads post P in US-DC (tagged with vector clock VP) and writes reply R in EU-DC, how does R carry VP as its causal dependency, and how does EU-DC use this to order R after P?
- Design the cross-datacenter replication protocol: when US-DC emits events to EU-DC via Kafka, how does EU-DC's changelog consumer hold back events until all their causal dependencies have been received? Describe the "causal barrier" mechanism.
- Calculate the storage overhead of attaching vector clocks (5 datacenters x 8 bytes each = 40 bytes per event) to 2 million events/minute, and propose a compression scheme (e.g., delta encoding, causal compression) to reduce this overhead for high-volume events.
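One way to sketch the causal-barrier idea from the third task, with a hypothetical per-datacenter event counter standing in for full vector clocks (`CausalBuffer`, its fields, and the event dict layout are all assumptions for illustration):

```python
from collections import deque

def dominates(vc, dep):
    """True if the local state vc has already seen everything dep requires."""
    return all(x >= y for x, y in zip(vc, dep))

class CausalBuffer:
    """Changelog consumer for one datacenter: deliver an event only once
    its causal-dependency vector is covered by what has been applied."""
    def __init__(self, n_dcs):
        self.applied = [0] * n_dcs   # events applied so far, per origin DC
        self.pending = deque()       # events held back at the causal barrier
        self.log = []                # delivered changelog, in causal order

    def on_receive(self, event):
        self.pending.append(event)
        self._drain()

    def _drain(self):
        # Repeatedly deliver any pending event whose dependencies are met;
        # each delivery may unblock others.
        progressed = True
        while progressed:
            progressed = False
            for ev in list(self.pending):
                if dominates(self.applied, ev["deps"]):
                    self.applied[ev["origin"]] += 1
                    self.log.append(ev["id"])
                    self.pending.remove(ev)
                    progressed = True

buf = CausalBuffer(5)
# Reply R (written in EU-DC, index 1) arrives before the post P (US-DC,
# index 0) it depends on; R is held back until P has been applied.
buf.on_receive({"id": "R", "origin": 1, "deps": [1, 0, 0, 0, 0]})
buf.on_receive({"id": "P", "origin": 0, "deps": [0, 0, 0, 0, 0]})
print(buf.log)
```

For the storage task: 40 bytes per event at 2 million events/minute is about 80 MB/minute of vector-clock metadata before any compression, which is what motivates delta or causal compression for high-volume events.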