Why distributed systems can't trust wall clocks. Lamport timestamps, vector clocks, and how Google Spanner uses atomic clocks.
Each server has its own hardware clock. NTP synchronizes clocks, but typically only to within roughly 10 ms–100 ms. Two servers can disagree on the time by enough to completely reverse the apparent causal order of events.
A user posts a comment (Node A, t=100ms), then edits it (Node B, t=99ms due to clock skew). The system records the edit BEFORE the original post. Sorted by timestamp, you'd see the edit come first — violating causality.
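A few lines of Python make the failure concrete (the timestamps are the illustrative ones from the scenario above):

```python
# Causal order: the post happens-before the edit.
# Clock skew on Node B gives the edit a *smaller* wall-clock timestamp.
events = [
    {"action": "post comment", "node": "A", "wall_ms": 100},
    {"action": "edit comment", "node": "B", "wall_ms": 99},  # skewed clock
]

# Sorting by wall-clock timestamp puts the edit first, violating causality.
by_timestamp = sorted(events, key=lambda e: e["wall_ms"])
print([e["action"] for e in by_timestamp])
# ['edit comment', 'post comment'] -- the edit sorts before the post it modifies
```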
Each node maintains a counter L. Three rules:
1. Before every local event, increment: L = L + 1.
2. Before sending a message, increment L and attach it as the message's timestamp.
3. On receiving a message with timestamp T, set L = max(L, T) + 1.
Key insight: if event A happened-before event B, then L(A) < L(B). But the converse is NOT true — L(A) < L(B) does NOT mean A happened before B. Lamport clocks don't detect concurrent events.
Three nodes (A, B, C) each with their own Lamport counter. Simulate sends and receives.
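A minimal sketch of that simulation, using plain integer counters and the Lamport rules (the specific message flow is illustrative):

```python
# Three nodes, each with its own Lamport counter.
clocks = {"A": 0, "B": 0, "C": 0}

def local_event(node):
    # Rule 1: increment before every local event.
    clocks[node] += 1
    return clocks[node]

def send(src):
    # Rule 2: tick before sending; attach the new timestamp to the message.
    return local_event(src)

def receive(dst, msg_ts):
    # Rule 3: on receive, jump to max(local, message) + 1.
    clocks[dst] = max(clocks[dst], msg_ts) + 1
    return clocks[dst]

local_event("A")   # A: 1
ts = send("A")     # A: 2, message carries ts=2
receive("B", ts)   # B: max(0, 2) + 1 = 3
ts2 = send("B")    # B: 4
receive("C", ts2)  # C: max(0, 4) + 1 = 5
print(clocks)      # {'A': 2, 'B': 4, 'C': 5}
```

Note how C's counter ends up larger than A's even though C had fewer events: the receive rule pulls every node forward past any timestamp it has seen.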
Each node i maintains a vector V[n] (one counter per node). Rules:
1. On a local event, node i increments its own entry: V[i] += 1.
2. Before sending, increment V[i] and attach the entire vector to the message.
3. On receiving vector W, set V[j] = max(V[j], W[j]) for every j, then increment V[i].
A → B (A happened-before B) iff: V(A)[i] ≤ V(B)[i] for all i, and at least one is strictly less.
Concurrent (A ∥ B) iff: neither A → B nor B → A. Used by DynamoDB, Riak to detect conflicts.
A's last event [3,0,0] and B's last event [2,3,0]: A has 3>2 in position 0, B has 3>0 in position 1. Neither dominates — concurrent!
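The dominance check can be written directly against those two vectors:

```python
def happens_before(a, b):
    # a -> b iff every component of a <= b and at least one is strictly less.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

va = [3, 0, 0]  # A's last event
vb = [2, 3, 0]  # B's last event

print(happens_before(va, vb))  # False: 3 > 2 in position 0
print(happens_before(vb, va))  # False: 3 > 0 in position 1
concurrent = not happens_before(va, vb) and not happens_before(vb, va)
print(concurrent)              # True: neither dominates, the events conflict
```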
Google's Spanner uses GPS receivers and atomic clocks in every datacenter. TrueTime API returns [earliest, latest] — a bounded interval of true wall-clock time. Spanner guarantees: if commit of T1 happens before start of T2, then T1.timestamp < T2.timestamp.
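The commit-wait idea can be sketched with an illustrative TrueTime-style interval. The names `tt_now` and `EPSILON`, and the 4 ms uncertainty, are assumptions for this sketch, not Spanner's actual API or numbers:

```python
import time

EPSILON = 0.004  # illustrative uncertainty half-width (4 ms)

def tt_now():
    """TrueTime-style interval: true time lies within [earliest, latest]."""
    t = time.time()
    return (t - EPSILON, t + EPSILON)

def commit():
    # Pick the commit timestamp at the top of the uncertainty interval...
    earliest, latest = tt_now()
    ts = latest
    # ...then wait until true time has provably passed ts ("commit wait").
    # Any transaction starting after this point sees earliest > ts, so its
    # timestamp must exceed ours: external consistency.
    while tt_now()[0] <= ts:
        time.sleep(0.001)
    return ts

ts = commit()
assert tt_now()[0] > ts  # true time is now guaranteed to be past the commit timestamp
```

The wait costs roughly twice the uncertainty, which is why Spanner invests in GPS and atomic clocks to keep that interval small.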
An HLC combines a physical component (the maximum wall-clock time observed so far) with a Lamport-style logical counter that breaks ties when the wall clock stalls or lags. It advances monotonically, preserves causality, and stays close to wall-clock time. CockroachDB uses HLCs for transaction timestamps — no atomic clocks needed, just NTP plus the logical counter to absorb skew.
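A sketch of the HLC update rules, following Kulkarni et al.'s formulation (the millisecond granularity is an assumption of the sketch):

```python
import time

class HLC:
    """Hybrid logical clock: physical component l plus logical counter c."""

    def __init__(self):
        self.l = 0  # max physical time observed so far (ms)
        self.c = 0  # logical counter, breaks ties within one l value

    def now(self):
        """Local or send event."""
        pt = int(time.time() * 1000)
        if pt > self.l:
            self.l, self.c = pt, 0  # wall clock moved forward: reset counter
        else:
            self.c += 1             # wall clock stalled or skewed: count logically
        return (self.l, self.c)

    def update(self, msg_l, msg_c):
        """Receive event: merge the sender's (l, c) timestamp."""
        pt = int(time.time() * 1000)
        new_l = max(self.l, msg_l, pt)
        if new_l == self.l == msg_l:
            new_c = max(self.c, msg_c) + 1
        elif new_l == self.l:
            new_c = self.c + 1
        elif new_l == msg_l:
            new_c = msg_c + 1
        else:
            new_c = 0               # pt won outright: fresh counter
        self.l, self.c = new_l, new_c
        return (self.l, self.c)
```

Comparing timestamps as `(l, c)` tuples gives a causality-preserving order, and `l` never drifts further from the wall clock than the worst skew among the nodes it has heard from.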
```python
from threading import Lock


class LamportClock:
    """Thread-safe Lamport logical clock."""

    def __init__(self):
        self._time = 0
        self._lock = Lock()

    def tick(self) -> int:
        """Local event — increment and return the new time."""
        with self._lock:
            self._time += 1
            return self._time

    def send(self) -> int:
        """Before sending a message — tick and return the timestamp to attach."""
        return self.tick()

    def receive(self, msg_timestamp: int) -> int:
        """On receiving a message — update to max(local, message) + 1."""
        with self._lock:
            self._time = max(self._time, msg_timestamp) + 1
            return self._time


class VectorClock:
    """Vector clock for causality tracking across N nodes."""

    def __init__(self, node_id: str, all_nodes: list[str]):
        self.node_id = node_id
        self.vector = {n: 0 for n in all_nodes}
        self._lock = Lock()

    def tick(self) -> dict:
        with self._lock:
            self.vector[self.node_id] += 1
            return dict(self.vector)

    def send(self) -> dict:
        return self.tick()

    def receive(self, incoming: dict) -> dict:
        with self._lock:
            # Merge component-wise maxima, then count the receive itself.
            for node, ts in incoming.items():
                self.vector[node] = max(self.vector.get(node, 0), ts)
            self.vector[self.node_id] += 1
            return dict(self.vector)

    @staticmethod
    def happens_before(a: dict, b: dict) -> bool:
        """Returns True iff a -> b (a happened before b)."""
        nodes = set(a) | set(b)
        all_lte = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
        any_lt = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
        return all_lte and any_lt

    @staticmethod
    def concurrent(a: dict, b: dict) -> bool:
        """Returns True iff neither a -> b nor b -> a."""
        return (not VectorClock.happens_before(a, b)
                and not VectorClock.happens_before(b, a))
```
| Mechanism | Detects Concurrent? | Overhead | Use Case |
|---|---|---|---|
| Wall Clock (NTP) | No — unreliable | None | Logging, approximate ordering |
| Lamport Timestamps | No (partial order only) | O(1) int per message | Distributed mutex, total ordering |
| Vector Clocks | Yes | O(N) per message | Conflict detection (DynamoDB, Riak) |
| Hybrid Logical Clocks | Partial | O(1), stays near wall time | CockroachDB, YugabyteDB |
| TrueTime (atomic) | Yes (bounded) | Hardware cost, 1–7ms wait | Google Spanner — serializable globally |
1. Node A has Lamport clock L=5. It receives a message with timestamp T=10. What is A's clock after receiving?
2. Vector clocks V(A) = [3,1,0] and V(B) = [2,3,0]. What is their relationship?
3. Why does Google Spanner "commit-wait" for the TrueTime uncertainty interval?
4. A Lamport clock shows event A has timestamp 5 and event B has timestamp 3. What can you conclude?
5. DynamoDB uses vector clocks (dotted version vectors) to detect conflicts. When two vector clocks are concurrent, what does DynamoDB do?
You now understand distributed time. Next: Failure Modes & Fault Tolerance.