Replication Strategies & Consistency

Primary-replica replication, replication lag, read-your-writes consistency, and multi-primary conflict resolution.

4 Exercises
12 Concept Checks
~90 min total
System Design
Exercise 1 🟢 Easy ⏱ 15 min
Replication Modes
An e-commerce platform has one primary and two read replicas with asynchronous replication. A customer updates their shipping address. Seconds later they refresh their profile page, and it shows the old address: the read was served from a replica with 200 ms of replication lag. The customer panics, thinking their update was lost.
Async Replication Lag
✏️ Write: new address
→ immediate ACK →
Primary DB
→ async →
Replica 1 (lag: 200ms)
Replica 2 (lag: 350ms)
Concept Check — 3 questions
Q1. Async replication's core trade-off: writes are acknowledged quickly, but what is the downside?
A. Write latency is higher because the primary must format data for the replica before acknowledging
B. Potential data loss on primary failure and stale reads from replicas that haven't caught up
C. Complex distributed transactions are required for every write
D. Each replica requires 2× the storage of the primary
Q2. Synchronous replication: the primary waits for at least 1 replica to confirm before acknowledging the write. What is the main downside?
A. Data loss on primary failure — the replica might not have the latest data
B. Stale reads are still possible — the replica can lag behind the primary
C. Higher write latency — every write must wait for a cross-network round-trip to the replica before returning to the client
D. No fault tolerance — if the synchronous replica fails, writes must stop
Q3. Semi-synchronous replication (MySQL feature): primary waits for 1 of N replicas to acknowledge. What does this provide?
A. Zero write latency — the primary never waits for any replica
B. Durability guarantee with moderate latency — 1 confirmed copy ensures no data loss even if the primary crashes immediately after
C. Zero replication lag — all replicas are always current
D. Automatic conflict resolution for concurrent writes
Async replication: primary ACKs immediately. If primary crashes before the write propagates to replicas, that write is permanently lost. Semi-sync (MySQL): wait for 1 replica, then fall back to async if the replica doesn't respond within a timeout (default 10s). This is the practical compromise: 1 confirmed copy for durability, minimal latency impact for normal operation.
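The acknowledgement logic above can be sketched as a small model. This is an illustrative simplification, not MySQL's actual implementation: the primary acknowledges the client once the fastest replica confirms, but falls back to async if no replica responds within the timeout.

```python
# Minimal sketch of semi-synchronous acknowledgement (illustrative model).
# The primary waits for the first replica ack, up to a timeout; after
# that it falls back to async and acknowledges without a confirmed copy.

def ack_write(replica_ack_times_ms, timeout_ms=10_000):
    """Return (mode, client_ack_latency_ms) for one write.

    replica_ack_times_ms: per-replica time until that replica would
    confirm the write. The write is durable off-primary as soon as
    the fastest replica confirms.
    """
    fastest = min(replica_ack_times_ms) if replica_ack_times_ms else None
    if fastest is not None and fastest <= timeout_ms:
        # Semi-sync: 1 confirmed copy exists before the client is ACKed.
        return "semi-sync", fastest
    # No replica confirmed in time: fall back to async (durability risk).
    return "async-fallback", timeout_ms

print(ack_write([120, 350]))        # fastest replica confirms in 120 ms
print(ack_write([15_000], 10_000))  # replica too slow, async fallback
```

Note the trade-off the model makes visible: the client-visible write latency is exactly the fastest replica's round-trip, which is why semi-sync costs far less than full sync across all replicas.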
Open Design Challenge
1. Design the failover process when the primary crashes in an async replication setup. Which replica becomes the new primary? How do you detect split-brain and prevent data divergence?
2. For a payment system, should replication be sync or async? Justify your choice considering the consequences of data loss vs latency increase.
3. Design monitoring for replication lag: what metric do you track, what is the alert threshold, and what action do you take when lag exceeds 1 second?
Exercise 2 🟢 Easy ⏱ 20 min
Read-Your-Writes Consistency
Social network: a user posts a tweet (write hits the primary). Seconds later they view their own profile (read served from a replica 500ms behind). The tweet is missing. The user refreshes several times — it eventually appears. Users file bug reports: "my tweet disappeared." This is a read-your-writes violation.
Read-Your-Writes Violation
👤 User posts tweet
→ write →
Primary (has tweet)
→ user reads profile →
Replica (500ms behind, no tweet)
User sees missing tweet 😡
Concept Check — 3 questions
Q1. To implement read-your-writes consistency, the simplest approach is:
A. Disable all read replicas — every read must hit the primary
B. Sticky reads to primary: route reads to the primary for a short window (e.g., 5 seconds) after the same user makes a write
C. Switch all replication to synchronous mode globally
D. Store the write result in the client's local browser cache
Q2. The session-token approach for read-your-writes: store the last-write timestamp in the user's session. Reads then check:
A. Only the user's IP address to route to the nearest replica
B. The global database version number for the current schema
C. Whether the replica's replication position is ahead of the timestamp in the session; if not, route to the primary instead
D. The user's last login timestamp to determine session freshness
Q3. Monotonic reads consistency guarantees what property for a user's reads?
A. A user never sees data older than what they previously read — no going backwards in time between successive requests
B. All reads are served from the primary to guarantee freshness
C. All writes are totally ordered so reads always reflect causal order
D. All replicas always return the same value at any point in time
Sticky-primary approach: track last_write_at in the user session. For the next N seconds (or until replica catches up), route that user's reads to primary. Session-token approach: store the replication position (binlog offset / LSN) at write time. When reading, pick a replica that is at or ahead of that position. Monotonic reads: use sticky routing to the same replica within a session — if the replica doesn't go backwards, reads are monotonic.
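The session-token approach above can be sketched in a few lines. The names (`Replica`, `route_read`) and the integer LSN are illustrative assumptions, not a real driver API; in practice the position would be a binlog coordinate or a PostgreSQL LSN.

```python
# Sketch of session-token read routing: the write path records the
# primary's replication position in the session; the read path only
# uses a replica that has applied at least that position.

from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    applied_lsn: int  # last replication position this replica has applied

def route_read(session, replicas):
    """Pick a replica that satisfies read-your-writes, else the primary."""
    min_lsn = session.get("last_write_lsn", 0)
    for r in replicas:
        if r.applied_lsn >= min_lsn:
            return r.name
    return "primary"  # nobody has caught up to this user's last write

replicas = [Replica("replica-1", applied_lsn=95),
            Replica("replica-2", applied_lsn=102)]
print(route_read({"last_write_lsn": 100}, replicas))  # replica-2 is caught up
print(route_read({"last_write_lsn": 200}, replicas))  # all replicas stale
```

A user who hasn't written recently has no `last_write_lsn`, so any replica qualifies; only recent writers pay the cost of waiting for a caught-up replica or hitting the primary.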
Open Design Challenge
1. A user posts a tweet from their phone and then opens a laptop browser. Both devices should see the tweet immediately. How do you implement cross-device read-your-writes consistency?
2. Design a read routing layer that checks replica replication lag before routing. If lag exceeds 200ms, route to primary. Sketch the architecture and the lag check mechanism.
3. Read-your-writes increases load on the primary for N seconds after every write. At 1M writes/hour with a 5-second primary-sticky window, estimate the additional primary read load vs baseline.
Exercise 3 🟡 Medium ⏱ 25 min
Multi-Primary (Active-Active) Conflicts
A globally distributed app has primary databases in US and EU. Both accept writes. A user concurrently changes their email from US (to alice@us.com) and EU (to alice@eu.com) at the same millisecond. Both primaries accept the write. During synchronization, US and EU have conflicting email values for the same user — and there's no clear winner.
Multi-Primary Write Conflict
US Primary: email=alice@us.com
←→ conflict ←→
EU Primary: email=alice@eu.com
→ need resolution →
LWW / CRDT / Application logic
Concept Check — 3 questions
Q1. Last-Write-Wins (LWW) conflict resolution uses what to determine which write wins?
A. The longest value — alice@us-company-domain.com beats alice@eu.com
B. Wall clock timestamp — the write with the higher timestamp wins
C. Alphabetical ordering — alice@eu.com comes before alice@us.com
D. The first write received by any node in the cluster wins
Q2. LWW's fundamental problem in a distributed system is:
A. Timestamps take too much storage overhead in each record
B. Comparing timestamps is too computationally expensive at scale
C. Clock skew — clocks on different machines may differ by milliseconds, so the "later" timestamp may actually belong to the write that happened earlier in real causal time
D. LWW requires a central coordinator that becomes a bottleneck
Q3. Application-level conflict resolution (CouchDB, DynamoDB): the application receives both conflicting versions and decides which to keep. This approach is most appropriate when:
A. Data is simple key-value pairs where the latest value always wins
B. Domain-specific merge logic is needed — e.g., merging shopping carts, taking max of counters, or using CRDT data types
C. It is always the best approach — applications always know best
D. Only for append-only data where merging is trivial
LWW silently discards writes — the "losing" email is simply gone. This causes data loss without any error. Clock skew of even 1ms between US and EU nodes can flip the winner incorrectly. Hybrid Logical Clocks (HLC) partially solve clock skew by combining physical and logical time. For shopping carts: merge by union (take all items from both versions). For email: present conflict to user. CRDTs (Conflict-free Replicated Data Types) are data structures that always converge automatically (counters, sets, etc.).
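Two of the strategies above, sketched side by side as a minimal illustration (the record shapes and timestamps are hypothetical): LWW silently drops the losing write, while a union merge of shopping carts keeps data from both versions.

```python
def lww(a, b):
    """Last-Write-Wins: higher wall-clock timestamp wins. With clock
    skew, this can pick the causally earlier write and lose the other."""
    return a if a["ts"] >= b["ts"] else b

def merge_carts(a, b):
    """Shopping-cart merge by union: never loses an added item (though
    a removed item can resurface, the classic drawback of naive union)."""
    return sorted(set(a) | set(b))

# Concurrent email updates from the US/EU scenario above:
us = {"email": "alice@us.com", "ts": 1_700_000_000_001}
eu = {"email": "alice@eu.com", "ts": 1_700_000_000_002}
print(lww(us, eu)["email"])  # EU timestamp is 1 ms later, US write is gone
print(merge_carts(["book"], ["book", "pen"]))  # union keeps both items
```

Note that if the US node's clock had been 2 ms fast, `lww` would have returned the US value instead, with no error either way: that is exactly the clock-skew hazard from Q2.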
Open Design Challenge
1. Design a conflict resolution strategy for a shared document editor (Google Docs-style) where two users in different regions edit the same paragraph simultaneously. What data structure would you use?
2. Implement a version vector (vector clock) for the US/EU email conflict: what does the vector look like before and after the conflict? How does it help determine causality?
3. Some conflicts cannot be auto-resolved (e.g., two users book the last flight seat simultaneously). Design a "conflict surfacing" system that presents these conflicts to users and tracks resolution status.
Exercise 4 🔴 Hard ⏱ 30 min
Replication Topologies
A global fintech platform runs in 5 regions. Each region must survive independently. Currently: hub-and-spoke topology with a single primary in US and 4 replica regions. When the US region goes down, a 15-minute failover window causes a global service outage while a new primary is elected and DNS propagates. You need to eliminate this downtime.
Hub-and-Spoke vs Active-Active
Hub-Spoke: US Primary → EU Replica, APAC Replica, SA Replica (read-only)
Active-Active: US ↔ EU ↔ APAC ↔ SA (all accept writes)
Concept Check — 3 questions
Q1. Active-active multi-region replication reduces downtime compared to hub-and-spoke by:
A. Using faster hardware in each region to reduce failover duration
B. Each region accepts writes independently — if the US goes down, traffic simply routes to other primaries with no failover procedure needed
C. Using synchronous replication so all regions are always in sync
D. Having more replica nodes so election is faster
Q2. A daisy-chain replication topology (A → B → C → D) carries what specific risk?
A. Faster change propagation — each hop accelerates delivery to the next node
B. Less total replication lag because each node is closer to the next
C. Cumulative lag — each hop adds delay; D's lag equals the sum of A→B, B→C, and C→D lags combined
D. Automatic conflict resolution at each intermediary node
Q3. Galera Cluster implements synchronous multi-master replication using write-set certification. Certification-based conflict detection means:
A. Writes are digitally signed with SSL certificates before being applied
B. Before applying, each node validates the write-set has no key conflicts with concurrent in-flight transactions, rolling back conflicting transactions
C. Nodes vote by majority on which writes to accept, rejecting minority writes
D. Writes are applied in random order and reconciled afterward
Active-active eliminates failover but introduces write conflicts (Exercise 3). Hub-and-spoke eliminates conflicts but requires failover on primary failure. For fintech: strong consistency often requires a single primary per data partition — use geography-based sharding (US users → US primary, EU users → EU primary) for active-active without cross-region conflicts on the same record. Galera certification: the write-set contains modified row keys; if another node committed a write to the same key concurrently, the later one is rolled back.
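The geography-based sharding described above can be sketched as a tiny write router. The region names and the `HOME_REGION` mapping are illustrative assumptions: each user has exactly one home region, so no two primaries ever own the same record.

```python
# Sketch of geo-sharded write routing: US and EU both accept writes,
# but every user's rows have a single home-region primary, so the
# same record is never written in two regions concurrently.

HOME_REGION = {"alice": "eu", "bob": "us"}  # e.g., assigned at signup

def primary_for(user_id, default_region="us"):
    """Route a write to the user's home-region primary. A traveling
    user still writes to their home region (extra latency, but no
    conflicts) unless an explicit re-homing migration moves the shard."""
    return HOME_REGION.get(user_id, default_region) + "-primary"

print(primary_for("alice"))  # eu-primary, even while visiting the US
print(primary_for("bob"))    # us-primary
```

The design choice here is the same one the paragraph makes: conflicts are avoided by construction rather than resolved after the fact, at the cost of cross-region write latency for users outside their home region.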
Open Design Challenge
1. Design geographic sharding to enable active-active without write conflicts: US users write to the US primary, EU users write to the EU primary. How do you handle a user who travels from US to EU?
2. Design the automated failover process for hub-and-spoke: when should the system auto-promote a replica to primary vs require manual intervention? What signals trigger auto-promotion?
3. A split-brain scenario: US primary and EU replica both think they are the primary (network partition). Design a fencing mechanism (STONITH — Shoot The Other Node In The Head) that ensures only one is active.