Day 9 — Week 2

Replication Strategies & Consistency

Learn how databases keep copies in sync across nodes, manage replication lag, and handle the trade-offs between consistency, availability, and performance.

Primary-Replica, Replication Lag, Read-Your-Writes, Failover & Split-Brain, Synchronous vs Async
Key Concepts
🔄
Async Replication
The primary acknowledges a write as soon as it commits locally, then ships it to replicas in the background. Writes are low-latency, but a lagging replica may serve stale data, and writes that have not yet been shipped can be lost if the primary crashes.
🔒
Synchronous Replication
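The primary waits for at least one replica to confirm before acknowledging the write. An acknowledged write therefore survives a primary crash, at the cost of a network round trip on every write. PostgreSQL sets the level of confirmation with synchronous_commit (remote_write, on, remote_apply); only remote_apply guarantees the write is already visible to reads on the synchronous replica. "Semi-sync" is MySQL's name for a comparable mode.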
⏱️
Replication Lag
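The delay between a write committing on the primary and becoming visible on a replica. Typical values under normal load are 10-100 ms; under heavy write load, lag can grow to seconds or minutes. Monitor it closely, because it bounds how stale replica reads can be and determines whether read-after-write consistency holds.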
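Lag can also be measured in bytes of WAL rather than time. PostgreSQL reports WAL positions as log sequence numbers (LSNs) such as 0/3000060, which are the two hexadecimal halves of a 64-bit byte offset. The sketch below assumes you have already fetched the primary's current LSN and a replica's replay LSN (for example with pg_current_wal_lsn() and pg_last_wal_replay_lsn()). The helper names and the alert threshold are hypothetical, chosen only for illustration.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN like '16/B374D848' to an absolute byte offset."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replication_lag_bytes(primary_lsn: str, replica_replay_lsn: str) -> int:
    """Bytes of WAL the replica has not replayed yet (never negative)."""
    return max(0, lsn_to_bytes(primary_lsn) - lsn_to_bytes(replica_replay_lsn))


# Hypothetical alert threshold: 16 MiB of unreplayed WAL.
LAG_ALERT_BYTES = 16 * 1024 * 1024


def is_lagging(primary_lsn: str, replica_replay_lsn: str) -> bool:
    """True when the replica is further behind than the alert threshold."""
    return replication_lag_bytes(primary_lsn, replica_replay_lsn) > LAG_ALERT_BYTES
```

Byte lag is useful alongside time lag because it stays meaningful when the primary is idle and no new commits arrive to timestamp.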
🧭
Read-Your-Writes
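A guarantee that after a user writes data, their subsequent reads always reflect that write. Common solutions: route that user's reads to the primary for a short window after each write (1-2 seconds, for example), or track a monotonic "write timestamp" per user and read only from replicas that have caught up to it.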
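The second approach, tracking a per-user write marker, can be sketched as a small in-memory model. This is an illustration under stated assumptions, not a production router: it uses an integer write position in place of a timestamp, and the Node and ReadRouter names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    applied_position: int = 0  # highest write position this node has applied


@dataclass
class ReadRouter:
    primary: Node
    replicas: list[Node]
    # Position of each user's most recent write on the primary.
    last_write_position: dict[str, int] = field(default_factory=dict)

    def record_write(self, user_id: str) -> None:
        """Call after a write commits on the primary."""
        self.primary.applied_position += 1
        self.last_write_position[user_id] = self.primary.applied_position

    def choose_node(self, user_id: str) -> Node:
        """Pick a replica that has caught up to the user's last write, else the primary."""
        needed = self.last_write_position.get(user_id, 0)
        for replica in self.replicas:
            if replica.applied_position >= needed:
                return replica
        return self.primary
```

Unlike a fixed time window, this routes a user back to replicas as soon as one has caught up, and keeps them on the primary for as long as every replica is behind.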
Architecture — Primary-Replica Topology
Write path: App Server → Write Router → Primary DB → (async WAL stream) → Replica-A / Replica-B
Read path: App Server → Read Router (round-robin) → Replica-A or Replica-B
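The two paths above can be sketched as a tiny router: writes always go to the primary, and reads rotate across the replicas. This is a minimal sketch; the class and function names are hypothetical, and a real router would also need health checks and lag awareness.

```python
import itertools


class RoundRobinReadRouter:
    """Hands out replica names in rotation."""

    def __init__(self, replicas: list[str]):
        if not replicas:
            raise ValueError("at least one replica is required")
        self._cycle = itertools.cycle(replicas)

    def next_replica(self) -> str:
        return next(self._cycle)


def route(statement_is_write: bool, read_router: RoundRobinReadRouter,
          primary: str = "primary") -> str:
    """Return the node that should serve a statement."""
    return primary if statement_is_write else read_router.next_replica()
```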
Concern | Async Replication | Sync Replication
Write latency | Low (fire and forget) | Higher (+RTT to replica)
Durability | Acknowledged writes can be lost on primary crash | No loss of acknowledged writes (at least 1 replica has them)
Read staleness | Possible (lag can be seconds) | Minimal; none only if the replica applies the write before it is acknowledged
Availability | High (primary can write even if replicas are down) | Lower (primary blocks if the sync replica is down)
Best for | Social feeds, analytics, low-criticality reads | Financial data, inventory, anything requiring strong consistency
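The durability row is the easiest one to see in a toy model. The sketch below is a single-process simulation, not real replication: the Primary and Replica classes are hypothetical, and "shipping" a write is just appending to a list.

```python
class Replica:
    def __init__(self):
        self.log: list[str] = []


class Primary:
    def __init__(self, replica: Replica, synchronous: bool):
        self.replica = replica
        self.synchronous = synchronous
        self.log: list[str] = []
        self.pending: list[str] = []  # acknowledged but not yet shipped (async only)

    def write(self, value: str) -> str:
        self.log.append(value)
        if self.synchronous:
            self.replica.log.append(value)  # replica confirms before the ack
        else:
            self.pending.append(value)      # ack now, ship later
        return "ack"

    def ship_pending(self) -> None:
        """Background replication step in async mode."""
        self.replica.log.extend(self.pending)
        self.pending.clear()


def writes_lost_on_crash(primary: Primary) -> list[str]:
    """Acknowledged writes that exist only on the primary."""
    return [v for v in primary.log if v not in primary.replica.log]
```

In async mode, any write acknowledged after the last ship_pending() call exists only on the primary, which is exactly the window in which a crash loses data.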
Technology Decision
Pattern | Guarantee | Trade-off | Use When
Async replication | Eventual consistency | Risk of reading stale data | Read replicas for analytics, dashboards
Semi-sync replication | At least one replica has the write | Slightly higher write latency | MySQL semi-sync: no data loss on failover
Sync replication | Strong consistency | Availability reduced if replica fails | Financial txns, config stores
Multi-primary | Eventual (with conflict resolution) | Conflict complexity | Multi-region active-active
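To make "conflict resolution" in the multi-primary row concrete, here is a sketch of one common strategy, last-write-wins. It is only one of several options and is not prescribed by this lesson. It assumes write timestamps are comparable across regions, which real clocks only approximate; the Versioned type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float  # time of the write (assumed comparable across regions)
    node_id: str      # tie-breaker when timestamps are equal


def resolve_lww(a: Versioned, b: Versioned) -> Versioned:
    """Return the winner under last-write-wins; ties go to the higher node_id."""
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))
```

Note the trade-off: the losing write is silently discarded, which is part of the conflict complexity the table refers to.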
Code Example — PostgreSQL Replication + Read Routing
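# postgresql.conf on primary
wal_level = replica
max_wal_senders = 5
wal_keep_size = 1024  # MB of WAL retained for replicas (PostgreSQL 13+)
synchronous_commit = on  # waits for a replica only if synchronous_standby_names is set;
                         # 'remote_write' acks once the standby has written (not flushed) the WAL
# synchronous_standby_names = 'replica_a'  # hypothetical standby name; leave unset for async

# Replica settings: recovery.conf before PostgreSQL 12; from 12 on, postgresql.conf
# plus an empty standby.signal file in the data directory
primary_conninfo = 'host=primary port=5432 user=replicator'
recovery_target_timeline = 'latest'  # already the default in PostgreSQL 12+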

# Python: route writes to primary, reads to replica
import time
from contextlib import closing

import psycopg2

PRIMARY_DSN = "postgresql://user:pass@primary:5432/db"
REPLICA_DSN = "postgresql://user:pass@replica:5432/db"

# Track per-user write times for read-your-writes.
# In-memory and per-process: use a shared store if you run several app servers.
user_write_times: dict[str, float] = {}
CONSISTENCY_WINDOW = 2.0  # seconds; should exceed the expected replication lag

def get_read_conn(user_id: str):
    """Route reads: primary if user wrote recently, else replica."""
    last_write = user_write_times.get(user_id)
    # time.monotonic() is immune to wall-clock adjustments when measuring intervals.
    if last_write is not None and time.monotonic() - last_write < CONSISTENCY_WINDOW:
        return psycopg2.connect(PRIMARY_DSN)  # read-your-writes
    return psycopg2.connect(REPLICA_DSN)

def write(user_id: str, query: str, params=None):
    """All writes go to primary; record the write time for consistency."""
    # closing() closes the connection; "with conn" only commits or rolls back.
    with closing(psycopg2.connect(PRIMARY_DSN)) as conn:
        with conn, conn.cursor() as cur:
            cur.execute(query, params)
    user_write_times[user_id] = time.monotonic()

def read(user_id: str, query: str, params=None):
    """Reads use replica unless user wrote recently."""
    with closing(get_read_conn(user_id)) as conn:
        with conn, conn.cursor() as cur:
            cur.execute(query, params)
            return cur.fetchall()
Quiz
1. In asynchronous replication, what happens if the primary crashes immediately after acknowledging a write?
2. What is "read-your-writes" consistency and why is it important?
3. What is "split-brain" in a replicated database system?
4. PostgreSQL streaming replication uses WAL. What does WAL stand for and what is its primary role?
5. When should you use synchronous over asynchronous replication?