Day 29 · Week 5

Advanced Architectural Patterns

Eight production-grade patterns: CRDT for conflict-free replication, load shedding under overload, backpressure in streaming systems, Circuit Breakers, Bulkhead, Saga, Outbox Pattern, and Sidecar.

8 Patterns · ~4h Study Time · 3 Simulations · 5 Quizzes

Eight Patterns That Appear in Every Senior Interview

These patterns solve real distributed systems problems. Each one represents a specific class of failure or complexity that naive architectures cannot handle.

🧮

CRDTs

Conflict-free Replicated Data Types: data structures that merge automatically without conflicts, requiring no coordination or consensus. Used by Figma (collaborative editing), Riak, and Redis.

🪣

Load Shedding

Deliberately dropping low-priority work to protect the system under overload. Prioritize P0 (payments, auth) over P2 (analytics): a degraded experience beats total unavailability.

🔃

Backpressure

The producer slows down when the consumer can't keep up, preventing unbounded queue growth. Kafka's pull model is natural backpressure: consumers control their own poll rate, and lag is visible.

🚢

Bulkhead

Isolate failures, like the bulkhead compartments of a ship that prevent one leak from sinking the vessel. Give each service type its own thread pool so a slow vendor doesn't stall everything else.
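A minimal bulkhead sketch in Python, using one bounded thread pool per downstream dependency. The pool sizes and the helper names (`submit`, `slow_vendor_call`, `fast_payment_call`) are illustrative, not a real API:

```python
# Bulkhead sketch: one bounded thread pool per downstream dependency,
# so a slow vendor can exhaust only its own pool, never the whole service.
from concurrent.futures import ThreadPoolExecutor
import time

pools = {
    "payments": ThreadPoolExecutor(max_workers=20),  # critical path: larger pool
    "vendor":   ThreadPoolExecutor(max_workers=4),   # flaky third party: small pool
}

def submit(service, fn, *args):
    """Route work to the service's own pool (its bulkhead)."""
    return pools[service].submit(fn, *args)

def slow_vendor_call():
    time.sleep(0.05)        # simulate a slow third-party API
    return "vendor ok"

def fast_payment_call():
    return "payment ok"

# Even with all 4 vendor workers busy and 4 more calls queued,
# payments still run, because they use a separate pool:
vendor_futures = [submit("vendor", slow_vendor_call) for _ in range(8)]
payment = submit("payments", fast_payment_call)
print(payment.result(timeout=1))   # not blocked by the vendor pool
```

The key design choice is that the bound is per dependency: the failure domain of the vendor is capped at its own 4 threads.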

G-Counter CRDT – Distributed Counter

Each node increments its own counter independently. The merge takes the max at each vector position, and the global total (the sum of the positions) is always correct after merging.

G-Counter CRDT – 3 Nodes

Each node maintains a vector [A, B, C]. Merge takes the max of each position: commutative, associative, idempotent. No coordination needed.

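The merge rule above fits in a few lines of Python. This is a sketch of the grow-only counter; the three vectors match the quiz example later in this section:

```python
# G-Counter sketch: each node keeps a vector of per-node counts.
# Merge is the elementwise max; the counter's value is the vector sum.
def increment(vector, node_index, amount=1):
    v = list(vector)
    v[node_index] += amount          # a node only ever bumps its own slot
    return v

def merge(a, b):
    # Commutative, associative, idempotent: safe to gossip in any order
    return [max(x, y) for x, y in zip(a, b)]

def value(vector):
    return sum(vector)

# Three nodes increment independently, then gossip and merge:
a = increment([0, 0, 0], 0, 3)   # node A -> [3, 0, 0]
b = increment([0, 0, 0], 1, 2)   # node B -> [0, 2, 0]
c = increment([0, 0, 0], 2, 5)   # node C -> [0, 0, 5]

merged = merge(merge(a, b), c)   # [3, 2, 5]
print(value(merged))             # 10, the same in any merge order
```

Because a node only increments its own slot, max at each position never loses an increment, which is why no coordination is needed.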

Load Shedding – Priority-Based Traffic Control

Load Shedder – 1000 req/sec Capacity

Move the slider to simulate incoming traffic. Threshold is 800 req/sec (80% capacity). Above threshold: P2 (analytics) drops first, then P1 (standard features).

Threshold: 800 req/sec · Capacity: 1000 req/sec
P0 – Payments & Auth (never shed)
P1 – Core features (shed only under critical overload)
P2 – Analytics & Recommendations (first to shed)
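The admission rule behind the simulation can be sketched as a small function. The thresholds below are the illustrative ones from this section (shed P2 above 80% of capacity, shed P1 above 100%):

```python
# Priority-based load shedder sketch. Thresholds are illustrative:
# capacity 1000 req/s, P2 sheds above 80%, P1 sheds above 100%.
CAPACITY = 1000
P2_SHED_THRESHOLD = 0.8 * CAPACITY   # analytics drops first
P1_SHED_THRESHOLD = 1.0 * CAPACITY   # core features drop under critical overload

def admit(priority, current_load):
    """P0 (payments, auth) is never shed; P1/P2 shed by load."""
    if priority == 0:
        return True
    if priority == 2:
        return current_load < P2_SHED_THRESHOLD
    return current_load < P1_SHED_THRESHOLD   # priority == 1

print(admit(0, 2500))  # True:  P0 is always admitted
print(admit(2, 900))   # False: analytics shed past 80% capacity
print(admit(1, 900))   # True:  core features still served
print(admit(1, 1200))  # False: critical overload sheds P1 too
```

In production the `current_load` signal is typically a moving average of request rate or queue depth rather than an instantaneous value.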

Backpressure Patterns

Backpressure prevents fast producers from overwhelming slow consumers. Without it, queues grow without bound until the system runs out of memory.

| Pattern | How It Works | When to Use |
| --- | --- | --- |
| Bounded queues | Reject or block when the queue is full; backpressure propagates upstream | Fast producers, slow consumers in the same system |
| Rate-limiting the producer | Signal the producer to slow down via 429 or flow control | Streaming/async systems with explicit producer control |
| Pull model (Kafka) | Consumer pulls at its own pace; lag is a visible metric | Event streaming, durable log processing |
| Circuit breaker | Stop sending to an overwhelmed service temporarily | Synchronous RPC calls between services |
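The bounded-queue approach can be shown with Python's standard-library `queue.Queue`, whose blocking `put()` is itself the backpressure mechanism (the slow-consumer simulation here is illustrative):

```python
# Bounded-queue backpressure sketch: put() blocks when the queue is
# full, so a fast producer is automatically paced to the consumer.
import queue
import threading

q = queue.Queue(maxsize=5)       # bound: at most 5 items in flight
processed = []

def consumer():
    while True:
        item = q.get()
        if item is None:         # sentinel: stop
            break
        processed.append(item)   # stand-in for slow real work
        q.task_done()

t = threading.Thread(target=consumer)
t.start()

for i in range(100):
    q.put(i)    # blocks whenever 5 items are pending: backpressure
q.put(None)
t.join()

print(len(processed))   # 100: nothing dropped, the producer was paced
```

Compare this with load shedding: backpressure slows the producer and loses nothing, while shedding keeps the producer fast and deliberately drops work.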
kafka_backpressure.py
# Kafka consumer implementing backpressure (kafka-python client)
import time
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'events',
    max_poll_records=10,         # pull at most 10 records per poll
    max_poll_interval_ms=30000,  # 30 s processing timeout before rebalance
    enable_auto_commit=False,    # commit manually after each batch
)

while True:
    # poll() returns {TopicPartition: [records]} -- a small batch
    batch = consumer.poll(timeout_ms=1000)
    for tp, records in batch.items():
        process_records(records)

    # Backpressure is implicit: the consumer only polls again after
    # this batch is processed, so slow processing slows intake.
    consumer.commit()

    # Explicit backpressure: pause and let downstream queues drain
    if queue_depth() > HIGH_WATERMARK:
        time.sleep(0.1)
📊

Kafka Consumer Lag as a Signal

Consumer lag measures how far behind the consumer is from the latest message. High lag means backpressure is occurring: the consumer cannot keep up. Alert on lag > 10,000 messages, and scale out consumer instances to reduce it. Never let lag grow without bound; it indicates a systemic throughput problem.

Saga Pattern – Distributed Transactions

⚠️

Why Not 2PC (Two-Phase Commit)?

2PC blocks resources for the entire transaction duration. At 100 ms per step × 5 microservices, that is 500 ms of lock time per transaction. Under load this produces deadlocks, coordinator failures, and cascading timeouts. 2PC is an anti-pattern for microservices at scale.

Saga breaks a distributed transaction into steps, each with a compensating action that can be run if a later step fails.

Order Saga Simulation

Click "Run Saga" to see a happy path, or "Inject Failure at Step 3" to see compensations run in reverse order.

1
Reserve Inventory
Compensate: Release inventory reservation
2
Charge Payment
Compensate: Issue full refund to customer
3
Create Shipment
Compensate: Cancel shipment with carrier
4
Update Order Status → CONFIRMED
Compensate: Mark order FAILED
| Approach | Consistency | Availability | Suitable For |
| --- | --- | --- | --- |
| Saga (choreography) | Eventual | High (no coordinator) | Microservices with async events |
| Saga (orchestration) | Eventual | High, but the orchestrator is a SPOF | Complex multi-step workflows |
| 2PC | Strong | Low (coordinator blocks) | Single-DB distributed writes only |
| No distributed tx | None | Highest | Idempotent operations only |
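A minimal orchestration-style sketch of the order saga above. The step and compensation names mirror the simulation; `fail_at` is an illustrative failure-injection hook, not part of any real framework:

```python
# Saga orchestration sketch: run steps in order; on failure, run the
# compensations of already-completed steps in reverse order.
def run_saga(steps, fail_at=None):
    completed, log = [], []
    for i, (name, compensate) in enumerate(steps, start=1):
        if i == fail_at:                      # injected failure
            log.append(f"FAILED: {name}")
            for _, done_comp in reversed(completed):
                log.append(f"COMPENSATE: {done_comp}")
            return log
        log.append(f"OK: {name}")
        completed.append((name, compensate))
    return log

order_saga = [
    ("Reserve Inventory", "Release inventory reservation"),
    ("Charge Payment",    "Issue full refund"),
    ("Create Shipment",   "Cancel shipment with carrier"),
    ("Confirm Order",     "Mark order FAILED"),
]

print(run_saga(order_saga))              # happy path: four OK entries
print(run_saga(order_saga, fail_at=3))
# ['OK: Reserve Inventory', 'OK: Charge Payment', 'FAILED: Create Shipment',
#  'COMPENSATE: Issue full refund', 'COMPENSATE: Release inventory reservation']
```

Note the reverse order: the refund runs before the inventory release, undoing the most recent side effect first.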

Outbox Pattern – Reliable Event Publishing

The problem: you need to update a database AND publish a message to Kafka atomically. These are two different systems, and a single atomic operation across them is impossible without a coordinator.

💥

The Dual-Write Problem

If the DB write succeeds but the Kafka publish fails, the order is marked paid but no email is sent. If the publish succeeds but the DB write fails, an email is sent but the order is never updated. Either way: data inconsistency. Dual writes without a transactional guarantee are unreliable.

✓

Solution: Outbox Table in the Same Database

Write both the business record AND an outbox event in a single DB transaction. A separate process (the "outbox poller", or a CDC tool like Debezium) reads the outbox table and publishes to Kafka. The DB transaction is the atomic unit; no cross-system coordination is needed.

outbox_pattern.sql
-- Atomic: update order + outbox event in ONE transaction
BEGIN;
  UPDATE orders SET status='PAID' WHERE id=$1;
  INSERT INTO outbox (event_type, payload, status)
  VALUES ('ORDER_PAID', '{"order_id": 123}', 'PENDING');
COMMIT;

-- Separate process: read outbox and publish to Kafka
SELECT * FROM outbox WHERE status='PENDING' LIMIT 100
FOR UPDATE SKIP LOCKED;  -- Multiple pollers work safely

-- For each row: publish to Kafka, then mark published
UPDATE outbox SET status='PUBLISHED', published_at=now()
WHERE id=$1;
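A runnable sketch of the poller side. The SQL above assumes Postgres; to stay self-contained this demo uses in-memory SQLite and a fake `publish()` standing in for a Kafka producer, so the `FOR UPDATE SKIP LOCKED` concurrency guard is omitted:

```python
# Outbox poller sketch (SQLite + fake publisher as stand-ins).
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE outbox (
    id INTEGER PRIMARY KEY, event_type TEXT,
    payload TEXT, status TEXT DEFAULT 'PENDING')""")

# Business write and outbox insert share one transaction:
with db:
    db.execute("INSERT INTO outbox (event_type, payload) "
               "VALUES ('ORDER_PAID', '{\"order_id\": 123}')")

published = []

def publish(event_type, payload):
    published.append((event_type, payload))   # stand-in for a Kafka producer

def poll_outbox(batch_size=100):
    rows = db.execute(
        "SELECT id, event_type, payload FROM outbox "
        "WHERE status='PENDING' LIMIT ?", (batch_size,)).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, payload)          # at-least-once: a crash here
                                              # means the row is retried
        db.execute("UPDATE outbox SET status='PUBLISHED' WHERE id=?", (row_id,))
    db.commit()

poll_outbox()
print(published)   # [('ORDER_PAID', '{"order_id": 123}')]
```

Because the publish happens before the status update, the pattern delivers at-least-once; consumers must therefore be idempotent.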

When to Use Each Pattern

| Pattern | Use When | Don't Use When |
| --- | --- | --- |
| CRDT | Multi-master counters, collaborative editing, offline-first | Strong consistency required (financial records) |
| Load Shedding | Protecting against overload with priority tiers | All requests are equal priority (no P0/P2 distinction) |
| Saga | Distributed transactions across microservices | Single-DB operations (use a standard transaction) |
| Outbox | Reliable event publishing from DB writes | Very high-volume events (>100K/sec; use CDC instead) |
| Bulkhead | Isolating failure domains, multiple client types | Single-service, single client type (adds complexity) |
| Backpressure | Fast producer / slow consumer, streaming pipelines | Synchronous request-response with known SLAs |
| Circuit Breaker | Cascading failure prevention across service calls | Idempotent retries with exponential backoff suffice |
| Sidecar | Cross-cutting concerns (observability, mTLS, proxy) | Simple monolith (adds infrastructure overhead) |

Quiz – 5 Questions

1. A G-Counter CRDT on 3 nodes has vectors: A=[3,0,0], B=[0,2,0], C=[0,0,5]. After merging all nodes, the global count is:
2. Load shedding drops requests when the system is near capacity. The correct priority to drop first is:
3. The Saga pattern handles a failure in step 3 of a 5-step distributed transaction by:
4. The Outbox pattern solves which problem?
5. Backpressure in Kafka consumer groups means: