Eight production-grade patterns: CRDT for conflict-free replication, load shedding under overload, backpressure in streaming systems, Circuit Breakers, Bulkhead, Saga, Outbox Pattern, and Sidecar.
These patterns solve real distributed systems problems. Each one represents a specific class of failure or complexity that naive architectures cannot handle.
Conflict-free Replicated Data Types: data structures that merge automatically without conflicts. No coordination or consensus is needed. Used by Figma (collaborative editing), Riak, and Redis.
Load shedding: deliberately dropping low-priority work to protect the system under overload. Prioritize P0 (payments, auth) over P2 (analytics). A degraded experience beats total unavailability.
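A minimal load-shedding sketch in Python. The `Priority` enum, thresholds, and `admit` helper are illustrative, not from the original text; the idea is simply that each tier gets shed at a different measured-load level, with P0 admitted all the way to capacity:

```python
from enum import IntEnum

class Priority(IntEnum):
    P0 = 0  # payments, auth: shed last
    P1 = 1
    P2 = 2  # analytics: shed first

# Each tier is shed once measured load crosses its threshold;
# only P0 traffic is admitted all the way to full capacity.
SHED_THRESHOLD = {Priority.P0: 1.00, Priority.P1: 0.95, Priority.P2: 0.80}

def admit(priority: Priority, load: float) -> bool:
    """Return True to serve the request, False to shed it (e.g. HTTP 503)."""
    return load < SHED_THRESHOLD[priority]
```

Shed requests should get a fast, cheap rejection (a 503 with `Retry-After`) rather than queueing, so the saved capacity actually goes to P0 work.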
The producer slows down when the consumer can't keep up, preventing unbounded queues. Kafka's pull model is natural backpressure: consumers control their own poll rate and lag is visible.
Isolate failures, like a ship's bulkhead compartments that prevent one leak from sinking the vessel. Use separate thread pools per service type so a slow vendor doesn't stall everything else.
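A bulkhead sketch using the standard library: one thread pool per downstream dependency, so a slow vendor can exhaust only its own workers. The pool names and `call` helper are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

# One pool per dependency: a hung "slow_vendor" call can tie up at
# most 4 threads, leaving the payments pool untouched.
POOLS = {
    "payments": ThreadPoolExecutor(max_workers=10),
    "slow_vendor": ThreadPoolExecutor(max_workers=4),
}

def call(service: str, fn, *args):
    """Submit work to the pool owned by this service only."""
    return POOLS[service].submit(fn, *args)
```

The same isolation can be done with semaphores or per-dependency connection pools; the key is that each failure domain has its own bounded resource.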
Increment counters on individual nodes independently. Merging shows how a CRDT resolves distributed increments: the merged count at each position is the max of the replicas' values at that position. The global total is always correct after merge.
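The merge rule above is a grow-only counter (G-Counter). A minimal sketch, with an illustrative class name, where each node increments only its own slot and merge takes the element-wise max, making it commutative, associative, and idempotent:

```python
class GCounter:
    """Grow-only counter CRDT: one slot per node."""

    def __init__(self, node_id: int, n_nodes: int):
        self.node_id = node_id
        self.counts = [0] * n_nodes

    def increment(self, amount: int = 1) -> None:
        self.counts[self.node_id] += amount  # touch only our own slot

    def merge(self, other: "GCounter") -> None:
        # Element-wise max: safe to apply in any order, any number of times
        self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]

    def value(self) -> int:
        return sum(self.counts)  # global total after merge
```

Because merges converge regardless of delivery order or duplication, replicas can exchange state over any gossip channel with no coordination.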
Backpressure prevents fast producers from overwhelming slow consumers. Without it, queues grow unboundedly until the system runs out of memory.
| Pattern | How It Works | When to Use |
|---|---|---|
| Bounded queues | Reject or block when queue is full (back-pressure propagates upstream) | Fast producers, slow consumers in same system |
| Rate limiting producer | Signal producer to slow down via 429 or flow control | Streaming/async systems with explicit producer control |
| Pull model (Kafka) | Consumer pulls at its own pace; lag is a visible metric | Event streaming, durable log processing |
| Circuit breaker | Stop sending to overwhelmed service temporarily | Synchronous RPC calls between services |
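The bounded-queue row can be sketched with the standard library, where a full queue blocks briefly and then rejects, propagating pressure to the producer. The small bound and the `produce` helper are illustrative:

```python
import queue

q = queue.Queue(maxsize=2)  # small bound for illustration

def produce(item) -> bool:
    """Block briefly when the queue is full, then reject (shed)."""
    try:
        q.put(item, timeout=0.1)
        return True
    except queue.Full:
        return False  # caller retries, sheds, or slows down
```

In a real system the rejection would surface upstream as a 429/503 or as a blocked send, which is exactly how back-pressure propagates.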
```python
# Kafka consumer implementing backpressure (kafka-python client)
import time
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'events',
    max_poll_records=10,         # process at most 10 records per poll
    max_poll_interval_ms=30000,  # 30s processing timeout
    enable_auto_commit=False,    # commit manually after each batch
)

while True:
    # poll() returns {TopicPartition: [records]}, at most
    # max_poll_records in total. The next poll happens only after
    # this batch is processed, so slow processing slows the poll
    # rate: that is the backpressure.
    batch = consumer.poll(timeout_ms=1000)
    for records in batch.values():
        process_records(records)
    consumer.commit()  # commit after each batch

    # Explicit backpressure: let the downstream queue drain
    if queue_depth() > HIGH_WATERMARK:
        time.sleep(0.1)
```
Consumer lag = how far behind the consumer is from the latest message. High lag means backpressure is occurring: the consumer cannot keep up. Alert on lag > 10,000 messages. Scale out consumer instances to reduce lag. Never let lag grow unboundedly; it indicates a systemic throughput problem.
2PC blocks resources for the entire transaction duration. At 100ms per step × 5 microservices = 500ms lock time. Under load: deadlocks, coordinator failures, and cascading timeouts. 2PC is an anti-pattern for microservices at scale.
Saga breaks a distributed transaction into steps, each with a compensating action that can be run if a later step fails.
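An orchestration-style saga can be sketched as a list of (action, compensation) pairs; when a step fails, the completed steps are compensated in reverse order. The `run_saga` helper and step names are illustrative:

```python
def run_saga(steps) -> bool:
    """steps: list of (action, compensation) callables.

    Runs actions in order; on failure, runs the compensations of all
    completed steps in reverse, then reports failure.
    """
    completed = []
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception:
            for undo in reversed(completed):  # roll back what succeeded
                undo()
            return False
    return True
```

Compensations are business-level undos (refund a charge, release a reservation), not database rollbacks, so they must themselves be idempotent and retryable.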
| Approach | Consistency | Availability | Suitable For |
|---|---|---|---|
| Saga (choreography) | Eventual | High (no coordinator) | Microservices with async events |
| Saga (orchestration) | Eventual | High, but the orchestrator is a SPOF | Complex multi-step workflows |
| 2PC | Strong | Low (coordinator blocks) | Single-DB distributed writes only |
| No distributed tx | None | Highest | Idempotent operations only |
The problem: you need to update a database AND publish a message to Kafka atomically. These are two different systems; a single atomic operation is impossible without a coordinator.
If the DB write succeeds but the Kafka publish fails, the order is marked paid but no email is sent. If the Kafka publish succeeds but the DB write fails, the email is sent but the order is not updated. Either way: data inconsistency. Dual writes without a transaction guarantee are unreliable.
Write both the business record AND an outbox event in a single DB transaction. A separate process (the "outbox poller" or a CDC tool like Debezium) reads the outbox table and publishes to Kafka. The DB transaction is the atomic unit; no cross-system coordination is needed.
```sql
-- Atomic: update order + outbox event in ONE transaction
BEGIN;
UPDATE orders SET status = 'PAID' WHERE id = $1;
INSERT INTO outbox (event_type, payload, status)
VALUES ('ORDER_PAID', '{"order_id": 123}', 'PENDING');
COMMIT;

-- Separate process: read outbox and publish to Kafka
SELECT * FROM outbox
WHERE status = 'PENDING'
LIMIT 100
FOR UPDATE SKIP LOCKED;  -- multiple pollers work safely

-- For each row: publish to Kafka, then mark published
UPDATE outbox SET status = 'PUBLISHED', published_at = now()
WHERE id = $1;
```
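The poller side can be sketched in Python with in-memory stand-ins for the outbox table and the Kafka producer (all names here are illustrative; in production the `SELECT ... FOR UPDATE SKIP LOCKED` above plays the role of claiming pending rows):

```python
# In-memory stand-ins for the outbox table and the Kafka topic
outbox = [
    {"id": 1, "event_type": "ORDER_PAID", "status": "PENDING"},
    {"id": 2, "event_type": "ORDER_SHIPPED", "status": "PENDING"},
]
published = []

def poll_once() -> None:
    """One poller pass: publish each pending event, then mark it."""
    for row in outbox:
        if row["status"] == "PENDING":
            # Publish first, then mark. A crash between the two
            # re-publishes the event on the next pass, so delivery is
            # at-least-once and consumers must be idempotent.
            published.append(row["event_type"])  # stand-in for Kafka send
            row["status"] = "PUBLISHED"          # stand-in for the UPDATE
```

Publishing before marking is what makes the outbox at-least-once rather than at-most-once; deduplication is pushed to consumers (e.g. keyed on the outbox `id`).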
| Pattern | Use When | Don't Use When |
|---|---|---|
| CRDT | Multi-master counters, collaborative editing, offline-first | Strong consistency required (financial records) |
| Load Shedding | Protecting against overload with priority tiers | All requests are equal priority (no P0/P2 distinction) |
| Saga | Distributed transactions across microservices | Single-DB operations (use a standard transaction) |
| Outbox | Reliable event publishing from DB writes | Very high-volume events (>100K/sec: use CDC instead) |
| Bulkhead | Isolating failure domains, multiple client types | Single-service, single client type (adds complexity) |
| Backpressure | Fast producer / slow consumer, streaming pipelines | Synchronous request-response with known SLAs |
| Circuit Breaker | Cascading failure prevention across service calls | Idempotent retries with exponential backoff suffice |
| Sidecar | Cross-cutting concerns (observability, mTLS, proxy) | Simple monolith (adds infrastructure overhead) |