Eight production-grade patterns: CRDT for conflict-free replication, load shedding under overload, backpressure in streaming systems, Circuit Breakers, Bulkhead, Saga, Outbox Pattern, and Sidecar.
These patterns solve real distributed systems problems. Each one represents a specific class of failure or complexity that naive architectures cannot handle.
Conflict-free Replicated Data Types: data structures that merge automatically without conflicts. No coordination or consensus is needed. Used by Figma (collaborative editing), Riak, and Redis.
Load shedding: deliberately dropping low-priority work to protect the system under overload. Prioritize P0 (payments, auth) over P2 (analytics). A degraded experience beats total unavailability.
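A minimal load-shedding sketch in Python. The `Priority` enum, thresholds, and `admit` helper are illustrative, not from the original text; the idea is simply that each tier gets shed at a different measured-load level, with P0 admitted all the way to capacity:

```python
from enum import IntEnum

class Priority(IntEnum):
    P0 = 0  # payments, auth: shed last
    P1 = 1
    P2 = 2  # analytics: shed first

# Each tier is shed once measured load crosses its threshold;
# only P0 traffic is admitted all the way to full capacity.
SHED_THRESHOLD = {Priority.P0: 1.00, Priority.P1: 0.95, Priority.P2: 0.80}

def admit(priority: Priority, load: float) -> bool:
    """Return True to serve the request, False to shed it (e.g. HTTP 503)."""
    return load < SHED_THRESHOLD[priority]
```

Shed requests should get a fast, cheap rejection (a 503 with `Retry-After`) rather than queueing, so the saved capacity actually goes to P0 work.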
The producer slows down when the consumer can't keep up, preventing unbounded queues. Kafka's pull model is natural backpressure: consumers control their own poll rate and lag is visible.
Isolate failures, like a ship's bulkhead compartments that prevent one leak from sinking the vessel. Use separate thread pools per service type so a slow vendor doesn't stall everything else.
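A bulkhead sketch using the standard library: one thread pool per downstream dependency, so a slow vendor can exhaust only its own workers. The pool names and `call` helper are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

# One pool per dependency: a hung "slow_vendor" call can tie up at
# most 4 threads, leaving the payments pool untouched.
POOLS = {
    "payments": ThreadPoolExecutor(max_workers=10),
    "slow_vendor": ThreadPoolExecutor(max_workers=4),
}

def call(service: str, fn, *args):
    """Submit work to the pool owned by this service only."""
    return POOLS[service].submit(fn, *args)
```

The same isolation can be done with semaphores or per-dependency connection pools; the key is that each failure domain has its own bounded resource.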
Increment counters on individual nodes independently. Merging shows how a CRDT resolves distributed increments: the merged count at each position is the max of the replicas' values at that position. The global total is always correct after merge.
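The merge rule above is a grow-only counter (G-Counter). A minimal sketch, with an illustrative class name, where each node increments only its own slot and merge takes the element-wise max, making it commutative, associative, and idempotent:

```python
class GCounter:
    """Grow-only counter CRDT: one slot per node."""

    def __init__(self, node_id: int, n_nodes: int):
        self.node_id = node_id
        self.counts = [0] * n_nodes

    def increment(self, amount: int = 1) -> None:
        self.counts[self.node_id] += amount  # touch only our own slot

    def merge(self, other: "GCounter") -> None:
        # Element-wise max: safe to apply in any order, any number of times
        self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]

    def value(self) -> int:
        return sum(self.counts)  # global total after merge
```

Because merges converge regardless of delivery order or duplication, replicas can exchange state over any gossip channel with no coordination.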
Backpressure prevents fast producers from overwhelming slow consumers. Without it, queues grow unboundedly until the system runs out of memory.
| Pattern | How It Works | When to Use |
|---|---|---|
| Bounded queues | Reject or block when queue is full (back-pressure propagates upstream) | Fast producers, slow consumers in same system |
| Rate limiting producer | Signal producer to slow down via 429 or flow control | Streaming/async systems with explicit producer control |
| Pull model (Kafka) | Consumer pulls at its own pace; lag is a visible metric | Event streaming, durable log processing |
| Circuit breaker | Stop sending to overwhelmed service temporarily | Synchronous RPC calls between services |
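The bounded-queue row can be sketched with the standard library, where a full queue blocks briefly and then rejects, propagating pressure to the producer. The small bound and the `produce` helper are illustrative:

```python
import queue

q = queue.Queue(maxsize=2)  # small bound for illustration

def produce(item) -> bool:
    """Block briefly when the queue is full, then reject (shed)."""
    try:
        q.put(item, timeout=0.1)
        return True
    except queue.Full:
        return False  # caller retries, sheds, or slows down
```

In a real system the rejection would surface upstream as a 429/503 or as a blocked send, which is exactly how back-pressure propagates.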
```python
# Kafka consumer implementing backpressure (kafka-python client)
import time
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'events',
    max_poll_records=10,         # process at most 10 records per poll
    max_poll_interval_ms=30000,  # 30s processing timeout
    enable_auto_commit=False,    # commit manually after each batch
)

while True:
    # poll() returns {TopicPartition: [records]}, at most
    # max_poll_records in total. The next poll happens only after
    # this batch is processed, so slow processing slows the poll
    # rate: that is the backpressure.
    batch = consumer.poll(timeout_ms=1000)
    for records in batch.values():
        process_records(records)
    consumer.commit()  # commit after each batch

    # Explicit backpressure: let the downstream queue drain
    if queue_depth() > HIGH_WATERMARK:
        time.sleep(0.1)
```
Consumer lag = how far behind the consumer is from the latest message. High lag means backpressure is occurring: the consumer cannot keep up. Alert on lag > 10,000 messages. Scale out consumer instances to reduce lag. Never let lag grow unboundedly; it indicates a systemic throughput problem.
2PC blocks resources for the entire transaction duration. At 100ms per step × 5 microservices = 500ms lock time. Under load: deadlocks, coordinator failures, and cascading timeouts. 2PC is an anti-pattern for microservices at scale.
Saga breaks a distributed transaction into steps, each with a compensating action that can be run if a later step fails.
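An orchestration-style saga can be sketched as a list of (action, compensation) pairs; when a step fails, the completed steps are compensated in reverse order. The `run_saga` helper and step names are illustrative:

```python
def run_saga(steps) -> bool:
    """steps: list of (action, compensation) callables.

    Runs actions in order; on failure, runs the compensations of all
    completed steps in reverse, then reports failure.
    """
    completed = []
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception:
            for undo in reversed(completed):  # roll back what succeeded
                undo()
            return False
    return True
```

Compensations are business-level undos (refund a charge, release a reservation), not database rollbacks, so they must themselves be idempotent and retryable.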
| Approach | Consistency | Availability | Suitable For |
|---|---|---|---|
| Saga (choreography) | Eventual | High (no coordinator) | Microservices with async events |
| Saga (orchestration) | Eventual | High, but the orchestrator is a SPOF | Complex multi-step workflows |
| 2PC | Strong | Low (coordinator blocks) | Single-DB distributed writes only |
| No distributed tx | None | Highest | Idempotent operations only |
The problem: you need to update a database AND publish a message to Kafka atomically. These are two different systems; a single atomic operation is impossible without a coordinator.
If the DB write succeeds but the Kafka publish fails, the order is marked paid but no email is sent. If the Kafka publish succeeds but the DB write fails, the email is sent but the order is not updated. Either way: data inconsistency. Dual writes without a transaction guarantee are unreliable.
Write both the business record AND an outbox event in a single DB transaction. A separate process (the "outbox poller" or a CDC tool like Debezium) reads the outbox table and publishes to Kafka. The DB transaction is the atomic unit; no cross-system coordination is needed.
```sql
-- Atomic: update order + outbox event in ONE transaction
BEGIN;
UPDATE orders SET status = 'PAID' WHERE id = $1;
INSERT INTO outbox (event_type, payload, status)
VALUES ('ORDER_PAID', '{"order_id": 123}', 'PENDING');
COMMIT;

-- Separate process: read outbox and publish to Kafka
SELECT * FROM outbox
WHERE status = 'PENDING'
LIMIT 100
FOR UPDATE SKIP LOCKED;  -- multiple pollers work safely

-- For each row: publish to Kafka, then mark published
UPDATE outbox SET status = 'PUBLISHED', published_at = now()
WHERE id = $1;
```
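The poller side can be sketched in Python with in-memory stand-ins for the outbox table and the Kafka producer (all names here are illustrative; in production the `SELECT ... FOR UPDATE SKIP LOCKED` above plays the role of claiming pending rows):

```python
# In-memory stand-ins for the outbox table and the Kafka topic
outbox = [
    {"id": 1, "event_type": "ORDER_PAID", "status": "PENDING"},
    {"id": 2, "event_type": "ORDER_SHIPPED", "status": "PENDING"},
]
published = []

def poll_once() -> None:
    """One poller pass: publish each pending event, then mark it."""
    for row in outbox:
        if row["status"] == "PENDING":
            # Publish first, then mark. A crash between the two
            # re-publishes the event on the next pass, so delivery is
            # at-least-once and consumers must be idempotent.
            published.append(row["event_type"])  # stand-in for Kafka send
            row["status"] = "PUBLISHED"          # stand-in for the UPDATE
```

Publishing before marking is what makes the outbox at-least-once rather than at-most-once; deduplication is pushed to consumers (e.g. keyed on the outbox `id`).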
| Pattern | Use When | Don't Use When |
|---|---|---|
| CRDT | Multi-master counters, collaborative editing, offline-first | Strong consistency required (financial records) |
| Load Shedding | Protecting against overload with priority tiers | All requests are equal priority (no P0/P2 distinction) |
| Saga | Distributed transactions across microservices | Single-DB operations (use a standard transaction) |
| Outbox | Reliable event publishing from DB writes | Very high-volume events (>100K/sec: use CDC instead) |
| Bulkhead | Isolating failure domains, multiple client types | Single-service, single client type (adds complexity) |
| Backpressure | Fast producer / slow consumer, streaming pipelines | Synchronous request-response with known SLAs |
| Circuit Breaker | Cascading failure prevention across service calls | Idempotent retries with exponential backoff suffice |
| Sidecar | Cross-cutting concerns (observability, mTLS, proxy) | Simple monolith (adds infrastructure overhead) |