Day 29 Exercises
Advanced System Design Patterns
Apply CRDTs, rate limiting, load shedding, and backpressure to realistic distributed-systems challenges.
Rate Limiter Design
An API gateway must limit each user to 100 requests per minute. The system has 10 gateway servers. Design a rate limiter that works correctly across all 10 servers.
Tasks
- Why does an independent per-server rate limiter fail when one user's requests are spread across several gateway servers?
- Design a Redis-based token bucket rate limiter. Which Redis operations does each request perform?
- What is the sliding window log approach and how does it compare to token bucket?
- What response (status code and headers) should you return when the limit is exceeded?
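As a starting point for the token bucket task, here is a minimal in-memory sketch in Python. It is not a Redis implementation: the `store` dictionary is a hypothetical stand-in for state shared by all 10 gateways, and the names `TokenBucketLimiter`, `allow`, and `retry_after` are assumptions made for illustration. In a Redis deployment the read, refill, and decrement steps would need to run atomically (for example inside a server-side Lua script), which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    tokens: float       # tokens currently available
    updated_at: float   # timestamp (seconds) of the last refill


class TokenBucketLimiter:
    """Token bucket keyed by user ID (illustrative, single-process only)."""

    def __init__(self, capacity=100, refill_per_sec=100 / 60):
        self.capacity = capacity              # 100 requests of burst
        self.refill_per_sec = refill_per_sec  # 100 tokens per minute
        self.store = {}  # user_id -> Bucket; stand-in for a shared store

    def allow(self, user_id, now):
        bucket = self.store.get(user_id)
        if bucket is None:
            bucket = Bucket(tokens=float(self.capacity), updated_at=now)
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - bucket.updated_at)
        bucket.tokens = min(self.capacity,
                            bucket.tokens + elapsed * self.refill_per_sec)
        bucket.updated_at = now
        allowed = bucket.tokens >= 1.0
        if allowed:
            bucket.tokens -= 1.0
        self.store[user_id] = bucket
        return allowed

    def retry_after(self, user_id):
        """Seconds until one token is available (0.0 if one already is)."""
        bucket = self.store.get(user_id)
        if bucket is None or bucket.tokens >= 1.0:
            return 0.0
        return (1.0 - bucket.tokens) / self.refill_per_sec
```

Passing `now` explicitly keeps the sketch deterministic. The value from `retry_after` is the kind of number a gateway might place in a `Retry-After` header on an HTTP 429 response.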
CRDT for Collaborative Editing
Two users edit a shared Google Doc simultaneously. Alice inserts "hello" at position 5. Bob deletes the character at position 3. Both users are offline when they make their edits. Design the merge.
Tasks
- Why do Operational Transformation (OT) systems typically rely on a central server to order operations?
- How does a CRDT-based approach handle offline concurrent edits?
- What is a "tombstone" in the context of CRDT deletion?
- Design a simple sequence CRDT: how do you represent a character insertion so it can be merged without conflicts?
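One possible shape for the sequence CRDT task is sketched below in Python. It is a simplified, state-based design, not the algorithm of any particular editor: each character gets a unique `(counter, site)` ID and records the ID of the character it was inserted after, deletions only add the ID to a tombstone set, and merging is a set union. The class name `SequenceCRDT`, the `ROOT` sentinel, the base text "abcdefgh", and the 0-based reading of "position" are all assumptions made for illustration.

```python
import copy
from dataclasses import dataclass, field

ROOT = (0, "")  # sentinel ID marking the start of the document


@dataclass
class SequenceCRDT:
    site: str
    counter: int = 0
    # element ID -> (parent ID, character); IDs are (counter, site) pairs
    elements: dict = field(default_factory=dict)
    tombstones: set = field(default_factory=set)

    def insert_after(self, parent_id, char):
        self.counter += 1
        new_id = (self.counter, self.site)
        self.elements[new_id] = (parent_id, char)
        return new_id

    def delete(self, elem_id):
        # The element stays in `elements` so later inserts can still anchor to it.
        self.tombstones.add(elem_id)

    def merge(self, other):
        # Union of both sets: commutative, associative, and idempotent.
        self.elements.update(other.elements)
        self.tombstones |= other.tombstones
        self.counter = max(self.counter, other.counter)

    def text(self):
        children = {}
        for elem_id, (parent_id, _) in self.elements.items():
            children.setdefault(parent_id, []).append(elem_id)
        ordered, stack = [], [ROOT]
        while stack:  # pre-order walk; siblings visited highest ID first
            node = stack.pop()
            if node != ROOT:
                ordered.append(node)
            stack.extend(sorted(children.get(node, [])))
        return "".join(self.elements[i][1] for i in ordered
                       if i not in self.tombstones)


# Shared starting document, replicated to both users before they go offline.
base = SequenceCRDT(site="base")
ids, parent = [], ROOT
for ch in "abcdefgh":
    parent = base.insert_after(parent, ch)
    ids.append(parent)
alice, bob = copy.deepcopy(base), copy.deepcopy(base)
alice.site, bob.site = "alice", "bob"

parent = ids[4]              # Alice: insert "hello" after index 4
for ch in "hello":
    parent = alice.insert_after(parent, ch)
bob.delete(ids[3])           # Bob: delete the character at index 3

alice.merge(bob)
bob.merge(alice)
print(alice.text(), bob.text())  # both replicas print "abcehellofgh"
```

Because positions are expressed as stable IDs rather than integer offsets, Bob's deletion does not shift the anchor of Alice's insertion, and the merge order does not matter. The cost is that tombstones accumulate until some separate garbage-collection scheme removes them.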
Load Shedding Under Extreme Traffic
A service normally handles 10,000 RPS but receives a sudden spike to 50,000 RPS (e.g., a viral moment). Rather than crashing, it must serve 100% of critical traffic and gracefully shed lower-priority traffic.
Tasks
- Define P0, P1, P2 traffic tiers for a social media platform.
- Design the load shedding algorithm. What metric triggers shedding, and what gets shed first?
- How do you implement load shedding without adding latency to accepted requests?
- What user experience do you provide for requests that are shed? Compare fail-open and fail-closed behavior.
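To make the algorithm task concrete, the following Python sketch shows one way a tiered admission check could look. The trigger metric (in-flight requests divided by a configured maximum), the threshold values in `SHED_AT`, and the names `Tier`, `LoadShedder`, `try_admit`, and `release` are all assumptions chosen for illustration. The sketch also assumes that P0 traffic alone fits within capacity, so P0 is never rejected here, and it is not thread-safe.

```python
from enum import IntEnum


class Tier(IntEnum):
    P0 = 0  # critical
    P1 = 1  # important
    P2 = 2  # best effort


# Hypothetical thresholds: utilization at or above which a tier is shed.
# P0 has no entry, so this check never sheds it.
SHED_AT = {Tier.P2: 0.70, Tier.P1: 0.90}


class LoadShedder:
    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self.in_flight = 0

    def try_admit(self, tier):
        """Constant-time admission decision made before any real work starts."""
        utilization = self.in_flight / self.max_in_flight
        threshold = SHED_AT.get(tier)
        if threshold is not None and utilization >= threshold:
            return False  # shed: the caller should reject the request cheaply
        self.in_flight += 1
        return True

    def release(self):
        """Call when an admitted request finishes."""
        self.in_flight = max(0, self.in_flight - 1)
```

The decision is a dictionary lookup and one comparison, so accepted requests pay almost nothing for it. Other trigger metrics, such as queue wait time or CPU utilization, would fit the same structure.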
Backpressure in a Streaming Pipeline
Consider a log ingestion pipeline: [App Servers] → [Kafka] → [Stream Processor] → [Elasticsearch]. App servers produce 1M logs/second, but Elasticsearch can index only 100K logs/second. Without backpressure, what happens?
Tasks
- Without backpressure, which failure modes occur (growing Kafka consumer lag, out-of-memory errors, etc.)?
- Design a backpressure mechanism at each stage.
- How does Kafka consumer lag act as a natural backpressure signal?
- Design a "load shedding" fallback: when Elasticsearch is overwhelmed, which logs do you drop first?
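The arithmetic behind this scenario, along with one possible fallback policy, can be sketched in Python as follows. The production and indexing rates come from the scenario; everything else is an assumption made for illustration, including the lag thresholds, the choice to drop by severity level, and the names `lag_after`, `min_level_to_keep`, `should_index`, and `offer`. No Kafka or Elasticsearch client is used.

```python
import queue

PRODUCE_RATE = 1_000_000  # logs/second produced by the app servers
INDEX_RATE = 100_000      # logs/second Elasticsearch can index


def lag_after(seconds, produce_rate=PRODUCE_RATE, consume_rate=INDEX_RATE):
    """Backlog (in logs) after `seconds` of sustained load with no shedding."""
    return max(0, (produce_rate - consume_rate) * seconds)


# Hypothetical policy: the higher the lag, the higher the minimum severity kept.
SEVERITY = {"DEBUG": 0, "INFO": 1, "WARN": 2, "ERROR": 3}
LAG_THRESHOLDS = [
    (10_000_000, "ERROR"),
    (5_000_000, "WARN"),
    (1_000_000, "INFO"),
]


def min_level_to_keep(lag):
    for threshold, level in LAG_THRESHOLDS:
        if lag >= threshold:
            return level
    return "DEBUG"  # lag is low enough to keep everything


def should_index(log_level, lag):
    return SEVERITY[log_level] >= SEVERITY[min_level_to_keep(lag)]


# A bounded in-memory stage: a full queue is the backpressure signal.
stage = queue.Queue(maxsize=2)


def offer(item):
    """Try to enqueue without blocking; False tells the caller to slow down."""
    try:
        stage.put_nowait(item)
        return True
    except queue.Full:
        return False
```

At these rates the backlog grows by 900,000 logs every second, so `lag_after(60)` is 54,000,000. A bounded stage like `stage` turns that growth into an explicit signal to the upstream producer instead of unbounded memory use.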