Day 29 Exercises

Advanced System Design Patterns

Apply CRDTs, rate limiting, load shedding, and backpressure to real distributed systems challenges.

Exercise 1 · 🟡 Easy · 15 min
Rate Limiter Design
An API gateway must limit each user to 100 requests per minute. The system has 10 gateway servers. Design a rate limiter that works correctly across all 10 servers.

Tasks

  • Why does a per-server rate limiter fail for distributed systems?
  • Design a Redis-based token bucket rate limiter. What are the Redis operations?
  • What is the sliding window log approach and how does it compare to token bucket?
  • What response should you return when the limit is exceeded?
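To make the token-bucket task concrete, here is a minimal in-memory sketch of the refill-and-consume logic. In the distributed version the `(tokens, last_refill)` pair would live in Redis under a per-user key, with the same two steps executed atomically (e.g. in a single Lua script) so all 10 gateways share one bucket; the class and parameter names here are hypothetical.

```python
import time

class TokenBucket:
    """In-memory token bucket; a stand-in for a shared Redis bucket,
    where refill + consume would run atomically server-side."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity        # max tokens (burst size)
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill based on elapsed time, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# 100 requests/minute: capacity 100, refilling 100/60 ≈ 1.67 tokens/sec
bucket = TokenBucket(capacity=100, refill_rate=100 / 60)
```

When `allow()` returns `False`, the gateway would typically answer `429 Too Many Requests` with a `Retry-After` header rather than silently dropping the call.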
Exercise 2 · 🔴 Medium · 20 min
CRDT for Collaborative Editing
Two users edit a shared Google Doc simultaneously. Alice inserts "hello" at position 5. Bob deletes the character at position 3. Both operations happen while offline. Design the merge.

Tasks

  • Why do Operational Transforms (OT) require a central server to order operations?
  • How does a CRDT-based approach handle offline concurrent edits?
  • What is a "tombstone" in the context of CRDT deletion?
  • Design a simple sequence CRDT: how do you represent a character insertion so it can be merged without conflicts?
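As a starting point for the last task, here is a toy Logoot-style sequence CRDT (all names hypothetical, and deliberately simpler than what a real editor like Google Docs uses): each character carries an immutable `(position, site)` identifier, deletes leave tombstones instead of removing entries, and merging two replicas is a set union keyed by identifier, so concurrent offline edits converge without a central server.

```python
import bisect
from dataclasses import dataclass, field, replace

@dataclass(order=True)
class Char:
    pos: float     # fractional position between left and right neighbors
    site: str      # replica id, breaks ties when positions collide
    ch: str = field(compare=False)
    deleted: bool = field(default=False, compare=False)  # tombstone flag

class SequenceCRDT:
    def __init__(self, site: str):
        self.site = site
        self.chars: list[Char] = []   # kept sorted by (pos, site)

    def _visible(self) -> list[Char]:
        return [c for c in self.chars if not c.deleted]

    def insert(self, index: int, ch: str) -> None:
        vis = self._visible()
        left = vis[index - 1].pos if index > 0 else 0.0
        right = vis[index].pos if index < len(vis) else 1.0
        bisect.insort(self.chars, Char(pos=(left + right) / 2, site=self.site, ch=ch))

    def delete(self, index: int) -> None:
        self._visible()[index].deleted = True  # tombstone, never physically removed

    def merge(self, other: "SequenceCRDT") -> None:
        seen = {(c.pos, c.site): c for c in self.chars}
        for c in other.chars:
            key = (c.pos, c.site)
            if key not in seen:
                clone = replace(c)            # copy, replicas stay independent
                bisect.insort(self.chars, clone)
                seen[key] = clone
            elif c.deleted:
                seen[key].deleted = True      # a delete anywhere wins everywhere

    def text(self) -> str:
        return "".join(c.ch for c in self._visible())
```

Because identifiers never change and merge is a commutative, idempotent union, Alice's insert and Bob's delete can be applied in either order and both replicas converge to the same text.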
Exercise 3 · 🔴 Medium · 25 min
Load Shedding Under Extreme Traffic
A service normally handles 10,000 RPS but receives a sudden spike to 50,000 RPS (e.g., a viral moment). Rather than crashing, it must continue serving 100% of critical traffic while gracefully shedding lower-priority traffic.

Tasks

  • Define P0, P1, P2 traffic tiers for a social media platform.
  • Design the load shedding algorithm (what metric triggers shedding? what gets shed first?)
  • How do you implement load shedding without adding latency to accepted requests?
  • What user experience do you provide to shed requests? (Fail-open vs. fail-closed)
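One shape the answer can take is a threshold-based admission controller: each tier gets shed once utilization crosses its threshold, lowest priority first, and P0 is only refused when the service is fully saturated. The tier assignments and threshold values below are hypothetical, and the per-request cost is a counter read plus one comparison, which is how shedding avoids adding latency to accepted requests.

```python
import enum

class Tier(enum.IntEnum):
    P0 = 0   # critical: login, posting, payments
    P1 = 1   # important: feed reads, notifications
    P2 = 2   # deferrable: analytics, prefetch, recommendations

class LoadShedder:
    """Threshold-based admission control (a sketch; thresholds hypothetical)."""

    # Shed a tier once utilization reaches its threshold; P0 is refused
    # only at full saturation, so critical traffic fails last.
    THRESHOLDS = {Tier.P0: 1.00, Tier.P1: 0.85, Tier.P2: 0.70}

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.in_flight = 0

    def admit(self, tier: Tier) -> bool:
        utilization = self.in_flight / self.capacity
        if utilization >= self.THRESHOLDS[tier]:
            return False   # caller fails fast, e.g. 503 + Retry-After
        self.in_flight += 1
        return True

    def done(self) -> None:
        self.in_flight -= 1
```

A rejected request should get an immediate, honest response (fail-closed with a retry hint) rather than hanging, since a fast "try again" is a better experience than a timeout.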
Exercise 4 · 🔥 Hard · 35 min
Backpressure in a Streaming Pipeline
A log ingestion pipeline: [App Servers] → [Kafka] → [Stream Processor] → [Elasticsearch]. App servers produce 1M logs/second. Elasticsearch can only index 100K/second. Without backpressure, what happens?

Tasks

  • Without backpressure: what failure mode occurs? (Kafka lag, OOM, etc.)
  • Design a backpressure mechanism at each stage.
  • How does Kafka consumer lag act as a natural backpressure signal?
  • Design a "load shedding" fallback: when Elasticsearch is overwhelmed, which logs do you drop first?
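The lag-as-signal and shed-by-severity ideas can be combined in one sketch. Here the processor's own queue depth stands in for Kafka consumer lag (all names and thresholds are hypothetical): as the backlog grows, the minimum severity it forwards rises, so DEBUG logs are dropped first and ERRORs survive the longest, while the slow Elasticsearch stage drains at its own pace.

```python
from collections import deque

SEVERITY_ORDER = ["DEBUG", "INFO", "WARN", "ERROR"]  # shed left-to-right

class LagAwareConsumer:
    """Sketch of a stream processor that uses backlog depth as a
    backpressure signal (a stand-in for real Kafka consumer lag)."""

    def __init__(self, max_backlog: int):
        self.max_backlog = max_backlog
        self.queue = deque()

    def min_severity(self) -> int:
        """Map backlog fill fraction to the lowest severity still accepted."""
        fill = len(self.queue) / self.max_backlog
        if fill < 0.50:
            return 0   # healthy: keep everything
        if fill < 0.75:
            return 1   # drop DEBUG
        if fill < 0.90:
            return 2   # drop DEBUG + INFO
        return 3       # nearly full: ERROR only

    def offer(self, severity: str, msg: str) -> bool:
        if SEVERITY_ORDER.index(severity) < self.min_severity():
            return False           # shed at ingest, before the slow sink
        self.queue.append((severity, msg))
        return True

    def drain(self, n: int) -> None:
        """Simulate Elasticsearch indexing n documents (the slow stage)."""
        for _ in range(min(n, len(self.queue))):
            self.queue.popleft()
```

Without this valve, the 1M vs. 100K/second mismatch accumulates somewhere: either Kafka lag grows unboundedly until retention expires and data is lost, or the processor buffers in memory until it OOMs.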