Rate Limiting & Traffic Shaping
Token bucket, leaky bucket, sliding window, distributed rate limiting, and abuse prevention.
4 Exercises
12 Concept Checks
~95 min total
System Design
Rate Limiting Algorithm Selection
A public API serves 3 client types: (A) Background jobs that burst 500 requests at startup then slow to 10 req/sec, (B) Interactive users expecting smooth consistent responses, (C) Potential abusers scraping data as fast as possible. Each type needs a different algorithm.
Algorithm Comparison
Token Bucket: accumulates tokens → allows bursts
Leaky Bucket: constant output rate, drops bursts
Sliding Window: precise count, no boundary attacks
Concept Check — 3 questions
Q1. For background jobs that burst 500 requests at startup, then slow to 10 req/sec, which algorithm is best?
A. Leaky bucket: smooths all output to a constant rate regardless of input bursts
B. Token bucket: accumulates tokens over time, allows bursting through the reserve
C. Fixed window counter: simple to implement for bursty workloads
D. No rate limiting: background jobs should never be limited
Q2. The leaky bucket algorithm guarantees what output characteristic?
A. Constant output rate regardless of input burst size
B. Allows all bursts to pass through with no rate restriction
C. Unlimited throughput with no waiting
D. Zero latency on all requests including burst traffic
Q3. Sliding window counter is more accurate than fixed window counter because?
A. It processes requests faster due to optimized data structures
B. It uses less memory than fixed windows at high request rates
C. It prevents boundary attacks: there is no synchronized reset at window edges that allows 2× the limit
D. It is simpler to implement and debug
Boundary attack on fixed windows: send 100 requests at 11:59:59 (end of window 1), then 100 at 12:00:01 (start of window 2) = 200 requests in 2 seconds despite a 100 req/min limit. Token bucket Redis implementation: store {tokens, last_refill_timestamp} in Redis, atomically refill and decrement on each request. Tiers: store tier in user metadata → lookup limit → apply algorithm.
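The token-bucket state described above ({tokens, last_refill_timestamp}) can be sketched in plain Python. This is a minimal in-memory stand-in: the dict replaces the Redis hash, the injectable clock exists only to make the behavior testable, and the capacity/refill numbers are illustrative. A production version would do the refill-and-decrement atomically in Redis (e.g. via a Lua script).

```python
import time

class TokenBucket:
    """Per-user token bucket: stores (tokens, last_refill), refills lazily."""
    def __init__(self, capacity, refill_rate, now=time.monotonic):
        self.capacity = capacity        # max burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.now = now                  # injectable clock for testing
        self.buckets = {}               # user_id -> (tokens, last_refill)

    def allow(self, user_id):
        ts = self.now()
        tokens, last = self.buckets.get(user_id, (self.capacity, ts))
        # Refill proportionally to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (ts - last) * self.refill_rate)
        if tokens >= 1:
            self.buckets[user_id] = (tokens - 1, ts)
            return True
        self.buckets[user_id] = (tokens, ts)
        return False
```

With capacity 500 and a refill rate of 10/sec, this matches client type (A) from the scenario: the startup burst drains the reserve, then sustained traffic settles at the refill rate.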
Open Design Challenge
Explain the boundary attack against fixed windows: if the limit is 100 req/min, how can an attacker get 200 requests in 2 seconds? Show the timing.
Implement a token bucket rate limiter using Redis. What data structure do you store per user? Show the pseudocode with INCR and EXPIRE commands.
Design rate limit tiers: free (100 req/min), pro (1,000 req/min), enterprise (unlimited). How do you store the tier and check limits efficiently on every request?
Distributed Rate Limiting
A company has 10 API servers. Rate limit per user: 1,000 req/min. Each server maintains its own local counter. User A's requests are spread by the load balancer across all 10 servers, so each server sees only a tenth of the traffic — well under its local limit — and allows everything. The user can send up to 10,000 req/min before any single counter trips: the rate limit is bypassed entirely.
Local vs Centralized Rate Limiting
❌ Local counters (broken)
User → LB → each of 10 servers sees only 1/10 of the traffic → all allow → user can reach 10× the limit
✅ Central Redis counter
All 10 servers INCR same Redis key → shared count → 429 when count exceeds limit
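The centralized counter in the diagram can be sketched as a fixed-window check. A plain dict stands in for Redis here; in a real deployment every server would issue `INCR ratelimit:{user}:{window}` against the same Redis instance, with `EXPIRE` set on the first hit of each window. The limit and window values are illustrative.

```python
import time

class CentralCounter:
    """Fixed-window counter shared by all servers (dict stands in for Redis)."""
    def __init__(self, limit, window_sec, now=time.time):
        self.limit = limit
        self.window = window_sec
        self.now = now
        self.counts = {}  # (user_id, window_index) -> count

    def check(self, user_id):
        # All servers hit the same key, so the count is global per window.
        key = (user_id, int(self.now() // self.window))
        self.counts[key] = self.counts.get(key, 0) + 1  # INCR equivalent
        if self.counts[key] > self.limit:
            return 429  # Too Many Requests
        return 200
```

Because every server increments the same key, it no longer matters how the load balancer spreads the traffic: request 1,001 in a window gets a 429 regardless of which server receives it.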
Concept Check — 3 questions
Q1. The fix for distributed rate limiting across multiple API servers is?
A. Increase the per-server limit to 1,000 so the total across servers equals 10,000
B. Centralized Redis counter: all servers check and increment the same Redis key
C. Use sticky sessions so one user always goes to the same server
D. Route by IP hash so each user is rate limited by one server
Q2. Redis INCR for rate limiting returns the new count. If count exceeds the limit, what HTTP status code do you return?
A. 200 OK: silently drop the request without telling the client
B. 403 Forbidden: the user is unauthorized to make this request
C. 429 Too Many Requests with a Retry-After header indicating when to retry
D. 503 Service Unavailable: the service is down due to overload
Q3. Redis rate limiting adds ~1ms per request. To reduce this latency overhead, what technique helps?
A. Disable rate limiting entirely: 1 ms of overhead is too high a cost
B. Redis pipeline: batch the INCR + EXPIRE commands in one round trip
C. Switch from Redis to PostgreSQL for rate limit storage
D. Cache the rate limit decision for 60 seconds to avoid Redis calls
Redis sliding window: ZADD ratelimit:{user_id} {timestamp} {request_id}, ZREMRANGEBYSCORE (remove old entries outside the window), ZCARD (count current requests). Fail open vs closed: financial APIs fail closed (block all on Redis outage); social/content APIs usually fail open (allow all). Priority order: API key > user ID > IP address — API key is the strongest identity signal.
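The ZADD / ZREMRANGEBYSCORE / ZCARD flow above can be mirrored in plain Python with a sorted list per user standing in for the Redis sorted set. This is a sketch of the algorithm, not of the Redis client API; in Redis the three commands would be pipelined (or wrapped in a Lua script) to keep the check to one round trip.

```python
import bisect, time

class SlidingWindow:
    """Sliding-window log: one sorted timestamp list per user
    (stand-in for a Redis sorted set keyed ratelimit:{user_id})."""
    def __init__(self, limit, window_sec, now=time.time):
        self.limit = limit
        self.window = window_sec
        self.now = now
        self.log = {}  # user_id -> sorted list of request timestamps

    def allow(self, user_id):
        ts = self.now()
        entries = self.log.setdefault(user_id, [])
        # ZREMRANGEBYSCORE: drop timestamps that fell out of the window
        cutoff = bisect.bisect_left(entries, ts - self.window)
        del entries[:cutoff]
        if len(entries) >= self.limit:  # ZCARD: count requests in window
            return False
        entries.append(ts)              # ZADD: record this request
        return True
```

Because the window slides with the current timestamp, there is no boundary to attack: the count over any 60-second span never exceeds the limit.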
Open Design Challenge
Write the Redis command sequence for a sliding window rate limiter: ZADD, ZREMRANGEBYSCORE, ZCARD, and EXPIRE. Show the flow for checking if a request is allowed.
What happens if the Redis rate limiter goes down? Should you fail open (allow all) or fail closed (block all)? Justify based on API type.
Design rate limiting by API key, user ID, and IP. Which takes priority and why? How do you handle the case where only an IP is known (unauthenticated request)?
Rate Limit Headers and Client Communication
Your API returns rate limit info in headers but clients keep hitting 429s. SDK developers implemented exponential backoff incorrectly — they retry only the failed request while continuing to send other requests at full rate, wasting the backoff window entirely.
Self-Throttling with Rate Limit Headers
API response → RateLimit-Remaining: 20 → client reads headers → self-throttles proactively before hitting 429
Concept Check — 3 questions
Q1. Standard rate limit response headers from the RateLimit header standard include?
A. X-API-Key and X-API-Secret
B. RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset
C. X-Request-ID and Content-Type
D. Authorization and Cache-Control
Q2. Exponential backoff with jitter means?
A. Retry immediately after failure with no waiting period
B. Wait exactly 2^n seconds between retries with no randomness
C. Wait 2^n seconds plus random jitter to prevent synchronized retries from multiple clients
D. Stop retrying entirely after the first 429 response
Q3. A client receives a 429 response with "Retry-After: 30". The correct client behavior is?
A. Retry only this specific failed request after 30s while continuing to send other requests at full rate
B. Pause ALL requests for 30 seconds, then resume sending
C. Switch to a different API key to bypass the rate limit
D. Ignore the header and retry the request immediately
Proactive throttling: when RateLimit-Remaining drops below 20% of RateLimit-Limit, insert a proportional delay between requests — slow down before hitting the wall. Adaptive rate: track 429 rate; if more than 1% of requests return 429 in the last 60 seconds, halve the sending rate. Test in CI/CD: use a mock server configured to return 429 on every nth request to validate backoff behavior without hitting real limits.
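The proactive-throttling rule above (slow down below 20% remaining) can be sketched as a small delay calculator driven by the RateLimit headers. The 20% threshold comes from the text; the pacing formula — spreading the leftover budget evenly over the time until the window resets — is one reasonable choice, not the only one.

```python
def throttle_delay(limit, remaining, reset_after_sec):
    """Seconds to wait before the next request, from RateLimit-* headers.

    limit           -- RateLimit-Limit (requests per window)
    remaining       -- RateLimit-Remaining (budget left in this window)
    reset_after_sec -- seconds until RateLimit-Reset
    """
    if remaining <= 0:
        return reset_after_sec              # budget exhausted: wait for reset
    if remaining < 0.2 * limit:
        # Below 20%: pace the remaining requests over the remaining time.
        return reset_after_sec / remaining
    return 0.0                              # plenty of budget: full rate
```

A client SDK would call this after each response and sleep for the returned delay, so it slows down smoothly instead of slamming into a 429 at the end of the window.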
Open Design Challenge
Design a client-side rate limiter SDK that reads RateLimit-Remaining and proactively slows down when below 20% remaining. How does it calculate the delay?
Implement adaptive backoff: if server returns 429, reduce request rate by 50%; if no 429 for 60s, increase by 10%. Show the state machine.
How do you test rate limiting behavior in CI/CD without hitting a real rate limiter? Design the mock server and test scenarios.
Designing Against DDoS and Scraping Abuse
A competitor scrapes your product catalog using 10,000 rotating IPs. Standard IP rate limiting fails — each IP makes only 1-2 requests. The scraper generates 500,000 requests/hour vs 50,000 from legitimate users. Your origin servers are overwhelmed and latency spikes for real customers.
IP Rotation Bypasses Simple Rate Limiting
10K rotating IPs → each sends 1-2 req, appears legitimate per-IP → 500K req/hour total overwhelms origin → need behavioral fingerprinting
Concept Check — 3 questions
Q1. IP rotation defeats per-IP rate limiting. A better signal for detecting scraping is?
A. Block all IPs from specific countries associated with scraping
B. Behavioral fingerprinting: request timing patterns, header analysis, JS execution, sequential IDs
C. CAPTCHA on every single API request to filter bots
D. Reduce the global API rate limit to 1 req/sec to stop all scrapers
Q2. A request arrives with Accept: text/html but User-Agent: Python-Requests/2.28. This request is likely?
A. A legitimate mobile app: mobile apps sometimes send non-standard headers
B. A broken browser: it must have a corrupted User-Agent string
C. A scraper: legitimate browsers never use Python-Requests as the User-Agent
D. A search engine crawler: these use non-browser User-Agent strings
Q3. Bot mitigation at the CDN/WAF layer (before hitting origin) is better than at the application layer because?
A. CDN infrastructure is cheaper to operate than application servers
B. Requests blocked at the CDN/WAF never reach origin, saving all downstream compute, memory, and DB resources
C. WAF rules are faster to write than application middleware
D. The application layer cannot perform rate limiting by IP address
Shadow ban: detect scraper, put in "shadow" tier — return HTTP 200 with empty arrays or fake/delayed data instead of 429. The scraper thinks it's working; you've reduced data exfiltration without tipping them off. JS challenge: Cloudflare and similar services inject a JS puzzle — real browsers solve it in milliseconds; headless browsers fail or take too long. Honeypot: add a product ID (e.g., /api/products/000000) that never appears in the UI but scrapers enumerate sequentially — any IP hitting it gets flagged.
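Two of the signals above — the honeypot ID and the Accept/User-Agent mismatch — can be combined into a toy scoring function feeding a shadow-ban response. Everything here is illustrative: the score weights, the threshold, the `000000` honeypot ID, and the function names are assumptions for the sketch, not a real detection system.

```python
def scraper_score(path, headers, honeypot_ids=frozenset({"000000"})):
    """Heuristic bot score from honeypot hits and header mismatches."""
    score = 0
    # Honeypot: an ID never linked from the UI, found only by enumeration.
    if any(path == f"/api/products/{h}" for h in honeypot_ids):
        score += 10
    ua = headers.get("User-Agent", "")
    accept = headers.get("Accept", "")
    # Mismatch: browser-style Accept header with a script User-Agent.
    if "text/html" in accept and ("python" in ua.lower() or not ua):
        score += 5
    return score

def respond(path, headers, data):
    """Shadow tier: flagged clients get 200 with an empty payload."""
    if scraper_score(path, headers) >= 5:
        return 200, []      # looks successful, leaks nothing
    return 200, data
```

Returning 200 with an empty body (rather than 429) is the point of the shadow ban: the scraper sees no error signal to adapt to, while real clients are untouched.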
Open Design Challenge
Design a "shadow ban" system for scrapers: allow them to continue requesting but return fake or empty data. How do you detect them and what do you return?
How does Cloudflare's "Browser Integrity Check" detect headless browsers? What signals distinguish real browsers from automated tools?
Design a honeypot endpoint that real users never hit but scrapers enumerate. What URL pattern and what action do you take when a client hits it?