Week 1 · Day 5

Rate Limiting &
Traffic Shaping

Watch tokens refill in real time. See how fixed-window lets double-rate bursts slip through. Compare all four algorithms side by side. Implement distributed rate limiting with Redis — correctly.

4 Algorithms · ~3h Study Time · 3 Quizzes
Token Bucket · Leaky Bucket · Fixed Window · Sliding Window · Redis Lua · Distributed RL
01 — The Four Algorithms

Same limit, different behavior under burst

All four algorithms limit to "100 requests per minute" — but they behave differently when a user sends 100 requests at once, then waits. Token bucket allows it. Leaky bucket queues them. Fixed window has a boundary vulnerability. Sliding window is smooth but expensive. See them live.

🪣 Rate Limiting Algorithm Simulator
Select an algorithm. Click "Send Request" or "Send Burst" to fire requests. Watch the internal state update — tokens, queue depth, window counters.
💡 Token Bucket

Tokens refill at a fixed rate (e.g., 10/s). Requests consume one token. If tokens available: allow. If empty: reject (or queue). Allows controlled bursting — if the bucket is full and a burst arrives, all tokens can be consumed at once. Most APIs (Stripe, GitHub) use token bucket because it allows temporary bursts.
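The refill-and-consume logic above can be sketched in a few lines of Python — an in-process illustration with made-up names, not a production limiter:

```python
import time

class TokenBucket:
    """Illustrative token bucket: refill at a fixed rate, consume one token per request."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # max tokens the bucket can hold
        self.tokens = float(capacity) # start full: an initial burst is allowed
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# A full bucket of capacity 5 absorbs a 5-request burst, then rejects:
bucket = TokenBucket(rate=1, capacity=5)
results = [bucket.allow() for _ in range(6)]
```

Note how the burst drains the bucket instantly — exactly the "controlled bursting" behavior described above, and why the table below warns that the burst still reaches downstream services.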

02 — Fixed vs Sliding Window

The boundary attack — why fixed window is broken

Fixed window resets its counter at a hard boundary (e.g., every minute on the clock). An attacker who knows the boundary can send 100 requests at T=59s and 100 more at T=61s — 200 requests in 2 seconds, yet each window's counter stays within its limit. Sliding window closes this hole by counting over a rolling interval instead of resetting at a boundary.
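The attack is easy to reproduce against a toy fixed-window counter (an in-process sketch with hypothetical names, not the lesson's demo code):

```python
from collections import defaultdict

LIMIT = 100
WINDOW = 60  # seconds

counters = defaultdict(int)  # (user, window index) -> request count

def fixed_window_allow(user: str, now: float) -> bool:
    # The counter is keyed by window index, so it resets at every hard boundary
    key = (user, int(now // WINDOW))
    if counters[key] < LIMIT:
        counters[key] += 1
        return True
    return False

# 100 requests at T=59s fall in window 0; 100 more at T=61s fall in window 1.
allowed = sum(fixed_window_allow("u1", 59.0) for _ in range(100))
allowed += sum(fixed_window_allow("u1", 61.0) for _ in range(100))
# All 200 are allowed within 2 seconds — double the intended rate.
```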

📊 Window Vulnerability Demo
The timeline shows requests over 2 minutes. The red zone is the boundary attack window. Click "Boundary Attack" to see 2× the limit slip through the fixed window.

Panels: FIXED WINDOW (vulnerable) · SLIDING WINDOW (safe)
03 — Algorithm Deep Dive

When to use each algorithm

The choice of algorithm affects user experience, memory usage, and attack surface. A leaky bucket is deterministic but frustrating — it queues bursts, so users wait. A token bucket is user-friendly — it absorbs bursts — but passes those bursts downstream. The right choice depends on your use case.

Algorithm      | Burst Behavior              | Memory                         | Smoothing | Best For                         | Caveat
Token Bucket   | Allows burst up to capacity | O(1)                           | No        | API rate limits, user quotas     | Burst can still overload downstream
Leaky Bucket   | Queued — smoothed out       | O(queue)                       | Yes       | Network QoS, outbound HTTP rate  | Adds latency; queue can fill and drop
Fixed Window   | Boundary attack: 2× burst   | O(1)                           | No        | Simple use cases only            | Vulnerable at window boundary
Sliding Window | Strictly enforced           | O(requests) exact, O(1) approx | Yes       | Strict API limits, security      | Exact: high memory; approx: slight inaccuracy
🪣 Leaky Bucket — the detail

Requests enter a FIFO queue (the "bucket"). A constant-rate processor drains the queue (water "leaks" out at a fixed rate). If the queue is full, new requests are dropped. This guarantees a smooth output rate regardless of input bursts. Used in: network packet shaping, Nginx's limit_req module.
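The FIFO-plus-constant-drain structure can be sketched like this (illustrative names; a real shaper would run the drain loop on a worker thread):

```python
import time
from collections import deque

class LeakyBucket:
    """Illustrative leaky bucket: bounded FIFO queue drained at a constant rate."""

    def __init__(self, drain_rate: float, capacity: int):
        self.drain_rate = drain_rate  # requests processed per second
        self.capacity = capacity      # max queue depth before dropping
        self.queue = deque()

    def submit(self, request) -> bool:
        """Enqueue a request; drop it if the bucket is full."""
        if len(self.queue) >= self.capacity:
            return False  # bucket overflow: request is dropped
        self.queue.append(request)
        return True

    def drain_forever(self, process):
        """Background worker: hand one request to `process` every 1/drain_rate seconds."""
        while True:
            if self.queue:
                process(self.queue.popleft())
            time.sleep(1 / self.drain_rate)
```

Whatever the input burst looks like, `drain_forever` emits at most `drain_rate` requests per second — the smoothing guarantee the table above attributes to this algorithm.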

🪟 Sliding Window Log

Store a timestamp for every request. On each request, count timestamps in the last 60s. If count < limit: allow and add timestamp. If ≥ limit: reject. Accurate but O(requests) memory — for 1M users at 1000 req/min that's 1B timestamps. In practice: use sliding window counter (approximation) instead.
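The count-evict-append cycle described above, as a minimal in-process sketch (hypothetical class, not a distributed implementation):

```python
import time
from collections import deque
from typing import Optional

class SlidingWindowLog:
    """Exact sliding window: one timestamp stored per allowed request."""

    def __init__(self, limit: int, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.log = deque()  # timestamps of allowed requests, oldest first

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Evict timestamps that have fallen out of the rolling window
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False
```

The deque grows with the allowed request rate — the O(requests) memory cost that motivates the counter approximation below.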

04 — Redis Rate Limiting Implementation

Why naive Redis INCR breaks — and how Lua fixes it

A naive implementation — read the counter, check it against the limit, then INCR counter and EXPIRE counter 60 — has a race condition: two simultaneous requests can both read the counter before either increments it, so both see a value below the limit and both pass. (INCR and EXPIRE are also two separate commands, so the TTL can be lost if the client dies between them.) Lua scripts are atomic in Redis — they execute as a single unit, eliminating both races.

-- Redis Lua: Sliding Window Log (atomic, race-free)
-- KEYS[1] = rate limit key (e.g., "rl:user:42")
-- ARGV[1] = current timestamp (ms), ARGV[2] = window (ms), ARGV[3] = limit

local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
local window_start = now - window

-- Remove entries older than the window
redis.call('ZREMRANGEBYSCORE', key, '-inf', window_start)

-- Count requests in window
local count = redis.call('ZCARD', key)

if count < limit then
  -- Allow: add this request's timestamp with score=timestamp
  -- (random suffix keeps concurrent same-ms requests from colliding as members)
  redis.call('ZADD', key, now, now .. '-' .. math.random(1, 999999))
  -- Set TTL so the key expires automatically for idle users
  redis.call('EXPIRE', key, math.ceil(window / 1000))
  return {1, limit - count - 1}  -- {allowed, remaining}
else
  return {0, 0}  -- {rejected, remaining}
end
# Python: calling the Lua script from FastAPI
import time

import redis.asyncio as redis
from fastapi import Request
from fastapi.responses import JSONResponse

RATE_LIMIT_SCRIPT = """... (Lua script above) ..."""

class RateLimiter:
    def __init__(self, redis_client, limit: int, window_ms: int):
        self.redis = redis_client
        self.limit = limit
        self.window_ms = window_ms
        # register_script returns a callable that EVALSHAs the script atomically
        self.script = self.redis.register_script(RATE_LIMIT_SCRIPT)

    async def is_allowed(self, user_id: str) -> tuple[bool, int]:
        now_ms = int(time.time() * 1000)
        result = await self.script(
            keys=[f"rl:{user_id}"],
            args=[now_ms, self.window_ms, self.limit],
        )
        allowed, remaining = bool(result[0]), result[1]
        return allowed, remaining

# FastAPI middleware
async def rate_limit_middleware(request: Request, call_next):
    user_id = request.headers.get("X-User-ID", request.client.host)
    allowed, remaining = await limiter.is_allowed(user_id)
    if not allowed:
        return JSONResponse(
            {"error": "rate_limit_exceeded"},
            status_code=429,
            headers={"Retry-After": "60", "X-RateLimit-Remaining": "0"},
        )
    response = await call_next(request)
    response.headers["X-RateLimit-Remaining"] = str(remaining)
    return response
05 — Distributed Rate Limiting

Rate limiting across 10 app servers

If each app server keeps a local counter, a user can send 100 requests to 10 different servers and never hit any single server's limit. You need a shared counter — which is why Redis is the standard. But Redis adds 1-2ms latency per request. That's acceptable for most APIs but not for sub-millisecond paths.
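The bypass is easy to see in a toy simulation — per-server counters only, no shared store (hypothetical names, in-process only):

```python
LOCAL_LIMIT = 100          # intended global limit: 100 requests/minute
local_counters = [0] * 10  # one independent counter per app server

def local_allow(server: int) -> bool:
    """Per-server check: no single server ever sees the user's global rate."""
    if local_counters[server] < LOCAL_LIMIT:
        local_counters[server] += 1
        return True
    return False

# An attacker round-robins 1000 requests across the 10 servers:
allowed = sum(local_allow(i % 10) for i in range(10 * LOCAL_LIMIT))
# Every request is allowed — 10x the intended global limit.
```

Replacing `local_counters` with a single shared Redis counter (as in the Lua script above) restores the global limit, at the cost of a network round trip per check.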

🌐 Distributed Rate Limiting Simulator
Three app servers, one shared Redis. Click "Local Only" to see how distributed attacks bypass per-server limits. Click "Redis Shared" to see correct global enforcement.
Panels: request counts for App Servers 1–3 · Global Counter (Redis) — limit 10 · Status indicator (ALLOWED / rejected)
🏗️ Sliding Window Counter Approximation — O(1) memory, high accuracy

The exact sliding window needs one entry per request (high memory). The approximation: keep two fixed-window counters (current window + previous window) and estimate: count = prev_count × (1 - elapsed/window) + curr_count. This gives <0.1% error vs exact sliding window at O(1) memory. Cloudflare uses this for their 1M req/s rate limiter.
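The weighted estimate is one line of arithmetic (hypothetical helper name, shown only to make the formula concrete):

```python
def approx_count(prev_count: int, curr_count: int,
                 elapsed: float, window: float) -> float:
    """Estimate requests in the rolling window from two fixed-window counters.

    elapsed = time since the current window started; the previous window's
    count is weighted by the fraction of it still inside the rolling window.
    """
    return prev_count * (1 - elapsed / window) + curr_count

# 30s into a 60s window, with 80 requests last window and 40 so far:
# half of the previous window still overlaps -> 80 * 0.5 + 40 = 80.0
estimate = approx_count(prev_count=80, curr_count=40, elapsed=30, window=60)
```

Only two integers per user are stored, versus one timestamp per request for the exact log — which is what makes this practical at very high request rates.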

🏆 Rate limit headers — 429 (RFC 6585), Retry-After (RFC 9110), and the de facto X-RateLimit-* conventions

X-RateLimit-Limit: 100 — max requests per window
X-RateLimit-Remaining: 47 — requests left in current window
X-RateLimit-Reset: 1709500800 — Unix timestamp when window resets
Retry-After: 30 — seconds until retry is safe (on 429 response)
Always return these on 429. Clients that don't get Retry-After often hammer you immediately, making the problem worse.

Rate Limiting Algorithm Selection

✅ Token Bucket — choose when...
You want to allow short bursts (bucket fills up to N tokens). API rate limiting for external users. Stripe, GitHub, Twitter APIs use token bucket. Implemented in Redis with DECR plus a periodic refill job, or a single atomic Lua script. Most flexible and user-friendly algorithm.
📊 Sliding Window Log — choose when...
You need exact rate limiting (no edge-of-window exploits). Stores each request timestamp in Redis sorted set. ZRANGEBYSCORE to count requests in window. Accurate but uses more memory (O(requests) per user per window).
⚠ Fixed Window Counter — avoid for APIs
Simplest: INCR counter per minute. Problem: "boundary burst" — user can send 100 at 11:59 and 100 at 12:00 = 200 in 2 seconds. Use only when this is acceptable (internal tools). Redis INCR + EXPIRE.
🔃 Leaky Bucket — choose when...
You need a constant output rate (smoothing bursts). Requests enter a queue, processed at fixed rate. Good for: protecting downstream services that can't handle bursts. Network QoS, video encoding queues.
Interview Answer: "I'd implement token bucket in Redis using atomic DECR. Each user has a key with their remaining tokens; a background job refills at rate R/sec using INCRBY with a cap. Return 429 Too Many Requests with Retry-After header when tokens are exhausted."
Quiz — Check Your Understanding

Rate Limiting

Q1. Your API has a limit of 100 requests/minute per user. At 11:59:55 PM, a user sends 100 requests (all allowed). At 12:00:01 AM (6 seconds later), the same user sends 100 more requests. With a fixed window counter, what happens?
Q2. You're implementing rate limiting in Redis using INCR + EXPIRE. Two requests arrive simultaneously at T=0. Both read the counter (value=0), both decide they're below the limit, both INCR (counter becomes 2 briefly, but both see "below limit"). What's wrong and how do you fix it?
Q3. Which rate limiting algorithm guarantees a perfectly smooth output rate (no burst) regardless of input patterns?