Watch tokens refill in real time. See how fixed-window lets double-rate bursts slip through. Compare all four algorithms side by side. Implement distributed rate limiting with Redis — correctly.
All four algorithms limit to "100 requests per minute" — but they behave differently when a user sends 100 requests at once, then waits. Token bucket allows it. Leaky bucket queues them. Fixed window has a boundary vulnerability. Sliding window is smooth but expensive. See them live.
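The token bucket's burst-then-wait behavior can be sketched in a few lines of Python. This is a minimal single-user illustration (class and parameter names are my own, not from any particular library); a production limiter would need per-user buckets and thread safety:

```python
import time

class TokenBucket:
    """Minimal token bucket: holds up to `capacity` tokens,
    refilled continuously at `refill_per_sec`."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)  # starts full: a burst up to capacity is allowed
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Lazy refill: credit tokens for the time elapsed since the last call.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `TokenBucket(100, 100 / 60)` ("100 per minute"), all 100 requests of an instant burst pass, and the 101st is rejected until tokens trickle back at the refill rate.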
Fixed window resets its counter at a hard boundary (e.g., every minute on the clock). An attacker who knows the boundary can send 100 requests at T=59s and 100 more at T=61s: 200 requests in two seconds, yet each window's counter stays within its limit. The sliding window closes this hole by counting over a rolling interval instead of a clock-aligned one.
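The boundary attack is easy to reproduce against a toy fixed-window counter. The sketch below is illustrative (the class is hypothetical; timestamps are passed in explicitly so the attack is deterministic):

```python
class FixedWindow:
    """Naive fixed-window counter: one count per clock-aligned window."""

    def __init__(self, limit: int, window_sec: int):
        self.limit = limit
        self.window_sec = window_sec
        self.counts = {}  # window index -> request count (old keys would need expiry)

    def allow(self, now: float) -> bool:
        window = int(now // self.window_sec)
        if self.counts.get(window, 0) >= self.limit:
            return False
        self.counts[window] = self.counts.get(window, 0) + 1
        return True
```

At T=59s the requests land in window 0; at T=61s they land in window 1. Both windows admit their full 100, so 200 requests succeed in two seconds.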
The choice of algorithm affects user experience, memory usage, and attack surface. A leaky bucket guarantees a smooth output rate but penalizes bursty clients with queueing delay; a token bucket is forgiving of bursts but can pass a full-capacity spike straight to downstream services. The right choice depends on your use case.
| Algorithm | Burst Behavior | Memory | Smoothing | Best For | Caveat |
|---|---|---|---|---|---|
| Token Bucket | Allows burst up to capacity | O(1) | No | API rate limits, user quotas | Burst can still overload downstream |
| Leaky Bucket | Queued — smoothed out | O(queue) | Yes | Network QoS, outbound HTTP rate | Adds latency, queue can fill and drop |
| Fixed Window | Boundary attack: 2× burst | O(1) | No | Simple use cases only | Vulnerable at window boundary |
| Sliding Window | Strictly enforced | O(requests) exact / O(1) approx | Yes | Strict API limits, security | Exact: high memory; approx: slight inaccuracy |
Requests enter a FIFO queue (the "bucket"). A constant-rate processor drains the queue (water "leaks" out at a fixed rate). If the queue is full, new requests are dropped. This guarantees a smooth output rate regardless of input bursts. Used in: network packet shaping, Nginx's limit_req module.
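The queue variant described above can be sketched as follows. This is an illustrative single-threaded model (names are my own; real implementations like Nginx's `limit_req` run the drain on a timer rather than lazily):

```python
from collections import deque

class LeakyBucket:
    """FIFO-queue leaky bucket: requests queue up, a constant-rate
    processor drains them, and overflow is dropped."""

    def __init__(self, queue_size: int, leak_per_sec: float):
        self.queue = deque()
        self.queue_size = queue_size
        self.leak_per_sec = leak_per_sec
        self.last_leak = None  # set on the first request

    def offer(self, request, now: float) -> bool:
        if self.last_leak is None:
            self.last_leak = now
        # Drain the whole requests the processor would have handled by `now`.
        leaked = int((now - self.last_leak) * self.leak_per_sec)
        if leaked > 0:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()  # handed off to the constant-rate worker
            self.last_leak += leaked / self.leak_per_sec  # keep fractional credit
        if len(self.queue) >= self.queue_size:
            return False  # bucket full: drop the request
        self.queue.append(request)
        return True
```

An instant burst fills the queue; anything beyond `queue_size` is dropped, and capacity only reopens as the leak rate drains the backlog.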
Store a timestamp for every request. On each request, count timestamps in the last 60s. If count < limit: allow and add timestamp. If ≥ limit: reject. Accurate but O(requests) memory — for 1M users at 1000 req/min that's 1B timestamps. In practice: use sliding window counter (approximation) instead.
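The exact log approach reads as a short sketch (illustrative names; a real service would key one log per user and store it in Redis as a sorted set):

```python
from collections import deque

class SlidingWindowLog:
    """Exact sliding window: one timestamp per accepted request,
    so memory is O(requests in window)."""

    def __init__(self, limit: int, window_sec: float):
        self.limit = limit
        self.window_sec = window_sec
        self.log = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        # Evict timestamps that have aged out of the rolling window.
        while self.log and self.log[0] <= now - self.window_sec:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False
```

Note how this defeats the boundary attack: a burst at T=59s still counts against a request at T=61s, because the window rolls with the clock instead of resetting.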
A naive implementation (read the counter, check it against the limit, then `INCR` it and `EXPIRE counter 60`) has a race condition: two concurrent requests can both read the counter before either increments it, so both see a value below the limit and both get through. Lua scripts are atomic in Redis: they execute as a single unit, eliminating the race.
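One common shape for the atomic version is the script below, which increments first and checks after, so there is no read-check gap to race through. Key names and argument order are illustrative; the Python `allow` function is an in-memory stand-in for the same logic so the behavior can be seen without a Redis server (with redis-py you would ship the script via `EVAL` or `register_script`):

```python
# Fixed-window limiter as an atomic Redis Lua script (illustrative sketch).
FIXED_WINDOW_LUA = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
if current > tonumber(ARGV[2]) then
  return 0
end
return 1
"""

def allow(store: dict, key: str, limit: int) -> bool:
    """In-memory stand-in for the script above. Increment-then-compare:
    there is no separate read/check step for two requests to race between
    (INCR itself is atomic in Redis)."""
    store[key] = store.get(key, 0) + 1
    return store[key] <= limit
```

Setting `EXPIRE` only when the counter is first created keeps the window aligned to the first request and ensures the key cannot be left behind without a TTL.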
If each app server keeps a local counter, a user can send 100 requests to 10 different servers and never hit any single server's limit. You need a shared counter — which is why Redis is the standard. But Redis adds 1-2ms latency per request. That's acceptable for most APIs but not for sub-millisecond paths.
The exact sliding window needs one entry per request (high memory). The approximation: keep two fixed-window counters (current window + previous window) and estimate: count = prev_count × (1 - elapsed/window) + curr_count. This gives <0.1% error vs exact sliding window at O(1) memory. Cloudflare uses this for their 1M req/s rate limiter.
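The counter approximation fits in O(1) state per user. A minimal sketch of the estimate formula above (class and field names are my own):

```python
class SlidingWindowCounter:
    """Approximate sliding window: two fixed-window counters plus
    a weighted sum, O(1) memory per user."""

    def __init__(self, limit: int, window_sec: float):
        self.limit = limit
        self.window_sec = window_sec
        self.curr_window = 0  # index of the current fixed window
        self.curr_count = 0
        self.prev_count = 0

    def allow(self, now: float) -> bool:
        window = int(now // self.window_sec)
        if window != self.curr_window:
            # Roll forward; if more than one window passed, prev is empty.
            self.prev_count = self.curr_count if window == self.curr_window + 1 else 0
            self.curr_count = 0
            self.curr_window = window
        elapsed = (now % self.window_sec) / self.window_sec
        # count = prev_count * (1 - elapsed/window) + curr_count
        estimate = self.prev_count * (1 - elapsed) + self.curr_count
        if estimate >= self.limit:
            return False
        self.curr_count += 1
        return True
```

Against the boundary attack from earlier (100 requests at T=59s, 100 at T=61s), the previous window still carries almost full weight just after the boundary, so only a couple of extra requests slip through instead of 100.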
X-RateLimit-Limit: 100 — max requests per window
X-RateLimit-Remaining: 47 — requests left in current window
X-RateLimit-Reset: 1709500800 — Unix timestamp when window resets
Retry-After: 30 — seconds until retry is safe (on 429 response)
Always return these on 429. Clients that don't get Retry-After often hammer you immediately, making the problem worse.
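A small helper makes it hard to forget the headers on the 429 path. The function below is a hypothetical sketch (framework-agnostic: it just returns a status code and a header dict); the `X-RateLimit-*` names follow the common convention listed above rather than a formal standard:

```python
import time

def rate_limit_response(limit: int, remaining: int, reset_at: int,
                        allowed: bool) -> tuple[int, dict]:
    """Build (status, headers) for a rate-limited endpoint.
    `reset_at` is the Unix timestamp when the window resets."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_at),
    }
    if allowed:
        return 200, headers
    # On 429, tell well-behaved clients exactly when it is safe to retry.
    headers["Retry-After"] = str(max(1, reset_at - int(time.time())))
    return 429, headers
```

Clamping `Retry-After` to at least 1 second avoids telling a client to retry "in 0 seconds", which would invite the immediate hammering described above.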