This guide walks through how load balancers distribute traffic, how consistent hashing minimizes remapping, and when different algorithms break down.
A single server has a ceiling — CPU, RAM, network bandwidth. When you hit it, you need more servers. But how does a client know which server to talk to? That's what load balancers solve.
L4 (transport layer) sees IP addresses and ports — that's it. L7 (application layer) decrypts TLS and reads HTTP content. This determines what routing decisions are possible.
L4 processes packets at near line-rate (millions/sec, microseconds latency). L7 must terminate TLS, parse HTTP, read headers — roughly 10-100× more CPU per connection. AWS NLB (L4) handles 10M connections on bare metal. AWS ALB (L7) is optimized for HTTP but can't match raw NLB throughput. Real systems often chain both: NLB → ALB → service.
When you add or remove a cache server, how many keys get remapped? With naive modulo hashing: almost all of them. With consistent hashing: ~1/N. See the difference below.
Add 1 server (N→N+1): the modulus changes, so a key stays on its server only when hash(key) mod N equals hash(key) mod (N+1), which holds for roughly 1/(N+1) of keys. In practice, N/(N+1) ≈ 90% or more of keys remap. This causes a massive cache miss storm: nearly every key hits a cold cache, and your DB takes 10× normal load.
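You can measure the remap fraction directly. A self-contained sketch (the `bucket` helper and `user:` key format are illustrative, not from any particular system):

```python
import hashlib

def bucket(key: str, n: int) -> int:
    # Stable hash so the result is reproducible across runs
    # (Python's built-in hash() is salted per-process)
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n

keys = [f"user:{i}" for i in range(100_000)]
n = 10
# How many keys land on a different server after going from 10 to 11?
moved = sum(bucket(k, n) != bucket(k, n + 1) for k in keys)
print(f"{moved / len(keys):.0%} of keys remapped")  # expect roughly N/(N+1) ≈ 91%
```

For N=10 this prints roughly 91%, matching the N/(N+1) expectation: almost the entire cache goes cold from adding a single server.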
Add a server: only keys between the new server's hash position and its predecessor on the ring remap. For 10 servers, adding 1 remaps ~10% of keys. Cache miss rate stays manageable — no DB stampede. Used by: Cassandra, Redis Cluster, CDNs, DynamoDB.
With 3 servers and 3 ring positions, each server might own unequal hash space (33% average but high variance). One server might own 60% of the ring by chance. Virtual nodes (vnodes) give each server 150+ positions, distributing load much more evenly. Cassandra defaults to 256 vnodes per physical node.
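A minimal hash ring with virtual nodes makes both claims concrete: adding one server remaps only the keys between its positions and their predecessors, and vnodes smooth out ownership variance. This is a sketch of the general technique, not any specific library's API; the `Ring` class and node names are illustrative:

```python
import bisect
import hashlib

def _hash(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=150):
        # Each physical node gets `vnodes` positions on the ring
        self._points = sorted(
            (_hash(f"{node}#{v}"), node)
            for node in nodes for v in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def lookup(self, key: str) -> str:
        # First ring position clockwise from the key's hash (wraps around)
        i = bisect.bisect(self._hashes, _hash(key)) % len(self._hashes)
        return self._points[i][1]

nodes = [f"s{i}" for i in range(10)]
before = Ring(nodes)
after = Ring(nodes + ["s10"])  # add an 11th server
keys = [f"user:{i}" for i in range(100_000)]
moved = sum(before.lookup(k) != after.lookup(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys remapped")  # expect roughly 1/11 ≈ 9%
```

Compare the result to the modulo version: ~9% of keys move instead of ~91%. Dropping `vnodes` to 1 reproduces the high-variance problem described above.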
| Affinity Method | Mechanism | Problem | Verdict |
|---|---|---|---|
| IP Hash | Hash client IP → fixed server | Corporate NAT maps millions of users to one IP → hot server | Dangerous |
| Cookie-based | LB sets cookie with server ID | Better than IP, still breaks uniform distribution | Acceptable |
| External Session Store | Redis/memcached stores session; all servers are stateless | Extra hop for session lookup (usually <1ms) | Recommended |
If you need sticky sessions, it's usually a sign that your app stores state on the server (local filesystem, in-memory session, etc.). The fix isn't better affinity — it's making your app stateless. Store sessions in Redis. Store uploads in S3. Make any server able to handle any request. This is what enables auto-scaling.
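The stateless pattern above can be sketched as a session store with a pluggable backend. In production the backend would be Redis (with a TTL for expiry); here an in-memory dict stands in so the example is self-contained, and the class and key format are illustrative:

```python
import json
import secrets

class SessionStore:
    """Server-agnostic session storage. In production the backing store
    would be Redis (get/set-with-TTL); a plain dict stands in here."""

    def __init__(self, backend=None, ttl=3600):
        self._db = backend if backend is not None else {}
        self._ttl = ttl  # a real store would expire keys after `ttl` seconds

    def create(self, data: dict) -> str:
        # Random session ID goes in the cookie; the data stays off the app server
        sid = secrets.token_hex(16)
        self._db[f"session:{sid}"] = json.dumps(data)
        return sid

    def load(self, sid: str):
        raw = self._db.get(f"session:{sid}")
        return json.loads(raw) if raw is not None else None

store = SessionStore()
sid = store.create({"user_id": 42})
# Any app server holding only `sid` (from the cookie) can recover the session
print(store.load(sid))
```

Because every server reads the same store, the LB can route each request anywhere, which is exactly what auto-scaling needs.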
Shallow (TCP connect): Only checks if the process is listening. Misses: app that accepted the TCP connection but can't query its DB, zombie processes that accept connections but time out.
Deep (HTTP /health): App queries all dependencies and returns 200 only if truly healthy. This is what you want in production.
When you deregister a server, the LB marks it "draining" — stops sending new requests but lets in-flight requests complete (up to drain timeout, usually 300s). Never kill a server without draining: you'll drop active requests mid-flight. This is the mechanism that makes zero-downtime deploys possible.
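The draining state machine is simple enough to sketch. This is an illustrative model of the LB-side logic under stated assumptions (an in-flight counter per backend, a polling drain loop); real load balancers implement this internally:

```python
import threading
import time

class Backend:
    """Sketch of connection draining: stop admitting new requests,
    then wait for in-flight requests to finish, up to a drain timeout."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = 0
        self._draining = False

    def try_admit(self) -> bool:
        with self._lock:
            if self._draining:
                return False  # LB routes new requests to other backends
            self._in_flight += 1
            return True

    def finish(self):
        with self._lock:
            self._in_flight -= 1

    def drain(self, timeout: float = 300.0) -> bool:
        self._draining = True
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            with self._lock:
                if self._in_flight == 0:
                    return True  # safe to terminate the instance
            time.sleep(0.01)
        return False  # timed out: remaining requests will be cut off
```

A deploy script would call `drain()` and only terminate the instance on a `True` return; a `False` return means the drain timeout expired with requests still active.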
- Database connection pool: can you acquire a connection and run `SELECT 1`?
- Redis/cache: can you ping?
- External APIs (with timeout): are your critical dependencies up?

Return 200 with a JSON body of dependency statuses; return 503 if any critical dependency is down. Never return 200 when your app can't actually serve requests: that misleads the LB and causes user-facing errors.
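The aggregation logic behind a deep `/health` endpoint can be sketched as a pure function. The `checks` shape and names are illustrative; the lambdas stand in for real probes like `SELECT 1` or a Redis `PING`, each of which should carry its own short timeout:

```python
def health(checks: dict):
    """Run each dependency check; 200 only if every critical one passes.
    `checks` maps name -> (check_fn, is_critical)."""
    statuses, ok = {}, True
    for name, (fn, critical) in checks.items():
        try:
            fn()  # raises on failure; stands in for SELECT 1, PING, etc.
            statuses[name] = "up"
        except Exception as e:
            statuses[name] = f"down: {e}"
            if critical:
                ok = False
    return (200 if ok else 503), statuses

# Example wiring: both probes succeed, so the endpoint reports healthy
code, body = health({
    "db":    (lambda: None, True),   # stands in for a SELECT 1 probe
    "redis": (lambda: None, True),   # stands in for a PING probe
})
```

An HTTP handler would serialize `body` as JSON and return it with `code`, so the LB gets a truthful signal and operators get per-dependency detail.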