Week 1 · Day 1

Horizontal Scaling &
Load Balancer Internals

Click through interactive simulations to see exactly how load balancers distribute traffic, how consistent hashing minimizes remapping, and when different algorithms break down.

4 Simulations · ~4h Study Time · 2 Quizzes

Topics: L4 vs L7 · Consistent Hashing · Session Affinity · Health Checks · Connection Draining
01 — The Problem

What happens when one server isn't enough?

A single server has a ceiling — CPU, RAM, network bandwidth. When you hit it, you need more servers. But how does a client know which server to talk to? That's what load balancers solve.

⚡ Load Balancer Algorithm Simulator
Watch how different algorithms distribute incoming requests across 4 servers in real-time. Click "Send Request" repeatedly to see the distribution unfold.
💡

Round Robin

Requests rotate through servers in sequence: 1→2→3→4→1→2→... Each server gets exactly equal traffic assuming uniform request duration. Simple and effective for stateless services with similar request costs.
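The rotation described above fits in a few lines (a minimal sketch; the server names are placeholders):

```python
from itertools import cycle

# Round robin: an infinite iterator that walks the pool in order,
# wrapping back to server-1 after server-4.
servers = ["server-1", "server-2", "server-3", "server-4"]
next_server = cycle(servers)

def route() -> str:
    """Return the server the next incoming request is sent to."""
    return next(next_server)

# Twelve requests land three on each server, in strict rotation.
assignments = [route() for _ in range(12)]
```

Note the balancer holds only one piece of state (the cursor position), which is why round robin is the default in most LBs.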

02 — OSI Layers in Practice

What does "L4 vs L7" actually see?

L4 (transport layer) sees IP addresses and ports — that's it. L7 (application layer) decrypts TLS and reads HTTP content. This determines what routing decisions are possible.

🔌 L4 Load Balancer Sees
src_ip: 203.0.113.42
src_port: 54321
dst_ip: 10.0.0.1
dst_port: 443
TLS encrypted payload...
HTTP headers: hidden
URL path: hidden
Cookie: hidden
Can route on: IP, port, TCP flags. Cannot inspect HTTP headers, URL, cookies, or content.
🧠 L7 Load Balancer Sees
src_ip: 203.0.113.42
Host: api.example.com
GET /api/v2/users/123
Authorization: Bearer eyJ...
Cookie: session=abc123
Content-Type: application/json
X-Region: us-east-1
Can route on anything: URL path, headers, cookies, request body. Enables A/B testing, auth, canary deployments.
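The kind of decision only an L7 balancer can make, as a sketch (the route table and service names are invented for illustration):

```python
# An L7 balancer sees the parsed HTTP request, so it can route on the
# URL path. An L4 balancer only sees (src_ip, src_port, dst_ip, dst_port)
# and could not make this choice at all.
ROUTES = [
    ("/api/payments", "payment-service"),
    ("/api/orders",   "order-service"),
    ("/",             "web-frontend"),   # catch-all, checked last
]

def route_l7(path: str) -> str:
    """First matching prefix wins, mirroring typical path-based rules."""
    for prefix, service in ROUTES:
        if path.startswith(prefix):
            return service
    return "web-frontend"
```

Real balancers express the same idea as config (Nginx `location` blocks, ALB listener rules) rather than code.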
💡

The Trade-off

L4 processes packets at near line-rate (millions/sec, microseconds latency). L7 must terminate TLS, parse HTTP, read headers — roughly 10-100× more CPU per connection. AWS NLB (L4) handles 10M connections on bare metal. AWS ALB (L7) is optimized for HTTP but can't match raw NLB throughput. Real systems often chain both: NLB → ALB → service.

Load Balancer Selection

✅ Nginx / HAProxy (L7)
HTTP-aware routing, path-based rules (/api → service A, / → service B), SSL termination, sticky sessions, rate limiting. Self-hosted, open-source, battle-tested. Used by: Airbnb, GitHub, Netflix.
☁️ AWS ALB / CloudFront (Managed L7)
Zero ops overhead, auto-scaling, native AWS integrations (ECS, Lambda, WAF). Use when on AWS and you want managed infra. Cost: higher than self-hosted at scale.
⚡ L4 Load Balancer (TCP)
Protocol-agnostic: works for TCP, UDP, gRPC, gaming, streaming. Faster than L7 (no HTTP parsing). Use for non-HTTP services or when lowest latency matters. AWS NLB, HAProxy TCP mode.
❌ Round-Robin DNS (avoid)
DNS-level load balancing. Clients cache DNS → uneven distribution. No health checking → clients hit dead servers. No session affinity. Only use as a last resort or for geographic routing (GeoDNS).
Interview Answer: "For HTTP services I'd use an L7 load balancer like Nginx or AWS ALB — it gives me path-based routing, SSL termination, and health checks. If I need protocol-agnostic load balancing or lower latency, I'd use an L4 load balancer like AWS NLB."
🎯 Quick Check: Your team needs to route requests to different microservices based on the URL path (/api/payments → payment-service, /api/orders → order-service). Which load balancer type do you use?
03 — The Key Algorithm

Consistent Hashing: Minimizing Key Remapping

When you add or remove a cache server, how many keys get remapped? With naive modulo hashing: almost all of them. With consistent hashing: ~1/N. See the difference below.

⭕ Consistent Hash Ring
The ring represents the hash space 0 to 2³²−1. Servers and keys are hashed to ring positions. A key's server is the first server clockwise from its position.
Add keys and see which server they route to. Then add a server — notice only the keys between the new server and its predecessor get remapped. With modulo hashing, all keys would remap.

Modulo Hashing: key % N

Add 1 server (N→N+1): a key stays on its server only when key % N equals key % (N+1), which holds for just ~1/(N+1) of keys. In practice, N/(N+1) ≈ 90–100% of keys remap. This causes a massive cache miss storm — every remapped key hits a cold cache, and your DB gets 10× normal load.
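You can measure the modulo churn directly (a sketch; MD5 is used only as a stable hash so results are reproducible):

```python
import hashlib

def h(key: str) -> int:
    # Stable 64-bit hash. Python's built-in hash() is salted per process,
    # so it would give different results on every run.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

def owner(key: str, n_servers: int) -> int:
    """Modulo placement: server index for this key."""
    return h(key) % n_servers

keys = [f"user:{i}" for i in range(10_000)]

# Grow the pool from 10 to 11 servers and count keys that move.
moved = sum(owner(k, 10) != owner(k, 11) for k in keys)
print(f"{moved / len(keys):.0%} of keys remapped")  # expect roughly 10/11 ≈ 91%
```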

Consistent Hashing: ~1/N remapped

Add a server: only keys between the new server's hash position and its predecessor on the ring remap. For 10 servers, adding 1 remaps ~10% of keys. Cache miss rate stays manageable — no DB stampede. Used by: Cassandra, Redis Cluster, CDNs, DynamoDB.

⚠️

Why Virtual Nodes?

With 3 servers and 3 ring positions, each server might own unequal hash space (33% average but high variance). One server might own 60% of the ring by chance. Virtual nodes (vnodes) give each server 150+ positions, distributing load much more evenly. Cassandra defaults to 256 vnodes per physical node.
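A minimal ring with virtual nodes, to contrast with the modulo result (a sketch: the vnode count and `"{server}#{vnode}"` naming are illustrative, not any particular library's scheme):

```python
import bisect
import hashlib

def h(s: str) -> int:
    # Stable 64-bit hash so runs are reproducible.
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

class HashRing:
    """Consistent hash ring with virtual nodes."""
    def __init__(self, servers, vnodes=150):
        # Each server gets `vnodes` positions on the ring, hashed by name.
        self.ring = sorted((h(f"{s}#{v}"), s)
                           for s in servers for v in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def lookup(self, key: str) -> str:
        # First server clockwise from the key's position (wrap past the top).
        i = bisect.bisect_right(self.points, h(key)) % len(self.ring)
        return self.ring[i][1]

keys = [f"user:{i}" for i in range(10_000)]
before = HashRing([f"s{i}" for i in range(10)])
after = HashRing([f"s{i}" for i in range(11)])  # add an 11th server

moved = sum(before.lookup(k) != after.lookup(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys remapped")  # roughly 1/11, vs ~91% for modulo
```

Only the keys whose clockwise successor became one of the new server's vnodes move; everything else stays put.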

🎯 Quick Check: Your Redis cluster has 10 servers using consistent hashing with 150 vnodes each. You add an 11th server. Approximately what percentage of keys get remapped?
04 — Common Pitfall

Session Affinity: When and Why to Avoid It

Affinity Method        | Mechanism                                              | Problem                                                     | Verdict
IP Hash                | Hash client IP → fixed server                          | Corporate NAT maps millions of users to one IP → hot server | Dangerous
Cookie-based           | LB sets cookie with server ID                          | Better than IP, still breaks uniform distribution           | Acceptable
External Session Store | Redis/memcached stores session; all servers stateless  | Extra hop for session lookup (usually <1 ms)                | Recommended
💡

The Right Mental Model

If you need sticky sessions, it's usually a sign that your app stores state on the server (local filesystem, in-memory session, etc.). The fix isn't better affinity — it's making your app stateless. Store sessions in Redis. Store uploads in S3. Make any server able to handle any request. This is what enables auto-scaling.
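The stateless pattern in miniature (a sketch: a dict stands in for Redis here, and `login`/`handle_request` are invented names — in real code `session_store` would be a shared Redis client):

```python
import uuid

# Stands in for Redis: a store every server can reach.
session_store = {}

def login(username: str) -> str:
    """Any server can create a session; the state lives outside the server."""
    sid = str(uuid.uuid4())
    session_store[sid] = {"user": username}
    return sid

def handle_request(sid: str) -> str:
    # No affinity needed: whichever server receives the request
    # looks the session up in the shared store.
    session = session_store.get(sid)
    return f"hello {session['user']}" if session else "please log in"
```

Because no server holds session state locally, any server can be added, removed, or replaced without breaking logged-in users.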

05 — Operations

Health Checks & Connection Draining

💓

Shallow vs Deep Health Checks

Shallow (TCP connect): only checks that the process is listening. It misses an app that accepts TCP connections but can't query its DB, and zombie processes that accept connections and then time out.

Deep (HTTP /health): App queries all dependencies and returns 200 only if truly healthy. This is what you want in production.

🔌

Connection Draining

When you deregister a server, the LB marks it "draining" — stops sending new requests but lets in-flight requests complete (up to drain timeout, usually 300s). Never kill a server without draining: you'll drop active requests mid-flight. This is the mechanism that makes zero-downtime deploys possible.
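The draining handshake can be modeled as a small state machine (a sketch, not any particular LB's API; the timeout value is illustrative):

```python
import time

class DrainingBackend:
    """LB-side view of one backend during deregistration."""
    def __init__(self, drain_timeout: float = 300.0):
        self.draining = False
        self.in_flight = 0
        self.drain_timeout = drain_timeout

    def start_request(self) -> bool:
        # A draining backend refuses new work; the balancer sends
        # those requests to other servers instead.
        if self.draining:
            return False
        self.in_flight += 1
        return True

    def finish_request(self):
        self.in_flight -= 1

    def deregister(self):
        """Stop new traffic, then wait for in-flight requests (up to timeout)."""
        self.draining = True
        deadline = time.monotonic() + self.drain_timeout
        while self.in_flight > 0 and time.monotonic() < deadline:
            time.sleep(0.01)
        # Only now is it safe to terminate the instance.
```

Deploys then become: deregister → drain → replace → re-register, with no request dropped mid-flight.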

💡

What Your /health Endpoint Should Check

Database connection pool: can you acquire a connection and run SELECT 1?
Redis/cache: can you ping?
External APIs (with timeout): are your critical dependencies up?
Return 200 with a JSON body of dependency statuses; return 503 if any critical dependency is down. Never return 200 when your app can't actually serve requests — that misleads the LB and causes user-facing errors.
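A minimal sketch of such an endpoint (the dependency checks are stubbed out; real code would hit the actual connection pool and cache client, with timeouts):

```python
import json

def check_db() -> bool:
    # Real code: acquire a pooled connection and run SELECT 1 with a timeout.
    return True

def check_cache() -> bool:
    # Real code: ping the cache (e.g. Redis PING) with a short timeout.
    return True

def health() -> tuple:
    """Return (status_code, body): 200 only if every critical check passes."""
    checks = {"db": check_db(), "cache": check_cache()}
    status = 200 if all(checks.values()) else 503
    body = json.dumps({
        "status": "ok" if status == 200 else "degraded",
        "checks": checks,
    })
    return status, body
```

The LB only looks at the status code; the JSON body is for humans debugging why a server was pulled out of rotation.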