Day 6 · Week 1

Service Discovery & API Gateway

Understand how microservices find each other at runtime. Build a live circuit breaker simulation. Learn the precise difference between API Gateway and Service Mesh — and when each applies.

3 Simulations · ~3.5h Study Time · 5 Quizzes

Tags: Consul · Eureka · Circuit Breaker · API Gateway · Service Mesh
01 — The Problem: Service Location

Where do services find each other?

In a monolith, a function call is a memory address. In microservices, it's a network call to an address that might change at any second — containers restart, autoscalers add instances, deployments roll pods. Static configuration breaks immediately.

🔥 Hard-coded IPs: The Anti-Pattern

Hardcoding 10.0.1.45:8080 into your config breaks the moment a container restarts. In Kubernetes, every pod restart gets a new IP. At Netflix's scale (700+ services), this approach is a deployment nightmare — each service change requires updating every caller.

🌐 DNS-based Discovery

Assign each service a stable DNS name (user-service.internal). Works natively in Kubernetes, where CoreDNS handles it automatically. Limitation: DNS TTL means clients can cache stale records during rollouts. Not suitable for fast failover without careful TTL tuning; even aggressive TTLs (5–10s) leave a multi-second stale window.
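A toy TTL cache makes the staleness window concrete. This is a sketch with hypothetical names; real clients resolve through getaddrinfo or CoreDNS and don't control caching this directly:

```python
import time

class TTLCache:
    """Toy DNS-style cache: a resolved address stays valid for `ttl` seconds,
    so a rollout that replaces an instance keeps serving the old IP until expiry."""
    def __init__(self, resolver, ttl=10):
        self.resolver = resolver   # fn(name) -> list of addresses
        self.ttl = ttl
        self._cache = {}           # name -> (addresses, fetched_at)

    def resolve(self, name, now=None):
        now = time.time() if now is None else now
        entry = self._cache.get(name)
        if entry and now - entry[1] < self.ttl:
            return entry[0]        # cached, possibly stale mid-rollout
        addrs = self.resolver(name)
        self._cache[name] = (addrs, now)
        return addrs

# Simulated rollout: the authoritative record changes, but the cache lags
records = {"user-service.internal": ["10.0.1.45"]}
cache = TTLCache(lambda n: list(records[n]), ttl=10)

cache.resolve("user-service.internal", now=0)           # caches 10.0.1.45
records["user-service.internal"] = ["10.0.2.88"]        # pod restarted, new IP
stale = cache.resolve("user-service.internal", now=5)   # still 10.0.1.45
fresh = cache.resolve("user-service.internal", now=11)  # TTL expired, re-resolves
```

The window between the record change and TTL expiry is exactly where requests go to a dead pod.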

📋 Service Registry (Consul / Eureka)

A dedicated store that maps service_name → [{ip, port, health}]. Services register on startup and send heartbeats every 10s. Registry deregisters unhealthy instances within 30s. Enables real-time routing decisions without DNS TTL constraints. Consul adds KV store and multi-datacenter support.
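The register/heartbeat/evict loop can be sketched in a few lines. The 10s heartbeat and 30s deregistration mirror the numbers above; the API is illustrative, not Consul's or Eureka's:

```python
class Registry:
    """Minimal service registry sketch: instances register, send heartbeats,
    and are evicted once their last heartbeat is older than `deregister_after`."""
    def __init__(self, deregister_after=30):
        self.deregister_after = deregister_after
        self._instances = {}   # (service, ip, port) -> last_heartbeat timestamp

    def register(self, service, ip, port, now):
        self._instances[(service, ip, port)] = now

    def heartbeat(self, service, ip, port, now):
        self._instances[(service, ip, port)] = now

    def lookup(self, service, now):
        """Return healthy (ip, port) pairs, evicting stale entries as we go."""
        healthy = []
        for (svc, ip, port), last in list(self._instances.items()):
            if now - last > self.deregister_after:
                del self._instances[(svc, ip, port)]   # deregister dead instance
            elif svc == service:
                healthy.append((ip, port))
        return healthy

reg = Registry()
reg.register("user-service", "10.0.1.45", 8080, now=0)
reg.register("user-service", "10.0.1.46", 8080, now=0)
reg.heartbeat("user-service", "10.0.1.45", 8080, now=20)  # .46 stops beating
alive = reg.lookup("user-service", now=35)  # .46 is 35s stale, so it is evicted
```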

⚖️ Client-side vs Server-side Discovery

Client-side (Eureka): The calling service queries the registry, picks an instance, calls it directly. Gives the client full control over load balancing strategy. Server-side (AWS ALB): Client calls a stable LB endpoint; LB queries the registry. Simpler clients, but adds a network hop and a centralized component.
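A minimal client-side discovery sketch, assuming a registry lookup function and round-robin balancing (the names are illustrative, not a real Eureka client):

```python
import itertools

class ClientSideDiscovery:
    """Client-side discovery sketch (Eureka-style): the caller fetches the
    instance list from the registry and load-balances locally."""
    def __init__(self, registry_lookup, service):
        self.registry_lookup = registry_lookup  # fn(service) -> [(ip, port), ...]
        self.service = service
        self._rr = itertools.count()

    def pick(self):
        instances = self.registry_lookup(self.service)
        if not instances:
            raise RuntimeError(f"no healthy instances for {self.service}")
        # Round-robin; a real client could use least-connections or zone affinity
        return instances[next(self._rr) % len(instances)]

lookup = lambda svc: [("10.0.1.45", 8080), ("10.0.1.46", 8080)]
disco = ClientSideDiscovery(lookup, "user-service")
picked = [disco.pick() for _ in range(4)]  # alternates between the two instances
```

This local control over balancing strategy is exactly what a server-side setup trades away for simpler clients.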

Discovery Mechanism Comparison

| Mechanism | Latency | Stale data risk | Kubernetes-native | Multi-datacenter | Best for |
| --- | --- | --- | --- | --- | --- |
| DNS | ~1ms | Medium (TTL) | Yes | Limited | Simple setups, K8s |
| Consul | ~5ms | Low (10s heartbeat) | External | Yes | Multi-DC, health checks |
| Eureka | ~5ms | Low–Medium | External | No | AWS/Spring ecosystem |
| K8s built-in | ~1ms | Very low | Native | No | Single-cluster deployments |
💡 The health check interval matters

Most registries use: check every 10–30s, 3 consecutive failures = remove from registry. This means worst-case 60–90s before a dead instance stops receiving traffic. Use a smaller interval (5s) for latency-sensitive services, but at the cost of higher registry load.
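The worst-case figure is just interval × consecutive failures, plus up to one interval of slack before the first failed check; a tiny sketch:

```python
def worst_case_removal(check_interval_s, failures_to_evict):
    """Worst case: the instance dies right after passing a check, then must
    fail `failures_to_evict` consecutive checks before removal. Add up to one
    more interval for the gap before the first failed check."""
    return check_interval_s * failures_to_evict

defaults = worst_case_removal(30, 3)  # 90s, the upper bound quoted above
tight = worst_case_removal(5, 3)      # 15s for latency-sensitive services
```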

02 — Circuit Breaker State Machine

Three states, one goal: stop cascading failures

When a downstream service starts failing, a circuit breaker trips OPEN — fast-failing all requests without hitting the downstream. After a cooldown window, it enters HALF-OPEN: one probe request is allowed through. Success closes it; failure reopens it immediately.

⚡ Live Circuit Breaker Simulator
Inject failures until the circuit trips (threshold: 5). Watch the cooldown. Then probe it back to CLOSED.

CLOSED

Normal operation. All requests pass through. Failures being counted.

OPEN

Fast-fail. No downstream calls made. Cooldown active (10s demo).

HALF-OPEN

One probe allowed. Success → CLOSED. Failure → OPEN again.

⚠️ Why HALF-OPEN matters

Without HALF-OPEN, a circuit that goes OPEN stays open forever (requires manual reset) or toggles wildly between OPEN and CLOSED. HALF-OPEN is the "cautious probe" — it lets you verify the downstream recovered without risking a flood of requests.

Production Circuit Breaker (Python / Resilience4j style)

```python
import time
from enum import Enum
from threading import Lock


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    def __init__(self, failure_threshold=5, cooldown_secs=30, success_threshold=2):
        self.state = State.CLOSED
        self.failure_count = 0
        self.success_count = 0
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.cooldown_secs = cooldown_secs
        self.last_failure_time = None
        self._lock = Lock()

    def call(self, func, *args, **kwargs):
        with self._lock:
            if self.state == State.OPEN:
                elapsed = time.time() - self.last_failure_time
                if elapsed >= self.cooldown_secs:
                    self.state = State.HALF_OPEN  # let one probe through
                else:
                    raise CircuitOpenError(
                        f"Circuit OPEN. Retry in {self.cooldown_secs - elapsed:.0f}s"
                    )
        try:
            result = func(*args, **kwargs)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            raise

    def _on_success(self):
        with self._lock:
            if self.state == State.HALF_OPEN:
                self.success_count += 1
                if self.success_count >= self.success_threshold:
                    self.state = State.CLOSED
                    self.failure_count = self.success_count = 0
            elif self.state == State.CLOSED:
                self.failure_count = 0  # reset on success

    def _on_failure(self):
        with self._lock:
            self.failure_count += 1
            self.last_failure_time = time.time()
            if self.failure_count >= self.failure_threshold \
                    or self.state == State.HALF_OPEN:
                self.state = State.OPEN
                self.success_count = 0


class CircuitOpenError(Exception):
    pass


# Per-route circuit breakers in an API gateway
breakers = {
    "/payments": CircuitBreaker(failure_threshold=5, cooldown_secs=30),
    "/inventory": CircuitBreaker(failure_threshold=10, cooldown_secs=60),
    "/users": CircuitBreaker(failure_threshold=20, cooldown_secs=10),
}
```
03 — API Gateway Deep Dive

The front door to your microservices

An API Gateway is the single entry point for all external traffic. It handles cross-cutting concerns so individual services don't have to. Every major platform uses one: AWS API Gateway, Kong, Nginx, Envoy, or custom-built.

Mobile App · Web Browser · Partner API
        ↓ HTTPS
API Gateway (Kong / Nginx / AWS)
        ↓ routes internally
User Service :8001 · Order Service :8002 · Payment Service :8003 · Search Service :8004
        ↓ reads/writes
PostgreSQL · Redis Cache · Kafka

What the Gateway Handles

Nginx Gateway Config

Rate limiting + JWT auth + upstream routing pattern:

```nginx
upstream user-service {
    server user-svc-1:8080;
    server user-svc-2:8080;
    keepalive 32;  # reuse connections — critical for performance
}

# Rate limit zone: 10MB stores ~160,000 IP counters
limit_req_zone $binary_remote_addr zone=api:10m rate=100r/s;

server {
    listen 443 ssl http2;
    server_name api.example.com;

    ssl_certificate /certs/fullchain.pem;
    ssl_certificate_key /certs/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;

    location /api/v1/users {
        # Rate limit: allow burst of 200 with no delay for bursts within limit
        limit_req zone=api burst=200 nodelay;

        # JWT validation via auth_request sub-request
        auth_request /auth/validate;
        auth_request_set $user_id $upstream_http_x_user_id;

        proxy_pass http://user-service;
        proxy_set_header X-User-ID $user_id;
        proxy_set_header X-Request-ID $request_id;  # distributed tracing
        proxy_set_header X-Real-IP $remote_addr;

        # Timeouts — tune per service SLA
        proxy_connect_timeout 5s;
        proxy_send_timeout 30s;
        proxy_read_timeout 30s;
    }

    # Internal auth validation endpoint
    location = /auth/validate {
        internal;
        proxy_pass http://auth-service/validate;
        proxy_pass_request_body off;
        proxy_set_header Content-Length "";
        proxy_set_header X-Original-URI $request_uri;
    }
}
```
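The /auth/validate upstream only needs to verify the token and surface the user id in a response header, which Nginx then forwards via auth_request_set. A minimal HS256-style sketch of that logic (the shared secret and claim names are assumptions; production setups typically use RS256 with a JWKS endpoint):

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # assumption: shared HMAC key for this sketch only

def b64url(data: bytes) -> bytes:
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def sign(payload: dict) -> str:
    """Mint a compact HS256-style token: header.payload.signature."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = b64url(hmac.new(SECRET, header + b"." + body, hashlib.sha256).digest())
    return b".".join([header, body, sig]).decode()

def validate(token: str):
    """What the auth upstream does: verify the signature, then return the
    user id as a header so the gateway can inject X-User-ID downstream."""
    header, body, sig = token.split(".")
    expected = b64url(
        hmac.new(SECRET, f"{header}.{body}".encode(), hashlib.sha256).digest()
    ).decode()
    if not hmac.compare_digest(expected, sig):
        return 401, {}
    claims = json.loads(base64.urlsafe_b64decode(body + "=" * (-len(body) % 4)))
    return 200, {"X-User-ID": claims["sub"]}

token = sign({"sub": "user-123"})
status, headers = validate(token)
```

Keeping validation in one internal upstream means individual services never parse tokens; they trust the X-User-ID header the gateway sets.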
04 — Service Mesh vs API Gateway

They solve different problems — you often need both

The most common interview mistake: conflating these two. API Gateway handles north-south traffic (external users → your services). Service Mesh handles east-west traffic (service → service internal calls). Different scopes, different deployment models.

| Feature | API Gateway | Service Mesh (Istio/Linkerd) |
| --- | --- | --- |
| Traffic direction | North-south (external → internal) | East-west (service → service) |
| Deployment model | Centralized proxy (single point) | Sidecar proxy per service pod |
| mTLS | Optional (terminate at gateway) | Automatic, zero-config |
| Service discovery | Manual routing rules | Automatic (via control plane) |
| Observability | Request-level metrics at edge | Service-to-service metrics, traces |
| Circuit breaking | Per-upstream config | Automatic per destination rule |
| Overhead | Low (one hop) | Higher (~1–3ms per hop, sidecar CPU) |
| Examples | Kong, Nginx, AWS API GW, Envoy | Istio, Linkerd, Consul Connect |

The typical large-scale setup

Use both: API Gateway for external auth, rate limiting, and routing. Service Mesh for internal mTLS, east-west observability, and automatic circuit breaking between services. The sidecar overhead is worth it at 20+ services with compliance requirements.

05 — Technology Decisions

When to use which solution

The right answer depends on your team's complexity, scale, and operational maturity. Here are the decision boundaries that appear in system design interviews.

Use API Gateway when

External Traffic Entry

  • You need centralized auth for all external clients
  • Rate limiting should be enforced before hitting services
  • SSL termination at a single point
  • API versioning and backward compatibility

Examples: Kong, AWS API GW, Nginx, Apigee

Use Service Mesh when

Internal Service Security

  • 20+ microservices with complex inter-service calls
  • Compliance requires mTLS for all internal communication
  • You need east-west observability without code changes
  • Automatic retry and circuit breaking between services

Examples: Istio, Linkerd, Consul Connect

Use Consul when

Multi-datacenter Discovery

  • Services span multiple datacenters or cloud regions
  • Need health checking with custom scripts
  • Want KV store alongside service registry
  • Not fully committed to Kubernetes

Best for: hybrid cloud, multi-region setups

Use K8s built-in when

Single-cluster Simplicity

  • Running entirely within one Kubernetes cluster
  • CoreDNS + Services cover your discovery needs
  • Team already knows Kubernetes primitives
  • Want to minimize external infrastructure dependencies

Use: CoreDNS, K8s Services, Ingress controller

06 — Knowledge Check

Five questions on service discovery & API gateway

1. What is the key difference between an API Gateway and a load balancer?
A load balancer distributes traffic across homogeneous instances of a single service. An API Gateway operates at Layer 7 and adds application-layer logic: it knows about your API (routes, authentication, rate limits, transformations). You often use both together: LB in front of the gateway, gateway routing to upstream LBs.
2. A circuit breaker is in OPEN state. What happens to incoming requests?
OPEN state means "fail fast" — the circuit breaker returns an error immediately, without making a network call to the failing downstream. This prevents the caller from wasting threads/connections waiting for timeouts, which would eventually exhaust the connection pool and cascade failures upstream. The ~1ms fail-fast is far better than a 30s timeout.
3. When does the overhead of a service mesh (sidecar proxies) become justified?
Each sidecar proxy adds ~1-3ms latency and CPU/memory overhead per service. For 5 services, the operational cost (managing Istio's control plane) outweighs the benefits. At 20+ services with PCI-DSS or SOC2 compliance requiring encrypted service-to-service traffic and full audit logs, the sidecar overhead is clearly worth it.
4. In a service registry, what typically determines when an instance is removed from rotation?
Most registries require multiple consecutive failures before deregistering to avoid false positives from transient network issues. Consul default: 30s TTL, 3 failures. Eureka: 90s eviction timeout. This means worst-case, a dead instance stays in the registry for 90s. Design your clients to retry on failure and handle the occasional dead-endpoint response.
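The "retry and handle the occasional dead endpoint" advice can be sketched as a client that walks the instance list and retries on connection errors (`lookup` and `request` are stand-ins for a registry client and an HTTP call):

```python
def call_with_rediscovery(lookup, request, attempts=3):
    """On a dead endpoint, fall through to the next instance; re-query the
    registry between attempts in case the list has been refreshed."""
    last_err = None
    for _ in range(attempts):
        for instance in lookup():
            try:
                return request(instance)
            except ConnectionError as e:
                last_err = e  # stale registry entry, try the next instance
    raise last_err

# Simulated: the registry still lists a dead instance (not yet evicted)
instances = [("10.0.1.45", 8080), ("10.0.1.46", 8080)]

def fake_request(inst):
    if inst[0] == "10.0.1.45":
        raise ConnectionError("connection refused")  # dead but still registered
    return f"200 OK from {inst[0]}"

result = call_with_rediscovery(lambda: instances, fake_request)
```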
5. Which rate limiting algorithm produces the smoothest output (no burst spikes) at the cost of some request delay?
Leaky bucket queues requests and drains them at a fixed rate — like water draining from a bucket. Output is perfectly smooth regardless of input burst patterns. Token bucket allows bursts up to the bucket capacity (useful for bursty-but-average-compliant traffic). Leaky bucket is preferred when downstream services need a constant, predictable request rate (e.g., external payment APIs with strict rate limits).
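The drain behavior can be sketched with the meter variant of the leaky bucket, assuming a capacity of 2 and a fixed drain rate of one request per second (a queueing variant would hold rejected requests instead of dropping them):

```python
class LeakyBucket:
    """Leaky bucket sketch (meter variant): each accepted request adds one
    unit of 'water'; water drains at a fixed rate, so sustained throughput
    is capped at `drain_rate` no matter how bursty the input is."""
    def __init__(self, capacity, drain_rate):
        self.capacity = capacity
        self.drain_rate = drain_rate  # units drained per second
        self.level = 0.0
        self.last = 0.0

    def allow(self, now):
        # Drain whatever leaked out since the last call
        self.level = max(0.0, self.level - (now - self.last) * self.drain_rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False  # bucket full: reject (or queue, in the smoothing variant)

bucket = LeakyBucket(capacity=2, drain_rate=1)  # 1 request/sec sustained
burst = [bucket.allow(0.0) for _ in range(5)]   # burst at t=0: only 2 fit
later = bucket.allow(1.0)                       # one unit drained by t=1
```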