Day 6 · Week 1

Service Discovery & API Gateway

Understand how microservices find each other at runtime. Build a live circuit breaker simulation. Learn the precise difference between API Gateway and Service Mesh — and when each applies.

3 Simulations · ~3.5h Study Time · 5 Quizzes

Tags: Consul · Eureka · Circuit Breaker · API Gateway · Service Mesh
01 — The Problem: Service Location

Where do services find each other?

In a monolith, a function call is a memory address. In microservices, it's a network call to an address that might change at any second — containers restart, autoscalers add instances, deployments roll pods. Static configuration breaks immediately.

🔥 Hard-coded IPs: The Anti-Pattern

Hardcoding 10.0.1.45:8080 into your config breaks the moment a container restarts. In Kubernetes, every pod restart gets a new IP. At Netflix's scale (700+ services), this approach is a deployment nightmare — each service change requires updating every caller.

🌐 DNS-based Discovery

Assign each service a stable DNS name (user-service.internal). Works natively in Kubernetes, where CoreDNS handles it automatically. Limitation: DNS TTL means clients can cache stale records during rollouts. Not suitable for fast failover without careful TTL tuning; even aggressive TTLs (5–10s) leave a multi-second stale window.
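A toy TTL cache makes the staleness window concrete. This is a sketch with hypothetical names; real clients resolve through getaddrinfo or CoreDNS and don't control caching this directly:

```python
import time

class TTLCache:
    """Toy DNS-style cache: a resolved address stays valid for `ttl` seconds,
    so a rollout that replaces an instance keeps serving the old IP until expiry."""
    def __init__(self, resolver, ttl=10):
        self.resolver = resolver   # fn(name) -> list of addresses
        self.ttl = ttl
        self._cache = {}           # name -> (addresses, fetched_at)

    def resolve(self, name, now=None):
        now = time.time() if now is None else now
        entry = self._cache.get(name)
        if entry and now - entry[1] < self.ttl:
            return entry[0]        # cached, possibly stale mid-rollout
        addrs = self.resolver(name)
        self._cache[name] = (addrs, now)
        return addrs

# Simulated rollout: the authoritative record changes, but the cache lags
records = {"user-service.internal": ["10.0.1.45"]}
cache = TTLCache(lambda n: list(records[n]), ttl=10)

cache.resolve("user-service.internal", now=0)           # caches 10.0.1.45
records["user-service.internal"] = ["10.0.2.88"]        # pod restarted, new IP
stale = cache.resolve("user-service.internal", now=5)   # still 10.0.1.45
fresh = cache.resolve("user-service.internal", now=11)  # TTL expired, re-resolves
```

The window between the record change and TTL expiry is exactly where requests go to a dead pod.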

📋 Service Registry (Consul / Eureka)

A dedicated store that maps service_name → [{ip, port, health}]. Services register on startup and send heartbeats every 10s. Registry deregisters unhealthy instances within 30s. Enables real-time routing decisions without DNS TTL constraints. Consul adds KV store and multi-datacenter support.
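The register/heartbeat/evict loop can be sketched in a few lines. The 10s heartbeat and 30s deregistration mirror the numbers above; the API is illustrative, not Consul's or Eureka's:

```python
class Registry:
    """Minimal service registry sketch: instances register, send heartbeats,
    and are evicted once their last heartbeat is older than `deregister_after`."""
    def __init__(self, deregister_after=30):
        self.deregister_after = deregister_after
        self._instances = {}   # (service, ip, port) -> last_heartbeat timestamp

    def register(self, service, ip, port, now):
        self._instances[(service, ip, port)] = now

    def heartbeat(self, service, ip, port, now):
        self._instances[(service, ip, port)] = now

    def lookup(self, service, now):
        """Return healthy (ip, port) pairs, evicting stale entries as we go."""
        healthy = []
        for (svc, ip, port), last in list(self._instances.items()):
            if now - last > self.deregister_after:
                del self._instances[(svc, ip, port)]   # deregister dead instance
            elif svc == service:
                healthy.append((ip, port))
        return healthy

reg = Registry()
reg.register("user-service", "10.0.1.45", 8080, now=0)
reg.register("user-service", "10.0.1.46", 8080, now=0)
reg.heartbeat("user-service", "10.0.1.45", 8080, now=20)  # .46 stops beating
alive = reg.lookup("user-service", now=35)  # .46 is 35s stale, so it is evicted
```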

⚖️ Client-side vs Server-side Discovery

Client-side (Eureka): The calling service queries the registry, picks an instance, calls it directly. Gives the client full control over load balancing strategy. Server-side (AWS ALB): Client calls a stable LB endpoint; LB queries the registry. Simpler clients, but adds a network hop and a centralized component.
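A minimal client-side discovery sketch, assuming a registry lookup function and round-robin balancing (the names are illustrative, not a real Eureka client):

```python
import itertools

class ClientSideDiscovery:
    """Client-side discovery sketch (Eureka-style): the caller fetches the
    instance list from the registry and load-balances locally."""
    def __init__(self, registry_lookup, service):
        self.registry_lookup = registry_lookup  # fn(service) -> [(ip, port), ...]
        self.service = service
        self._rr = itertools.count()

    def pick(self):
        instances = self.registry_lookup(self.service)
        if not instances:
            raise RuntimeError(f"no healthy instances for {self.service}")
        # Round-robin; a real client could use least-connections or zone affinity
        return instances[next(self._rr) % len(instances)]

lookup = lambda svc: [("10.0.1.45", 8080), ("10.0.1.46", 8080)]
disco = ClientSideDiscovery(lookup, "user-service")
picked = [disco.pick() for _ in range(4)]  # alternates between the two instances
```

This local control over balancing strategy is exactly what a server-side setup trades away for simpler clients.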

Discovery Mechanism Comparison

| Mechanism | Latency | Stale data risk | Kubernetes-native | Multi-datacenter | Best for |
| --- | --- | --- | --- | --- | --- |
| DNS | ~1ms | Medium (TTL) | Yes | Limited | Simple setups, K8s |
| Consul | ~5ms | Low (10s heartbeat) | External | Yes | Multi-DC, health checks |
| Eureka | ~5ms | Low–Medium | External | No | AWS/Spring ecosystem |
| K8s built-in | ~1ms | Very low | Native | No | Single-cluster deployments |
💡 The health check interval matters

Most registries use: check every 10–30s, 3 consecutive failures = remove from registry. This means worst-case 60–90s before a dead instance stops receiving traffic. Use a smaller interval (5s) for latency-sensitive services, but at the cost of higher registry load.
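The worst-case figure is just interval × consecutive failures, plus up to one interval of slack before the first failed check; a tiny sketch:

```python
def worst_case_removal(check_interval_s, failures_to_evict):
    """Worst case: the instance dies right after passing a check, then must
    fail `failures_to_evict` consecutive checks before removal. Add up to one
    more interval for the gap before the first failed check."""
    return check_interval_s * failures_to_evict

defaults = worst_case_removal(30, 3)  # 90s, the upper bound quoted above
tight = worst_case_removal(5, 3)      # 15s for latency-sensitive services
```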

02 — Circuit Breaker State Machine

Three states, one goal: stop cascading failures

When a downstream service starts failing, a circuit breaker trips OPEN — fast-failing all requests without hitting the downstream. After a cooldown window, it enters HALF-OPEN: one probe request is allowed through. Success closes it; failure reopens it immediately.

⚡ Live Circuit Breaker Simulator
Inject failures until the circuit trips (threshold: 5). Watch the cooldown. Then probe it back to CLOSED.

CLOSED

Normal operation. All requests pass through. Failures being counted.

OPEN

Fast-fail. No downstream calls made. Cooldown active (10s demo).

HALF-OPEN

One probe allowed. Success → CLOSED. Failure → OPEN again.

⚠️ Why HALF-OPEN matters

Without HALF-OPEN, a circuit that goes OPEN stays open forever (requires manual reset) or toggles wildly between OPEN and CLOSED. HALF-OPEN is the "cautious probe" — it lets you verify the downstream recovered without risking a flood of requests.

Production Circuit Breaker (Python / Resilience4j style)

```python
import time
from enum import Enum
from threading import Lock


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    def __init__(self, failure_threshold=5, cooldown_secs=30, success_threshold=2):
        self.state = State.CLOSED
        self.failure_count = 0
        self.success_count = 0
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.cooldown_secs = cooldown_secs
        self.last_failure_time = None
        self._lock = Lock()

    def call(self, func, *args, **kwargs):
        with self._lock:
            if self.state == State.OPEN:
                elapsed = time.time() - self.last_failure_time
                if elapsed >= self.cooldown_secs:
                    self.state = State.HALF_OPEN  # let one probe through
                else:
                    raise CircuitOpenError(
                        f"Circuit OPEN. Retry in {self.cooldown_secs - elapsed:.0f}s"
                    )
        try:
            result = func(*args, **kwargs)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            raise

    def _on_success(self):
        with self._lock:
            if self.state == State.HALF_OPEN:
                self.success_count += 1
                if self.success_count >= self.success_threshold:
                    self.state = State.CLOSED
                    self.failure_count = self.success_count = 0
            elif self.state == State.CLOSED:
                self.failure_count = 0  # reset on success

    def _on_failure(self):
        with self._lock:
            self.failure_count += 1
            self.last_failure_time = time.time()
            if self.failure_count >= self.failure_threshold \
                    or self.state == State.HALF_OPEN:
                self.state = State.OPEN
                self.success_count = 0


class CircuitOpenError(Exception):
    pass


# Per-route circuit breakers in an API gateway
breakers = {
    "/payments": CircuitBreaker(failure_threshold=5, cooldown_secs=30),
    "/inventory": CircuitBreaker(failure_threshold=10, cooldown_secs=60),
    "/users": CircuitBreaker(failure_threshold=20, cooldown_secs=10),
}
```
03 — API Gateway Deep Dive

The front door to your microservices

An API Gateway is the single entry point for all external traffic. It handles cross-cutting concerns so individual services don't have to. Every major platform uses one: AWS API Gateway, Kong, Nginx, Envoy, or custom-built.

Mobile App · Web Browser · Partner API
        ↓ HTTPS
API Gateway (Kong / Nginx / AWS)
        ↓ routes internally
User Service :8001 · Order Service :8002 · Payment Service :8003 · Search Service :8004
        ↓ reads/writes
PostgreSQL · Redis Cache · Kafka

What the Gateway Handles

Nginx Gateway Config

Rate limiting + JWT auth + upstream routing pattern:

```nginx
upstream user-service {
    server user-svc-1:8080;
    server user-svc-2:8080;
    keepalive 32;  # reuse connections — critical for performance
}

# Rate limit zone: 10MB stores ~160,000 IP counters
limit_req_zone $binary_remote_addr zone=api:10m rate=100r/s;

server {
    listen 443 ssl http2;
    server_name api.example.com;

    ssl_certificate /certs/fullchain.pem;
    ssl_certificate_key /certs/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;

    location /api/v1/users {
        # Rate limit: allow burst of 200 with no delay for bursts within limit
        limit_req zone=api burst=200 nodelay;

        # JWT validation via auth_request sub-request
        auth_request /auth/validate;
        auth_request_set $user_id $upstream_http_x_user_id;

        proxy_pass http://user-service;
        proxy_set_header X-User-ID $user_id;
        proxy_set_header X-Request-ID $request_id;  # distributed tracing
        proxy_set_header X-Real-IP $remote_addr;

        # Timeouts — tune per service SLA
        proxy_connect_timeout 5s;
        proxy_send_timeout 30s;
        proxy_read_timeout 30s;
    }

    # Internal auth validation endpoint
    location = /auth/validate {
        internal;
        proxy_pass http://auth-service/validate;
        proxy_pass_request_body off;
        proxy_set_header Content-Length "";
        proxy_set_header X-Original-URI $request_uri;
    }
}
```
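The /auth/validate upstream only needs to verify the token and surface the user id in a response header, which Nginx then forwards via auth_request_set. A minimal HS256-style sketch of that logic (the shared secret and claim names are assumptions; production setups typically use RS256 with a JWKS endpoint):

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # assumption: shared HMAC key for this sketch only

def b64url(data: bytes) -> bytes:
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def sign(payload: dict) -> str:
    """Mint a compact HS256-style token: header.payload.signature."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = b64url(hmac.new(SECRET, header + b"." + body, hashlib.sha256).digest())
    return b".".join([header, body, sig]).decode()

def validate(token: str):
    """What the auth upstream does: verify the signature, then return the
    user id as a header so the gateway can inject X-User-ID downstream."""
    header, body, sig = token.split(".")
    expected = b64url(
        hmac.new(SECRET, f"{header}.{body}".encode(), hashlib.sha256).digest()
    ).decode()
    if not hmac.compare_digest(expected, sig):
        return 401, {}
    claims = json.loads(base64.urlsafe_b64decode(body + "=" * (-len(body) % 4)))
    return 200, {"X-User-ID": claims["sub"]}

token = sign({"sub": "user-123"})
status, headers = validate(token)
```

Keeping validation in one internal upstream means individual services never parse tokens; they trust the X-User-ID header the gateway sets.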
04 — Service Mesh vs API Gateway

They solve different problems — you often need both

The most common interview mistake: conflating these two. API Gateway handles north-south traffic (external users → your services). Service Mesh handles east-west traffic (service → service internal calls). Different scopes, different deployment models.

| Feature | API Gateway | Service Mesh (Istio/Linkerd) |
| --- | --- | --- |
| Traffic direction | North-south (external → internal) | East-west (service → service) |
| Deployment model | Centralized proxy (single point) | Sidecar proxy per service pod |
| mTLS | Optional (terminate at gateway) | Automatic, zero-config |
| Service discovery | Manual routing rules | Automatic (via control plane) |
| Observability | Request-level metrics at edge | Service-to-service metrics, traces |
| Circuit breaking | Per-upstream config | Automatic per destination rule |
| Overhead | Low (one hop) | Higher (~1–3ms per hop, sidecar CPU) |
| Examples | Kong, Nginx, AWS API GW, Envoy | Istio, Linkerd, Consul Connect |

The typical large-scale setup

Use both: API Gateway for external auth, rate limiting, and routing. Service Mesh for internal mTLS, east-west observability, and automatic circuit breaking between services. The sidecar overhead is worth it at 20+ services with compliance requirements.

05 — Technology Decisions

When to use which solution

The right answer depends on your team's complexity, scale, and operational maturity. Here are the decision boundaries that appear in system design interviews.

Use API Gateway when

External Traffic Entry

  • You need centralized auth for all external clients
  • Rate limiting should be enforced before hitting services
  • SSL termination at a single point
  • API versioning and backward compatibility

Examples: Kong, AWS API GW, Nginx, Apigee

Use Service Mesh when

Internal Service Security

  • 20+ microservices with complex inter-service calls
  • Compliance requires mTLS for all internal communication
  • You need east-west observability without code changes
  • Automatic retry and circuit breaking between services

Examples: Istio, Linkerd, Consul Connect

Use Consul when

Multi-datacenter Discovery

  • Services span multiple datacenters or cloud regions
  • Need health checking with custom scripts
  • Want KV store alongside service registry
  • Not fully committed to Kubernetes

Best for: hybrid cloud, multi-region setups

Use K8s built-in when

Single-cluster Simplicity

  • Running entirely within one Kubernetes cluster
  • CoreDNS + Services cover your discovery needs
  • Team already knows Kubernetes primitives
  • Want to minimize external infrastructure dependencies

Use: CoreDNS, K8s Services, Ingress controller

06 — Knowledge Check

Five questions on service discovery & API gateway

1. What is the key difference between an API Gateway and a load balancer?
A load balancer distributes traffic across homogeneous instances of a single service. An API Gateway operates at Layer 7 and adds application-layer logic: it knows about your API (routes, authentication, rate limits, transformations). You often use both together: LB in front of the gateway, gateway routing to upstream LBs.
2. A circuit breaker is in OPEN state. What happens to incoming requests?
OPEN state means "fail fast" — the circuit breaker returns an error immediately, without making a network call to the failing downstream. This prevents the caller from wasting threads/connections waiting for timeouts, which would eventually exhaust the connection pool and cascade failures upstream. The ~1ms fail-fast is far better than a 30s timeout.
3. When does the overhead of a service mesh (sidecar proxies) become justified?
Each sidecar proxy adds ~1-3ms latency and CPU/memory overhead per service. For 5 services, the operational cost (managing Istio's control plane) outweighs the benefits. At 20+ services with PCI-DSS or SOC2 compliance requiring encrypted service-to-service traffic and full audit logs, the sidecar overhead is clearly worth it.
4. In a service registry, what typically determines when an instance is removed from rotation?
Most registries require multiple consecutive failures before deregistering to avoid false positives from transient network issues. Consul default: 30s TTL, 3 failures. Eureka: 90s eviction timeout. This means worst-case, a dead instance stays in the registry for 90s. Design your clients to retry on failure and handle the occasional dead-endpoint response.
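The "retry and handle the occasional dead endpoint" advice can be sketched as a client that walks the instance list and retries on connection errors (`lookup` and `request` are stand-ins for a registry client and an HTTP call):

```python
def call_with_rediscovery(lookup, request, attempts=3):
    """On a dead endpoint, fall through to the next instance; re-query the
    registry between attempts in case the list has been refreshed."""
    last_err = None
    for _ in range(attempts):
        for instance in lookup():
            try:
                return request(instance)
            except ConnectionError as e:
                last_err = e  # stale registry entry, try the next instance
    raise last_err

# Simulated: the registry still lists a dead instance (not yet evicted)
instances = [("10.0.1.45", 8080), ("10.0.1.46", 8080)]

def fake_request(inst):
    if inst[0] == "10.0.1.45":
        raise ConnectionError("connection refused")  # dead but still registered
    return f"200 OK from {inst[0]}"

result = call_with_rediscovery(lambda: instances, fake_request)
```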
5. Which rate limiting algorithm produces the smoothest output (no burst spikes) at the cost of some request delay?
Leaky bucket queues requests and drains them at a fixed rate — like water draining from a bucket. Output is perfectly smooth regardless of input burst patterns. Token bucket allows bursts up to the bucket capacity (useful for bursty-but-average-compliant traffic). Leaky bucket is preferred when downstream services need a constant, predictable request rate (e.g., external payment APIs with strict rate limits).
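The drain behavior can be sketched with the meter variant of the leaky bucket, assuming a capacity of 2 and a fixed drain rate of one request per second (a queueing variant would hold rejected requests instead of dropping them):

```python
class LeakyBucket:
    """Leaky bucket sketch (meter variant): each accepted request adds one
    unit of 'water'; water drains at a fixed rate, so sustained throughput
    is capped at `drain_rate` no matter how bursty the input is."""
    def __init__(self, capacity, drain_rate):
        self.capacity = capacity
        self.drain_rate = drain_rate  # units drained per second
        self.level = 0.0
        self.last = 0.0

    def allow(self, now):
        # Drain whatever leaked out since the last call
        self.level = max(0.0, self.level - (now - self.last) * self.drain_rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False  # bucket full: reject (or queue, in the smoothing variant)

bucket = LeakyBucket(capacity=2, drain_rate=1)  # 1 request/sec sustained
burst = [bucket.allow(0.0) for _ in range(5)]   # burst at t=0: only 2 fit
later = bucket.allow(1.0)                       # one unit drained by t=1
```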