Caching Architecture & Invalidation

Cache-aside, write-through, write-behind, TTL strategy, and cache stampede prevention.

4 Exercises
12 Concept Checks
~90 min total
System Design
Exercise 1 🟡 Easy ⏱ 15 min
Write Strategy Selection
An e-commerce product page shows price, inventory, and description. Price changes during flash sales (high write frequency); description rarely changes. With a naive cache, a price update takes 45 seconds to propagate because of a stale cache entry — users see wrong prices and over-purchase.
Cache Write Strategies Compared
Cache-Aside
Read: check cache → miss → read DB → populate cache
Write-Through
Write: update cache + DB simultaneously — always consistent
Write-Behind
Write: update cache only → async flush to DB later
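The three write paths above can be sketched in a few lines. This is a minimal illustration, not production code: plain dicts stand in for Redis and the database, and `flush_dirty` plays the role of the write-behind async flusher.

```python
db = {}        # stands in for the database
cache = {}     # stands in for Redis
dirty = set()  # keys written to cache but not yet flushed to the DB

def read_cache_aside(key):
    """Cache-aside read: check cache -> miss -> read DB -> populate cache."""
    if key in cache:
        return cache[key]
    value = db.get(key)
    if value is not None:
        cache[key] = value
    return value

def write_through(key, value):
    """Update cache and DB synchronously: never a stale read."""
    cache[key] = value
    db[key] = value

def write_behind(key, value):
    """Update cache only; the key is flushed to the DB later."""
    cache[key] = value
    dirty.add(key)

def flush_dirty():
    """The async flusher. Data written via write_behind is lost if
    the cache dies before this runs."""
    for key in list(dirty):
        db[key] = cache[key]
        dirty.discard(key)
```

Note how the data-loss window of write-behind is visible here: between `write_behind` and `flush_dirty`, the value exists only in `cache`.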
Concept Check — 3 questions
Q1. Price updates must be visible within 1 second. Which write strategy ensures this?
A. Write-behind — update cache asynchronously before flushing to DB
B. Write-through — update both cache and DB synchronously on every write
C. Cache-aside with 60s TTL — the cache auto-expires in under a minute
D. No cache — always read from DB directly for fresh prices
Q2. Which cache write strategy risks data loss if the cache server crashes before the DB flush?
A. Write-through — both DB and cache are updated synchronously
B. Cache-aside — the app reads cache then falls back to DB
C. Write-behind — data lives in cache before the async DB flush completes
D. Read-through — the cache populates itself from the DB on reads
Q3. Cache-aside pattern: user requests a product page. Cache miss. What are the steps in correct order?
A. Check cache → miss → query DB → write result to cache → return to user
B. Query DB → write to cache → return to user (skip cache check)
C. Write to cache → query DB → return to user
D. Query DB → return to user without caching
Write-through synchronously updates cache + DB — no stale reads. Write-behind is faster (write to cache only, flush async) but risks data loss on crash. TTL strategy: product descriptions 24h, prices 5s or event-driven invalidation, inventory 1s or no cache. Warming: pre-populate cache on deploy to avoid first-user cold start penalty.
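The per-field TTLs and the warming step can be combined in one sketch. `TTLCache` is an in-memory stand-in for Redis, and `warm_cache` is a hypothetical deploy-time helper; the TTL values are the ones given above.

```python
import time

class TTLCache:
    """Minimal in-memory cache with per-key TTL (a stand-in for Redis)."""
    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl):
        self._data[key] = (value, time.monotonic() + ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazy expiry on access, as Redis does
            return None
        return value

# Per-field TTLs from the strategy above, in seconds.
TTLS = {"description": 24 * 3600, "price": 5, "inventory": 1}

def warm_cache(cache, product_ids, fetch):
    """Pre-populate hot keys on deploy so the first user never hits a cold miss."""
    for pid in product_ids:
        for field, ttl in TTLS.items():
            cache.set(f"product:{pid}:{field}", fetch(pid, field), ttl)
```

Running `warm_cache` over the top-selling product IDs during deployment means the first request after a restart is served from cache, not the database.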
Open Design Challenge
1
A flash sale drops price from $99 to $10 for 1 hour. Design cache invalidation so all 5M users see the correct price within 1 second. Show the event flow.
2
Define TTL strategy for: product description (changes rarely), price (changes often), inventory count (changes very often during flash sale).
3
How does cache warming prevent the first user after a deploy from experiencing a cold-start slow response? Describe the warming process.
Exercise 2 🔴 Medium ⏱ 20 min
Cache Stampede Prevention
Reddit's front page is cached with a 60s TTL. At T=0, 50,000 concurrent users are viewing the page. At T=60, the cache expires. All 50,000 requests see a cache miss simultaneously and query the database — the thundering herd crashes the DB within 2 seconds.
The Thundering Herd Problem
T=60: cache expires
50K concurrent requests
all miss → 50K DB queries
DB CPU 100% → timeout cascade
Concept Check — 3 questions
Q1. Which technique allows only 1 request to repopulate the cache while all others wait for the result?
A. Shorter TTL — reducing cache lifetime prevents synchronized expiry
B. Mutex lock on the cache key — only the lock holder queries DB; others wait for cache population
C. Add a CDN layer in front of the cache
D. Rate limiting users to 1 request per second each
Q2. Probabilistic Early Rehydration (PER) prevents cache stampede by doing what?
A. Caching responses at the CDN edge nodes instead of in Redis
B. Using write-through to keep cache always fresh
C. Probabilistically refreshing the cache BEFORE it expires, based on TTL remaining and request rate
D. Returning stale data forever without ever refreshing
Q3. The root cause of a cache stampede is?
A. Synchronized cache expiry under high concurrency — all replicas expire at the same instant
B. Too much data stored in the cache causing memory pressure
C. Slow network between cache and application servers
D. A poorly designed database schema causing slow queries
Redis lock: SETNX lock:{key} 1 then EXPIRE lock:{key} 5 — only one client acquires it and queries the DB; others wait or serve stale. (In practice prefer the single atomic command SET lock:{key} 1 NX EX 5, since a crash between SETNX and EXPIRE leaves a lock with no expiry.) Jitter: instead of an exact 60s TTL, use 60 ± random(0, 15) seconds to spread expiry times across the cluster. Stale-while-revalidate: return the cached value immediately and spawn an async refresh.
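A sketch of the mutex and jitter, using a `FakeRedis` stand-in that implements only the acquire-if-absent behaviour the lock needs (lock TTL is omitted in the stand-in). `refresh_with_lock` and `jittered_ttl` are hypothetical names for illustration.

```python
import random
import threading

class FakeRedis:
    """In-memory stand-in for the two commands the mutex uses."""
    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def set_nx(self, key, value):
        """Acquire-if-absent, like SETNX: True only for the first caller."""
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def delete(self, key):
        """Release the lock, like DEL lock:{key}."""
        with self._mutex:
            self._data.pop(key, None)

def refresh_with_lock(r, key, fetch, cache):
    """Only the lock holder hits the DB; everyone else serves stale."""
    if r.set_nx(f"lock:{key}", "1"):
        try:
            cache[key] = fetch()
        finally:
            r.delete(f"lock:{key}")
    return cache.get(key)

def jittered_ttl(base=60.0, jitter=15.0):
    """60 +/- 15 s: keys set at the same moment no longer expire in lockstep."""
    return base + random.uniform(-jitter, jitter)
```

With 50K concurrent misses, `set_nx` succeeds for exactly one caller; the other 49,999 immediately return the stale value instead of piling onto the database.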
Open Design Challenge
1
Write the Redis command sequence (SETNX, EXPIRE, GET, DEL) to implement a mutex lock that prevents cache stampede on key "front_page".
2
Add TTL jitter: instead of exact 60s, use 60s ± 15s. Explain mathematically why this prevents synchronized expiry across a 10-node Redis cluster.
3
Design a stale-while-revalidate system: serve stale cache immediately while asynchronously refreshing. How do you prevent multiple concurrent refresh attempts?
Exercise 3 🔴 Medium ⏱ 25 min
Cache Invalidation in Microservices
An order service updates an order status. Three downstream services cache order data: a notification service, a dashboard service, and a mobile API. How does the order service invalidate 3 separate caches without creating tight coupling between services?
Event-Driven Cache Invalidation
Order Service
writes DB
publishes order_updated → Kafka
Notification Cache (subscribes)
Dashboard Cache (subscribes)
Mobile Cache (subscribes)
Concept Check — 3 questions
Q1. What is the cleanest way for the Order service to invalidate caches in other services?
A. Order service directly calls each service's cache invalidation REST API endpoint
B. Publish a domain event to a message bus; each service subscribes and self-invalidates its own cache
C. Use a shared cache namespace with a global invalidation command that clears all services
D. Set very short TTLs (1s) in all downstream caches so they self-expire quickly
Q2. Event-driven cache invalidation introduces which consistency model between services?
A. Strong consistency — all caches update atomically with the write
B. Linearizability — reads always reflect the latest write globally
C. Eventual consistency — caches may be stale for milliseconds to seconds after the event
D. Causal consistency — caches update in causal order
Q3. A cache invalidation event is lost due to a Kafka partition failure. What do downstream services see?
A. They immediately fall back to the database for fresh data
B. Stale cached data until the TTL expires or the next successful invalidation event
C. The service crashes due to missing invalidation signal
D. An automatic rollback of the order update in the database
Event schema: {entity_type, entity_id, operation, timestamp, version}. Recovery after downtime: replay missed events from the Kafka offset where the service last committed; a blunter fallback is to flush all cached entities of the affected type. Last-resort TTL: set the TTL to the maximum acceptable staleness — for order status, 5 minutes is reasonable.
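The event schema and a subscriber-side handler can be sketched as follows. The schema fields are the ones listed above; `SubscriberCache` is a hypothetical consumer, and the per-entity version check is what makes replaying old Kafka offsets after downtime safe.

```python
import json
import time

def invalidation_event(entity_type, entity_id, operation, version):
    """Build an event with the schema from the text:
    {entity_type, entity_id, operation, timestamp, version}."""
    return {
        "entity_type": entity_type,
        "entity_id": entity_id,
        "operation": operation,
        "timestamp": time.time(),
        "version": version,
    }

class SubscriberCache:
    """A downstream service's cache that self-invalidates on events.
    Tracking the highest version seen per entity makes replay
    idempotent: stale or duplicate events are simply ignored."""
    def __init__(self, entity_type):
        self.entity_type = entity_type
        self.data = {}
        self._seen = {}  # entity_id -> highest version handled

    def handle(self, raw):
        event = json.loads(raw)
        if event["entity_type"] != self.entity_type:
            return  # not our entity; other subscribers handle it
        eid = event["entity_id"]
        if event["version"] <= self._seen.get(eid, -1):
            return  # replayed or out-of-order event: already handled
        self.data.pop(eid, None)  # drop the stale cached entity
        self._seen[eid] = event["version"]
```

Because `handle` is idempotent, a service returning from 30 minutes of downtime can rewind to its last committed offset and reprocess everything without double-invalidating.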
Open Design Challenge
1
Design the Kafka topic schema for cache invalidation events. What fields are required? Show a JSON example.
2
A service comes back online after 30 minutes of downtime with a stale cache. How do you recover? Describe the replay strategy.
3
What TTL should serve as the "last resort" failsafe if events fail to arrive? Justify your choice for order status data.
Exercise 4 🔥 Hard ⏱ 30 min
Multi-Tier Cache Architecture
A news site has 10M articles. Top 1,000 articles get 95% of traffic (power law distribution). Articles have 3 tiers: breaking news (updated every minute), feature articles (updated daily), archived content (never changes). Design a multi-tier caching strategy optimizing for each tier.
Multi-Tier Cache Hierarchy
Browser Cache (L1)
→ miss →
CDN Edge (L2)
→ miss →
Redis (L3)
→ miss →
PostgreSQL (source)
Concept Check — 3 questions
Q1. For archived content that never changes, what is the ideal CDN Cache-Control header?
A. no-cache — always revalidate with the origin
B. max-age=60 — cache for 1 minute
C. Cache-Control: public, max-age=31536000, immutable — cache for 1 year, never revalidate
D. private — only cache in the browser, not CDN
Q2. Breaking news articles (updated every minute) should use which cache strategy?
A. No cache — breaking news is too dynamic to cache at all
B. Short TTL (60s) + event-driven CDN purge when the article is updated
C. 24h TTL — acceptable staleness for news content
D. Browser cache only — do not use CDN for breaking news
Q3. The correct cache hierarchy hit order for a CDN-served article request is?
A. Browser L1 → CDN Edge L2 → CDN Origin Shield L3 → origin server
B. Origin server → CDN Edge → Browser (data flows outward only)
C. Redis → CDN Edge → Browser
D. DB → Redis → User directly
Cache key convention: /articles/{id}/v{version} — version bump on update creates a new cache entry. CDN purge: call CDN purge API on publish event; use surrogate/cache-tag keys to purge all quality variants atomically. Storage for top-1,000: 1,000 × 500KB × 5 variants = 2.5GB — fits comfortably in L2 CDN edge cache. Immutable flag tells browsers never to revalidate the file.
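The key convention, the per-tier headers, and the storage arithmetic above fit in a short sketch. The header strings and the 2.5 GB figure come from the text; the function names are illustrative, and max-age=86400 for feature articles is an assumption matching their daily update cadence.

```python
def article_cache_key(content_type, article_id, version):
    """Versioned key: bumping the version on update is instant cache
    busting, because the old key simply stops being requested."""
    return f"/{content_type}/{article_id}/v{version}"

# Cache-Control per content tier.
CACHE_CONTROL = {
    "archived": "public, max-age=31536000, immutable",  # 1 year, never revalidate
    "feature":  "public, max-age=86400",                # 1 day (assumed)
    "breaking": "public, max-age=60",                   # 60s + event-driven purge
}

def cdn_storage_bytes(n_articles, avg_bytes, variants):
    """Top-1,000 x 500 KB x 5 variants = 2.5 GB, as computed above."""
    return n_articles * avg_bytes * variants
```

Requesting `/articles/42/v3` after an update to v4 is impossible by construction: every link and API response carries the new version, so no purge race exists for versioned keys.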
Open Design Challenge
1
Design a cache key naming convention for articles that includes content type, article ID, and version. How does versioning enable instant cache busting?
2
When a breaking news article is updated, how do you purge it from CDN in under 5 seconds? Describe the CDN purge API flow including surrogate keys.
3
Calculate CDN storage needed for the top-1,000 articles averaging 500KB each with 5 quality/format variants.