Week 1 · Day 2

Caching Architecture &
Cache Invalidation

Manipulate cache size to see hit rates change in real time. Walk through write strategies step-by-step. Watch cache stampedes unfold — then see XFetch stop them cold.

3 Simulations · ~4h Study Time · 3 Quizzes

Cache-Aside · Write-Through · Write-Behind · Cache Stampede · TTL & XFetch · Thundering Herd
01 — The Core Metric

Cache hit rate is everything

A cache is only useful if most reads come from it, not the database. The difference between 90% and 99% hit rate is a 10× difference in DB load — not 9%. Feel it with the slider below.

📊 Cache Hit Rate Simulator
Drag the slider to change cache size (as % of working set). Send requests to see live hit/miss counts and the DB load impact.
5% (tiny) · 50% (half working set) · 95% (nearly all)

Projected metrics (1000 requests)

Cache Hit Rate: 56%
Cache Miss Rate: 44%
DB Query Load: 44% of traffic
Cache Hits: 560
DB Queries: 440

💡

Why the curve is non-linear

Key popularity typically follows a power-law (Zipf) distribution: a small number of keys receive most of the traffic, so caching just the top 20% of items often yields an 80%+ hit rate. And at the high end, each extra point of hit rate removes a growing share of the remaining DB load: going from 98% to 99% halves DB traffic, while going from 50% to 51% barely moves it.
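The non-linearity is easy to verify with arithmetic. A minimal sketch in plain Python (no libraries, numbers purely illustrative) comparing the DB queries left over at different hit rates:

```python
def db_queries(requests: int, hit_rate: float) -> int:
    """Requests that miss the cache and fall through to the database."""
    return round(requests * (1 - hit_rate))

# 1000 requests at increasing hit rates: each point matters more near the top.
for rate in (0.50, 0.90, 0.98, 0.99):
    print(f"{rate:.0%} hit rate -> {db_queries(1000, rate)} DB queries")
```

Going from 90% to 99% cuts DB load from 100 queries to 10: the 10× difference this section opens with.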

02 — Write Strategies

How does a write propagate through cache + DB?

The three strategies — cache-aside, write-through, and write-behind — differ in who writes what, and when. Step through each write flow to see the exact sequence and where things can go wrong.

🔄 Strategy Step-Through Visualizer
Select a strategy tab, then press Next Step to advance through the write + read flow one step at a time.

Cache-Aside (Lazy Loading): The application manages both cache and DB. On a write, app invalidates cache and writes to DB. On a read miss, app fetches from DB and populates cache.

Press "Next Step" to begin the write scenario.
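The cache-aside flow can also be sketched in a few lines. This is a minimal in-process model (plain dicts stand in for Redis and the database) just to show the order of operations:

```python
cache: dict = {}           # stands in for Redis
db = {"user:42": "Ada"}    # stands in for the database

def read(key):
    if key in cache:               # 1. try the cache first
        return cache[key]
    value = db[key]                # 2. miss: fetch from the DB
    cache[key] = value             # 3. populate the cache (lazy fill)
    return value

def write(key, value):
    db[key] = value                # 1. write the DB first
    cache.pop(key, None)           # 2. invalidate the cache entry (don't update it)

read("user:42")                    # miss -> fills cache
write("user:42", "Bob")            # DB updated, cache entry dropped
assert read("user:42") == "Bob"    # next read re-fills from the DB
```

Note the write path deletes rather than updates the cache: invalidation is cheaper to get right than a concurrent update.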

Write-Through: Every write goes to cache AND DB synchronously. Cache always has fresh data. Writes are slower (two synchronous writes per operation) but reads always get cache-fresh results.

Press "Next Step" to begin the write scenario.

Write-Behind (Write-Back): Write hits cache immediately, DB write is queued to a background worker. Writes feel instant. Risk: if cache crashes before the queue drains, data is lost.

Press "Next Step" to begin the write scenario.
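Write-behind reduces to a queue plus a background flusher. A toy model (a plain deque stands in for the async queue, and drain() simulates the background worker), illustrating both the instant write and the data-loss window:

```python
from collections import deque

cache, db = {}, {}
queue = deque()   # pending DB writes; in production a worker drains this asynchronously

def write(key, value):
    cache[key] = value            # write hits the cache immediately: feels instant
    queue.append((key, value))    # the DB write is deferred

def drain():
    while queue:                  # background worker flushes pending writes to the DB
        key, value = queue.popleft()
        db[key] = value

write("user:42", "Ada")
assert cache["user:42"] == "Ada" and "user:42" not in db   # the data-loss window
drain()
assert db["user:42"] == "Ada"
```

Everything sitting in the queue is exactly what you lose if the cache node crashes before the worker runs.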
| Strategy | Write Latency | Stale Data? | Cache Warm on Write? | Data Loss Risk | Best For |
|---|---|---|---|---|---|
| Cache-Aside | Low | Until TTL/miss | No (lazy fill) | None | Read-heavy, tolerate brief stale |
| Write-Through | High (2 writes) | Never | Yes | None | Consistency-critical reads |
| Write-Behind | Lowest | Never | Yes | Yes (cache crash) | Write-heavy, eventual durability OK |
03 — The Failure Mode

Cache Stampede: when expiry causes collapse

A popular key expires. 500 concurrent requests all miss the cache simultaneously and all fire DB queries at once. The DB collapses. This is the thundering herd problem — watch it unfold.

⚡ Stampede Visualizer
Compare unprotected cache expiry vs XFetch probabilistic early refresh. One generates 1 DB query; the other generates hundreds.
Click a button above. No Protection: all clients fire DB queries simultaneously on expiry. XFetch: one client proactively refreshes before expiry — others never see a miss.
⚠️

XFetch Algorithm — probabilistic early expiry

Each read checks: now − delta × beta × ln(rand()) ≥ expiry
delta = how long recomputation takes (seconds). beta = aggressiveness (1.0 = default; raise it to refresh earlier). Since rand() is in (0, 1), ln(rand()) is negative, so the left side always lands somewhat past now; the closer the key is to expiry, the more likely the check fires. If it evaluates true, this worker refreshes proactively, even though the key hasn't expired yet. Everyone else still hits the warm cache. One refresh, zero stampede.
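The check is one line of code. A sketch of the XFetch decision function (function and parameter names are our own, not from a library):

```python
import math, random, time

def should_refresh(now, expiry, delta, beta=1.0, rng=random.random):
    """XFetch probabilistic early refresh: True means 'this reader recomputes'."""
    # ln(rand()) is negative, so the left side is always somewhat past `now`;
    # the gap grows with delta (recompute cost) and beta (aggressiveness).
    return now - delta * beta * math.log(rng()) >= expiry

expiry = time.time() + 60            # key expires in 60 seconds
# Once the key has actually expired, every reader refreshes:
assert should_refresh(expiry + 1, expiry, delta=2.0)
# Far from expiry the check almost never fires; near expiry it becomes likely.
```

In practice you store delta alongside the cached value (measured at the last recomputation) and run this check on every cache hit.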

💡

Other stampede prevention patterns

Distributed mutex (Redis SET NX): First miss acquires a lock, fetches DB, populates cache. All other waiters block briefly, then retry the cache.
Background async refresh: Never expire keys synchronously — always refresh asynchronously before TTL. Serve slightly stale during refresh.
Request coalescing: Deduplicate in-flight cache miss requests — only one DB query fires per key per time window, all others wait on that result.
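The third pattern, request coalescing, fits in a short in-process sketch using a lock table and events (the Redis SET NX variant moves the lock into Redis, but the shape is the same; slow_db_fetch is a hypothetical stand-in for the expensive query):

```python
import threading

cache = {}
db_calls = 0
in_flight = {}                 # key -> Event signalling "fetch finished"
lock = threading.Lock()

def slow_db_fetch(key):
    global db_calls
    db_calls += 1              # count how often the DB is actually hit
    return f"value-of-{key}"

def get(key):
    if key in cache:
        return cache[key]
    with lock:
        if key in cache:                     # re-check under the lock
            return cache[key]
        event = in_flight.get(key)
        if event is None:                    # we are the leader for this key
            in_flight[key] = event = threading.Event()
            leader = True
        else:
            leader = False
    if leader:
        cache[key] = slow_db_fetch(key)      # one DB query per key
        event.set()
        with lock:
            del in_flight[key]
        return cache[key]
    event.wait()                             # follower: wait for the leader's result
    return cache[key]

# 50 concurrent misses for the same key -> exactly one DB query
threads = [threading.Thread(target=get, args=("hot",)) for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()
assert db_calls == 1
```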

04 — TTL & Eviction Policies

How does Redis decide what to remove?

TTL (time-based expiry) and eviction (memory-pressure removal) are separate mechanisms. TTL is predictable; eviction depends on the policy and workload. Choosing wrong costs you cache hit rate.

⏱️

TTL (Time-To-Live)

Keys expire after a set duration: EXPIRE key 300. Redis uses two strategies: lazy expiry (check on access) plus active expiry (a background cycle that runs ~10 times per second, sampling ~20 keys with TTLs and deleting the expired ones). Add jitter: 300 + random(60) to prevent synchronized expiry spikes.
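Jittered TTLs are a one-liner. A sketch (the redis-py call in the comment is illustrative of where the value would be used):

```python
import random

def jittered_ttl(base=300, jitter=60):
    """Spread expiries over [base, base + jitter) so keys cached at the
    same moment don't all expire at the same moment."""
    return base + random.randrange(jitter)

ttl = jittered_ttl()
assert 300 <= ttl < 360
# With redis-py this would look like: r.set(key, value, ex=jittered_ttl())
```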

🧹

LRU (Least Recently Used)

On memory pressure, evicts the key not accessed for the longest time. Redis uses probabilistic LRU — samples N random keys, evicts the oldest. Use allkeys-lru for general caches. Increase maxmemory-samples from 5→10 for better accuracy.

📊

LFU (Least Frequently Used)

Evicts the key accessed least often. Better than LRU for bursty workloads where a popular key happens to not be accessed recently (e.g., between traffic spikes). Redis tracks a decaying frequency counter. Use allkeys-lfu.

🚫

volatile-lru vs allkeys-lru

volatile-lru: Only evicts keys with a TTL set. Keys without TTL are never evicted — use when you have both cache keys (with TTL) and permanent keys (no TTL) in the same Redis.
allkeys-lru: Evicts any key. Use when Redis is a pure cache.
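The choice is a couple of lines of redis.conf (values here are illustrative):

```conf
# Pure-cache Redis: cap memory and let LRU evict any key under pressure
maxmemory 2gb
maxmemory-policy allkeys-lru
# If the instance mixes cache keys (with TTL) and permanent keys, use instead:
# maxmemory-policy volatile-lru
maxmemory-samples 10   # larger sample = closer to true LRU
```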

05 — Cache Invalidation Patterns

"There are only two hard things in CS…"

Cache invalidation is hard because you need to decide when stale data becomes unacceptable. Three answers: TTL tolerance, event-driven invalidation, and versioned keys. Each has failure modes.

TTL-Based (Accept Staleness)

Accept stale data for up to TTL seconds. Works when staleness is tolerable — product listings, news feeds, recommendation scores. Not acceptable for bank balances, inventory counts, or anything with financial consequence.

📡

Event-Driven Invalidation

DB write triggers cache delete/update via message queue or CDC (Change Data Capture). Cache is consistent after event propagates (~ms). Problem: at-least-once delivery means you may delete twice — that's fine. Losing the event means stale cache indefinitely — that's not fine.
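The consumer side only needs to be idempotent, because deleting an already-deleted key is a no-op. A sketch with a plain dict standing in for the cache, simulating at-least-once delivery with a duplicated event:

```python
cache = {"user:42": "Ada", "user:7": "Grace"}

def handle_invalidation(event):
    """Consume a DB-change event and drop the affected cache entry.
    Deletion is idempotent, so at-least-once (duplicate) delivery is safe."""
    cache.pop(event["key"], None)   # no-op if already deleted

# At-least-once delivery: the same event arrives twice. That's fine.
for event in [{"key": "user:42"}, {"key": "user:42"}]:
    handle_invalidation(event)
assert "user:42" not in cache and "user:7" in cache
```

The hard failure is the other direction: a dropped event means no delete ever happens, which is why these pipelines need dead-letter queues and monitoring.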

🔢

Versioned Keys

user:42:v7 — bump the version number on every write. Old versions are simply never read again (TTL-evicted). No explicit invalidation needed. Downside: storage grows until eviction. Cache is never "warmed" — every new version is a cold start.
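Versioned keys can be sketched with an in-process version table (in Redis the version counter could itself be a key bumped with INCR):

```python
cache, version = {}, {"user:42": 1}   # current version per entity

def key_for(user):
    return f"{user}:v{version[user]}"

def write(user, value):
    version[user] += 1                # bump version: old key is never read again
    cache[key_for(user)] = value

def read(user):
    return cache.get(key_for(user))   # None = cold start for the new version

write("user:42", "Ada")
assert read("user:42") == "Ada"
write("user:42", "Bob")
assert read("user:42") == "Bob"
assert "user:42:v2" in cache          # stale version lingers until TTL eviction
```

No delete is ever issued; the old versions simply age out, which is exactly the storage-growth downside noted above.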

🌊

Write-Around (Cache Bypass)

On a write, only write to DB — don't touch the cache. Cache fills on next read miss. Avoids polluting cache with write-once data (batch imports, log ingestion). High miss rate initially but prevents cache churn for rarely-re-read data.

🔥

The double-delete pattern (most common production fix for race conditions)

Race condition: Thread A misses the cache and reads the old value from the DB → Thread B writes a new value to the DB and deletes the cache → Thread A populates the cache with the old value it read → cache is now stale until TTL.
Fix — double-delete: Delete cache → write DB → sleep ~500ms → delete cache again. The second delete catches the stale fill from thread A. Crude but effective in practice.
Cleaner fix: Use Redis Lua scripts for atomic read-modify-write, or optimistic locking with version fields.
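The double-delete write path itself is short. A sketch with dicts standing in for Redis and the DB (the delay is the tunable part: it should exceed a reader's DB-fetch-plus-cache-fill time):

```python
import time

cache, db = {"user:42": "old"}, {"user:42": "old"}

def write_with_double_delete(key, value, delay=0.5):
    cache.pop(key, None)    # 1st delete: readers now miss and go to the DB
    db[key] = value         # write the new value
    time.sleep(delay)       # wait out any reader that fetched the old value
    cache.pop(key, None)    # 2nd delete: evict any stale fill from that reader

write_with_double_delete("user:42", "new", delay=0.01)
assert db["user:42"] == "new" and "user:42" not in cache
```

In production the second delete is usually scheduled via a delayed queue message rather than blocking the writer with a sleep.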

Cache Technology Selection

✅ Redis — choose when...
You need data structures (sorted sets for leaderboards, lists for timelines, pub/sub for invalidation), persistence (RDB/AOF snapshots), Lua scripting, or Cluster mode. Redis is the default choice 95% of the time.
⚠ Memcached — choose when...
You need only simple string key-value, multi-threaded cache server (CPU-heavy workloads), or you're running a legacy system already using it. Memcached is faster for pure cache workloads under heavy multithreaded load, but Redis has closed this gap.
☁️ CDN Cache — choose when...
Serving static assets (images, JS, CSS, videos) or public API responses to a global audience. CDN caches at edge PoPs near users — reduces latency by 10-50×. Never cache user-specific or authenticated content at CDN.
⚡ In-Process Cache — choose when...
Caching within a single server (JVM heap with Caffeine/Guava, Python dict). Zero network overhead (<0.1ms). Use for: feature flags, config, hot lookup tables. Risk: multiple servers = stale state. Sync via Redis pub/sub invalidation.
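The L1/L2 layering reduces to a chained lookup. A sketch (dicts stand in for the in-process cache and Redis; the pub/sub invalidation hook is shown as a plain callback):

```python
l1 = {}                           # in-process cache (e.g. Caffeine/Guava on the JVM)
l2 = {"flag:dark_mode": "on"}     # shared cache (stands in for Redis)

def get(key):
    if key in l1:                 # no network hop: sub-0.1ms
        return l1[key]
    value = l2.get(key)           # network round trip to the shared cache
    if value is not None:
        l1[key] = value           # warm L1 for subsequent reads
    return value

def on_invalidate(key):
    """Called from a Redis pub/sub subscriber when any server changes the key."""
    l1.pop(key, None)

assert get("flag:dark_mode") == "on"   # fills L1
on_invalidate("flag:dark_mode")        # another server updated the flag
assert "flag:dark_mode" not in l1      # next read re-fetches from L2
```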
Interview Answer: "I'd use Redis as the shared distributed cache — it gives me data structures, atomic operations, and pub/sub for cache invalidation. For static assets I'd add a CDN layer in front. For extremely hot config data I'd add an in-process L1 cache synced via Redis pub/sub."
Quiz — Check Your Understanding

Caching Architecture

Q1. Your service handles user profile reads (90% reads, 10% writes). A write must be immediately visible to the next read — no staleness allowed. Which strategy fits?
Q2. Redis is configured with maxmemory-policy volatile-lru. Cache is full. You SET a new key without an EXPIRE. What happens?
Q3. At 3am, a hot key expires. 800 concurrent requests all miss the cache and simultaneously hit the DB. The DB query takes 4 seconds. What is this and what's the best mitigation?