Week 1 · Day 2

Caching Architecture &
Cache Invalidation

Manipulate cache size to see hit rates change in real time. Walk through write strategies step-by-step. Watch cache stampedes unfold — then see XFetch stop them cold.

3 Simulations · ~4h Study Time · 3 Quizzes

Cache-Aside · Write-Through · Write-Behind · Cache Stampede · TTL & XFetch · Thundering Herd
01 — The Core Metric

Cache hit rate is everything

A cache is only useful if most reads come from it, not the database. The difference between 90% and 99% hit rate is a 10× difference in DB load — not 9%. Feel it with the slider below.

📊 Cache Hit Rate Simulator
Drag the slider to change cache size (as % of working set). Send requests to see live hit/miss counts and the DB load impact.
5% (tiny) · 50% (half working set) · 95% (nearly all)

Projected metrics (1000 requests)

Cache Hit Rate: 56%
Cache Miss Rate: 44%
DB Query Load: 44% of traffic
Cache Hits: 560
DB Queries: 440

💡

Why the curve is non-linear

Key popularity typically follows a power-law (Zipf) distribution: a small number of keys receive most of the traffic, so caching just the top 20% of items often yields an 80%+ hit rate. And at the high end, each extra point of hit rate removes a growing share of the remaining DB load: going from 98% to 99% halves DB traffic, while going from 50% to 51% barely moves it.
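The non-linearity is easy to verify with arithmetic. A minimal sketch in plain Python (no libraries, numbers purely illustrative) comparing the DB queries left over at different hit rates:

```python
def db_queries(requests: int, hit_rate: float) -> int:
    """Requests that miss the cache and fall through to the database."""
    return round(requests * (1 - hit_rate))

# 1000 requests at increasing hit rates: each point matters more near the top.
for rate in (0.50, 0.90, 0.98, 0.99):
    print(f"{rate:.0%} hit rate -> {db_queries(1000, rate)} DB queries")
```

Going from 90% to 99% cuts DB load from 100 queries to 10: the 10× difference this section opens with.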

02 — Write Strategies

How does a write propagate through cache + DB?

The three strategies — cache-aside, write-through, and write-behind — differ in who writes what, and when. Step through each write flow to see the exact sequence and where things can go wrong.

🔄 Strategy Step-Through Visualizer
Select a strategy tab, then press Next Step to advance through the write + read flow one step at a time.

Cache-Aside (Lazy Loading): The application manages both cache and DB. On a write, app invalidates cache and writes to DB. On a read miss, app fetches from DB and populates cache.

Press "Next Step" to begin the write scenario.
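The cache-aside flow can also be sketched in a few lines. This is a minimal in-process model (plain dicts stand in for Redis and the database) just to show the order of operations:

```python
cache: dict = {}           # stands in for Redis
db = {"user:42": "Ada"}    # stands in for the database

def read(key):
    if key in cache:               # 1. try the cache first
        return cache[key]
    value = db[key]                # 2. miss: fetch from the DB
    cache[key] = value             # 3. populate the cache (lazy fill)
    return value

def write(key, value):
    db[key] = value                # 1. write the DB first
    cache.pop(key, None)           # 2. invalidate the cache entry (don't update it)

read("user:42")                    # miss -> fills cache
write("user:42", "Bob")            # DB updated, cache entry dropped
assert read("user:42") == "Bob"    # next read re-fills from the DB
```

Note the write path deletes rather than updates the cache: invalidation is cheaper to get right than a concurrent update.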

Write-Through: Every write goes to cache AND DB synchronously. Cache always has fresh data. Writes are slower (two synchronous writes per operation) but reads always get cache-fresh results.

Press "Next Step" to begin the write scenario.

Write-Behind (Write-Back): Write hits cache immediately, DB write is queued to a background worker. Writes feel instant. Risk: if cache crashes before the queue drains, data is lost.

Press "Next Step" to begin the write scenario.
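Write-behind reduces to a queue plus a background flusher. A toy model (a plain deque stands in for the async queue, and drain() simulates the background worker), illustrating both the instant write and the data-loss window:

```python
from collections import deque

cache, db = {}, {}
queue = deque()   # pending DB writes; in production a worker drains this asynchronously

def write(key, value):
    cache[key] = value            # write hits the cache immediately: feels instant
    queue.append((key, value))    # the DB write is deferred

def drain():
    while queue:                  # background worker flushes pending writes to the DB
        key, value = queue.popleft()
        db[key] = value

write("user:42", "Ada")
assert cache["user:42"] == "Ada" and "user:42" not in db   # the data-loss window
drain()
assert db["user:42"] == "Ada"
```

Everything sitting in the queue is exactly what you lose if the cache node crashes before the worker runs.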
| Strategy | Write Latency | Stale Data? | Cache Warm on Write? | Data Loss Risk | Best For |
|---|---|---|---|---|---|
| Cache-Aside | Low | Until TTL/miss | No (lazy fill) | None | Read-heavy, tolerate brief stale |
| Write-Through | High (2 writes) | Never | Yes | None | Consistency-critical reads |
| Write-Behind | Lowest | Never | Yes | Yes (cache crash) | Write-heavy, eventual durability OK |
03 — The Failure Mode

Cache Stampede: when expiry causes collapse

A popular key expires. 500 concurrent requests all miss the cache simultaneously and all fire DB queries at once. The DB collapses. This is the thundering herd problem — watch it unfold.

⚡ Stampede Visualizer
Compare unprotected cache expiry vs XFetch probabilistic early refresh. One generates 1 DB query; the other generates hundreds.
Click a button above. No Protection: all clients fire DB queries simultaneously on expiry. XFetch: one client proactively refreshes before expiry — others never see a miss.
⚠️

XFetch Algorithm — probabilistic early expiry

Each read checks: now − delta × beta × ln(rand()) ≥ expiry
delta = how long recomputation takes (seconds). beta = aggressiveness (1.0 = default; raise it to refresh earlier). Since rand() is in (0, 1), ln(rand()) is negative, so the left side always lands somewhat past now; the closer the key is to expiry, the more likely the check fires. If it evaluates true, this worker refreshes proactively, even though the key hasn't expired yet. Everyone else still hits the warm cache. One refresh, zero stampede.
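The check is one line of code. A sketch of the XFetch decision function (function and parameter names are our own, not from a library):

```python
import math, random, time

def should_refresh(now, expiry, delta, beta=1.0, rng=random.random):
    """XFetch probabilistic early refresh: True means 'this reader recomputes'."""
    # ln(rand()) is negative, so the left side is always somewhat past `now`;
    # the gap grows with delta (recompute cost) and beta (aggressiveness).
    return now - delta * beta * math.log(rng()) >= expiry

expiry = time.time() + 60            # key expires in 60 seconds
# Once the key has actually expired, every reader refreshes:
assert should_refresh(expiry + 1, expiry, delta=2.0)
# Far from expiry the check almost never fires; near expiry it becomes likely.
```

In practice you store delta alongside the cached value (measured at the last recomputation) and run this check on every cache hit.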

💡

Other stampede prevention patterns

Distributed mutex (Redis SET NX): First miss acquires a lock, fetches DB, populates cache. All other waiters block briefly, then retry the cache.
Background async refresh: Never expire keys synchronously — always refresh asynchronously before TTL. Serve slightly stale during refresh.
Request coalescing: Deduplicate in-flight cache miss requests — only one DB query fires per key per time window, all others wait on that result.
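The third pattern, request coalescing, fits in a short in-process sketch using a lock table and events (the Redis SET NX variant moves the lock into Redis, but the shape is the same; slow_db_fetch is a hypothetical stand-in for the expensive query):

```python
import threading

cache = {}
db_calls = 0
in_flight = {}                 # key -> Event signalling "fetch finished"
lock = threading.Lock()

def slow_db_fetch(key):
    global db_calls
    db_calls += 1              # count how often the DB is actually hit
    return f"value-of-{key}"

def get(key):
    if key in cache:
        return cache[key]
    with lock:
        if key in cache:                     # re-check under the lock
            return cache[key]
        event = in_flight.get(key)
        if event is None:                    # we are the leader for this key
            in_flight[key] = event = threading.Event()
            leader = True
        else:
            leader = False
    if leader:
        cache[key] = slow_db_fetch(key)      # one DB query per key
        event.set()
        with lock:
            del in_flight[key]
        return cache[key]
    event.wait()                             # follower: wait for the leader's result
    return cache[key]

# 50 concurrent misses for the same key -> exactly one DB query
threads = [threading.Thread(target=get, args=("hot",)) for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()
assert db_calls == 1
```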

04 — TTL & Eviction Policies

How does Redis decide what to remove?

TTL (time-based expiry) and eviction (memory-pressure removal) are separate mechanisms. TTL is predictable; eviction depends on the policy and workload. Choosing wrong costs you cache hit rate.

⏱️

TTL (Time-To-Live)

Keys expire after a set duration: EXPIRE key 300. Redis uses two strategies: lazy expiry (check on access) plus active expiry (a background cycle that runs ~10 times per second, sampling ~20 keys with TTLs and deleting the expired ones). Add jitter: 300 + random(60) to prevent synchronized expiry spikes.
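Jittered TTLs are a one-liner. A sketch (the redis-py call in the comment is illustrative of where the value would be used):

```python
import random

def jittered_ttl(base=300, jitter=60):
    """Spread expiries over [base, base + jitter) so keys cached at the
    same moment don't all expire at the same moment."""
    return base + random.randrange(jitter)

ttl = jittered_ttl()
assert 300 <= ttl < 360
# With redis-py this would look like: r.set(key, value, ex=jittered_ttl())
```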

🧹

LRU (Least Recently Used)

On memory pressure, evicts the key not accessed for the longest time. Redis uses probabilistic LRU — samples N random keys, evicts the oldest. Use allkeys-lru for general caches. Increase maxmemory-samples from 5→10 for better accuracy.

📊

LFU (Least Frequently Used)

Evicts the key accessed least often. Better than LRU for bursty workloads where a popular key happens to not be accessed recently (e.g., between traffic spikes). Redis tracks a decaying frequency counter. Use allkeys-lfu.

🚫

volatile-lru vs allkeys-lru

volatile-lru: Only evicts keys with a TTL set. Keys without TTL are never evicted — use when you have both cache keys (with TTL) and permanent keys (no TTL) in the same Redis.
allkeys-lru: Evicts any key. Use when Redis is a pure cache.
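The choice is a couple of lines of redis.conf (values here are illustrative):

```conf
# Pure-cache Redis: cap memory and let LRU evict any key under pressure
maxmemory 2gb
maxmemory-policy allkeys-lru
# If the instance mixes cache keys (with TTL) and permanent keys, use instead:
# maxmemory-policy volatile-lru
maxmemory-samples 10   # larger sample = closer to true LRU
```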

05 — Cache Invalidation Patterns

"There are only two hard things in CS…"

Cache invalidation is hard because you need to decide when stale data becomes unacceptable. Three answers: TTL tolerance, event-driven invalidation, and versioned keys. Each has failure modes.

TTL-Based (Accept Staleness)

Accept stale data for up to TTL seconds. Works when staleness is tolerable — product listings, news feeds, recommendation scores. Not acceptable for bank balances, inventory counts, or anything with financial consequence.

📡

Event-Driven Invalidation

DB write triggers cache delete/update via message queue or CDC (Change Data Capture). Cache is consistent after event propagates (~ms). Problem: at-least-once delivery means you may delete twice — that's fine. Losing the event means stale cache indefinitely — that's not fine.
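The consumer side only needs to be idempotent, because deleting an already-deleted key is a no-op. A sketch with a plain dict standing in for the cache, simulating at-least-once delivery with a duplicated event:

```python
cache = {"user:42": "Ada", "user:7": "Grace"}

def handle_invalidation(event):
    """Consume a DB-change event and drop the affected cache entry.
    Deletion is idempotent, so at-least-once (duplicate) delivery is safe."""
    cache.pop(event["key"], None)   # no-op if already deleted

# At-least-once delivery: the same event arrives twice. That's fine.
for event in [{"key": "user:42"}, {"key": "user:42"}]:
    handle_invalidation(event)
assert "user:42" not in cache and "user:7" in cache
```

The hard failure is the other direction: a dropped event means no delete ever happens, which is why these pipelines need dead-letter queues and monitoring.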

🔢

Versioned Keys

user:42:v7 — bump the version number on every write. Old versions are simply never read again (TTL-evicted). No explicit invalidation needed. Downside: storage grows until eviction. Cache is never "warmed" — every new version is a cold start.
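Versioned keys can be sketched with an in-process version table (in Redis the version counter could itself be a key bumped with INCR):

```python
cache, version = {}, {"user:42": 1}   # current version per entity

def key_for(user):
    return f"{user}:v{version[user]}"

def write(user, value):
    version[user] += 1                # bump version: old key is never read again
    cache[key_for(user)] = value

def read(user):
    return cache.get(key_for(user))   # None = cold start for the new version

write("user:42", "Ada")
assert read("user:42") == "Ada"
write("user:42", "Bob")
assert read("user:42") == "Bob"
assert "user:42:v2" in cache          # stale version lingers until TTL eviction
```

No delete is ever issued; the old versions simply age out, which is exactly the storage-growth downside noted above.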

🌊

Write-Around (Cache Bypass)

On a write, only write to DB — don't touch the cache. Cache fills on next read miss. Avoids polluting cache with write-once data (batch imports, log ingestion). High miss rate initially but prevents cache churn for rarely-re-read data.

🔥

The double-delete pattern (most common production fix for race conditions)

Race condition: Thread A misses the cache and reads the old value from the DB → Thread B writes a new value to the DB and deletes the cache → Thread A populates the cache with the old value it read → cache is now stale until TTL.
Fix — double-delete: Delete cache → write DB → sleep ~500ms → delete cache again. The second delete catches the stale fill from thread A. Crude but effective in practice.
Cleaner fix: Use Redis Lua scripts for atomic read-modify-write, or optimistic locking with version fields.
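The double-delete write path itself is short. A sketch with dicts standing in for Redis and the DB (the delay is the tunable part: it should exceed a reader's DB-fetch-plus-cache-fill time):

```python
import time

cache, db = {"user:42": "old"}, {"user:42": "old"}

def write_with_double_delete(key, value, delay=0.5):
    cache.pop(key, None)    # 1st delete: readers now miss and go to the DB
    db[key] = value         # write the new value
    time.sleep(delay)       # wait out any reader that fetched the old value
    cache.pop(key, None)    # 2nd delete: evict any stale fill from that reader

write_with_double_delete("user:42", "new", delay=0.01)
assert db["user:42"] == "new" and "user:42" not in cache
```

In production the second delete is usually scheduled via a delayed queue message rather than blocking the writer with a sleep.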

Cache Technology Selection

✅ Redis — choose when...
You need data structures (sorted sets for leaderboards, lists for timelines, pub/sub for invalidation), persistence (RDB/AOF snapshots), Lua scripting, or Cluster mode. Redis is the default choice 95% of the time.
⚠ Memcached — choose when...
You need only simple string key-value, multi-threaded cache server (CPU-heavy workloads), or you're running a legacy system already using it. Memcached is faster for pure cache workloads under heavy multithreaded load, but Redis has closed this gap.
☁️ CDN Cache — choose when...
Serving static assets (images, JS, CSS, videos) or public API responses to a global audience. CDN caches at edge PoPs near users — reduces latency by 10-50×. Never cache user-specific or authenticated content at CDN.
⚡ In-Process Cache — choose when...
Caching within a single server (JVM heap with Caffeine/Guava, Python dict). Zero network overhead (<0.1ms). Use for: feature flags, config, hot lookup tables. Risk: multiple servers = stale state. Sync via Redis pub/sub invalidation.
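The L1/L2 layering reduces to a chained lookup. A sketch (dicts stand in for the in-process cache and Redis; the pub/sub invalidation hook is shown as a plain callback):

```python
l1 = {}                           # in-process cache (e.g. Caffeine/Guava on the JVM)
l2 = {"flag:dark_mode": "on"}     # shared cache (stands in for Redis)

def get(key):
    if key in l1:                 # no network hop: sub-0.1ms
        return l1[key]
    value = l2.get(key)           # network round trip to the shared cache
    if value is not None:
        l1[key] = value           # warm L1 for subsequent reads
    return value

def on_invalidate(key):
    """Called from a Redis pub/sub subscriber when any server changes the key."""
    l1.pop(key, None)

assert get("flag:dark_mode") == "on"   # fills L1
on_invalidate("flag:dark_mode")        # another server updated the flag
assert "flag:dark_mode" not in l1      # next read re-fetches from L2
```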
Interview Answer: "I'd use Redis as the shared distributed cache — it gives me data structures, atomic operations, and pub/sub for cache invalidation. For static assets I'd add a CDN layer in front. For extremely hot config data I'd add an in-process L1 cache synced via Redis pub/sub."
Quiz — Check Your Understanding

Caching Architecture

Q1. Your service handles user profile reads (90% reads, 10% writes). A write must be immediately visible to the next read — no staleness allowed. Which strategy fits?
Q2. Redis is configured with maxmemory-policy volatile-lru. Cache is full. You SET a new key without an EXPIRE. What happens?
Q3. At 3am, a hot key expires. 800 concurrent requests all miss the cache and simultaneously hit the DB. The DB query takes 4 seconds. What is this and what's the best mitigation?