Design YouTube

CDN strategy, video transcoding pipelines, adaptive bitrate, and view count architecture at billion-user scale.

4 Exercises
12 Concept Checks
~95 min total
System Design
Session Progress
0 / 4 completed
Exercise 1 🟡 Easy ⏱ 15 min
Video Upload Decoupling
YouTube receives 500 hours of video per minute. A user uploads a 4GB video file. Without async architecture, the upload API must transcode in-line — causing 30-minute request timeouts and server overload that takes down the entire upload fleet.
Architecture Diagram
👤 Client
📡 Upload API
🪣 S3 Raw
📨 Kafka Queue
🔧 Worker 360p
🔧 Worker 720p
🔧 Worker 1080p
🌐 CDN
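The diagram's flow can be sketched as a minimal upload handler. This is an illustrative stand-in, not real YouTube code: `RAW_BUCKET` and `TRANSCODE_QUEUE` are in-memory dictionaries playing the role of S3 and Kafka so the pattern runs anywhere. The point is that the request path only stores bytes and enqueues a job — transcoding happens elsewhere.

```python
import uuid

# In-memory stand-ins for S3 and Kafka — names are hypothetical.
RAW_BUCKET = {}        # key -> raw bytes ("S3")
TRANSCODE_QUEUE = []   # list of job dicts ("Kafka topic")

def handle_upload(filename, payload):
    """Store the raw file, enqueue a transcode job, return 202 immediately.

    No transcoding runs in the request path; workers consume
    TRANSCODE_QUEUE asynchronously.
    """
    video_id = str(uuid.uuid4())
    raw_key = "raw/{}/{}".format(video_id, filename)
    RAW_BUCKET[raw_key] = payload            # S3 PUT stand-in
    TRANSCODE_QUEUE.append({                 # Kafka produce stand-in
        "video_id": video_id,
        "raw_key": raw_key,
        "qualities": ["360p", "720p", "1080p"],
    })
    # 202 Accepted: request received; processing continues in the background.
    return 202, {"video_id": video_id, "status": "processing"}

status, body = handle_upload("cat.mp4", b"\x00" * 1024)
print(status, body["status"])  # 202 processing
```

In a real deployment the 4GB payload would be streamed to object storage (often via a pre-signed URL) rather than buffered in the API process.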
Concept Check — 3 questions
Q1. When the upload API writes to S3 and enqueues a Kafka event, what HTTP status should it return to the client?
A. 200 OK — the upload is done
B. 202 Accepted — request received, processing in background
C. 201 Created — the video resource has been created
D. 204 No Content — the video is ready
Q2. A transcoding worker crashes halfway through a 4K video job. How does Kafka prevent job loss?
A. Kafka automatically completes the job on another worker
B. The job is permanently lost — Kafka doesn't persist failed jobs
C. The consumer never committed the offset, so Kafka redelivers the message to another worker
D. The API server retries the upload
Q3. How does the uploader know when transcoding is complete?
A. The transcoder sends an email/push notification, or the client polls a /status endpoint
B. The client keeps the upload HTTP connection open until transcoding finishes
C. YouTube shows the video immediately during transcoding
D. Kafka notifies the client directly over TCP
202 Accepted is the right code for async work. Kafka offsets: the consumer commits its offset ONLY after successfully processing — a crash before commit causes re-delivery. Notification: emit a video_ready event from the transcoder; a notification service subscribes and sends the email/push.
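The offset-commit rule above can be shown with a tiny at-least-once simulation — a plain list stands in for a Kafka partition, and a global integer for the committed offset; all names are illustrative. A worker that "crashes" before committing leaves its message behind for the next worker.

```python
# At-least-once delivery: commit the offset only AFTER successful processing,
# so a crash mid-job causes redelivery to the next consumer.
messages = ["job-1", "job-2", "job-3"]   # the "partition"
committed_offset = 0                     # last committed position

def run_consumer(crash_on=None):
    """Process messages from the committed offset; commit after each success."""
    global committed_offset
    processed = []
    for offset in range(committed_offset, len(messages)):
        msg = messages[offset]
        if msg == crash_on:
            return processed             # crash BEFORE committing this offset
        processed.append(msg)            # do the transcoding work here
        committed_offset = offset + 1    # commit only after success
    return processed

first = run_consumer(crash_on="job-2")   # worker dies while handling job-2
second = run_consumer()                  # replacement worker picks up
print(first, second)  # ['job-1'] ['job-2', 'job-3']
```

The trade-off: redelivery means job-2 may be processed twice, so transcoding jobs should be idempotent (e.g. write outputs to a deterministic path keyed by video ID and quality).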
Open Design Challenge
1. Design the data model for a video: what fields does the videos table need? Include status, storage path, and quality variants.
2. If the transcoder must produce 360p, 720p, 1080p, and 4K versions — how do you parallelize this across 4 workers for one video?
3. A transcoding job fails 3 times. What is your dead-letter strategy, and how do you alert the engineering team?
Concept score: 0/3
Exercise 2 🔴 Medium ⏱ 20 min
Adaptive Bitrate Streaming (ABR)
A user on mobile opens a YouTube video. Their network speed changes every 30 seconds: 15 Mbps → 1.5 Mbps → 8 Mbps. Without ABR, they would see constant buffering. With ABR, quality adapts automatically — the player switches between pre-transcoded quality levels at segment boundaries.
HLS / DASH Segment Structure
🎞️ 360p segments (500Kbps)
🎞️ 720p segments (2.5Mbps)
🎞️ 1080p segments (8Mbps)
📋 .m3u8 Manifest
📱 Player
monitors bandwidth
⬆/⬇ Switch quality
Concept Check — 3 questions
Q1. The user's bandwidth is 2 Mbps. Which quality level should the ABR player choose?
A. 1080p (8 Mbps) — always stream the highest quality
B. 720p (2.5 Mbps) — the quality closest to the available bandwidth
C. 360p (500 Kbps) — safely below the 2 Mbps available bandwidth
D. Pause and wait until bandwidth reaches 8 Mbps
Q2. HLS splits videos into segments. What is the typical segment duration, and why does it matter for quality switching?
A. 30 seconds — quality only switches every 30 seconds
B. 2–10 seconds — the player can switch quality on each segment boundary, adapting quickly
C. 1 millisecond — for real-time adaptation
D. The entire video file — no splitting
Q3. Where should video segments be stored for best performance? What cache hit rate do you expect for a popular video?
A. Origin servers only — CDNs add too much latency
B. Only in RAM — video files are too small for CDN
C. CDN edge nodes globally — popular video segments achieve 99%+ cache hit rates, served from edges <50 ms away
D. User's local device only
C is correct for Q1: 2 Mbps is below the 2.5 Mbps that 720p needs, so the player chooses 360p safely. ABR is conservative — it picks a quality BELOW the available bandwidth to avoid stalling. Segment durations of 2–10s let the player adapt quickly at each boundary. The CDN stores hot segments at 300+ edge PoPs globally.
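The conservative selection rule can be sketched in a few lines: pick the highest rendition whose bitrate fits within a safety fraction of measured bandwidth. The bitrate ladder matches the one in the diagram; the 0.8 safety factor is an assumed illustrative value, not a standard.

```python
# Conservative ABR selection: highest quality under safety * bandwidth.
LADDER_KBPS = {"360p": 500, "720p": 2500, "1080p": 8000}

def pick_quality(bandwidth_kbps, safety=0.8):
    """Return the best rendition that fits under safety * bandwidth."""
    budget = bandwidth_kbps * safety
    best = "360p"                                  # floor: lowest rung
    for name, kbps in sorted(LADDER_KBPS.items(), key=lambda kv: kv[1]):
        if kbps <= budget:
            best = name
    return best

for bw in (15000, 2000, 1500, 8000):
    print(bw, "->", pick_quality(bw))
# 15000 -> 1080p, 2000 -> 360p, 1500 -> 360p, 8000 -> 720p
```

Note that at 8 Mbps measured bandwidth the player still picks 720p, not 1080p — the safety margin absorbs the next bandwidth drop. Real players also factor in buffer depth, not just throughput.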
Open Design Challenge
1. Design the CDN cache key for a 720p segment of video "abc123", minute 3, segment 4. What URL structure guarantees cache hits for the same segment across all users?
2. A new video goes viral 10 minutes after upload. The CDN has 0% cache hit rate. Design "request coalescing" to prevent 100,000 requests hitting the origin simultaneously.
3. How does "predictive preloading" work? When a user is 3 seconds into a video, what segments should already be buffered?
Concept score: 0/3
Exercise 3 🔴 Medium ⏱ 25 min
View Count at Scale
YouTube serves 1 billion views per day = ~11,600 view events per second. A single PostgreSQL counter column for a viral video receives 500,000 writes in the first hour (139 writes/second on ONE ROW). The database row is locked for each increment, causing writes to queue and timeout.
Counter Architecture Options
❌ NAIVE (breaks at scale)
View event
↓ direct write
UPDATE videos SET views=views+1 WHERE id=X
Row lock contention → 139 writes/sec on 1 row
✅ CORRECT (Redis + batch)
View event
↓ INCR in-memory
Redis: INCR view:{video_id}
↓ batch every 30s
UPDATE videos SET views=views+N
Concept Check — 3 questions
Q1. Using Redis INCR to count views: the server crashes before flushing to PostgreSQL. What is the worst-case data loss?
A. All view counts are lost permanently
B. No data loss — Redis persists everything to disk instantly
C. Up to ~30 seconds of view counts are lost — acceptable for a view counter but not for financial data
D. PostgreSQL rolls back to a consistent state automatically
Q2. "Sharded counters" split one counter into N shards (e.g., 100 Redis keys for one video). What problem does this solve?
A. It reduces total storage for view counts
B. A single Redis key becomes a hot spot at extreme write rates. 100 shards distribute writes — each key gets 1/100th of the traffic. Sum all shards to get the total.
C. It makes reads faster by caching the result
D. It prevents duplicate view counts from the same user
Q3. YouTube shows "1.2M views" but the exact count is 1,247,832. This is intentional — why?
A. YouTube's engineers can't count accurately at this scale
B. Exact counts require synchronous writes, which would slow down video loading
C. Rounding avoids unnecessary DB reads, and the precision difference (0.06%) doesn't affect user experience
D. Exact counts are stored but rounding is a legal requirement
Redis INCR is atomic and handles ~100K+ ops/sec per instance — far faster than PostgreSQL's row locking. Batch flush every 30s: a cron job or Kafka consumer runs GETDEL view:{id} and writes one SQL UPDATE with the accumulated count. Loss on crash = at most 30 seconds of views. For YouTube this is acceptable — it's not money.
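The sharded-counter-plus-batch-flush pattern above can be sketched with plain dictionaries standing in for Redis and the `videos` table (names and shard count are illustrative): writes hit a random shard, and the periodic flush drains all shards into one batched UPDATE.

```python
import random
from collections import defaultdict

N_SHARDS = 100
redis = defaultdict(int)        # dict standing in for Redis; INCR == += 1
sql_views = defaultdict(int)    # dict standing in for the videos table

def record_view(video_id):
    """INCR a random shard so no single key becomes a write hot spot."""
    shard = random.randrange(N_SHARDS)
    redis["view:{}:{}".format(video_id, shard)] += 1

def flush(video_id):
    """Every ~30s: drain all shards (GETDEL) and apply one batched UPDATE."""
    total = 0
    for shard in range(N_SHARDS):
        total += redis.pop("view:{}:{}".format(video_id, shard), 0)
    sql_views[video_id] += total   # UPDATE videos SET views = views + N
    return total

for _ in range(5000):
    record_view("abc123")
flushed = flush("abc123")
print(flushed, sql_views["abc123"])  # 5000 5000
```

Reading the live total means summing all 100 shards plus the SQL baseline — cheap for one video, which is why the flush also serves as compaction.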
Open Design Challenge
1. Design the full view counting pipeline: from browser click → Redis → Kafka → PostgreSQL. Include deduplication (the same user shouldn't count twice in 24h).
2. The "views in last 24 hours" feature requires a sliding window count. How do you implement this without scanning billions of rows?
Concept score: 0/3
Exercise 4 🔥 Hard ⏱ 35 min
Viral Video: Cold Start Problem
A creator uploads a video at 9:00 AM. At 9:05 AM, a celebrity tweets the link. By 9:07 AM, 100,000 users click simultaneously. The CDN has 0% cache hit rate — every request passes through to the origin. 3 origin servers handling video must each serve 33,000 concurrent streams from disk.
Cold Start: Request Coalescing
User 1
User 2
User 3...100K
🔒 CDN Edge
(coalescing hold)
→ single request
📦 Origin
→ cached
CDN stores
→ all 100K served
⚡ Fast delivery
Concept Check — 3 questions
Q1. 100,000 requests miss the CDN cache simultaneously. With "request coalescing" enabled, what does the CDN do?
A. Forwards all 100,000 requests to the origin server simultaneously
B. Returns 503 errors until the origin responds
C. Holds all 100,000 in a waiting queue, sends ONE request to the origin, and when it responds, serves the cached copy to all 100,000
D. Randomly drops 99% of requests and only forwards 1,000
Q2. Virality prediction: a video gets 500 views in its first 60 seconds. What should the system do automatically?
A. Nothing — wait until the video is popular before caching
B. Trigger CDN pre-warming: proactively push all quality segments to edge nodes in the top 50 cities before the traffic arrives
C. Limit the video to 1,000 viewers until transcoding catches up
D. Delete the old video and re-upload it with better compression
Q3. During a viral spike, origin servers are overwhelmed. What should the CDN return if it can't reach the origin?
A. 503 Service Unavailable immediately
B. Serve a stale cached version with a "stale-while-revalidate" directive — users get slightly old content rather than an error
C. Delete the cache and force all users to wait for a fresh response
D. Redirect users to a competitor's video
Request coalescing is a CDN feature: only 1 cache miss travels to origin regardless of how many users miss simultaneously. Pre-warming: Kafka event video_velocity_high (500 views/min) → pre-warming service calls each CDN PoP's edge API to pull-and-cache all segments. Stale-while-revalidate: CDN serves old content immediately while revalidating in the background.
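Request coalescing is the "single-flight" pattern, and it can be demonstrated with threads: a per-key lock ensures only the first miss reaches the origin, while concurrent misses wait and then read the freshly cached copy. This is a minimal sketch — real CDN edges do this per cache node with streaming responses, and all names here are illustrative.

```python
import threading

origin_hits = 0                  # count of round-trips to origin
cache = {}                       # edge cache: key -> bytes
locks = {}                       # one lock per cache key
locks_guard = threading.Lock()   # protects the locks dict itself

def fetch_origin(key):
    global origin_hits
    origin_hits += 1             # only called while holding the key's lock
    return b"segment-bytes"

def get_segment(key):
    """Single-flight: first miss fetches; concurrent misses wait, then hit cache."""
    if key in cache:
        return cache[key]
    with locks_guard:
        lock = locks.setdefault(key, threading.Lock())
    with lock:                   # only one thread fetches per key
        if key not in cache:     # double-check after acquiring the lock
            cache[key] = fetch_origin(key)
    return cache[key]

threads = [threading.Thread(target=get_segment, args=("abc123/720p/seg4.ts",))
           for _ in range(1000)]
for t in threads: t.start()
for t in threads: t.join()
print(origin_hits)  # 1
```

A thousand concurrent misses produce exactly one origin fetch — the same collapse a CDN edge performs during a cold-start spike.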
Open Design Challenge
1. Design the virality detection pipeline: from view events → velocity calculation → CDN pre-warm trigger. What Kafka topics and consumers are involved?
2. Pre-warming 300 CDN PoPs × 5 quality levels × 100 segments/level = 150,000 HTTP calls. How do you execute this within 60 seconds without overloading the origin?
3. What is the SLA you'd set for a video to be "CDN warm" after upload? How do you measure it?
Concept score: 0/3