Day 22 · Week 4 — Real-World Systems

Design YouTube

Video upload pipelines, transcoding at scale, adaptive bitrate streaming, CDN strategy, and view counting architecture for 2.5 billion monthly users.

7 Stages
~4h Study
2 Simulations
5 Quizzes

The Scale Problem

📤
Video Processing
500 hours uploaded every minute. A 4K video can take 10+ minutes to transcode. Synchronous processing would cause request timeouts — async pipeline is mandatory.
💾
Storage
1B+ videos × 7 quality levels (144p to 4K) × HLS segments = petabytes of object storage. S3/GCS with lifecycle policies moves cold content to cheaper tiers automatically.
🌐
Delivery
2.5B users globally. Raw origin bandwidth would be astronomical. CDN is not optional — it's the architecture. 300+ Points of Presence (PoPs) serve content from the nearest edge.
👁️
View Counting
5B views/day ≈ 58,000 views/second. Writing to a PostgreSQL counter per view causes lock contention. Redis INCR + batch flush is the only viable pattern at this scale.
⚠️
Without architecture: 30-minute synchronous upload processing, request timeouts, server overload. Every 4K upload would hold a worker thread captive for the full transcode duration. YouTube would be unusable within hours of launch.
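The numbers in these cards can be sanity-checked with quick arithmetic. A minimal sketch, using the figures above plus an assumed average of ~1 GB of stored renditions per upload-hour (an illustrative guess, not a YouTube figure):

```python
# Back-of-envelope math for the scale figures above (illustrative, not official data)
UPLOAD_HOURS_PER_MIN = 500
VIEWS_PER_DAY = 5_000_000_000

# 500 hours uploaded per minute -> hours of new video per day
daily_upload_hours = UPLOAD_HOURS_PER_MIN * 60 * 24   # 720,000 upload-hours/day

# View write rate if every view hit the database directly
views_per_second = VIEWS_PER_DAY / 86_400             # ~57,870 writes/sec

# Storage growth, assuming ~1 GB/upload-hour across the whole rendition ladder
daily_storage_tb = daily_upload_hours * 1 / 1024      # ~703 TB/day

print(f"{daily_upload_hours:,} upload-hours/day")
print(f"{views_per_second:,.0f} views/sec")
print(f"~{daily_storage_tb:,.0f} TB of new storage/day")
```

Even with a conservative per-hour storage guess, the growth rate lands in the hundreds of terabytes per day, which is why lifecycle tiering in S3/GCS is non-negotiable.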

The Upload Pipeline

ℹ️
How to use: Click "Next Step" to walk through each stage of the upload pipeline. Each stage shows what's happening, why, and the approximate latency at that step.
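The pipeline's first rule, "accept fast, process later," can be sketched as follows. This is a toy sketch: the in-process `Queue` stands in for Kafka, the `videos` dict stands in for the PostgreSQL metadata table, and all names are illustrative rather than YouTube's actual API.

```python
# Async upload pattern: return 202 Accepted immediately, transcode in the background.
import uuid
from queue import Queue

queue: Queue = Queue()          # stands in for a Kafka topic
videos: dict[str, dict] = {}    # stands in for the metadata database

def handle_upload(raw_object_location: str) -> tuple[int, dict]:
    """Accept the upload, record 'processing' status, enqueue work, return fast."""
    video_id = str(uuid.uuid4())
    videos[video_id] = {"status": "processing", "source": raw_object_location}
    queue.put({"event": "video.uploaded", "video_id": video_id})  # fan-out point
    return 202, {"video_id": video_id, "status": "processing"}

def transcode_worker():
    """Consumer side: pull one upload event and mark the video ready when done."""
    event = queue.get()
    # ... run the rendition ladder (ffmpeg jobs) here ...
    videos[event["video_id"]]["status"] = "ready"

status, body = handle_upload("s3://uploads/raw/example.mp4")
transcode_worker()
```

The worker thread never blocks the request path: the API's only job is to durably record the event, which is why a 202 (not 200 or 201) is the honest response code.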

Adaptive Bitrate (ABR) Quality Selector

Drag the bandwidth slider to simulate what quality level YouTube's player would select. The highlighted bar is the active quality.

144p · 360p · 480p · 720p · 1080p · 1440p · 4K

Adaptive Bitrate Streaming (ABR)

ABR streaming is the technology that lets YouTube automatically switch quality based on your network. The video is pre-encoded at multiple bitrates and split into short segments. The player requests segments one at a time, measuring download speed to determine which quality to request next.
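A minimal throughput-based selector, mirroring what the slider above simulates. The bitrate ladder here is a typical example, not YouTube's actual encoding ladder:

```python
# Pick the highest rendition whose bitrate fits within a safety margin of
# measured throughput. Ladder values are illustrative.
LADDER = [  # (label, bitrate in kbps)
    ("144p", 100), ("360p", 700), ("480p", 1_200),
    ("720p", 2_500), ("1080p", 5_000), ("1440p", 10_000), ("4K", 20_000),
]

def pick_quality(measured_kbps: float, safety: float = 0.8) -> str:
    """Highest rung whose bitrate fits in `safety` x measured throughput."""
    budget = measured_kbps * safety
    best = LADDER[0][0]                 # never drop below the lowest rung
    for label, bitrate in LADDER:
        if bitrate <= budget:
            best = label
    return best
```

The safety factor is the interesting design choice: requesting at 100% of measured throughput leaves no headroom, so any dip stalls playback.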

📋
Manifest Files
The .m3u8 (HLS) or .mpd (DASH) file lists all available renditions with their bitrates and segment URLs. The player fetches this first.
✂️
Segment Duration
Segments are typically 2–10 seconds. Quality switches happen at segment boundaries. Shorter segments = faster quality adaptation but more HTTP requests.
📊
Buffer-Based ABR
Buffer-based algorithms such as BOLA (YouTube's exact player logic is not public): if the buffer is full, request higher quality; if the buffer is draining fast, step down. Prioritizes smooth playback over maximum quality.
💡
Why segment duration matters: A 2-second segment means quality can switch every 2 seconds — great for fluctuating networks. A 10-second segment means you're committed to the current quality for 10 seconds. YouTube uses 2–4 second segments for live, 6–10 seconds for VOD.
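The buffer-based idea can be sketched as a toy step-up/step-down rule. Real BOLA maximizes a utility function over buffer occupancy; this simplified heuristic only captures the spirit, and the watermark values are arbitrary:

```python
# Simplified buffer-based quality decision (in the spirit of BOLA, not the real thing)
QUALITIES = ["144p", "360p", "480p", "720p", "1080p", "1440p", "4K"]

def next_quality(current: str, buffer_s: float,
                 high_mark: float = 20.0, low_mark: float = 8.0) -> str:
    """Step up when the buffer is comfortably full, down when it drains."""
    i = QUALITIES.index(current)
    if buffer_s >= high_mark:
        return QUALITIES[min(i + 1, len(QUALITIES) - 1)]  # room to be ambitious
    if buffer_s <= low_mark:
        return QUALITIES[max(i - 1, 0)]                   # protect smooth playback
    return current                                        # steady state
```

Because switches only take effect at segment boundaries, the segment duration above directly bounds how often this function's decision can actually be applied.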

HLS vs DASH Comparison

| Feature | HLS (Apple) | MPEG-DASH | Winner |
| --- | --- | --- | --- |
| Full name | HTTP Live Streaming | Dynamic Adaptive Streaming over HTTP | |
| Manifest format | .m3u8 (Apple-defined; published as RFC 8216) | .mpd (XML, open standard) | DASH (open) |
| iOS/Safari support | Native | Requires a JS library | HLS |
| Android/Chrome | Requires a JS library | Native in ExoPlayer; via MSE/EME in browsers | DASH |
| Segment format | .ts (MPEG-TS) or fMP4 | fMP4 / WebM | DASH (fMP4 more efficient) |
| Live streaming | Excellent | Good | HLS |
| YouTube uses | For Apple devices | Primary protocol | Both (adaptive) |
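For concreteness, here is what a small HLS master playlist (.m3u8) might look like: one #EXT-X-STREAM-INF entry per rendition, each pointing to a per-quality media playlist. Paths and bitrates are illustrative, not taken from any real service.

```
#EXTM3U
#EXT-X-VERSION:6
#EXT-X-STREAM-INF:BANDWIDTH=700000,RESOLUTION=640x360
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
```

This is the file the player fetches first; every ABR decision afterwards is just choosing which of these per-rendition playlists to pull the next segment from.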

View Count Architecture

The naive approach — incrementing a database counter per view — breaks catastrophically at YouTube's scale. Here's why, and how the correct architecture works.

NAIVE APPROACH (Broken)
User watches → UPDATE counter → Row lock
Sitewide traffic is 5B views/day, and a single viral video can absorb ~12M of those: ~139 writes/sec against one row. Each UPDATE acquires a row-level lock. Queries queue. Timeouts cascade. The database melts.
CORRECT APPROACH
User watches → Redis INCR → Flush every 30s
Redis INCR is atomic and non-blocking. Each region has its own counter. A background job flushes accumulated counts to PostgreSQL every 30 seconds. No locks, no contention.
PYTHON · view_counter.py
# View counting with Redis batching (redis-py asyncio client)
import redis.asyncio as redis_lib

redis = redis_lib.Redis(decode_responses=True)

async def record_view(video_id: str, user_id: str):
    pipe = redis.pipeline()
    pipe.incr(f"views:{video_id}")              # atomic counter
    pipe.sadd(f"viewers:{video_id}", user_id)   # unique-viewer set
    pipe.expire(f"views:{video_id}", 3600)      # 1hr TTL as a safety net
    pipe.expire(f"viewers:{video_id}", 86400)   # bound memory for the viewer set
    await pipe.execute()

# Background flush every 30 seconds
async def flush_view_counts():
    # SCAN, not KEYS: KEYS blocks Redis while it walks the entire keyspace
    async for key in redis.scan_iter("views:*"):
        video_id = key.split(":", 1)[1]
        count = int(await redis.getdel(key) or 0)  # read and reset atomically
        if count == 0:
            continue
        await db.execute(
            "UPDATE videos SET view_count = view_count + $1 WHERE id = $2",
            count, video_id
        )
        # Also flush unique viewer count to analytics
        unique = await redis.scard(f"viewers:{video_id}")
        await analytics.record(video_id, views=count, unique=unique)
💡
Why eventual consistency is acceptable here: View counts are not financial transactions. Showing "14.2M views" when the true count is 14.3M is perfectly acceptable. Users don't notice. What matters is no data loss — the flush job persists counts durably before TTL expiry.

Technology Decisions

Storage
S3 / GCS + PostgreSQL
Object storage for video files (effectively unlimited scale, cheap). PostgreSQL for metadata (video title, uploader, status). Chosen over MySQL for richer JSONB querying and MVCC concurrency under heavy writes.
Message Queue
Kafka
Not RabbitMQ. Kafka allows multiple consumers (transcoding + thumbnail + content-id + search indexing all consume the same upload event) and replay for debugging failed jobs.
View Counts + Cache
Redis
INCR is atomic and non-blocking. Also caches trending videos and hot video metadata. Memcached not chosen — Redis supports sorted sets for trending leaderboards.
Content Delivery
Akamai / CloudFront
300+ PoPs globally. For popular content, CDN hit rate reaches 99%+. Immutable segment URLs (Cache-Control: max-age=31536000) mean segments never need to be purged.
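One way to make segment URLs immutable is content-addressed naming: the URL embeds a hash of the segment bytes, so a re-encoded segment gets a new URL and the old one can safely be cached for a year with no purging. A hypothetical sketch (the URL scheme and helper names are assumptions, not YouTube's):

```python
# Content-addressed segment URLs: same bytes -> same URL, changed bytes -> new URL.
import hashlib

def segment_url(video_id: str, rendition: str, seq: int, data: bytes) -> tuple[str, dict]:
    digest = hashlib.sha256(data).hexdigest()[:16]   # short content hash in the name
    url = f"/v/{video_id}/{rendition}/{seq}-{digest}.ts"
    headers = {"Cache-Control": "public, max-age=31536000, immutable"}
    return url, headers

url, headers = segment_url("abc123", "720p", 0, b"\x00" * 1024)
```

Because the hash changes whenever the content does, cache invalidation reduces to publishing a new manifest that references the new URLs.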

Quiz

Question 1 of 5
When a user uploads a video, the API should return which HTTP status code while transcoding continues in the background?
A. 200 OK — the upload is complete
B. 202 Accepted — the request was received and is being processed
C. 201 Created — the video was created successfully
D. 204 No Content — nothing to return yet
Question 2 of 5
YouTube uses HLS/DASH for video delivery. These protocols split video into segments. Why does segment duration (2–10s) matter?
A. Shorter segments enable better video compression algorithms
B. Segment boundaries are where quality switches happen — shorter segments mean faster adaptation to network changes
C. Longer segments always reduce server load proportionally
D. It's a licensing requirement from the HLS patent holders
Question 3 of 5
At 5B views/day on viral videos, directly incrementing a PostgreSQL counter (UPDATE videos SET views=views+1) fails because:
A. PostgreSQL doesn't support integer arithmetic on large numbers
B. The UPDATE acquires a row lock — at ~139 writes/sec on one row, queries queue and time out, causing a cascading failure
C. View counts are stored as strings and can't be incremented directly
D. PostgreSQL replication can't handle this write throughput
Question 4 of 5
Content ID (copyright checking) happens at which stage in YouTube's upload pipeline?
A. Before accepting the upload — to prevent copyrighted content from ever being stored
B. After transcoding — fingerprint matching runs on the processed video, not during upload
C. Only after a video reaches 100 organic views
D. During live streaming only, not for uploaded videos
Question 5 of 5
For a video with 10M daily views, what CDN cache hit rate should you target?
A. 50% — many users bypass the CDN by refreshing or using VPNs
B. 80% — typical for popular content across a normal CDN
C. 99%+ — popular video segments are cached at 300+ edge PoPs with immutable URLs (max-age=1yr)
D. 0% — videos are personalized per user so they cannot be cached at a shared CDN