Your first complete system design exercise. Practice the interview format: Requirements → Estimation → Design → Deep Dive. Design bit.ly from scratch, with real numbers and real trade-offs at each layer.
Interviewers expect you to do back-of-envelope math before drawing any boxes. Fix an assumed scale first, then derive QPS, storage, and bandwidth from it. A 100:1 read:write ratio is typical for URL shorteners.
1) Clarify requirements (functional + non-functional) → 2) Estimate scale (QPS, storage, bandwidth) → 3) High-level design (components) → 4) Deep dive (bottlenecks, trade-offs). Never jump to boxes without numbers first.
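The estimation step can be sketched in a few lines. The inputs below (10M new URLs per day, 500 bytes per record, 5-year retention) are illustrative assumptions, not figures from a real service; only the 100:1 read:write ratio comes from the text above. The `Scale` and `estimate` names are hypothetical.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Scale:
    new_urls_per_day: int   # assumed write volume
    read_write_ratio: int   # reads per write
    bytes_per_url: int      # assumed size of one stored record
    retention_years: int    # assumed retention period


def estimate(scale: Scale) -> dict:
    """Derive average QPS, storage, and read bandwidth from the assumed scale."""
    write_qps = scale.new_urls_per_day / SECONDS_PER_DAY
    read_qps = write_qps * scale.read_write_ratio
    total_urls = scale.new_urls_per_day * 365 * scale.retention_years
    storage_gb = total_urls * scale.bytes_per_url / 1e9
    read_mb_per_s = read_qps * scale.bytes_per_url / 1e6
    return {
        "write_qps": write_qps,
        "read_qps": read_qps,
        "total_urls": total_urls,
        "storage_gb": storage_gb,
        "read_mb_per_s": read_mb_per_s,
    }


# Assumed inputs: 10M URLs/day, 100:1 reads:writes, 500 B/record, 5 years.
example = estimate(Scale(10_000_000, 100, 500, 5))
```

With these inputs the averages come out to roughly 116 writes/s, 11,600 reads/s, and about 9 TB of storage over five years. Peak traffic is usually a multiple of the average, so size for that rather than the mean.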
Base62 uses digits (0-9), lowercase (a-z), and uppercase (A-Z): 62 characters total. 6-character codes give 62⁶ = 56,800,235,584 unique combinations, about 56.8 billion. At 100 new URLs per second that lasts roughly 18 years; at 1,000 per second it lasts under 2 years, so higher write rates call for 7-character codes (62⁷ ≈ 3.5 trillion).
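A minimal sketch of base62 encoding and decoding, using the alphabet order given above (digits, then lowercase, then uppercase). The function names are illustrative; a real service may order the alphabet differently.

```python
import string

# 0-9, a-z, A-Z: 62 characters in total.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_base62(n: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Decode a base62 string back to an integer."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

The largest 6-character code, `ZZZZZZ`, corresponds to 62⁶ − 1, which is where the 56.8 billion figure comes from.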
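Hash approach: take MD5(url), encode it as base62, and keep 6 characters. Risk: by the birthday paradox, collisions become likely long before the code space fills, so detect them with a unique constraint (ON CONFLICT) and retry with a salted hash. Counter approach: encode a global auto-increment ID. Risk: the counter is a single point of failure, and sequential codes reveal your volume to competitors. A key generation service (KGS) that hands out codes from a pre-generated pool avoids both risks.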
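The hash-and-retry approach might look like the sketch below. It is one interpretation, with assumptions: the MD5 digest is reduced modulo 62⁶ to get a fixed-width code, a plain dict stands in for a table with a unique constraint on the code column, and the names `hash_code` and `shorten` are hypothetical.

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
CODE_LEN = 6


def _base62_fixed(n: int, length: int) -> str:
    """Encode n as exactly `length` base62 characters (zero-padded)."""
    chars = []
    for _ in range(length):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def hash_code(url: str, salt: int = 0) -> str:
    """Derive a 6-character code from MD5(url); a non-zero salt changes the hash."""
    data = f"{salt}:{url}".encode() if salt else url.encode()
    digest = int.from_bytes(hashlib.md5(data).digest(), "big")
    return _base62_fixed(digest % 62**CODE_LEN, CODE_LEN)


def shorten(url: str, store: dict, max_attempts: int = 5, code_fn=hash_code) -> str:
    """Insert url under a hashed code, retrying with a new salt on collision.

    `store` stands in for the database; a taken key plays the role of
    an ON CONFLICT hit.
    """
    for salt in range(max_attempts):
        code = code_fn(url, salt)
        existing = store.get(code)
        if existing is None:
            store[code] = url
            return code
        if existing == url:
            return code  # same URL was already shortened
    raise RuntimeError("could not find a free code")
```

Returning the existing code when the same URL is submitted twice is a design choice made here for simplicity; a service that tracks per-user links would need a different rule.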
Allow users to request custom slugs like sho.rt/my-brand. Validate: 6–50 chars, alphanumeric + hyphens, not in reserved list (admin, api, login, www). Store with is_custom=true flag. Custom aliases never expire unless the user deletes them — unlike auto-generated codes which may have TTL.
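The validation rules above can be sketched as a single function. The reserved list here is a small illustrative subset, and the `validate_alias` name and its return shape are assumptions.

```python
import re

# Illustrative subset; the real reserved list would be longer.
RESERVED = frozenset({"admin", "api", "login", "signup", "www", "support"})

# 6 to 50 characters: letters, digits, and hyphens only.
ALIAS_RE = re.compile(r"[A-Za-z0-9-]{6,50}")


def validate_alias(alias: str) -> tuple:
    """Return (is_valid, reason) for a requested custom alias."""
    if not ALIAS_RE.fullmatch(alias):
        return False, "must be 6-50 characters: letters, digits, hyphens"
    if alias.lower() in RESERVED:
        return False, "reserved"
    return True, "ok"
```

Note that most reserved words are shorter than 6 characters and fail the length check first; the reserved lookup matters for longer entries such as `signup` or `support`, and it is case-insensitive in this sketch.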
The redirect path is the hot path — it must return a 302 response in under 5ms at scale. Every component choice on this path matters. The write path is less latency-sensitive but needs durability.
A common mistake: updating a click counter directly in PostgreSQL on every redirect. At 100K read QPS, that's 100K row updates/sec against the database, with the hottest URLs contending for locks on a single row: an instant bottleneck. Instead, increment a Redis counter (atomic, in-memory) and batch-flush to the DB every minute via a background job.
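The buffer-then-flush pattern can be sketched without any external service. In this illustration an in-memory `Counter` stands in for the Redis counters and a plain dict stands in for the database table; `ClickBuffer` and `flush_to_db` are hypothetical names, and a real deployment would run the flush from a scheduled background job.

```python
import threading
from collections import Counter


class ClickBuffer:
    """In-memory stand-in for per-code Redis counters."""

    def __init__(self) -> None:
        self._counts = Counter()
        self._lock = threading.Lock()

    def record_click(self, code: str) -> None:
        # Hot path: one cheap in-memory increment per redirect.
        with self._lock:
            self._counts[code] += 1

    def drain(self) -> dict:
        # Atomically swap out the accumulated counts for an empty counter.
        with self._lock:
            batch, self._counts = self._counts, Counter()
        return dict(batch)


def flush_to_db(buffer: ClickBuffer, db_totals: dict) -> int:
    """Apply one batched update per code; returns the number of rows touched."""
    batch = buffer.drain()
    for code, delta in batch.items():
        db_totals[code] = db_totals.get(code, 0) + delta
    return len(batch)
```

A thousand clicks on one code become a single row update per flush interval. The trade-off is that counts buffered since the last flush are lost if the buffer process dies.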
URL shortener traffic follows Zipf's law — a small fraction of URLs get the vast majority of clicks. This makes caching extraordinarily effective. Get this right and your DB barely matters for reads.
Under Zipf's law, the most popular URL gets 2× the traffic of the second most popular, 3× the third, and so on. In practice, the top 20% of URLs receive roughly 80% of all redirect traffic, so caching those 20% yields a cache hit rate of about 80% or better.
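As a rough check on the 80/20 figure, the sketch below computes the share of traffic going to the top fraction of URLs under an idealized Zipf distribution. The exponent of 1 and the population size are assumptions; real traffic only approximates this shape.

```python
def zipf_top_share(n_urls: int, top_fraction: float, s: float = 1.0) -> float:
    """Share of total traffic received by the top `top_fraction` of URLs.

    Traffic for the URL at rank r is taken to be proportional to 1 / r**s.
    """
    top_k = max(1, round(n_urls * top_fraction))
    total = 0.0
    top = 0.0
    for rank in range(1, n_urls + 1):
        weight = 1.0 / rank**s
        total += weight
        if rank <= top_k:
            top += weight
    return top / total


# Assumed population of 100,000 URLs, caching the top 20%.
share = zipf_top_share(100_000, 0.20)
```

For 100,000 URLs this gives about 0.87, which is consistent with a hit rate above 80% when the top 20% are cached.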
Least Recently Used (LRU) eviction works well for URL shorteners: if a URL hasn't been accessed recently, it's unlikely to be popular. Set the Redis maxmemory-policy to allkeys-lru. Cache size: allocate enough for ~20% of the daily active URL set. At 500 bytes/URL and 1M daily URLs, that's 200K entries or ~100MB, which is very cheap.
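To make the eviction behavior and the sizing arithmetic concrete, here is a small exact-LRU sketch built on `OrderedDict`. It is an illustration only: Redis approximates LRU by sampling keys rather than tracking exact order, and the `LRUCache` and `cache_size_bytes` names are hypothetical.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Exact LRU cache mapping short codes to long URLs."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, code: str) -> Optional[str]:
        if code not in self._items:
            return None
        self._items.move_to_end(code)  # mark as most recently used
        return self._items[code]

    def put(self, code: str, long_url: str) -> None:
        self._items[code] = long_url
        self._items.move_to_end(code)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


def cache_size_bytes(daily_urls: int, bytes_per_url: int, cached_percent: int = 20) -> int:
    """Memory needed to hold `cached_percent` of the daily active URL set."""
    return daily_urls * cached_percent // 100 * bytes_per_url
```

`cache_size_bytes(1_000_000, 500)` reproduces the ~100MB figure: 200,000 entries at 500 bytes each.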
Default TTL: 24 hours. Long-lived URLs (company links, QR codes): extend to 7 days or infinite (refresh on access). Short campaign URLs: TTL matches campaign end date. Expired URL codes: return 410 Gone, not 404 Not Found — 410 tells crawlers the resource is permanently gone.
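The status-code rule for expired codes can be sketched as a small lookup function. The record shape (a dict with `long_url` and an optional `expires_at`) and the `redirect_status` name are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def redirect_status(record: Optional[dict], now: datetime) -> tuple:
    """Return (status_code, location) for a short-code lookup.

    record is None when the code was never issued; otherwise it holds
    'long_url' and an optional 'expires_at' (None means no expiry).
    """
    if record is None:
        return 404, None  # code never existed
    expires_at = record.get("expires_at")
    if expires_at is not None and now >= expires_at:
        return 410, None  # code existed but has expired
    return 302, record["long_url"]


# Usage with fixed timestamps so the outcome is deterministic.
now = datetime(2024, 1, 2, tzinfo=timezone.utc)
campaign = {"long_url": "https://example.com/sale", "expires_at": now - timedelta(days=1)}
evergreen = {"long_url": "https://example.com/about", "expires_at": None}
```

Distinguishing 410 from 404 requires keeping expired records (or at least their codes) around rather than deleting them at expiry.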
For globally popular URLs (viral content), the redirect response itself can be cached at CDN edge nodes. If you use 301, a long-lived Cache-Control header on the redirect lets browsers and edges cache it for as long as max-age allows. For 302 (analytics-enabled), add Cache-Control: private, no-store so the browser re-requests every time.
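The two caching modes can be summarized as a function that picks the status code and headers. The one-year `max-age` for 301 is an assumed value chosen for illustration, and `redirect_headers` is a hypothetical helper, not part of any framework.

```python
ONE_YEAR_SECONDS = 31_536_000  # assumed max-age for cacheable redirects


def redirect_headers(long_url: str, analytics_enabled: bool) -> tuple:
    """Return (status_code, headers) for a redirect response."""
    if analytics_enabled:
        # 302 with no-store: every click reaches the server and can be counted.
        return 302, {
            "Location": long_url,
            "Cache-Control": "private, no-store",
        }
    # 301 with a long max-age: browsers and CDN edges may reuse the redirect.
    return 301, {
        "Location": long_url,
        "Cache-Control": f"public, max-age={ONE_YEAR_SECONDS}",
    }
```

Once a 301 with a long lifetime has been served, clients that cached it will not see a changed destination until their copy expires, so this mode suits links that never change.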
URL shorteners are high-value abuse targets: they can be used to hide phishing links, spam, or malware. At scale, you need automated defenses or your domain will end up blocklisted by browsers and security tools.
Integrate Google Safe Browsing API on URL creation. Check the long URL against the malware/phishing database before generating a short code. Block immediately if flagged. Re-check periodically for URLs that become malicious after creation. Cost: ~$0.80 per 10K lookups.
Unauthenticated: 10 URL creates/hour per IP. Authenticated free tier: 100/day. Paid tier: 10,000/day. Use Redis with sliding window counter. Return 429 Too Many Requests with Retry-After header. Exponential backoff for repeated violations → temporary IP ban.
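A sliding window counter can be sketched in memory as follows. It estimates the request count over the trailing window by weighting the previous fixed window's count by how much of it still overlaps. This is an illustration under assumptions: state lives in a dict rather than Redis, the caller passes the clock in as `now`, and the returned retry value (seconds to the next window boundary) is only a hint for the Retry-After header.

```python
import math


class SlidingWindowCounter:
    """Approximate sliding-window rate limiter keyed by client (e.g. IP)."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window = window_seconds
        # key -> (window_index, current_count, previous_count)
        self._state = {}

    def allow(self, key: str, now: float) -> tuple:
        """Return (allowed, retry_after_seconds) for one request at time `now`."""
        idx = int(now // self.window)
        w_idx, curr, prev = self._state.get(key, (idx, 0, 0))
        if idx == w_idx + 1:
            prev, curr = curr, 0   # rolled into the next window
        elif idx > w_idx + 1:
            prev, curr = 0, 0      # idle for more than a full window
        elapsed = (now % self.window) / self.window
        # Weight the previous window by the part still inside the trailing window.
        estimated = prev * (1.0 - elapsed) + curr
        if estimated >= self.limit:
            self._state[key] = (idx, curr, prev)
            return False, math.ceil(self.window - now % self.window)
        self._state[key] = (idx, curr + 1, prev)
        return True, 0


# Unauthenticated tier from the text: 10 creates per hour per IP.
limiter = SlidingWindowCounter(limit=10, window_seconds=3600)
```

A denied call maps to a 429 response with the second tuple element as the Retry-After value. Compared with a fixed window, this avoids letting a client send a full quota at the end of one window and another full quota at the start of the next.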
Block custom aliases that could cause confusion: admin, api, login, signup, www, help, support, blog, brand names (Google, Apple, Amazon). Store as a Redis Set for O(1) lookup. Update via feature flag without deployment.
Show a preview page for suspicious domains before redirect (user explicitly clicks "Continue"). Generate QR codes only for verified/authenticated URLs. Include warning on preview page for known shortener-heavy phishing patterns. Log all preview page visits for abuse analysis.
301 vs 302 trade-off. 301 Permanent: the browser caches the redirect, which reduces your server load, but you lose analytics (the browser goes direct, bypassing your server) and can't update the destination for clients that cached it. 302 Temporary: the browser always asks you, so you get full analytics and can change the destination, at the cost of higher server load. Analytics-first shorteners use 302.
Redis cache: maxmemory-policy allkeys-lru, maxmemory 8GB
Database schema: urls(code, long_url, created_at, user_id, ttl)
Click event topic: url.clicks, partitioned by short code
CDN edge caching: only viable with 301 (browser-cached redirects)