Day 24 Exercises — Design Google Search

Exercise 1🟡 Easy15 min

Inverted Index Construction

You have 5 web pages and need to build an inverted index. Doc 1: "cat and dog", Doc 2: "dog and fish", Doc 3: "cat fish food".

Tasks

Build the complete inverted index (term → list of doc IDs).
What is a "posting list" and what additional information does it store beyond doc IDs?
Run a boolean AND query for "cat AND fish" — which docs match?
How do you compress posting lists when they contain millions of document IDs?

Your Notes

Exercise 2🔴 Medium20 min

PageRank Intuition

The web has 4 pages: A links to B and C; B links to D; C links to D; D links to A. Design a simplified PageRank computation.

Tasks

Describe what PageRank measures in plain English.
Write one iteration of PageRank for all 4 pages (start with equal ranks 0.25).
Why does PageRank use a "damping factor" of 0.85?
How does link spam (buying links from low-quality sites) get detected and devalued?

Your Notes

Exercise 3🔴 Medium25 min

Web Crawler Design

Design a distributed web crawler that can crawl 1 billion URLs. The crawler must be polite (obey robots.txt), deduplicate (not re-crawl the same URL), and handle failures.

Tasks

Design the URL frontier — how do you prioritize which URLs to crawl first?
How do you deduplicate 1B URLs efficiently? (Hint: Bloom filter)
How do you enforce crawl politeness (max 1 request/second per domain)?
What's a "crawl trap" and how do you detect/avoid it?

Your Notes

Exercise 4🔥 Hard35 min

Sub-200ms Search Query Serving

Google serves 99,000 searches/second with p99 latency under 200ms. The inverted index is 100TB across 1,000 index servers. Design the query serving path.

Tasks

How is the 100TB index sharded across 1,000 servers? Describe two strategies.
A query fans out to how many index servers simultaneously? What pattern is this?
How does the serving system handle one slow index server (stragglers)?
Design the result merging and ranking step after collecting partial results.

Your Notes

Search Engine Design

Tasks

Tasks

Tasks

Tasks