Database Sharding & Partitioning

Horizontal partitioning strategies, consistent hashing, cross-shard queries, resharding, and distributed ID generation.

4 Exercises
12 Concept Checks
~90 min total
System Design
Exercise 1 🟢 Easy ⏱ 15 min
Shard Key Selection
A social platform shards its users table by user_id % 4 (4 shards). 80% of users are from the US — their user IDs are numerically lower (earlier registrations). Shards 0 and 1 handle 80% of the load; shards 2 and 3 handle 20%. The overloaded shards become a bottleneck causing p99 latency spikes, while two shards sit nearly idle.
Shard Imbalance (user_id % 4)
Shard 0: US users (40% load) 🔥
Shard 1: US users (40% load) 🔥
Shard 2: EU users (15% load)
Shard 3: APAC users (5% load)
Concept Check — 3 questions
Q1. The problem with user_id % N sharding when user IDs are assigned sequentially is:
A. Individual shards grow too large — the modulo operation doesn't limit shard size
B. Hot spots — sequential IDs paired with non-uniform access patterns create uneven distribution across shards
C. Cross-shard joins become completely impossible
D. Replication breaks down when the shard count is odd
Q2. Consistent hashing solves uneven distribution by:
A. Generating random IDs for all users to ensure uniform modulo distribution
B. Always using exactly 256 shards to guarantee even distribution
C. Mapping keys to a hash ring where each shard covers a range — adding or removing a shard only remaps approximately 1/N of all keys
D. Routing all traffic through a geographic load balancer
Q3. A good shard key should have which property above all others?
A. Alphabetical ordering so shards can be accessed in sorted order
B. High cardinality — enough unique values to distribute data and load evenly across all shards
C. Sequential integer values to make range queries easy within a shard
D. A foreign key relationship to every other table to minimize cross-shard joins
The hot spot problem: modulo is deterministic — the same ID always goes to the same shard. If early IDs are more active (older users are more engaged), the low-numbered shards are always hot. Consistent hashing maps keys to a ring using a hash function — distribution depends on the hash output (uniform), not the raw key value. Adding a shard: only its neighbors' data moves. Removing a shard: only that shard's data redistributes. Average remapping = 1/N of total data.
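The ring mechanics above can be sketched in a few lines of Python. This is a minimal illustration, not a production router — the shard names and the use of MD5 as the ring hash are assumptions, and real systems assign many virtual nodes per shard rather than one point each:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring: one point per shard (no vnodes).
    Each shard owns the arc of hash space ending at its point."""

    def __init__(self, shards):
        self.ring = sorted((self._hash(s), s) for s in shards)
        self.points = [p for p, _ in self.ring]

    @staticmethod
    def _hash(key):
        # Hash output is uniform regardless of the raw key value —
        # this is what breaks the sequential-user_id hot spot.
        return int(hashlib.md5(str(key).encode()).hexdigest(), 16)

    def lookup(self, key):
        # First ring point clockwise of the key's hash (wrapping around).
        i = bisect.bisect(self.points, self._hash(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["shard0", "shard1", "shard2", "shard3"])
before = {uid: ring.lookup(uid) for uid in range(100_000)}

# Add a fifth shard: only keys on the new shard's arc should remap.
bigger = HashRing(["shard0", "shard1", "shard2", "shard3", "shard4"])
moved = [uid for uid in before if bigger.lookup(uid) != before[uid]]
print(f"keys remapped after adding a shard: {len(moved) / len(before):.0%}")
```

Note that every remapped key moves *to* the new shard — existing shard-to-shard assignments are untouched, which is exactly what `user_id % N` cannot guarantee when N changes. With only one point per shard, the remapped fraction is 1/N only on average; vnodes (Exercise 3) tighten that up.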
Open Design Challenge
1
Design the shard key for a messaging system where users send messages to each other. If you shard by sender_id, what query patterns become cross-shard? If you shard by conversation_id, what changes?
2
A celebrity user (10M followers) on a social platform creates a hot shard. Design a "celebrity routing" exception: how do you handle accounts whose activity level exceeds single-shard capacity?
3
Design the shard routing layer: given a user_id, how does the application determine which shard DB to connect to? Sketch the lookup table vs hash ring approaches.
Exercise 2 🟡 Medium ⏱ 20 min
Cross-Shard Queries
Orders are sharded by user_id. A business analytics query needs: SELECT sum(revenue) FROM orders WHERE date = '2024-01-01'. Since date is not the shard key, this query must fan out to all shards to get partial results, then aggregate them. At 32 shards, this is 32 parallel database calls for every analytics query.
Scatter-Gather Query Pattern
Analytics query (no shard key in WHERE)
→ fan out →
Shard 0: partial sum
Shard 1: partial sum
... Shard 31
→ merge →
Aggregator: total sum
Concept Check — 3 questions
Q1. A scatter-gather query fans out to all 32 shards in parallel. If each shard takes 100ms, what is the total query latency?
A. ~100ms — queries run in parallel; latency equals the slowest shard, not the sum
B. ~3,200ms — all shard queries execute sequentially
C. ~32ms — parallelism reduces latency by a factor of N
D. ~1,600ms — half the shards run in parallel, then the rest
Q2. To avoid scatter-gather queries for business analytics on sharded OLTP data, the recommended pattern is:
A. Create a secondary index on the date column across all shards
B. Increase the shard count to 256 to reduce per-shard data size
C. Maintain a separate OLAP store (data warehouse like Redshift/BigQuery) fed by ETL from all shards; run analytics there
D. Normalize the schema to eliminate the need for cross-shard aggregation
Q3. Cross-shard transactions (distributed transactions across multiple shard databases) should generally be avoided because:
A. They consume excessive disk space on each participating shard
B. They require 2-phase commit (2PC), which is slow, operationally complex, and can block indefinitely if the coordinator fails mid-transaction
C. They increase memory usage beyond what typical databases support
D. Distributed transactions violate SQL standards and are unsupported
Scatter-gather latency = max(shard latencies), not the sum — parallel execution is the key. However, each shard query still consumes a DB connection and some CPU, so 32 parallel queries are 32× more resource-intensive than one. OLAP separation: OLTP shards are optimized for point reads/writes; analytics queries need full scans, which wreck OLTP latency. Run ETL periodically (e.g. hourly) into a data warehouse where full-table scans are expected and optimized. 2PC: Phase 1 = prepare (all shards lock resources). Phase 2 = commit. If the coordinator dies after Phase 1, shards hold their locks indefinitely — blocking all conflicting transactions.
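The latency-equals-max property can be demonstrated directly. The sketch below simulates the fan-out with threads and sleeps — the per-shard latencies and the fake partial sums are invented for illustration, not taken from any real workload:

```python
import concurrent.futures
import time

def query_shard(shard_id):
    """Stand-in for one shard's partial aggregate
    (SELECT sum(revenue) FROM orders WHERE date = ...)."""
    latency = 0.05 + 0.005 * (shard_id % 4)  # simulated; slowest shard ~65ms
    time.sleep(latency)
    return shard_id * 100                     # fake partial sum

start = time.monotonic()
# Scatter: one worker per shard so all 32 queries run concurrently.
with concurrent.futures.ThreadPoolExecutor(max_workers=32) as pool:
    partials = list(pool.map(query_shard, range(32)))
elapsed = time.monotonic() - start

total = sum(partials)  # gather: merge partial sums at the aggregator
print(f"total={total}, wall time ~{elapsed * 1000:.0f}ms (max shard, not 32x)")
```

Running it shows wall time near the slowest shard's latency rather than the ~1.8s the 32 sleeps add up to — while still paying for 32 connections' worth of work, which is the resource-cost point made above.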
Open Design Challenge
1
Design the ETL pipeline from 32 OLTP shards to Redshift. What is the acceptable lag (near-real-time vs daily batch)? How do you handle schema changes in the OLTP layer?
2
An order transfer moves money from user A (shard 0) to user B (shard 5). Design this without 2PC: use the SAGA pattern with compensating transactions. Sketch the steps and failure scenarios.
3
Design a secondary index for querying orders by date across all shards without scatter-gather. Hint: maintain a separate "date index" table on a non-sharded global DB that maps (date, order_id, shard_id).
Exercise 3 🟡 Medium ⏱ 25 min
Resharding Without Downtime
A startup began with 4 shards. After rapid growth, each shard holds 500GB of data at 80% CPU utilization. They need to double to 8 shards to restore headroom. The challenge: during migration, a write could land on the old shard location (user_id % 4) while the data is being moved to the new location (user_id % 8), causing data divergence.
Resharding: 4 → 8 Shards
4-shard ring (80% CPU)
→ split each →
8-shard ring (migration in progress)
→ traffic switch →
8-shard ring (40% CPU)
Concept Check — 3 questions
Q1. To reshard from 4 to 8 shards without taking downtime, the safest approach is:
A. Stop all writes, migrate all data to 8 shards, update routing config, restart writes
B. Double-write to both old and new shard locations during migration, migrate historical data in the background, flip reads to the new shards, then stop writes to the old shards
C. Export all 4 shards to CSV files and reimport into 8 shards at scheduled maintenance
D. Increase individual shard storage limits instead of adding more shards
Q2. Virtual nodes (vnodes) in consistent hashing make resharding easier by:
A. Automating schema migration scripts across all physical shards
B. Compressing shard data to make it faster to transfer between nodes
C. Assigning many small virtual shards per physical node — when adding a node, only the relevant vnodes move; remapping is granular and gradual
D. Enabling atomic cross-shard transactions during the migration window
Q3. During resharding, a read arrives for a key that has already been migrated to the new shard but hasn't been deleted from the old shard yet. The read-migration pattern handles this by:
A. Returning HTTP 404 during the migration window — reads are disabled
B. First checking the new shard; if the key is not found, falling back to the old shard — then lazily migrating the data and removing it from the old shard
C. Locking all reads globally during migration to prevent inconsistency
D. Serving only from the cache layer during migration to hide inconsistency
Double-write migration: Phase 1 — start writing new writes to both the old and new shard. Phase 2 — a background job migrates historical data (batched, rate-limited so it doesn't overload production). Phase 3 — flip reads to the new shard. Phase 4 — stop writing to the old shard. Phase 5 — decommission the old shard. Each phase can be rolled back independently. Vnodes (Cassandra historically defaulted to 256 vnodes per node): adding one physical node moves a small share of vnodes from each existing node — gradual, controlled redistribution.
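The phases above can be sketched as a routing layer with two feature flags. This is a toy model — shard stores are plain dicts and the phase flags are instance attributes, where a real deployment would use actual databases and a config service:

```python
class ReshardingRouter:
    """Sketch of the double-write / read-fallback phases for 4 -> 8 shards."""

    def __init__(self):
        self.old = [dict() for _ in range(4)]   # routed by user_id % 4
        self.new = [dict() for _ in range(8)]   # routed by user_id % 8
        self.double_write = False               # phase 1 flag
        self.read_from_new = False              # phase 3 flag

    def write(self, user_id, row):
        if not self.read_from_new:              # old is source of truth until phase 3
            self.old[user_id % 4][user_id] = row
        if self.double_write or self.read_from_new:
            self.new[user_id % 8][user_id] = row

    def read(self, user_id):
        if self.read_from_new:
            # Read-migration fallback: try the new location first,
            # fall back to the old copy if the backfill hasn't reached it.
            row = self.new[user_id % 8].get(user_id)
            return row if row is not None else self.old[user_id % 4].get(user_id)
        return self.old[user_id % 4].get(user_id)

    def migrate_backfill(self):
        """Phase 2: background copy of historical rows.
        setdefault() never clobbers a fresher double-written row."""
        for shard in self.old:
            for uid, row in shard.items():
                self.new[uid % 8].setdefault(uid, row)

router = ReshardingRouter()
router.write(42, "pre-migration row")   # phase 0: old shards only
router.double_write = True              # phase 1
router.write(43, "double-written row")
router.migrate_backfill()               # phase 2
router.read_from_new = True             # phase 3
print(router.read(42), "/", router.read(43))
```

Note the ordering guarantee the flags encode: reads only flip to the new shards (phase 3) after double-writing has started (phase 1) and the backfill has run (phase 2), so every key is reachable throughout — either directly or via the fallback.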
Open Design Challenge
1
Design the rate limiting for the background data migration job. The migration is copying 500GB × 4 shards = 2TB of data. At what MB/sec should it run to complete in 24 hours without saturating production disk I/O?
2
Design a migration progress dashboard: what metrics would you show to operators monitoring the resharding? Include per-shard migration %, write lag between old/new, and estimated completion time.
3
During double-write, a key is written to the old shard but the write to the new shard fails. Design the reconciliation mechanism to detect and fix this divergence before the final cutover.
Exercise 4 🔴 Hard ⏱ 30 min
Global ID Generation at Scale
A distributed system sharded across 32 nodes needs globally unique IDs for every record. Options: UUID v4 (128-bit random), DB auto-increment per shard (not globally unique), Snowflake ID (64-bit timestamp+machine+sequence), ULID (128-bit sortable random). Each trades off size, sortability, uniqueness guarantees, and database performance.
ID Generation Comparison
UUID v4: 128-bit random, not sortable
vs
Snowflake: 41-bit ms-timestamp + 10-bit machine + 12-bit seq = 64-bit sortable
vs
ULID: 48-bit time + 80-bit random, sortable base32
Concept Check — 3 questions
Q1. Why are random UUID v4 IDs problematic for B-tree database indexes?
A. 128-bit is too large — databases have a maximum index key size
B. Random insertion order causes B-tree page splits and cache fragmentation — sequential IDs produce much better index locality and fewer page splits
C. UUID v4 values are not globally unique — collisions occur at scale
D. UUID generation is too CPU-intensive for high-throughput inserts
Q2. A Snowflake ID embeds a millisecond timestamp in the top 41 bits. What does this mean for the generated IDs?
A. IDs expire and become invalid after a certain number of years
B. IDs are only valid for a maximum of 10 years before rollover
C. IDs are time-sortable, B-tree friendly, and allow approximate creation time to be decoded — but correctness depends on synchronized clocks and a unique machine ID registry
D. IDs must be assigned by a central server to ensure the timestamp is consistent
Q3. A Snowflake ID generator process restarts and the sequence counter resets to 0. What is the risk?
A. No risk — Snowflake IDs are globally unique by design regardless of restarts
B. If the process restarts within the same millisecond with the same machine ID, the sequence counter starts at 0 again — potentially generating a duplicate ID that was already assigned before the restart
C. All previously generated IDs become invalid and the system must regenerate them
D. ID generation pauses for exactly 1 second after restart to skip the risky millisecond
UUID v4 B-tree problem: every insert must go to a random position in the tree → frequent page splits, poor cache utilization, 50–100% write amplification vs sequential IDs. Snowflake IDs are monotonically increasing within a machine → sequential B-tree inserts, near-perfect cache hit rate for recent data. Restart fix: at startup, sleep until the next millisecond before generating any IDs (ensures sequence=0 at a new timestamp). Alternatively, persist last_used_ms and refuse to generate IDs with a timestamp ≤ last_used_ms.
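The 41+10+12 bit layout described above can be sketched as follows. This is a minimal single-process illustration — the epoch constant is Twitter's published custom epoch, and the restart mitigations discussed above (sleep to the next millisecond, or persist `last_used_ms`) are represented only by the backwards-clock guard:

```python
import threading
import time

EPOCH_MS = 1288834974657  # Twitter's custom Snowflake epoch; any fixed epoch works

class Snowflake:
    """Sketch of a 64-bit Snowflake generator:
    41-bit ms timestamp | 10-bit machine ID | 12-bit per-ms sequence."""

    def __init__(self, machine_id):
        assert 0 <= machine_id < 1024      # must fit in 10 bits
        self.machine_id = machine_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now < self.last_ms:
                # Clock skew guard — the persisted-last_used_ms variant
                # of the restart fix relies on the same refusal.
                raise RuntimeError("clock moved backwards; refusing to generate")
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF
                if self.seq == 0:          # 4096 IDs this ms: spin to the next ms
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.machine_id << 12) | self.seq

gen = Snowflake(machine_id=7)
a, b = gen.next_id(), gen.next_id()
print(a < b, (a >> 12) & 0x3FF)  # monotonic; machine ID decodes back out
```

Within one process the IDs are strictly increasing (timestamp advances, or the sequence does), which is what buys the sequential B-tree inserts — and the bit shifts make the timestamp and machine ID recoverable by masking, as Exercise 4's `snowflake_at()` challenge exploits.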
Open Design Challenge
1
Design the machine ID registry for a 32-node Snowflake cluster. How do nodes get assigned their 10-bit machine IDs? What happens when a node is decommissioned and a new node comes up — can it reuse the old machine ID?
2
Snowflake IDs embed creation time. This means you can do time-range queries (WHERE id BETWEEN snowflake_at('2024-01-01') AND snowflake_at('2024-12-31')) without a separate created_at column. Sketch the implementation of a snowflake_at() function.
3
Compare Snowflake vs ULID for a multi-tenant SaaS: which would you choose if (a) you need to sort records by creation time and (b) you don't want to expose a centralized machine ID registry? Justify your answer.