NoSQL Data Modeling

Partition key design, hot partition avoidance, denormalization, and access pattern-driven modeling.

4 Exercises
12 Concept Checks
~90 min total
System Design
Exercise 1 🟡 Easy ⏱ 15 min
Partition Key Design in DynamoDB
An IoT platform stores sensor readings. Each sensor sends 1 event/second. The DynamoDB table uses sensor_id as the partition key. Sensor #42 monitors a critical server and generates 10× the traffic of any other sensor — 10 writes/sec on a single partition while the others get 1 write/sec each.
Hot Partition Problem
Sensor #1 → 1 write/sec
Sensor #2 → 1 write/sec
Sensor #42 → 10 writes/sec (HOT)
DynamoDB
PK = sensor_id
Partition A (sensor #1) — normal
Partition B (sensor #2) — normal
Partition C (sensor #42) ⚡ THROTTLED
Concept Check — 3 questions
Q1. A hot partition in DynamoDB causes?
A. All partitions to slow down uniformly across the table
B. Throttling on the hot partition — writes and reads to that partition are rejected when the throughput limit is exceeded
C. Permanent data loss on the affected partition
D. Replication failure across all availability zones
Q2. To prevent hot partitions for high-write sensors, you can?
A. Use sensor_id as the sort key instead of the partition key
B. Increase the DynamoDB table's provisioned throughput capacity
C. Shard the partition key with a random suffix: sensor_id#1, sensor_id#2, etc. — spread writes, then aggregate at read time
D. Switch to synchronous writes to slow down the producer
Q3. DynamoDB's partition key + sort key composite key allows?
A. Cross-partition transactions for multi-item atomic consistency
B. Multiple items per partition key, sorted by sort key — enables range queries within a single partition
C. Full table scans without consuming read capacity units
D. Multi-table joins exactly like relational SQL
Hot partitions occur when one partition key receives disproportionate traffic. DynamoDB allocates throughput per partition — exceed the limit and requests get throttled (ProvisionedThroughputExceededException). Fix: append a random suffix 0-9 to sensor_id → 10 partitions share the write load → at read time, query all 10 suffixes and aggregate. Composite keys enable range queries: sensor_id#5 + created_at lets you fetch all events for a sensor in a time range efficiently.
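The suffix-sharding fix can be sketched in Python. A dict stands in for the DynamoDB table, and the names (write_reading, read_readings, NUM_SHARDS) are illustrative, not a real SDK API:

```python
import random
from collections import defaultdict

NUM_SHARDS = 10  # suffixes 0-9, as described above

# In-memory stand-in for a DynamoDB table: partition key -> list of items
table = defaultdict(list)

def write_reading(sensor_id: str, payload: dict) -> str:
    """Spread writes for one sensor across NUM_SHARDS partition keys."""
    shard_key = f"{sensor_id}#{random.randrange(NUM_SHARDS)}"
    table[shard_key].append(payload)
    return shard_key

def read_readings(sensor_id: str) -> list:
    """Scatter-gather read: query every suffix, then merge by timestamp."""
    items = []
    for suffix in range(NUM_SHARDS):
        items.extend(table[f"{sensor_id}#{suffix}"])
    return sorted(items, key=lambda item: item["ts"])

# 100 writes for the hot sensor now land on up to 10 partitions instead of one
for ts in range(100):
    write_reading("sensor-42", {"ts": ts, "value": 20.0 + ts % 5})

readings = read_readings("sensor-42")
print(len(readings), readings[0]["ts"])  # all 100 come back, time-ordered
```

The trade-off is visible in read_readings: one logical query becomes NUM_SHARDS physical queries, which is why suffix sharding only pays off when write hotness, not read latency, is the bottleneck.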
Open Design Challenge
1. Design the DynamoDB table schema for IoT sensor readings. Include partition key, sort key, and at least 3 attribute fields. Justify your key choices.
2. With 10 partition shards per sensor (sensor_id#0 through sensor_id#9), how do you query "get last 100 readings for sensor #42"?
3. Design a GSI (Global Secondary Index) to enable the query: "find all sensors that reported temperature > 90°C in the last hour."
Exercise 2 🟡 Easy ⏱ 20 min
Denormalization for Read Performance
A social network uses MongoDB. Normalized schema: posts collection (author_id reference) and a users collection. Fetching a feed requires: query posts → for each post, query users for author name and photo. This N+1 pattern hits the DB once per post. Denormalized: embed author name and photo directly in each post document — no join needed.
Normalized vs Denormalized Schema
❌ NORMALIZED (N+1 problem)
Post{author_id: "u42"}
↓ lookup per post
User{name, photo} — extra DB query
✅ DENORMALIZED (single read)
Post{author_id, author_name, author_photo}
↓ one document fetch
All data embedded — no lookup needed
Concept Check — 3 questions
Q1. Denormalization in MongoDB embeds author data in the post. The main risk is?
A. Post documents become too large to store in MongoDB (exceeds 16MB limit)
B. Stale embedded data — if the user changes their name or photo, all embedded posts have outdated information
C. Writes become significantly slower than the normalized schema
D. You can no longer query posts by author_id field
Q2. The "write duplication" cost of denormalization is acceptable when?
A. Always — denormalization is always the right choice in NoSQL databases
B. Never — duplicate data always causes more problems than it solves
C. When read frequency greatly exceeds write frequency — denormalization trades write overhead for dramatic read speed gains
D. Only for small datasets that fit entirely in memory
Q3. For a user's post feed, which MongoDB schema is better?
A. Always keep posts and users in completely separate collections for cleanliness
B. Embed frequently-read fields (author name, photo) in the post document — avoids lookups for the common read path
C. Switch to a relational SQL database to enable proper joins
D. Store posts as XML strings for maximum query flexibility
The golden rule of NoSQL modeling: model for how you query, not how you store. Feeds are read 100× more than users change their profiles. Embedding author_name and author_photo is worth the staleness risk. To handle updates: when a user changes their display name, run an async background job to update embedded copies — eventual consistency is fine for display names. MongoDB's 16MB document limit is almost never a concern for this pattern.
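The embed-then-backfill pattern can be sketched in Python, with plain dicts and lists standing in for MongoDB collections. Field names and the backfill_author helper are illustrative assumptions, not a fixed schema:

```python
# In-memory stand-ins for MongoDB collections; in real MongoDB the
# backfill would be an update_many on {"author_id": user_id}.
users = {"u42": {"name": "Ada", "photo": "https://cdn.example/ada-v1.jpg"}}

posts = [
    {"post_id": i,
     "author_id": "u42",
     "author_name": users["u42"]["name"],    # embedded copy
     "author_photo": users["u42"]["photo"],  # embedded copy
     "text": f"post {i}"}
    for i in range(3)
]

def render_feed(posts):
    # Single pass, no per-post user lookup: the N+1 problem is gone.
    return [(p["author_name"], p["text"]) for p in posts]

def backfill_author(user_id, users, posts):
    """Async-style background job: refresh stale embedded copies
    after a profile change (eventual consistency)."""
    user = users[user_id]
    for p in posts:
        if p["author_id"] == user_id:
            p["author_name"] = user["name"]
            p["author_photo"] = user["photo"]

users["u42"]["photo"] = "https://cdn.example/ada-v2.jpg"  # profile change
backfill_author("u42", users, posts)  # run out-of-band, not on the write path
print(posts[0]["author_photo"])
```

The key point is that render_feed never touches the users collection; the staleness window between the profile change and the backfill is the price paid for that.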
Open Design Challenge
1. Design the full MongoDB document schema for a social media post. Include embedded author info, media attachments, and a like count. What fields would you embed vs reference?
2. A user changes their profile photo. You have 50 million posts with their embedded photo URL. How do you update all of them without downtime?
3. Design the feed query: "get 20 most recent posts from users I follow, sorted by created_at." What indexes are needed?
Exercise 3 🔴 Medium ⏱ 25 min
Access Pattern-Driven Modeling
An e-commerce platform uses Cassandra. Three distinct access patterns: 1) Get all orders for a user, 2) Get a specific order by ID, 3) Get orders within a date range. Cassandra requires knowing access patterns upfront — it cannot efficiently execute ad-hoc queries on non-key columns. Each pattern needs its own dedicated table.
One Table Per Access Pattern (Cassandra)
orders_by_user (partition key: user_id, clustering key: created_at)
orders_by_id (partition key: order_id)
orders_by_date (partition key: date, clustering key: created_at)
Cassandra — data duplicated per table; fast single-partition reads for each pattern
Concept Check — 3 questions
Q1. Creating 3 separate tables for 3 access patterns (the Cassandra way) is called?
A. Normalization — systematically reducing data redundancy across tables
B. Denormalized query tables — one table per access pattern, data duplicated to serve each query shape efficiently
C. Vertical partitioning — splitting columns into separate tables
D. Table inheritance — sharing data through a parent-child table hierarchy
Q2. In Cassandra, running SELECT * FROM orders WHERE date > '2024-01-01' requires?
A. Creating a secondary index on the date column
B. Using the ALLOW FILTERING directive — expensive full cluster scan
C. A partition key design that includes date — Cassandra cannot efficiently filter on non-partition-key columns without scanning all partitions
D. Enabling a SQL full-text search extension within Cassandra
Q3. A wide row (many columns per partition) in Cassandra is used for?
A. Time-series data — each column is a measurement at a specific timestamp, making sequential time-range reads extremely efficient
B. User profile data where each attribute is a separate column
C. Transactional order data requiring full ACID guarantees
D. Analytics aggregation tables meant to replace a data warehouse
Cassandra's mantra: model for queries, not for relations. Before writing any schema, list every query the application will run. Then create one table per query shape. Storage is cheap; slow reads are not. Wide rows shine for time-series: partition key = sensor_id, clustering key = timestamp → one sequential disk read returns hundreds of time-ordered measurements. Never use ALLOW FILTERING in production — it scans every node in the cluster.
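The fan-out write that keeps one table per query shape in sync can be sketched in Python, with dicts standing in for the three Cassandra tables. The write_order helper is illustrative; in Cassandra the three inserts would typically go in a logged BATCH:

```python
from datetime import date

# Three in-memory "tables", one per access pattern
# (denormalized query tables, dicts standing in for partitions).
orders_by_user = {}  # user_id -> {created_at: order}
orders_by_id = {}    # order_id -> order
orders_by_date = {}  # date -> {created_at: order}

def write_order(order: dict) -> None:
    """Fan-out write: every order is copied into all three tables so
    each read pattern becomes a single-partition lookup."""
    orders_by_user.setdefault(order["user_id"], {})[order["created_at"]] = order
    orders_by_id[order["order_id"]] = order
    orders_by_date.setdefault(order["date"], {})[order["created_at"]] = order

order = {"order_id": "o1", "user_id": "u7",
         "date": date(2024, 1, 2), "created_at": 1704153600,
         "status": "PENDING"}
write_order(order)

# Each access pattern hits exactly one table, no cross-partition filtering:
print(orders_by_user["u7"][1704153600]["order_id"])  # pattern 1: by user
print(orders_by_id["o1"]["status"])                  # pattern 2: by order id
print(len(orders_by_date[date(2024, 1, 2)]))         # pattern 3: by date
```

Storage is paid three times per order; in exchange, none of the three reads ever needs the equivalent of ALLOW FILTERING.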
Open Design Challenge
1. Write the CQL CREATE TABLE statements for all three order tables. Include complete PRIMARY KEY definitions with partition and clustering keys.
2. When an order status changes from PENDING to SHIPPED, you must update all 3 tables consistently. Design the write path — how do you keep them in sync?
3. A new requirement: "get orders totaling over $500 for a user." Cassandra can't filter on non-key columns efficiently. What are your options?
Exercise 4 🔥 Hard ⏱ 30 min
TTL and Time-Series Patterns
An analytics platform stores clickstream events. Storage grows unbounded — 10 billion new events/month. Events older than 90 days are never queried. Running DELETE jobs against hundreds of billions of accumulated rows is prohibitively expensive. The solution: Cassandra's TTL (Time To Live), attached to each insert so rows expire automatically with no manual cleanup.
Cassandra TTL Lifecycle
Event written: INSERT ... USING TTL 7776000
↓ 90 days pass: TTL counter reaches 0
↓ Tombstone written: row hidden from reads
↓ Compaction runs: tombstone permanently removed
Concept Check — 3 questions
Q1. Cassandra TTL (Time To Live) automatically expires rows by?
A. Immediately deleting the row from disk the moment the TTL value reaches zero
B. Writing a tombstone marker at expiry time — the row is hidden from reads but physically removed only during compaction
C. Automatically archiving the row to a cold storage tier
D. Overwriting the row's values with nulls to indicate expiry
Q2. Time-series data in Cassandra uses (date, user_id) as the partition key. The date component prevents?
A. Hot partitions from high-traffic users generating too many writes
B. Cross-datacenter replication lag from heavy write volume
C. Unbounded partition growth — without date, one user's partition accumulates events forever; with date, each day creates a new bounded partition
D. Read amplification caused by compaction overhead
Q3. For 10B events/month with 90-day retention, total storage at 200 bytes/event is approximately?
A. 6 TB total storage
B. ~60 TB total — 10B events × 3 months × 200 bytes/event, multiplied by replication factor and overhead
C. 600 TB total storage
D. 6 PB total storage
TTL in Cassandra: INSERT INTO events (...) VALUES (...) USING TTL 7776000 (90 days in seconds = 90 × 86400). Tombstones are lightweight — data appears deleted to readers immediately, but disk space is freed only at compaction. Watch out: tombstone accumulation slows reads if compaction runs too infrequently. For time-series storage math: 10B events/month × 3 months = 30B events × 200 bytes = 6 TB raw. Replication factor 3 triples that to 18 TB, and compaction headroom, indexes, and tombstone overhead push provisioned cluster storage to roughly 60 TB total.
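The back-of-envelope storage math can be checked in a few lines of Python. The overhead beyond replication is a rough planning assumption, not a measured value:

```python
# Capacity estimate from the text; only raw and replicated figures
# are exact arithmetic, the rest is planning headroom.
events_per_month = 10_000_000_000
retention_months = 3          # 90-day TTL
bytes_per_event = 200
replication_factor = 3

raw_bytes = events_per_month * retention_months * bytes_per_event
replicated_bytes = raw_bytes * replication_factor

TB = 10**12
print(raw_bytes / TB)         # 6.0  -> 6 TB raw
print(replicated_bytes / TB)  # 18.0 -> 18 TB after RF=3
# Compaction headroom, indexes, and tombstones push provisioned
# capacity toward the ~60 TB figure quoted above.
```

Also worth verifying once: 90 × 86400 = 7,776,000 seconds, the TTL value used in the INSERT.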
Open Design Challenge
1. Write the Cassandra CREATE TABLE for clickstream events with a default TTL and a (date, user_id) partition key. What clustering columns would you include?
2. With TTL, tombstones accumulate until compaction runs. Which Cassandra compaction strategy (STCS, LCS, TWCS) is best for time-series data and why?
3. Design the data lifecycle: raw events in Cassandra (90-day TTL) → aggregated daily metrics in a data warehouse. What triggers the aggregation job?