Day 11 — Week 2

NoSQL Data Modeling

NoSQL databases require designing schemas around access patterns, not around data relationships. Learn single-table design, partition key selection, and how to detect hot partitions before they kill your system.

Partition Key Design · Single-Table Design · GSI / LSI · Hot Partition Detection · Access Pattern First
Key Concepts
🗝️
Partition Key (Hash Key)
The primary distribution key. All items with the same partition key live on the same physical partition. A bad partition key (e.g., status="active") puts all writes on one shard and creates a hotspot.
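To make the hotspot concrete, here is a minimal sketch that hashes partition key values onto a fixed number of partitions and counts where the writes land. The SHA-256 hash, the partition count of 10, and the helper names `partition_for` and `distribution` are illustrative assumptions; DynamoDB's real hash function and partition layout are internal.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 10  # assumption: an illustrative partition count


def partition_for(pk: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key value to a partition index with a stable hash."""
    digest = hashlib.sha256(pk.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def distribution(keys, num_partitions: int = NUM_PARTITIONS) -> list:
    """Count how many writes land on each partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(i, 0) for i in range(num_partitions)]


# Low-cardinality key: every write carries the same value.
bad = distribution(["active"] * 1000)
# High-cardinality key: one value per user.
good = distribution([f"USER#{i}" for i in range(1000)])

print("status key :", bad)
print("user id key:", good)
```

With the status key, all 1,000 writes fall on a single partition. With per-user keys, they spread across all ten.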
📐
Single-Table Design
Store all entity types in one DynamoDB table, using prefixed partition and sort keys to differentiate them. DynamoDB has no joins, so related items share a partition key and a single query returns them together. The PK/SK pattern encodes entity type: PK=USER#123, SK=PROFILE or SK=ORDER#456.
📇
Global Secondary Index (GSI)
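An index with its own partition key (and optional sort key) that projects all or a subset of the table's attributes. Enables access patterns impossible on the base table. In DynamoDB, you define the GSI key schema when you create the index; every item written with the index key attribute is copied to the index asynchronously, so GSI reads are eventually consistent. Items that lack the attribute are left out of the index (a sparse index).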
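As a rough illustration of where a GSI is declared, the sketch below builds the keyword arguments for boto3's `create_table` call. The names `AppTable`, `GSI1`, and `GSI1PK` follow the code example later in this lesson; the helper `table_definition`, the on-demand billing mode, and the projected attributes are assumptions chosen for the example.

```python
def table_definition(table_name: str = "AppTable") -> dict:
    """Build keyword arguments for boto3's create_table (hypothetical helper)."""
    return {
        "TableName": table_name,
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
        # Only key attributes are declared; all other attributes are schemaless.
        "AttributeDefinitions": [
            {"AttributeName": "PK", "AttributeType": "S"},
            {"AttributeName": "SK", "AttributeType": "S"},
            {"AttributeName": "GSI1PK", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "PK", "KeyType": "HASH"},
            {"AttributeName": "SK", "KeyType": "RANGE"},
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "GSI1",
                "KeySchema": [{"AttributeName": "GSI1PK", "KeyType": "HASH"}],
                # Project a subset of attributes into the index.
                "Projection": {
                    "ProjectionType": "INCLUDE",
                    "NonKeyAttributes": ["email", "status"],
                },
            }
        ],
    }


definition = table_definition()
# To create the table for real (requires AWS credentials):
#   import boto3
#   boto3.client("dynamodb").create_table(**definition)
```

The index key schema lives in the table definition, not in individual writes. An item appears in GSI1 only if it carries a `GSI1PK` attribute.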
🔥
Adaptive Capacity
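DynamoDB automatically lets a hot partition draw on throughput that other partitions are not using, so uneven traffic does not throttle immediately. But it's not infinite: a single partition tops out at roughly 3,000 RCU/sec and 1,000 WCU/sec, so design your partition key so no single partition needs more than that in steady state.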
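The per-partition ceiling is easier to reason about with the arithmetic written out. This is a back-of-the-envelope sketch, assuming DynamoDB's documented unit sizes (one RCU covers a strongly consistent read of up to 4 KB, eventually consistent reads cost half, one WCU covers a write of up to 1 KB) and per-partition limits of 3,000 RCU and 1,000 WCU per second. The function names are hypothetical.

```python
import math

PARTITION_RCU_LIMIT = 3000  # per-partition read ceiling, per second
PARTITION_WCU_LIMIT = 1000  # per-partition write ceiling, per second


def read_units(reads_per_sec, item_kb, strongly_consistent=True):
    """RCU consumed per second: reads are metered in 4 KB steps."""
    units_per_read = max(1, math.ceil(item_kb / 4))
    if not strongly_consistent:
        units_per_read = units_per_read / 2  # eventually consistent costs half
    return reads_per_sec * units_per_read


def write_units(writes_per_sec, item_kb):
    """WCU consumed per second: writes are metered in 1 KB steps."""
    return writes_per_sec * max(1, math.ceil(item_kb))


def is_hot(rcu, wcu):
    """True if one partition key would exceed either per-partition limit."""
    return rcu > PARTITION_RCU_LIMIT or wcu > PARTITION_WCU_LIMIT


# One partition key receiving 2,000 reads/sec of 6 KB items:
strong = read_units(2000, 6)                               # 2000 * 2 RCU
eventual = read_units(2000, 6, strongly_consistent=False)  # 2000 * 1 RCU
# The same key receiving 500 writes/sec of 2.5 KB items:
writes = write_units(500, 2.5)                             # 500 * 3 WCU

print(strong, is_hot(strong, 0))
print(eventual, is_hot(eventual, 0))
print(writes, is_hot(0, writes))
```

Item size matters as much as request rate: the 6 KB reads above cross the read limit only when they are strongly consistent, and 500 writes per second already exceed the write limit because each 2.5 KB item costs 3 WCU.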
Interactive Simulation — Partition Key Analyzer

Type a partition key pattern and see how traffic distributes across 10 partitions. The analyzer reports one of two results:

⚠️ HOT PARTITION DETECTED: this key pattern concentrates traffic. Consider adding a random suffix or using a composite key.
✅ Well distributed: good partition key, traffic is spread evenly across all partitions.
Access Pattern | With this PK | Notes
Get user by ID | ✓ Efficient | Direct partition lookup
List user's orders | ✓ Efficient | Query SK prefix within PK
Find all orders today | ✗ Expensive | Requires scan or GSI
Get orders by status | ✗ Expensive | Need GSI on status attribute
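The hot-partition warning suggests adding a random suffix to the key. A minimal sketch of that write-sharding idea follows, applied to a date key such as one that might back "find all orders today". The shard count of 10, the `DATE#...` key format, and all function names are assumptions for illustration.

```python
import random
import zlib

SHARD_COUNT = 10  # assumption: tune to the write rate you need to absorb


def random_sharded_pk(base: str, shard_count: int = SHARD_COUNT) -> str:
    """Spread writes for one logical key across shard_count partition keys."""
    return f"{base}#{random.randrange(shard_count)}"


def calculated_sharded_pk(base: str, item_id: str,
                          shard_count: int = SHARD_COUNT) -> str:
    """Derive the suffix from the item id so one item can be re-read directly."""
    shard = zlib.crc32(item_id.encode("utf-8")) % shard_count
    return f"{base}#{shard}"


def all_shard_keys(base: str, shard_count: int = SHARD_COUNT) -> list:
    """Every partition key a reader must query to see the whole logical key."""
    return [f"{base}#{n}" for n in range(shard_count)]


write_key = random_sharded_pk("DATE#2024-05-01")
stable_key = calculated_sharded_pk("DATE#2024-05-01", "ord_2024_001")
read_keys = all_shard_keys("DATE#2024-05-01")

print(write_key, stable_key, len(read_keys))
```

The trade-off is on the read side: listing everything under the logical key becomes one query per shard, with the results merged in the application.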
Architecture — Single-Table Design Pattern
App Request (DynamoDB SDK) → Hash Router (PK → partition) → Partition Node (B-tree on SK) → GSI Partition (alternate PK)
Entity | PK | SK | GSI1-PK
User | USER#userId | PROFILE | email@domain.com
Order | USER#userId | ORDER#orderId | STATUS#pending
Product | PRODUCT#productId | DETAILS | CATEGORY#electronics
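The request path above (hash the PK to pick a partition, then search an ordered structure on the SK) can be mimicked in a few lines. This is a toy in-memory model, not how DynamoDB is implemented: the class `MiniTable`, the SHA-256 routing, and the sorted list standing in for the B-tree are all assumptions.

```python
import bisect
import hashlib


class MiniTable:
    """Toy model: hash-route on PK, keep each item collection sorted by SK."""

    def __init__(self, num_partitions: int = 4):
        # Each partition maps pk -> list of (sk, item) kept in SK order.
        self.partitions = [dict() for _ in range(num_partitions)]

    def _route(self, pk: str) -> int:
        digest = hashlib.sha256(pk.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.partitions)

    def put(self, pk: str, sk: str, item: dict) -> None:
        rows = self.partitions[self._route(pk)].setdefault(pk, [])
        sort_keys = [row[0] for row in rows]
        i = bisect.bisect_left(sort_keys, sk)
        if i < len(rows) and rows[i][0] == sk:
            rows[i] = (sk, item)       # same PK and SK: overwrite
        else:
            rows.insert(i, (sk, item))  # keep the collection ordered by SK

    def query(self, pk: str, sk_prefix: str = "") -> list:
        rows = self.partitions[self._route(pk)].get(pk, [])
        sort_keys = [row[0] for row in rows]
        start = bisect.bisect_left(sort_keys, sk_prefix)
        matches = []
        for sk, item in rows[start:]:
            if not sk.startswith(sk_prefix):
                break  # sorted order means every match is contiguous
            matches.append(item)
        return matches


table = MiniTable()
table.put("USER#123", "PROFILE", {"name": "Alice"})
table.put("USER#123", "ORDER#456", {"order": "456"})
table.put("USER#123", "ORDER#123", {"order": "123"})
table.put("PRODUCT#9", "DETAILS", {"title": "Laptop"})

orders = table.query("USER#123", "ORDER#")
everything = table.query("USER#123")
print(orders)
print(everything)
```

Because items under one PK are stored in SK order, a prefix query touches one partition and reads a contiguous run of items, which is why "list user's orders" is cheap in this layout.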
Technology Decision
Pattern | When to Use | Trade-off
Single-table design | Known, stable access patterns; cost-sensitive DynamoDB usage | Complex schema, hard to query ad-hoc
Multi-table design | Evolving schema, team flexibility more important than cost | Joins become application-level scatter-gather
GSI | Need alternate access pattern on existing data | Extra write cost and storage (projected attributes)
DynamoDB Streams | React to changes, ETL to analytics DB, maintain read models | Lambda polling cost, at-least-once delivery
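At-least-once delivery means a stream consumer may see the same record twice, so the handler should be idempotent. The sketch below follows the shape of a DynamoDB Streams event as Lambda receives it (`Records`, `eventID`, `eventName`, `dynamodb.Keys`, `dynamodb.NewImage`). The function name, the in-memory `seen_event_ids` set, and the dictionary read model are hypothetical stand-ins for durable storage, and `NewImage` is present only when the stream view type includes new images.

```python
def handle_stream_event(event: dict, seen_event_ids: set, read_model: dict) -> int:
    """Apply stream records to a read model, skipping duplicate deliveries."""
    applied = 0
    for record in event.get("Records", []):
        event_id = record["eventID"]
        if event_id in seen_event_ids:
            continue  # already processed: a redelivery
        keys = record["dynamodb"]["Keys"]
        item_key = (keys["PK"]["S"], keys["SK"]["S"])
        if record["eventName"] == "REMOVE":
            read_model.pop(item_key, None)
        else:  # INSERT or MODIFY
            read_model[item_key] = record["dynamodb"]["NewImage"]
        seen_event_ids.add(event_id)
        applied += 1
    return applied


sample_event = {
    "Records": [
        {
            "eventID": "1",
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"PK": {"S": "USER#user_123"}, "SK": {"S": "PROFILE"}},
                "NewImage": {
                    "PK": {"S": "USER#user_123"},
                    "SK": {"S": "PROFILE"},
                    "name": {"S": "Alice"},
                },
            },
        }
    ]
}

seen, model = set(), {}
first = handle_stream_event(sample_event, seen, model)
second = handle_stream_event(sample_event, seen, model)  # simulated redelivery
print(first, second, model)
```

In a real deployment the processed-ID record would need to live somewhere that survives restarts, or the downstream write itself would need to be naturally idempotent.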
Code Example — DynamoDB Single-Table Design
from decimal import Decimal

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('AppTable')

# Write a user profile item
table.put_item(Item={
    'PK': 'USER#user_123',
    'SK': 'PROFILE',
    'email': 'alice@example.com',
    'name': 'Alice',
    'GSI1PK': 'alice@example.com',  # GSI lookup by email
})

# Write an order under the same user (same PK, different SK)
table.put_item(Item={
    'PK': 'USER#user_123',
    'SK': 'ORDER#ord_2024_001',
    'total': Decimal('49.99'),  # boto3 rejects Python floats; use Decimal for numbers
    'status': 'shipped',
    # GSI: all shipped orders. Low-cardinality key, so this index partition can run hot.
    'GSI1PK': 'STATUS#shipped',
})

# Query: get all orders for user_123
response = table.query(
    KeyConditionExpression=Key('PK').eq('USER#user_123') &
                           Key('SK').begins_with('ORDER#')
)

# Query: all shipped orders via GSI
shipped = table.query(
    IndexName='GSI1',
    KeyConditionExpression=Key('GSI1PK').eq('STATUS#shipped')
)
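Each `query` call returns at most 1 MB of data, so an "all orders" or "all shipped orders" query may need several round trips. A minimal pagination sketch, assuming a boto3 `Table` resource like the one above; the helper name `query_all` is hypothetical.

```python
def query_all(table, **query_kwargs):
    """Collect every item from a query by following LastEvaluatedKey."""
    items = []
    while True:
        page = table.query(**query_kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items  # no more pages
        # Resume the next page where this one stopped.
        query_kwargs["ExclusiveStartKey"] = last_key
```

It would be called with the same arguments as `table.query`, for example `query_all(table, IndexName='GSI1', KeyConditionExpression=Key('GSI1PK').eq('STATUS#shipped'))`. For large result sets, yielding pages lazily instead of building one list may be the better choice.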
Quiz
1. Why is "timestamp" a poor partition key for a write-heavy table?
2. What does a Global Secondary Index (GSI) enable in DynamoDB?
3. In single-table design, a PK of "USER#123" and SK of "ORDER#456" represents:
4. Cassandra vs DynamoDB: which statement is accurate?
5. When designing a NoSQL schema, what should you define FIRST?