When a single database can't handle your scale, sharding splits data across multiple nodes. Learn consistent hashing, virtual nodes, and how to minimize data movement when adding capacity.
Watch how adding/removing shards affects key distribution. Notice how consistent hashing minimizes remapping.
| Strategy | Distribution | Resharding | Cross-shard queries |
|---|---|---|---|
| Hash partitioning | Even (ideal key) | Expensive without consistent hash | Scatter-gather all shards |
| Range partitioning | Uneven if data skewed | Easy (split a range) | Only relevant shards (range scan) |
| Directory partitioning | Fully custom | Manual but flexible | Depends on directory lookup |
| Choose | When |
|---|---|
| Consistent hashing | Need elastic scaling with minimal data movement (Cassandra, DynamoDB) |
| Range partitioning | Range scan queries are common (time-series, sorted data, HBase, BigTable) |
| Application-level routing | Simple setups, you control the shard key explicitly in every query |
| Vitess / Citus | MySQL/Postgres sharding with transparent routing layer on top |
import hashlib
from sortedcontainers import SortedDict
class ConsistentHashRing:
def __init__(self, vnodes: int = 150):
self.vnodes = vnodes
self.ring: SortedDict = SortedDict()
self.nodes: set = set()
def _hash(self, key: str) -> int:
return int(hashlib.md5(key.encode()).hexdigest(), 16)
def add_node(self, node: str):
for i in range(self.vnodes):
vnode_key = f"{node}:vn{i}"
h = self._hash(vnode_key)
self.ring[h] = node
self.nodes.add(node)
def remove_node(self, node: str):
for i in range(self.vnodes):
vnode_key = f"{node}:vn{i}"
h = self._hash(vnode_key)
self.ring.pop(h, None)
self.nodes.discard(node)
def get_node(self, key: str) -> str:
if not self.ring:
raise ValueError("No nodes in ring")
h = self._hash(key)
# Find first node clockwise from hash position
idx = self.ring.bisect_left(h)
if idx == len(self.ring):
idx = 0 # wrap around the ring
return self.ring.values()[idx]
# Usage
ring = ConsistentHashRing(vnodes=150)
for n in ["shard-1", "shard-2", "shard-3", "shard-4"]:
ring.add_node(n)
# Route a key to its shard
node = ring.get_node("user:12345")
print(f"user:12345 → {node}")
# Add a node — only ~1/N keys remap
ring.add_node("shard-5")