Master the fundamental consistency-availability trade-offs that every distributed system architect must internalize — plus the PACELC extension for latency vs consistency during normal operation.
Eric Brewer's CAP theorem states that a distributed system cannot simultaneously guarantee all three of Consistency, Availability, and Partition tolerance: during a network partition, it must give up one of the first two. Understanding which to give up is the key design decision.
Consistency (all nodes see the same data), Availability (every request gets a non-error response), Partition Tolerance (the system keeps working through network splits). During a real partition you must choose C or A; P is non-negotiable for distributed systems.
CP: consistent during a partition but may reject requests. Examples: Zookeeper, HBase, MongoDB (majority writes), Redis Cluster (strict mode). Correctness trumps uptime.
AP: available during a partition but may return stale data. Examples: Cassandra, DynamoDB (defaults), CouchDB, DNS. Eventually consistent — replicas reconcile after the partition heals.
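One common reconciliation strategy after a partition heals is last-write-wins (LWW), where each replica tags values with timestamps and the merge keeps the newest write per key. A minimal sketch (the `lww_merge` function and the sample data are hypothetical, not a real system's API):

```python
# Hypothetical sketch: last-write-wins reconciliation after a partition
# heals. Each replica holds {key: (value, timestamp)}; the merge keeps
# the newest write per key. Cassandra applies the same idea with
# per-cell timestamps.

def lww_merge(replica_a: dict, replica_b: dict) -> dict:
    """Merge two replicas' {key: (value, timestamp)} maps, newest wins."""
    merged = dict(replica_a)
    for key, (value, ts) in replica_b.items():
        if key not in merged or ts > merged[key][1]:
            merged[key] = (value, ts)
    return merged

# During the partition, each side accepted writes independently:
side_a = {"user:1": ("alice@old.com", 100), "user:2": ("bob@x.com", 150)}
side_b = {"user:1": ("alice@new.com", 120)}

healed = lww_merge(side_a, side_b)
# user:1 resolves to the newer write; user:2 survives untouched
```

Note that LWW silently discards the losing write; systems that cannot tolerate that (like Dynamo's shopping cart) keep both versions and merge them at the application level instead.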
PACELC extends CAP: even without a partition (normal operation), systems trade Latency against Consistency. A low-latency read may skip the replication wait; a consistent read waits for quorum acknowledgement.
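The "else" branch can be sketched as two read paths over the same replica set: answer immediately from one replica, or wait for a quorum and take the newest value. This is a simplified simulation with hypothetical function names, not a real client API:

```python
# Hypothetical sketch of the PACELC "else" branch: the same read served
# fast from one local replica (EL), or slower but consistent by
# consulting a quorum of replicas and taking the newest value (EC).

replicas = [  # (value, write_timestamp) held by each of N=3 replicas
    ("v1", 100),  # stale replica that missed the latest write
    ("v2", 200),
    ("v2", 200),
]

def read_low_latency(replicas):
    """EL: answer from the nearest replica, no coordination. May be stale."""
    return replicas[0][0]

def read_consistent(replicas, quorum=2):
    """EC: wait for a quorum of replicas, return the newest value seen."""
    acked = replicas[:quorum]  # in reality: wait for network round-trips
    return max(acked, key=lambda r: r[1])[0]

read_low_latency(replicas)  # may return the stale "v1"
read_consistent(replicas)   # returns "v2", at the cost of extra latency
```

The consistent path only works if the latest write itself reached a quorum, which is exactly the R+W>N condition discussed below for Cassandra.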
Click on each region of the CAP triangle to explore which real-world systems fall into each category, and why you can only ever guarantee two of the three properties simultaneously.
The triangle shows the three properties of distributed systems. During a network partition, you must sacrifice either Consistency or Availability — Partition Tolerance is mandatory for any real distributed system.
CA systems (no partition tolerance) are single-node databases like PostgreSQL on one server. They're not truly distributed. Any system spanning multiple machines will eventually experience network partitions — hardware failures, datacenter splits, and cable cuts happen constantly at scale.
Inject a network partition between two datacenters and observe how CP vs AP systems respond to a write request during the split.
PACELC extends CAP by acknowledging that even when no partition occurs (the common case), systems must still trade off between latency and consistency. Most of your design decisions happen in the "else" (E) branch.
Partition → Availability vs Consistency • Else → Latency vs Consistency. Every system is classified by both halves: e.g. PA/EL means "AP during partition, low-latency reads in normal operation."
| System | Partition: A or C? | Else: L or C? | Use Case |
|---|---|---|---|
| DynamoDB (eventual) | PA — stays available | EL — low latency reads | E-commerce catalog |
| DynamoDB (strong) | PC — may reject writes | EC — waits for quorum | Financial records |
| Cassandra (QUORUM) | PA — stays available | EC — quorum read/write | Balanced workloads |
| Zookeeper | PC — rejects during partition | EC — fsync before ACK | Distributed locks, config |
| MySQL single node | CA — no partition possible | EC — ACID transactions | OLTP (no partition risk) |
Cassandra lets you dial consistency at the query level using the R+W>N formula. This makes it one of the most flexible systems — you choose your trade-off per operation.
```python
# Cassandra tunable consistency (Python cassandra-driver)
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

# Strong consistency: W=QUORUM + R=QUORUM with N=3
# W(2) + R(2) > N(3) → guaranteed overlap
update = SimpleStatement(
    "UPDATE accounts SET balance = %s WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(update, [new_balance, account_id])

# Read with QUORUM
read = SimpleStatement(
    "SELECT balance FROM accounts WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
result = session.execute(read, [account_id])

# Eventual (AP) — fast but may return stale data
fast = SimpleStatement(
    "SELECT product_views FROM analytics WHERE id = %s",
    consistency_level=ConsistencyLevel.ONE,
)
result = session.execute(fast, [product_id])
```
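The overlap guarantee is pure arithmetic, so it can be checked with a one-line helper (a hypothetical utility, not part of the driver):

```python
# Hypothetical helper: does an (R, W, N) configuration guarantee that
# every read quorum overlaps every write quorum? Overlap means at least
# one replica in the read set holds the latest acknowledged write.

def quorums_overlap(r: int, w: int, n: int) -> bool:
    return r + w > n

quorums_overlap(2, 2, 3)  # QUORUM/QUORUM with N=3: strong
quorums_overlap(1, 1, 3)  # ONE/ONE with N=3: eventual only
quorums_overlap(1, 3, 3)  # read ONE, write ALL: strong reads, slow writes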
Mapping business requirements to CAP/PACELC choices. The wrong choice here causes either data corruption or unnecessary downtime.
Use Zookeeper, Google Spanner, or CockroachDB. Correctness over availability — a double-spend is catastrophic. Accept rare write rejections during partitions.
Use Cassandra with ONE consistency. Eventual consistency is fine — seeing a count of 10,423 vs 10,424 is not a user-visible bug. Prioritize low latency and high write throughput.
Use DynamoDB with eventual reads. Amazon's Dynamo paper pioneered merging conflicting cart versions (union of items). Availability matters more than perfect consistency here.
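The cart-merge idea from the Dynamo paper can be sketched as a union of items, taking the larger quantity when both sides touched the same item. The `merge_carts` function and data are illustrative, not Amazon's actual implementation:

```python
# Hypothetical sketch of Dynamo-style shopping-cart merge: when two
# conflicting cart versions surface, keep the union of items, taking
# the max quantity per item. As the Dynamo paper notes, this can
# resurrect a deleted item — the price of staying available.

def merge_carts(cart_a: dict, cart_b: dict) -> dict:
    """Merge {item: quantity} maps from two divergent replicas."""
    merged = dict(cart_a)
    for item, qty in cart_b.items():
        merged[item] = max(merged.get(item, 0), qty)
    return merged

# Written on two sides of a partition:
cart_a = {"book": 1, "pen": 2}
cart_b = {"book": 2, "mug": 1}
merge_carts(cart_a, cart_b)  # {'book': 2, 'pen': 2, 'mug': 1}
```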
Mixed strategy: strong reads (QUORUM) when actually decrementing stock during purchase; eventual reads for display count. Cassandra handles both with per-query consistency levels.
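The mixed strategy amounts to a per-operation policy plus a strong check-then-decrement on the purchase path. A toy simulation with hypothetical names (the real version would run the purchase path at QUORUM via `SimpleStatement`, as in the earlier example):

```python
# Hypothetical sketch of the mixed strategy: the purchase path runs at
# QUORUM so a buyer never decrements stale stock on a healthy cluster,
# while the product page reads at ONE and tolerates a slightly stale
# count. Names and the policy table are illustrative.

PER_QUERY_CONSISTENCY = {
    "purchase": "QUORUM",  # R+W > N: the decrement sees the latest stock
    "display":  "ONE",     # fast; may lag the true count by a few writes
}

def try_purchase(stock_by_sku: dict, sku: str) -> bool:
    """Strong path: check-then-decrement (would execute at QUORUM)."""
    if stock_by_sku.get(sku, 0) > 0:
        stock_by_sku[sku] -= 1
        return True
    return False

inventory = {"sku-1": 1}
try_purchase(inventory, "sku-1")  # True: the last unit is sold
try_purchase(inventory, "sku-1")  # False: the strong read sees stock == 0
```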
Test your understanding of CAP theorem and PACELC. Select the best answer for each question.