Day 15

CAP Theorem & PACELC

Move beyond the CAP triangle — understand why partition tolerance is mandatory, when to choose consistency vs availability, and how PACELC captures the latency-consistency trade-off in normal operation.

Exercise 1 · 🟢 Easy · 15 min
Classifying 5 Databases as CP or AP
You are preparing a system design interview presentation on CAP theorem and need to correctly classify five widely-used databases: ZooKeeper, Apache Cassandra, Amazon DynamoDB (default configuration), PostgreSQL with streaming replication, and MongoDB (with majority write concern). Each database makes different trade-offs during a network partition. Understanding these classifications helps you choose the right database for each part of your system.

Tasks

  • For each of the 5 databases, state CP or AP and give a one-sentence explanation of what happens during a network partition (e.g., "refuses writes on minority partition" vs "serves stale reads").
  • Explain why the "CA" quadrant of CAP is essentially impossible in a distributed system — what does CAP assume about network partitions that makes CA a theoretical non-option?
  • Apply the PACELC model to Cassandra: in the Else (no partition) case, Cassandra's defaults favor Latency over Consistency (PA/EL). Explain the tunable consistency levels (ONE, QUORUM, ALL) and where each sits on the latency-consistency spectrum.
  • Give one real-world example where choosing an AP database (like Cassandra) over a CP database (like ZooKeeper) for a specific use case was the right call — and one example where it would be dangerous.
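As a starting point for the tunable-consistency task, the sketch below checks whether a read is guaranteed to overlap the most recent acknowledged write. It relies on the standard rule that a read set and a write set intersect when R + W > N, where N is the replication factor. The function names are hypothetical and only the three levels named in the exercise are modeled; this is an illustration, not Cassandra's implementation.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Replicas that must acknowledge an operation at the given level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def read_overlaps_latest_write(write_level: str, read_level: str,
                               replication_factor: int) -> bool:
    """True when R + W > N, so every read set intersects every write set."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    rf = 3  # assumed replication factor for the example
    for w_level, r_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
        print(w_level, r_level, read_overlaps_latest_write(w_level, r_level, rf))
```

Lower levels wait for fewer replicas and so respond faster, at the cost of the overlap guarantee; higher levels wait for more replicas and sit toward the consistency end of the spectrum.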
Exercise 2 · 🔴 Medium · 30 min
Healthcare Patient Records — CP or AP?
A hospital system is deploying an Electronic Health Record (EHR) platform across 3 geographically distributed data centers (East, Central, West). Physicians at any location must be able to read and update patient records. The platform stores medication prescriptions, allergy lists, and lab results. During the COVID-19 surge, the hospital network experienced two 4-hour partial outages in which inter-DC connectivity degraded to 60% packet loss. The architecture team is debating whether CP or AP is more appropriate for this domain.

Tasks

  • Make the CP argument: what is the patient safety risk if a physician at West DC reads a stale allergy list during a partition and prescribes a medication the patient is allergic to? How does CP prevent this?
  • Make the AP argument: what is the patient safety risk if a physician at West DC cannot write a critical medication update because the East DC is unreachable and the system refuses the write? When is AP actually safer?
  • Propose a nuanced solution: which specific data types (allergies, lab results, prescriptions, appointment schedules) require CP semantics, and which can tolerate AP semantics with conflict resolution?
  • Design the conflict resolution strategy for AP data: if two physicians update the same appointment record concurrently during a partition, how is the conflict detected and resolved when connectivity is restored?
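One common way to detect the concurrent-update case in the last task is a version vector per record. The sketch below is a minimal illustration under assumed names (`compare`, `merge`, and the data-center keys "east" and "west" are hypothetical); it only detects the conflict and leaves the resolution policy, such as flagging the appointment for manual review, to your design.

```python
from typing import Dict

VersionVector = Dict[str, int]  # data center name -> number of updates seen


def compare(a: VersionVector, b: VersionVector) -> str:
    """Return 'equal', 'a_newer', 'b_newer', or 'concurrent'."""
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        # Each side has an update the other has not seen: a true conflict.
        return "concurrent"
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"


def merge(a: VersionVector, b: VersionVector) -> VersionVector:
    """Element-wise maximum, recorded once the conflict has been resolved."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


if __name__ == "__main__":
    # Both data centers started from {"east": 2, "west": 1} and then each
    # accepted one local update to the same appointment during the partition.
    east_copy = {"east": 3, "west": 1}
    west_copy = {"east": 2, "west": 2}
    print(compare(east_copy, west_copy))
    print(merge(east_copy, west_copy))
```

If `compare` reports that one copy is strictly newer, the older copy can simply be overwritten; only the "concurrent" result needs a domain-specific resolution rule.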
Exercise 3 · 🔴 Medium · 30 min
Shopping Cart — Arguing for AP and Designing Conflict Resolution
Amazon's Dynamo paper (2007) famously used the shopping cart as the canonical example for choosing AP over CP. An e-commerce platform's shopping cart service handles 2 million carts concurrently. During Black Friday, network partitions between US-East and US-West occur 3 times (each lasting 8–15 minutes). The cart must remain writable in both regions during these partitions — a read-only cart during a sale is a direct revenue loss estimated at $4M per partition event.

Tasks

  • Argue for AP: explain why an incorrect cart (showing an item that was deleted by another session) is a far less severe outcome than an unavailable cart — frame this as a business risk vs technical correctness trade-off.
  • Describe the divergence scenario: a user adds "Blue Jacket" to their cart on mobile (US-West), while a background session removes it on desktop (US-East), during a 10-minute partition. What does each replica hold when the partition heals?
  • Design the conflict resolution algorithm using "add-wins" CRDT (conflict-free replicated data type) semantics, for example an observed-remove set. Describe how a set with tombstones resolves the add/remove conflict without data loss and why "add-wins" is the correct business choice for a cart.
  • Describe one scenario where "add-wins" produces a wrong business outcome (hint: think about items the user explicitly removed and the emotional/UX impact) and propose a mitigation.
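The add-wins behavior can be sketched with an observed-remove set, where every add carries a unique tag and a remove tombstones only the tags that replica has already seen. The class below is a minimal illustration, not the Dynamo paper's implementation; `ORSetCart` and its methods are hypothetical names, and tombstone garbage collection is left out.

```python
import uuid
from dataclasses import dataclass, field
from typing import Set, Tuple

Tag = Tuple[str, str]  # (item name, unique id of one add operation)


@dataclass
class ORSetCart:
    adds: Set[Tag] = field(default_factory=set)
    tombstones: Set[Tag] = field(default_factory=set)

    def add(self, item: str) -> None:
        # Every add gets a fresh tag, so it is distinguishable from earlier adds.
        self.adds.add((item, uuid.uuid4().hex))

    def remove(self, item: str) -> None:
        # Tombstone only the add-tags this replica has observed so far.
        self.tombstones |= {t for t in self.adds if t[0] == item}

    def merge(self, other: "ORSetCart") -> "ORSetCart":
        # Union of both sides; commutative, associative, and idempotent.
        return ORSetCart(self.adds | other.adds,
                         self.tombstones | other.tombstones)

    def items(self) -> Set[str]:
        return {item for (item, _tag) in self.adds - self.tombstones}


if __name__ == "__main__":
    base = ORSetCart()
    base.add("Blue Jacket")
    east = ORSetCart(set(base.adds), set(base.tombstones))
    west = ORSetCart(set(base.adds), set(base.tombstones))

    west.add("Blue Jacket")     # mobile session during the partition
    east.remove("Blue Jacket")  # desktop session during the partition

    print(east.merge(west).items())
```

In this run the jacket survives the merge because the add in US-West produced a tag that US-East never observed and therefore never tombstoned. That is also the mechanism behind the wrong-outcome scenario in the final task: an item the user deliberately removed can reappear.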
Exercise 4 · 🔥 Hard · 50 min
Multi-Region CP Configuration Store with Sub-100ms Reads
A global SaaS platform stores feature flags and service configuration in a distributed configuration store. Configs are read by 50,000 microservice instances across 5 AWS regions (us-east-1, us-west-2, eu-west-1, ap-southeast-1, ap-northeast-1). The system must be CP — a misconfigured feature flag that enables a bug for 0.1% of users is preferable to different services in the same region seeing different config values. However, reads must complete in under 100ms from any region. Writes happen infrequently: ~20 config changes/day.

Tasks

  • Explain why pure CP (e.g., a single-leader database in us-east-1) fails the 100ms read requirement for ap-southeast-1: calculate the read latency from Singapore to US-East (assume 200ms RTT) and explain why strong reads worsen it.
  • Design a "CP with local read caches" architecture: a Raft-based leader (us-east-1) handles all writes, but each region has a local read replica with a bounded staleness guarantee of 5 seconds. How do you ensure cache invalidation is atomic relative to config writes?
  • Describe how to handle the partition scenario where ap-southeast-1 is isolated: should the local replica serve stale config (AP behavior) or refuse reads (CP behavior)? Justify your choice given the use case of feature flags vs critical security configs.
  • Design a config versioning and rollout system that allows "staged rollout" of a new config to 1 region before global propagation, while maintaining CP guarantees — each service instance must see a consistent view of the config for its region.
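The latency arithmetic and the bounded-staleness rule from the tasks above can be sketched as follows. The 100 ms budget, 200 ms RTT, and 5-second bound come from the exercise; the function names, the `Replica` type, and the `allow_stale` switch are hypothetical, and which config classes should set that switch is exactly the judgment the third task asks you to justify.

```python
from dataclasses import dataclass

READ_BUDGET_MS = 100.0   # read latency requirement from the exercise
MAX_STALENESS_S = 5.0    # bounded staleness guarantee from the exercise


def remote_read_latency_ms(client_rtt_ms: float,
                           quorum_confirm_rtt_ms: float = 0.0) -> float:
    """Latency of a read served by a remote leader.

    quorum_confirm_rtt_ms models the extra round trip a leader may need to
    confirm it is still the leader before answering a strong read.
    """
    return client_rtt_ms + quorum_confirm_rtt_ms


@dataclass
class Replica:
    seconds_since_leader_contact: float


def read_decision(replica: Replica, allow_stale: bool) -> str:
    """Decide how a regional replica answers a read."""
    if replica.seconds_since_leader_contact <= MAX_STALENESS_S:
        return "serve"            # within the staleness bound
    # Bound exceeded, e.g. the region is partitioned from the leader.
    return "serve_stale" if allow_stale else "refuse"


if __name__ == "__main__":
    latency = remote_read_latency_ms(client_rtt_ms=200.0)
    print(latency, latency <= READ_BUDGET_MS)
    isolated = Replica(seconds_since_leader_contact=60.0)
    print(read_decision(isolated, allow_stale=True))
    print(read_decision(isolated, allow_stale=False))
```

Even with no quorum confirmation, a single 200 ms round trip already exceeds the budget, which is why reads have to be served from within the region.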