Day 26 · Week 4

Design Stripe

Idempotency keys, double-entry ledger, exactly-once payment processing, webhook delivery guarantees, and PCI DSS compliance architecture.

$1T Processed/Year · 4h Study Time · 2 Simulations · 5 Quizzes
Topics: Idempotency · Distributed Ledger · Webhook Delivery · PCI DSS · Retry Logic

Money Requires Correctness Over Speed

A payment processed twice destroys trust and may be impossible to fix without manual intervention. Every design decision at Stripe starts with: "what happens if this fails halfway?"

Correctness over Speed

A payment processed twice destroys trust. Idempotency is non-negotiable: every mutating request must carry a client-generated key that makes retries safe.

Double-Entry Ledger

Every payment creates 2+ ledger entries (debit + credit). Sum of all entries always equals zero. Balance reconciles trivially: no money can silently disappear.

Exactly-Once Delivery

Webhook deliveries may fail, so Stripe retries, and a retry can deliver the same event more than once. At-least-once delivery + idempotent handlers (deduplicating on the event ID) = effectively exactly-once processing.

PCI DSS

Raw card numbers never touch Stripe's API servers: they are tokenized at the edge and stored in isolated vaults. This dramatically reduces compliance scope for the main system.

The Double-Charge Nightmare

Network timeout → client retries → server processes both → customer's bank account debited twice. Without idempotency this is not a hypothetical; it happens constantly. A retry after a 504 is indistinguishable from a new request without an idempotency key tying them together.
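
The failure mode above can be reproduced in a few lines. This is a minimal sketch, assuming a hypothetical in-memory `FakeServer` that applies the charge and then loses its first response; the class, its fields, and `charge_with_retry` are illustrative names, not Stripe's API. The client generates one UUID before the first attempt and reuses it on every retry.

```python
import uuid


class FakeServer:
    """Hypothetical stand-in for a payment API; not Stripe's real interface."""

    def __init__(self):
        self.charges = []      # every charge actually applied
        self.responses = {}    # idempotency key -> stored response
        self.drop_next = True  # simulate one lost response (e.g. a 504)

    def charge(self, amount, idempotency_key=None):
        if idempotency_key is not None and idempotency_key in self.responses:
            response = self.responses[idempotency_key]  # replay, no new charge
        else:
            self.charges.append(amount)
            response = {"charge_id": f"ch_{len(self.charges)}", "amount": amount}
            if idempotency_key is not None:
                self.responses[idempotency_key] = response
        if self.drop_next:
            self.drop_next = False
            raise TimeoutError("response lost after the charge was applied")
        return response


def charge_with_retry(server, amount, use_key, max_attempts=3):
    # The key is generated once, before the first attempt, and reused on retries.
    key = str(uuid.uuid4()) if use_key else None
    for _ in range(max_attempts):
        try:
            return server.charge(amount, idempotency_key=key)
        except TimeoutError:
            continue
    raise RuntimeError("all attempts failed")


unsafe = FakeServer()
charge_with_retry(unsafe, 5000, use_key=False)
safe = FakeServer()
charge_with_retry(safe, 5000, use_key=True)
print(len(unsafe.charges), len(safe.charges))  # prints: 2 1
```

Without a key the retry is applied as a second charge; with a key the server replays the stored response and the customer is charged once.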

Payment Flow: Step Through

The steps below walk through a complete Stripe charge. Each step shows what happens on Stripe's infrastructure, including timing.

Charge Pipeline

POST /v1/charges: from merchant request to webhook fired

1
Merchant → POST /v1/charges with Idempotency-Key
Client sends amount, currency, customer_id, and a unique Idempotency-Key header (e.g., a UUID generated client-side). Stripe's API gateway validates authentication, rate limits, and parses the request body. The idempotency key is the entire safety net for retries.
Latency: 0ms (request arrives)
2
Idempotency Key Check: Redis Lookup
Stripe checks Redis for key idem:{idempotency_key}. If found: return the exact cached response (same charge_id, same amount, same status), so the customer is not charged again. If not found: acquire a Redis NX lock on the key to prevent concurrent duplicate processing from racing servers.
Latency: ~1-2ms (Redis round trip)
3
Card Tokenization: PAN Never Leaves Vault
Customer's tok_xxx token is resolved in the PCI-isolated vault to retrieve the raw card number (PAN). The vault is a separate network segment, so the main Stripe API servers never see the raw PAN. Only the vault communicates with card networks. This limits PCI compliance scope dramatically.
Latency: ~5-15ms (vault RPC)
4
Card Network Authorization: Visa/Mastercard/Issuing Bank
Stripe submits an authorization request to the card network (Visa or Mastercard), which routes to the customer's issuing bank. The bank checks: available credit, fraud signals, card status. Response: approved with auth code, or declined with reason code. This is synchronous: the merchant waits for this.
Latency: 300-2000ms (bank round trip, P99)
5
Ledger Write: Double-Entry, Atomic Transaction
Two rows inserted in a single PostgreSQL transaction: DEBIT customer_account (reduces balance) and CREDIT merchant_account (increases receivable). If either insert fails, both roll back. Invariant: SUM of all ledger entries = 0 always. This is how Stripe knows no money has vanished.
Latency: ~5-10ms (Postgres serializable txn)
6
Webhook Fired: Async with Exponential Backoff Retry
Stripe publishes a charge.succeeded event to Kafka. A webhook worker picks it up, signs the payload with HMAC-SHA256, and POSTs to the merchant's registered URL. If the merchant returns non-2xx or the request times out, the worker retries with exponential backoff: 5s, 30s, 5min, 30min, 2h, 5h, 10h, 24h. If every attempt fails, the event is marked permanently failed; the retry window is capped at 72h.
Latency: async, typically <1s to first attempt
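
Step 3 of the pipeline (the vault boundary) can be sketched as follows. This is a toy model under stated assumptions: `CardVault` and `ApiServer` are hypothetical classes, the token format is invented for illustration, and the "authorization" approves everything instead of calling a card network. The point it illustrates is only that the API tier passes tokens around and gets back non-sensitive fields.

```python
import secrets


class CardVault:
    """Hypothetical PCI-isolated vault: the only component that holds PANs."""

    def __init__(self):
        self._pan_by_token = {}

    def tokenize(self, pan: str) -> str:
        token = "tok_" + secrets.token_hex(12)  # random, carries no card data
        self._pan_by_token[token] = pan
        return token

    def authorize(self, token: str, amount_cents: int) -> dict:
        pan = self._pan_by_token[token]  # PAN is resolved inside the vault only
        # A real vault would call the card network here; this sketch approves
        # everything and returns only non-sensitive fields.
        return {"approved": True, "last4": pan[-4:], "amount": amount_cents}


class ApiServer:
    """Main API tier: handles tokens, never raw card numbers."""

    def __init__(self, vault: CardVault):
        self.vault = vault

    def create_charge(self, token: str, amount_cents: int) -> dict:
        return self.vault.authorize(token, amount_cents)


vault = CardVault()
token = vault.tokenize("4242424242424242")  # done at the edge, not by the API tier
result = ApiServer(vault).create_charge(token, 5000)
print(token.startswith("tok_"), result["last4"], "4242424242424242" in str(result))
# prints: True 4242 False
```

In a real deployment the two classes would live in separate network segments and talk over an RPC boundary, which is what keeps the API tier out of PCI scope.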

The Idempotency Key Simulator

With an idempotency key, the same charge can be fired multiple times safely: Stripe's cache prevents double-charging even under concurrent retries.

Idempotency Key Demo

Same key = same response. New key = new charge.

idempotency.py
# Idempotency key implementation (sketch)
import json

import redis.asyncio as aioredis

redis = aioredis.Redis()  # assumes a reachable Redis; configure host/port as needed

IDEMPOTENCY_TTL_SECONDS = 86400  # 24 hours


async def create_charge(amount, currency, customer_id, idempotency_key):
    # Fast path: a previous attempt with this key already completed
    cached = await redis.get(f"idem:{idempotency_key}")
    if cached:
        return json.loads(cached)  # Return exact same response as before

    # Acquire a distributed lock so concurrent retries are serialized
    async with redis.lock(f"lock:{idempotency_key}", timeout=30):
        # Double-check after acquiring the lock: another worker may have finished
        cached = await redis.get(f"idem:{idempotency_key}")
        if cached:
            return json.loads(cached)

        # process_payment is the charge pipeline (vault, card network, ledger),
        # defined elsewhere
        result = await process_payment(amount, currency, customer_id)

        # Store result with 24h TTL
        await redis.setex(
            f"idem:{idempotency_key}",
            IDEMPOTENCY_TTL_SECONDS,
            json.dumps(result),
        )
        return result
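
To see the check, lock, re-check pattern hold up under concurrent retries without a Redis server, the following sketch swaps Redis for a hypothetical `InMemoryStore` (a dict plus one `asyncio.Lock` per key) and uses a fake `process_payment` that just sleeps. These stand-ins are assumptions for illustration; only the control flow mirrors the listing above.

```python
import asyncio
import json


class InMemoryStore:
    """Hypothetical stand-in for Redis: a dict plus one asyncio.Lock per key."""

    def __init__(self):
        self.data = {}
        self.locks = {}

    def lock(self, name):
        return self.locks.setdefault(name, asyncio.Lock())


store = InMemoryStore()
payments_processed = 0


async def process_payment(amount, currency, customer_id):
    global payments_processed
    await asyncio.sleep(0.05)  # pretend to wait on the card network
    payments_processed += 1
    return {"charge_id": f"ch_{payments_processed}", "amount": amount, "status": "succeeded"}


async def create_charge(amount, currency, customer_id, idempotency_key):
    cached = store.data.get(f"idem:{idempotency_key}")
    if cached:
        return json.loads(cached)
    async with store.lock(f"lock:{idempotency_key}"):
        cached = store.data.get(f"idem:{idempotency_key}")  # re-check under the lock
        if cached:
            return json.loads(cached)
        result = await process_payment(amount, currency, customer_id)
        store.data[f"idem:{idempotency_key}"] = json.dumps(result)
        return result


async def main():
    # Five concurrent retries with the same key, then one request with a new key.
    same_key = [create_charge(5000, "usd", "cus_1", "key-A") for _ in range(5)]
    retries = await asyncio.gather(*same_key)
    fresh = await create_charge(5000, "usd", "cus_1", "key-B")
    return retries, fresh


retries, fresh = asyncio.run(main())
print(payments_processed, {r["charge_id"] for r in retries}, fresh["charge_id"])
# prints: 2 {'ch_1'} ch_2
```

Five racing requests with the same key produce one payment and five identical responses; the new key produces a second, separate charge.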

Every Dollar Has Two Entries

Stripe's ledger is modeled on 700-year-old double-entry bookkeeping. Every payment creates at least two ledger entries. The sum of all entries is always zero; any deviation indicates a bug.

Account | Debit | Credit | Notes
Customer payment received | $100.00 | | Gross amount customer paid
Stripe processing fee | | | 2.9% of $100.00 + $0.30 = $3.20, withheld from the gross
Merchant net receivable | | $96.80 | Net settled to merchant
Stripe revenue | | $3.20 | Stripe's fee income
Balance check | $100.00 | $100.00 | $100.00 - $96.80 - $3.20 = 0, Sum = $0 ✓

Why Double-Entry?

Double-entry means every debit has a matching credit. SUM of all ledger entries = 0 always. This is how Stripe knows no money has disappeared: checking that SELECT SUM(amount) FROM ledger returns 0 is a simple but powerful consistency check. Any non-zero result indicates a bug in the payment processing code.
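
The invariant is easy to demonstrate end to end. The sketch below is an assumption-laden miniature: it uses Python's built-in `sqlite3` instead of PostgreSQL so it runs standalone, stores amounts as signed integer cents (debits negative, credits positive), and the table, account names, and `record_charge` helper are hypothetical.

```python
import sqlite3


def fee_cents(amount_cents: int) -> int:
    # 2.9% + $0.30 in integer cents, rounding the percentage half up
    return (amount_cents * 29 + 500) // 1000 + 30


def record_charge(conn: sqlite3.Connection, charge_id: str, amount_cents: int) -> None:
    fee = fee_cents(amount_cents)
    entries = [
        (charge_id, "customer_funds", -amount_cents),           # debit
        (charge_id, "merchant_receivable", amount_cents - fee),  # credit
        (charge_id, "stripe_revenue", fee),                      # credit
    ]
    assert sum(amount for _, _, amount in entries) == 0
    with conn:  # one transaction: commits on success, rolls back on any exception
        conn.executemany(
            "INSERT INTO ledger (charge_id, account, amount_cents) VALUES (?, ?, ?)",
            entries,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ledger (charge_id TEXT, account TEXT, amount_cents INTEGER NOT NULL)"
)
record_charge(conn, "ch_1", 10000)
total = conn.execute("SELECT SUM(amount_cents) FROM ledger").fetchone()[0]
merchant = conn.execute(
    "SELECT SUM(amount_cents) FROM ledger WHERE account = 'merchant_receivable'"
).fetchone()[0]
print(fee_cents(10000), merchant, total)  # prints: 320 9680 0
```

Integer cents avoid floating-point drift, and keeping all entries for a charge in one transaction means the sum can never be observed in a non-zero state.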

Guaranteed Delivery with Backoff

Merchants depend on webhooks for order fulfillment. Stripe retries for 72 hours to survive merchant outages, deployments, and transient failures.
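
Because retries mean the same event can arrive more than once, the merchant's handler has to be idempotent. A minimal sketch of the receiving side, assuming a hypothetical event shape with `id`, `type`, and `data` fields and an in-memory set where a real handler would use a durable store with a unique constraint:

```python
processed_event_ids = set()  # stand-in for a durable store with a unique constraint
orders_fulfilled = []


def handle_webhook(event: dict) -> int:
    """Return the HTTP status the merchant endpoint would send back."""
    event_id = event["id"]
    if event_id in processed_event_ids:
        return 200  # duplicate delivery: acknowledge, do nothing
    if event["type"] == "charge.succeeded":
        orders_fulfilled.append(event["data"]["charge_id"])
    processed_event_ids.add(event_id)
    return 200


event = {"id": "evt_1", "type": "charge.succeeded", "data": {"charge_id": "ch_1"}}
statuses = [handle_webhook(event) for _ in range(3)]  # same event delivered three times
print(statuses, orders_fulfilled)  # prints: [200, 200, 200] ['ch_1']
```

The duplicate deliveries are still acknowledged with a 2xx so the sender stops retrying, but the order is fulfilled only once.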

Webhook Retry Schedule

When the merchant's server is down, Stripe retries with exponential backoff. Each delay is measured from the previous failed attempt.

Attempt 1: immediate, first delivery attempt
+5s: 5 seconds after failure
+30s: 30 seconds after failure
+5min: 5 minutes after failure
+30min: 30 minutes after failure
+2h: 2 hours after failure
+5h: 5 hours after failure
+10h: 10 hours after failure
+24h: 24 hours after failure, final attempt within the 72h retry window
webhook_delivery.py
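# Webhook delivery with exponential backoff
import asyncio
import logging

import httpx

logger = logging.getLogger(__name__)

RETRY_DELAYS = [5, 30, 300, 1800, 7200, 18000, 36000, 86400]  # seconds


async def deliver_webhook(event_id: str, merchant_url: str, payload: dict):
    # compute_signature, mark_delivered, and mark_failed are defined elsewhere.
    # One immediate attempt, then one retry after each delay in RETRY_DELAYS.
    async with httpx.AsyncClient(timeout=30) as client:
        for attempt, delay in enumerate([0] + RETRY_DELAYS):
            if delay:
                await asyncio.sleep(delay)
            try:
                resp = await client.post(
                    merchant_url,
                    json=payload,
                    headers={"Stripe-Signature": compute_signature(payload)},
                )
                if 200 <= resp.status_code < 300:
                    await mark_delivered(event_id, attempt)
                    return
                logger.warning("Webhook attempt %d got HTTP %d", attempt, resp.status_code)
            except httpx.HTTPError as e:
                logger.warning("Webhook attempt %d failed: %s", attempt, e)

    await mark_failed(event_id)  # Alert merchant via dashboard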
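
The listing above leaves `compute_signature` undefined. One way to fill it in is an HMAC-SHA256 over a timestamp plus the serialized payload, verified with a constant-time comparison. The sketch below is an assumption, not the lesson's or Stripe's actual implementation: the `t=...,v1=...` header layout, the extra `secret` and `timestamp` parameters, the 300-second tolerance, and the `whsec_example` secret are all illustrative. Production code would sign the raw request bytes exactly as sent instead of re-serializing JSON.

```python
import hashlib
import hmac
import json
import time


def compute_signature(payload: dict, secret: bytes, timestamp: int) -> str:
    # Sign "<timestamp>.<canonical JSON>" so a captured payload cannot be replayed later
    body = json.dumps(payload, separators=(",", ":"), sort_keys=True)
    signed = f"{timestamp}.{body}".encode()
    digest = hmac.new(secret, signed, hashlib.sha256).hexdigest()
    return f"t={timestamp},v1={digest}"


def verify_signature(payload: dict, header: str, secret: bytes,
                     tolerance_s: int = 300, now=None) -> bool:
    parts = dict(item.split("=", 1) for item in header.split(","))
    timestamp = int(parts["t"])
    now = time.time() if now is None else now
    if abs(now - timestamp) > tolerance_s:
        return False  # too old, or too far in the future
    expected = compute_signature(payload, secret, timestamp)
    return hmac.compare_digest(expected, header)  # constant-time comparison


secret = b"whsec_example"
payload = {"id": "evt_1", "type": "charge.succeeded"}
header = compute_signature(payload, secret, timestamp=1_700_000_000)

ok = verify_signature(payload, header, secret, now=1_700_000_010)
tampered = verify_signature({**payload, "type": "charge.failed"}, header, secret, now=1_700_000_010)
stale = verify_signature(payload, header, secret, now=1_700_100_000)
print(ok, tampered, stale)  # prints: True False False
```

The merchant rejects a modified payload because the digest no longer matches, and rejects an old but otherwise valid delivery because its timestamp falls outside the tolerance.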

Why Each Technology

Primary Database
PostgreSQL
Serializable isolation level prevents phantom reads in financial transactions. ACID guarantees rule out partial commits: either both ledger entries commit or neither does.
Not MongoDB: it has supported multi-document transactions since 4.0, but a relational schema with constraints and SQL aggregates fits ledger invariants more directly. Not MySQL: InnoDB implements SERIALIZABLE with locking reads, which tends to cost more under contention than PostgreSQL's serializable snapshot isolation.
Idempotency Cache
Redis
Sub-millisecond in-memory key lookup (about 1-2ms including the network round trip) + SET NX (atomic acquire) for distributed locking. 24h TTL automatically expires old idempotency keys. Memory-resident = no disk I/O on the hot path.
Not Postgres for idempotency: a row-level lock on a hot idempotency table creates a bottleneck. Redis SET NX is purpose-built for this pattern.
Webhook Queue
Kafka
Durable event log survives webhook worker crashes. Replay from offset to redeliver if workers fail. Topic partitioning allows parallel delivery. At-least-once delivery semantics match webhook requirements.
Not SQS: it can't replay consumed messages. Not HTTP direct from the API server: that coupling causes failures if the webhook worker is slow.
Secret Storage
HashiCorp Vault
Dynamic secrets rotate automatically. Audit log for all key accesses. Encryption-as-a-service: apps never handle raw keys. Essential for PCI DSS compliance documentation.
Not env variables: no audit trail, no rotation, easily leaked. Not AWS KMS alone: Vault adds dynamic secrets and a richer access control layer.

Quiz: 5 Questions

1. A merchant sends POST /charges twice with the same idempotency-key due to a timeout. Stripe should:
2. Double-entry accounting in Stripe's ledger means:
3. Stripe's webhook retry schedule (exponential backoff) serves to:
4. PCI DSS compliance requires that raw card numbers (PANs):
5. A payment is 'idempotent' if: