Stream Processing & Event Sourcing

Event log vs state, stream processing patterns, event sourcing benefits, CQRS, and read model rebuilding.

4 Exercises
12 Concept Checks
~90 min total
System Design
Exercise 1 🟡 Easy ⏱ 15 min
Event Sourcing Fundamentals
A banking app stores current account balance ($1,247.53). Traditional approach: UPDATE accounts SET balance=$1,247.53. Event-sourced approach: store the history — Deposited($500), Withdrew($200), Deposited($947.53). Current balance = replay all events. The balance is derived, not stored directly — history is the source of truth.
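"The balance is derived" is just a fold over the event history. A minimal Python sketch of replay (the event classes here are hypothetical, for illustration only, not a real event-sourcing framework):

```python
from dataclasses import dataclass

# Hypothetical event types for this example; real systems would also
# carry metadata (event id, timestamp, account id, schema version).
@dataclass
class Deposited:
    amount: float

@dataclass
class Withdrew:
    amount: float

def replay(events):
    """Derive the current balance by folding over the event history."""
    balance = 0.0
    for event in events:
        if isinstance(event, Deposited):
            balance += event.amount
        elif isinstance(event, Withdrew):
            balance -= event.amount
    return balance

history = [Deposited(500), Withdrew(200), Deposited(947.53)]
print(round(replay(history), 2))  # 1247.53
```

The same fold, started from an empty history, reproduces any past state: replay a prefix of the log and you get the balance as of that point in time.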
Event Log as Source of Truth
AccountOpened{$0} → Deposited{$500} → Withdrew{$200} → Deposited{$947.53}
→ replay →
Current state: balance = $1,247.53
Any past state: replayable anytime
Concept Check — 3 questions
Q1. The main advantage of event sourcing over storing only current state?
A. Faster reads — event sourcing requires less computation to get current state
B. Complete audit trail — you can replay events to reconstruct any past state and understand exactly what happened and when
C. Less storage — events are smaller than storing the full current state
D. Simpler queries — you can run ad-hoc SQL queries against the event log
Q2. In event sourcing, how do you get the current balance without replaying all events from the beginning?
A. Store both events and current state in parallel, always keeping both in sync
B. Use a cache layer that expires every hour and forces full replay
C. Snapshots — periodically capture current state; at replay time, start from the last snapshot and apply only subsequent events
D. Build an index of events sorted by their balance impact
Q3. Event sourcing is particularly suited for?
A. Any CRUD application that needs simple create/read/update/delete operations
B. Financial, audit-required, or compliance domains where complete history and the ability to reconstruct any past state are required
C. Social media applications where speed of reads is the only priority
D. Read-heavy analytics where writes are infrequent
Event sourcing stores what happened, not what is. The event log is append-only and immutable — you can never delete or modify a past event. To get current state, replay all events through the aggregate. Performance optimization: snapshots capture a point-in-time state; at startup, load the latest snapshot and replay only subsequent events. Snapshot frequency is tunable: every 100 events, every hour, or on demand. This is critical for accounts with millions of transactions.
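The snapshot optimization can be sketched as follows, assuming a snapshot is just (balance, version) and events are simple tuples (the API names here are assumptions, not a real store):

```python
# Sketch of snapshot-based replay. A snapshot captures (balance, version),
# where version is the count of events already folded in; replay then
# applies only events after that version.
SNAPSHOT_EVERY = 100  # tunable: snapshot every N events, hourly, or on demand

class AccountProjection:
    def __init__(self):
        self.balance = 0.0
        self.version = 0  # number of events applied so far

    def apply(self, event):
        kind, amount = event  # e.g. ("Deposited", 500.0)
        if kind == "Deposited":
            self.balance += amount
        elif kind == "Withdrew":
            self.balance -= amount
        self.version += 1

def load(snapshot, events_after_snapshot):
    """Start from the latest snapshot, then apply only subsequent events."""
    proj = AccountProjection()
    if snapshot is not None:
        proj.balance, proj.version = snapshot
    for event in events_after_snapshot:
        proj.apply(event)
    return proj

# Snapshot taken at version 2 with balance 300.0; only later events replay.
proj = load((300.0, 2), [("Deposited", 947.53)])
print(round(proj.balance, 2), proj.version)  # 1247.53 3
```

For an account with millions of transactions, this turns replay from "fold millions of events" into "load one snapshot, fold at most SNAPSHOT_EVERY events."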
Open Design Challenge
1
Design the event schema for a bank account. What fields does each event need? How do you version events when the schema changes in 2 years?
2
Design the snapshot strategy for a high-transaction account (10,000 transactions/day). At what frequency do you snapshot? What does the snapshot contain?
3
A regulatory audit requires the account balance at every second of the last 7 years. How does event sourcing enable this? What's the performance cost?
Exercise 2 🟡 Easy ⏱ 20 min
CQRS Pattern
An order management system has a complex domain model (orders, line items, discounts, promotions). Reads need a denormalized view (order summary with product names, total, shipping status). Writes use rich domain objects with business logic. CQRS separates the write model (commands) from the read model (queries) — each optimized independently.
Command Query Responsibility Segregation
Command: CreateOrder → OrderAggregate (domain logic) → Events: OrderCreated, ItemAdded
→ project →
Query: GetOrderSummary ← read model (denormalized) ← order_summary table (fast)
Concept Check — 3 questions
Q1. CQRS: Commands vs Queries — what is the fundamental difference?
A. Commands are fast; queries are slow — CQRS speeds up queries by separating them
B. Commands mutate state (create/update/delete); queries read state without any side effects
C. Commands use SQL databases; queries use NoSQL databases
D. Commands are synchronous; queries are always asynchronous
Q2. The query side of CQRS is optimized by?
A. Adding more database indexes to the write model's tables
B. Using faster CPUs and more RAM on the database server
C. Pre-computed materialized read models — denormalized, aggregated views stored specifically for each query shape
D. Caching all database queries in Redis for 5 minutes
Q3. A read model in CQRS becomes stale when?
A. An event is published but the projection hasn't processed it yet — this is an inherent eventual consistency lag between write and read models
B. The read model's cache TTL expires and needs refreshing
C. The application server restarts and loses in-memory state
D. The read model is never stale — it is always current
CQRS separates the concern of writing correct state from reading state efficiently. The write side uses a rich domain model with business rules. The read side uses flat, denormalized projections tailored to exact query shapes. The projection (event handler) listens to events from the write side and updates the read model asynchronously — this creates eventual consistency. For most UIs, a few milliseconds of lag is invisible to users. CQRS is most valuable when read and write models have fundamentally different shapes.
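A projection is ultimately just an event handler that maintains a denormalized row per query shape. A minimal in-memory sketch (event shapes and field names are assumptions for illustration; a real projection would write to a database table like order_summary):

```python
# Minimal CQRS projection: events from the write side update a
# denormalized read model keyed by order_id.
read_model = {}  # order_id -> denormalized order-summary row

def project(event):
    """Apply one event from the write side to the read model."""
    kind = event["type"]
    if kind == "OrderCreated":
        read_model[event["order_id"]] = {
            "customer": event["customer"],
            "items": 0,
            "total": 0.0,
            "status": "created",
        }
    elif kind == "ItemAdded":
        row = read_model[event["order_id"]]
        row["items"] += 1
        row["total"] += event["price"]
    elif kind == "OrderShipped":
        read_model[event["order_id"]]["status"] = "shipped"

for e in [
    {"type": "OrderCreated", "order_id": "o1", "customer": "ada"},
    {"type": "ItemAdded", "order_id": "o1", "price": 19.99},
    {"type": "ItemAdded", "order_id": "o1", "price": 5.00},
]:
    project(e)

print(round(read_model["o1"]["total"], 2))  # 24.99
```

Note that the query side never recomputes the total: GetOrderSummary is a single key lookup, because the projection pre-computed the aggregate as events arrived.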
Open Design Challenge
1
Design the read model schema for an order summary page. What fields does it pre-compute? How is it updated when an order item is added?
2
The projection consumer crashes while processing an OrderShipped event. The read model shows "Processing" but the order is actually shipped. How do you recover?
3
Design a CQRS system where the write model uses PostgreSQL (ACID) and the read model uses Elasticsearch (full-text search). How do you keep them synchronized?
Exercise 3 🔴 Medium ⏱ 25 min
Stream Processing Windows
A fraud detection system processes transaction events. Rule: "flag user if they make more than 10 transactions in any 60-second window." A naive approach counts all transactions since account creation — not a true sliding window. Tumbling windows miss bursts that span two buckets. Sliding windows catch every burst regardless of timing.
Window Types Comparison
Tumbling: [0-60s] [60-120s] [120-180s]
Burst at 55-65s: split across 2 windows — missed!
Sliding (60s, 1s step): [0-60] [1-61] [2-62]...
Burst at 55-65s: caught in window [5-65] ✓
Session: gaps define boundaries
Concept Check — 3 questions
Q1. A tumbling window for 60 seconds means?
A. Overlapping windows that shift by 1 second, each spanning 60 seconds
B. Fixed, non-overlapping 60-second buckets: [0-60s], [60-120s], [120-180s] — a burst spanning two buckets may go undetected
C. Windows defined dynamically by gaps in user activity (session-based)
D. An infinite accumulation window that never resets
Q2. A sliding window with 60s size and 1s slide counts transactions in the preceding 60s for each second. This is more accurate for fraud detection because?
A. It is simpler to implement than tumbling windows
B. It uses significantly less memory than tumbling windows
C. It detects bursts at any point in time — there is no "safe boundary" that a fraudster can exploit to split a burst across two windows
D. It requires less CPU processing power than tumbling windows
Q3. Apache Flink vs Spark Streaming for fraud detection at 1M events/sec?
A. Spark — it has better streaming performance than Flink at high throughput
B. Flink — native streaming engine with event-time processing and sub-second latency; Spark Streaming uses micro-batches with higher latency
C. They perform identically for real-time fraud detection workloads
D. Neither — use Redis sorted sets only for any real-time window counting
Window types: Tumbling = non-overlapping fixed buckets. Sliding = overlapping windows that shift by step size. Session = activity-based windows bounded by inactivity gaps. For fraud: sliding windows are ideal but memory-intensive (maintain state for each overlapping window). Flink advantage: it processes events as they arrive (true streaming) with millisecond latency and native event-time semantics for out-of-order events. Spark Streaming uses micro-batches (collect events → process batch → repeat) with 0.5-2s minimum latency — acceptable for many use cases but not sub-second fraud detection.
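The sliding-window rule can be sketched with a per-user deque of timestamps: append the new event, evict anything older than 60 seconds, and check the count. This is an in-memory illustration of the technique; in a real deployment this state would live in the stream processor (e.g. Flink keyed state), not in application memory:

```python
# Per-user sliding-window counter for the rule
# "flag user if they make more than 10 transactions in any 60-second window".
from collections import defaultdict, deque

WINDOW_SECONDS = 60
THRESHOLD = 10

windows = defaultdict(deque)  # user_id -> timestamps inside the current window

def on_transaction(user_id, ts):
    """Record a transaction; return True if it pushes the user over the limit."""
    q = windows[user_id]
    q.append(ts)
    # Evict timestamps that have fallen out of the trailing 60s window.
    while q and ts - q[0] > WINDOW_SECONDS:
        q.popleft()
    return len(q) > THRESHOLD

# A burst of 11 transactions between t=55s and t=65s trips the rule,
# even though a tumbling [0-60s]/[60-120s] bucketing would split it.
flags = [on_transaction("u1", 55 + i) for i in range(11)]
print(flags[-1])  # True
```

This is also why sliding windows are memory-intensive: the state per user grows with the event rate inside the window, whereas a tumbling window needs only one counter per bucket.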
Open Design Challenge
1
Design a Flink job for fraud detection: "flag if user makes >10 transactions in 60 seconds OR total amount >$5000 in 5 minutes." What window types do you use for each rule?
2
Events arrive out-of-order in Kafka (network delays mean event at t=5s arrives after event at t=7s). How does Flink's event-time processing handle this? What is the watermark?
3
Design the alert pipeline: when fraud is detected, the system must block the transaction in <100ms. How do you connect the Flink job to the payment authorization service?
Exercise 4 🔥 Hard ⏱ 30 min
Event Replay and Read Model Rebuilding
An e-commerce platform has used event sourcing for 2 years. New feature needed: total lifetime spend per customer. The read model doesn't exist yet — they must replay 500 million historical events to build it. The system must continue serving live traffic during the rebuild, which could take hours.
Read Model Rebuild During Live Traffic
Event log (500M events) → replay pipeline → New projection (lifetime_spend per customer) → catches up → Switch to live (start serving queries)
Concept Check — 3 questions
Q1. Building a new read model from historical events requires?
A. A full database migration with ALTER TABLE and data backfill statements
B. Replaying the full event log from the beginning through the new projection — typically done offline, then switching to consuming live events
C. Rewriting all historical events to match the new read model's schema
D. Restoring from the latest database backup and transforming it
Q2. During read model rebuild, how do you serve the new "lifetime spend" feature?
A. Block all reads on the lifetime spend feature until the rebuild completes
B. Return partial (incorrect) data to users while the replay is in progress
C. Return a "building" status or null response while replay is in progress — communicate the expected completion time to clients
D. Use the old read model as a placeholder, even though it lacks lifetime spend data
Q3. Event schema evolution: the OrderCreated event gained a new tax_amount field 1 year ago. Events before that have no tax_amount. During replay, how do you handle old events?
A. Reject old events that don't have the required tax_amount field
B. Upcasting — transform old events during replay to add default values for new fields (tax_amount=0 for pre-tax-era events)
C. Delete all old events and ask customers to re-submit their orders
D. Run a manual schema migration to backfill tax_amount on all historical events
Read model rebuild process: 1) Start the new projection consumer at offset 0 of the event log. 2) Process all historical events — this may take hours for 500M events. 3) When the consumer reaches "now" (current event log position), switch to live mode. 4) Only then expose the new read model to queries. Upcasting: event schema versioning requires an "upcaster" — a transformation layer that upgrades old event formats to the current schema before they reach the projection. Keep backward compatibility: add fields with defaults, never remove or rename fields.
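An upcaster chain can be sketched as a map from version number to a pure transformation, applied repeatedly until the event reaches the latest schema. The version numbers and default values below follow the exercise; the function names and event dict shape are assumptions for illustration:

```python
# Versioned upcaster chain for OrderCreated:
# v1 -> v2 adds tax_amount, v2 -> v3 adds shipping_cost.

def v1_to_v2(event):
    event = dict(event, version=2)
    event.setdefault("tax_amount", 0.0)  # pre-tax-era events default to 0
    return event

def v2_to_v3(event):
    event = dict(event, version=3)
    event.setdefault("shipping_cost", 0.0)
    return event

UPCASTERS = {1: v1_to_v2, 2: v2_to_v3}  # from-version -> transformation

def upcast(event):
    """Upgrade an event step by step until it reaches the latest schema."""
    while event["version"] in UPCASTERS:
        event = UPCASTERS[event["version"]](event)
    return event

old = {"version": 1, "order_id": "o1", "total": 50.0}
up = upcast(old)
print(up["version"], up["tax_amount"], up["shipping_cost"])  # 3 0.0 0.0
```

Each upcaster only needs to know about the next version, so adding a v4 means writing one new v3_to_v4 step; the projection itself only ever sees the latest schema.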
Open Design Challenge
1
Design the replay pipeline for 500M events. How do you parallelize it across multiple workers? How do you ensure they don't process the same event twice?
2
The replay takes 6 hours. During this time, new OrderCreated events arrive. How do you ensure the new read model includes events from the replay AND events that arrived during rebuild?
3
Design a versioned upcaster system for the OrderCreated event with 3 schema versions (v1: no tax, v2: tax added, v3: tax + shipping_cost). How does the upcaster chain work?