Message Queues & Apache Kafka
Queue vs Kafka, partitions, consumer groups, offset management, and delivery guarantees.
4 Exercises
12 Concept Checks
~95 min total
System Design
Message Queue vs Kafka
An e-commerce platform uses RabbitMQ for order processing. New requirement: the analytics team also needs every order event to update dashboards. Problem: RabbitMQ deletes messages after the order processor acknowledges them — the analytics service can't replay what's already gone. Kafka's immutable log solves this by letting multiple consumer groups read independently.
Queue vs Kafka Architecture
RabbitMQ — message deleted after ack
Producer
↓
Queue (message)
↓ consumed + deleted
Order Processor only
Kafka — immutable log, multiple consumers
Producer
↓
Topic (immutable log)
↓ independent offsets
Group A: Order Processor
Group B: Analytics
Concept Check — 3 questions
Q1. Kafka topic vs RabbitMQ queue: the key architectural difference?
A. Kafka is inherently faster than RabbitMQ for all workloads
B. Kafka persists messages as an immutable log — multiple consumer groups read independently at their own offsets; RabbitMQ deletes messages after acknowledgement
C. RabbitMQ supports more network protocols than Kafka
D. Kafka requires a SQL schema while RabbitMQ accepts any message format
Q2. In Kafka, adding a second consumer group for analytics?
A. Has zero impact on the existing order processing consumer — both groups maintain independent offsets on the same topic
B. Slows down the existing order processing consumer
C. Requires duplicating the topic to create a separate copy
D. Conflicts with the existing consumer's offset commits
Q3. When is RabbitMQ the better choice over Kafka?
A. You need high throughput at millions of events per second
B. You need event replay and long-term message retention
C. You need complex routing logic, per-message TTL, or simple task queue semantics with ack/nack-based routing
D. You need long-term data retention for compliance
Kafka's log is immutable — messages are appended and retained for a configurable period (days, weeks, forever). Each consumer group tracks its own offset independently. Adding Group B for analytics has zero effect on Group A's offset or throughput. RabbitMQ excels at: topic exchanges with routing keys, per-message TTL (messages expire if unprocessed), dead-letter queues with nack routing, and simple worker queue patterns where each message has exactly one consumer.
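The log-vs-queue difference above can be sketched with a minimal in-memory model. All names here (Topic, poll) are illustrative, not a real Kafka client API — real Kafka stores each group's offset in the internal __consumer_offsets topic.

```python
# Minimal sketch of an append-only log with per-group offsets.
# Illustrative model only — not the real Kafka API.

class Topic:
    def __init__(self):
        self.log = []            # immutable, append-only record list
        self.offsets = {}        # group id -> next offset to read

    def produce(self, record):
        self.log.append(record)  # records are never deleted on consume

    def poll(self, group):
        """Return the next record for this group and advance its offset."""
        pos = self.offsets.get(group, 0)
        if pos >= len(self.log):
            return None          # nothing new for this group
        self.offsets[group] = pos + 1
        return self.log[pos]

orders = Topic()
for event in ["order-1", "order-2", "order-3"]:
    orders.produce(event)

# Group A (order processor) reads everything first...
while orders.poll("order-processor") is not None:
    pass

# ...yet Group B (analytics) still sees the full history, because
# consuming never deletes records from the log.
replayed = []
while (rec := orders.poll("analytics")) is not None:
    replayed.append(rec)
print(replayed)  # ['order-1', 'order-2', 'order-3']
```

A RabbitMQ-style queue would have deleted each record on the first group's acknowledgement, leaving nothing for the second group to replay.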
Open Design Challenge
Design the Kafka topic structure for an e-commerce platform. What topics would you create? How many partitions for each? Justify based on expected throughput.
The analytics team wants to replay the last 7 days of order events to rebuild their dashboard after a bug. How does Kafka enable this? What is the consumer group offset reset command?
Design a scenario where RabbitMQ is the better choice over Kafka. What routing rules would you configure?
Kafka Partitions and Ordering
A payment processing system sends payment events to Kafka. Business requirement: all events for the same payment_id must be processed in order (create → authorize → capture → refund). With 6 partitions and round-robin assignment, events for the same payment go to different partitions — ordering is broken and "refund before capture" is possible.
Round-Robin vs Key-Based Partitioning
❌ ROUND-ROBIN (ordering broken)
payment#123 create → P0
payment#123 authorize → P3
payment#123 capture → P1
✅ KEYED (same key = same partition)
payment#123 create → P2
payment#123 authorize → P2
payment#123 capture → P2 ✓ ordered
Concept Check — 3 questions
Q1. To guarantee ordering for a specific payment_id, Kafka messages should?
A. Use round-robin partitioning for even load distribution
B. Use payment_id as the message key — Kafka routes all messages with the same key to the same partition, guaranteeing order
C. Use a single partition for the entire topic
D. Use event timestamps to sort at the consumer
Q2. Kafka's ordering guarantee is?
A. Global ordering across all partitions in a topic
B. Ordering within the same consumer instance only
C. Within a partition only — messages across different partitions have no guaranteed ordering
D. Only when using Kafka transactions
Q3. Increasing Kafka partition count from 6 to 12 allows?
A. Storing larger individual messages in each partition
B. More parallelism — a consumer group can have up to 12 consumers processing in parallel, one per partition
C. Cross-partition ordering guarantees
D. Better compression ratios for messages
Kafka's partitioning formula:
partition = hash(key) % numPartitions. The same key always maps to the same partition (assuming the partition count doesn't change). Kafka guarantees ordering within a partition but NOT across partitions. The maximum parallelism of a consumer group equals the number of partitions — with more consumers than partitions, the extra consumers sit idle. Increasing the partition count is possible but irreversible, and it remaps existing keys, breaking key-based ordering for them.
Open Design Challenge
You have 6 partitions and 10 million unique payment IDs. How does Kafka distribute these 10M keys across 6 partitions? Is the distribution perfectly even?
A customer has 1000× more transactions than average users — their payment_id maps to P3, creating a hot partition. How do you fix this without breaking ordering?
Design the consumer-side logic to process payment events in order. What state machine does the consumer maintain?
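The partitioning formula above can be sketched in a few lines. Kafka's default partitioner actually hashes the key bytes with murmur2; zlib.crc32 stands in here so the example stays self-contained and deterministic.

```python
# Sketch of key-based partition assignment (hash(key) % numPartitions).
# zlib.crc32 stands in for Kafka's murmur2 hash — illustrative only.
import zlib
from collections import Counter

NUM_PARTITIONS = 6

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    return zlib.crc32(key.encode()) % num_partitions

# Every event for payment#123 lands on the same partition,
# so per-key ordering holds.
p = partition_for("payment#123")
assert all(partition_for("payment#123") == p for _ in range(100))

# Across many keys the load spreads out — roughly, not perfectly, evenly.
counts = Counter(partition_for(f"payment#{i}") for i in range(10_000))
print(dict(counts))
```

Note that changing num_partitions changes the result of the modulo for existing keys, which is why increasing the partition count breaks key-to-partition stability.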
Consumer Groups and Rebalancing
6 Kafka partitions, 4 consumers in a group — each consumer handles ~1.5 partitions. Consumer #3 crashes mid-processing. Kafka triggers a rebalance: all consumers pause for ~2 seconds while partitions are reassigned. During this window, no messages are processed — a 2-second processing gap every time any consumer fails or is deployed.
Consumer Group Rebalance
Consumer 1 (P0, P1)
Consumer 2 (P2, P3)
Consumer 3 (P4, P5)
Consumer 4 (idle)
→ C3 crashes →
ALL PAUSE
rebalance triggered
→
Consumer 1 (P0, P1)
Consumer 2 (P2, P3)
Consumer 4 (P4, P5)
Concept Check — 3 questions
Q1. During consumer group rebalance, what happens to message processing?
A. Processing continues at half the rate on the remaining consumers
B. All consumers in the group pause completely — no messages are processed during the rebalance period
C. Only the failed consumer's partitions pause; others continue
D. Messages are permanently lost during the rebalance
Q2. Cooperative (incremental) rebalancing vs eager rebalancing: what is the cooperative advantage?
A. Cooperative rebalancing completes faster than eager rebalancing
B. Cooperative rebalancing uses less memory per consumer
C. Only the partitions that need to move are revoked — healthy consumers keep processing their unchanged partitions during the reassignment
D. There is no practical difference between the two strategies
Q3. A consumer processes a message but crashes before committing the offset. What happens?
A. The message is permanently lost — Kafka only delivers it once
B. The message is redelivered to another consumer after the rebalance — Kafka considers a message processed only when its offset is committed
C. The message is moved to a dead-letter queue automatically
D. Kafka automatically commits the offset after a timeout
Eager rebalancing (the default before Kafka 2.4): ALL consumers stop-the-world, revoke all partitions, then redistribute. Cooperative rebalancing (Kafka 2.4+, incremental): only partitions that are moving are revoked — consumers keeping their existing partitions continue processing. To enable:
partition.assignment.strategy=CooperativeStickyAssignor. Offset commits: Kafka tracks consumer progress via committed offsets in an internal topic (__consumer_offsets). An uncommitted offset means the message will be redelivered — this is at-least-once delivery semantics.
Open Design Challenge
Design a consumer that processes payment events with at-least-once delivery. What must you do before committing the offset? Where do you place the commit call?
A rolling deployment restarts all 4 consumers one by one. With eager rebalancing, this causes 4 rebalances × 2 seconds = 8 seconds of downtime. How does cooperative rebalancing improve this?
Design a dead-letter queue (DLQ) strategy for Kafka. When a message fails processing 3 times, where does it go and how do you alert on it?
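The commit-after-processing rule behind at-least-once delivery can be simulated without a broker. This is a sketch with illustrative names: a list stands in for the external side effect, and an early return stands in for a crash between processing and committing.

```python
# Sketch of at-least-once consumption: commit the offset only AFTER the
# side effect succeeds. A crash between processing and committing means
# the record is redelivered — so duplicates are possible and processing
# must be idempotent. All names are illustrative, not a Kafka client API.

processed = []          # stands in for the external side effect (DB write)
committed_offset = -1   # last offset whose processing is durably recorded

def handle(record):
    processed.append(record)        # side effect happens first...

def consume(log, crash_before_commit_at=None):
    """Replay the log from just past the last committed offset."""
    global committed_offset
    for offset in range(committed_offset + 1, len(log)):
        handle(log[offset])
        if offset == crash_before_commit_at:
            return                  # crash: this offset is NOT committed
        committed_offset = offset   # ...commit only afterwards

log = ["evt-0", "evt-1", "evt-2"]
consume(log, crash_before_commit_at=1)  # crash after processing evt-1
consume(log)                            # restart: evt-1 is redelivered
print(processed)  # ['evt-0', 'evt-1', 'evt-1', 'evt-2'] — duplicate evt-1
```

Committing before processing would flip the failure mode to at-most-once: the crash would lose evt-1 instead of duplicating it.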
Exactly-Once Semantics
A payment processor reads from Kafka and writes to a database. Without exactly-once: message is processed, DB write succeeds, but the Kafka offset commit fails → message is redelivered → payment is processed twice (double charge). With idempotency keys: the DB write is safe on replay. With Kafka transactions: atomically commit the offset and the DB write together.
Exactly-Once with Idempotency
Read from Kafka
partition + offset
→
Process payment
→
DB write
idempotency_key = offset
→
Commit offset
atomically
→
Crash & replay safe
dup key = skip
Concept Check — 3 questions
Q1. Kafka exactly-once semantics (EOS) requires?
A. Synchronous replication to all Kafka brokers before acknowledging
B. Idempotent producers (enable.idempotence=true) combined with the transactional API to atomically commit offset and external write
C. Using a single partition only for the entire topic
D. Disabling consumer groups and using direct partition assignment
Q2. An idempotency key for payment processing should be?
A. A random UUID generated fresh for each processing attempt
B. The user's account ID combined with timestamp
C. A stable unique identifier — combining Kafka topic+partition+offset guarantees uniqueness and stability across retries
D. The wall-clock timestamp of when processing started
Q3. Kafka transactions (begin/commit/abort) guarantee atomicity across?
A. Kafka writes and PostgreSQL writes simultaneously in a distributed transaction
B. Multiple Kafka topic writes AND consumer offset commits atomically — either all succeed or all are rolled back
C. Writes within a single partition only
D. Synchronizing state between two separate Kafka clusters
Kafka EOS = idempotent producer + transactions. Idempotent producer: assigns a sequence number to each message — brokers deduplicate retries. Transactional producer:
producer.beginTransaction() → produce to multiple topics → sendOffsetsToTransaction() → commitTransaction(). All operations succeed atomically or are rolled back. For external DB writes (PostgreSQL), Kafka transactions don't help — use an idempotency key in the DB with a UNIQUE constraint so duplicate inserts fail gracefully (ON CONFLICT DO NOTHING).
Open Design Challenge
Design the payment processor code flow: read from Kafka → process → write to DB with idempotency key → commit offset. What happens at each step if the process crashes?
The idempotency table in PostgreSQL grows unbounded. How do you expire old idempotency keys? What is the safe retention window?
Design an outbox pattern as an alternative to Kafka transactions: write to DB and outbox table in one local transaction, then publish from outbox to Kafka. How does this guarantee exactly-once?
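The idempotency-key dedup described above can be sketched as follows. A Python set stands in for a PostgreSQL table with a UNIQUE constraint on idempotency_key (INSERT ... ON CONFLICT DO NOTHING); the function name and return values are illustrative.

```python
# Sketch of idempotent processing keyed on topic+partition+offset.
# A set stands in for a DB table with a UNIQUE idempotency_key column,
# so a duplicate "insert" becomes a no-op (the ON CONFLICT path).

seen_keys = set()   # the unique idempotency_key column
charges = []        # the actual payment side effects

def process_payment(topic, partition, offset, amount):
    # topic+partition+offset is stable across redeliveries of one record,
    # unlike a fresh UUID generated per processing attempt.
    key = f"{topic}/{partition}/{offset}"
    if key in seen_keys:        # duplicate insert would hit the constraint
        return "skipped"        # ON CONFLICT DO NOTHING path
    seen_keys.add(key)
    charges.append(amount)
    return "charged"

# First delivery charges; a redelivery of the same record is a no-op.
assert process_payment("payments", 2, 41, 100) == "charged"
assert process_payment("payments", 2, 41, 100) == "skipped"
print(charges)  # [100] — charged exactly once despite redelivery
```

In a real implementation the key insert and the payment write must share one DB transaction, so the dedup record and the side effect commit or roll back together.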