Day 12 — Week 2

Message Queues & Apache Kafka

Message queues decouple producers from consumers, enabling async processing and buffering traffic spikes. Kafka extends this to a durable, replayable event log serving multiple consumer groups simultaneously.

Key Concepts
📭
Partitions
A Kafka topic is split into partitions: ordered, immutable logs. Within a consumer group, each partition is read by exactly one consumer at a time, so the partition count caps a group's parallelism. More partitions means more consumers can work in parallel.
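Routing a message to a partition is just a hash of its key modulo the partition count. Kafka's default partitioner uses murmur2; this sketch substitutes CRC32 purely for illustration:

```python
import zlib

NUM_PARTITIONS = 6

def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    # Kafka's default partitioner hashes the key (murmur2) modulo the
    # partition count; zlib.crc32 stands in for murmur2 in this sketch.
    return zlib.crc32(key) % num_partitions

# The same key always lands on the same partition,
# which preserves per-key ordering.
assert partition_for(b"user-42") == partition_for(b"user-42")
```

This is why keyed messages for one user stay in order: they all share a partition, and each partition is an ordered log.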
👥
Consumer Groups
A group of consumers that collectively consume a topic. Each partition is assigned to exactly one consumer in the group. Multiple independent groups can consume the same topic — each gets all messages (unlike traditional queues).
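The "one partition per consumer" rule can be sketched as a simple round-robin assignment (a simplified stand-in for Kafka's actual assignors; the function and names here are illustrative, not the Kafka API):

```python
def assign_partitions(partitions: list[int],
                      consumers: list[str]) -> dict[str, list[int]]:
    # Each partition goes to exactly one consumer in the group.
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = list(range(6))
print(assign_partitions(partitions, ["c1", "c2", "c3"]))
# -> {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}

# If a consumer leaves, the group rebalances and the survivors
# take over its partitions:
print(assign_partitions(partitions, ["c1", "c2"]))
# -> {'c1': [0, 2, 4], 'c2': [1, 3, 5]}
```

Note that with more consumers than partitions, the extras would sit idle, which is why partition count bounds parallelism.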
📌
Offsets
Each message in a partition has a sequential offset. Consumers track their read position as an offset. Committing an offset marks "I've processed up to here." On restart, resume from the committed offset — enabling replay.
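The commit-and-replay mechanics can be modeled with a toy partition: an append-only list plus one committed offset per consumer group (a teaching sketch, not real Kafka internals):

```python
class PartitionLog:
    """Toy model of one partition: an append-only log plus a
    committed offset per consumer group."""

    def __init__(self):
        self.log = []
        self.committed = {}  # group -> next offset to read

    def append(self, msg) -> int:
        self.log.append(msg)
        return len(self.log) - 1  # offset of the new message

    def poll(self, group: str) -> list:
        start = self.committed.get(group, 0)
        return self.log[start:]  # everything past the committed offset

    def commit(self, group: str, offset: int):
        self.committed[group] = offset + 1  # "processed up to here"

    def seek(self, group: str, offset: int):
        self.committed[group] = offset  # rewind to replay old messages

log = PartitionLog()
for order in ["order-1", "order-2", "order-3"]:
    log.append(order)

log.commit("billing", 2)   # billing has processed all three
assert log.poll("billing") == []

log.seek("billing", 0)     # replay from the beginning
assert log.poll("billing") == ["order-1", "order-2", "order-3"]
```

Because the log is retained independently of consumption, `seek` is all it takes to reprocess history, something a delete-on-ack queue cannot do.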
🔁
At-Least-Once Delivery
Kafka guarantees at-least-once by default: messages won't be lost, but may be delivered twice if a consumer crashes after processing but before committing. Make your consumers idempotent to handle duplicates safely.
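A minimal idempotency sketch: deduplicate on a stable event id so a redelivered message becomes a no-op (the in-memory set here is illustrative; production systems would use a database table or Redis):

```python
processed_ids = set()  # assumption: stands in for a durable dedup store

def handle_order(order: dict) -> str:
    # At-least-once delivery means this may run twice for the same event.
    # Checking a stable event id first makes the retry harmless.
    if order["order_id"] in processed_ids:
        return "skipped (duplicate)"
    processed_ids.add(order["order_id"])
    # ... charge the customer exactly once ...
    return "processed"

event = {"order_id": "o-1", "amount": 42}
assert handle_order(event) == "processed"
assert handle_order(event) == "skipped (duplicate)"  # redelivery is safe
```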
Interactive Simulation — Kafka Partitions & Consumer Groups

Produce messages and watch them get distributed across 6 partitions. Simulate a consumer rebalance when a consumer leaves.

Architecture

- Producers: app services
- Kafka brokers: 3 brokers, RF=3 (replication factor 3)
- Topic: orders, 6 partitions
- Consumer groups, each reading the same topic with an independent offset:
  - billing (3 consumers)
  - analytics
  - notifications
| Feature | Kafka | RabbitMQ | SQS |
| --- | --- | --- | --- |
| Message retention | Days/weeks (log) | Until consumed (deleted) | Up to 14 days |
| Multiple consumers | Yes (each group independent) | No (message consumed once) | No (one consumer per msg) |
| Replay | Yes (seek to any offset) | No | No |
| Throughput | ~1M msgs/sec per broker | ~50K msgs/sec | ~3000 msgs/sec per queue |
| Best for | Event streaming, audit logs, real-time analytics | Task queues, RPC, routing | Simple async decoupling on AWS |
Code Example — Kafka Producer & Consumer
import json

from confluent_kafka import Producer, Consumer, KafkaError

# Producer: send order events
producer = Producer({
    'bootstrap.servers': 'kafka1:9092,kafka2:9092,kafka3:9092',
    'acks': 'all',           # wait for all replicas
    'retries': 3,
    'enable.idempotence': True,  # exactly-once producer
})

def send_order(order: dict):
    producer.produce(
        topic='orders',
        key=order['user_id'],   # same key → same partition
        value=json.dumps(order).encode(),
        # On failure, err is set and msg has no partition/offset
        callback=lambda err, msg: print(err or f'Delivered to {msg.partition()}@{msg.offset()}')
    )
    producer.flush()  # blocking flush per send keeps the example simple; batch in production

# Consumer: billing service
consumer = Consumer({
    'bootstrap.servers': 'kafka1:9092',
    'group.id': 'billing-service',
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False,  # manual commit for control
})
consumer.subscribe(['orders'])

while True:
    msg = consumer.poll(1.0)  # pull model
    if msg is None: continue
    if msg.error(): raise Exception(msg.error())

    order = json.loads(msg.value())
    process_billing(order)  # idempotent: safe to retry

    consumer.commit(msg)    # commit AFTER processing
Quiz
1. Why does Kafka use a pull-based model (consumers poll) rather than a push model?
2. A Kafka topic has 6 partitions and consumer group "billing" has 4 consumers. How are partitions distributed?
3. What happens if a Kafka consumer crashes after processing a message but before committing the offset?
4. When does Kafka trigger a consumer group rebalance?
5. You need to send notifications AND update analytics from the same order events. Should you use one consumer group or two?