Real-time channel fan-out, persistent WebSocket connections at scale, message threading, workspace-scoped search, and file sharing with S3 pre-signed URLs.
Four architectural pillars make Slack's real-time messaging work at 12M DAU scale.
Messages are stored per channel and fanned out to all online members on send. Channel membership is cached in Redis as a set, giving O(1) membership lookups for online/offline routing decisions.
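The online/offline routing split can be sketched with plain Python sets standing in for the Redis sets. The function name `split_recipients` and the sample user IDs are assumptions made for illustration only.

```python
def split_recipients(members: set, online: set) -> tuple:
    """Partition channel members into WebSocket targets and push targets."""
    via_websocket = members & online   # members with a live connection
    via_push = members - online        # everyone else gets a queued push notification
    return via_websocket, via_push


# Hypothetical channel with four members, two of them currently online.
members = {"u1", "u2", "u3", "u4"}
online = {"u1", "u3"}
ws_targets, push_targets = split_recipients(members, online)
```

Both operations are ordinary set algebra, which is why keeping membership and online status as sets makes the routing decision cheap.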
Each Slack client holds a persistent WebSocket. ~12M concurrent connections distributed across gateway servers. Each gateway tracks its active connections in local memory + Redis for routing.
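A minimal sketch of that two-level bookkeeping, assuming a shared dictionary as a stand-in for Redis and a list as a stand-in for each socket. The `Gateway` class and `route` function are hypothetical names, not Slack's implementation.

```python
class Gateway:
    """One gateway server: local connection table plus a shared routing directory."""

    def __init__(self, name: str, directory: dict):
        self.name = name
        self.directory = directory   # shared map user_id -> gateway name (Redis stand-in)
        self.connections = {}        # local map user_id -> outbox (socket stand-in)

    def connect(self, user_id: str) -> None:
        self.connections[user_id] = []
        self.directory[user_id] = self.name

    def disconnect(self, user_id: str) -> None:
        self.connections.pop(user_id, None)
        # Only clear the directory entry if it still points at this gateway.
        if self.directory.get(user_id) == self.name:
            del self.directory[user_id]

    def deliver(self, user_id: str, event: dict) -> bool:
        outbox = self.connections.get(user_id)
        if outbox is None:
            return False
        outbox.append(event)
        return True


def route(gateways: dict, directory: dict, user_id: str, event: dict) -> bool:
    """Look up which gateway holds the user's connection and hand the event to it."""
    name = directory.get(user_id)
    if name is None:
        return False
    return gateways[name].deliver(user_id, event)


directory = {}
gateways = {name: Gateway(name, directory) for name in ("gw-1", "gw-2")}
gateways["gw-1"].connect("alice")
gateways["gw-2"].connect("bob")
delivered = route(gateways, directory, "bob", {"type": "message", "text": "hi"})
```

The shared directory answers "which gateway?", and the local table answers "which socket?", so only the owning gateway ever touches the connection itself.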
Parent message + thread_ts creates a threaded view. Slack stores the full thread tree in PostgreSQL. Replies include thread_ts pointing to the parent, so no recursive lookups are needed.
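Because every reply points directly at the parent, a thread can be assembled with a single flat filter. The sketch below assumes an in-memory list in place of the PostgreSQL table; the `Message` dataclass and `thread_view` function are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    id: str                          # "timestamp.sequence" string
    text: str
    thread_ts: Optional[str] = None  # parent id if this message is a reply


def id_key(message_id: str) -> tuple:
    """Compare ids numerically part by part, avoiding float rounding."""
    return tuple(int(part) for part in message_id.split("."))


def thread_view(messages: list, parent_id: str) -> list:
    """Return the parent followed by its replies in chronological order."""
    parent = next((m for m in messages if m.id == parent_id), None)
    if parent is None:
        raise KeyError(parent_id)
    replies = [m for m in messages if m.thread_ts == parent_id]
    replies.sort(key=lambda m: id_key(m.id))
    return [parent, *replies]


messages = [
    Message("1630000000.000001", "Deploy at 3pm?"),
    Message("1630000005.000001", "Lunch anyone?"),
    Message("1630000009.000001", "Works for me", thread_ts="1630000000.000001"),
    Message("1630000007.000001", "Which service?", thread_ts="1630000000.000001"),
]
thread = thread_view(messages, "1630000000.000001")
```

In SQL terms this corresponds to one equality filter on thread_ts rather than a recursive query, since replies never point at other replies.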
Elasticsearch indexes all messages per workspace. Search is scoped to the workspace, not global. Each workspace has its own ES index shard, making full-text search tractable at scale.
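HTTP polling trades latency for load. A client that polls every 1-5 seconds sees new messages up to that much later, and long-polling ties up a server-side request per client that must be re-established after every response. A WebSocket lets the server push to the client immediately over one persistent connection, with no polling overhead. At 12M users, even a 1-second polling interval means 12M HTTP requests per second, most of them returning nothing, which is impractical compared with pushing only when there is something to send.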
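The request-rate figure is simple division, shown here as a small sketch. The function name is an assumption, and the model ignores retries and uneven client timing.

```python
def polling_requests_per_second(clients: int, interval_seconds: float) -> float:
    """Steady-state request rate if every client polls once per interval."""
    return clients / interval_seconds


CLIENTS = 12_000_000
rate_at_1s = polling_requests_per_second(CLIENTS, 1)  # 12,000,000 requests/second
rate_at_5s = polling_requests_per_second(CLIENTS, 5)  # 2,400,000 requests/second
```

Even at the slow end of the 1-5 second range, the server would still absorb millions of requests per second regardless of how many messages are actually sent.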
| Field | Type | Notes |
|---|---|---|
| id | varchar | e.g. 1630000000.000001 (timestamp.sequence format) |
| channel_id | varchar | Which channel this message belongs to |
| user_id | bigint | Who sent the message |
| text | text | Message content (may be empty if file-only) |
| thread_ts | varchar | Points to parent message id if this is a reply |
| files | jsonb[] | Array of S3 file references with metadata |
| reactions | jsonb | Map of emoji → [user_ids] for reaction counts |
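The timestamp.sequence id format in the table can be produced as sketched below. How the sequence part is actually assigned is not specified in this document, so the per-second counter here is an assumption, and `MessageIdGenerator` is a hypothetical name. The sketch does not handle more than 999,999 ids in one second.

```python
class MessageIdGenerator:
    """Issues ids of the form '<unix seconds>.<6-digit sequence>'."""

    def __init__(self):
        self._last_second = 0
        self._counter = 0

    def next_id(self, now_seconds: int) -> str:
        # Reset the sequence only when the clock moves forward; if the clock
        # stalls or steps backwards, keep counting within the last second seen.
        if now_seconds > self._last_second:
            self._last_second = now_seconds
            self._counter = 0
        self._counter += 1
        return f"{self._last_second}.{self._counter:06d}"


gen = MessageIdGenerator()
first = gen.next_id(1630000000)
second = gen.next_id(1630000000)
third = gen.next_id(1630000001)
```

Zero-padding the sequence keeps the ids fixed-width, so sorting them as strings matches chronological order for as long as the seconds part has the same number of digits.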
```python
# Slack-style channel fan-out.
# Assumes `db` (async PostgreSQL client), `redis` (async client created with
# decode_responses=True so set members are str), `publish_to_server`, and
# `queue_push_notification` are provided by the surrounding application.
import time


async def send_channel_message(channel_id: str, sender_id: str, text: str):
    msg_ts = f"{time.time():.6f}"

    # 1. Persist to database (source of truth)
    await db.execute("""
        INSERT INTO messages (id, channel_id, user_id, text)
        VALUES ($1, $2, $3, $4)
    """, msg_ts, channel_id, sender_id, text)

    # 2. Get channel membership (cached in Redis as a set)
    member_ids = await redis.smembers(f"channel:{channel_id}:members")

    # 3. Fan-out to online members via WebSocket
    online_members = await redis.smembers(f"channel:{channel_id}:online")
    event = {"type": "message", "channel": channel_id, "ts": msg_ts, "text": text}
    for member_id in online_members:
        conn_server = await redis.get(f"user:{member_id}:conn_server")
        if conn_server:
            await publish_to_server(conn_server, member_id, event)
        else:
            # Stale "online" entry with no gateway: fall back to a push notification
            await queue_push_notification(member_id, channel_id, text)

    # 4. Push notifications for offline members
    offline = member_ids - online_members
    for member_id in offline:
        await queue_push_notification(member_id, channel_id, text)
```
Slack search is scoped to messages in your accessible channels only, not global across all workspaces. This makes it feasible: each workspace is isolated into its own Elasticsearch index. Cross-workspace search would require indexing all messages globally, multiplying the index size by millions of workspaces.
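The isolation property can be illustrated with a toy inverted index kept per workspace. This is an in-memory sketch, not Elasticsearch; the `WorkspaceSearch` class is a hypothetical name, and channel-level permission filtering is left out.

```python
from collections import defaultdict


class WorkspaceSearch:
    """One small inverted index per workspace; queries never cross workspaces."""

    def __init__(self):
        # workspace_id -> term -> set of message ids
        self._indexes = defaultdict(lambda: defaultdict(set))

    def index(self, workspace_id: str, message_id: str, text: str) -> None:
        for term in text.lower().split():
            self._indexes[workspace_id][term].add(message_id)

    def search(self, workspace_id: str, query: str) -> set:
        index = self._indexes.get(workspace_id)
        terms = query.lower().split()
        if index is None or not terms:
            return set()
        # A message matches only if it contains every query term.
        postings = [index.get(term, set()) for term in terms]
        return set.intersection(*postings)


search = WorkspaceSearch()
search.index("acme", "m1", "Deploy the billing service")
search.index("acme", "m2", "Billing dashboard is down")
search.index("globex", "m3", "Deploy the billing service")
acme_hits = search.search("acme", "billing")
```

Each query touches only one workspace's index, so its cost depends on that workspace's message volume rather than on the total across all workspaces.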
| Feature | Technology | Latency |
|---|---|---|
| Full-text message search | Elasticsearch | <100ms |
| File & attachment search | Elasticsearch + S3 metadata | <200ms |
| Message history (recent) | PostgreSQL scan (indexed) | <50ms |
| Real-time index updates | Kafka → ES consumer | <2s lag |
| Component | Choice | Why Not the Alternative? |
|---|---|---|
| Message storage | PostgreSQL | Strong consistency for message ordering; Cassandra would require manual ordering logic |
| Real-time delivery | WebSocket per gateway | SSE (Server-Sent Events) is unidirectional; WebSocket enables bidirectional protocol messages |
| Fan-out queue | Kafka (channel_id partition key) | Redis pub/sub loses messages if subscriber is down; Kafka persists and replays |
| Search | Elasticsearch per workspace | Solr: worse scalability and REST API; PostgreSQL FTS: no distributed sharding |
| File storage | S3 + pre-signed URLs | Storing files in DB: wastes IOPS and DB storage on binary blobs |
| Presence | Redis TTL keys | DB rows need a cleanup job; Redis TTL auto-expires when heartbeat stops |
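The fan-out queue row relies on keyed partitioning: every event for a channel lands in the same partition, so its order is preserved for consumers. The sketch below shows the idea with CRC32 as a stand-in hash and Python lists as stand-in partitions. Kafka's own default partitioner uses a different hash function, and `partition_for` is a hypothetical helper.

```python
import zlib


def partition_for(channel_id: str, num_partitions: int) -> int:
    """Map a channel to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(channel_id.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 4
partitions = [[] for _ in range(NUM_PARTITIONS)]

# Hypothetical events from two channels, interleaved in arrival order.
events = [("C1", "first"), ("C2", "hello"), ("C1", "second"), ("C1", "third")]
for channel_id, text in events:
    partitions[partition_for(channel_id, NUM_PARTITIONS)].append((channel_id, text))

c1_log = [
    text
    for cid, text in partitions[partition_for("C1", NUM_PARTITIONS)]
    if cid == "C1"
]
```

Ordering is guaranteed only within a partition, which is sufficient here because readers care about order within a channel, not across channels.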
Slack clients send WebSocket heartbeats every 30 seconds. The server calls SETEX presence:{user_id} 65 {gateway_server} on each heartbeat. If a client crashes without sending a disconnect, the key expires in 65 seconds automatically, so no background cleanup job is required.
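The expiry behavior can be simulated without Redis by storing an expiry time next to each value and passing the clock in explicitly. `PresenceStore` is a hypothetical in-memory stand-in for the SETEX keys, not a Redis client.

```python
class PresenceStore:
    """In-memory stand-in for Redis keys written with SETEX."""

    def __init__(self, ttl_seconds: int = 65):
        self.ttl = ttl_seconds
        self._entries = {}  # user_id -> (gateway, expires_at)

    def heartbeat(self, user_id: str, gateway: str, now: float) -> None:
        # Equivalent in spirit to: SETEX presence:{user_id} 65 {gateway}
        self._entries[user_id] = (gateway, now + self.ttl)

    def gateway_for(self, user_id: str, now: float):
        entry = self._entries.get(user_id)
        if entry is None:
            return None
        gateway, expires_at = entry
        if now >= expires_at:
            del self._entries[user_id]  # lazily drop the expired key
            return None
        return gateway


presence = PresenceStore()
presence.heartbeat("alice", "gw-1", now=0)
presence.heartbeat("alice", "gw-1", now=30)           # refresh pushes expiry to t=95
still_online = presence.gateway_for("alice", now=94)   # one missed heartbeat is tolerated
after_crash = presence.gateway_for("alice", now=95)    # no heartbeat since t=30
```

A 65-second TTL is slightly more than two 30-second heartbeat intervals, so a single delayed or dropped heartbeat does not flip a user to offline.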