Week 1 · Day 4

DB Connection Pooling
& Proxy Patterns

Watch connections queue and fail without a pool. Drag load to see pool exhaustion in real time. Step through PgBouncer's three modes and understand which breaks prepared statements.

3 Simulations · ~3h Study Time · 3 Quizzes

Topics: PgBouncer · Session Mode · Transaction Mode · Connection Pool Sizing · Read Replica Lag · Query Routing
01 — The Problem

Why can't each app thread just open its own DB connection?

PostgreSQL allocates roughly 5-10MB per connection: each one gets a dedicated backend process (a fork), plus memory for query buffers, lock tables, and statistics. Even at the low end, 500 connections = 2.5GB of pure connection overhead before any queries run. And establishing a new connection costs 50-100ms for the TCP + TLS + auth handshake.

🔌 Connection Pool Load Simulator
Drag "Concurrent Requests" to simulate load. Watch connections queue, succeed, or fail based on pool size. Toggle pooling on/off to see the difference.
Live counters: Active Conns · Waiting (Queue) · Served · Rejected (all start at 0)
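The dynamics the simulator animates can be sketched in a few lines: a toy pool built on a bounded queue, where requests that can't borrow a connection within their wait budget are rejected. `MiniPool` and its string "connections" are hypothetical stand-ins, not a real driver pool.

```python
import queue
import threading
import time

class MiniPool:
    """Toy connection pool: a bounded queue of reusable 'connections'."""

    def __init__(self, size: int):
        self._conns = queue.Queue(maxsize=size)
        for i in range(size):
            self._conns.put(f"conn-{i}")

    def run_query(self, work_ms: float, wait_timeout_s: float = 0.05) -> str:
        try:
            # Blocking here is the "Waiting (Queue)" counter in the simulator.
            conn = self._conns.get(timeout=wait_timeout_s)
        except queue.Empty:
            return "rejected"            # pool exhausted: the "Rejected" counter
        try:
            time.sleep(work_ms / 1000)   # hold the connection for the query time
            return "served"
        finally:
            self._conns.put(conn)        # release back to the pool

# 2 connections, 6 concurrent 200ms queries, 50ms wait budget:
# two are served immediately, the overflow times out and is rejected.
pool = MiniPool(size=2)
results: list[str] = []
threads = [threading.Thread(target=lambda: results.append(pool.run_query(200)))
           for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))
```

Dragging "Concurrent Requests" up in the simulator is the equivalent of raising the thread count here while the pool size stays fixed.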
📐

Pool Sizing Formula

Little's Law: N = λ × W. With 30 req/s and 50ms query time, you need N = 30 × 0.05 = 1.5 concurrent connections on average. Add headroom for bursts: multiply by 1.5-2×. So 3-4 connections is the theoretical minimum for this load.
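The worked numbers above drop straight into a helper. The function name and the `burst_factor` parameter are this sketch's own, not a standard API; the arithmetic is exactly Little's Law plus the 1.5-2× headroom multiplier.

```python
import math

def pool_size_littles_law(arrival_rate_per_s: float,
                          avg_query_time_s: float,
                          burst_factor: float = 2.0) -> int:
    """Little's Law: average in-flight queries N = lambda * W.

    burst_factor is the 1.5-2x headroom for bursts described above.
    """
    n_avg = arrival_rate_per_s * avg_query_time_s
    return max(1, math.ceil(n_avg * burst_factor))

# The example above: 30 req/s at 50ms per query, 2x headroom.
print(pool_size_littles_law(30, 0.05))  # 3
```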

⚠️

The "10× connections" antipattern

Many tutorials suggest pool_size = num_workers × 10. For 50 app servers × 10 workers × 10, that's 5000 connections, enough to OOM PostgreSQL. The better-grounded formula is pool_size = (core_count × 2) + effective_spindle_count, computed from the database server's hardware: it estimates how much concurrency Postgres can productively use. For a 4-core Postgres instance: (4 × 2) + 1 = 9 connections. Giving each of 50 app servers a pool of 9 yields 450 total, an order of magnitude better than 5000, though still well above the database's sweet spot, which is exactly the gap a multiplexing pooler like PgBouncer closes.

02 — PgBouncer Modes

Session vs Transaction vs Statement — they're not equivalent

PgBouncer sits between your app and Postgres, multiplexing many app connections onto fewer DB connections. The mode determines when a DB connection is released back to the pool. Each mode has compatibility trade-offs.

🔄 PgBouncer Mode Step-Through
Select a mode, then press Next Step to see exactly when the DB connection is held vs released.

Session Mode: DB connection assigned when client connects. Released when client disconnects. Most compatible — supports all Postgres features. Least efficient — connection held even when client is idle.

Press Next Step to walk through the session lifecycle.

Transaction Mode: DB connection assigned at BEGIN and released at COMMIT/ROLLBACK. Multiple clients share the same pool of connections. Most efficient — 10× multiplexing possible. But: breaks prepared statements, advisory locks, and SET parameters.

Press Next Step to walk through the transaction mode lifecycle.

Statement Mode: DB connection released after every single statement. Highest efficiency but breaks multi-statement transactions entirely. Extremely rare in production — usually wrong for any application using transactions.

Press Next Step to walk through statement mode behavior.
Mode        | Connection Held   | Multiplexing | Transactions | Prepared Stmts | SET params
Session     | Full session      | None (1:1)   | ✓ Full       | ✓ Yes          | ✓ Yes
Transaction | During txn only   | High         | ✓ Yes        | ✗ Broken       | ✗ Broken
Statement   | Single statement  | Highest      | ✗ Broken     | ✗ Broken       | ✗ Broken
🔥

Transaction mode + ORMs = bugs

Most ORMs (SQLAlchemy, Django ORM, Sequelize) lean on per-connection state: server-side prepared statements, SET search_path, session settings. In transaction mode, consecutive transactions from the same client can run on different DB connections, so SET parameters silently vanish and prepared statements fail with "prepared statement does not exist". Fix: either use session mode, or configure the driver to avoid server-side state, e.g. DISABLE_SERVER_SIDE_CURSORS = True in Django's database settings, prepare_threshold = None in psycopg 3, or prepareThreshold=0 in the PostgreSQL JDBC driver.
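Why the state gets lost can be shown without a database: model a transaction-mode pool that hands each transaction an arbitrary backend, and keep session state per backend. `Backend` and `TxnModePool` are toy classes for illustration, not PgBouncer's actual implementation.

```python
import itertools

class Backend:
    """Toy stand-in for one server-side Postgres connection."""
    def __init__(self, name: str):
        self.name = name
        self.session_params: dict[str, str] = {}  # state set via SET lives here
        self.prepared: set[str] = set()           # names of prepared statements

class TxnModePool:
    """Each transaction borrows *some* backend; state does not follow the client."""
    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def run_txn(self, fn):
        return fn(next(self._cycle))  # possibly a different backend each time

pool = TxnModePool([Backend("be-1"), Backend("be-2")])

# Txn 1: the client runs SET search_path and PREPARE on whatever backend it got.
pool.run_txn(lambda be: (be.session_params.update(search_path="app"),
                         be.prepared.add("get_user")))

# Txn 2: lands on a different backend, where neither the SET nor the PREPARE exists.
def check(be):
    return be.session_params.get("search_path"), "get_user" in be.prepared

stale = pool.run_txn(check)
print(stale)  # (None, False)
```

Session mode avoids this by pinning one backend per client for the whole session, at the cost of the multiplexing shown in the table.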

03 — Read Replica Routing

Reads to replica, writes to primary — and the lag problem

A read replica receives writes asynchronously from the primary via WAL (Write-Ahead Log) replication. During high write load, the replica falls behind — called "replication lag." A read immediately after a write might return stale data from the replica.

📡 Replication Lag Visualizer
Drag write load to see replication lag build. Watch what happens to a "read-your-writes" operation when it goes to the replica too soon.
Adjust sliders to see replication lag. High write load + slow replica = dangerous lag.
Live readouts: Current Lag (0ms) · Writes Applied to Replica (100%) · Read-Your-Writes Risk (Low)
📖

Read-Your-Writes Consistency

User updates profile → write to primary → immediately reads back → routed to replica → gets stale data. Fix strategies: 1. Route the same user's reads to primary for 1s after a write. 2. Read from primary for session-critical operations. 3. Pass a replication LSN token — replica only serves reads after applying that LSN.
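The LSN-token strategy (fix 3) can be sketched at the application level. `pg_current_wal_lsn()` (run on the primary) and `pg_last_wal_replay_lsn()` (run on the replica) are real Postgres functions; the helper names below are this sketch's own. Textual LSNs like `0/16B3748` are two hex halves of a 64-bit position and compare as that integer.

```python
def lsn_to_int(lsn: str) -> int:
    """Parse Postgres's textual LSN 'X/Y' (two hex halves) into a 64-bit int."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def replica_caught_up(write_lsn: str, replica_replay_lsn: str) -> bool:
    """True once the replica has applied at least the LSN captured after our write.

    write_lsn:          SELECT pg_current_wal_lsn() on the primary, after COMMIT
    replica_replay_lsn: SELECT pg_last_wal_replay_lsn() on the replica
    """
    return lsn_to_int(replica_replay_lsn) >= lsn_to_int(write_lsn)

# Replica still behind the write: route this read to the primary instead.
print(replica_caught_up("0/16B3748", "0/16B3700"))  # False
# Replica has replayed past the write: safe to serve from the replica.
print(replica_caught_up("0/16B3748", "1/0000000"))  # True
```

In practice the write path stores `write_lsn` alongside the user's session, and the read path checks it against the replica before choosing a pool.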

🔀

Query Routing Logic

PgBouncer doesn't route reads/writes — it's a connection pool, not a query router. For read/write splitting, use Pgpool-II, ProxySQL (MySQL), or application-level routing. Application-level: check query for SELECT vs INSERT/UPDATE/DELETE and pick the connection pool accordingly.

# Application-level read/write routing (Python/SQLAlchemy)
from sqlalchemy import create_engine
from contextvars import ContextVar

primary = create_engine("postgresql://primary:5432/db", pool_size=10)
replica = create_engine("postgresql://replica:5432/db", pool_size=20)

# Track whether this request context has performed a write
_wrote_in_txn: ContextVar[bool] = ContextVar('wrote', default=False)

def get_engine(readonly: bool = True):
    # Sticky: once a request writes, all its later reads go to the primary too
    if _wrote_in_txn.get() or not readonly:
        _wrote_in_txn.set(True)
        return primary
    return replica
04 — Connection Pool Sizing Deep Dive

The formula everyone gets wrong

Pool size isn't "as many as possible." Too many connections hurt Postgres as much as too few. The right pool size derives from how your DB spends its time: CPU-bound queries and I/O-bound queries have different optimal concurrencies.

⚙️

CPU-Bound Queries

Aggregations, complex JOINs, report queries. Optimal concurrency = number of CPU cores. Adding more connections doesn't help — they just context-switch and fight for CPU. Pool size = db_cores × 2. For a 4-core RDS instance: 8 connections.

💾

I/O-Bound Queries

Simple lookups, index scans, waiting on disk. CPU is idle while I/O completes — more concurrency helps. Pool size = db_cores × 2 + disk_spindles. For NVMe SSD (treat as 1 spindle): same as CPU-bound. For HDD RAID-10 with 8 spindles: cores × 2 + 8.

🧮

The HikariCP Formula

From HikariCP's documentation (the authoritative Java pool): pool_size = (core_count × 2) + effective_spindle_count. This formula comes from measuring Postgres under real load. For managed DBs (RDS, Cloud SQL): treat spindle count as 1 (SSDs). 8-core RDS → 17 connections.
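The formula and the fleet-wide total are two lines of arithmetic; the numbers below reuse the RDS examples from this section, and the function name is this sketch's own.

```python
def hikari_pool_size(core_count: int, effective_spindle_count: int = 1) -> int:
    """HikariCP's sizing rule of thumb: (cores * 2) + spindles.

    For managed SSD-backed databases (RDS, Cloud SQL), treat spindles as 1.
    """
    return core_count * 2 + effective_spindle_count

per_server = hikari_pool_size(8)   # 8-core RDS: (8 * 2) + 1 = 17
fleet_total = per_server * 50      # 50 app servers, one such pool each
print(per_server, fleet_total)     # 17 850
```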

📊

Per App-Server Calculation

If you have 50 app servers each with 17-connection pools: 850 total connections. Postgres defaults to max_connections = 100, so you must either raise max_connections to 1000+ or, better, put PgBouncer in front: it holds only ~10 real DB connections per pool while multiplexing thousands of cheap client connections onto them, avoiding "too many clients" errors.

🏆

Production PgBouncer configuration

pool_mode = transaction      ; for OLTP workloads (set the ORM to disable prepared statements)
max_client_conn = 5000       ; app connections to PgBouncer (cheap, no Postgres overhead)
default_pool_size = 20       ; actual DB connections per database
max_db_connections = 100     ; hard cap on total DB connections regardless of pools
server_idle_timeout = 600    ; close idle DB connections after 10 min
client_idle_timeout = 0      ; keep app connections open indefinitely (app manages lifecycle)

Quiz — Check Your Understanding

Connection Pooling

Q1. You're using PgBouncer in transaction mode. Your Python app uses SQLAlchemy with server-side cursors (prepared statements enabled). After deploying PgBouncer, queries start failing with "prepared statement does not exist". Why?
Q2. Your service does: write profile update → immediately read profile → show to user. Reads are routed to a replica. Users sometimes see their old data after updating. What's happening and how do you fix it?
Q3. You have a 4-core PostgreSQL RDS instance (NVMe SSD). 20 app servers, each running 4 workers. What's the recommended pool size per app server, and what's the total connection count to Postgres?