Ride-Sharing System Capstone
Real-time matching, geospatial indexing, driver location tracking, pricing, and surge detection.
4 Exercises
12 Concept Checks
~100 min total
System Design
Driver Location Tracking
1 million active drivers send GPS updates every 4 seconds = 250,000 writes/sec. With PostgreSQL, each update acquires a row lock on the driver's row — at 250K writes/sec, lock contention grinds the DB to a halt. The solution: Redis for real-time location (geohash sorted sets) and batch PostgreSQL writes every 5 minutes for history/billing.
Location Tracking Architecture
1M drivers (GPS every 4s) → 250K/sec → Redis GEOADD (O(log n) per write) → batch every 5min → PostgreSQL (billing / history)
Redis → GEORADIUS query (find nearby drivers)
Concept Check — 3 questions
Q1. Redis GEOADD stores driver locations as?
A. JSON strings containing lat/lng pairs stored in a hash map
B. Geohash-encoded 52-bit integers stored in a sorted set — enabling efficient radius searches with GEORADIUS/GEOSEARCH
C. Binary blobs with proprietary coordinate encoding
D. Separate sorted sets for latitude and longitude values
Q2. To find all drivers within 2km of a rider's location, the Redis command is?
A. SCAN all driver keys and compute distances in the application layer
B. KEYS driver:* to get all drivers, then filter by coordinates
C. GEORADIUS or GEOSEARCH with a 2km radius — returns driver IDs sorted by distance from the query point
D. ZRANGEBYLEX to scan the geohash sorted set lexicographically
Q3. At 250K location writes/sec, Redis pipelining is used to?
A. Replicate GEOADD commands to multiple Redis replica nodes simultaneously
B. Batch multiple GEOADD commands in a single round-trip — reduces per-write network overhead dramatically
C. Automatically compress location data before storing
D. Automatically add TTL to driver location keys
Redis geo commands use geohash internally — lat/lng is encoded into a 52-bit integer score in a sorted set. This enables proximity searches:
GEOSEARCH drivers FROMLONLAT -122.4194 37.7749 BYRADIUS 2 km ASC COUNT 500. Redis pipelining: instead of 250K individual network round-trips (each ~0.1ms, 25 seconds total), batch 1,000 commands per pipeline call → 250 round-trips → ~25ms. Redis is single-threaded but handles 100K+ ops/sec per core. Set a TTL on driver keys so offline drivers auto-expire: EXPIRE driver:42 30 (30s heartbeat timeout).

Open Design Challenge
Design the data model for driver locations in Redis. What key names, data structures, and TTL values do you use? How do you handle a driver going offline?
Design the batch flush from Redis to PostgreSQL. What happens if Redis crashes before the flush? Is location data loss acceptable?
At peak, 1000 riders all search for drivers in downtown simultaneously. How do you prevent 1000 GEOSEARCH queries from overloading Redis?
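The pipelining arithmetic in this section can be sketched in plain Python. This is a minimal sketch: the `updates` stream, the `drivers` key name, and the client variable `r` are illustrative assumptions, and the redis-py pipeline call is shown commented out because it needs a live server.

```python
from itertools import islice

def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

# Hypothetical one-second burst of (driver_id, lon, lat) GPS updates.
updates = [(f"driver:{i}", -122.4, 37.77) for i in range(250_000)]

batches = list(chunked(updates, 1_000))
print(len(batches))  # 250 pipeline calls instead of 250,000 round-trips

# With a live redis-py client, each batch flushes in one round-trip:
# pipe = r.pipeline(transaction=False)
# for driver_id, lon, lat in batch:
#     pipe.geoadd("drivers", (lon, lat, driver_id))
# pipe.execute()
```

At ~0.1ms per round-trip, 250 round-trips take roughly 25ms, matching the estimate above.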
Matching Algorithm
A rider requests a ride at (37.7749, -122.4194). The system must find the best driver within 5 minutes. "Best" = weighted score: 60% ETA + 30% driver rating + 10% car type preference. 500 available drivers within 2km. The system notifies the top-scored driver and waits 15 seconds for acceptance before trying the next.
Matching Pipeline
Rider request (lat/lng) → GEOSEARCH 2km (500 candidates) → Score each (ETA + rating + type) → Notify top driver (15s timeout) → accept/reject → Next candidate (if rejected)
Concept Check — 3 questions
Q1. If the top driver doesn't accept within 15 seconds, the system should?
A. Cancel the ride request and ask the rider to try again
B. Move to the next-highest scored driver from the pre-ranked candidate list
C. Wait indefinitely for the driver to respond
D. Notify all 500 candidate drivers simultaneously
Q2. To prevent the same driver being matched to multiple riders simultaneously, you need?
A. A database transaction on the PostgreSQL driver availability table
B. An application-level mutex that checks availability before assigning
C. A distributed lock on driver_id — Redis SETNX atomically checks availability and reserves the driver
D. The driver's mobile app polls for assignments every second
Q3. ETA calculation for matching requires?
A. Euclidean (straight-line) distance only — it's fast and accurate enough
B. Traffic-aware routing from driver location to rider pickup — straight-line distance is inaccurate for real urban travel time
C. Historical average travel times from past rides in the same area
D. Driver-reported ETA from their navigation app
Matching cascade: pre-rank all 500 candidates → notify #1 → 15s timeout → move to #2 → repeat. Never notify multiple drivers simultaneously (wastes driver attention). Distributed lock for driver assignment:
SET lock:driver:42 rideId NX PX 20000 — NX means "only set if the key does not exist," PX sets the TTL in milliseconds. If the SET fails (the key already exists), the driver is already locked by another ride. ETA via a Maps API: with 500 candidates you need batching — use the Google Distance Matrix API (up to 25 origins × 25 destinations per call) or maintain a pre-computed road graph for speed.

Open Design Challenge
Design the scoring function that combines ETA (60%), rating (30%), and car type preference (10%). How do you normalize these three different-scale metrics?
At peak, 10,000 simultaneous ride requests each need ETAs for 500 drivers. That's 5M Maps API calls/second. Design a more scalable ETA approach.
Design the 5-minute matching timeout: if no driver accepts after 5 minutes, how do you notify the rider? Should the radius expand progressively?
Surge Pricing
A concert ends — 50,000 people request rides simultaneously in a 3km radius. Available drivers: 200. Supply/demand ratio: 1:250. Surge multiplier: 4×. Current implementation runs a SQL aggregate every 30 seconds — too slow to detect and respond to demand spikes within seconds. Real-time stream processing is needed.
Surge Pricing Pipeline
Ride requests + driver locations → Kafka stream (events per geohash cell) → Flink aggregation (supply/demand per cell) → Surge multiplier (formula applied) → Price shown (to rider in app)
Concept Check — 3 questions
Q1. Geohash cells are used for surge pricing because?
A. They produce hexagon-shaped cells that look better on a map
B. Geohash divides Earth into hierarchical cells — you can aggregate demand events per cell at any precision level, enabling fast local supply/demand calculation without expensive geo queries
C. They are easier to store in a relational database than lat/lng pairs
D. Google Maps uses geohash internally so it's the industry standard
Q2. Real-time surge pricing needs sub-second updates. The architecture is?
A. PostgreSQL batch query running every 30 seconds with GROUP BY geohash
B. A cron job that computes surge factors and writes to a config table
C. Stream processing (Kafka + Flink) for real-time per-geohash-cell supply/demand aggregation with sub-second latency
D. Manual surge adjustments made by on-call operations engineers
Q3. A rider sees 4× surge pricing, closes the app, and reopens 10 minutes later when surge is 2×. They should be charged?
A. 4× (the original surge they saw before closing the app)
B. 2× — the price shown and accepted at the time of booking confirmation, not the earlier quote
C. Base price — surge should be removed as a goodwill gesture
D. The average of 4× and 2× = 3× surge
Geohash precision: a 6-character geohash covers ~1.2km × 0.6km — perfect for neighborhood-level surge. Encode rider requests and driver pings with their geohash prefix. Flink aggregates: count ride_requests and available_drivers per geohash cell in a 1-minute sliding window → surge_ratio = requests / drivers → surge_multiplier = min(4.0, max(1.0, ratio × 0.5)). Store results in Redis:
SET surge:dr5ru 2.4 EX 60. Price locking: when a rider confirms a booking, the accepted price is stored atomically; surge changing after confirmation does not affect the locked-in price.

Open Design Challenge
Design the surge multiplier formula. Given supply=200 drivers and demand=50,000 requests in a 3km zone, what multiplier do you compute? How do you cap it at 4×?
Surge pricing can feel unfair at extreme multipliers. Design a "surge warning" UX: how do you show the rider the surge is active and get explicit confirmation before charging?
Design the driver incentive system: when surge is detected in a cell, how do you notify nearby idle drivers to move into the surge zone?
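Both pieces of this section — the geohash cell key and the capped multiplier formula — can be sketched in pure Python. The encoder below is the standard geohash bisection algorithm; the surge formula is the one given above (ratio × 0.5, clamped to [1.0, 4.0]).

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision=6):
    """Standard geohash: interleave lon/lat bisection bits, 5 bits per char."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    bits, even = [], True                     # even-indexed bits encode longitude
    while len(bits) < precision * 5:
        rng, val = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits.append(1)
            rng[0] = mid
        else:
            bits.append(0)
            rng[1] = mid
        even = not even
    chars = []
    for i in range(0, len(bits), 5):          # pack each 5-bit group into base32
        n = 0
        for b in bits[i:i + 5]:
            n = (n << 1) | b
        chars.append(BASE32[n])
    return "".join(chars)

def surge_multiplier(requests, drivers, cap=4.0):
    """surge_ratio = requests / drivers; multiplier clamped to [1.0, cap]."""
    ratio = requests / max(drivers, 1)        # guard against zero drivers
    return min(cap, max(1.0, ratio * 0.5))

cell = geohash_encode(37.7749, -122.4194, precision=6)
print(cell)                           # 6-char cell (~1.2km x 0.6km), e.g. surge:<cell>
print(surge_multiplier(50_000, 200))  # concert scenario: ratio 250 -> capped at 4.0
```

The cell string is what keys the Redis entry (`surge:<cell>`) and the Flink aggregation window in the pipeline above.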
Ride State Machine
A ride progresses through states: REQUESTED → MATCHED → DRIVER_ARRIVING → IN_PROGRESS → COMPLETED. Each state transition must be atomic, auditable, and idempotent. Race conditions abound: can MATCHED transition directly to CANCELLED? Can IN_PROGRESS be re-MATCHED? Invalid transitions must be rejected, not silently allowed.
Ride State Machine
REQUESTED (rider submits) → driver accepts → MATCHED (driver assigned) → driver starts → DRIVER_ARRIVING (en route) → rider boards → IN_PROGRESS (trip active) → destination → COMPLETED
any state → CANCELLED
Concept Check — 3 questions
Q1. To prevent invalid state transitions (e.g., IN_PROGRESS → MATCHED), the pattern is?
A. Use only database CHECK constraints to enforce valid states
B. Explicit state machine validation: each transition checks the current state and rejects the operation if the transition is not in the allowed set
C. Optimistic locking only — detect conflicts after the fact
D. Log the invalid transition but allow it to proceed anyway
Q2. Preventing duplicate notifications (driver "arriving" SMS sent exactly once) uses?
A. Rate limiting the SMS provider to prevent duplicate sends
B. Database transactions that wrap the SMS send call
C. An idempotency key — store the notification event ID; check before sending; mark as sent atomically after successful send
D. SMS providers guarantee exactly-once delivery natively
Q3. A rider cancels while the driver is en route (DRIVER_ARRIVING state). The system must?
A. Wait for the driver to physically arrive before processing the cancellation
B. Atomically transition to CANCELLED, notify the driver, calculate the cancellation fee, and lock against further driver actions on this ride
C. Refund all charges immediately without calculating any cancellation fee
D. Allow the driver to mark the ride as completed and charge the full fare
State machine implementation: define an allowed transitions map:
{REQUESTED: [MATCHED, CANCELLED], MATCHED: [DRIVER_ARRIVING, CANCELLED], DRIVER_ARRIVING: [IN_PROGRESS, CANCELLED], IN_PROGRESS: [COMPLETED]}. Before any transition: SELECT state FROM rides WHERE id=? FOR UPDATE → validate that the transition is allowed → update the state. The FOR UPDATE lock prevents concurrent transitions from racing. Cancellation atomicity: in a single DB transaction, update rides.state='CANCELLED', insert into notifications (driver SMS), and insert into billing (cancellation fee). Idempotency on notifications: INSERT INTO sent_notifications (ride_id, type) VALUES (?, 'cancel_driver') ON CONFLICT DO NOTHING.

Open Design Challenge
Design the PostgreSQL schema for the ride state machine. What tables, columns, constraints, and indexes do you need? How do you store the full state transition audit log?
Design the race condition: rider cancels at the same moment the driver marks "arrived." Both transitions are valid from DRIVER_ARRIVING. How do you resolve this conflict?
After a driver completes a ride, design the post-ride pipeline: payment processing, receipt email, rating request, earnings credit, and driver availability update. Which are synchronous? Which are async?
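The allowed-transitions map from this section can be sketched as a small in-memory validator. This is the validation logic only; in production the check-and-update runs inside a transaction under SELECT ... FOR UPDATE, which this sketch does not reproduce.

```python
ALLOWED = {
    "REQUESTED":       {"MATCHED", "CANCELLED"},
    "MATCHED":         {"DRIVER_ARRIVING", "CANCELLED"},
    "DRIVER_ARRIVING": {"IN_PROGRESS", "CANCELLED"},
    "IN_PROGRESS":     {"COMPLETED"},
    "COMPLETED":       set(),   # terminal states allow no further transitions
    "CANCELLED":       set(),
}

class InvalidTransition(Exception):
    pass

def transition(current, target):
    """Reject any target state not in the allowed set for the current state."""
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current} -> {target} is not allowed")
    return target

state = "REQUESTED"
state = transition(state, "MATCHED")      # valid: rider matched to a driver
try:
    transition("IN_PROGRESS", "MATCHED")  # re-matching an active trip
except InvalidTransition as e:
    print(e)                              # IN_PROGRESS -> MATCHED is not allowed
```

Rejecting with an exception (rather than logging and proceeding) is the point of the pattern: the caller must handle the invalid transition, and the audit log records only transitions that actually passed validation.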