system-design.io

Read Replicas and Query Routing

Scaling reads without sacrificing write consistency


Bridge: The Library Analogy

Imagine a bustling public library where patrons constantly borrow (read) books while librarians return and re‑shelve (write) them. If every patron had to wait for a librarian to finish shelving before they could browse, the line would stretch out the door. Instead, the library creates exact copies of the most popular sections and places them in reading rooms. Patrons grab a copy from the nearest room, while librarians continue updating the master collection in the back office.

In this analogy:

Just as the library scales read traffic by adding more copies, a database scales read‑heavy workloads by adding read replicas and intelligently routing queries.


1. Core Concept: What Is a Read Replica?

A read replica is a read‑only copy of a primary database instance that is kept up‑to‑date via asynchronous replication. Writes go to the primary; reads can be served by any replica, reducing the load on the primary and increasing overall read throughput.

1.1 Atomic Unit

Start with a single primary‑replica pair:

Primary (writes)  -->  Replica (reads)
        ^                 |
        |  async replication (lag)
        +-----------------+

1.2 Why We Need It

Most web‑scale applications are read‑heavy (often 80‑90% reads). A single primary can become a bottleneck because:

Adding a replica linearly adds read capacity: N replicas give roughly read throughput, while write capacity stays tied to the primary.


2. Identifying the Limitation: Replication Lag

The simple primary‑replica model works until replication lag becomes visible to users. If a replica is behind the primary, a read may return stale data.

2.1 The Pain Point

Consider a social‑media feed: a user posts a photo (write) and immediately tries to view it (read). If the read hits a replica that hasn’t yet applied the upload, the photo appears missing—a confusing experience.

2.2 The Fix: Consistency‑Aware Routing

We add a routing policy that decides, per query, whether to hit the primary or a replica based on the required consistency:

Consistency Need Routing Decision
Strong (must see own writes) Primary
Eventual (can tolerate lag) Any replica
Bounded staleness (lag < X ms) Replica only if its estimated lag ≤ X

This turns the broken state (unaware reads) into a fix: the router inspects each query (or session) and picks the right endpoint.


3. Adding Features: From Basic to Modern Routing

3.1 Application‑Level Splitting

The simplest fix lives inside the service code. The application maintains two connection pools:

# Pseudo‑code
write_pool  = create_pool(primary_url)
read_pool   = create_pool(replica_urls)

def handle_request():
    if request.is_write():
        conn = write_pool.acquire()
    else:
        conn = read_pool.acquire()
    # execute query …

Pros: maximum flexibility—custom lag‑aware selection, session pinning, per‑tenant policies. Cons: every service team must re‑implement and maintain the logic, handle failover, and track topology changes.

3.2 Proxy‑Based Routing

A database middleware (proxy) sits between app and DB, transparently routing writes to the primary and reads to replicas. The app sees a single endpoint; the proxy does the work.

+--------+     +-----------+     +----------+
| App    | --->|   Proxy   | --->| Primary  |
|        |     | (router)  |     | (writes) |
+--------+     +-----------+     +----------+
                         |
                         |   +----------+   +----------+
                         +-->| Replica1 |   | Replica2 |
                             +----------+   +----------+

Popular implementations:

Pros: centralizes routing logic, handles failover, can add observability (metrics on lag, query mix). Cons: extra hop adds latency; requires careful tuning of health checks and lag thresholds.

3.3 DNS‑Based Load Balancing

Instead of a smart proxy, use weighted or latency‑based DNS records (e.g., Amazon Route 53) to distribute read traffic across replicas.

This approach works well when replicas are geographically distributed (cross‑region read replicas). Pros: no extra software layer, leverages existing CDN/DNS infrastructure. Cons: less granular control; difficult to enforce per‑query consistency; DNS TTL changes propagate slowly.

3.4 Hybrid \& Advanced Policies

Modern systems combine the above techniques:

Technique When to Use
Session pinning After a write, pin reads to primary for a short window (e.g., 200 ms) to guarantee read‑your‑writes.
Lag‑aware selection Replica health checks report replication lag; router avoids replicas exceeding a threshold.
Query classification INSERTS/UPDATES/DELETES → primary; SELECT → replica; SELECT … FOR UPDATE → primary (needs lock).
Multi‑tenant routing Different tenants may have different consistency requirements; router respects per‑tenant flags.

These patterns appear in frameworks like DatabaseRouter shown in the Oneuptime blog and the Azure read‑replica documentation.


4. Categorical Chunking: Types of Query Routing Strategies

We group strategies by behavior rather than by implementation detail.

4.1 Consistency‑Driven

4.2 Traffic‑Splitting

4.3 Failover‑Aware

4.4 Operation‑Based


5. Visual Explanation: ASCII \& SVG‑Bobs

Below is an SVG‑Bobb diagram showing a typical three‑replica setup with a proxy router.

+--------+     +-------------+     +--------+
|  App   | --->|  Proxy      | --->| Primary|
| (reads/writes) | (router)  | (writes) |
+--------+     +-------------+     +--------+
                         |
            +------------+------------+------------+
            |            |            |            |
    +-------v----+ +------v------+ +------v------+
    | Replica 1  | | Replica 2   | | Replica 3    |
    | (reads)    | | (reads)     | | (reads)      |
    +------------+ +-------------+ +--------------+

How to read this diagram:


6. Code Walk‑Through: Python Implementation

Let’s look at a practical, production‑ready router that supports lag‑aware, round‑robin replica selection and session pinning.

import time
import random
from typing import List, Tuple
import psycopg2
from psycopg2.extras import RealDictCursor

class ReadReplicaRouter:
    def __init__(
        self,
        primary_dsn: str,
        replica_dsns: List[str],
        max_lag_ms: int = 200,
        pin_after_write_ms: int = 200,
    ):
        self.primary_dsn = primary_dsn
        self.replica_dsns = replica_dsns
        self.max_lag_ms = max_lag_ms
        self.pin_after_write_ms = pin_after_write_ms
        self._last_write_ts = 0.0
        self._replica_index = 0

    # -----------------------------------------------------------------
    # Helper: get current replication lag in seconds from a replica
    # -----------------------------------------------------------------
    def _get_replica_lag(self, dsn: str) -> float:
        try:
            with psycopg2.connect(dsn) as conn:
                with conn.cursor() as cur:
                    # PostgreSQL: pg_last_xact_replay_timestamp() gives replay time
                    cur.execute(
                        "SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))"
                    )
                    lag_sec = cur.fetchone()[^0]
                    return lag_sec * 1000  # to ms
        except Exception:
            return float("inf")  # treat as unhealthy

    # -----------------------------------------------------------------
    # Choose a read connection: respect pinning and lag thresholds
    # -----------------------------------------------------------------
    def _pick_replica(self) -> str:
        now = time.time()
        # If we are within the pin window after a write, force primary
        if now - self._last_write_ts < self.pin_after_write_ms / 1000:
            return self.primary_dsn

        # Filter replicas by lag
        viable = []
        for dsn in self.replica_dsns:
            lag = self._get_replica_lag(dsn)
            if lag <= self.max_lag_ms:
                viable.append((dsn, lag))

        if not viable:
            # Fallback to primary if no replica meets lag requirement
            return self.primary_dsn

        # Round‑robin among viable replicas
        self._replica_index = (self._replica_index + 1) % len(viable)
        return viable[self._replica_index][^0]

    # -----------------------------------------------------------------
    # Public API
    # -----------------------------------------------------------------
    def write(self, sql: str, params: Tuple = ()) -> List[dict]:
        self._last_write_ts = time.time()
        with psycopg2.connect(self.primary_dsn) as conn:
            with conn.cursor(cursor_factory=RealDictCursor) as cur:
                cur.execute(sql, params)
                conn.commit()
                if cur.description:
                    return cur.fetchall()
                return []

    def read(self, sql: str, params: Tuple = ()) -> List[dict]:
        dsn = self._pick_replica()
        with psycopg2.connect(dsn) as conn:
            with conn.cursor(cursor_factory=RealDictCursor) as cur:
                cur.execute(sql, params)
                if cur.description:
                    return cur.fetchall()
                return []

# Usage example
router = ReadReplicaRouter(
    primary_dsn="host=primary.db.internal port=5432 dbname=mydb user=app password=secret",
    replica_dsns=[
        "host=replica1.db.internal port=5432 dbname=mydb user=app password=secret",
        "host=replica2.db.internal port=5432 dbname=mydb user=app password=secret",
        "host=replica3.db.internal port=5432 dbname=mydb user=app password=secret",
    ],
    max_lag_ms=150,
    pin_after_write_ms=200,
)

# Write – goes to primary
router.write(
    "INSERT INTO posts (user_id, content) VALUES (%s, %s)",
    (42, "Hello replicas!")
)

# Read – goes to a lag‑healthy replica (or primary if pinned)
posts = router.read("SELECT * FROM posts WHERE user_id = %s", (42,))

Explanation of the code

  1. Lag detection – each replica reports its replay lag via pg_last_xact_replay_timestamp().
  2. Pinning – after a write, reads are forced to the primary for a configurable window (default 200 ms).
  3. Round‑robin – among replicas that satisfy the lag threshold, we cycle to spread load evenly.
  4. Fallback – if no replica meets the lag requirement, we safely route to the primary to avoid serving stale data.

This example illustrates how the “problem‑solution narrative” plays out in code: we start with a naïve router, notice the lag problem, then add pinning and lag‑aware selection.


7. Trade‑Offs \& Decision Matrix

Strategy Complexity Consistency Control Operational Overhead Best Fit
Application‑level High (per‑service) Fine‑grained (per‑query) High (each team maintains) Custom policies, multi‑tenant SaaS
Proxy‑based Medium (single component) Medium (router‑level) Medium (proxy ops, monitoring) General‑purpose apps wanting transparency
DNS‑based Low (DNS changes) Low (traffic‑only) Low (no extra software) Geo‑distributed read‑only workloads, CDN‑edge
Hybrid (proxy + DNS) Medium‑High High Medium‑High Large scale, strict SLOs, multi‑region

Key take‑aways


8. Historical Context

The read‑replica pattern emerged in the early 2000s as MySQL replication gained popularity. Early websites (e.g., Slashdot, LiveJournal) used master‑slave setups to scale reads. The term “read replica” became mainstream with Amazon RDS (2009) and Google Cloud SQL (2011), which offered managed replication and built‑in read‑endpoint routing.

Over the last decade, the ecosystem matured:

Understanding this history helps us appreciate why modern routers combine lag‑awareness, pinning, and multi‑tier replication.


9. Best‑Practice Checklist

When designing a read‑replica + query‑routing system, run through this checklist:


10. Conclusion

Read replicas turn a single‑node write bottleneck into a horizontally scalable read layer, while query routing decides where each operation lands to balance performance and consistency.

We began with a simple library analogy, identified the core problem (replication lag causing stale reads), and iteratively added features: application‑level splitting, proxy‑based routing, DNS load‑balancing, and advanced policies like session pinning and lag‑aware selection.

By chunking strategies by behavior—consistency‑driven, traffic‑splitting, failover‑aware, operation‑based—we can pick the right tool for the job. Visual diagrams and a concrete Python router illustrate how the ideas become code.

Finally, a historical perspective and a best‑practice checklist equip you to evaluate, implement, and operate a read‑replica system that meets the scalability demands of modern applications without surprising your users with out‑of‑date data.

Remember: the goal isn’t merely to add more copies; it’s to route wisely, so every user gets the data they need, when they need it, while the system stays responsive under load.