Replication and Failover
Replication improves availability and read scale. It does not magically make writes free or consistency perfect.
Replication Basics
Primary database:
- accepts writes
- often serves some reads
Replicas:
- receive changes from the primary
- usually serve reads
- may be promoted during failover
Synchronous vs Asynchronous
Synchronous replication
Primary waits for replica acknowledgement before confirming commit.
Pros:
- stronger durability guarantees
Cons:
- higher write latency
- lower write availability: commits can stall or fail if a required replica is unhealthy
Asynchronous replication
Primary commits locally and replicas catch up later.
Pros:
- lower write latency
Cons:
- replica lag
- possible data loss on failover: writes the replica has not yet received are lost
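The difference between the two modes can be sketched with a toy in-memory model. This is an illustration only, not how any real database implements replication: `ToyCluster`, `Node`, and the `pending` queue are hypothetical names, and the synchronous path assumes the replica is always reachable.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy database node that stores committed writes in order."""
    log: list = field(default_factory=list)


class ToyCluster:
    """Hypothetical in-memory model of one primary and one replica."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary = Node()
        self.replica = Node()
        self.pending = []  # committed on the primary, not yet on the replica

    def write(self, value: str) -> None:
        self.primary.log.append(value)
        if self.synchronous:
            # Synchronous: the commit is confirmed only once the replica has it.
            self.replica.log.append(value)
        else:
            # Asynchronous: confirm immediately, ship to the replica later.
            self.pending.append(value)

    def replicate(self) -> None:
        """Ship all pending writes to the replica (asynchronous catch-up)."""
        self.replica.log.extend(self.pending)
        self.pending.clear()

    def failover(self) -> list:
        """Promote the replica and return the writes lost with the old primary."""
        lost = list(self.pending)
        self.pending.clear()
        self.primary = self.replica
        self.replica = Node()
        return lost


cluster = ToyCluster(synchronous=False)
cluster.write("order-1")
cluster.replicate()        # replica has caught up with order-1
cluster.write("order-2")   # confirmed to the client, not yet replicated
lost = cluster.failover()  # order-2 was in the lag window and is lost
```

With `synchronous=True` the same sequence loses nothing, because `pending` never holds a confirmed write. The price in a real system is that each commit waits on a network round trip to the replica.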
Read Replicas
Read replicas help when reads dominate and your application can tolerate some staleness.
Common issues:
- a user writes data, then immediately reads a stale value from a replica
- analytics query overloads a replica
- lag spikes under write bursts
Typical fix for read-after-write paths:
- route those reads to primary for a short time window
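A minimal sketch of that routing rule, assuming the application tracks the last write time per user. `ReadRouter`, `pin_seconds`, and the `"primary"` / `"replica"` labels are hypothetical names chosen for illustration; a real application would return a connection or pool rather than a string.

```python
import time


class ReadRouter:
    """Routes a user's reads to the primary for a short window after their write."""

    def __init__(self, pin_seconds: float = 2.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds
        self.clock = clock          # injectable so the window can be tested
        self._last_write = {}       # user_id -> time of most recent write

    def record_write(self, user_id: str) -> None:
        """Call after every successful write made on behalf of user_id."""
        self._last_write[user_id] = self.clock()

    def target_for_read(self, user_id: str) -> str:
        """Return "primary" inside the pin window, otherwise "replica"."""
        last = self._last_write.get(user_id)
        if last is not None and self.clock() - last < self.pin_seconds:
            return "primary"
        return "replica"


router = ReadRouter(pin_seconds=2.0)
router.record_write("user-42")
target = router.target_for_read("user-42")  # "primary" while the window is open
```

The window only helps if it is longer than the replica lag actually observed, so a fixed value like this is a guess that should be checked against lag metrics. Pinning is per user, so other users' reads still go to replicas.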
Failover
Failover means promoting a replica when the primary becomes unavailable.
Questions to ask:
- how is failure detected?
- who promotes the replica?
- how fast do clients reconnect?
- what happens to in-flight writes?
Database failover is not finished when the replica is promoted. The app must also reconnect cleanly.
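One way the application side might handle this is to drop the broken connection and retry with backoff until the new primary is reachable. The sketch below is an assumption-laden illustration: `run_with_reconnect`, `connect`, and `operation` are hypothetical names, and it assumes the driver signals a lost connection with Python's built-in `ConnectionError`, which real database drivers often do not (they raise their own exception types).

```python
import time


def run_with_reconnect(connect, operation, attempts=5, base_delay=0.1,
                       sleep=time.sleep):
    """Run operation(conn), reconnecting with exponential backoff.

    connect:   callable returning a new connection (hypothetical)
    operation: callable taking a connection and returning a result
    """
    conn = None
    last_error = None
    for attempt in range(attempts):
        try:
            if conn is None:
                conn = connect()
            return operation(conn)
        except ConnectionError as exc:
            last_error = exc
            conn = None  # discard it; it may still point at the old primary
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

Retrying is only safe when `operation` is idempotent. An in-flight write may have committed on the old primary before the connection dropped, so blindly repeating it can apply the change twice.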
Interview Answer
Why use replication?
To improve availability, disaster recovery posture, and read scalability.
What is the main tradeoff?
The core tradeoff is consistency versus latency and availability. Asynchronous replication is faster but allows replica lag and some risk of data loss on failover, while synchronous replication reduces that risk at the cost of higher write latency and lower write availability when a replica is unhealthy.