Retries, Idempotency, and Deduplication

Distributed systems fail in partial ways:

a request succeeds but the response is lost
a timeout fires after the side effect has already occurred
a worker crashes after doing the work but before acknowledging it

In each case the caller cannot tell "it never happened" from "it happened, but I did not hear back". That is why retries are dangerous unless paired with idempotency.

Retries

Retries help with transient failures:

network blips
HTTP 429 (Too Many Requests) responses
short dependency outages

Good retries use:

capped exponential backoff
jitter
clear retryable error rules

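Bad retries (immediate, unbounded, or synchronized across many clients) cause retry storms: the extra traffic lands on a dependency that is already struggling and keeps it down.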
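
The three ingredients of a good retry listed above can be sketched roughly as follows. This is a minimal illustration, not a production client: TransientError, backoff_delay, and call_with_retries are hypothetical names, and the base, cap, and max_attempts values are arbitrary assumptions. The delay uses "full jitter" (a uniform random value between zero and the capped exponential), which is one common choice among several.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are considered safe to retry."""


def backoff_delay(attempt, base=0.1, cap=5.0):
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def call_with_retries(op, max_attempts=5, base=0.1, cap=5.0, sleep=time.sleep):
    """Call op(), retrying only on TransientError, at most max_attempts times."""
    for attempt in range(max_attempts):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(backoff_delay(attempt, base, cap))
```

Any exception other than TransientError propagates immediately, which is the "clear retryable error rules" part: only failures explicitly classified as transient are retried. The sleep parameter is injectable so the loop can be exercised without real waiting.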

Idempotency

An operation is idempotent if performing it several times leaves the system in the same state as performing it once.

Classic example: a payment charge is not naturally idempotent, so the endpoint has to be made idempotent. The customer should not be charged twice just because the client retried.

Pattern:

client sends idempotency key
server stores key + result
repeated request returns prior result instead of duplicating side effects
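
The pattern above might look like this in its simplest form. It is an in-memory, single-threaded sketch under stated assumptions: PaymentService, charge, and the result fields are hypothetical names, and the dict stands in for a durable store. A real service would need the key lookup and the stored result to be written atomically with the side effect (for example in one database transaction), which this sketch does not attempt.

```python
class PaymentService:
    """Toy service illustrating idempotency keys (in-memory, not thread-safe)."""

    def __init__(self):
        self._results = {}  # idempotency key -> stored result
        self.charges = []   # stand-in for the real side effect

    def charge(self, idempotency_key, account, amount_cents):
        # Repeated request: return the prior result, skip the side effect.
        if idempotency_key in self._results:
            return self._results[idempotency_key]

        # First time this key is seen: perform the side effect once.
        self.charges.append((account, amount_cents))
        result = {
            "charge_id": len(self.charges),
            "account": account,
            "amount_cents": amount_cents,
        }
        self._results[idempotency_key] = result
        return result


svc = PaymentService()
first = svc.charge("key-123", "acct-1", 500)
retry = svc.charge("key-123", "acct-1", 500)  # client retried after a timeout
```

Here first and retry are the same stored result, and svc.charges holds a single entry, so the retry did not charge the account again.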

Deduplication

Deduplication is the broader system-level pattern of ensuring repeated messages or jobs do not trigger duplicate work.

Common techniques:

unique DB constraints
processed-message tables
job ids in queues
Redis keys with a TTL (for example SET with the NX and EX options, so only the first writer succeeds)
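
As one illustration, the first two techniques (a unique constraint and a processed-message table) can be combined. The sketch below uses Python's standard sqlite3 module; the table name processed_messages and the helpers make_db and process_once are hypothetical, and an in-memory database is assumed purely for demonstration.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # PRIMARY KEY acts as the unique constraint on message ids.
    conn.execute("CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY)")
    return conn


def process_once(conn, message_id, handler):
    """Run handler() only the first time message_id is seen.

    Returns True if the handler ran, False if the message was a duplicate.
    """
    with conn:  # commits on normal exit, rolls back if an exception escapes
        try:
            conn.execute(
                "INSERT INTO processed_messages (message_id) VALUES (?)",
                (message_id,),
            )
        except sqlite3.IntegrityError:
            return False  # already recorded: skip the duplicate work
        handler()  # if this raises, the INSERT above is rolled back
    return True
```

If the handler fails, the processed-message row is rolled back, so a later redelivery can try again. Note the limit of this sketch: the handler's work is only atomic with the dedup record when that work is itself written in the same database transaction. Side effects outside the database (emails, calls to other services) can still be duplicated if a crash occurs between the side effect and the commit.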

Interview Answer

Why do retries require idempotency?

Because retries are how we survive transient failure, but without idempotency they can duplicate side effects like charges, emails, or state transitions. The safe design is retry with backoff plus an idempotency or deduplication mechanism at the write boundary.