logodev atlas
2 min read

Flaky Test Debugging

A flaky test is one that fails without a code change.

That is not just annoying. It erodes trust in the whole test suite.


Common Causes

  • timing assumptions
  • shared mutable state
  • order dependence
  • leaked network or DB data
  • background jobs still running
  • clock and timezone issues
  • test data collisions in parallel runs

Debugging Workflow

1. Reproduce reliably

Run the same test repeatedly:

bashfor i in {1..50}; do npm test -- flaky.spec.ts || break; done

2. Check isolation

Does it fail only:

  • after another test
  • in parallel
  • in CI
  • on one OS or timezone

3. Inspect nondeterminism

Look for:

  • Date.now()
  • random IDs
  • retries
  • asynchronous cleanup
  • race conditions

4. Add observability

Log enough to see ordering, state, and timing.


Fast Fixes That Often Work

  • wait on user-visible state, not arbitrary sleeps
  • reset DB / cache state between tests
  • seed deterministic timestamps and randomness
  • avoid global singleton mutation
  • make test data unique per run

What Not to Do

  • sprinkle sleep(1000) everywhere
  • mark flaky tests as retried forever and move on
  • ignore the issue because "CI eventually passes"

Retries can reduce noise temporarily, but they are not the root fix.


Interview Answer

How do you debug flaky tests?

First reproduce the flake repeatedly, then narrow whether it is caused by timing, shared state, ordering, or environment differences. Add enough visibility to observe the race, fix the nondeterminism directly, and use retries only as a short-term containment step.

[prev·next]