
Rate Limiting in Practice

Rate limiting is not only about blocking abuse. It also protects shared capacity and keeps one noisy client from degrading everyone else.


Common Algorithms

  • fixed window
  • sliding window / sliding log
  • token bucket
  • leaky bucket

In interviews, the most practical answer is usually a token bucket or a sliding window, depending on how much burstiness you want to tolerate and how much fairness you need.
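The token bucket mentioned above can be sketched in a few lines. This is a minimal, single-process version for illustration; the class and parameter names are mine, and a production limiter would need shared state and atomicity.

```python
import time

class TokenBucket:
    """Minimal in-memory token bucket (illustrative sketch)."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # max tokens, i.e. allowed burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start full
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The capacity sets the burst you tolerate; the refill rate sets the sustained throughput, which is exactly the fairness/burstiness trade-off interviewers usually probe.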


Where to Apply Limits

  • API gateway
  • edge / CDN
  • application middleware
  • per-tenant service boundary
  • downstream integration client

Real systems often have multiple limits at different layers.
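One way to think about multiple layers is as an ordered chain of checks, where the first layer that refuses wins and its name tells you what to report. A small sketch, with layer names and the helper entirely illustrative:

```python
from typing import Callable, Optional

# Each layer is a (name, allow) pair; layers are checked in order,
# mirroring edge -> gateway -> application in a real deployment.
def check_layers(layers: list[tuple[str, Callable[[], bool]]]) -> Optional[str]:
    """Return the name of the first layer that denies, or None if allowed."""
    for name, allow in layers:
        if not allow():
            return name
    return None
```

In practice each `allow` callable would wrap a real limiter (CDN rule, gateway policy, middleware counter); knowing which layer denied is what makes the 429 debuggable.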


Useful Dimensions

  • per IP
  • per user
  • per API key
  • per tenant
  • per route

A single global limit (say, 100 req/min) is rarely enough by itself; combining dimensions keeps one tenant, key, or route from exhausting shared capacity.
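Combining dimensions usually comes down to how you build the counter key. A sketch of one common layout, where each (dimension, identity, route, window) gets its own counter; the key format here is illustrative, not a standard:

```python
def window_start(now: int, window_seconds: int) -> int:
    """Align a timestamp to the start of its fixed window."""
    return now - (now % window_seconds)

def limit_key(dimension: str, identity: str, route: str, win: int) -> str:
    """Compose a per-(dimension, identity, route, window) counter key,
    e.g. for use as a Redis key. Layout is illustrative."""
    return f"rl:{dimension}:{identity}:{route}:{win}"
```

With keys shaped like this, "per user", "per API key", and "per tenant" are the same mechanism with a different `dimension` and `identity`.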


Practical Considerations

  • return a clear 429 Too Many Requests status
  • include retry guidance (e.g. a Retry-After header) when possible
  • separate auth endpoints from read endpoints
  • use stricter limits on expensive operations
  • keep counters in Redis or a gateway-native store
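The first two points above amount to building a response the client can act on. A sketch of such a 429; Retry-After is a standard HTTP header, while the X-RateLimit-* family is a widespread convention rather than a standard:

```python
import json

def too_many_requests(limit: int, retry_after_seconds: int):
    """Build an illustrative 429 response as (status, headers, body)."""
    headers = {
        "Retry-After": str(retry_after_seconds),  # standard HTTP header
        "X-RateLimit-Limit": str(limit),          # common convention, not standardized
        "Content-Type": "application/json",
    }
    body = json.dumps({"error": "rate_limited",
                       "retry_after": retry_after_seconds})
    return 429, headers, body
```

Well-behaved clients read Retry-After and back off instead of retrying immediately, which is the whole point of returning guidance rather than a bare error.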

Interview Answer

How would you implement rate limiting?

Choose the algorithm based on fairness and simplicity, store counters in a low-latency shared store like Redis, and apply limits at the right identity boundary such as user, API key, or tenant. In practice, the hard part is policy design and distributed coordination, not just incrementing a counter.
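Tying the answer together, here is a fixed-window counter sketch. The dict is an in-memory stand-in for the shared store; against real Redis the get-and-increment would be a single atomic INCR (plus EXPIRE), which is where the distributed-coordination difficulty mentioned above shows up.

```python
class FixedWindowLimiter:
    """Fixed-window counter sketch; `store` stands in for a shared store
    like Redis, where the increment would need to be atomic."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.store: dict[str, int] = {}  # in-memory stand-in, illustrative only

    def allow(self, key: str, now: int) -> bool:
        # One counter per (identity, window-start) pair.
        bucket = f"{key}:{now - now % self.window}"
        count = self.store.get(bucket, 0) + 1  # Redis: INCR bucket; EXPIRE bucket window
        self.store[bucket] = count
        return count <= self.limit
```

The `key` is exactly the identity boundary from the answer: pass a per-user, per-API-key, or per-tenant key and the same code enforces each policy.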
