Rate Limiting in Practice
Rate limiting is not only about blocking abuse. It also protects shared capacity and keeps one noisy client from degrading everyone else.
Common Algorithms
- fixed window
- sliding window / sliding log
- token bucket
- leaky bucket
In interviews, the most practical answer is usually a token bucket or a sliding window, depending on how much fairness and smoothness you need.
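The token bucket is simple enough to sketch in a few lines. The version below is a minimal illustration, not a production implementation: each bucket refills continuously at `rate` tokens per second up to `capacity`, and a request is allowed if it can spend a token. The injectable `clock` parameter is an assumption added here to make the refill logic easy to test.

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an idle client can burst
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The capacity controls burst size while the rate controls sustained throughput, which is why the token bucket is the common answer when bursty-but-bounded traffic is acceptable.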
Where to Apply Limits
- API gateway
- edge / CDN
- application middleware
- per-tenant service boundary
- downstream integration client
Real systems often have multiple limits at different layers.
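As a sketch of that layering, the snippet below chains a coarse per-IP limit (as an edge or gateway might enforce) in front of a finer per-user limit in application middleware. The limiter class, key names, and numbers are illustrative assumptions, not taken from any particular gateway.

```python
from collections import defaultdict

class FixedWindowLimiter:
    """Count requests per key within a numbered time window."""

    def __init__(self, limit: int):
        self.limit = limit
        self.counts = defaultdict(int)  # (key, window) -> request count

    def allow(self, key: str, window: int) -> bool:
        self.counts[(key, window)] += 1
        return self.counts[(key, window)] <= self.limit

# Two layers with different granularity: coarse at the edge, fine in the app.
gateway = FixedWindowLimiter(limit=1000)  # per IP
app = FixedWindowLimiter(limit=10)        # per user

def handle(ip: str, user: str, window: int) -> int:
    # A request must pass every layer; any layer can reject it.
    if not gateway.allow(ip, window):
        return 429
    if not app.allow(user, window):
        return 429
    return 200
```

The point of the layering is that each limit protects against a different failure mode: the edge limit absorbs floods cheaply, while the application limit enforces per-identity policy.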
Useful Dimensions
- per IP
- per user
- per API key
- per tenant
- per route
A single global limit such as 100 req/min is rarely enough by itself; real policies usually combine several of these dimensions.
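In practice these dimensions usually become composite counter keys. The toy sketch below (names and limits are illustrative) keys one shared counter table by user and route together, so each user gets an independent budget per route.

```python
from collections import Counter

counts = Counter()

def allow(user: str, route: str, limit: int) -> bool:
    # Composite key: limit applies per user *and* per route.
    key = f"{user}:{route}"
    counts[key] += 1
    return counts[key] <= limit
```

The same pattern extends to IPs, API keys, or tenants by changing what goes into the key.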
Practical Considerations
- return a clear 429 Too Many Requests, with retry guidance (such as a Retry-After header) when possible
- separate auth endpoints from read endpoints
- use stricter limits on expensive operations
- keep counters in Redis or a gateway-native store
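For Redis-backed counters, a common fixed-window pattern is INCR plus EXPIRE: increment the key, and set its TTL on the first hit of each window. The sketch below assumes `client` exposes Redis-style `incr`/`expire` (redis-py does); the key naming and 60-second window are illustrative assumptions.

```python
def allow_request(client, key: str, limit: int, window_seconds: int = 60) -> bool:
    """Fixed-window counter: INCR the key, start its TTL on the first hit."""
    count = client.incr(key)
    if count == 1:
        # First request in this window: the key expires when the window ends.
        client.expire(key, window_seconds)
    return count <= limit
```

Note this simple form has a known edge: if the process dies between INCR and EXPIRE, the key can linger without a TTL, which is why gateway-native stores or Lua scripts are often preferred in production.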
Interview Answer
How would you implement rate limiting?
Choose the algorithm based on fairness and simplicity, store counters in a low-latency shared store like Redis, and apply limits at the right identity boundary such as user, API key, or tenant. In practice, the hard part is policy design and distributed coordination, not just incrementing a counter.