
Building Scalable APIs That Don't Fall Over

Most APIs work great with ten users and fall apart with ten thousand. The good news is that the patterns that keep an API fast and reliable under load are well understood — you just have to apply them deliberately. Here are the ones I reach for on almost every project.

1. Cache aggressively, invalidate carefully

The fastest request is the one you never make. Put a cache in front of expensive reads, whether that's an in-memory store like Redis, HTTP caching with proper Cache-Control headers, or a CDN at the edge. Caching itself is rarely the hard part; invalidation is. Prefer short TTLs plus event-driven invalidation (evict or refresh an entry when the underlying data changes) over long TTLs picked by guesswork.
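To make the TTL-plus-invalidation idea concrete, here is a minimal in-process sketch. It is not a substitute for Redis or a CDN; the TTLCache class, its method names, and the "user:1" key format are all hypothetical, chosen only for illustration.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache: entries expire after a TTL or on explicit invalidation."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return the cached value, calling loader() only on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader()  # the expensive read, e.g. a database query
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Event-driven invalidation: call this when the underlying data changes."""
        self._entries.pop(key, None)
```

A write path would call something like cache.invalidate("user:1") right after updating that user, so readers see the change immediately, while the short TTL acts as a safety net for any invalidation event that gets missed.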

2. Always paginate

Never return an unbounded list. A GET /users that returns every row is a time bomb. Use cursor-based pagination for large or frequently changing datasets, where offset-based paging gets slower with depth and can skip or repeat rows as data shifts:

GET /api/users?limit=50&cursor=eyJpZCI6MTI4fQ

{
  "data": [ ... ],
  "next_cursor": "eyJpZCI6MTc4fQ",
  "has_more": true
}
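The cursors in the example above happen to decode as unpadded base64 JSON ({"id":128} and {"id":178}), so one plausible way to produce them is sketched below. Treat it as an illustration under that assumption: the function names are hypothetical, rows are assumed to be sorted by a positive integer id, and the list filter stands in for a real indexed query.

```python
import base64
import json
from typing import Any, Dict, List, Optional


def encode_cursor(last_id: int) -> str:
    """Pack the last-seen id into an opaque, URL-safe token (padding stripped)."""
    raw = json.dumps({"id": last_id}, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")


def decode_cursor(cursor: str) -> int:
    """Reverse encode_cursor, restoring the base64 padding first."""
    padded = cursor + "=" * (-len(cursor) % 4)
    return int(json.loads(base64.urlsafe_b64decode(padded))["id"])


def paginate(rows: List[Dict[str, Any]], limit: int, cursor: Optional[str] = None) -> Dict[str, Any]:
    """Return one page of rows whose id is greater than the cursor's id."""
    after_id = decode_cursor(cursor) if cursor else 0
    # Against a database this would be roughly:
    #   SELECT ... WHERE id > :after_id ORDER BY id LIMIT :limit + 1
    # Fetching one extra row tells us whether another page exists.
    window = [r for r in rows if r["id"] > after_id][: limit + 1]
    page = window[:limit]
    has_more = len(window) > limit
    return {
        "data": page,
        "next_cursor": encode_cursor(page[-1]["id"]) if has_more else None,
        "has_more": has_more,
    }
```

Because each page is defined by "rows after id N" rather than "rows at offset N", inserts and deletes elsewhere in the table do not shift what the next page contains. In a real API you would also validate or sign the cursor, since clients can tamper with it.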

3. Make writes idempotent

Networks fail and clients retry. If a retried POST /payments charges the customer twice, that's a real problem. Accept an Idempotency-Key header and store the result against it so repeated requests return the same response instead of duplicating work.
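As a rough sketch of the mechanism, the store below remembers the first response for each key and replays it on retries. Everything here is a simplifying assumption: IdempotencyStore and run_once are hypothetical names, the storage is an in-memory dict, and a single lock serializes all handlers. A production version would more likely rely on a database unique constraint or similar, and would expire old keys.

```python
import threading
from typing import Any, Callable, Dict


class IdempotencyStore:
    """Remember the response produced for each Idempotency-Key value."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._responses: Dict[str, Any] = {}

    def run_once(self, key: str, handler: Callable[[], Any]) -> Any:
        """Run handler() for a new key; replay the stored response for a repeated key."""
        # Holding the lock across the handler keeps two concurrent retries
        # from both executing it. This is simple but serializes every write.
        with self._lock:
            if key in self._responses:
                return self._responses[key]
            response = handler()
            self._responses[key] = response
            return response
```

The request handler for POST /payments would read the Idempotency-Key header and wrap the charge in run_once, so a client that retries after a dropped connection gets the original response back and the customer is charged once.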

4. Protect yourself with rate limits

Rate limiting isn't only about abuse — it protects your own infrastructure from a single misbehaving client. A token-bucket limiter per API key is simple and effective, and returning 429 with a Retry-After header tells well-behaved clients exactly what to do.
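A token bucket is small enough to sketch in full. The version below is a single-process illustration under stated assumptions: the class names, the (status, headers) return shape, and the capacity and refill numbers are hypothetical, and a multi-instance deployment would need shared state rather than a local dict.

```python
import math
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """Holds up to `capacity` tokens, refilled continuously at a fixed rate."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = capacity  # start full so short bursts are allowed
        self._updated = clock()

    def try_acquire(self) -> Tuple[bool, int]:
        """Spend one token if available. Returns (allowed, retry_after_seconds)."""
        now = self._clock()
        elapsed = now - self._updated
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        self._updated = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True, 0
        # Seconds until one whole token has accumulated, rounded up.
        return False, math.ceil((1 - self._tokens) / self.refill_per_second)


class RateLimiter:
    """One bucket per API key."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        self._capacity = capacity
        self._rate = refill_per_second
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def check(self, api_key: str) -> Tuple[int, Dict[str, str]]:
        """Return (status, headers): 200 if allowed, else 429 with Retry-After."""
        if api_key not in self._buckets:
            self._buckets[api_key] = TokenBucket(self._capacity, self._rate, self._clock)
        allowed, retry_after = self._buckets[api_key].try_acquire()
        if allowed:
            return 200, {}
        return 429, {"Retry-After": str(retry_after)}
```

The capacity sets how large a burst a client may send, and the refill rate sets the sustained throughput. Keeping one bucket per key is what stops a single noisy client from consuming everyone else's budget.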

5. Degrade gracefully

When a downstream dependency is slow or down, fail fast with timeouts and circuit breakers instead of letting requests pile up. Returning a slightly stale cached value — or a partial response — is almost always better than a spinning loader and a thread pool that's run out of threads.
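One way to picture the circuit-breaker half of this is the sketch below. It is a deliberately minimal illustration: CircuitBreaker, its parameters, and the fallback convention are hypothetical, it is not thread-safe, and it assumes the wrapped operation enforces its own timeout (for example through the HTTP client's timeout setting) and raises when that timeout is exceeded.

```python
import time
from typing import Any, Callable, Optional


class CircuitBreaker:
    """Stop calling a failing dependency for a while and serve a fallback instead."""

    def __init__(self, failure_threshold: int, reset_after_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._threshold = failure_threshold
        self._reset_after = reset_after_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, operation: Callable[[], Any], fallback: Callable[[], Any]) -> Any:
        """Run operation(), or fallback() if the circuit is open or the call fails."""
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                return fallback()  # open: fail fast without touching the dependency
            # Cool-down elapsed: fall through and let one trial call probe it.
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            return fallback()
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Here the fallback might return the last cached value or a partial response. The point of the open state is that, once the dependency is known to be unhealthy, requests return immediately instead of each one waiting out a full timeout.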

A resilient API is one that has already decided what to do when things go wrong — before they go wrong.

Wrapping up

None of these techniques are exotic. Apply caching, pagination, idempotency, rate limiting and graceful degradation consistently, measure everything, and your API will keep its composure long after the naive version would have fallen over.
