API Rate Limiting Done Right: Algorithms and Implementation

Every production API needs rate limiting. Without it, a single bad actor — or a misconfigured client — can bring your service down. But not all rate limiting is equal.

Why Rate Limit?

Prevent abuse — stop DDoS and scraping
Fair usage — ensure no single user monopolizes resources
Cost control — downstream services (databases, third-party APIs) have limits too
Stability — protect against traffic spikes

Algorithm 1: Fixed Window

The simplest approach. Count requests per fixed time window (e.g., per minute).

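// Assumption: the Redis type and lowercase command names used throughout match an ioredis-style client.
import Redis from "ioredis";

async function fixedWindow(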
  key: string,
  limit: number,
  windowMs: number,
  redis: Redis
): Promise<boolean> {
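  // All requests in the same window share one counter key.
  const window = Math.floor(Date.now() / windowMs);
  const redisKey = `rl:${key}:${window}`;

  const count = await redis.incr(redisKey);
  if (count === 1) {
    // First hit in this window: set a TTL so the counter cleans itself up.
    // INCR and PEXPIRE are separate commands; a crash between them leaves a key without a TTL.
    await redis.pexpire(redisKey, windowMs);
  }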

  return count <= limit;
}

Problem: bursts at window boundaries. If the limit is 100/minute, a user can send 100 requests at 0:59 and 100 more at 1:00. That is 200 requests in about one second, twice the intended rate.
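To see the boundary burst concretely, here is a minimal in-memory sketch of the same bucketing logic. A Map stands in for Redis and the timestamp is passed in explicitly so the result is deterministic. The names makeFixedWindowCounter and allowAt are hypothetical, introduced only for this illustration.

```typescript
// In-memory stand-in for the Redis counter (illustration only, not production code).
function makeFixedWindowCounter(limit: number, windowMs: number) {
  const counts = new Map<number, number>();

  // Returns true if a request arriving at `nowMs` fits in its window's budget.
  return function allowAt(nowMs: number): boolean {
    const bucket = Math.floor(nowMs / windowMs);
    const count = (counts.get(bucket) ?? 0) + 1;
    counts.set(bucket, count);
    return count <= limit;
  };
}

const allowAt = makeFixedWindowCounter(100, 60000);

let acceptedInBurst = 0;
// 100 requests in the last second of one window...
for (let i = 0; i < 100; i++) if (allowAt(59000 + i)) acceptedInBurst++;
// ...and 100 more at the start of the next window.
for (let i = 0; i < 100; i++) if (allowAt(60000 + i)) acceptedInBurst++;

console.log(acceptedInBurst); // 200, all accepted within about 1.1 seconds
```

Each window on its own stays within the limit, which is exactly why the counter never notices the burst.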

Algorithm 2: Sliding Window Log

Track the timestamp of every request:

async function slidingWindowLog(
  key: string,
  limit: number,
  windowMs: number,
  redis: Redis
): Promise<boolean> {
  const now = Date.now();
  const windowStart = now - windowMs;
  const redisKey = `rl:${key}`;

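  const results = await redis
    .multi()
    .zremrangebyscore(redisKey, 0, windowStart) // Remove entries older than the window
    .zadd(redisKey, now, `${now}:${Math.random()}`) // Add current request (random suffix keeps members unique)
    .zcard(redisKey) // Count entries left in the window
    .pexpire(redisKey, windowMs)
    .exec();

  // exec() resolves to one [error, result] pair per queued command; ZCARD is the third.
  if (!results) throw new Error("Redis transaction was aborted");
  const [zcardError, zcardResult] = results[2];
  if (zcardError) throw zcardError;

  const count = Number(zcardResult);
  return count <= limit;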
}

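Accurate but memory-heavy: it stores one sorted-set entry per request, so memory per key grows with the limit. As written, rejected requests are logged too, so a client that keeps retrying while over the limit stays over it.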

Algorithm 3: Sliding Window Counter

The sweet spot. Combines fixed window efficiency with sliding window accuracy:

async function slidingWindowCounter(
  key: string,
  limit: number,
  windowMs: number,
  redis: Redis
): Promise<boolean> {
  const now = Date.now();
  const currentWindow = Math.floor(now / windowMs);
  const previousWindow = currentWindow - 1;
  const elapsed = (now % windowMs) / windowMs; // 0.0 to 1.0

  const [currentCount, previousCount] = await Promise.all([
    redis.get(`rl:${key}:${currentWindow}`).then(Number),
    redis.get(`rl:${key}:${previousWindow}`).then(Number),
  ]);

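  // Weight the previous window by the fraction of it that still overlaps the sliding window.
  const estimatedCount =
    previousCount * (1 - elapsed) + currentCount;

  if (estimatedCount >= limit) return false;

  // The reads above and this increment are separate round trips, so
  // concurrent requests can slip slightly past the limit.
  await redis.incr(`rl:${key}:${currentWindow}`);
  await redis.pexpire(`rl:${key}:${currentWindow}`, windowMs * 2);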

  return true;
}

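Low memory (two counters per key) and no doubling at window boundaries. The count is an estimate that assumes requests in the previous window were evenly spread, which is accurate enough for most APIs.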
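The weighting is easier to check with numbers. The sketch below pulls the arithmetic into a pure function (estimateSlidingCount is a hypothetical name, not part of the code above) and runs it on two made-up cases, assuming a limit of 100/minute.

```typescript
// Pure version of the weighting step.
// `elapsedFraction` is how far we are into the current window (0 = just started, 1 = about to end).
function estimateSlidingCount(
  previousCount: number,
  currentCount: number,
  elapsedFraction: number
): number {
  return previousCount * (1 - elapsedFraction) + currentCount;
}

// 15 seconds into the current minute: the previous minute saw 80 requests,
// the current minute has seen 30 so far.
const estimate = estimateSlidingCount(80, 30, 0.25);
console.log(estimate); // 80 * 0.75 + 30 = 90, under 100, so the request is allowed

// Boundary case: 100 requests in the previous minute, now 1 second into the new one.
const nearBoundary = estimateSlidingCount(100, 0, 1 / 60);
console.log(nearBoundary.toFixed(1)); // about 98.3, so only a couple more requests fit
```

In the second case a fixed window would hand out a fresh budget of 100; here the previous window still counts for almost all of its weight.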

Algorithm 4: Token Bucket

Best for allowing controlled bursts:

async function tokenBucket(
  key: string,
  capacity: number,     // Max burst size
  refillRate: number,   // Tokens per second
  redis: Redis
): Promise<boolean> {
  const now = Date.now();
  const redisKey = `rl:${key}`;

  // hgetall returns an empty object for a missing key, so a new bucket starts full.
  const data = await redis.hgetall(redisKey);
  let tokens = parseFloat(data.tokens ?? capacity.toString());
  let lastRefill = parseInt(data.lastRefill ?? now.toString(), 10);

  // Add tokens based on elapsed time
  const elapsed = (now - lastRefill) / 1000;
  tokens = Math.min(capacity, tokens + elapsed * refillRate);

  if (tokens < 1) return false;

  tokens -= 1;

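  // This read-modify-write is not atomic; concurrent requests can overspend
  // unless the whole step runs in a Lua script or transaction.
  await redis.hset(redisKey, {
    tokens: tokens.toString(),
    lastRefill: now.toString(),
  });
  // PEXPIRE requires an integer; keep the key until the bucket would have refilled completely.
  await redis.pexpire(redisKey, Math.ceil((capacity / refillRate) * 1000) + 1000);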

  return true;
}

Use token bucket when you want to allow bursts up to a maximum, then throttle to a steady rate.
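The burst-then-throttle behaviour can be traced with a pure version of the refill step. This is a sketch under assumptions: BucketState and takeToken are hypothetical names, state lives in a local variable instead of Redis, and time is passed in explicitly.

```typescript
interface BucketState {
  tokens: number;
  lastRefillMs: number;
}

// Refill based on elapsed time, then try to take one token.
// Returns the decision together with the next state.
function takeToken(
  state: BucketState,
  capacity: number,
  refillPerSecond: number,
  nowMs: number
): { allowed: boolean; state: BucketState } {
  const elapsedSeconds = (nowMs - state.lastRefillMs) / 1000;
  const tokens = Math.min(capacity, state.tokens + elapsedSeconds * refillPerSecond);
  // On rejection the old state is kept, so the refill keeps accruing from lastRefillMs.
  if (tokens < 1) return { allowed: false, state };
  return { allowed: true, state: { tokens: tokens - 1, lastRefillMs: nowMs } };
}

// Capacity 5, refilling 1 token per second; 8 requests arrive at the same instant.
let bucket: BucketState = { tokens: 5, lastRefillMs: 0 };
const decisions: boolean[] = [];
for (let i = 0; i < 8; i++) {
  const result = takeToken(bucket, 5, 1, 0);
  bucket = result.state;
  decisions.push(result.allowed);
}
console.log(decisions); // first 5 are true (the burst), the remaining 3 are false

// Two seconds later, two tokens have refilled.
const later = takeToken(bucket, 5, 1, 2000);
console.log(later.allowed, later.state.tokens); // true 1
```

Capacity sets how large a burst can be; the refill rate sets the long-run average once the burst is spent.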

Which Algorithm to Pick

Algorithm	Accuracy	Memory	Burst Protection	Complexity
Fixed Window	Low	Very Low	❌	Simple
Sliding Log	High	High	✅	Medium
Sliding Counter	Good	Low	✅	Medium
Token Bucket	Good	Low	Controlled	Medium

Default choice: Sliding Window Counter. It’s the best balance of accuracy, memory, and simplicity.

HTTP Headers

Always return rate limit info in headers:

res.set({
  "X-RateLimit-Limit": limit.toString(),
  "X-RateLimit-Remaining": Math.max(0, remaining).toString(),
  "X-RateLimit-Reset": resetTime.toString(),
});

if (!allowed) {
  // Retry-After belongs only on the 429 response; its value is in seconds.
  res.set("Retry-After", retryAfterSeconds.toString());
  return res.status(429).json({
    error: "Too Many Requests",
    retryAfter: retryAfterSeconds,
  });
}
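The snippet above leaves open where remaining, resetTime, and retryAfterSeconds come from. One way to derive them, sketched here for the fixed-window scheme only, is shown below. The helper name fixedWindowInfo is hypothetical, and reporting X-RateLimit-Reset as a Unix timestamp in seconds is an assumed convention; these headers are not standardized, and some APIs send seconds remaining instead.

```typescript
interface RateLimitInfo {
  allowed: boolean;
  remaining: number;
  resetTime: number;         // Unix time in seconds when the current window ends (assumed convention)
  retryAfterSeconds: number; // whole seconds until the window resets
}

// `count` is the value returned by INCR for the current window, including this request.
function fixedWindowInfo(
  count: number,
  limit: number,
  windowMs: number,
  nowMs: number
): RateLimitInfo {
  const windowEndMs = (Math.floor(nowMs / windowMs) + 1) * windowMs;
  return {
    allowed: count <= limit,
    remaining: Math.max(0, limit - count),
    resetTime: Math.ceil(windowEndMs / 1000),
    // Round up and never report 0, so clients do not retry before the window rolls over.
    retryAfterSeconds: Math.max(1, Math.ceil((windowEndMs - nowMs) / 1000)),
  };
}

// The 101st request, 45 seconds into a one-minute window with a limit of 100.
const limitInfo = fixedWindowInfo(101, 100, 60000, 45000);
console.log(limitInfo);
// allowed: false, remaining: 0, resetTime: 60, retryAfterSeconds: 15
```

For the sliding-window and token-bucket variants the reset time is less clear-cut, so the values would need to be derived from those algorithms' own state.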

Multi-Tier Rate Limiting

Production APIs often need multiple layers:

const rateLimiters = [
  { key: (req) => req.ip, limit: 1000, window: "1m" },          // Per IP
  { key: (req) => req.user?.id, limit: 100, window: "1m" },     // Per user (undefined when anonymous: skip the tier)
  { key: (req) => req.user && `${req.user.id}:${req.path}`, limit: 20, window: "1m" }, // Per user and endpoint
];

A request must pass every tier, so whichever limit is exhausted first is the one that rejects it. This prevents both global abuse and targeted endpoint hammering.
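A minimal sketch of how such tiers might be evaluated follows. Everything here is an assumption made for illustration: the RequestLike and Tier shapes, the label field, windows given in milliseconds instead of "1m" strings, and checkTier, a synchronous in-memory fixed-window counter standing in for the Redis-backed limiters.

```typescript
interface RequestLike {
  ip: string;
  path: string;
  user?: { id: string };
}

interface Tier {
  label: string;
  key: (req: RequestLike) => string | undefined; // undefined means the tier does not apply
  limit: number;
  windowMs: number;
}

const tiers: Tier[] = [
  { label: "ip", key: (req) => req.ip, limit: 1000, windowMs: 60000 },
  { label: "user", key: (req) => req.user?.id, limit: 100, windowMs: 60000 },
  {
    label: "endpoint",
    // Skip anonymous requests instead of building a shared "undefined:/path" key.
    key: (req) => (req.user ? `${req.user.id}:${req.path}` : undefined),
    limit: 20,
    windowMs: 60000,
  },
];

// Hypothetical in-memory counter; a real deployment would call a Redis-backed limiter here.
const tierCounts = new Map<string, number>();
function checkTier(key: string, limit: number, windowMs: number, nowMs: number): boolean {
  const bucketKey = `${key}:${Math.floor(nowMs / windowMs)}`;
  const count = (tierCounts.get(bucketKey) ?? 0) + 1;
  tierCounts.set(bucketKey, count);
  return count <= limit;
}

// Returns the label of the first tier that rejects, or null if every tier allows the request.
function firstRejectingTier(req: RequestLike, nowMs: number): string | null {
  for (const tier of tiers) {
    const key = tier.key(req);
    if (key === undefined) continue;
    if (!checkTier(`${tier.label}:${key}`, tier.limit, tier.windowMs, nowMs)) return tier.label;
  }
  return null;
}

const sampleRequest: RequestLike = { ip: "203.0.113.7", path: "/search", user: { id: "u1" } };
let rejectedBy: string | null = null;
for (let i = 0; i < 21; i++) rejectedBy = firstRejectingTier(sampleRequest, 0);
console.log(rejectedBy); // endpoint (the 21st call exceeds the 20/minute endpoint tier)
```

Prefixing each key with the tier label keeps a user ID from colliding with an IP or endpoint key that happens to have the same text.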