Khaled Zeitar

Advanced Rate Limiting Patterns: Multi-Tier, Compound Limiters, and Distributed Systems

You've got basic rate limiting working. Your API is protected. Life is good.

Then your product manager walks over: "We need different rate limits for free and paid users." Or your infrastructure team says "We're adding a second server, will your rate limiting still work?" Or you realize that your image processing endpoint needs way stricter limits than your read-only endpoints.

This is where things get interesting. Let's talk about advanced patterns that I've actually used in production.

Different limits for different users

The SaaS problem: free users get 100 requests per hour, paid users get 1000, enterprise gets basically unlimited. How do you handle this?

The naive approach is to check the user's tier and then use an if statement to pick different limits. That works until you have 5 tiers and 10 endpoints, at which point you have 50 limit configurations scattered through your code.

Here's a cleaner way:

import { RateLimiterFactory, InMemoryStorage } from '@zeitar/throttle';

const storage = new InMemoryStorage();
const tiers = {
  free: new RateLimiterFactory({
    policy: 'token_bucket',
    id: 'free-tier',
    limit: 100,
    rate: { interval: '1 hour', amount: 100 }
  }, storage),
  pro: new RateLimiterFactory({
    policy: 'token_bucket',
    id: 'pro-tier',
    limit: 1000,
    rate: { interval: '1 hour', amount: 1000 }
  }, storage),
  enterprise: new RateLimiterFactory({
    policy: 'token_bucket',
    id: 'enterprise-tier',
    limit: 10000,
    rate: { interval: '1 hour', amount: 10000 }
  }, storage)
};

const multiTierRateLimit = async (req, res, next) => {
  const userTier = req.user?.tier || 'free';
  const userId = req.user?.id || req.ip; // fall back to IP for unauthenticated requests
  const factory = tiers[userTier];
  const limiter = factory.create(userId);
  const result = await limiter.consume(1);

  res.setHeader('X-RateLimit-Tier', userTier);
  res.setHeader('X-RateLimit-Remaining', result.getRemainingTokens().toString());

  if (!result.isAccepted()) {
    return res.status(429).json({
      error: 'Rate limit exceeded',
      tier: userTier,
      retryAfter: result.getRetryAfter()
    });
  }

  next();
};

Here's a clever conversion tactic: when free users hit their limit, tell them they can upgrade. This is a natural conversion point:

if (!result.isAccepted() && userTier === 'free') {
  return res.status(429).json({
    error: 'Rate limit exceeded',
    message: 'Upgrade to Pro for 10x higher limits',
    upgradeUrl: 'https://yourapp.com/pricing',
    retryAfter: result.getRetryAfter()
  });
}

In my experience this converts at 2–3%, which is better than most upsell prompts.

Enforcing multiple limits at once

A common production scenario: you need to prevent both burst attacks (someone hammering the API with 100 requests per second) AND sustained abuse (someone making 50,000 requests per day). A single rate limiter can't do both effectively.

The solution is compound limiters — checking multiple limits for every request:

import { CompoundRateLimiterFactory, RateLimiterFactory, InMemoryStorage } from '@zeitar/throttle';

const storage = new InMemoryStorage();

const perSecondFactory = new RateLimiterFactory({
  policy: 'token_bucket',
  id: 'per-second',
  limit: 10,
  rate: { interval: '1 second', amount: 10 }
}, storage);

const perHourFactory = new RateLimiterFactory({
  policy: 'token_bucket',
  id: 'per-hour',
  limit: 1000,
  rate: { interval: '1 hour', amount: 1000 }
}, storage);

const compound = new CompoundRateLimiterFactory([
  perSecondFactory,
  perHourFactory
]);

const limiter = compound.create('user-123');
const result = await limiter.consume(1);

The request only goes through if ALL limiters accept it. If any limiter rejects, the whole request is rejected.
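
Conceptually, a compound limiter is an all-or-nothing check across its children. Here's a sketch of the semantics (this is not the library's internals, just the idea):

async function compoundConsume(
  limiters: { consume(n: number): Promise<{ isAccepted(): boolean }> }[],
  tokens: number
): Promise<boolean> {
  // Consume from every child; accept only if all of them accept.
  const results = await Promise.all(limiters.map((l) => l.consume(tokens)));
  // A real implementation would also refund tokens taken from the
  // children that accepted when any sibling rejects.
  return results.every((r) => r.isAccepted());
}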

This is powerful because you can layer protection. Here's a production-ready configuration:

const apiLimiter = new CompoundRateLimiterFactory([
  // Prevent flooding
  new RateLimiterFactory({
    policy: 'token_bucket',
    id: 'burst',
    limit: 20,
    rate: { interval: '1 second', amount: 20 }
  }, storage),

  // Prevent sustained abuse
  new RateLimiterFactory({
    policy: 'token_bucket',
    id: 'sustained',
    limit: 1000,
    rate: { interval: '1 minute', amount: 100 }
  }, storage),

  // Daily quota
  new RateLimiterFactory({
    policy: 'fixed_window',
    id: 'daily',
    limit: 10000,
    interval: '1 day'
  }, storage)
]);

Three layers of protection:

  • Can't exceed 20 requests per second
  • Can't sustain more than 100 requests per minute (1000 tokens that refill at 100/minute)
  • Daily cap of 10,000 total

Someone trying to abuse your API has to get through all three.

Global limits plus per-user limits

Sometimes you need to protect the entire system AND individual users. Example: your API calls an expensive third-party service that has its own rate limits. You need a global limit to stay under their cap, plus per-user limits for fairness.

const globalFactory = new RateLimiterFactory({
  policy: 'token_bucket',
  id: 'global',
  limit: 10000,
  rate: { interval: '1 second', amount: 10000 }
}, storage);

const perUserFactory = new RateLimiterFactory({
  policy: 'token_bucket',
  id: 'per-user',
  limit: 100,
  rate: { interval: '1 minute', amount: 100 }
}, storage);

const rateLimitMiddleware = async (req, res, next) => {
  const userId = req.user?.id || req.ip;

  // Check global limit first (same identifier for everyone)
  const globalLimiter = globalFactory.create('api');
  const globalResult = await globalLimiter.consume(1);

  if (!globalResult.isAccepted()) {
    return res.status(503).json({
      error: 'Service temporarily unavailable',
      message: 'API is experiencing high traffic',
      retryAfter: globalResult.getRetryAfter()
    });
  }

  // Then check per-user limit
  const userLimiter = perUserFactory.create(userId);
  const userResult = await userLimiter.consume(1);

  if (!userResult.isAccepted()) {
    return res.status(429).json({
      error: 'Rate limit exceeded',
      retryAfter: userResult.getRetryAfter()
    });
  }

  next();
};

Note the different status codes: 503 for global (system issue), 429 for per-user (your issue). This helps clients understand what's happening.
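
Clients can act on that distinction. Here's a quick sketch of consumer-side handling, assuming the JSON error bodies shown above (a production client would cap retries):

async function callApi(url: string): Promise<unknown> {
  const res = await fetch(url);
  if (res.status === 429 || res.status === 503) {
    const body = await res.json();
    // 429: this client is over its own limit; wait the advised time.
    // 503: the whole API is saturated; back off more aggressively.
    const seconds = (body.retryAfter ?? 1) * (res.status === 503 ? 2 : 1);
    await new Promise((resolve) => setTimeout(resolve, seconds * 1000));
    return callApi(url);
  }
  return res.json();
}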

Different costs for different endpoints

Not all endpoints are created equal. Reading a user profile is cheap. Generating a PDF report with 10,000 rows is expensive. You can reflect this in your rate limiting:

const factory = new RateLimiterFactory({
  policy: 'token_bucket',
  id: 'api',
  limit: 1000,
  rate: { interval: '1 hour', amount: 1000 }
}, storage);

// Simple read: 1 token
app.get('/api/user/:id', async (req, res) => {
  const limiter = factory.create(req.user.id);
  const result = await limiter.consume(1);
  // handle result...
});

// Complex search: 10 tokens
app.post('/api/search', async (req, res) => {
  const limiter = factory.create(req.user.id);
  const result = await limiter.consume(10);
  // handle result...
});

// Heavy report: 100 tokens
app.post('/api/reports', async (req, res) => {
  const limiter = factory.create(req.user.id);
  const result = await limiter.consume(100);
  // handle result...
});

Now users have 1000 tokens to spend however they want. They can make 1000 simple requests, or 10 reports, or any combination. This is way more flexible than separate limits per endpoint.
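
To avoid repeating the consume call in every handler, you can wrap the cost in a middleware factory. A sketch, reusing the factory above (costBasedRateLimit is my name, not a library export):

// Hypothetical helper: a middleware factory that charges a fixed
// token cost per route.
const costBasedRateLimit = (cost: number) => async (req, res, next) => {
  const limiter = factory.create(req.user.id);
  const result = await limiter.consume(cost);

  res.setHeader('X-RateLimit-Remaining', result.getRemainingTokens().toString());

  if (!result.isAccepted()) {
    return res.status(429).json({
      error: 'Rate limit exceeded',
      retryAfter: result.getRetryAfter()
    });
  }

  next();
};

app.post('/api/reports', costBasedRateLimit(100), async (req, res) => {
  // generate the report...
});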

The reservation pattern

Here's a powerful pattern that's often overlooked: reservations. Say you have long-running operations like video processing, large file uploads, or batch jobs. You want to rate limit how many jobs a user can start, but you don't want to reject them immediately if they're at their limit — you want to queue them and wait for capacity.

const limiter = factory.create(req.user.id);

// Reserve tokens, wait up to 30 seconds for availability
try {
  const reservation = await limiter.reserve(1, 30);
  const rateLimit = reservation.getRateLimit();

  if (!rateLimit.isAccepted()) {
    return res.status(429).json({
      error: 'Rate limit exceeded',
      message: 'Too many concurrent jobs',
      retryAfter: rateLimit.getRetryAfter()
    });
  }

  // Wait for the reservation (blocks until tokens are available or timeout)
  await reservation.wait();

  // Now we're guaranteed to have the tokens
  res.json({ message: 'Job started', jobId: '...' });
  startJobProcessing(req.body);

} catch (error) {
  // Handle MaxWaitDurationExceededError
  return res.status(429).json({
    error: 'Rate limit exceeded',
    message: 'Unable to acquire tokens within wait time'
  });
}

The key difference between consume() and reserve():

  • consume(): Either gets tokens immediately or fails—perfect for real-time API requests
  • reserve(): Will wait (up to your timeout) for tokens to become available—ideal for job queues and async operations

This is perfect for smoothing out traffic spikes. Instead of rejecting users during peak times, you queue them up and process requests as capacity becomes available. Your users get a better experience (queued instead of rejected), and you get better resource utilization.
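
The same idea works outside HTTP. Here's a sketch of a background worker pacing itself with reserve(), assuming the factory from earlier and a made-up Job shape:

interface Job {
  userId: string;
  run(): Promise<void>;
}

async function processQueue(jobs: AsyncIterable<Job>) {
  for await (const job of jobs) {
    const limiter = factory.create(job.userId);
    try {
      // Wait up to 60 seconds for a token instead of dropping the job.
      const reservation = await limiter.reserve(1, 60);
      await reservation.wait();
      await job.run();
    } catch (err) {
      // MaxWaitDurationExceededError: requeue or dead-letter the job.
      console.error('Could not acquire a token for job', err);
    }
  }
}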

Scaling to multiple servers

Once you have multiple application servers, in-memory storage stops working. Each server has its own memory, so the limits aren't shared. User makes 100 requests to server A, then 100 to server B — they've just bypassed your 100-request limit.

You need shared storage. Redis is the usual choice:

import { StorageInterface, LimiterStateInterface } from '@zeitar/throttle';
import { createClient, RedisClientType } from 'redis';

export class RedisStorage implements StorageInterface {
  private client: RedisClientType;

  constructor(client: RedisClientType) {
    this.client = client;
  }

  async save(state: LimiterStateInterface): Promise<void> {
    const key = state.getId();
    const expirationTime = state.getExpirationTime();
    const serialized = JSON.stringify(state.toJSON());

    if (expirationTime) {
      const now = Math.floor(Date.now() / 1000);
      const ttl = Math.max(1, expirationTime - now);
      await this.client.set(key, serialized, { EX: ttl });
    } else {
      await this.client.set(key, serialized);
    }
  }

  async fetch(id: string): Promise<LimiterStateInterface | null> {
    const data = await this.client.get(id);
    if (!data) return null;

    // Note: The actual implementation would need to deserialize back to
    // the appropriate state class (TokenBucket, Window, etc.)
    // This requires storing type information alongside the state
    const parsed = JSON.parse(data);

    // For a complete implementation, you'd need to reconstruct the proper
    // state object based on stored type information
    return parsed as LimiterStateInterface;
  }

  async delete(id: string): Promise<void> {
    await this.client.del(id);
  }
}

// Usage
const redisClient = createClient({ url: 'redis://localhost:6379' });
await redisClient.connect();

const factory = new RateLimiterFactory(
  {
    policy: 'token_bucket',
    id: 'api',
    limit: 1000,
    rate: { interval: '1 hour', amount: 1000 }
  },
  new RedisStorage(redisClient)
);

Now all your servers share the same rate limit state. Problem solved.
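
One loose end from the fetch() comment above: rehydrating the right state class. Here's a sketch of storing a type tag next to the payload; the fromJSON revivers are my assumption, so adapt this to however @zeitar/throttle actually restores state:

// Type-aware serialization sketch. The fromJSON revivers are assumptions,
// not part of the library's documented API.
type StateReviver = (data: unknown) => LimiterStateInterface;

const revivers: Record<string, StateReviver> = {
  // TokenBucket: (data) => TokenBucket.fromJSON(data),
  // Window: (data) => Window.fromJSON(data),
};

function serialize(state: LimiterStateInterface): string {
  return JSON.stringify({ type: state.constructor.name, data: state.toJSON() });
}

function deserialize(raw: string): LimiterStateInterface | null {
  const { type, data } = JSON.parse(raw);
  const revive = revivers[type];
  return revive ? revive(data) : null;
}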

Adding distributed locking

There's one more thing you need for multi-server deployments: locking. Without it, you can have race conditions where two servers read the same state simultaneously, both think they have tokens available, both decrement, and now you've issued more tokens than you should have.

import { LockInterface } from '@zeitar/throttle';
import { RedisClientType } from 'redis';

export class RedisLock implements LockInterface {
  private client: RedisClientType;
  private lockTimeout: number;

  constructor(client: RedisClientType, lockTimeout = 5000) {
    this.client = client;
    this.lockTimeout = lockTimeout;
  }

  async acquire(key: string): Promise<boolean> {
    const lockKey = `lock:${key}`;
    const acquired = await this.client.set(lockKey, '1', {
      NX: true,  // Only set if not exists
      PX: this.lockTimeout  // Milliseconds
    });
    return acquired !== null;
  }

  async release(key: string): Promise<void> {
    await this.client.del(`lock:${key}`);
  }
}

const factory = new RateLimiterFactory(
  config,
  new RedisStorage(redisClient),
  new RedisLock(redisClient)
);

Now your rate limiting works correctly across any number of servers.
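
One caveat: the release() above deletes the lock unconditionally, so a slow server whose lock already expired can delete a lock now held by someone else. The standard fix is a unique token per holder plus a compare-and-delete script. A sketch (SafeRedisLock is my name, not a library class):

import { RedisClientType } from 'redis';
import { randomUUID } from 'crypto';

// Only the holder that set the token may delete the lock.
const RELEASE_SCRIPT = `
if redis.call("get", KEYS[1]) == ARGV[1] then
  return redis.call("del", KEYS[1])
else
  return 0
end`;

export class SafeRedisLock {
  private tokens = new Map<string, string>();

  constructor(private client: RedisClientType, private lockTimeout = 5000) {}

  async acquire(key: string): Promise<boolean> {
    const token = randomUUID();
    const acquired = await this.client.set(`lock:${key}`, token, {
      NX: true,
      PX: this.lockTimeout
    });
    if (acquired !== null) this.tokens.set(key, token);
    return acquired !== null;
  }

  async release(key: string): Promise<void> {
    const token = this.tokens.get(key);
    if (!token) return;
    this.tokens.delete(key);
    await this.client.eval(RELEASE_SCRIPT, {
      keys: [`lock:${key}`],
      arguments: [token]
    });
  }
}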

Dynamic limits based on user behavior

Here's something I implemented for a client: adjust limits based on how trustworthy a user is.

const createDynamicLimiter = (user: User) => {
  // New accounts get strict limits
  // Accounts older than 90 days with no violations get higher limits
  const isTrusted = user.accountAge > 90 && user.violationCount === 0;

  const limit = isTrusted ? 5000 : 1000;

  return new RateLimiterFactory({
    policy: 'token_bucket',
    id: 'dynamic',
    limit,
    rate: { interval: '1 hour', amount: limit }
  }, storage).create(user.id);
};

app.use(async (req, res, next) => {
  const limiter = createDynamicLimiter(req.user);
  const result = await limiter.consume(1);
  // check result...
});

You could also adjust limits based on signals like these (combined in the sketch after the list):

  • Payment tier (paid users get higher limits automatically)
  • API key vs OAuth (different trust levels)
  • Historical behavior (penalize users who frequently hit limits)
  • Time of day (be more lenient during off-peak hours)
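
Here's a sketch combining a few of these signals into a single hourly limit; the weights and thresholds are illustrative, not recommendations:

interface TrustSignals {
  tier: 'free' | 'pro' | 'enterprise';
  accountAgeDays: number;
  violationCount: number;
  isOffPeak: boolean;
}

const baseLimits = { free: 1000, pro: 5000, enterprise: 50000 };

function computeHourlyLimit(s: TrustSignals): number {
  let limit = baseLimits[s.tier];
  if (s.accountAgeDays < 7) limit = Math.floor(limit * 0.25); // new accounts: strict
  if (s.violationCount > 3) limit = Math.floor(limit * 0.5);  // repeat offenders
  if (s.isOffPeak) limit = Math.floor(limit * 1.5);           // lenient off-peak
  return limit;
}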

Putting it all together

Here's a realistic production setup that combines several patterns:

import { RateLimiterFactory, CompoundRateLimiterFactory } from '@zeitar/throttle';
import { RedisStorage, RedisLock } from './redis-impl';
import { createClient } from 'redis';

const redisClient = createClient({ url: process.env.REDIS_URL });
await redisClient.connect();

const storage = new RedisStorage(redisClient);
const lock = new RedisLock(redisClient);

// Define tier configurations
const createTierLimiter = (tier: 'free' | 'pro' | 'enterprise') => {
  const configs = {
    free: { burst: 10, sustained: 100, daily: 1000 },
    pro: { burst: 50, sustained: 1000, daily: 50000 },
    enterprise: { burst: 200, sustained: 10000, daily: 1000000 }
  };

  const config = configs[tier];

  return new CompoundRateLimiterFactory([
    new RateLimiterFactory({
      policy: 'token_bucket',
      id: `${tier}-burst`,
      limit: config.burst,
      rate: { interval: '1 second', amount: config.burst }
    }, storage, lock),

    new RateLimiterFactory({
      policy: 'token_bucket',
      id: `${tier}-sustained`,
      limit: config.sustained,
      rate: { interval: '1 minute', amount: config.sustained }
    }, storage, lock),

    new RateLimiterFactory({
      policy: 'fixed_window',
      id: `${tier}-daily`,
      limit: config.daily,
      interval: '1 day'
    }, storage, lock)
  ]);
};

const tierLimiters = {
  free: createTierLimiter('free'),
  pro: createTierLimiter('pro'),
  enterprise: createTierLimiter('enterprise')
};

app.use(async (req, res, next) => {
  const tier = req.user?.tier || 'free';
  const compound = tierLimiters[tier];
  const limiter = compound.create(req.user?.id || req.ip);

  try {
    const result = await limiter.consume(1);

    res.setHeader('X-RateLimit-Tier', tier);
    res.setHeader('X-RateLimit-Remaining', result.getRemainingTokens().toString());

    if (!result.isAccepted()) {
      return res.status(429).json({
        error: 'Rate limit exceeded',
        tier,
        retryAfter: result.getRetryAfter()
      });
    }

    next();
  } catch (error) {
    console.error('Rate limiter error:', error);
    next(); // Fail open
  }
});

This gives you:

  • Three tiers with different limits
  • Burst protection (per-second limit)
  • Sustained protection (per-minute limit)
  • Daily quotas
  • Works across multiple servers (Redis)
  • Thread-safe (distributed locking)
  • Fails gracefully if Redis goes down

Performance notes

Some real numbers from production:

In-memory storage: Sub-millisecond latency, about 100 bytes per active limiter. With 1 million active users, you're looking at ~100MB of memory. Very fast, but doesn't work with multiple servers.

Redis (local): 1–5ms latency. You're adding a network hop to every request, but it's usually on the same local network so it's fast.

Redis (cloud): 5–20ms latency depending on distance. Still acceptable for most APIs, but something to keep in mind if you're optimizing for the last bit of performance.

Locking overhead: Adds 1–3ms. Worth it for correctness in multi-server deployments.

All the algorithms are O(1) — constant time regardless of request rate. They don't slow down as you scale.

When things go wrong

Redis goes down: Your rate limiter will throw errors. This is why the "fail open" pattern is critical — log the error but let requests through. Better to temporarily lose rate limiting than to block all traffic.

Lock timeout: If a server crashes while holding a lock, the lock expires after your timeout (5 seconds in the example above). New requests wait briefly, then proceed. Not perfect, but it covers the common failure case.

Misconfigured limits: Setting the daily limit to 100 instead of 10,000 is an easier mistake than you'd think. You'll catch it quickly if your monitoring shows a spike in 429 responses, which is exactly why you need monitoring.
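
A minimal sketch of that monitoring with prom-client (any metrics client works the same way; the metric name is mine):

import { Counter } from 'prom-client';

const rateLimited = new Counter({
  name: 'http_rate_limited_total',
  help: 'Requests rejected with 429, by tier',
  labelNames: ['tier']
});

// Call this wherever your middleware returns a 429:
// rateLimited.inc({ tier: userTier });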

The path forward

Start simple:

  1. Basic rate limiting with Token Bucket and in-memory storage
  2. Add monitoring to see actual usage patterns
  3. Adjust limits based on data
  4. Add tier-based limits when you have paid plans
  5. Move to Redis when you add a second server
  6. Add compound limits if you're seeing specific attack patterns

Don't build the complex version on day one. You won't know what limits make sense until you have real traffic.

Open source contribution

Everything here is built on https://github.com/kzeitar/throttle. If you want to:

  • Add a new storage backend (PostgreSQL, DynamoDB, whatever)
  • Implement a new algorithm
  • Improve the existing code
  • Add examples for your favorite framework

The codebase is designed to be extended. It's TypeScript, well-tested, and the architecture is straightforward. Pull requests welcome.

That's it. You now know more about rate limiting than most backend engineers. Go build something that doesn't fall over when it gets popular.
