API Gateway & Rate Limiting
📖 Concept
An API Gateway is the single entry point for all client requests in a distributed system. It sits between clients and your backend services, handling cross-cutting concerns like authentication, rate limiting, routing, and request transformation.
What an API Gateway Does
| Function | Description |
|---|---|
| Routing | Routes requests to the correct backend service |
| Authentication | Validates tokens/API keys before forwarding |
| Rate Limiting | Prevents abuse and protects backends |
| Load Balancing | Distributes requests across service instances |
| SSL Termination | Handles HTTPS, forwards plain HTTP internally |
| Request Transform | Modifies headers, body, or URL before forwarding |
| Response Aggregation | Combines responses from multiple services |
| Caching | Caches responses to reduce backend load |
| Monitoring | Logs requests, tracks latency, generates metrics |
Rate Limiting Algorithms
1. Token Bucket
- Bucket holds tokens (max capacity = burst limit)
- Tokens added at a fixed rate (e.g., 10/second)
- Each request consumes 1 token
- If bucket is empty, request is rejected
- Pros: Allows bursts up to bucket capacity
- Used by: AWS, Stripe
2. Sliding Window
- Track request count in a rolling time window
- More precise than fixed window (no boundary burst issue)
- Pros: Smooth rate limiting, no boundary spikes
- Used by: Kong, Cloudflare
3. Fixed Window Counter
- Count requests in fixed time windows (e.g., per minute)
- Reset counter at window boundary
- Cons: Double burst at window boundaries (a client can send the full limit at second 59 and again at second 1 of the next window, doubling the effective rate)
- Pros: Simple, memory-efficient
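As a minimal sketch (the class name `FixedWindowCounter` is illustrative, not from the code example below), a fixed window counter only needs one counter and the start of the current window:

```javascript
// Minimal fixed-window counter: maxRequests per window, with windows
// aligned to fixed boundaries (e.g., each whole minute). The counter
// resets whenever a request falls into a new window.
class FixedWindowCounter {
  constructor(windowSizeMs, maxRequests) {
    this.windowSizeMs = windowSizeMs;
    this.maxRequests = maxRequests;
    this.windowStart = 0; // start time of the current window
    this.count = 0;       // requests counted in the current window
  }

  isAllowed(now = Date.now()) {
    // Align to fixed boundaries so all requests share the same windows
    const currentWindow = Math.floor(now / this.windowSizeMs) * this.windowSizeMs;
    if (currentWindow !== this.windowStart) {
      this.windowStart = currentWindow; // new window: reset the counter
      this.count = 0;
    }
    if (this.count >= this.maxRequests) return false;
    this.count++;
    return true;
  }
}

// 3 requests per minute: the 4th request in the same window is rejected
const fw = new FixedWindowCounter(60000, 3);
console.log([fw.isAllowed(0), fw.isAllowed(1), fw.isAllowed(2), fw.isAllowed(3)]);
// → [ true, true, true, false ]
```

The memory efficiency is visible here: one integer per client regardless of traffic, versus one timestamp per request for a sliding window log.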
4. Leaky Bucket
- Requests enter a queue (bucket)
- Processed at a fixed rate (leak rate)
- If bucket is full, new requests are rejected
- Pros: Smooth output rate, no bursts
- Used by: Network traffic shaping
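A rough sketch of the leaky bucket as a meter (names like `LeakyBucket` and `tryAdd` are illustrative; a queue-based variant would instead hold requests and process them at the leak rate):

```javascript
// Minimal leaky bucket (meter variant): the bucket level drains at
// leakRate units per second; each request adds 1 unit; a full bucket
// rejects new requests.
class LeakyBucket {
  constructor(capacity, leakRate, now = Date.now()) {
    this.capacity = capacity; // max units the bucket can hold
    this.leakRate = leakRate; // units drained per second
    this.level = 0;
    this.lastLeakTime = now;
  }

  tryAdd(now = Date.now()) {
    // Drain whatever has leaked out since the last check
    const elapsed = (now - this.lastLeakTime) / 1000;
    this.level = Math.max(0, this.level - elapsed * this.leakRate);
    this.lastLeakTime = now;

    if (this.level + 1 > this.capacity) return false; // bucket full
    this.level += 1;
    return true;
  }
}

// Capacity 2, leaking 1/sec: two immediate requests fit, a third does not
const lb = new LeakyBucket(2, 1, 0);
console.log([lb.tryAdd(0), lb.tryAdd(0), lb.tryAdd(0)]); // → [ true, true, false ]
// After 1 second one unit has drained, so a new request fits again
console.log(lb.tryAdd(1000)); // → true
```

Note the contrast with the token bucket: here the *output* is smoothed to the leak rate, whereas a token bucket deliberately permits bursts up to its capacity.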
Rate Limiting in Distributed Systems
The challenge: with multiple API Gateway instances behind a load balancer, each instance keeps its own counter. A client could send 100 requests to each of 5 instances (500 in total) while every instance believes the client has sent only 100.
Solution: Use a centralized counter in Redis/Memcached that all instances share.
Trade-off: Centralized counting adds a Redis call per request (~0.5ms latency) but ensures accurate global rate limits.
💻 Code Example
```javascript
// ============================================
// API Gateway & Rate Limiting — Implementation
// ============================================

// ---------- Token Bucket Rate Limiter ----------

class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;     // Max tokens (burst limit)
    this.refillRate = refillRate; // Tokens added per second
    this.tokens = capacity;       // Start full
    this.lastRefillTime = Date.now();
  }

  tryConsume(tokens = 1) {
    this.refill();

    if (this.tokens >= tokens) {
      this.tokens -= tokens;
      return {
        allowed: true,
        remainingTokens: Math.floor(this.tokens),
        retryAfterMs: 0,
      };
    }

    // Calculate when enough tokens will be available
    const deficit = tokens - this.tokens;
    const retryAfterMs = Math.ceil(deficit / this.refillRate * 1000);

    return {
      allowed: false,
      remainingTokens: 0,
      retryAfterMs,
    };
  }

  refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefillTime) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillRate
    );
    this.lastRefillTime = now;
  }
}

// ---------- Sliding Window Rate Limiter ----------

class SlidingWindowRateLimiter {
  constructor(windowSizeMs, maxRequests) {
    this.windowSizeMs = windowSizeMs;
    this.maxRequests = maxRequests;
    this.requests = new Map(); // clientId → [timestamps]
  }

  isAllowed(clientId) {
    const now = Date.now();
    const windowStart = now - this.windowSizeMs;

    // Get or create request log for this client
    if (!this.requests.has(clientId)) {
      this.requests.set(clientId, []);
    }

    const clientRequests = this.requests.get(clientId);

    // Remove requests outside the window
    const validRequests = clientRequests.filter(t => t > windowStart);
    this.requests.set(clientId, validRequests);

    if (validRequests.length >= this.maxRequests) {
      const oldestInWindow = validRequests[0];
      const retryAfterMs = oldestInWindow + this.windowSizeMs - now;

      return {
        allowed: false,
        remaining: 0,
        retryAfterMs: Math.ceil(retryAfterMs),
      };
    }

    validRequests.push(now);
    return {
      allowed: true,
      remaining: this.maxRequests - validRequests.length,
      retryAfterMs: 0,
    };
  }
}

// ---------- Distributed Rate Limiter (Redis) ----------

class RedisRateLimiter {
  constructor(redis, windowSizeSeconds, maxRequests) {
    this.redis = redis;
    this.windowSize = windowSizeSeconds;
    this.maxRequests = maxRequests;
  }

  async isAllowed(clientId) {
    const key = `ratelimit:${clientId}`;
    const now = Date.now();

    // Lua script for atomic check-and-increment
    // This runs on the Redis server — no race conditions!
    const luaScript = `
      local key = KEYS[1]
      local window = tonumber(ARGV[1])
      local maxRequests = tonumber(ARGV[2])
      local now = tonumber(ARGV[3])

      -- Remove old entries outside the window
      redis.call('ZREMRANGEBYSCORE', key, 0, now - window * 1000)

      -- Count current entries
      local count = redis.call('ZCARD', key)

      if count < maxRequests then
        -- Add this request
        redis.call('ZADD', key, now, now .. '-' .. math.random(10000))
        redis.call('EXPIRE', key, window)
        return {1, maxRequests - count - 1} -- allowed, remaining
      else
        return {0, 0} -- denied, 0 remaining
      end
    `;

    const result = await this.redis.eval(
      luaScript, 1, key, this.windowSize, this.maxRequests, now
    );

    return {
      allowed: result[0] === 1,
      remaining: result[1],
    };
  }
}

// ---------- API Gateway Middleware ----------

class APIGateway {
  constructor() {
    this.rateLimiters = {
      free: new TokenBucket(100, 100 / 3600),    // 100 req/hour
      pro: new TokenBucket(1000, 1000 / 3600),   // 1000 req/hour
      enterprise: new TokenBucket(100000, 100000 / 3600), // effectively unlimited
    };
    this.routes = new Map();
  }

  registerRoute(path, service) {
    this.routes.set(path, service);
  }

  async handleRequest(req) {
    // 1. Authentication
    const apiKey = req.headers['x-api-key'];
    const client = await this.authenticateClient(apiKey);
    if (!client) {
      return { status: 401, body: { error: 'Invalid API key' } };
    }

    // 2. Rate Limiting
    const limiter = this.rateLimiters[client.tier];
    const result = limiter.tryConsume(1);
    if (!result.allowed) {
      return {
        status: 429,
        headers: {
          'Retry-After': Math.ceil(result.retryAfterMs / 1000),
          'X-RateLimit-Limit': limiter.capacity,
          'X-RateLimit-Remaining': result.remainingTokens,
        },
        body: { error: 'Rate limit exceeded' },
      };
    }

    // 3. Routing
    const service = this.routes.get(req.path);
    if (!service) {
      return { status: 404, body: { error: 'Route not found' } };
    }

    // 4. Forward to backend service
    const startTime = Date.now();
    const response = await service.handle(req);
    const latency = Date.now() - startTime;

    // 5. Add headers and return
    return {
      ...response,
      headers: {
        ...response.headers,
        'X-RateLimit-Remaining': result.remainingTokens,
        'X-Response-Time': `${latency}ms`,
        'X-Request-ID': generateRequestId(),
      },
    };
  }

  async authenticateClient(apiKey) {
    // In production: lookup in Redis/DB
    const clients = {
      'key_free_123': { id: 'client1', tier: 'free' },
      'key_pro_456': { id: 'client2', tier: 'pro' },
    };
    return clients[apiKey] || null;
  }
}

function generateRequestId() {
  return 'req_' + Math.random().toString(36).substring(2, 15);
}

// Demo
const bucket = new TokenBucket(10, 2); // 10 max, 2/sec refill
console.log('Request 1:', bucket.tryConsume());
console.log('Request 2:', bucket.tryConsume());

const slider = new SlidingWindowRateLimiter(60000, 5); // 5 per minute
console.log('Window check:', slider.isAllowed('user123'));
```
🏋️ Practice Exercise
Rate Limiter Comparison: Implement all four rate limiting algorithms (Token Bucket, Sliding Window, Fixed Window, Leaky Bucket) and compare their behavior when a client sends 20 requests in 1 second with a limit of 10/second.
Distributed Rate Limiting: Design a rate limiting system for an API with 10 gateway instances. How do you ensure the global limit of 1000 req/min per client is accurate across all instances?
Tiered Rate Limiting: Design a rate limiting system with per-endpoint limits: `GET /users` allows 1000/min, `POST /users` allows 50/min, `POST /payments` allows 10/min. How do you handle per-user AND per-endpoint limits simultaneously?
API Gateway Design: Design a full API gateway that handles: authentication (API key + JWT), rate limiting, routing to 5 microservices, response caching, and request logging. Draw the architecture diagram.
Graceful Degradation: Your API gateway detects that one backend service is slow (p99 > 5s). Design a circuit breaker pattern that automatically stops routing to the unhealthy service and returns cached/degraded responses.
⚠️ Common Mistakes
Implementing rate limiting per gateway instance instead of globally — with 5 gateway instances, per-instance limits of 100/min effectively allow 500/min globally. Always use a shared counter (Redis).
Using fixed window counters without considering boundary bursts — a client can send 100 requests at second 59 and 100 at second 61, getting 200 in 2 seconds while the 'per minute' limit is 100.
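A quick self-contained sketch of that boundary burst, using a bare fixed-window counter aligned to whole minutes (the `allow` helper and the timestamps are illustrative):

```javascript
// Fixed windows aligned to whole minutes: requests at second 59 land in
// window 0, requests at second 61 land in window 1, so both bursts pass.
const LIMIT = 100;
const WINDOW_MS = 60000;
const counts = new Map(); // window index → request count

function allow(tMs) {
  const win = Math.floor(tMs / WINDOW_MS);
  const n = counts.get(win) || 0;
  if (n >= LIMIT) return false;
  counts.set(win, n + 1);
  return true;
}

let accepted = 0;
for (let i = 0; i < 100; i++) if (allow(59000)) accepted++; // burst at second 59
for (let i = 0; i < 100; i++) if (allow(61000)) accepted++; // burst at second 61
console.log(accepted); // → 200 accepted within 2 seconds, despite a 100/min limit
```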
Not returning proper rate limit headers — clients need `Retry-After`, `X-RateLimit-Limit`, and `X-RateLimit-Remaining` to implement proper backoff. Without these, clients can't adjust their request rate.
Making the API gateway a single point of failure — always deploy multiple gateway instances behind a load balancer. Use health checks and auto-scaling.