API Gateway & Rate Limiting
📖 Concept
An API Gateway is the single entry point for all client requests in a distributed system. It sits between clients and your backend services, handling cross-cutting concerns like authentication, rate limiting, routing, and request transformation.
What an API Gateway Does
| Function | Description |
|---|---|
| Routing | Routes requests to the correct backend service |
| Authentication | Validates tokens/API keys before forwarding |
| Rate Limiting | Prevents abuse and protects backends |
| Load Balancing | Distributes requests across service instances |
| SSL Termination | Handles HTTPS, forwards plain HTTP internally |
| Request Transform | Modifies headers, body, or URL before forwarding |
| Response Aggregation | Combines responses from multiple services |
| Caching | Caches responses to reduce backend load |
| Monitoring | Logs requests, tracks latency, generates metrics |
Rate Limiting Algorithms
1. Token Bucket
- Bucket holds tokens (max capacity = burst limit)
- Tokens added at a fixed rate (e.g., 10/second)
- Each request consumes 1 token
- If bucket is empty, request is rejected
- Pros: Allows bursts up to bucket capacity
- Used by: AWS, Stripe
2. Sliding Window
- Track request count in a rolling time window
- More precise than fixed window (no boundary burst issue)
- Pros: Smooth rate limiting, no boundary spikes
- Used by: Kong, Cloudflare
3. Fixed Window Counter
- Count requests in fixed time windows (e.g., per minute)
- Reset counter at window boundary
- Cons: Double burst at window boundaries (a client can send the full limit at second 59 and again at second 1 of the next window, doubling the effective rate)
- Pros: Simple, memory-efficient
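As a minimal sketch (the class name `FixedWindowCounter` is illustrative, not from the code example below), a fixed window counter only needs one counter and the start of the current window:

```javascript
// Minimal fixed-window counter: maxRequests per window, with windows
// aligned to fixed boundaries (e.g., each whole minute). The counter
// resets whenever a request falls into a new window.
class FixedWindowCounter {
  constructor(windowSizeMs, maxRequests) {
    this.windowSizeMs = windowSizeMs;
    this.maxRequests = maxRequests;
    this.windowStart = 0; // start time of the current window
    this.count = 0;       // requests counted in the current window
  }

  isAllowed(now = Date.now()) {
    // Align to fixed boundaries so all requests share the same windows
    const currentWindow = Math.floor(now / this.windowSizeMs) * this.windowSizeMs;
    if (currentWindow !== this.windowStart) {
      this.windowStart = currentWindow; // new window: reset the counter
      this.count = 0;
    }
    if (this.count >= this.maxRequests) return false;
    this.count++;
    return true;
  }
}

// 3 requests per minute: the 4th request in the same window is rejected
const fw = new FixedWindowCounter(60000, 3);
console.log([fw.isAllowed(0), fw.isAllowed(1), fw.isAllowed(2), fw.isAllowed(3)]);
// → [ true, true, true, false ]
```

The memory efficiency is visible here: one integer per client regardless of traffic, versus one timestamp per request for a sliding window log.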
4. Leaky Bucket
- Requests enter a queue (bucket)
- Processed at a fixed rate (leak rate)
- If bucket is full, new requests are rejected
- Pros: Smooth output rate, no bursts
- Used by: Network traffic shaping
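A rough sketch of the leaky bucket as a meter (names like `LeakyBucket` and `tryAdd` are illustrative; a queue-based variant would instead hold requests and process them at the leak rate):

```javascript
// Minimal leaky bucket (meter variant): the bucket level drains at
// leakRate units per second; each request adds 1 unit; a full bucket
// rejects new requests.
class LeakyBucket {
  constructor(capacity, leakRate, now = Date.now()) {
    this.capacity = capacity; // max units the bucket can hold
    this.leakRate = leakRate; // units drained per second
    this.level = 0;
    this.lastLeakTime = now;
  }

  tryAdd(now = Date.now()) {
    // Drain whatever has leaked out since the last check
    const elapsed = (now - this.lastLeakTime) / 1000;
    this.level = Math.max(0, this.level - elapsed * this.leakRate);
    this.lastLeakTime = now;

    if (this.level + 1 > this.capacity) return false; // bucket full
    this.level += 1;
    return true;
  }
}

// Capacity 2, leaking 1/sec: two immediate requests fit, a third does not
const lb = new LeakyBucket(2, 1, 0);
console.log([lb.tryAdd(0), lb.tryAdd(0), lb.tryAdd(0)]); // → [ true, true, false ]
// After 1 second one unit has drained, so a new request fits again
console.log(lb.tryAdd(1000)); // → true
```

Note the contrast with the token bucket: here the *output* is smoothed to the leak rate, whereas a token bucket deliberately permits bursts up to its capacity.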
Rate Limiting in Distributed Systems
The challenge: with multiple API Gateway instances behind a load balancer, each instance keeps its own counter. A client could send 100 requests to each of 5 instances (500 in total) while every instance believes the client has sent only 100.
Solution: Use a centralized counter in Redis/Memcached that all instances share.
Trade-off: Centralized counting adds a Redis call per request (~0.5ms latency) but ensures accurate global rate limits.
💻 Code Example
```javascript
// ============================================
// API Gateway & Rate Limiting — Implementation
// ============================================

// ---------- Token Bucket Rate Limiter ----------

class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;     // Max tokens (burst limit)
    this.refillRate = refillRate; // Tokens added per second
    this.tokens = capacity;       // Start full
    this.lastRefillTime = Date.now();
  }

  tryConsume(tokens = 1) {
    this.refill();

    if (this.tokens >= tokens) {
      this.tokens -= tokens;
      return {
        allowed: true,
        remainingTokens: Math.floor(this.tokens),
        retryAfterMs: 0,
      };
    }

    // Calculate when enough tokens will be available
    const deficit = tokens - this.tokens;
    const retryAfterMs = Math.ceil(deficit / this.refillRate * 1000);

    return {
      allowed: false,
      remainingTokens: 0,
      retryAfterMs,
    };
  }

  refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefillTime) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillRate
    );
    this.lastRefillTime = now;
  }
}

// ---------- Sliding Window Rate Limiter ----------

class SlidingWindowRateLimiter {
  constructor(windowSizeMs, maxRequests) {
    this.windowSizeMs = windowSizeMs;
    this.maxRequests = maxRequests;
    this.requests = new Map(); // clientId → [timestamps]
  }

  isAllowed(clientId) {
    const now = Date.now();
    const windowStart = now - this.windowSizeMs;

    // Get or create request log for this client
    if (!this.requests.has(clientId)) {
      this.requests.set(clientId, []);
    }

    const clientRequests = this.requests.get(clientId);

    // Remove requests outside the window
    const validRequests = clientRequests.filter(t => t > windowStart);
    this.requests.set(clientId, validRequests);

    if (validRequests.length >= this.maxRequests) {
      const oldestInWindow = validRequests[0];
      const retryAfterMs = oldestInWindow + this.windowSizeMs - now;

      return {
        allowed: false,
        remaining: 0,
        retryAfterMs: Math.ceil(retryAfterMs),
      };
    }

    validRequests.push(now);
    return {
      allowed: true,
      remaining: this.maxRequests - validRequests.length,
      retryAfterMs: 0,
    };
  }
}

// ---------- Distributed Rate Limiter (Redis) ----------

class RedisRateLimiter {
  constructor(redis, windowSizeSeconds, maxRequests) {
    this.redis = redis;
    this.windowSize = windowSizeSeconds;
    this.maxRequests = maxRequests;
  }

  async isAllowed(clientId) {
    const key = `ratelimit:${clientId}`;
    const now = Date.now();

    // Lua script for atomic check-and-increment
    // This runs on the Redis server — no race conditions!
    const luaScript = `
      local key = KEYS[1]
      local window = tonumber(ARGV[1])
      local maxRequests = tonumber(ARGV[2])
      local now = tonumber(ARGV[3])

      -- Remove old entries outside the window
      redis.call('ZREMRANGEBYSCORE', key, 0, now - window * 1000)

      -- Count current entries
      local count = redis.call('ZCARD', key)

      if count < maxRequests then
        -- Add this request
        redis.call('ZADD', key, now, now .. '-' .. math.random(10000))
        redis.call('EXPIRE', key, window)
        return {1, maxRequests - count - 1} -- allowed, remaining
      else
        return {0, 0} -- denied, 0 remaining
      end
    `;

    const result = await this.redis.eval(
      luaScript, 1, key, this.windowSize, this.maxRequests, now
    );

    return {
      allowed: result[0] === 1,
      remaining: result[1],
    };
  }
}

// ---------- API Gateway Middleware ----------

class APIGateway {
  constructor() {
    this.rateLimiters = {
      free: new TokenBucket(100, 100 / 3600),    // 100 req/hour
      pro: new TokenBucket(1000, 1000 / 3600),   // 1000 req/hour
      enterprise: new TokenBucket(100000, 100000 / 3600), // effectively unlimited
    };
    this.routes = new Map();
  }

  registerRoute(path, service) {
    this.routes.set(path, service);
  }

  async handleRequest(req) {
    // 1. Authentication
    const apiKey = req.headers['x-api-key'];
    const client = await this.authenticateClient(apiKey);
    if (!client) {
      return { status: 401, body: { error: 'Invalid API key' } };
    }

    // 2. Rate Limiting
    const limiter = this.rateLimiters[client.tier];
    const result = limiter.tryConsume(1);
    if (!result.allowed) {
      return {
        status: 429,
        headers: {
          'Retry-After': Math.ceil(result.retryAfterMs / 1000),
          'X-RateLimit-Limit': limiter.capacity,
          'X-RateLimit-Remaining': result.remainingTokens,
        },
        body: { error: 'Rate limit exceeded' },
      };
    }

    // 3. Routing
    const service = this.routes.get(req.path);
    if (!service) {
      return { status: 404, body: { error: 'Route not found' } };
    }

    // 4. Forward to backend service
    const startTime = Date.now();
    const response = await service.handle(req);
    const latency = Date.now() - startTime;

    // 5. Add headers and return
    return {
      ...response,
      headers: {
        ...response.headers,
        'X-RateLimit-Remaining': result.remainingTokens,
        'X-Response-Time': `${latency}ms`,
        'X-Request-ID': generateRequestId(),
      },
    };
  }

  async authenticateClient(apiKey) {
    // In production: lookup in Redis/DB
    const clients = {
      'key_free_123': { id: 'client1', tier: 'free' },
      'key_pro_456': { id: 'client2', tier: 'pro' },
    };
    return clients[apiKey] || null;
  }
}

function generateRequestId() {
  return 'req_' + Math.random().toString(36).substring(2, 15);
}

// Demo
const bucket = new TokenBucket(10, 2); // 10 max, 2/sec refill
console.log('Request 1:', bucket.tryConsume());
console.log('Request 2:', bucket.tryConsume());

const slider = new SlidingWindowRateLimiter(60000, 5); // 5 per minute
console.log('Window check:', slider.isAllowed('user123'));
```
🏋️ Practice Exercise
Rate Limiter Comparison: Implement all four rate limiting algorithms (Token Bucket, Sliding Window, Fixed Window, Leaky Bucket) and compare their behavior when a client sends 20 requests in 1 second with a limit of 10/second.
Distributed Rate Limiting: Design a rate limiting system for an API with 10 gateway instances. How do you ensure the global limit of 1000 req/min per client is accurate across all instances?
Tiered Rate Limiting: Design a rate limiting system with per-endpoint limits: `GET /users` allows 1000/min, `POST /users` allows 50/min, `POST /payments` allows 10/min. How do you handle per-user AND per-endpoint limits simultaneously?
API Gateway Design: Design a full API gateway that handles: authentication (API key + JWT), rate limiting, routing to 5 microservices, response caching, and request logging. Draw the architecture diagram.
Graceful Degradation: Your API gateway detects that one backend service is slow (p99 > 5s). Design a circuit breaker pattern that automatically stops routing to the unhealthy service and returns cached/degraded responses.
⚠️ Common Mistakes
Implementing rate limiting per gateway instance instead of globally — with 5 gateway instances, per-instance limits of 100/min effectively allow 500/min globally. Always use a shared counter (Redis).
Using fixed window counters without considering boundary bursts — a client can send 100 requests at second 59 and 100 at second 61, getting 200 in 2 seconds while the 'per minute' limit is 100.
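A quick self-contained sketch of that boundary burst, using a bare fixed-window counter aligned to whole minutes (the `allow` helper and the timestamps are illustrative):

```javascript
// Fixed windows aligned to whole minutes: requests at second 59 land in
// window 0, requests at second 61 land in window 1, so both bursts pass.
const LIMIT = 100;
const WINDOW_MS = 60000;
const counts = new Map(); // window index → request count

function allow(tMs) {
  const win = Math.floor(tMs / WINDOW_MS);
  const n = counts.get(win) || 0;
  if (n >= LIMIT) return false;
  counts.set(win, n + 1);
  return true;
}

let accepted = 0;
for (let i = 0; i < 100; i++) if (allow(59000)) accepted++; // burst at second 59
for (let i = 0; i < 100; i++) if (allow(61000)) accepted++; // burst at second 61
console.log(accepted); // → 200 accepted within 2 seconds, despite a 100/min limit
```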
Not returning proper rate limit headers — clients need `Retry-After`, `X-RateLimit-Limit`, and `X-RateLimit-Remaining` to implement proper backoff. Without these, clients can't adjust their request rate.
Making the API gateway a single point of failure — always deploy multiple gateway instances behind a load balancer. Use health checks and auto-scaling.