API Gateway & Rate Limiting


📖 Concept

An API Gateway is the single entry point for all client requests in a distributed system. It sits between clients and your backend services, handling cross-cutting concerns like authentication, rate limiting, routing, and request transformation.

What an API Gateway Does

  • Routing: routes requests to the correct backend service
  • Authentication: validates tokens/API keys before forwarding
  • Rate Limiting: prevents abuse and protects backends
  • Load Balancing: distributes requests across service instances
  • SSL Termination: handles HTTPS, forwards plain HTTP internally
  • Request Transform: modifies headers, body, or URL before forwarding
  • Response Aggregation: combines responses from multiple services
  • Caching: caches responses to reduce backend load
  • Monitoring: logs requests, tracks latency, generates metrics

Rate Limiting Algorithms

1. Token Bucket

  • Bucket holds tokens (max capacity = burst limit)
  • Tokens added at a fixed rate (e.g., 10/second)
  • Each request consumes 1 token
  • If bucket is empty, request is rejected
  • Pros: Allows bursts up to bucket capacity
  • Used by: AWS, Stripe

2. Sliding Window

  • Track request count in a rolling time window
  • More precise than fixed window (no boundary burst issue)
  • Pros: Smooth rate limiting, no boundary spikes
  • Used by: Kong, Cloudflare

3. Fixed Window Counter

  • Count requests in fixed time windows (e.g., per minute)
  • Reset counter at window boundary
  • Cons: Double-burst at window boundaries (59th second + 1st second)
  • Pros: Simple, memory-efficient
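As a rough sketch (class and parameter names here are illustrative, not from any library), a fixed-window counter needs only one counter and one timestamp:

```javascript
// Minimal fixed-window counter (illustrative sketch, not production code).
class FixedWindowCounter {
  constructor(windowSizeMs, maxRequests) {
    this.windowSizeMs = windowSizeMs;
    this.maxRequests = maxRequests;
    this.windowStart = Date.now();
    this.count = 0;
  }

  isAllowed() {
    const now = Date.now();
    // Reset the counter when the current window has elapsed
    if (now - this.windowStart >= this.windowSizeMs) {
      this.windowStart = now;
      this.count = 0;
    }
    if (this.count < this.maxRequests) {
      this.count++;
      return true;
    }
    return false;
  }
}

const fw = new FixedWindowCounter(60000, 3); // 3 requests per minute
console.log(fw.isAllowed(), fw.isAllowed(), fw.isAllowed(), fw.isAllowed());
// → true true true false
```

Storing a single counter per client (rather than per-request timestamps, as the sliding window does) is exactly why this approach is so memory-efficient.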

4. Leaky Bucket

  • Requests enter a queue (bucket)
  • Processed at a fixed rate (leak rate)
  • If bucket is full, new requests are rejected
  • Pros: Smooth output rate, no bursts
  • Used by: Network traffic shaping
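The queue-and-drain behavior can be sketched as follows (a minimal illustration; `LeakyBucket` and its parameter names are our own, and a real gateway would hold and process the queued requests rather than just track the queue depth):

```javascript
// Minimal leaky bucket (illustrative sketch). Only the queue depth is
// tracked here; requests "leak" out at a fixed rate.
class LeakyBucket {
  constructor(capacity, leakRatePerSec) {
    this.capacity = capacity;             // Max queued requests
    this.leakRatePerSec = leakRatePerSec; // Requests processed per second
    this.water = 0;                       // Current queue depth
    this.lastLeakTime = Date.now();
  }

  tryAdd() {
    this.leak();
    if (this.water < this.capacity) {
      this.water++;
      return true;  // Request queued
    }
    return false;   // Bucket full: reject
  }

  leak() {
    const now = Date.now();
    const elapsedSec = (now - this.lastLeakTime) / 1000;
    const leaked = Math.floor(elapsedSec * this.leakRatePerSec);
    if (leaked > 0) {
      this.water = Math.max(0, this.water - leaked);
      // Advance only by the time actually accounted for, so fractional
      // progress toward the next leak is not lost
      this.lastLeakTime += (leaked / this.leakRatePerSec) * 1000;
    }
  }
}

const lb = new LeakyBucket(3, 1); // queue of 3, drains 1 request/sec
console.log(lb.tryAdd(), lb.tryAdd(), lb.tryAdd(), lb.tryAdd());
// → true true true false
```

Unlike the token bucket, the output rate is perfectly smooth: even if three requests arrive at once, they drain at one per second.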

Rate Limiting in Distributed Systems

The challenge: with multiple API Gateway instances behind a load balancer, each instance keeps its own counter. A client could send 100 requests to each of 5 instances, 500 in total, while every instance sees only 100 and considers the client within the limit.

Solution: Use a centralized counter in Redis/Memcached that all instances share.

Trade-off: Centralized counting adds a Redis call per request (~0.5ms latency) but ensures accurate global rate limits.

💻 Code Example

```javascript
// ============================================
// API Gateway & Rate Limiting — Implementation
// ============================================

// ---------- Token Bucket Rate Limiter ----------

class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;     // Max tokens (burst limit)
    this.refillRate = refillRate; // Tokens added per second
    this.tokens = capacity;       // Start full
    this.lastRefillTime = Date.now();
  }

  tryConsume(tokens = 1) {
    this.refill();

    if (this.tokens >= tokens) {
      this.tokens -= tokens;
      return {
        allowed: true,
        remainingTokens: Math.floor(this.tokens),
        retryAfterMs: 0,
      };
    }

    // Calculate when enough tokens will be available
    const deficit = tokens - this.tokens;
    const retryAfterMs = Math.ceil((deficit / this.refillRate) * 1000);

    return {
      allowed: false,
      remainingTokens: 0,
      retryAfterMs,
    };
  }

  refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefillTime) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillRate
    );
    this.lastRefillTime = now;
  }
}

// ---------- Sliding Window Rate Limiter ----------

class SlidingWindowRateLimiter {
  constructor(windowSizeMs, maxRequests) {
    this.windowSizeMs = windowSizeMs;
    this.maxRequests = maxRequests;
    this.requests = new Map(); // clientId → [timestamps]
  }

  isAllowed(clientId) {
    const now = Date.now();
    const windowStart = now - this.windowSizeMs;

    // Get or create request log for this client
    if (!this.requests.has(clientId)) {
      this.requests.set(clientId, []);
    }

    const clientRequests = this.requests.get(clientId);

    // Remove requests outside the window
    const validRequests = clientRequests.filter(t => t > windowStart);
    this.requests.set(clientId, validRequests);

    if (validRequests.length >= this.maxRequests) {
      const oldestInWindow = validRequests[0];
      const retryAfterMs = oldestInWindow + this.windowSizeMs - now;

      return {
        allowed: false,
        remaining: 0,
        retryAfterMs: Math.ceil(retryAfterMs),
      };
    }

    validRequests.push(now);
    return {
      allowed: true,
      remaining: this.maxRequests - validRequests.length,
      retryAfterMs: 0,
    };
  }
}

// ---------- Distributed Rate Limiter (Redis) ----------

class RedisRateLimiter {
  constructor(redis, windowSizeSeconds, maxRequests) {
    this.redis = redis;
    this.windowSize = windowSizeSeconds;
    this.maxRequests = maxRequests;
  }

  async isAllowed(clientId) {
    const key = `ratelimit:${clientId}`;
    const now = Date.now();

    // Lua script for atomic check-and-increment
    // This runs on the Redis server — no race conditions!
    const luaScript = `
      local key = KEYS[1]
      local window = tonumber(ARGV[1])
      local maxRequests = tonumber(ARGV[2])
      local now = tonumber(ARGV[3])

      -- Remove old entries outside the window
      redis.call('ZREMRANGEBYSCORE', key, 0, now - window * 1000)

      -- Count current entries
      local count = redis.call('ZCARD', key)

      if count < maxRequests then
        -- Add this request
        redis.call('ZADD', key, now, now .. '-' .. math.random(10000))
        redis.call('EXPIRE', key, window)
        return {1, maxRequests - count - 1} -- allowed, remaining
      else
        return {0, 0} -- denied, 0 remaining
      end
    `;

    const result = await this.redis.eval(
      luaScript, 1, key, this.windowSize, this.maxRequests, now
    );

    return {
      allowed: result[0] === 1,
      remaining: result[1],
    };
  }
}

// ---------- API Gateway Middleware ----------

class APIGateway {
  constructor() {
    this.rateLimiters = {
      free: new TokenBucket(100, 100 / 3600),             // 100 req/hour
      pro: new TokenBucket(1000, 1000 / 3600),            // 1000 req/hour
      enterprise: new TokenBucket(100000, 100000 / 3600), // effectively unlimited
    };
    this.routes = new Map();
  }

  registerRoute(path, service) {
    this.routes.set(path, service);
  }

  async handleRequest(req) {
    // 1. Authentication
    const apiKey = req.headers['x-api-key'];
    const client = await this.authenticateClient(apiKey);
    if (!client) {
      return { status: 401, body: { error: 'Invalid API key' } };
    }

    // 2. Rate Limiting
    const limiter = this.rateLimiters[client.tier];
    const result = limiter.tryConsume(1);
    if (!result.allowed) {
      return {
        status: 429,
        headers: {
          'Retry-After': Math.ceil(result.retryAfterMs / 1000),
          'X-RateLimit-Limit': limiter.capacity,
          'X-RateLimit-Remaining': result.remainingTokens,
        },
        body: { error: 'Rate limit exceeded' },
      };
    }

    // 3. Routing
    const service = this.routes.get(req.path);
    if (!service) {
      return { status: 404, body: { error: 'Route not found' } };
    }

    // 4. Forward to backend service
    const startTime = Date.now();
    const response = await service.handle(req);
    const latency = Date.now() - startTime;

    // 5. Add headers and return
    return {
      ...response,
      headers: {
        ...response.headers,
        'X-RateLimit-Remaining': result.remainingTokens,
        'X-Response-Time': `${latency}ms`,
        'X-Request-ID': generateRequestId(),
      },
    };
  }

  async authenticateClient(apiKey) {
    // In production: lookup in Redis/DB
    const clients = {
      'key_free_123': { id: 'client1', tier: 'free' },
      'key_pro_456': { id: 'client2', tier: 'pro' },
    };
    return clients[apiKey] || null;
  }
}

function generateRequestId() {
  return 'req_' + Math.random().toString(36).substring(2, 15);
}

// Demo
const bucket = new TokenBucket(10, 2); // 10 max, 2/sec refill
console.log('Request 1:', bucket.tryConsume());
console.log('Request 2:', bucket.tryConsume());

const slider = new SlidingWindowRateLimiter(60000, 5); // 5 per minute
console.log('Window check:', slider.isAllowed('user123'));
```

🏋️ Practice Exercise

  1. Rate Limiter Comparison: Implement all four rate limiting algorithms (Token Bucket, Sliding Window, Fixed Window, Leaky Bucket) and compare their behavior when a client sends 20 requests in 1 second with a limit of 10/second.

  2. Distributed Rate Limiting: Design a rate limiting system for an API with 10 gateway instances. How do you ensure the global limit of 1000 req/min per client is accurate across all instances?

  3. Tiered Rate Limiting: Design a rate limiting system with per-endpoint limits: GET /users allows 1000/min, POST /users allows 50/min, POST /payments allows 10/min. How do you handle per-user AND per-endpoint limits simultaneously?

  4. API Gateway Design: Design a full API gateway that handles: authentication (API key + JWT), rate limiting, routing to 5 microservices, response caching, and request logging. Draw the architecture diagram.

  5. Graceful Degradation: Your API gateway detects that one backend service is slow (p99 > 5s). Design a circuit breaker pattern that automatically stops routing to the unhealthy service and returns cached/degraded responses.

⚠️ Common Mistakes

  • Implementing rate limiting per gateway instance instead of globally — with 5 gateway instances, per-instance limits of 100/min effectively allow 500/min globally. Always use a shared counter (Redis).

  • Using fixed window counters without considering boundary bursts — a client can send 100 requests at second 59 and 100 at second 61, getting 200 in 2 seconds while the 'per minute' limit is 100.
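A small self-contained simulation makes the boundary burst concrete (the numbers are made up: a limit of 100 per 60-second window, fed timestamps we choose):

```javascript
// Fixed-window limiter of 100 requests per 60 s window, driven by a
// simulated clock to show the double burst around a window boundary.
const LIMIT = 100;
const WINDOW_MS = 60000;

function makeLimiter() {
  let windowStart = 0;
  let count = 0;
  return function allow(nowMs) {
    if (nowMs - windowStart >= WINDOW_MS) {
      windowStart = nowMs - (nowMs % WINDOW_MS); // align to window boundary
      count = 0;
    }
    if (count < LIMIT) {
      count++;
      return true;
    }
    return false;
  };
}

const allow = makeLimiter();

let allowedAt59 = 0;
for (let i = 0; i < 100; i++) if (allow(59000)) allowedAt59++; // t = 59 s

let allowedAt61 = 0;
for (let i = 0; i < 100; i++) if (allow(61000)) allowedAt61++; // t = 61 s

console.log(allowedAt59 + allowedAt61); // → 200 allowed within ~2 seconds
```

All 200 requests pass even though the stated limit is 100 per minute, because the counter resets at the 60-second boundary between the two bursts.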

  • Not returning proper rate limit headers — clients need Retry-After, X-RateLimit-Limit, X-RateLimit-Remaining to implement proper backoff. Without these, clients can't adjust their request rate.

  • Making the API gateway a single point of failure — always deploy multiple gateway instances behind a load balancer. Use health checks and auto-scaling.
