DNS & Domain Resolution


📖 Concept

DNS (Domain Name System) is the internet's phonebook: it translates human-readable domain names (like google.com) into IP addresses (like 142.250.80.46). Understanding DNS is critical for system design because nearly every request your users make depends on a DNS answer, either freshly resolved or served from a cache.

How DNS Resolution Works

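When a user types www.example.com into their browser, here's what happens:

  1. Browser cache → checks whether the name was resolved recently
  2. OS cache → the operating system's stub resolver checks its own cache (and, typically, the hosts file)
  3. Recursive resolver (usually the ISP's DNS server or a public resolver such as 8.8.8.8) → checks its cache; on a miss, it performs steps 4 to 6 on the client's behalf
  4. Root nameserver → refers the resolver to the TLD nameservers for .com
  5. TLD nameserver → refers the resolver to the authoritative nameservers for example.com
  6. Authoritative nameserver → returns the actual record (e.g. the A record holding the IP address)
  7. The answer is cached by the recursive resolver, the OS, and the browser, each for at most the record's TTL (Time To Live)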
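Steps 4 to 6 are iterative: the recursive resolver, not the browser, asks each server in turn and follows its referrals. A minimal sketch of that walk, assuming a toy in-memory hierarchy (the server names, zone data, and `resolveIteratively` are hypothetical; real referrals carry NS records plus glue addresses rather than a single server name):

```javascript
// Toy hierarchy (hypothetical data): each server either answers the
// query or refers the resolver to a server for a more specific zone.
const SERVERS = {
  root: { referrals: { 'com.': 'tld-com' } },
  'tld-com': { referrals: { 'example.com.': 'ns1.example.com' } },
  'ns1.example.com': { answers: { 'www.example.com.': '93.184.216.34' } },
};

// True if `name` is `zone` itself or a subdomain of it (label-aware,
// so 'notcom.' does not count as being inside 'com.').
function inZone(name, zone) {
  return name === zone || name.endsWith(`.${zone}`);
}

// Start at the root and follow referrals until some server answers,
// recording which servers were asked along the way.
function resolveIteratively(name) {
  const fqdn = name.endsWith('.') ? name : `${name}.`;
  const path = [];
  let server = 'root';
  while (server) {
    path.push(server);
    const { answers = {}, referrals = {} } = SERVERS[server];
    if (fqdn in answers) {
      return { ip: answers[fqdn], path };
    }
    const zone = Object.keys(referrals).find((z) => inZone(fqdn, z));
    server = zone ? referrals[zone] : undefined;
  }
  throw new Error(`NXDOMAIN: ${name}`);
}

const result = resolveIteratively('www.example.com');
console.log(`${result.path.join(' -> ')} => ${result.ip}`);
// root -> tld-com -> ns1.example.com => 93.184.216.34
```

Once the resolver has cached the referrals for .com and example.com, later lookups under those zones can skip the root and TLD steps entirely.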

DNS Record Types

Record | Purpose | Example
A | Maps a name to an IPv4 address | example.com → 93.184.216.34
AAAA | Maps a name to an IPv6 address | example.com → 2606:2800:220:1:248:1893:25c8:1946
CNAME | Alias pointing to another (canonical) name | www.example.com → example.com
MX | Mail servers for the domain, with a priority (lower = preferred) | example.com → 10 mail.example.com
NS | Authoritative nameservers for the zone | example.com → ns1.example.com
TXT | Free-form text: domain verification, SPF, DKIM | example.com → "v=spf1 include:_spf.google.com ~all"
SRV | Service discovery (priority, weight, port, target host) | _sip._tcp.example.com → 10 5 5060 sip.example.com
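The CNAME and MX rows change how a lookup actually proceeds: a CNAME redirects the lookup to another name, and MX hosts are tried in priority order. A minimal sketch over a hypothetical in-memory zone (`ZONE`, `resolveA`, and `mailServers` are illustrative names, not a real API); in practice a recursive resolver chases the CNAME itself and returns the alias and the target's records together in one response.

```javascript
// Hypothetical zone data: name -> { recordType: value(s) }.
const ZONE = {
  'www.example.com': { CNAME: 'example.com' },
  'example.com': {
    A: ['93.184.216.34'],
    MX: [
      { priority: 20, host: 'backup-mail.example.com' },
      { priority: 10, host: 'mail.example.com' },
    ],
  },
};

// Resolve A records, following CNAME aliases; the hop limit guards
// against alias loops (a -> b -> a).
function resolveA(name, maxHops = 8) {
  const chain = [name];
  let current = name;
  for (let hop = 0; hop <= maxHops; hop++) {
    const rrsets = ZONE[current];
    if (!rrsets) throw new Error(`NXDOMAIN: ${current}`);
    if (rrsets.CNAME) {
      current = rrsets.CNAME;
      chain.push(current);
      continue;
    }
    if (!rrsets.A) throw new Error(`NODATA: ${current} has no A record`);
    return { chain, addresses: rrsets.A };
  }
  throw new Error(`Too many CNAME hops resolving ${name}`);
}

// MX hosts ordered by preference: a lower priority value is tried first.
function mailServers(domain) {
  const mx = ZONE[domain]?.MX ?? [];
  return [...mx].sort((a, b) => a.priority - b.priority).map((r) => r.host);
}

const { chain, addresses } = resolveA('www.example.com');
console.log(`${chain.join(' -> ')} => ${addresses.join(', ')}`);
// www.example.com -> example.com => 93.184.216.34
console.log(mailServers('example.com').join(', '));
// mail.example.com, backup-mail.example.com
```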

DNS in System Design

DNS is not just for name resolution — it's a load balancing and traffic routing tool:

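  • Round-robin DNS: Return multiple A records and rotate their order on each response; many clients use the first address, so successive clients spread across servers → basic load distribution
  • Geo-DNS: Return different IPs based on where the query comes from (usually the recursive resolver's location, or the client's subnet when the resolver sends EDNS Client Subnet) → route users to the nearest datacenter
  • Weighted DNS: Return each IP in proportion to an assigned weight (e.g. 90% stable, 10% canary) → gradual traffic shifting (canary deployments)
  • Failover DNS: Health-check the endpoints; when a server fails its checks, remove its IP from DNS responses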
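Round-robin and weighted answers are simple to sketch. This is an illustration under assumptions (`RoundRobinRecords` and `pickWeighted` are hypothetical names; in a real deployment the authoritative server or managed DNS provider applies this logic per query): round-robin rotates the record order for each response, and weighted selection returns one IP with probability proportional to its weight.

```javascript
// Round-robin (illustrative): rotate the record order on every response,
// so clients that take the first address land on different servers.
class RoundRobinRecords {
  constructor(ips) {
    this.ips = [...ips];
    this.offset = 0;
  }

  answer() {
    const rotated = [
      ...this.ips.slice(this.offset),
      ...this.ips.slice(0, this.offset),
    ];
    this.offset = (this.offset + 1) % this.ips.length;
    return rotated;
  }
}

// Weighted (illustrative): return one IP with probability proportional to
// its weight. `random` is injectable so results can be made deterministic.
function pickWeighted(records, random = Math.random) {
  const total = records.reduce((sum, r) => sum + r.weight, 0);
  let roll = random() * total; // uniform in [0, total)
  for (const r of records) {
    if (roll < r.weight) return r.ip;
    roll -= r.weight;
  }
  return records[records.length - 1].ip; // floating-point safety net
}

const rr = new RoundRobinRecords(['10.0.1.1', '10.0.1.2', '10.0.1.3']);
console.log(rr.answer()[0], rr.answer()[0], rr.answer()[0]);
// 10.0.1.1 10.0.1.2 10.0.1.3

// 90/10 canary split: about 1 in 10 answers points at the canary
const canarySplit = [
  { ip: '10.0.1.1', weight: 90 }, // stable fleet
  { ip: '10.0.2.1', weight: 10 }, // canary
];
console.log(pickWeighted(canarySplit));
```

Because resolvers cache whichever answer they received, the observed split is only approximately 90/10: one resolver serving many users repeats its cached answer until the TTL expires.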

Key insight: DNS answers are cached for their TTL, so changes don't propagate instantly. DNS failover therefore takes at least as long as the TTL (often minutes), and longer wherever resolvers or clients don't honor TTLs strictly, which makes it unsuitable as the only failover mechanism. Always combine it with health-check-based load balancers.
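To see why DNS failover lags, here is a minimal sketch of a single caching layer that honors TTLs. It assumes a hypothetical authoritative record with a 60-second TTL and uses a fake clock (`clock`, in milliseconds) so no real waiting is needed: the record changes at the source, yet the cache keeps serving the old IP until the TTL runs out.

```javascript
// One caching layer that honors TTLs. `now` is an injectable clock in
// milliseconds, so expiry can be shown without real waiting.
class TtlCache {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.entries = new Map();
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: caller must look it up again
      return undefined;
    }
    return entry.value;
  }

  set(key, value, ttlSeconds) {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }
}

// Hypothetical authoritative data and a fake clock
const authoritative = { 'api.example.com': { ip: '10.0.1.1', ttl: 60 } };
let clock = 0;
const cache = new TtlCache(() => clock);

// Serve from cache while fresh; otherwise ask the authoritative source
function lookup(name) {
  const cached = cache.get(name);
  if (cached !== undefined) return cached;
  const { ip, ttl } = authoritative[name];
  cache.set(name, ip, ttl);
  return ip;
}

lookup('api.example.com');                        // t=0s: cached for 60s
authoritative['api.example.com'].ip = '10.0.9.9'; // failover at the source
clock = 30_000;
console.log(lookup('api.example.com')); // 10.0.1.1 (stale: 30s of TTL left)
clock = 60_000;
console.log(lookup('api.example.com')); // 10.0.9.9 (TTL expired, re-fetched)
```

Every resolver, OS, and browser cache runs its own copy of this timer, which is why an update at the authoritative server can take up to a full TTL to reach all users.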

💻 Code Example

// ============================================
// DNS Concepts: Practical Demonstrations
// ============================================

// ---------- DNS Lookup Simulation ----------

// Simulating the DNS resolution hierarchy.
// Simplification: cached entries never expire in this demo; real caches
// discard each entry once its TTL has elapsed.
class DNSResolver {
  constructor() {
    // Each cache level holds entries for a different length of time
    // (approximate values; not enforced by this simulation)
    this.browserCache = new Map();   // ~1 minute
    this.osCache = new Map();        // ~5 minutes
    this.recursiveCache = new Map(); // follows the record's TTL (30s to 24h)

    // Authoritative records (the "truth")
    this.authoritativeRecords = {
      'example.com': {
        A: ['93.184.216.34'],
        AAAA: ['2606:2800:220:1:248:1893:25c8:1946'],
        MX: ['mail.example.com'],
        NS: ['ns1.example.com', 'ns2.example.com'],
        TTL: 3600, // 1 hour
      },
      'api.example.com': {
        // Round-robin: multiple A records for load distribution
        A: ['10.0.1.1', '10.0.1.2', '10.0.1.3', '10.0.1.4'],
        TTL: 60, // Short TTL for faster failover
      },
    };
  }

  resolve(domain, recordType = 'A') {
    const cacheKey = `${domain}:${recordType}`;

    // Level 1: Browser cache
    if (this.browserCache.has(cacheKey)) {
      console.log(` ✅ Browser cache HIT for ${domain}`);
      return this.browserCache.get(cacheKey);
    }

    // Level 2: OS cache (on a hit, also populate the browser cache)
    if (this.osCache.has(cacheKey)) {
      console.log(` ✅ OS cache HIT for ${domain}`);
      const result = this.osCache.get(cacheKey);
      this.browserCache.set(cacheKey, result);
      return result;
    }

    // Level 3: Recursive resolver cache (populate the closer caches too)
    if (this.recursiveCache.has(cacheKey)) {
      console.log(` ✅ Recursive resolver cache HIT for ${domain}`);
      const result = this.recursiveCache.get(cacheKey);
      this.osCache.set(cacheKey, result);
      this.browserCache.set(cacheKey, result);
      return result;
    }

    // Cache MISS: the recursive resolver walks root → TLD → authoritative
    console.log(` ❌ Cache MISS, querying authoritative servers for ${domain}`);
    const record = this.authoritativeRecords[domain];
    if (!record) {
      throw new Error(`NXDOMAIN: ${domain} does not exist`);
    }
    if (!record[recordType]) {
      // The name exists but has no records of this type (NOERROR, empty answer)
      throw new Error(`NODATA: ${domain} has no ${recordType} records`);
    }

    const result = record[recordType];
    // Cache at all levels
    this.recursiveCache.set(cacheKey, result);
    this.osCache.set(cacheKey, result);
    this.browserCache.set(cacheKey, result);
    return result;
  }
}

// ---------- Geo-DNS Load Balancing ----------

class GeoDNS {
  constructor() {
    this.regionMap = {
      'us-east': { ip: '10.1.1.1', datacenter: 'Virginia' },
      'us-west': { ip: '10.2.1.1', datacenter: 'Oregon' },
      'eu-west': { ip: '10.3.1.1', datacenter: 'Ireland' },
      'ap-south': { ip: '10.4.1.1', datacenter: 'Mumbai' },
    };
    this.defaultRegion = 'us-east';
  }

  resolve(domain, clientRegion) {
    // In practice the region is inferred from the querying resolver's IP
    // (or EDNS Client Subnet); here the caller passes it in directly.
    let target = this.regionMap[clientRegion];
    if (!target) {
      // Unknown region: fall back to a fixed default datacenter. A real
      // Geo-DNS would pick the geographically closest one instead.
      target = this.regionMap[this.defaultRegion];
    }
    console.log(
      `Routing ${domain} from ${clientRegion} → ${target.datacenter} (${target.ip})`
    );
    return target;
  }
}

// ---------- DNS-based Health Check & Failover ----------

class HealthCheckDNS {
  constructor(endpoints) {
    this.endpoints = endpoints; // [{ ip, healthy }]
    // A real system would probe every endpoint on this interval (ms);
    // this demo flips health manually with markUnhealthy() instead.
    this.checkInterval = 30000;
  }

  getHealthyEndpoints() {
    return this.endpoints.filter(ep => ep.healthy).map(ep => ep.ip);
  }

  resolve(domain) {
    const healthy = this.getHealthyEndpoints();
    if (healthy.length === 0) {
      console.error('🚨 ALL endpoints are DOWN!');
      // Last resort: answer with the first IP (as an array, like the normal path)
      return [this.endpoints[0].ip];
    }
    // Return only healthy IPs
    console.log(`Healthy servers for ${domain}: ${healthy.join(', ')}`);
    return healthy;
  }

  // Simulate a server going down
  markUnhealthy(ip) {
    const endpoint = this.endpoints.find(ep => ep.ip === ip);
    if (endpoint) {
      endpoint.healthy = false;
      console.log(`🔴 Server ${ip} marked UNHEALTHY`);
    }
  }
}

// ---------- Demo Usage ----------
const resolver = new DNSResolver();
console.log('First lookup (cache miss):');
console.log('IP:', resolver.resolve('example.com'));
console.log('\nSecond lookup (cache hit):');
console.log('IP:', resolver.resolve('example.com'));

const geoDns = new GeoDNS();
geoDns.resolve('api.example.com', 'eu-west');
geoDns.resolve('api.example.com', 'ap-south');

const healthDns = new HealthCheckDNS([
  { ip: '10.0.1.1', healthy: true },
  { ip: '10.0.1.2', healthy: true },
]);
healthDns.markUnhealthy('10.0.1.1');
healthDns.resolve('api.example.com'); // only 10.0.1.2 is returned

🏋️ Practice Exercise

  1. Trace the Resolution: Diagram the complete DNS resolution path for mail.google.com. Label each server (root, TLD, authoritative) and the record type queried at each step.

  2. TTL Impact Analysis: If api.example.com has a TTL of 300 seconds and you need to migrate to a new IP, what's the maximum time before all users hit the new server? How would you minimize downtime during migration?

  3. Geo-DNS Design: Design a Geo-DNS strategy for a service with datacenters in US-East, EU-West, and AP-Southeast. How do you handle users in Africa or South America where you don't have datacenters?

  4. DNS Failover Limitations: Explain why DNS alone isn't enough for instant failover. What would you combine it with for sub-second failover?

  5. Record Configuration: You're setting up DNS for a new SaaS product. Configure the following: (a) main website, (b) API subdomain, (c) email delivery via Google Workspace, (d) SSL certificate verification.

⚠️ Common Mistakes

  • Relying solely on DNS for failover — DNS has TTL-based caching, so even after you update records, clients may still use the old IP for minutes. Combine DNS with health-check-based load balancers for fast failover.

  • Setting TTLs too high — a 24-hour TTL means you can't redirect traffic for up to 24 hours during an incident. Use lower TTLs (60-300s) for services that need quick failover, at the cost of more DNS queries.

  • Forgetting that DNS responses can be cached by intermediate resolvers you don't control — even if you update your authoritative nameserver, ISP resolvers may serve stale data until their cache expires.

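  • Not understanding that a CNAME can't coexist with any other record at the same name. Because the zone apex always carries SOA and NS records, you can never put a CNAME on 'example.com' itself (and therefore never alongside its MX records). Use your DNS provider's ALIAS/ANAME-style record instead; these are non-standard, provider-specific features that answer queries with A/AAAA records.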
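The CNAME rule can be checked mechanically. A minimal sketch, assuming a simplified zone represented as a list of `{ name, type }` objects (a hypothetical format, not a real zone-file parser); it does not model the DNSSEC record types that are permitted next to a CNAME.

```javascript
// Flag names that hold a CNAME together with other record types.
// RFC 1034 says no other data should exist at a CNAME's name (DNSSEC
// record types are an exception that this sketch does not model).
function findCnameConflicts(records) {
  const typesByName = new Map();
  for (const { name, type } of records) {
    if (!typesByName.has(name)) typesByName.set(name, new Set());
    typesByName.get(name).add(type);
  }

  const conflicts = [];
  for (const [name, types] of typesByName) {
    if (types.has('CNAME') && types.size > 1) {
      const others = [...types].filter((t) => t !== 'CNAME').join(', ');
      conflicts.push(`${name}: CNAME cannot coexist with ${others}`);
    }
  }
  return conflicts;
}

// Hypothetical zone with an apex CNAME (invalid) and a www CNAME (valid)
const zone = [
  { name: 'example.com', type: 'SOA' },
  { name: 'example.com', type: 'NS' },
  { name: 'example.com', type: 'MX' },
  { name: 'example.com', type: 'CNAME' },
  { name: 'www.example.com', type: 'CNAME' },
];
console.log(findCnameConflicts(zone).join('\n'));
// example.com: CNAME cannot coexist with SOA, NS, MX
```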
