DNS & Domain Resolution
📖 Concept
DNS (Domain Name System) is the internet's phonebook: it translates human-readable domain names (like google.com) into IP addresses (like 142.250.80.46). Understanding DNS is critical for system design because nearly every request your users make starts with a DNS lookup, or with an answer cached from an earlier one.
How DNS Resolution Works
When a user types www.example.com in their browser, here's what happens:
- Browser cache → checks if the domain was recently resolved
- OS cache → checks the operating system's local DNS cache
- Recursive resolver (usually the ISP's DNS server, or a public resolver such as 8.8.8.8) → checks its cache; on a miss, it queries the servers below on the client's behalf
- Root nameserver → refers the resolver to the TLD nameservers for .com
- TLD nameserver → refers the resolver to the authoritative nameservers for example.com
- Authoritative nameserver → returns the actual IP address
- The result is cached at every level for its TTL (Time To Live)
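The referral steps above can be sketched as the loop a recursive resolver runs: start at the root, follow each referral one level down, and stop when a server returns an answer. This is a minimal sketch, assuming a hypothetical in-memory `ZONES` table in place of real network queries; the server labels and the IP are illustrative.

```javascript
// Hypothetical zone data: each "server" either refers the resolver
// further down the tree or holds the final answer.
const ZONES = {
  // Root servers only know who serves each TLD
  root: { referrals: { com: 'tld:com' } },
  // .com TLD servers only know who is authoritative for each domain
  'tld:com': { referrals: { 'example.com': 'auth:example.com' } },
  // The authoritative server holds the actual records
  'auth:example.com': { answers: { 'www.example.com': '93.184.216.34' } },
};

// Walk from the root down, following referrals until a server answers
function resolveIteratively(name, maxHops = 10) {
  const path = [];
  let server = 'root';
  for (let hop = 0; hop < maxHops; hop++) {
    path.push(server);
    const zone = ZONES[server];
    if (zone.answers && name in zone.answers) {
      return { ip: zone.answers[name], path };
    }
    // Pick the referral whose zone is a suffix of the queried name
    const next = Object.keys(zone.referrals || {}).find(
      suffix => name === suffix || name.endsWith('.' + suffix)
    );
    if (!next) throw new Error(`NXDOMAIN: ${name}`);
    server = zone.referrals[next];
  }
  throw new Error(`Too many referrals while resolving ${name}`);
}

const { ip, path } = resolveIteratively('www.example.com');
console.log(path.join(' → '), '⇒', ip);
// root → tld:com → auth:example.com ⇒ 93.184.216.34
```

In a real deployment each step is a network round trip, which is exactly why the caches at every level matter.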
DNS Record Types
| Record | Purpose | Example |
|---|---|---|
| A | Maps domain to IPv4 address | example.com → 93.184.216.34 |
| AAAA | Maps domain to IPv6 address | example.com → 2606:2800:220:1:248:1893:25c8:1946 |
| CNAME | Alias to another domain | www.example.com → example.com |
| MX | Mail server for the domain (with priority) | example.com → 10 mail.example.com |
| NS | Authoritative nameservers | example.com → ns1.example.com |
| TXT | Verification, SPF records | example.com → "v=spf1 include:_spf.google.com ~all" |
| SRV | Service discovery (host + port) | _sip._tcp.example.com → 10 60 5060 sip.example.com (priority, weight, port, target) |
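To see how a CNAME is used at lookup time, here is a minimal sketch of a resolver chasing aliases until it reaches an A record. The `RECORDS` table, the `resolveA` helper, and the hop limit of 8 are hypothetical; the hop limit is there to break alias loops.

```javascript
// Hypothetical records: www is an alias, the apex holds the A record,
// and two names alias each other to demonstrate loop protection.
const RECORDS = {
  'www.example.com': { CNAME: 'example.com' },
  'example.com': { A: ['93.184.216.34'] },
  'loop-a.example.com': { CNAME: 'loop-b.example.com' },
  'loop-b.example.com': { CNAME: 'loop-a.example.com' },
};

// Follow CNAMEs until an A record is found (or give up)
function resolveA(name, maxHops = 8) {
  const chain = [name];
  let current = name;
  for (let hop = 0; hop <= maxHops; hop++) {
    const rec = RECORDS[current];
    if (!rec) throw new Error(`NXDOMAIN: ${current}`);
    if (rec.A) return { addresses: rec.A, chain };
    if (!rec.CNAME) throw new Error(`NODATA: ${current} has no A or CNAME record`);
    current = rec.CNAME; // look up the alias target instead
    chain.push(current);
  }
  throw new Error(`CNAME chain too long (possible loop) starting at ${name}`);
}

const result = resolveA('www.example.com');
console.log(result.chain.join(' → '), '⇒', result.addresses.join(', '));
// www.example.com → example.com ⇒ 93.184.216.34
```

Each extra hop in a CNAME chain can cost another lookup, which is one reason to keep alias chains short.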
DNS in System Design
DNS is not just for name resolution — it's a load balancing and traffic routing tool:
- Round-robin DNS: Return multiple A records and rotate their order on each response; most clients try the first address → basic load distribution
- Geo-DNS: Return different IPs based on the client's geographic location → route users to the nearest datacenter
- Weighted DNS: Return each IP in a share of responses proportional to its weight → gradual traffic shifting (canary deployments)
- Failover DNS: Health-check endpoints; if a server goes down, remove its IP from DNS responses
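Round-robin and weighted routing both come down to how the server builds each response. A minimal sketch, assuming hypothetical IPs and weights; `makeRoundRobin` and `pickWeighted` are illustrative helpers, not any DNS server's API, and `pickWeighted` takes an injectable random function so its behavior can be tested deterministically.

```javascript
// Round-robin: rotate the record order on every response. Since most
// clients try the first address, successive clients hit different servers.
function makeRoundRobin(ips) {
  let offset = 0;
  return () => {
    const answer = ips.slice(offset).concat(ips.slice(0, offset));
    offset = (offset + 1) % ips.length;
    return answer;
  };
}

// Weighted: return one IP, chosen with probability weight / totalWeight
function pickWeighted(targets, random = Math.random) {
  const total = targets.reduce((sum, t) => sum + t.weight, 0);
  let roll = random() * total;
  for (const t of targets) {
    if (roll < t.weight) return t.ip;
    roll -= t.weight;
  }
  return targets[targets.length - 1].ip; // guard against floating-point edge cases
}

const rr = makeRoundRobin(['10.0.1.1', '10.0.1.2', '10.0.1.3']);
console.log(rr()[0], rr()[0], rr()[0], rr()[0]);
// 10.0.1.1 10.0.1.2 10.0.1.3 10.0.1.1

// Canary deployment: send roughly 5% of lookups to the new version
const canary = [
  { ip: '10.0.2.1', weight: 95 }, // stable
  { ip: '10.0.2.9', weight: 5 },  // canary
];
console.log(pickWeighted(canary));
```

Shifting traffic is then just a matter of changing the weights (5 → 25 → 100) as the canary proves healthy.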
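Key insight: resolvers and clients cache DNS answers for the record's TTL, so changes don't propagate instantly. DNS failover therefore takes at least the health-check detection time plus the TTL (often minutes), and longer wherever resolvers or clients ignore the TTL, making it unsuitable as the only failover mechanism. Always combine it with health-check-based load balancers.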
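A back-of-the-envelope estimate shows why failover is slow. This sketch assumes a hypothetical health checker that probes every `checkIntervalSec` seconds and removes an IP after `failureThreshold` consecutive failures, and resolvers that honor the TTL exactly; resolvers or clients that extend the TTL only make the result worse.

```javascript
// Worst-case seconds until every client stops using a dead server's IP
function worstCaseFailoverSeconds({ checkIntervalSec, failureThreshold, ttlSec }) {
  // The server dies right after a passing probe, so it takes
  // failureThreshold more probes before its IP is pulled from responses
  const detection = checkIntervalSec * failureThreshold;
  // A client that cached the old answer just before removal keeps it for a full TTL
  return detection + ttlSec;
}

console.log(worstCaseFailoverSeconds({ checkIntervalSec: 30, failureThreshold: 3, ttlSec: 60 }));   // 150
console.log(worstCaseFailoverSeconds({ checkIntervalSec: 30, failureThreshold: 3, ttlSec: 3600 })); // 3690
```

Even with a short 60-second TTL, users can be pointed at a dead server for minutes, while a load balancer that health-checks its backends can stop routing to one within its own (much shorter) check interval.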
💻 Code Example
```javascript
// ============================================
// DNS Concepts: Practical Demonstrations
// ============================================

// ---------- DNS Lookup Simulation ----------

// Illustrative caps (in seconds) on how long each cache level keeps an answer
const CACHE_CAPS = { browser: 60, os: 300, recursive: 86400 };

// Simulating the DNS resolution hierarchy
class DNSResolver {
  constructor() {
    // Each cache maps "domain:type" → { value, expiresAt } so entries really expire
    this.browserCache = new Map();   // capped at ~1 minute
    this.osCache = new Map();        // capped at ~5 minutes
    this.recursiveCache = new Map(); // honors the record TTL (30s to 24h)

    // Authoritative records (the "truth")
    this.authoritativeRecords = {
      'example.com': {
        A: ['93.184.216.34'],
        AAAA: ['2606:2800:220:1:248:1893:25c8:1946'],
        MX: ['mail.example.com'],
        NS: ['ns1.example.com', 'ns2.example.com'],
        TTL: 3600, // 1 hour
      },
      'api.example.com': {
        // Round-robin: multiple A records for load distribution
        A: ['10.0.1.1', '10.0.1.2', '10.0.1.3', '10.0.1.4'],
        TTL: 60, // Short TTL for faster failover
      },
    };
  }

  // Return the cached entry if it is still fresh; evict it once its TTL has passed
  getFresh(cache, key) {
    const entry = cache.get(key);
    if (entry && Date.now() >= entry.expiresAt) {
      cache.delete(key);
      return undefined;
    }
    return entry;
  }

  // Cache a value until expiresAt, but never longer than this level's cap
  put(cache, key, value, expiresAt, capSeconds) {
    const capped = Math.min(expiresAt, Date.now() + capSeconds * 1000);
    cache.set(key, { value, expiresAt: capped });
  }

  resolve(domain, recordType = 'A') {
    const cacheKey = `${domain}:${recordType}`;

    // Level 1: Browser cache
    let entry = this.getFresh(this.browserCache, cacheKey);
    if (entry) {
      console.log(`  ✅ Browser cache HIT for ${domain}`);
      return entry.value;
    }

    // Level 2: OS cache (refills the browser cache on the way back)
    entry = this.getFresh(this.osCache, cacheKey);
    if (entry) {
      console.log(`  ✅ OS cache HIT for ${domain}`);
      this.put(this.browserCache, cacheKey, entry.value, entry.expiresAt, CACHE_CAPS.browser);
      return entry.value;
    }

    // Level 3: Recursive resolver cache (refills the OS and browser caches)
    entry = this.getFresh(this.recursiveCache, cacheKey);
    if (entry) {
      console.log(`  ✅ Recursive resolver cache HIT for ${domain}`);
      this.put(this.osCache, cacheKey, entry.value, entry.expiresAt, CACHE_CAPS.os);
      this.put(this.browserCache, cacheKey, entry.value, entry.expiresAt, CACHE_CAPS.browser);
      return entry.value;
    }

    // Cache MISS: full resolution needed
    console.log(`  ❌ Cache MISS, querying authoritative servers for ${domain}`);
    const record = this.authoritativeRecords[domain];
    if (!record) {
      throw new Error(`NXDOMAIN: ${domain} does not exist`);
    }
    if (!record[recordType]) {
      // The name exists but has no records of this type: NODATA, not NXDOMAIN
      throw new Error(`NODATA: ${domain} has no ${recordType} records`);
    }

    const result = record[recordType];
    const expiresAt = Date.now() + record.TTL * 1000;
    // Cache at all levels, each capped by its own limit
    this.put(this.recursiveCache, cacheKey, result, expiresAt, CACHE_CAPS.recursive);
    this.put(this.osCache, cacheKey, result, expiresAt, CACHE_CAPS.os);
    this.put(this.browserCache, cacheKey, result, expiresAt, CACHE_CAPS.browser);
    return result;
  }
}

// ---------- Geo-DNS Load Balancing ----------

class GeoDNS {
  constructor() {
    this.regionMap = {
      'us-east': { ip: '10.1.1.1', datacenter: 'Virginia' },
      'us-west': { ip: '10.2.1.1', datacenter: 'Oregon' },
      'eu-west': { ip: '10.3.1.1', datacenter: 'Ireland' },
      'ap-south': { ip: '10.4.1.1', datacenter: 'Mumbai' },
    };
    this.defaultRegion = 'us-east';
  }

  resolve(domain, clientRegion) {
    // Unmapped region: fall back to a fixed default. A production Geo-DNS
    // would pick the nearest datacenter by latency instead.
    const target = this.regionMap[clientRegion] || this.regionMap[this.defaultRegion];
    console.log(
      `Routing ${domain} from ${clientRegion} → ${target.datacenter} (${target.ip})`
    );
    return target;
  }
}

// ---------- DNS-based Health Check & Failover ----------

class HealthCheckDNS {
  constructor(endpoints) {
    this.endpoints = endpoints; // [{ ip, healthy }]
    this.checkInterval = 30000; // How often a real checker would probe (not simulated here)
  }

  getHealthyEndpoints() {
    return this.endpoints.filter(ep => ep.healthy).map(ep => ep.ip);
  }

  // Always returns an array of IPs, like a multi-record DNS answer
  resolve(domain) {
    const healthy = this.getHealthyEndpoints();
    if (healthy.length === 0) {
      console.error('🚨 ALL endpoints are DOWN!');
      return [this.endpoints[0].ip]; // Return first as last resort
    }
    // Return only healthy IPs
    console.log(`Healthy servers for ${domain}: ${healthy.join(', ')}`);
    return healthy;
  }

  // Simulate a server going down
  markUnhealthy(ip) {
    const endpoint = this.endpoints.find(ep => ep.ip === ip);
    if (endpoint) {
      endpoint.healthy = false;
      console.log(`🔴 Server ${ip} marked UNHEALTHY`);
    }
  }
}

// ---------- Demo Usage ----------
const resolver = new DNSResolver();
console.log('First lookup (cache miss):');
console.log('IP:', resolver.resolve('example.com'));
console.log('\nSecond lookup (cache hit):');
console.log('IP:', resolver.resolve('example.com'));

const geoDns = new GeoDNS();
geoDns.resolve('api.example.com', 'eu-west');
geoDns.resolve('api.example.com', 'ap-south');
geoDns.resolve('api.example.com', 'sa-east'); // unmapped region → default

const healthDns = new HealthCheckDNS([
  { ip: '10.0.1.1', healthy: true },
  { ip: '10.0.1.2', healthy: true },
]);
healthDns.markUnhealthy('10.0.1.1');
healthDns.resolve('api.example.com'); // only 10.0.1.2 is returned
```
🏋️ Practice Exercise
Trace the Resolution: Diagram the complete DNS resolution path for mail.google.com. Label each server (root, TLD, authoritative) and the record type queried at each step.
TTL Impact Analysis: If api.example.com has a TTL of 300 seconds and you need to migrate to a new IP, what's the maximum time before all users hit the new server? How would you minimize downtime during the migration?
Geo-DNS Design: Design a Geo-DNS strategy for a service with datacenters in US-East, EU-West, and AP-Southeast. How do you handle users in Africa or South America, where you don't have datacenters?
DNS Failover Limitations: Explain why DNS alone isn't enough for instant failover. What would you combine it with for sub-second failover?
Record Configuration: You're setting up DNS for a new SaaS product. Configure the following: (a) main website, (b) API subdomain, (c) email delivery via Google Workspace, (d) SSL certificate verification.
⚠️ Common Mistakes
Relying solely on DNS for failover — DNS has TTL-based caching, so even after you update records, clients may still use the old IP for minutes. Combine DNS with health-check-based load balancers for fast failover.
Setting TTLs too high — a 24-hour TTL means you can't redirect traffic for up to 24 hours during an incident. Use lower TTLs (60-300s) for services that need quick failover, at the cost of more DNS queries.
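The cost side of that trade-off can be estimated with simple arithmetic. A rough sketch, assuming (hypothetically) 50,000 active recursive resolvers that each re-query as soon as their cached copy expires, so authoritative load is roughly resolvers ÷ TTL:

```javascript
// Approximate queries/second hitting your authoritative nameservers
function authoritativeQueriesPerSecond(activeResolvers, ttlSec) {
  // Each resolver refreshes its cached record at most once per TTL
  return activeResolvers / ttlSec;
}

for (const ttl of [60, 300, 3600, 86400]) {
  const qps = authoritativeQueriesPerSecond(50000, ttl);
  console.log(`TTL ${ttl}s → ~${qps.toFixed(1)} queries/s`);
}
// TTL 60s → ~833.3 queries/s
// TTL 300s → ~166.7 queries/s
// TTL 3600s → ~13.9 queries/s
// TTL 86400s → ~0.6 queries/s
```

Dropping from a 1-hour to a 1-minute TTL multiplies authoritative query load by 60, which managed DNS providers typically absorb easily but may bill per query.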
Forgetting that DNS responses can be cached by intermediate resolvers you don't control — even if you update your authoritative nameserver, ISP resolvers may serve stale data until their cache expires.
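Not understanding that a CNAME can't coexist with any other record at the same name. The zone apex (example.com) always carries SOA and NS records, so a CNAME there is never valid, and it would also collide with records such as MX. Use your DNS provider's ALIAS or ANAME record (a provider-specific, non-standard feature) instead.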
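This rule is easy to check mechanically. Per RFC 1034, a name that has a CNAME should have no other data, so a minimal zone lint only needs to group records by name. The `{ name, type, value }` record shape, the sample zone, and `findCnameConflicts` are hypothetical.

```javascript
// Flag every name that has a CNAME alongside any other record type
function findCnameConflicts(records) {
  const typesByName = new Map();
  for (const { name, type } of records) {
    if (!typesByName.has(name)) typesByName.set(name, new Set());
    typesByName.get(name).add(type);
  }
  const conflicts = [];
  for (const [name, types] of typesByName) {
    if (types.has('CNAME') && types.size > 1) {
      const others = [...types].filter(t => t !== 'CNAME').join(', ');
      conflicts.push(`${name}: CNAME cannot coexist with ${others}`);
    }
  }
  return conflicts;
}

const zone = [
  { name: 'example.com', type: 'NS', value: 'ns1.example.com' },
  { name: 'example.com', type: 'MX', value: 'mail.example.com' },
  { name: 'example.com', type: 'CNAME', value: 'myapp.hosting.example' }, // invalid at apex
  { name: 'www.example.com', type: 'CNAME', value: 'example.com' },       // fine
];
console.log(findCnameConflicts(zone));
// [ 'example.com: CNAME cannot coexist with NS, MX' ]
```

The www CNAME passes because it is the only record at its name; the apex CNAME fails because the apex must also hold NS (and SOA) records.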