Object Storage & Blob Storage


📖 Concept

Not all data belongs in a database. Large binary data — images, videos, documents, backups, logs — should be stored in object storage, a specialized system designed for massive files and high throughput.

Object Storage vs File System vs Block Storage

| Feature | Object Storage | File System | Block Storage |
|---|---|---|---|
| Structure | Flat namespace with keys | Hierarchical (directories) | Raw disk blocks |
| Access pattern | HTTP API (PUT/GET/DELETE) | OS-level file operations | Low-level I/O |
| Metadata | Rich, custom metadata per object | Limited (permissions, timestamps) | None |
| Scale | Virtually unlimited | Limited by disk/NFS | Limited by disk |
| Durability | 99.999999999% (11 nines) | Depends on RAID | Depends on RAID |
| Cost | Very low (~$0.023/GB/month) | Medium | High |
| Examples | S3, GCS, Azure Blob | NFS, EFS | EBS, SAN |

How Object Storage Works

Objects are stored with three components:

  1. Key (path): /users/123/profile/avatar.jpg
  2. Data (the actual bytes): The image/video/document
  3. Metadata: Content-type, upload date, custom tags, ACL

Access is via simple HTTP:

  • PUT /bucket/key → upload
  • GET /bucket/key → download
  • DELETE /bucket/key → remove
  • HEAD /bucket/key → get metadata only
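
The model above can be sketched as a toy in-memory store — a flat map from key to (data, metadata) with the four operations. This is illustrative only (a real object store exposes these over HTTP, not as method calls):

```javascript
// Toy in-memory object store: a flat namespace mapping key → { data, metadata }.
class ToyBucket {
  constructor() {
    this.objects = new Map(); // flat: the full key is the identifier, no directories
  }

  // PUT /bucket/key → upload (overwrites if the key already exists)
  put(key, data, metadata = {}) {
    this.objects.set(key, {
      data,
      metadata: { ...metadata, contentLength: data.length, lastModified: new Date() },
    });
  }

  // GET /bucket/key → download (data + metadata)
  get(key) {
    const obj = this.objects.get(key);
    if (!obj) throw new Error(`404: no such key: ${key}`);
    return obj;
  }

  // HEAD /bucket/key → metadata only, no data transfer
  head(key) {
    return this.get(key).metadata;
  }

  // DELETE /bucket/key → remove
  delete(key) {
    this.objects.delete(key);
  }
}

const bucket = new ToyBucket();
bucket.put('users/123/profile/avatar.jpg', Buffer.from('...image bytes...'), {
  contentType: 'image/jpeg',
});
console.log(bucket.head('users/123/profile/avatar.jpg').contentType); // image/jpeg
```

Note that although the key looks like a path, the `/` characters are just part of the key — there is no real directory tree, which is what lets object stores scale a flat namespace horizontally.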

When to Use Object Storage

| Use Case | Why Object Storage? |
|---|---|
| User uploads (images, avatars) | Cheap, durable, serves via CDN |
| Video content | Handles massive files, integrates with transcoding |
| Data lake | Store raw data for analytics pipelines |
| Backups | Ultra-cheap archival ($0.004/GB/month for S3 Glacier) |
| Static website assets | Serve directly or through CDN |
| Log storage | Append-only, rarely read, huge volume |

Common Architecture Pattern

User uploads image → API Server → Store metadata in DB → Upload file to S3
User views image  → CDN (cache) → S3 (origin) → Return image

The database stores metadata (filename, user_id, size, URL), and the actual file lives in object storage (S3, GCS).
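
A minimal sketch of that split — the field names, table layout, and CDN host here are illustrative assumptions, not a prescribed schema:

```javascript
// The DB row holds small, queryable metadata; S3 holds the bytes under s3Key.
// cdn.example.com and the field names are hypothetical.
function buildFileRecord({ userId, fileName, sizeBytes }) {
  // Encode owner + timestamp into the key so uploads never collide.
  const s3Key = `uploads/${userId}/${Date.now()}-${fileName}`;

  return {
    // → relational database
    dbRow: {
      userId,
      fileName,
      sizeBytes,
      s3Key,
      createdAt: new Date().toISOString(),
    },
    // → clients fetch through the CDN, which pulls from S3 (origin) on a cache miss
    publicUrl: `https://cdn.example.com/${s3Key}`,
  };
}

const record = buildFileRecord({ userId: 123, fileName: 'avatar.jpg', sizeBytes: 48213 });
console.log(record.publicUrl.startsWith('https://cdn.example.com/uploads/123/')); // true
```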

Pre-signed URLs

Instead of routing every upload/download through your API server, use pre-signed URLs — temporary, authenticated URLs that allow direct upload/download to object storage.

This is critical for performance because:

  • Large files don't flow through your API servers (saves bandwidth)
  • Upload goes directly from client to S3 (lower latency)
  • Download goes through CDN → S3 (optimal path)

Interview tip: When designing any system that handles media (images, videos, documents), always mention object storage + CDN. Don't say "store images in the database" — that's a major red flag.

💻 Code Example

// ============================================
// Object Storage — Production Patterns
// ============================================

const AWS = require('aws-sdk');
const sharp = require('sharp'); // image resizing, used in the pipeline below
const s3 = new AWS.S3();

// `db` is an assumed application data-access helper (insert/findFile/update).

// ---------- Pre-signed URL Upload Flow ----------

// ✅ GOOD: Client uploads directly to S3 via pre-signed URL
// Server never handles the large file!

async function getUploadUrl(req, res) {
  const { fileName, fileType } = req.body;
  const key = `uploads/${req.user.id}/${Date.now()}-${fileName}`;

  // Generate a URL that allows direct upload to S3.
  // Note: a pre-signed PUT URL cannot enforce a size limit; to enforce
  // e.g. a 10MB 'content-length-range' condition, use s3.createPresignedPost
  // (POST policy) instead.
  const presignedUrl = await s3.getSignedUrlPromise('putObject', {
    Bucket: 'my-app-uploads',
    Key: key,
    ContentType: fileType,
    Expires: 300, // URL valid for 5 minutes
  });

  // Save metadata to database
  await db.insert('files', {
    userId: req.user.id,
    s3Key: key,
    fileName,
    fileType,
    status: 'pending_upload',
  });

  res.json({
    uploadUrl: presignedUrl,
    key,
    expiresIn: 300,
  });
}

// Client-side upload (browser):
// const response = await fetch(uploadUrl, {
//   method: 'PUT',
//   body: file,
//   headers: { 'Content-Type': file.type }
// });

// ---------- Pre-signed URL Download Flow ----------

async function getDownloadUrl(req, res) {
  const { fileId } = req.params;
  const file = await db.findFile(fileId);

  if (!file) return res.status(404).json({ error: 'File not found' });

  // Check access permissions
  if (file.userId !== req.user.id) {
    return res.status(403).json({ error: 'Not authorized' });
  }

  const downloadUrl = await s3.getSignedUrlPromise('getObject', {
    Bucket: 'my-app-uploads',
    Key: file.s3Key,
    Expires: 3600, // URL valid for 1 hour
    ResponseContentDisposition: `attachment; filename="${file.fileName}"`,
  });

  res.json({ downloadUrl, expiresIn: 3600 });
}

// ---------- Image Upload with Processing Pipeline ----------

// ❌ BAD: Process images synchronously in the API handler
async function badImageUpload(req, res) {
  const image = req.file;
  const thumbnail = await sharp(image.buffer).resize(150, 150).toBuffer();
  const medium = await sharp(image.buffer).resize(600, 600).toBuffer();
  const large = await sharp(image.buffer).resize(1200, 1200).toBuffer();

  await Promise.all([
    s3Upload('thumbnails/', thumbnail),
    s3Upload('medium/', medium),
    s3Upload('large/', large),
    s3Upload('original/', image.buffer),
  ]);
  // User waits for ALL processing + 4 uploads = 5-15 seconds!
  res.json({ success: true });
}

// ✅ GOOD: Upload the original, process asynchronously
async function goodImageUpload(req, res) {
  const key = `originals/${req.user.id}/${Date.now()}.jpg`;

  // Upload the original only
  await s3.putObject({
    Bucket: 'my-app-images',
    Key: key,
    Body: req.file.buffer,
    ContentType: 'image/jpeg',
  }).promise();

  // Trigger async processing (S3 event → Lambda/SQS).
  // Lambda will generate thumbnails and multiple sizes.
  await db.insert('images', {
    userId: req.user.id,
    originalKey: key,
    status: 'processing', // Will become 'ready' after Lambda runs
  });

  // User gets an immediate response
  res.status(202).json({
    message: 'Image uploaded, processing in background',
    imageId: 'img_123', // placeholder id
  });
}

// Lambda function triggered by the S3 upload event:
async function processImageLambda(event) {
  const bucket = event.Records[0].s3.bucket.name;
  const key = event.Records[0].s3.object.key;

  const original = await s3.getObject({ Bucket: bucket, Key: key }).promise();

  const sizes = [
    { name: 'thumbnail', width: 150, height: 150 },
    { name: 'medium', width: 600, height: 600 },
    { name: 'large', width: 1200, height: 1200 },
  ];

  for (const size of sizes) {
    const resized = await sharp(original.Body)
      .resize(size.width, size.height, { fit: 'cover' })
      .jpeg({ quality: 85 })
      .toBuffer();

    await s3.putObject({
      Bucket: bucket,
      Key: key.replace('originals/', `${size.name}/`),
      Body: resized,
      ContentType: 'image/jpeg',
      CacheControl: 'public, max-age=31536000', // Cache for 1 year
    }).promise();
  }

  // Update database status
  await db.update('images', { originalKey: key }, { status: 'ready' });
}

// ---------- Storage Class Selection ----------

const storageClasses = {
  'S3 Standard': {
    cost: '$0.023/GB/month',
    useCase: 'Frequently accessed data (user profiles, active content)',
    retrieval: 'Instant',
  },
  'S3 Intelligent-Tiering': {
    cost: '$0.023/GB/month (auto-optimized)',
    useCase: 'Unknown access patterns',
    retrieval: 'Instant',
  },
  'S3 Infrequent Access': {
    cost: '$0.0125/GB/month',
    useCase: 'Accessed less than once a month (old data)',
    retrieval: 'Instant (per-request fee)',
  },
  'S3 Glacier': {
    cost: '$0.004/GB/month',
    useCase: 'Archival (compliance, backups)',
    retrieval: 'Minutes to hours',
  },
  'S3 Glacier Deep Archive': {
    cost: '$0.00099/GB/month',
    useCase: 'Long-term archival (7+ years retention)',
    retrieval: '12-48 hours',
  },
};

async function s3Upload(prefix, buffer) {
  // Helper stub: would upload `buffer` to S3 under the given key prefix.
}

console.log('Storage classes:', storageClasses);

🏋️ Practice Exercise

  1. Upload Architecture: Design a complete image upload system for a social media app. Include: client upload flow (pre-signed URLs), image processing pipeline (thumbnails, watermarks), CDN delivery, and cost estimation for 1M images/day.

  2. Video Streaming: Design the storage architecture for a video platform like YouTube. How do you handle: multiple resolutions, chunked uploads (resume on failure), transcoding pipeline, and storage costs for 500 hours of video uploaded per minute?

  3. Data Lake Design: Design a data lake architecture on S3 for an e-commerce company. Include: raw event ingestion, data transformation pipeline (ETL), partitioning strategy, and query layer (Athena/Spark).

  4. Cost Optimization: You're storing 500TB of user data on S3 Standard ($11,500/month). Analytics show 70% hasn't been accessed in 90 days. Design a lifecycle policy to reduce costs using storage tiering.

  5. Pre-signed URL Security: Identify and fix the security issues in this pre-signed URL generation: (a) URLs that never expire, (b) no file size limits, (c) no file type validation, (d) no user authentication check.

⚠️ Common Mistakes

  • Storing large files (images, videos) in the database — databases are optimized for structured queries, not blob storage. Large files in a database slow down backups, replication, and queries. Use object storage (S3).

  • Not using pre-signed URLs — routing uploads and downloads through your API server wastes server bandwidth and CPU. Let clients talk directly to object storage using pre-signed URLs.

  • Forgetting about storage costs at scale — storing 1 million images at 2MB each = 2TB = ~$46/month on S3 Standard. But with 4 sizes per image + CDN egress, costs multiply quickly. Plan storage tiers and lifecycle policies.

  • Processing uploads synchronously — generating thumbnails, transcoding video, or scanning for viruses should happen asynchronously. Use event-driven processing (S3 events → Lambda/SQS) to keep upload latency low.
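
The storage-cost arithmetic above can be sketched as a back-of-envelope calculator. Per-GB prices are the S3 list prices quoted in this section (decimal GB, matching the ~$46 estimate); egress/CDN and request fees are deliberately excluded, even though they often dominate at scale:

```javascript
// Monthly S3 storage cost estimate. Prices are the per-GB figures from the
// storage-class table in this section; everything else here is illustrative.
const PRICE_PER_GB = { standard: 0.023, infrequentAccess: 0.0125, glacier: 0.004 };

function monthlyStorageCost(countImages, avgSizeMB, variantsPerImage, tier) {
  const totalGB = (countImages * avgSizeMB * variantsPerImage) / 1000; // decimal GB
  return totalGB * PRICE_PER_GB[tier];
}

// 1 million original images at 2MB each on S3 Standard (the bullet above):
console.log(monthlyStorageCost(1_000_000, 2, 1, 'standard').toFixed(2)); // "46.00"

// With 4 sizes stored per image, the bill quadruples before any egress:
console.log(monthlyStorageCost(1_000_000, 2, 4, 'standard').toFixed(2)); // "184.00"
```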
