Object Storage & Blob Storage


📖 Concept

Not all data belongs in a database. Large binary data — images, videos, documents, backups, logs — should be stored in object storage, a specialized system designed for massive files and high throughput.

Object Storage vs File System vs Block Storage

| Feature | Object Storage | File System | Block Storage |
|---|---|---|---|
| Structure | Flat namespace with keys | Hierarchical (directories) | Raw disk blocks |
| Access pattern | HTTP API (PUT/GET/DELETE) | OS-level file operations | Low-level I/O |
| Metadata | Rich, custom metadata per object | Limited (permissions, timestamps) | None |
| Scale | Virtually unlimited | Limited by disk/NFS | Limited by disk |
| Durability | 99.999999999% (11 nines) | Depends on RAID | Depends on RAID |
| Cost | Very low (~$0.023/GB/month) | Medium | High |
| Examples | S3, GCS, Azure Blob | NFS, EFS | EBS, SAN |

How Object Storage Works

Objects are stored with three components:

  1. Key (path): /users/123/profile/avatar.jpg
  2. Data (the actual bytes): The image/video/document
  3. Metadata: Content-type, upload date, custom tags, ACL

Access is via simple HTTP:

  • PUT /bucket/key → upload
  • GET /bucket/key → download
  • DELETE /bucket/key → remove
  • HEAD /bucket/key → get metadata only
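
The model above can be sketched as a toy in-memory store — a flat map from key to (data, metadata) with the four operations. This is illustrative only (a real object store exposes these over HTTP, not as method calls):

```javascript
// Toy in-memory object store: a flat namespace mapping key → { data, metadata }.
class ToyBucket {
  constructor() {
    this.objects = new Map(); // flat: the full key is the identifier, no directories
  }

  // PUT /bucket/key → upload (overwrites if the key already exists)
  put(key, data, metadata = {}) {
    this.objects.set(key, {
      data,
      metadata: { ...metadata, contentLength: data.length, lastModified: new Date() },
    });
  }

  // GET /bucket/key → download (data + metadata)
  get(key) {
    const obj = this.objects.get(key);
    if (!obj) throw new Error(`404: no such key: ${key}`);
    return obj;
  }

  // HEAD /bucket/key → metadata only, no data transfer
  head(key) {
    return this.get(key).metadata;
  }

  // DELETE /bucket/key → remove
  delete(key) {
    this.objects.delete(key);
  }
}

const bucket = new ToyBucket();
bucket.put('users/123/profile/avatar.jpg', Buffer.from('...image bytes...'), {
  contentType: 'image/jpeg',
});
console.log(bucket.head('users/123/profile/avatar.jpg').contentType); // image/jpeg
```

Note that although the key looks like a path, the `/` characters are just part of the key — there is no real directory tree, which is what lets object stores scale a flat namespace horizontally.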

When to Use Object Storage

| Use Case | Why Object Storage? |
|---|---|
| User uploads (images, avatars) | Cheap, durable, serves via CDN |
| Video content | Handles massive files, integrates with transcoding |
| Data lake | Store raw data for analytics pipelines |
| Backups | Ultra-cheap archival ($0.004/GB/month for S3 Glacier) |
| Static website assets | Serve directly or through CDN |
| Log storage | Append-only, rarely read, huge volume |

Common Architecture Pattern

User uploads image → API Server → Store metadata in DB → Upload file to S3
User views image  → CDN (cache) → S3 (origin) → Return image

The database stores metadata (filename, user_id, size, URL), and the actual file lives in object storage (S3, GCS).
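
A minimal sketch of that split — the field names, table layout, and CDN host here are illustrative assumptions, not a prescribed schema:

```javascript
// The DB row holds small, queryable metadata; S3 holds the bytes under s3Key.
// cdn.example.com and the field names are hypothetical.
function buildFileRecord({ userId, fileName, sizeBytes }) {
  // Encode owner + timestamp into the key so uploads never collide.
  const s3Key = `uploads/${userId}/${Date.now()}-${fileName}`;

  return {
    // → relational database
    dbRow: {
      userId,
      fileName,
      sizeBytes,
      s3Key,
      createdAt: new Date().toISOString(),
    },
    // → clients fetch through the CDN, which pulls from S3 (origin) on a cache miss
    publicUrl: `https://cdn.example.com/${s3Key}`,
  };
}

const record = buildFileRecord({ userId: 123, fileName: 'avatar.jpg', sizeBytes: 48213 });
console.log(record.publicUrl.startsWith('https://cdn.example.com/uploads/123/')); // true
```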

Pre-signed URLs

Instead of routing every upload/download through your API server, use pre-signed URLs — temporary, authenticated URLs that allow direct upload/download to object storage.

This is critical for performance because:

  • Large files don't flow through your API servers (saves bandwidth)
  • Upload goes directly from client to S3 (lower latency)
  • Download goes through CDN → S3 (optimal path)

Interview tip: When designing any system that handles media (images, videos, documents), always mention object storage + CDN. Don't say "store images in the database" — that's a major red flag.

💻 Code Example

// ============================================
// Object Storage — Production Patterns
// ============================================

const AWS = require('aws-sdk');
const sharp = require('sharp'); // image resizing, used in the pipeline below
const s3 = new AWS.S3();

// `db` is an assumed application data-access helper (insert/findFile/update).

// ---------- Pre-signed URL Upload Flow ----------

// ✅ GOOD: Client uploads directly to S3 via pre-signed URL
// Server never handles the large file!

async function getUploadUrl(req, res) {
  const { fileName, fileType } = req.body;
  const key = `uploads/${req.user.id}/${Date.now()}-${fileName}`;

  // Generate a URL that allows direct upload to S3.
  // Note: a pre-signed PUT URL cannot enforce a size limit; to enforce
  // e.g. a 10MB 'content-length-range' condition, use s3.createPresignedPost
  // (POST policy) instead.
  const presignedUrl = await s3.getSignedUrlPromise('putObject', {
    Bucket: 'my-app-uploads',
    Key: key,
    ContentType: fileType,
    Expires: 300, // URL valid for 5 minutes
  });

  // Save metadata to database
  await db.insert('files', {
    userId: req.user.id,
    s3Key: key,
    fileName,
    fileType,
    status: 'pending_upload',
  });

  res.json({
    uploadUrl: presignedUrl,
    key,
    expiresIn: 300,
  });
}

// Client-side upload (browser):
// const response = await fetch(uploadUrl, {
//   method: 'PUT',
//   body: file,
//   headers: { 'Content-Type': file.type }
// });

// ---------- Pre-signed URL Download Flow ----------

async function getDownloadUrl(req, res) {
  const { fileId } = req.params;
  const file = await db.findFile(fileId);

  if (!file) return res.status(404).json({ error: 'File not found' });

  // Check access permissions
  if (file.userId !== req.user.id) {
    return res.status(403).json({ error: 'Not authorized' });
  }

  const downloadUrl = await s3.getSignedUrlPromise('getObject', {
    Bucket: 'my-app-uploads',
    Key: file.s3Key,
    Expires: 3600, // URL valid for 1 hour
    ResponseContentDisposition: `attachment; filename="${file.fileName}"`,
  });

  res.json({ downloadUrl, expiresIn: 3600 });
}

// ---------- Image Upload with Processing Pipeline ----------

// ❌ BAD: Process images synchronously in the API handler
async function badImageUpload(req, res) {
  const image = req.file;
  const thumbnail = await sharp(image.buffer).resize(150, 150).toBuffer();
  const medium = await sharp(image.buffer).resize(600, 600).toBuffer();
  const large = await sharp(image.buffer).resize(1200, 1200).toBuffer();

  await Promise.all([
    s3Upload('thumbnails/', thumbnail),
    s3Upload('medium/', medium),
    s3Upload('large/', large),
    s3Upload('original/', image.buffer),
  ]);
  // User waits for ALL processing + 4 uploads = 5-15 seconds!
  res.json({ success: true });
}

// ✅ GOOD: Upload the original, process asynchronously
async function goodImageUpload(req, res) {
  const key = `originals/${req.user.id}/${Date.now()}.jpg`;

  // Upload the original only
  await s3.putObject({
    Bucket: 'my-app-images',
    Key: key,
    Body: req.file.buffer,
    ContentType: 'image/jpeg',
  }).promise();

  // Trigger async processing (S3 event → Lambda/SQS).
  // Lambda will generate thumbnails and multiple sizes.
  await db.insert('images', {
    userId: req.user.id,
    originalKey: key,
    status: 'processing', // Will become 'ready' after Lambda runs
  });

  // User gets an immediate response
  res.status(202).json({
    message: 'Image uploaded, processing in background',
    imageId: 'img_123', // placeholder id
  });
}

// Lambda function triggered by the S3 upload event:
async function processImageLambda(event) {
  const bucket = event.Records[0].s3.bucket.name;
  const key = event.Records[0].s3.object.key;

  const original = await s3.getObject({ Bucket: bucket, Key: key }).promise();

  const sizes = [
    { name: 'thumbnail', width: 150, height: 150 },
    { name: 'medium', width: 600, height: 600 },
    { name: 'large', width: 1200, height: 1200 },
  ];

  for (const size of sizes) {
    const resized = await sharp(original.Body)
      .resize(size.width, size.height, { fit: 'cover' })
      .jpeg({ quality: 85 })
      .toBuffer();

    await s3.putObject({
      Bucket: bucket,
      Key: key.replace('originals/', `${size.name}/`),
      Body: resized,
      ContentType: 'image/jpeg',
      CacheControl: 'public, max-age=31536000', // Cache for 1 year
    }).promise();
  }

  // Update database status
  await db.update('images', { originalKey: key }, { status: 'ready' });
}

// ---------- Storage Class Selection ----------

const storageClasses = {
  'S3 Standard': {
    cost: '$0.023/GB/month',
    useCase: 'Frequently accessed data (user profiles, active content)',
    retrieval: 'Instant',
  },
  'S3 Intelligent-Tiering': {
    cost: '$0.023/GB/month (auto-optimized)',
    useCase: 'Unknown access patterns',
    retrieval: 'Instant',
  },
  'S3 Infrequent Access': {
    cost: '$0.0125/GB/month',
    useCase: 'Accessed less than once a month (old data)',
    retrieval: 'Instant (per-request fee)',
  },
  'S3 Glacier': {
    cost: '$0.004/GB/month',
    useCase: 'Archival (compliance, backups)',
    retrieval: 'Minutes to hours',
  },
  'S3 Glacier Deep Archive': {
    cost: '$0.00099/GB/month',
    useCase: 'Long-term archival (7+ years retention)',
    retrieval: '12-48 hours',
  },
};

async function s3Upload(prefix, buffer) {
  // Helper stub: would upload `buffer` to S3 under the given key prefix.
}

console.log('Storage classes:', storageClasses);

🏋️ Practice Exercise

  1. Upload Architecture: Design a complete image upload system for a social media app. Include: client upload flow (pre-signed URLs), image processing pipeline (thumbnails, watermarks), CDN delivery, and cost estimation for 1M images/day.

  2. Video Streaming: Design the storage architecture for a video platform like YouTube. How do you handle: multiple resolutions, chunked uploads (resume on failure), transcoding pipeline, and storage costs for 500 hours of video uploaded per minute?

  3. Data Lake Design: Design a data lake architecture on S3 for an e-commerce company. Include: raw event ingestion, data transformation pipeline (ETL), partitioning strategy, and query layer (Athena/Spark).

  4. Cost Optimization: You're storing 500TB of user data on S3 Standard ($11,500/month). Analytics show 70% hasn't been accessed in 90 days. Design a lifecycle policy to reduce costs using storage tiering.

  5. Pre-signed URL Security: Identify and fix the security issues in this pre-signed URL generation: (a) URLs that never expire, (b) no file size limits, (c) no file type validation, (d) no user authentication check.

⚠️ Common Mistakes

  • Storing large files (images, videos) in the database — databases are optimized for structured queries, not blob storage. Large files in a database slow down backups, replication, and queries. Use object storage (S3).

  • Not using pre-signed URLs — routing uploads and downloads through your API server wastes server bandwidth and CPU. Let clients talk directly to object storage using pre-signed URLs.

  • Forgetting about storage costs at scale — storing 1 million images at 2MB each = 2TB = ~$46/month on S3 Standard. But with 4 sizes per image + CDN egress, costs multiply quickly. Plan storage tiers and lifecycle policies.

  • Processing uploads synchronously — generating thumbnails, transcoding video, or scanning for viruses should happen asynchronously. Use event-driven processing (S3 events → Lambda/SQS) to keep upload latency low.
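
The storage-cost arithmetic above can be sketched as a back-of-envelope calculator. Per-GB prices are the S3 list prices quoted in this section (decimal GB, matching the ~$46 estimate); egress/CDN and request fees are deliberately excluded, even though they often dominate at scale:

```javascript
// Monthly S3 storage cost estimate. Prices are the per-GB figures from the
// storage-class table in this section; everything else here is illustrative.
const PRICE_PER_GB = { standard: 0.023, infrequentAccess: 0.0125, glacier: 0.004 };

function monthlyStorageCost(countImages, avgSizeMB, variantsPerImage, tier) {
  const totalGB = (countImages * avgSizeMB * variantsPerImage) / 1000; // decimal GB
  return totalGB * PRICE_PER_GB[tier];
}

// 1 million original images at 2MB each on S3 Standard (the bullet above):
console.log(monthlyStorageCost(1_000_000, 2, 1, 'standard').toFixed(2)); // "46.00"

// With 4 sizes stored per image, the bill quadruples before any egress:
console.log(monthlyStorageCost(1_000_000, 2, 4, 'standard').toFixed(2)); // "184.00"
```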
