Object Storage & Blob Storage
📖 Concept
Not all data belongs in a database. Large binary data — images, videos, documents, backups, logs — should be stored in object storage, a specialized system designed for massive files and high throughput.
Object Storage vs File System vs Block Storage
| Feature | Object Storage | File System | Block Storage |
|---|---|---|---|
| Structure | Flat namespace with keys | Hierarchical (directories) | Raw disk blocks |
| Access pattern | HTTP API (PUT/GET/DELETE) | OS-level file operations | Low-level I/O |
| Metadata | Rich, custom metadata per object | Limited (permissions, timestamps) | None |
| Scale | Virtually unlimited | Limited by disk/NFS | Limited by disk |
| Durability | 99.999999999% (11 nines) | Depends on RAID | Depends on RAID |
| Cost | Very low (~$0.023/GB/month) | Medium | High |
| Examples | S3, GCS, Azure Blob | NFS, EFS | EBS, SAN |
How Object Storage Works
Objects are stored with three components:
- Key (path): e.g. `/users/123/profile/avatar.jpg`
- Data: the actual bytes (image, video, document)
- Metadata: content-type, upload date, custom tags, ACLs
Access is via simple HTTP:
- `PUT /bucket/key` → upload
- `GET /bucket/key` → download
- `DELETE /bucket/key` → remove
- `HEAD /bucket/key` → get metadata only
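The model above — a flat namespace mapping keys to bytes plus metadata, exposed through PUT/GET/HEAD/DELETE — can be sketched as a toy in-memory store. This is illustrative only (the class and method names are not a real SDK), but it captures the key/data/metadata structure:

```javascript
// Toy in-memory sketch of the object-store model:
// a flat namespace mapping keys to { data, metadata }.
class ObjectStore {
  constructor() {
    this.objects = new Map();
  }
  put(key, data, metadata = {}) {
    this.objects.set(key, {
      data,
      metadata: { ...metadata, size: data.length, uploadedAt: new Date().toISOString() },
    });
  }
  get(key) {
    const obj = this.objects.get(key);
    return obj ? obj.data : null;
  }
  head(key) {
    // Metadata only — no data transfer, like HTTP HEAD
    const obj = this.objects.get(key);
    return obj ? obj.metadata : null;
  }
  delete(key) {
    return this.objects.delete(key);
  }
}

const store = new ObjectStore();
store.put('users/123/profile/avatar.jpg', Buffer.from('...image bytes...'), {
  contentType: 'image/jpeg',
});
console.log(store.head('users/123/profile/avatar.jpg').contentType); // 'image/jpeg'
```

Note there are no directories: `users/123/profile/` is just part of the key string, even though consoles render it as a folder tree.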
When to Use Object Storage
| Use Case | Why Object Storage? |
|---|---|
| User uploads (images, avatars) | Cheap, durable, serves via CDN |
| Video content | Handles massive files, integrates with transcoding |
| Data lake | Store raw data for analytics pipelines |
| Backups | Ultra-cheap archival ($0.004/GB/month for S3 Glacier) |
| Static website assets | Serve directly or through CDN |
| Log storage | Append-only, rarely read, huge volume |
Common Architecture Pattern
User uploads image → API Server → Store metadata in DB → Upload file to S3
User views image → CDN (cache) → S3 (origin) → Return image
The database stores metadata (filename, user_id, size, URL), and the actual file lives in object storage (S3, GCS).
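Concretely, the database row is just a pointer plus metadata. A hypothetical `files` record (field names are illustrative) might look like:

```javascript
// Hypothetical metadata row in the database — the bytes live in S3.
const fileRecord = {
  id: 'file_8f2a',
  userId: 123,
  fileName: 'avatar.jpg',
  sizeBytes: 48211,
  contentType: 'image/jpeg',
  s3Key: 'uploads/123/avatar.jpg', // pointer into object storage
  cdnUrl: 'https://cdn.example.com/uploads/123/avatar.jpg',
  createdAt: '2024-01-15T10:30:00Z',
};

// Queries (list a user's files, sum storage used) hit the database;
// the image bytes themselves are served CDN → S3, never from the DB.
console.log(fileRecord.s3Key); // 'uploads/123/avatar.jpg'
```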
Pre-signed URLs
Instead of routing every upload/download through your API server, use pre-signed URLs — temporary, authenticated URLs that allow direct upload/download to object storage.
This is critical for performance because:
- Large files don't flow through your API servers (saves bandwidth)
- Upload goes directly from client to S3 (lower latency)
- Download goes through CDN → S3 (optimal path)
Interview tip: When designing any system that handles media (images, videos, documents), always mention object storage + CDN. Don't say "store images in the database" — that's a major red flag.
💻 Code Example
```javascript
// ============================================
// Object Storage — Production Patterns
// ============================================

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// ---------- Pre-signed URL Upload Flow ----------

// ✅ GOOD: Client uploads directly to S3 via pre-signed URL
// Server never handles the large file!

async function getUploadUrl(req, res) {
  const { fileName, fileType } = req.body;
  const key = `uploads/${req.user.id}/${Date.now()}-${fileName}`;

  // Generate a URL that allows direct upload to S3.
  // Note: size limits ('content-length-range' conditions) require a POST
  // policy via s3.createPresignedPost — a pre-signed PUT URL can't enforce them.
  const presignedUrl = await s3.getSignedUrlPromise('putObject', {
    Bucket: 'my-app-uploads',
    Key: key,
    ContentType: fileType,
    Expires: 300, // URL valid for 5 minutes
  });

  // Save metadata to database
  await db.insert('files', {
    userId: req.user.id,
    s3Key: key,
    fileName,
    fileType,
    status: 'pending_upload',
  });

  res.json({
    uploadUrl: presignedUrl,
    key,
    expiresIn: 300,
  });
}

// Client-side upload (browser):
// const response = await fetch(uploadUrl, {
//   method: 'PUT',
//   body: file,
//   headers: { 'Content-Type': file.type }
// });

// ---------- Pre-signed URL Download Flow ----------

async function getDownloadUrl(req, res) {
  const { fileId } = req.params;
  const file = await db.findFile(fileId);

  if (!file) return res.status(404).json({ error: 'File not found' });

  // Check access permissions
  if (file.userId !== req.user.id) {
    return res.status(403).json({ error: 'Not authorized' });
  }

  const downloadUrl = await s3.getSignedUrlPromise('getObject', {
    Bucket: 'my-app-uploads',
    Key: file.s3Key,
    Expires: 3600, // URL valid for 1 hour
    ResponseContentDisposition: `attachment; filename="${file.fileName}"`,
  });

  res.json({ downloadUrl, expiresIn: 3600 });
}

// ---------- Image Upload with Processing Pipeline ----------

// ❌ BAD: Process images synchronously in the API handler
async function badImageUpload(req, res) {
  const image = req.file;
  const thumbnail = await sharp(image.buffer).resize(150, 150).toBuffer();
  const medium = await sharp(image.buffer).resize(600, 600).toBuffer();
  const large = await sharp(image.buffer).resize(1200, 1200).toBuffer();

  await Promise.all([
    s3Upload('thumbnails/', thumbnail),
    s3Upload('medium/', medium),
    s3Upload('large/', large),
    s3Upload('original/', image.buffer),
  ]);
  // User waits for ALL processing + 4 uploads = 5-15 seconds!
  res.json({ success: true });
}

// ✅ GOOD: Upload original, process asynchronously
async function goodImageUpload(req, res) {
  const key = `originals/${req.user.id}/${Date.now()}.jpg`;

  // Upload original only
  await s3.putObject({
    Bucket: 'my-app-images',
    Key: key,
    Body: req.file.buffer,
    ContentType: 'image/jpeg',
  }).promise();

  // Trigger async processing (S3 event → Lambda/SQS)
  // Lambda will generate thumbnails and multiple sizes
  await db.insert('images', {
    userId: req.user.id,
    originalKey: key,
    status: 'processing', // Will become 'ready' after Lambda runs
  });

  // User gets immediate response
  res.status(202).json({
    message: 'Image uploaded, processing in background',
    imageId: 'img_123',
  });
}

// Lambda function triggered by S3 upload event:
async function processImageLambda(event) {
  const bucket = event.Records[0].s3.bucket.name;
  const key = event.Records[0].s3.object.key;

  const original = await s3.getObject({ Bucket: bucket, Key: key }).promise();
  const sharp = require('sharp');

  const sizes = [
    { name: 'thumbnail', width: 150, height: 150 },
    { name: 'medium', width: 600, height: 600 },
    { name: 'large', width: 1200, height: 1200 },
  ];

  for (const size of sizes) {
    const resized = await sharp(original.Body)
      .resize(size.width, size.height, { fit: 'cover' })
      .jpeg({ quality: 85 })
      .toBuffer();

    await s3.putObject({
      Bucket: bucket,
      Key: key.replace('originals/', `${size.name}/`),
      Body: resized,
      ContentType: 'image/jpeg',
      CacheControl: 'public, max-age=31536000', // Cache for 1 year
    }).promise();
  }

  // Update database status
  await db.update('images', { originalKey: key }, { status: 'ready' });
}

// ---------- Storage Class Selection ----------

const storageClasses = {
  'S3 Standard': {
    cost: '$0.023/GB/month',
    useCase: 'Frequently accessed data (user profiles, active content)',
    retrieval: 'Instant',
  },
  'S3 Intelligent-Tiering': {
    cost: '$0.023/GB/month (auto-optimized)',
    useCase: 'Unknown access patterns',
    retrieval: 'Instant',
  },
  'S3 Infrequent Access': {
    cost: '$0.0125/GB/month',
    useCase: 'Accessed less than once a month (old data)',
    retrieval: 'Instant (per-request fee)',
  },
  'S3 Glacier': {
    cost: '$0.004/GB/month',
    useCase: 'Archival (compliance, backups)',
    retrieval: 'Minutes to hours',
  },
  'S3 Glacier Deep Archive': {
    cost: '$0.00099/GB/month',
    useCase: 'Long-term archival (7+ years retention)',
    retrieval: '12-48 hours',
  },
};

async function s3Upload(prefix, buffer) {
  // helper stub
}

console.log('Storage classes:', storageClasses);
```
🏋️ Practice Exercises
Upload Architecture: Design a complete image upload system for a social media app. Include: client upload flow (pre-signed URLs), image processing pipeline (thumbnails, watermarks), CDN delivery, and cost estimation for 1M images/day.
Video Streaming: Design the storage architecture for a video platform like YouTube. How do you handle: multiple resolutions, chunked uploads (resume on failure), transcoding pipeline, and storage costs for 500 hours of video uploaded per minute?
Data Lake Design: Design a data lake architecture on S3 for an e-commerce company. Include: raw event ingestion, data transformation pipeline (ETL), partitioning strategy, and query layer (Athena/Spark).
Cost Optimization: You're storing 500TB of user data on S3 Standard ($11,500/month). Analytics show 70% hasn't been accessed in 90 days. Design a lifecycle policy to reduce costs using storage tiering.
Pre-signed URL Security: Identify and fix the security issues in this pre-signed URL generation: (a) URLs that never expire, (b) no file size limits, (c) no file type validation, (d) no user authentication check.
⚠️ Common Mistakes
Storing large files (images, videos) in the database — databases are optimized for structured queries, not blob storage. Large files in a database slow down backups, replication, and queries. Use object storage (S3).
Not using pre-signed URLs — routing uploads and downloads through your API server wastes server bandwidth and CPU. Let clients talk directly to object storage using pre-signed URLs.
Forgetting about storage costs at scale — storing 1 million images at 2MB each = 2TB = ~$46/month on S3 Standard. But with 4 sizes per image + CDN egress, costs multiply quickly. Plan storage tiers and lifecycle policies.
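The arithmetic in that point is easy to script. The rate is the S3 Standard price quoted above (region-dependent and subject to change), decimal GB (1000 MB) is used to match the rough $46 figure, and the 50% overhead for derived image sizes is an assumption for illustration:

```javascript
// Back-of-envelope monthly storage cost at the ~$0.023/GB/month
// S3 Standard rate quoted above (prices vary by region and over time).
function monthlyStorageCost(objectCount, avgSizeMB, pricePerGB = 0.023) {
  const totalGB = (objectCount * avgSizeMB) / 1000; // decimal GB
  return totalGB * pricePerGB;
}

const originals = monthlyStorageCost(1_000_000, 2); // 2TB of originals
console.log(`Originals: ~$${originals.toFixed(0)}/month`); // ~$46/month

// With derived sizes (thumbnail/medium/large), assume the smaller copies
// add roughly another 50% of the original volume — an illustrative guess:
const withDerived = originals * 1.5;
console.log(`With derived sizes: ~$${withDerived.toFixed(0)}/month`);
```

Note this covers storage only: request fees and especially egress/CDN transfer often dominate the bill for frequently viewed media.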
Processing uploads synchronously — generating thumbnails, transcoding video, or scanning for viruses should happen asynchronously. Use event-driven processing (S3 events → Lambda/SQS) to keep upload latency low.