Large Data Volume Management
📖 Concept
Enterprise Salesforce orgs often have millions of records. Understanding how to manage large data volumes (LDV) while maintaining performance is a senior/architect-level skill.
When is data "large"?
- Standard objects: >1 million records
- Custom objects: >500K records
- Any object where queries start timing out or hitting governor limits
LDV optimization strategies:
Skinny Tables (request from Salesforce Support)
- A copy of a table with only the frequently queried fields
- Dramatically improves query performance on wide objects
- Maintained automatically by the platform
Custom Indexes (request from Salesforce Support)
- Standard indexes: Id, Name, OwnerId, RecordTypeId, CreatedDate, SystemModstamp
- Custom indexes: Any custom field can be indexed
- Two-column indexes for complex WHERE clauses
- External ID fields are automatically indexed
Query Optimization
- Always use selective filters (indexed fields in WHERE)
- Avoid leading wildcards:
LIKE '%term'prevents index use - Use
LIMITand pagination - Filter early, process late
Data Skew
- Account Data Skew: One account with millions of child records
- Ownership Skew: One user owns millions of records
- Lookup Skew: Many records pointing to same parent
- Impact: Lock contention, sharing recalculation delays
Archival Strategies
- Move old data to
Big Objects(for Salesforce storage) - Archive to external systems (data warehouse)
- Use
External Objectsfor on-demand access to archived data - Implement soft deletes (IsArchived flag) for business logic
- Move old data to
Storage limits:
- Data storage: Based on edition + per-user allocation
- File storage: Separate allocation for attachments/files
- Big Objects: Separate, higher-capacity storage
💻 Code Example
1// Large Data Volume Patterns23public class LargeDataVolumeService {45 // 1. Efficient pagination with query locator6 public static List<Account> getAccountsPage(Integer pageSize, Integer offset) {7 return [8 SELECT Id, Name, Industry, CreatedDate9 FROM Account10 WHERE Industry != null // Selective filter11 ORDER BY CreatedDate DESC12 LIMIT :pageSize13 OFFSET :offset14 ];15 }1617 // 2. Chunked processing for LDV18 public static void processLargeDataSet() {19 // Use Batch Apex for millions of records20 Database.executeBatch(new LargeAccountBatch(), 200);21 }2223 // 3. Avoiding data skew in lookups24 public static void distributeLoad(List<Case> cases) {25 // BAD: All cases assigned to one queue26 // GOOD: Distribute across multiple queues27 List<Group> queues = [28 SELECT Id FROM Group WHERE Type = 'Queue' AND Name LIKE 'Support_%'29 ];3031 if (queues.isEmpty()) return;3233 Integer queueIndex = 0;34 for (Case c : cases) {35 c.OwnerId = queues[Math.mod(queueIndex, queues.size())].Id;36 queueIndex++;37 }38 }3940 // 4. Using Big Objects for archival41 // Big Object definition (metadata)42 // Customer_Interaction__b with fields:43 // Account_Id__c (Text, Index 1)44 // Interaction_Date__c (DateTime, Index 2)45 // Description__c (Text)4647 public static void archiveTooBigObject(List<Customer_Interaction__c> records) {48 List<Customer_Interaction__b> bigObjRecords = new List<Customer_Interaction__b>();4950 for (Customer_Interaction__c rec : records) {51 bigObjRecords.add(new Customer_Interaction__b(52 Account_Id__c = rec.Account__c,53 Interaction_Date__c = rec.Interaction_Date__c,54 Description__c = rec.Description__c55 ));56 }5758 // Async insert into Big Object59 Database.insertImmediate(bigObjRecords);6061 // Delete from standard object62 delete records;63 }6465 // 5. Selective query example with custom index66 public static List<Account> selectiveQuery(String industry, Date createdAfter) {67 // This query is selective because:68 // - Industry can be indexed (request custom index)69 // - CreatedDate is auto-indexed70 // Both conditions filter >90% of records71 return [72 SELECT Id, Name, Industry, AnnualRevenue73 FROM Account74 WHERE Industry = :industry // Indexed field75 AND CreatedDate > :createdAfter // Auto-indexed76 ORDER BY Name77 LIMIT 200078 ];79 }80}8182// 6. Batch for LDV processing83public class LargeAccountBatch implements Database.Batchable<SObject> {8485 public Database.QueryLocator start(Database.BatchableContext bc) {86 // QueryLocator can retrieve up to 50 MILLION records87 return Database.getQueryLocator([88 SELECT Id, Name, Industry, Last_Review_Date__c89 FROM Account90 WHERE Last_Review_Date__c < LAST_N_DAYS:36591 OR Last_Review_Date__c = null92 ]);93 }9495 public void execute(Database.BatchableContext bc, List<Account> scope) {96 for (Account acc : scope) {97 acc.Last_Review_Date__c = Date.today();98 acc.Review_Status__c = 'Pending';99 }100 Database.update(scope, false); // Partial success101 }102103 public void finish(Database.BatchableContext bc) {104 System.debug('Batch complete');105 }106}
🏋️ Practice Exercise
LDV Practice:
- Create a custom object with 100,000+ records and test query performance
- Use the Query Plan tool in Developer Console to analyze query selectivity
- Implement pagination for a list view that handles 1M+ records
- Design a data archival strategy using Big Objects for a 5+ year old data
- Identify and fix data skew in your org (Account skew, Ownership skew)
- Request custom indexes from Salesforce (or simulate the effect with External IDs)
- Write a Batch Apex job that processes 500,000 records in 200-record chunks
- Implement a search that works efficiently on objects with 1M+ records
- Design a data management strategy for an org that adds 100K records per month
- Build a dashboard that monitors record counts, storage usage, and query performance
⚠️ Common Mistakes
Using SOQL OFFSET for deep pagination — OFFSET has a 2,000 limit. For deep pagination, use WHERE Id > :lastId ORDER BY Id pattern
Not requesting custom indexes on frequently queried fields — without indexes, queries on 1M+ record objects will time out or fail
Creating lookup relationships to widely-shared records (data skew) — one Account with 10M Contacts causes lock contention on every Contact update
Not planning for data growth — an object that's fine at 100K records may fail at 1M. Design for 10x your current volume
Using SOQL for text search on LDV — SOQL LIKE queries don't use indexes with leading wildcards. Use SOSL (search index) instead
💼 Interview Questions
🎤 Mock Interview
Mock interview is powered by AI for Large Data Volume Management. Login to unlock this feature.