Large Data Volume Management


📖 Concept

Enterprise Salesforce orgs often have millions of records. Understanding how to manage large data volumes (LDV) while maintaining performance is a senior/architect-level skill.

When is data "large"?

  • Standard objects: >1 million records
  • Custom objects: >500K records
  • Any object where queries start timing out or hitting governor limits

LDV optimization strategies:

  1. Skinny Tables (request from Salesforce Support)

    • A copy of a table with only the frequently queried fields
    • Dramatically improves query performance on wide objects
    • Maintained automatically by the platform
  2. Custom Indexes (request from Salesforce Support)

    • Standard indexes: Id, Name, OwnerId, RecordTypeId, CreatedDate, SystemModstamp, plus lookup and master-detail relationship fields
    • Custom indexes: Any custom field can be indexed
    • Two-column indexes for complex WHERE clauses
    • External ID fields are automatically indexed
  3. Query Optimization

    • Always use selective filters (indexed fields in WHERE)
    • Avoid leading wildcards: LIKE '%term' prevents index use
    • Use LIMIT and pagination
    • Filter early: push conditions into the SOQL WHERE clause rather than filtering in Apex loops after retrieval
  4. Data Skew

    • Account Data Skew: One account with millions of child records
    • Ownership Skew: One user owns millions of records
    • Lookup Skew: Many records (more than ~10,000) pointing to the same lookup parent
    • Impact: Lock contention, sharing recalculation delays
  5. Archival Strategies

    • Move old data to Big Objects (keeps it on-platform without consuming regular data storage)
    • Archive to external systems (data warehouse)
    • Use External Objects for on-demand access to archived data
    • Implement soft deletes (IsArchived flag) for business logic
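
The pagination point above has a practical limit: SOQL OFFSET is capped at 2,000 rows. A minimal keyset (Id-based) pagination sketch — the class name `KeysetPager` is illustrative, and only standard Account fields are assumed:

```apex
// Keyset (Id-based) pagination: scales past the 2,000-row OFFSET cap
// because WHERE Id > :lastId uses the primary-key index on every page.
public class KeysetPager {
    public static List<Account> nextPage(Id lastId, Integer pageSize) {
        if (lastId == null) {
            // First page: no cursor yet
            return [SELECT Id, Name FROM Account ORDER BY Id LIMIT :pageSize];
        }
        return [
            SELECT Id, Name
            FROM Account
            WHERE Id > :lastId   // resume after the last row of the previous page
            ORDER BY Id
            LIMIT :pageSize
        ];
    }
}
```

The caller keeps the Id of the last record in each page and passes it into the next call; every page is an indexed range scan regardless of depth.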

Storage limits:

  • Data storage: Based on edition + per-user allocation
  • File storage: Separate allocation for attachments/files
  • Big Objects: Separate, higher-capacity storage
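
Consumption against these limits can be read at runtime with the standard `System.OrgLimits` Apex class. A small sketch, assuming the org exposes the `DataStorageMB` limit key (the key names mirror the REST Limits API):

```apex
// Inspect storage consumption from Apex via System.OrgLimits
Map<String, System.OrgLimit> orgLimits = OrgLimits.getMap();
System.OrgLimit dataStorage = orgLimits.get('DataStorageMB');
if (dataStorage != null) {
    System.debug('Data storage used: ' + dataStorage.getValue()
        + ' of ' + dataStorage.getLimit() + ' MB');
}
```

This is useful for the kind of monitoring dashboard suggested in the practice exercises below a scheduled job can log these values over time.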

💻 Code Example

```apex
// Large Data Volume Patterns

public class LargeDataVolumeService {

    // 1. Simple pagination with LIMIT/OFFSET
    // Caution: OFFSET is capped at 2,000 — use Id-based (keyset) pagination
    // for deeper pages
    public static List<Account> getAccountsPage(Integer pageSize, Integer offset) {
        return [
            SELECT Id, Name, Industry, CreatedDate
            FROM Account
            WHERE Industry != null // Note: != filters cannot use an index
            ORDER BY CreatedDate DESC
            LIMIT :pageSize
            OFFSET :offset
        ];
    }

    // 2. Chunked processing for LDV
    public static void processLargeDataSet() {
        // Use Batch Apex for millions of records
        Database.executeBatch(new LargeAccountBatch(), 200);
    }

    // 3. Avoiding ownership skew in case assignment
    public static void distributeLoad(List<Case> cases) {
        // BAD: assigning every case to one queue concentrates ownership
        // GOOD: round-robin across multiple queues
        List<Group> queues = [
            SELECT Id FROM Group WHERE Type = 'Queue' AND Name LIKE 'Support_%'
        ];

        if (queues.isEmpty()) return;

        Integer queueIndex = 0;
        for (Case c : cases) {
            c.OwnerId = queues[Math.mod(queueIndex, queues.size())].Id;
            queueIndex++;
        }
    }

    // 4. Using Big Objects for archival
    // Big Object definition (metadata): Customer_Interaction__b with fields:
    //   Account_Id__c (Text, Index 1)
    //   Interaction_Date__c (DateTime, Index 2)
    //   Description__c (Text)

    public static void archiveToBigObject(List<Customer_Interaction__c> records) {
        List<Customer_Interaction__b> bigObjRecords = new List<Customer_Interaction__b>();

        for (Customer_Interaction__c rec : records) {
            bigObjRecords.add(new Customer_Interaction__b(
                Account_Id__c = rec.Account__c,
                Interaction_Date__c = rec.Interaction_Date__c,
                Description__c = rec.Description__c
            ));
        }

        // insertImmediate commits Big Object records right away, outside the
        // current transaction — it is not rolled back if the transaction fails
        Database.insertImmediate(bigObjRecords);

        // Delete from the standard object only after the archive insert succeeds
        delete records;
    }

    // 5. Selective query example with custom index
    public static List<Account> selectiveQuery(String industry, Date createdAfter) {
        // This query is selective because:
        // - Industry can be custom-indexed (request from Salesforce Support)
        // - CreatedDate has a standard index
        // A filter counts as selective when it returns fewer rows than the
        // optimizer's threshold (roughly 10% of records for custom indexes,
        // 30% for standard indexes)
        return [
            SELECT Id, Name, Industry, AnnualRevenue
            FROM Account
            WHERE Industry = :industry          // Custom-indexed field
                AND CreatedDate > :createdAfter // Standard index
            ORDER BY Name
            LIMIT 2000
        ];
    }
}

// 6. Batch for LDV processing
public class LargeAccountBatch implements Database.Batchable<SObject> {

    public Database.QueryLocator start(Database.BatchableContext bc) {
        // A QueryLocator can iterate over up to 50 million records
        return Database.getQueryLocator([
            SELECT Id, Name, Industry, Last_Review_Date__c
            FROM Account
            WHERE Last_Review_Date__c < LAST_N_DAYS:365
                OR Last_Review_Date__c = null
        ]);
    }

    public void execute(Database.BatchableContext bc, List<Account> scope) {
        for (Account acc : scope) {
            acc.Last_Review_Date__c = Date.today();
            acc.Review_Status__c = 'Pending';
        }
        Database.update(scope, false); // allOrNone=false: allow partial success
    }

    public void finish(Database.BatchableContext bc) {
        System.debug('Batch complete');
    }
}
```

🏋️ Practice Exercise

LDV Practice:

  1. Create a custom object with 100,000+ records and test query performance
  2. Use the Query Plan tool in Developer Console to analyze query selectivity
  3. Implement pagination for a list view that handles 1M+ records
  4. Design a data archival strategy using Big Objects for data more than 5 years old
  5. Identify and fix data skew in your org (Account skew, Ownership skew)
  6. Request custom indexes from Salesforce (or simulate the effect with External IDs)
  7. Write a Batch Apex job that processes 500,000 records in 200-record chunks
  8. Implement a search that works efficiently on objects with 1M+ records
  9. Design a data management strategy for an org that adds 100K records per month
  10. Build a dashboard that monitors record counts, storage usage, and query performance

⚠️ Common Mistakes

  • Using SOQL OFFSET for deep pagination — OFFSET has a 2,000 limit. For deep pagination, use WHERE Id > :lastId ORDER BY Id pattern

  • Not requesting custom indexes on frequently queried fields — without indexes, queries on 1M+ record objects will time out or fail

  • Creating lookup relationships to widely-shared records (data skew) — one Account with 10M Contacts causes lock contention on every Contact update

  • Not planning for data growth — an object that's fine at 100K records may fail at 1M. Design for 10x your current volume

  • Using SOQL for text search on LDV — SOQL LIKE queries don't use indexes with leading wildcards. Use SOSL (search index) instead
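
For the last point, a minimal SOSL sketch — SOSL hits the platform search index rather than scanning the table, so it handles "contains" searches that a leading-wildcard LIKE cannot do efficiently:

```apex
// SOSL uses the search index — no full table scan on large objects
List<List<SObject>> results = [
    FIND 'Acme*'
    IN NAME FIELDS
    RETURNING Account(Id, Name LIMIT 200)
];
List<Account> accounts = (List<Account>) results[0];
```

Note that SOSL matches indexed search terms (token-prefix with `*`), not arbitrary substrings, and returns at most 2,000 rows per query — usually the right trade-off at LDV scale.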
