Large Data Volume Management


📖 Concept

Enterprise Salesforce orgs often have millions of records. Understanding how to manage large data volumes (LDV) while maintaining performance is a senior/architect-level skill.

When is data "large"?

  • Standard objects: >1 million records
  • Custom objects: >500K records
  • Any object where queries start timing out or hitting governor limits

LDV optimization strategies:

  1. Skinny Tables (request from Salesforce Support)

    • A copy of a table with only the frequently queried fields
    • Dramatically improves query performance on wide objects
    • Maintained automatically by the platform
  2. Custom Indexes (request from Salesforce Support)

    • Standard indexes: Id, Name, OwnerId, RecordTypeId, CreatedDate, SystemModstamp, plus lookup and master-detail relationship fields
    • Custom indexes: Any custom field can be indexed
    • Two-column indexes for complex WHERE clauses
    • External ID fields are automatically indexed
  3. Query Optimization

    • Always use selective filters (indexed fields in WHERE)
    • Avoid leading wildcards: LIKE '%term' prevents index use
    • Use LIMIT and pagination
    • Filter early: push conditions into the SOQL WHERE clause rather than filtering in Apex loops after retrieval
  4. Data Skew

    • Account Data Skew: One account with millions of child records
    • Ownership Skew: One user owns millions of records
    • Lookup Skew: Many records (more than ~10,000) pointing to the same lookup parent
    • Impact: Lock contention, sharing recalculation delays
  5. Archival Strategies

    • Move old data to Big Objects (keeps it on-platform without consuming regular data storage)
    • Archive to external systems (data warehouse)
    • Use External Objects for on-demand access to archived data
    • Implement soft deletes (IsArchived flag) for business logic
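
The pagination point above has a practical limit: SOQL OFFSET is capped at 2,000 rows. A minimal keyset (Id-based) pagination sketch — the class name `KeysetPager` is illustrative, and only standard Account fields are assumed:

```apex
// Keyset (Id-based) pagination: scales past the 2,000-row OFFSET cap
// because WHERE Id > :lastId uses the primary-key index on every page.
public class KeysetPager {
    public static List<Account> nextPage(Id lastId, Integer pageSize) {
        if (lastId == null) {
            // First page: no cursor yet
            return [SELECT Id, Name FROM Account ORDER BY Id LIMIT :pageSize];
        }
        return [
            SELECT Id, Name
            FROM Account
            WHERE Id > :lastId   // resume after the last row of the previous page
            ORDER BY Id
            LIMIT :pageSize
        ];
    }
}
```

The caller keeps the Id of the last record in each page and passes it into the next call; every page is an indexed range scan regardless of depth.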

Storage limits:

  • Data storage: Based on edition + per-user allocation
  • File storage: Separate allocation for attachments/files
  • Big Objects: Separate, higher-capacity storage
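
Consumption against these limits can be read at runtime with the standard `System.OrgLimits` Apex class. A small sketch, assuming the org exposes the `DataStorageMB` limit key (the key names mirror the REST Limits API):

```apex
// Inspect storage consumption from Apex via System.OrgLimits
Map<String, System.OrgLimit> orgLimits = OrgLimits.getMap();
System.OrgLimit dataStorage = orgLimits.get('DataStorageMB');
if (dataStorage != null) {
    System.debug('Data storage used: ' + dataStorage.getValue()
        + ' of ' + dataStorage.getLimit() + ' MB');
}
```

This is useful for the kind of monitoring dashboard suggested in the practice exercises below a scheduled job can log these values over time.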

💻 Code Example

```apex
// Large Data Volume Patterns

public class LargeDataVolumeService {

    // 1. Simple pagination with LIMIT/OFFSET
    // Caution: OFFSET is capped at 2,000 — use Id-based (keyset) pagination
    // for deeper pages
    public static List<Account> getAccountsPage(Integer pageSize, Integer offset) {
        return [
            SELECT Id, Name, Industry, CreatedDate
            FROM Account
            WHERE Industry != null // Note: != filters cannot use an index
            ORDER BY CreatedDate DESC
            LIMIT :pageSize
            OFFSET :offset
        ];
    }

    // 2. Chunked processing for LDV
    public static void processLargeDataSet() {
        // Use Batch Apex for millions of records
        Database.executeBatch(new LargeAccountBatch(), 200);
    }

    // 3. Avoiding ownership skew in case assignment
    public static void distributeLoad(List<Case> cases) {
        // BAD: assigning every case to one queue concentrates ownership
        // GOOD: round-robin across multiple queues
        List<Group> queues = [
            SELECT Id FROM Group WHERE Type = 'Queue' AND Name LIKE 'Support_%'
        ];

        if (queues.isEmpty()) return;

        Integer queueIndex = 0;
        for (Case c : cases) {
            c.OwnerId = queues[Math.mod(queueIndex, queues.size())].Id;
            queueIndex++;
        }
    }

    // 4. Using Big Objects for archival
    // Big Object definition (metadata): Customer_Interaction__b with fields:
    //   Account_Id__c (Text, Index 1)
    //   Interaction_Date__c (DateTime, Index 2)
    //   Description__c (Text)

    public static void archiveToBigObject(List<Customer_Interaction__c> records) {
        List<Customer_Interaction__b> bigObjRecords = new List<Customer_Interaction__b>();

        for (Customer_Interaction__c rec : records) {
            bigObjRecords.add(new Customer_Interaction__b(
                Account_Id__c = rec.Account__c,
                Interaction_Date__c = rec.Interaction_Date__c,
                Description__c = rec.Description__c
            ));
        }

        // insertImmediate commits Big Object records right away, outside the
        // current transaction — it is not rolled back if the transaction fails
        Database.insertImmediate(bigObjRecords);

        // Delete from the standard object only after the archive insert succeeds
        delete records;
    }

    // 5. Selective query example with custom index
    public static List<Account> selectiveQuery(String industry, Date createdAfter) {
        // This query is selective because:
        // - Industry can be custom-indexed (request from Salesforce Support)
        // - CreatedDate has a standard index
        // A filter counts as selective when it returns fewer rows than the
        // optimizer's threshold (roughly 10% of records for custom indexes,
        // 30% for standard indexes)
        return [
            SELECT Id, Name, Industry, AnnualRevenue
            FROM Account
            WHERE Industry = :industry          // Custom-indexed field
                AND CreatedDate > :createdAfter // Standard index
            ORDER BY Name
            LIMIT 2000
        ];
    }
}

// 6. Batch for LDV processing
public class LargeAccountBatch implements Database.Batchable<SObject> {

    public Database.QueryLocator start(Database.BatchableContext bc) {
        // A QueryLocator can iterate over up to 50 million records
        return Database.getQueryLocator([
            SELECT Id, Name, Industry, Last_Review_Date__c
            FROM Account
            WHERE Last_Review_Date__c < LAST_N_DAYS:365
                OR Last_Review_Date__c = null
        ]);
    }

    public void execute(Database.BatchableContext bc, List<Account> scope) {
        for (Account acc : scope) {
            acc.Last_Review_Date__c = Date.today();
            acc.Review_Status__c = 'Pending';
        }
        Database.update(scope, false); // allOrNone=false: allow partial success
    }

    public void finish(Database.BatchableContext bc) {
        System.debug('Batch complete');
    }
}
```

🏋️ Practice Exercise

LDV Practice:

  1. Create a custom object with 100,000+ records and test query performance
  2. Use the Query Plan tool in Developer Console to analyze query selectivity
  3. Implement pagination for a list view that handles 1M+ records
  4. Design a data archival strategy using Big Objects for data more than 5 years old
  5. Identify and fix data skew in your org (Account skew, Ownership skew)
  6. Request custom indexes from Salesforce (or simulate the effect with External IDs)
  7. Write a Batch Apex job that processes 500,000 records in 200-record chunks
  8. Implement a search that works efficiently on objects with 1M+ records
  9. Design a data management strategy for an org that adds 100K records per month
  10. Build a dashboard that monitors record counts, storage usage, and query performance

⚠️ Common Mistakes

  • Using SOQL OFFSET for deep pagination — OFFSET has a 2,000 limit. For deep pagination, use WHERE Id > :lastId ORDER BY Id pattern

  • Not requesting custom indexes on frequently queried fields — without indexes, queries on 1M+ record objects will time out or fail

  • Creating lookup relationships to widely-shared records (data skew) — one Account with 10M Contacts causes lock contention on every Contact update

  • Not planning for data growth — an object that's fine at 100K records may fail at 1M. Design for 10x your current volume

  • Using SOQL for text search on LDV — SOQL LIKE queries don't use indexes with leading wildcards. Use SOSL (search index) instead
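
For the last point, a minimal SOSL sketch — SOSL hits the platform search index rather than scanning the table, so it handles "contains" searches that a leading-wildcard LIKE cannot do efficiently:

```apex
// SOSL uses the search index — no full table scan on large objects
List<List<SObject>> results = [
    FIND 'Acme*'
    IN NAME FIELDS
    RETURNING Account(Id, Name LIMIT 200)
];
List<Account> accounts = (List<Account>) results[0];
```

Note that SOSL matches indexed search terms (token-prefix with `*`), not arbitrary substrings, and returns at most 2,000 rows per query — usually the right trade-off at LDV scale.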
