I received a notice from my friendly Salesforce rep recently, advising that I had gone over my storage limit:
The last time I heard from Salesforce about this was when Chatter went wild and took me to 493% of my storage allocation! Oh, you’ll also notice from the picture in that article how much my ‘Contact’ record storage had grown over the past year!
This time, my rep kindly offered to raise an invoice for the additional storage space. I’m cheap at heart, so I decided to reduce my storage usage instead. Not that I’m upset at Salesforce: I know it’s expensive to store data in their system because it’s all replicated between data centers, backed up, etc. However, I knew that a lot of my data was unnecessary and I could just dump it.
To explain, I populate my Salesforce instance from an external system. I had over 220,000 Contact records, of which only a subset were required. So, I decided to remove Contact records that met both of these conditions:
- For people who don’t own any of our products (defined in a custom field)
- For records with no Activities
So, I ran Data Loader (actually the Mac version, LexiLoader, courtesy of Simon Fell, who reminds people to vote for his Idea that Salesforce produce an official Mac version) and extracted a list of contacts who don’t own a product.
I then ran another Data Loader extract to get a list of all Activity records.
Next, I took the first list of contacts and subtracted any contacts associated with the Activity records. (I couldn’t figure out how to do this in one SOQL statement; suggestions welcome!)
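Here is a minimal Python sketch of what the subtraction step should do. It works on the two Data Loader extracts as CSV text. The column names `ID` and `WHOID`, the function names and the sample IDs are assumptions for illustration. Adjust them to match your own extract, and make sure both files use the same ID format (for example, both 18-character).

The sketch relies on Salesforce key prefixes to refuse the exact mistake described in the next section. Contact IDs start with `003`, Task IDs with `00T` and Event IDs with `00U`.

```python
import csv
import io

CONTACT_PREFIX = "003"              # key prefix of Contact record IDs
ACTIVITY_PREFIXES = ("00T", "00U")  # key prefixes of Task and Event record IDs


def column_values(csv_text, column):
    """Return the set of non-empty values in one column of a CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column] for row in reader if row[column]}


def contacts_to_delete(contacts_csv, activities_csv, id_col="ID", who_col="WHOID"):
    """Contacts from the first extract that no Activity points at via WhoId."""
    candidates = column_values(contacts_csv, id_col)
    referenced = column_values(activities_csv, who_col)

    # Guard against subtracting Activity IDs instead of WhoIds: Activity IDs
    # never overlap Contact IDs, so that subtraction would silently do nothing.
    if any(i.startswith(ACTIVITY_PREFIXES) for i in referenced):
        raise ValueError(f"column {who_col!r} holds Activity IDs, not WhoIds")

    result = candidates - referenced
    if any(not i.startswith(CONTACT_PREFIX) for i in result):
        raise ValueError("delete list contains non-Contact IDs")
    return result


# Usage with made-up IDs: only 003A02 has no Activity pointing at it.
contacts = "ID\n003A01\n003A02\n003A03\n"
activities = "ID,WHOID\n00T001,003A01\n00U001,003A03\n00T002,\n"
print(sorted(contacts_to_delete(contacts, activities)))  # ['003A02']
```

Passing `who_col="ID"` (the Activity IDs themselves) raises an error instead of quietly returning every candidate.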
Finally, I took the list of record IDs and asked the Data Loader to do a bulk delete of the records. It took my storage way down:
I must say, the bulk delete was extremely fast, since Data Loader can use the Bulk API for this kind of operation.
The ‘Oops!’ moment
Things seemed fine until a couple of days later, when my users reported that some Contacts with Activities had been deleted. I went back and checked my work, only to discover that I had made an error in my “subtraction” step. Instead of removing the Contacts that the Activity records point to (their WhoId values), I subtracted the IDs of the Activity records themselves. Since these objects have non-overlapping IDs (no Activity ID ever matches a Contact ID), that operation did nothing.
End result: I deleted a lot of useful records. Gulp!
I did some searching and found rumors that Salesforce could undelete records but would charge a lot of money for the privilege. Not great, since it would cost more than I had originally tried to save!
Next, I investigated the Recycle Bin. Here’s what the official documentation says:
The Recycle Bin link in the sidebar lets you view and restore recently deleted records for 30 days before they are permanently deleted. Your recycle bin record limit is 250 times the Megabytes (MBs) in your storage. For example, if your organization has 1 GB of storage then your limit is 250 times 1000 MB or 250,000 records. If your organization reaches its Recycle Bin limit, Salesforce automatically removes the oldest records if they have been in the Recycle Bin for at least two hours.
My storage allocation is actually 1 GB (we only have a small number of users, so we get the minimum). That gives me a Recycle Bin limit of 250,000 records. Since I had deleted about 220,000 records, they were all still in there!
I started to use the Recycle Bin ‘undelete’ function, but at 200 records at a time I’d need to do it more than 1,000 times!
So, I next tried some Apex in the System Log window, like this:
```apex
Contact[] c = [select id from contact where isDeleted = true LIMIT 1000 ALL ROWS];
undelete c;
```
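Here is a quick check of why scripting the undelete was worth it, assuming roughly 220,000 deleted records. The Recycle Bin page handles 200 records at a time, while the snippet above handles up to 1,000 per run. The helper name is made up for this sketch.

```python
import math


def rounds_needed(records, per_round):
    """Number of undelete rounds at a given batch size (the last one may be partial)."""
    return math.ceil(records / per_round)


DELETED = 220_000                            # approximate figure from this post
ui_rounds = rounds_needed(DELETED, 200)      # Recycle Bin page: 200 at a time
apex_rounds = rounds_needed(DELETED, 1000)   # Apex snippet: LIMIT 1000
print(ui_rounds, apex_rounds)                # 1100 220
```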
However, some records refused to undelete. Our external system had already upserted replacements, so undeleting them would have caused a clash on a unique field. When that happened, the whole undelete was rolled back rather than letting the non-clashing records through. Argh! So I moved on to something a bit more sophisticated:
```apex
// Get a batch of deleted Contact records to undelete
Contact[] contacts = [select id, EmailAddr__c from contact
                      where isDeleted = true limit 1000 ALL ROWS];

// Put the email addresses into an array
String[] emails = new String[]{};
for (Contact c : contacts) {
    emails.add(c.EmailAddr__c);
}

// Get a list of 'alive' Contacts (not deleted) that already use those email addresses
Contact[] alive = [select id, EmailAddr__c from contact where EmailAddr__c in :emails];
system.debug('Found: ' + alive.size());

// Drop every deleted Contact whose email address is already in use.
// Walk the list backwards so that remove(i) doesn't skip the next element,
// and keep going after a match in case several deleted records share the address.
for (Contact c : alive) {
    for (Integer i = contacts.size() - 1; i >= 0; --i) {
        if (contacts[i].EmailAddr__c == c.EmailAddr__c) {
            contacts.remove(i);
        }
    }
}
system.debug('Will undelete: ' + contacts.size());

// Undelete them! (This runs even when no clashes were found.)
undelete contacts;
```
I should explain the EmailAddr__c thing. You see, Email is my external ID. However, I couldn’t use the standard Email field as an External ID because I can’t force it to be unique. So I have a second field for the email address, and I populate both. For more details, see my earlier blog post.
Anyway, the above code took about 2 minutes for 1000 records:
```text
10:11:19.031 (31752000)|EXECUTION_STARTED
10:11:19.031 (31788000)|CODE_UNIT_STARTED|[EXTERNAL]|execute_anonymous_apex
10:11:19.032 (32365000)|SOQL_EXECUTE_BEGIN|[1]|Aggregations:0|select ...
10:11:19.074 (74698000)|SOQL_EXECUTE_END|[1]|Rows:1000
10:11:19.202 (202887000)|SOQL_EXECUTE_BEGIN|[6]|Aggregations:0|select ...
10:13:07.266 (108266842000)|SOQL_EXECUTE_END|[6]|Rows:157
10:13:07.267 (108267315000)|USER_DEBUG|[7]|DEBUG|Found: 157
10:13:15.949 (116949306000)|USER_DEBUG|[19]|DEBUG|Will delete: 896
10:13:15.950 (116950156000)|DML_BEGIN|[20]|Op:Undelete|Type:Contact|Rows:896
10:13:19.937 (120937987000)|DML_END|[20]
```
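You don't have to subtract the timestamps by hand. The number in parentheses on each line is the elapsed time in nanoseconds since the request started, so the gaps between events can be read straight from it.

This Python sketch assumes exactly the line format in the excerpt, `time (nanoseconds)|EVENT|...`. Other log settings may need a different pattern. The function name is made up for illustration.

```python
import re

# "10:11:19.202 (202887000)|SOQL_EXECUTE_BEGIN|..." -> (202887000, "SOQL_EXECUTE_BEGIN")
EVENT = re.compile(r"^\S+ \((\d+)\)\|([A-Z_]+)")


def gaps(log_lines):
    """Seconds elapsed between each pair of consecutive log events."""
    events = [(int(m.group(1)), m.group(2))
              for m in map(EVENT.match, log_lines) if m]
    return [(a_name, b_name, (b_ns - a_ns) / 1e9)
            for (a_ns, a_name), (b_ns, b_name) in zip(events, events[1:])]


LOG = """\
10:11:19.031 (31752000)|EXECUTION_STARTED
10:11:19.031 (31788000)|CODE_UNIT_STARTED|[EXTERNAL]|execute_anonymous_apex
10:11:19.032 (32365000)|SOQL_EXECUTE_BEGIN|[1]|Aggregations:0|select ...
10:11:19.074 (74698000)|SOQL_EXECUTE_END|[1]|Rows:1000
10:11:19.202 (202887000)|SOQL_EXECUTE_BEGIN|[6]|Aggregations:0|select ...
10:13:07.266 (108266842000)|SOQL_EXECUTE_END|[6]|Rows:157
10:13:07.267 (108267315000)|USER_DEBUG|[7]|DEBUG|Found: 157
10:13:15.949 (116949306000)|USER_DEBUG|[19]|DEBUG|Will delete: 896
10:13:15.950 (116950156000)|DML_BEGIN|[20]|Op:Undelete|Type:Contact|Rows:896
10:13:19.937 (120937987000)|DML_END|[20]
""".splitlines()

# Print only the slow steps (one second or more).
for start, end, secs in gaps(LOG):
    if secs >= 1:
        print(f"{start} -> {end}: {secs:.2f} s")
# SOQL_EXECUTE_BEGIN -> SOQL_EXECUTE_END: 108.06 s
# USER_DEBUG -> USER_DEBUG: 8.68 s
# DML_BEGIN -> DML_END: 3.99 s
```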
Most of the time went on the second SOQL query, which matches on email (about 108 seconds). The loop to eliminate duplicates also took a while (almost 9 seconds). The undelete itself was relatively quick (4 seconds).
So I added an ORDER BY clause to my initial query so that older records were tried first. This resulted in fewer email clashes and much faster execution times.
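The duplicate check in my Apex code compares every live Contact with every deleted one. Another way to trim it is to keep the taken addresses in a set, so each deleted record needs only one lookup. In Apex, a `Set<String>` of lower-cased addresses would play the same role.

This is a language-neutral Python sketch of that filtering step. The `(record_id, email)` pairs and the function name are hypothetical. Apex's `==` on strings ignores case, so the sketch lower-cases addresses to behave like the original comparison. It also skips a second deleted record that reuses an address already claimed in the same batch. Blank addresses are assumed never to clash.

```python
def undeletable(deleted, alive_emails):
    """IDs of deleted records whose email isn't already used by a live record.

    `deleted` is a list of (record_id, email) pairs; `alive_emails` lists the
    addresses held by live (not deleted) Contacts.
    """
    taken = {e.lower() for e in alive_emails if e}  # set: O(1) membership tests
    keep = []
    for record_id, email in deleted:
        if not email:
            keep.append(record_id)   # assumption: blank addresses can't clash
            continue
        key = email.lower()          # mimic Apex's case-insensitive ==
        if key not in taken:
            taken.add(key)           # claim it, so a later duplicate is skipped
            keep.append(record_id)
    return keep


batch = [("003A", "a@x.com"), ("003B", "B@x.com"), ("003C", "a@x.com"),
         ("003D", None), ("003E", "c@x.com")]
print(undeletable(batch, ["b@x.com"]))  # ['003A', '003D', '003E']
```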
Over the course of a day, I managed to undelete all the records. In fact, it sped up a lot after midnight San Francisco time (which is easy for me because I’m in Australia). Finally, I did my mass delete properly and everybody was happy.
The result:
How to avoid this error in future
Okay, I was doing dangerous stuff and I did it wrong. So how could I avoid this in future? Some ideas:
- Make a backup first! Either extract all the data (but that’s not easy!) or use the “Export Data” function (but that’s not easy to reload).
- Try it in the Sandbox first. However, we have a Configuration-only Sandbox, without all the data. No good.
- Test before committing the delete. I did pick random records, but obviously not enough.
- Get somebody else to review my work before deleting.
The last idea reminds me of a quote in Kernighan and Pike’s famous book The Practice of Programming:
Another effective technique is to explain your code to someone else. This will often cause you to explain the bug to yourself. Sometimes it takes no more than a few sentences, followed by an embarrassed “Never mind, I see what’s wrong. Sorry to bother you.” This works remarkably well; you can even use non-programmers as listeners. One university computer center kept a teddy bear near the help desk. Students with mysterious bugs were required to explain them to the bear before they could speak to a human counselor.
I use that technique a lot at work. I ask somebody to “be my teddy bear”, tell them my problem, suddenly realize the solution, then thank them for their help even though they said nothing. Works every time!
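On the “test before committing” point, spot-checking random records only catches a mistake if it affects enough of the list. Suppose a fraction f of the delete list is wrong and you check n records chosen at random. The chance that every check looks fine is then about (1 - f)^n. That treats the picks as independent, which is a fair approximation for a list of hundreds of thousands of records.

The fractions in this Python sketch are hypothetical, not figures from my incident.

```python
import math


def miss_probability(bad_fraction, checks):
    """Chance that `checks` random spot-checks all land on correct records."""
    return (1 - bad_fraction) ** checks


def checks_needed(bad_fraction, confidence):
    """Fewest spot-checks that catch at least one bad record with the given probability."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - bad_fraction))


print(round(miss_probability(0.05, 10), 2))  # 0.6
print(checks_needed(0.01, 0.95))             # 299
```

So ten random checks would miss a problem affecting 5% of the list about 60% of the time. Catching a 1% problem with 95% confidence takes around 300 checks, which argues for checking the whole list programmatically rather than by eye.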
Irony
Oh, here’s some irony. No sooner had I done all this than I received an email from Salesforce telling me that Recycle Bin limits are being cut:
At salesforce.com, Trust is our top priority, and it is our goal to improve the performance of our Recycle Bin functionality. With that in mind, we are making some changes to the Recycle Bin limits to provide you with a faster user experience.
What is the change and how does it impact me?
We are lowering the Recycle Bin retention period from 30 days to 15 days. The Recycle Bin link in the sidebar will now let you restore recently deleted records for 15 days before they are permanently deleted.
Additionally, we are lowering the Recycle Bin record limit from 250 times your storage to 25 times your storage. For example, if your organization has 1 GB of storage then your limit is 25 times 1000 MB or 25,000 records. If your organization reaches its Recycle Bin limit, Salesforce will automatically remove the oldest records if they have been in the Recycle Bin for at least two hours.
When is this change taking effect?
The lower Recycle Bin retention period will go into effect with the Winter ’12 Release.
The irony is that, had these reduced limits been in place, I would not have been able to recover my deleted data. Phew!
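To put numbers on that, here is a small Python sketch of the record limit under both rules. It uses the figures from this post: 1 GB counted as 1000 MB, and about 220,000 deleted records. It simplifies by assuming the Recycle Bin held nothing else. “At risk” means records beyond the limit, which Salesforce may purge once they have been in the bin for two hours.

```python
def recycle_bin_limit(storage_mb, records_per_mb):
    """Recycle Bin record limit: a multiple of the org's storage in MB."""
    return storage_mb * records_per_mb


STORAGE_MB = 1000  # 1 GB, counted as 1000 MB as in the Salesforce wording
DELETED = 220_000  # approximate number of records I deleted

for label, per_mb in (("old", 250), ("new", 25)):
    limit = recycle_bin_limit(STORAGE_MB, per_mb)
    at_risk = max(0, DELETED - limit)
    print(f"{label}: limit {limit:,}, records at risk {at_risk:,}")
# old: limit 250,000, records at risk 0
# new: limit 25,000, records at risk 195,000
```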
The Bottom Line
- Test or verify before committing large data-related changes
- You can’t do undelete via the Bulk API
- The recycle bin is very big!
- I’m cheap at heart