
Removing duplicates: a performance tweak

by Zdravko Verguilov, ServiceNow Platform Developer, Do IT Wise

It’s the age-old nightmare of the novice programmer: here’s an array, remove the duplicates. 

Removing the duplicates

Let’s imagine the following scenario. You have a huge ServiceNow project ahead that involves importing millions of records. The instance keeps getting heavier with the hundreds of Business Rules supporting the Transform Maps, calling Script Includes left, right and center, and so on. Performance is a rare commodity in an instance like this. When you’re waiting for a data source containing a lot of records to finish importing, every second is valuable.

So, for this seemingly tedious task, here is a simple first solution:

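var arr = [1, 2, 3 /* ... */];
var arr2 = [];

// Use an index loop; for...in is meant for object keys, not array elements.
for (var i = 0; i < arr.length; i++) {
    // Keep the value only if it is not already in the result array.
    if (arr2.indexOf(arr[i]) === -1) {
        arr2.push(arr[i]);
    }
}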

It may not be the most elegant solution, but here we are looking at it purely from a performance point of view.

So, is it fast?

To answer this question, let’s look at one simple example from practice. For an array of about 35,000 small record pieces, processed in a Background Script, the timer reported 00:00:00.271. On its own, that number doesn’t tell us much.

So, we need a better way to do this. How can I get the job done without .indexOf scanning the result array on every iteration? On top of that, each lookup gets a bit slower as the result array grows. The obvious answer would be a JavaScript Set, but that is not the best option when it comes to working on the ServiceNow platform. So, what’s the next best thing? How about a simple object?
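To get a feeling for how much work those repeated lookups add up to, here is a minimal sketch that replaces .indexOf with a hand-written linear search and counts the comparisons. The function name `countComparisons` and the 1,000-element test array are made up for illustration; they are not part of the original test.

```javascript
// Hypothetical instrumented version of the indexOf approach: it counts
// how many element comparisons the linear searches perform in total.
function countComparisons(arr) {
    var unique = [];
    var comparisons = 0;
    for (var i = 0; i < arr.length; i++) {
        var found = false;
        // Manual linear search over the result array built so far.
        for (var j = 0; j < unique.length; j++) {
            comparisons++;
            if (unique[j] === arr[i]) {
                found = true;
                break;
            }
        }
        if (!found) {
            unique.push(arr[i]);
        }
    }
    return comparisons;
}

// Worst case: every value is distinct, so each search scans the whole result.
var distinct = [];
for (var n = 0; n < 1000; n++) {
    distinct.push(n);
}
var total = countComparisons(distinct); // 1000 * 999 / 2 = 499500
```

With no duplicates at all, the number of comparisons grows roughly with the square of the array length, which is why the cost becomes noticeable on large imports.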

The object-based solution is very clean, and it makes it easy to extend the logic further without additional iterations:

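var arr = [1, 2, 3 /* ... */];
var obj = {};

for (var i = 0; i < arr.length; i++) {
    // A duplicate value just overwrites the key that is already there.
    obj[arr[i]] = true;
}

// The unique values are the object's keys (note: keys are always strings).
var unique = Object.keys(obj);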

When we pass the values from the array as keys in the new object, we don’t need to check if they already exist. If a duplicate is passed, it would simply overwrite the already existing one, as object keys are unique by default. We can also store some additional information in the value, to have it ready and easily accessible for further use in our logic. For example, the values can be a number, incrementing every time a duplicate of a particular key is passed, allowing us to count the duplicates of every kind and also showing us the entries that weren’t duplicated at all. 
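The counting idea could look roughly like the sketch below. The helper name `countOccurrences` and the sample values are assumptions made for illustration only.

```javascript
// Hypothetical helper: counts how often each value occurs in an array.
function countOccurrences(arr) {
    var counts = {};
    for (var i = 0; i < arr.length; i++) {
        var key = arr[i];
        // Object keys are always strings, so 1 and '1' would share a key.
        if (Object.prototype.hasOwnProperty.call(counts, key)) {
            counts[key] += 1;
        } else {
            counts[key] = 1;
        }
    }
    return counts;
}

var counts = countOccurrences(['a', 'b', 'a', 'c', 'a', 'b']);

// The keys are the de-duplicated values: ['a', 'b', 'c']
var uniqueValues = Object.keys(counts);

// Entries that were never duplicated: ['c']
var notDuplicated = uniqueValues.filter(function (k) {
    return counts[k] === 1;
});
```

One pass over the array yields the unique values, the number of occurrences of each, and the entries that appear only once.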

That’s all nice, but the main reason I tried a different approach, to begin with, was to save some precious seconds. So, I tried processing the same big array with the ‘key: value’ approach. The result was 00:00:00.101. These times may seem well within the margin of error, which is why I used a vast array and repeated the tests several times, getting pretty much the same result every time.
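For anyone who wants to repeat a comparison like this, a rough timing harness might look as follows. Everything here is an assumption for illustration: the helper names, the generated test data (35,000 short strings with 5,000 distinct values) and the use of `new Date().getTime()` as a timer. It is not the script behind the numbers above, and the timings will differ from one environment to another.

```javascript
// Hypothetical helpers for illustration; not the original test script.
function dedupeWithIndexOf(arr) {
    var result = [];
    for (var i = 0; i < arr.length; i++) {
        if (result.indexOf(arr[i]) === -1) {
            result.push(arr[i]);
        }
    }
    return result;
}

function dedupeWithObject(arr) {
    var seen = {};
    for (var i = 0; i < arr.length; i++) {
        seen[arr[i]] = true;
    }
    return Object.keys(seen);
}

// Runs fn once and returns the elapsed milliseconds and the result size.
function timeIt(fn, arr) {
    var start = new Date().getTime();
    var result = fn(arr);
    return { ms: new Date().getTime() - start, size: result.length };
}

// Made-up test data: 35,000 short strings, 5,000 of them distinct.
var data = [];
for (var i = 0; i < 35000; i++) {
    data.push('rec' + (i % 5000));
}

var withIndexOf = timeIt(dedupeWithIndexOf, data);
var withObject = timeIt(dedupeWithObject, data);
var report = 'indexOf: ' + withIndexOf.ms + ' ms, object: ' + withObject.ms + ' ms';

// console may not exist in every environment, so guard the call.
if (typeof console !== 'undefined') {
    console.log(report);
}
```

Running each variant several times and comparing the typical values, as described above, gives a more reliable picture than a single run.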

These are small strings that we’re comparing in the test run, and these times may seem negligible, but it’s a whole different picture when dealing with real data. The object-based solution is considerably faster, so it will save valuable time when large chunks of data are involved, and it is a cleaner and more flexible alternative.

Do you know some other solutions that might be useful in such a case? Let us know in the discussion. 
And if you enjoyed this article, share the knowledge with your colleagues. 

For more articles about ServiceNow tips and tricks, visit our blog
