Identifying duplicates in a transaction file - The simplest approach
In this post we share a short video walking through the steps to identify duplicates in a file.
There are many reasons for wanting to identify duplicates in a transaction file. The file could be a dataset of payments made to vendors, in which case identifying the duplicates can help your business recoup costs. The duplicates could be multiple entries of patient data, or repetitive records in security logs. Whatever the reason, when performing data analytics you need to be able to manage duplicates effectively.
Excel has a Remove Duplicates feature, but it works on the opposite principle to the approach used by Arbutus Analyzer. In Excel, running Remove Duplicates deletes the duplicate transactions from the dataset, with no transaction log of what was removed. In Arbutus Analyzer, the Duplicates command identifies the duplicate transactions and reports them to you, leaving the data intact.
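To make the distinction concrete, here is a minimal sketch in Python with pandas (our own illustration, not what either tool runs under the hood): `drop_duplicates` mirrors Excel's remove-and-forget behaviour, while `duplicated` mirrors Analyzer's identify-and-report behaviour. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical vendor payments data; column names are illustrative only.
payments = pd.DataFrame({
    "vendor": ["Acme", "Acme", "Bolt", "Acme"],
    "invoice": ["INV-001", "INV-001", "INV-007", "INV-002"],
    "amount": [500.00, 500.00, 125.50, 980.00],
})

# Excel-style: duplicates are silently removed, with no log of what was dropped.
deduped = payments.drop_duplicates(subset=["vendor", "invoice", "amount"])

# Analyzer-style: duplicates are identified and reported; the data stays intact.
dupes = payments[payments.duplicated(subset=["vendor", "invoice", "amount"], keep=False)]
print(dupes)
```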
Let's see how it is done with this short video:
The steps shown in the video are listed below; a scripted sketch of the same workflow follows the list:
Import your file
Select the "Analyze" menu
Select the "Duplicates" menu item
Select the fields you want to test
Click "OK"
Review the results in the "Command Log"
Click on the hyperlinks in the Command Log to see the selected transactions
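For readers who prefer scripting their analytics, here is a hedged pandas sketch of the same workflow: import the file, pick the fields to test, flag the duplicates, and review them. The file name and field names are assumptions for illustration, not taken from the video.

```python
import pandas as pd

# 1. Import your file (hypothetical path and column names).
transactions = pd.read_csv("payments.csv")

# 2. Select the fields you want to test for duplicates.
test_fields = ["vendor", "invoice", "amount"]

# 3. Flag every row that shares its test-field values with another row.
mask = transactions.duplicated(subset=test_fields, keep=False)

# 4. Review the results, sorted so each duplicate group sits together.
duplicates = transactions[mask].sort_values(test_fields)
print(f"{len(duplicates)} duplicate transactions found")
print(duplicates)
```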
So simple and so fast.