Defining Duplicate Identification Batches (AR)

Use the Submit Duplicate Identification Batch window to define the subset of records that you want to check for duplicates and to submit that subset as a batch. When you submit the batch, the DQM Duplicate Identification program applies the match rule that you selected in this window and scores potential duplicates.

If you do not define a subset, the DQM Duplicate Identification program compares all records in the staged schema against one another. This process can take a long time, depending on the detail of your match rule and the size of your staged schema.

Define a subset of records to compare against one another or against the rest of the staged schema in either of these cases:

To save time
To target specific records when you are familiar with the contents of the TCA Registry, for example, if you know about a new influx of data in a specific date range or about records that were created by a particular application or individual
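
To give a rough sense of the time savings, the following sketch counts record pairs under the simplifying assumption that every candidate pair is compared exactly once. The DQM program may work differently against the staged schema, so treat the numbers as an illustration only, not as a performance model. The function names and record counts are hypothetical.

```python
from typing import Optional


def pair_count(n: int) -> int:
    """Number of unordered pairs among n records."""
    return n * (n - 1) // 2


def comparisons(total: int, subset: Optional[int] = None,
                match_within_subset: bool = True) -> int:
    """Estimate pairwise comparisons (assumes naive all-pairs matching)."""
    if subset is None:
        # No subset: every record is compared against every other record.
        return pair_count(total)
    if not 0 <= subset <= total:
        raise ValueError("subset must be between 0 and total")
    if match_within_subset:
        # Subset records are compared only against one another.
        return pair_count(subset)
    # Subset records are compared against the entire staged schema:
    # pairs inside the subset plus pairs of (subset record, other record).
    return pair_count(subset) + subset * (total - subset)


print(comparisons(1000))             # 499500
print(comparisons(1000, 50))         # 1225
print(comparisons(1000, 50, False))  # 48725
```

In this simplified model, a subset of 50 records out of 1,000 cuts the work by roughly a factor of ten even when the subset is compared against the whole staged schema.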

You can select up to ten conditions to define the subset, using any of the attributes from the HZ_PARTIES table. You can also manually enter SQL statements to define the subset.
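
The subset conditions can be pictured with a small in-memory model. This is a sketch only: it does not query HZ_PARTIES, and it assumes that all conditions must hold together (AND) and that CONTAINS is case sensitive, neither of which is stated here. The record layout and the names `in_subset` and `contains_word` are hypothetical.

```python
import operator
from typing import Any, Dict, List, Tuple

MAX_CONDITIONS = 10  # the window allows up to ten conditions


def contains_word(value: Any, word: Any) -> bool:
    """True if `word` appears as a whole whitespace-delimited word."""
    return str(word) in str(value).split()


# The four conditions offered in the Define Subset region.
OPERATORS = {
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
    "CONTAINS": contains_word,
}

Condition = Tuple[str, str, Any]  # (attribute, condition, value)


def in_subset(record: Dict[str, Any], conditions: List[Condition]) -> bool:
    """Check one record against every condition (assumed to be ANDed)."""
    if len(conditions) > MAX_CONDITIONS:
        raise ValueError("at most ten conditions are allowed")
    for attribute, op, value in conditions:
        if op not in OPERATORS:
            raise ValueError(f"unsupported condition: {op}")
        if not OPERATORS[op](record[attribute], value):
            return False
    return True


# Hypothetical sample data, not real registry records.
parties = [
    {"party_number": 1000, "party_name": "Acme Supply Company"},
    {"party_number": 1001, "party_name": "Acme Holdings"},
    {"party_number": 999, "party_name": "Acmexport Ltd"},
]
conditions = [
    ("party_number", "<", 1001),
    ("party_name", "CONTAINS", "Acme"),
]
subset = [p["party_number"] for p in parties if in_subset(p, conditions)]
print(subset)  # [1000]
```

Party 1001 fails the less than condition, and party 999 fails CONTAINS because "Acme" appears there only as part of a longer word.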

After the DQM Duplicate Identification program finishes, the results are displayed in the Duplicate Identification: Batch Review window.

To define and submit a duplicate identification batch:

Navigate to the Submit Duplicate Identification Batch window.
Enter a name for the duplicate identification batch in the Batch Name field.
In the Match Rule field, select from the list of values the match rule to use for identifying and scoring duplicates. The default value comes from the DQM Match Rule for Batch Duplicate Identification profile option, if defined.

Even if the selected match rule is allowed for Automerge, the Automerge feature is not integrated with batch duplicate identification.

Note: Use a match rule with the Bulk Duplicate Identification type if you want to identify only records that are almost exact duplicates. Match rules with the Simple Duplicate Identification type provide fuzzier matches.
In the Number of Workers field, enter the number of parallel workers that you want to use to improve performance.

Workers are processes that run at the same time to complete a task that would otherwise take longer with a single process. The default number of workers is 1, and you cannot use more than ten workers.
Uncheck the Match within Subset check box if you want to compare the subset against the entire staged schema for duplicates.

By default, the records in the subset are compared only against one another.
Check the Find Merged Parties check box if you want to include previously merged parties in the search.
Navigate to the Define Subset region.
In the Attribute fields, select from the list of values the attributes that you want to use to define the subset.
For each attribute, select a condition:
- >
- <
- =
- CONTAINS
  
  Note: The CONTAINS condition applies only to a word surrounded by white space.
In the Value fields, enter a value for each attribute and condition.

For example, if you enter 1001 for the Party Number attribute with a less than condition, the subset includes only parties with a number of 1000 or lower.
In the SQL Clause text box, you can manually add to the SQL clauses that are automatically generated from your subset conditions. Alternatively, you can enter SQL statements directly instead of selecting attributes and conditions in the previous fields.
Press the Submit Batch button.

The DQM Duplicate Identification program runs to identify duplicates for the subset of records that you defined, using the match rule that you specified.
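
The settings collected in the procedure above can be summarized as a small data model. This is an illustrative sketch, not an Oracle API: the class and field names are hypothetical, and it assumes that the batch name and match rule are required, that at least one worker is needed, and that Find Merged Parties is unchecked by default. The defaults of one worker and matching within the subset, and the limit of ten workers, come from the text.

```python
from dataclasses import dataclass

MAX_WORKERS = 10  # "you cannot use more than ten workers"


@dataclass
class BatchSettings:
    """Hypothetical model of the Submit Duplicate Identification Batch fields."""
    batch_name: str
    match_rule: str
    workers: int = 1                   # documented default
    match_within_subset: bool = True   # documented default
    find_merged_parties: bool = False  # assumed default (unchecked)

    def __post_init__(self) -> None:
        # Assumption: both text fields must be filled in before submitting.
        if not self.batch_name.strip():
            raise ValueError("Batch Name is required")
        if not self.match_rule.strip():
            raise ValueError("Match Rule is required")
        # Assumption: the lower bound is 1; the upper bound is documented.
        if not 1 <= self.workers <= MAX_WORKERS:
            raise ValueError(f"workers must be between 1 and {MAX_WORKERS}")

    def comparison_scope(self) -> str:
        """Describe what the subset is compared against."""
        if self.match_within_subset:
            return "subset records against one another"
        return "subset records against the entire staged schema"


settings = BatchSettings("New leads", "SAMPLE RULE", workers=4,
                         match_within_subset=False)
print(settings.comparison_scope())
# subset records against the entire staged schema
```

The batch name and match rule name in the usage lines are placeholders; use the values that exist in your own environment.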