Creating System Duplicate Identification Batches

You can create System Duplicate Identification (SDI) batches that contain sets of system-identified duplicates. De-duplication generates SDI batches based on the match rule and conditions that you specify.

When you define and submit an SDI batch, the DQM Duplicate Identification program sequentially generates duplicate sets for the batch, using your match rule and conditions. A unique batch number is automatically appended to the batch name when you submit the batch. The batch appears in the System Duplicate Identification Batches page as soon as you submit it, even if the program has not yet finished running.

Automerge in System Duplicate Identification

You can optionally run Automerge as part of creating the batch. Automerge lets you merge parties that are definite duplicates without going through the process of creating, mapping, and submitting merge requests. See: Automerge.

For the batch, you select a match rule that is enabled for Automerge, and then select Run Automerge. Automerge runs only if you do both. You should be familiar with how your selected match rule works, and especially with its automatic merge threshold.

Caution: You cannot undo automatic merges. For Automerge, use only match rules that provide exact matches.

After the duplicate sets are identified for the batch, Automerge automatically merges the sets in which all parties have match scores that exceed the automatic merge threshold defined in your selected match rule. Merge requests are automatically created with these duplicate sets. You can view and track these Automerge merge requests in the Merge Request Queue, as well as resubmit failed Automerge processes. See: Merge Requests Overview.

If any party in an Automerge merge request already exists in another merge request, the request results in an error. Even if one request fails, the other requests from the same SDI batch continue to process.

If the System Duplicate Identification batch includes parties that do not reach the automatic merge threshold, a new batch is created to include those duplicate sets.

Note: When you create a batch with Automerge selected, the batch name is automatically prefixed with AM. If a new batch is created for the duplicate sets that were not merged, that batch has the same batch name, but without the AM prefix.
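
The threshold behavior described in this section can be sketched in Python. This is only an illustration under stated assumptions, not the actual Automerge implementation: the DuplicateSet class, the split_for_automerge function, and the sample scores and threshold are all hypothetical. A duplicate set is merged only if every party's score exceeds the threshold. All other sets are left for the new batch.

```python
from dataclasses import dataclass


@dataclass
class DuplicateSet:
    # Hypothetical container: maps each party ID to its match score.
    scores: dict


def split_for_automerge(duplicate_sets, automerge_threshold):
    """Return (merged, leftover) lists of duplicate sets.

    A set is placed in `merged` only when every party's score exceeds
    the threshold. Everything else goes to `leftover`, which stands in
    for the new batch of sets that were not merged.
    """
    merged, leftover = [], []
    for dup_set in duplicate_sets:
        if all(score > automerge_threshold for score in dup_set.scores.values()):
            merged.append(dup_set)
        else:
            leftover.append(dup_set)
    return merged, leftover


# Illustrative data only: scores and threshold are made up.
sets = [
    DuplicateSet({101: 98, 102: 97}),  # all scores above 95
    DuplicateSet({201: 99, 202: 80}),  # one party below 95
]
merged, leftover = split_for_automerge(sets, automerge_threshold=95)
```

In this sketch, the first set would be merged automatically and the second would be carried into the follow-up batch for manual review.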

Defining Subset for Identifying Duplicates

Define conditions that a record must meet to be an input record used to identify duplicates within your entire system. For example, you can specify that the Registry ID must be less than 1001. All parties with such a Registry ID are compared to one another, as well as to all other parties in your system, to determine duplicates for the System Duplicate Identification batch.

If you choose to match only within the subset, then the records that meet the criteria are compared only to one another, not to all records. In the above example, the SDI batch would contain only parties with a Registry ID less than 1001.
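
The difference between the two modes can be illustrated with a short Python sketch. It only enumerates which pairs of parties would be compared. It does not reflect how the DQM program actually performs matching. The function name comparison_pairs, its parameters, and the sample Registry IDs are assumptions made for the example, and Registry IDs are assumed to be unique.

```python
from itertools import combinations


def comparison_pairs(registry_ids, in_subset, within_subset_only):
    """Return the sorted list of (a, b) party pairs that would be compared."""
    subset = [rid for rid in registry_ids if in_subset(rid)]
    # Parties in the subset are always compared to one another.
    pairs = set(combinations(sorted(subset), 2))
    if not within_subset_only:
        # Also compare each subset party with every party outside the subset.
        others = [rid for rid in registry_ids if not in_subset(rid)]
        pairs |= {(min(s, o), max(s, o)) for s in subset for o in others}
    return sorted(pairs)


# Illustrative Registry IDs; the condition mirrors the "less than 1001" example.
ids = [998, 1000, 1001, 1005]
condition = lambda rid: rid < 1001

whole_system = comparison_pairs(ids, condition, within_subset_only=False)
subset_only = comparison_pairs(ids, condition, within_subset_only=True)
```

Here subset_only contains the single pair (998, 1000), while whole_system also pairs 998 and 1000 with the parties 1001 and 1005. Parties outside the subset are never compared with each other in either mode.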

Scheduling Batch Creation

You can create the System Duplicate Identification batch as soon as possible or at a specific date and time. Optionally, you can also schedule this batch creation to repeat at a regular interval. For example, you can specify that the batch creation repeats every 20 days, from the first run (whether that is as soon as possible or at a specific date and time) until your specified end date.
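
As a rough illustration of the repeat schedule, the following Python sketch lists the run times for a given first run, interval, and end date. The function scheduled_runs and the sample dates are hypothetical, and the sketch assumes that a run falling exactly on the end date is still included, which the text above does not specify.

```python
from datetime import datetime, timedelta


def scheduled_runs(first_run, interval_days, end_date):
    """Return run times from first_run, every interval_days, up to end_date."""
    if interval_days <= 0:
        raise ValueError("interval_days must be positive")
    runs = []
    current = first_run
    # Assumption: a run that falls exactly on end_date is included.
    while current <= end_date:
        runs.append(current)
        current += timedelta(days=interval_days)
    return runs


# Illustrative dates only: repeat every 20 days until the end of March 2024.
runs = scheduled_runs(datetime(2024, 1, 1, 9, 0), 20, datetime(2024, 3, 31))
```

With these sample values, the batch creation would run on January 1, January 21, February 10, March 1, and March 21.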
