Assigning Scores, Transformations, and Thresholds

After you have selected attributes for the match rule and defined their usage, assign transformations for each attribute. For match rules with the Bulk Duplicate Identification or Expanded Duplicate Identification purpose, you also define the scores and weights used to calculate the match score for each record in the work unit. The match score for the entire record is the sum of the weighted attribute scores that the record actually earns. This match score is the value that is compared to the match rule thresholds to evaluate the record in the work unit.
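The calculation described above can be sketched in Python. This is only an illustrative sketch, not DQM code: the function names, the sample attributes other than Party Type, and the threshold values 60 and 90 are hypothetical. It assumes that an attribute without an assigned weight counts at 100%, and that a threshold is met when the score is equal to or above it.

```python
def weighted_attribute_score(score, matched_weights):
    """Weighted score for one attribute.

    score: the integer score assigned to the attribute.
    matched_weights: weight percentages of the transformations that matched;
    empty if the attribute did not match at all.
    """
    if not matched_weights:
        return 0.0  # a non-matching attribute contributes zero
    # With several matching transformations, the highest weighted score counts.
    return score * max(matched_weights) / 100


def match_score(attributes):
    """Sum the weighted attribute scores for one record."""
    return sum(
        weighted_attribute_score(a["score"], a["matched_weights"])
        for a in attributes
    )


def evaluate(total, match_threshold, automerge_threshold):
    """Compare a record's match score to the (hypothetical) thresholds."""
    if total >= automerge_threshold:
        return "automerge candidate"
    if total >= match_threshold:
        return "match"
    return "no match"


# Hypothetical record: Party Type matched with Exact (80%) and Cleanse (50%),
# Party Name matched with an unweighted transformation, Address did not match.
record = [
    {"name": "Party Type", "score": 50, "matched_weights": [80, 50]},
    {"name": "Party Name", "score": 30, "matched_weights": [100]},
    {"name": "Address", "score": 20, "matched_weights": []},
]

total = match_score(record)
print(total, evaluate(total, match_threshold=60, automerge_threshold=90))
```

With these sample values, Party Type contributes 40, Party Name 30, and Address 0, so the record reaches the hypothetical match threshold but not the Automerge threshold.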

Procedure for Match Rules with Search Purpose

  1. To specify the order in which attributes appear as search criteria, assign a number to every attribute. Use positive integers greater than 0. The numbering does not have to be gapless.

    For example, if you have four attributes, the first row of displayed search criteria contains attributes 1 and 2, from left to right. The second row contains attributes 3 and 4, from left to right. All attributes without an assigned number are displayed last.

  2. Assign at least one transformation for each attribute. You can choose more than one transformation for each of the attributes in the match rule.

    Suggestion: Use the fewest transformations possible in your match rule. Using more transformations than necessary could affect the time required for staging and the performance of your search.

    Transformations that appear by default were selected to do so, in the specified order, on the Define Attributes and Transformations page. See: Defining Attributes and Transformations.

  3. Use Up and Down to order the transformations. For example, the CLEANSE transformation alters the original attribute value more than EXACT does. You would order EXACT before CLEANSE because the transformed value is closer to the original and provides a more precise match.

    This order determines how the search is processed. See: Search Matching Process.

  4. You can save the match rule definition and compile it later. A new or updated match rule cannot be used until it is compiled. See: Compiling Match Rules.
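The display ordering from step 1 of this procedure can be sketched as follows. This is a hypothetical illustration only: the function name and the attribute names are invented, the two-per-row layout is inferred from the four-attribute example, and the handling of ties and of the relative order of unnumbered attributes is an assumption the text does not address.

```python
def search_criteria_rows(display_order, per_row=2):
    """Arrange attributes into rows of search criteria.

    display_order maps an attribute name to its assigned positive integer,
    or to None if no number was assigned. Numbers may have gaps.
    """
    # Numbered attributes come first, in ascending order of their numbers.
    numbered = sorted(
        (number, name)
        for name, number in display_order.items()
        if number is not None
    )
    # Attributes without a number are displayed last (input order assumed).
    unnumbered = [name for name, number in display_order.items() if number is None]
    ordered = [name for _, name in numbered] + unnumbered
    # Fill each row from left to right.
    return [ordered[i:i + per_row] for i in range(0, len(ordered), per_row)]


rows = search_criteria_rows(
    {"Name": 10, "City": 20, "Phone": 40, "Postal Code": 30, "Email": None}
)
for row in rows:
    print(row)
```

Because the numbering does not have to be gapless, only the relative order of the assigned numbers matters in this sketch.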

Procedure for Match Rules with Bulk Duplicate Identification or Expanded Duplicate Identification Purpose

This table describes some terms in the pages used for this procedure.

Selected Terminology

Term: Description
Score: A numeric value awarded to a record if the attribute is evaluated as a match. If the attribute does not match, a score of zero is assigned. This score is then multiplied by the weight percentage of the attribute, if any, to determine the final weighted attribute score that counts toward the match score of the record.
Weight: A percentage used to determine a weighted score. If a transformed attribute value is a match, the weight is multiplied by the attribute score to determine the weighted score of the attribute. If an attribute is assigned more than one transformation, the highest weighted score is awarded to the record for the attribute.
    For example, you assign the Party Type scoring attribute a score of 50 and assign the Exact and Cleanse transformations to that attribute. You give Exact an 80% weight and Cleanse a 50% weight. If the Party Type attribute matches with both transformations, the weighted scores are 40 (80% of 50) and 25 (50% of 50), so the attribute's weighted score is 40, the higher of the two.
Adjusted Score: The attribute score multiplied by the weight percentage for the attribute and transformation combination.
Match Threshold: A threshold that must be met for records to be considered a match. The Match Threshold for search rules is expressed as a percentage.
Automerge Threshold: A threshold that must be reached for Automerge. A record with a score equal to or above the Automerge threshold is marked by default as a candidate for merge without manual intervention. The record is automatically merged if Automerge is implemented.
Similarity: An algorithm that compares the transformed attribute value of the input record to the corresponding attribute value from the work unit record and assigns a percentage for the extent of similarity. This similarity percentage is derived from the edit distance between two strings, or groups of text, and is computed as follows:
  1. Determine the edit distance, or the number of changes required to make the longer string match the shorter string.

    For example, for Smythe and Smith, the edit distance is two.

  2. Subtract the edit distance from the number of characters in the longer string.

    Following the example above: 6 - 2 = 4.

  3. Divide the amount calculated in step 2 by the number of characters in the longer string.

    Continuing the example: 4 / 6 = 0.6667 (rounded).

  4. Multiply the result by 100 and round it to the nearest integer to express it as a percentage.

    In this example, the result is a similarity percentage of 67.


If two strings are identical, then the similarity percentage equals 100. If no characters in the two strings are the same, then the similarity percentage is zero.
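
The similarity steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the DQM implementation: it assumes the edit distance is the standard Levenshtein distance (single-character insertions, deletions, and substitutions), which the text does not state explicitly, and the function names are hypothetical. How exact halves are rounded is also an assumption; Python's built-in round is used here.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (assumed edit distance)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def similarity_percentage(a, b):
    """Similarity as an integer percentage, following steps 1 to 4."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100  # two empty strings are identical
    distance = edit_distance(a, b)                 # step 1
    remaining = longest - distance                 # step 2
    ratio = remaining / longest                    # step 3
    return round(ratio * 100)                      # step 4


print(similarity_percentage("Smythe", "Smith"))
```

The comparison in this sketch is case-sensitive, on the assumption that the transformations applied beforehand have already normalized the values being compared.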
  1. Rank your scoring attributes in order of importance by assigning scores in the form of integers. Assign the highest score to the attribute that you consider the most important for a match.

  2. Assign at least one transformation for each acquisition, filter, and scoring attribute. DQM applies the selected transformations to that attribute before the input record is compared to the record in the work unit. You can choose more than one transformation for each of the attributes in the match rule.

    Suggestion: Use the fewest transformations possible in your match rule. Using more transformations than necessary could affect the time required for staging and the performance of your search.

    If the match rule has the Bulk Duplicate Identification purpose, then only transformations marked for Bulk Acquisition on the Define Attributes and Transformations page are available for the corresponding attribute. See: Defining Attributes and Transformations.

  3. For scoring attributes, optionally assign weights and, if it is made available through personalization, similarity matching. You also specify whether each attribute and transformation combination is used in the acquisition or scoring process when the match rule runs.

  4. Define match rule thresholds.

    Note: Make sure that:

    • You do not set any thresholds too low. A low threshold might let an insignificant combination of matching attributes qualify a record as a match.

    • Each threshold is less than the sum of the possible scores of all attributes.

  5. You can save the match rule definition and compile it later. A new or updated match rule cannot be used until it is compiled. See: Compiling Match Rules.
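
The threshold guidance in step 4 can be checked with a small script before you enter values. This is a hypothetical review aid, not a DQM feature: the function names, attribute names, scores, and thresholds are invented for illustration, and the sketch assumes that each attribute can earn at most its full score (a 100% weight).

```python
from itertools import combinations


def threshold_problems(attribute_scores, thresholds):
    """List thresholds that are not less than the maximum possible score."""
    max_score = sum(attribute_scores.values())
    return [
        f"{name} threshold {value} is not less than the maximum possible score {max_score}"
        for name, value in thresholds.items()
        if value >= max_score
    ]


def combinations_reaching(attribute_scores, threshold):
    """List every combination of attributes whose full scores meet the threshold.

    Reviewing this list helps spot a threshold that is low enough for an
    insignificant combination of attributes to pass as a match.
    """
    names = list(attribute_scores)
    hits = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            if sum(attribute_scores[n] for n in combo) >= threshold:
                hits.append(combo)
    return hits


scores = {"Party Name": 50, "Address": 30, "Phone": 20}
print(threshold_problems(scores, {"Match": 60, "Automerge": 100}))
print(combinations_reaching(scores, 60))
```

In this sample, the Automerge value of 100 is flagged because it equals the sum of all possible scores, and the second call shows that every combination reaching the match threshold of 60 includes Party Name.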

Related Topics