Assigning Scores, Transformations, and Thresholds (Oracle Trading Community Architecture)


After you have selected attributes for the match rule and defined their usage, assign transformations to each attribute. For match rules with the Bulk Duplicate Identification or Expanded Duplicate Identification purpose, you also define the scores and weights used to calculate the match score of each record in the work unit. The match score of a record is the sum of its weighted attribute scores. This match score is compared with the match rule thresholds to evaluate the record in the work unit.

Procedure for Match Rules with Search Purpose

To specify the order in which attributes appear as search criteria, assign a number to every attribute. Use positive integers. The numbers do not have to be consecutive.

For example, if you have four attributes, the first row of displayed search criteria contains attributes 1 and 2, from left to right, and the second row contains attributes 3 and 4, from left to right. Attributes without an assigned number are displayed last.
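
The ordering rule above can be sketched in Python. This is a minimal sketch that assumes two attributes per row, as in the four-attribute example. The function name and the attribute names are hypothetical and are not part of the Oracle application.

```python
from typing import Optional


def layout_search_criteria(
    attributes: list[tuple[str, Optional[int]]], per_row: int = 2
) -> list[list[str]]:
    """Arrange attributes into rows of displayed search criteria.

    Numbered attributes come first, in ascending order of their number
    (gaps in the numbering are allowed). Attributes without a number are
    appended last, in their original order. per_row=2 is an assumption.
    """
    numbered = sorted((a for a in attributes if a[1] is not None), key=lambda a: a[1])
    unnumbered = [a for a in attributes if a[1] is None]
    ordered = [name for name, _ in numbered + unnumbered]
    # Fill each row from left to right, per_row attributes at a time.
    return [ordered[i:i + per_row] for i in range(0, len(ordered), per_row)]


# Hypothetical attributes; 10, 20, 30 show that the numbering need not be gapless.
rows = layout_search_criteria(
    [("City", 30), ("Party Name", 10), ("Postal Code", None), ("Address", 20)]
)
print(rows)  # [['Party Name', 'Address'], ['City', 'Postal Code']]
```
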
Assign at least one transformation for each attribute. You can choose more than one transformation for each of the attributes in the match rule.

Suggestion: Use the fewest transformations possible in your match rule. Using more transformations than necessary could affect the time required for staging and the performance of your search.

The transformations that appear by default, and their order, are the ones selected on the Define Attributes and Transformations page. See: Defining Attributes and Transformations.

Use Up and Down to order the transformations. Place first the transformations that change the original attribute value the least, because they provide a more precise match. For example, the CLEANSE transformation alters the original attribute value more than EXACT does, so you would order EXACT before CLEANSE.

This order determines how the search is processed. See: Search Matching Process.
You can save the match rule definition and compile it later. A new or updated match rule cannot be used until it is compiled. See: Compiling Match Rules.

Procedure for Match Rules with Bulk Duplicate Identification or Expanded Duplicate Identification Purpose

This table describes some terms used on the pages for this procedure.

Selected Terminology

Term	Description
Score	A numeric value awarded to a record if the attribute is evaluated as a match. If the attribute does not match, then a score of zero is assigned. This score is then multiplied by the weight percentage of the attribute, if any, to determine the final weighted attribute score that counts toward the match score of the record.
Weight	A percentage used to determine a weighted score. If a transformed attribute value is a match, the attribute score is multiplied by the weight to determine the weighted score of the attribute. If an attribute is assigned more than one transformation, the highest weighted score is awarded to the record for the attribute. For example, you assign the Party Type scoring attribute a score of 50, and assign the transformations Exact and Cleanse to that attribute. You give Exact an 80% weight and Cleanse a 50% weight. If the Party Type attribute is a match with both transformations, Exact yields 50 × 80% = 40 and Cleanse yields 50 × 50% = 25, so the attribute's weighted score is 40.
Adjusted Score	The attribute score multiplied by the weight percentage for the attribute and transformation combination.
Match Threshold	A threshold that must be met for records to be considered a match. The Match Threshold for search rules is expressed as a percentage.
Automerge Threshold	A threshold that must be reached for Automerge. A record with a score equal to or above the Automerge threshold is marked by default as a candidate for merge without manual intervention. The record will be automatically merged if Automerge is implemented.
Similarity	An algorithm that compares the transformed attribute value of the input record to the corresponding attribute value from the work unit record and assigns a percentage for the extent of similarity. The similarity percentage is derived from the edit distance between the two strings, or groups of text, as follows: (1) Determine the edit distance, that is, the number of changes required to make the longer string match the shorter string. For example, for Smythe and Smith, the edit distance is two. (2) Subtract the edit distance from the number of characters in the longer string. Following the example: 6 - 2 = 4. (3) Divide the result of step 2 by the number of characters in the longer string. Continuing the example: 4/6 = 0.6667. (4) Express the result as a percentage rounded to an integer. In this example, the similarity percentage is 67. If the two strings are identical, the similarity percentage equals 100. If no characters in the two strings are the same, the similarity percentage is zero.
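
The Similarity computation in the table can be sketched in Python. The sketch makes two assumptions that the documentation does not confirm:

- The edit distance is the standard Levenshtein distance, in which each insertion, deletion, or substitution counts as one change.
- The percentage is rounded half up.

The documentation does not state which edit-distance variant or rounding rule DQM actually uses. The function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed variant): the minimum number of
    single-character insertions, deletions, and substitutions needed
    to turn one string into the other."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute; free when characters are equal
            ))
        prev = curr
    return prev[-1]


def similarity_percentage(a: str, b: str) -> int:
    """(longer length - edit distance) / longer length, as an integer percentage."""
    longer = max(len(a), len(b))
    if longer == 0:
        return 100  # two empty strings are identical
    distance = edit_distance(a, b)
    # Integer form of round_half_up(100 * (longer - distance) / longer).
    return (200 * (longer - distance) + longer) // (2 * longer)


print(edit_distance("Smythe", "Smith"))          # 2
print(similarity_percentage("Smythe", "Smith"))  # 67
```

The comparison in this sketch is case-sensitive. In practice, the values being compared are the transformed attribute values, which the transformations may already have normalized.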

Rank your scoring attributes in order of importance by assigning scores in the form of integers. Assign the highest score to the attribute that you consider the most important for a match.
Assign at least one transformation to each acquisition, filter, and scoring attribute. Data Quality Management (DQM) applies the selected transformations to that attribute before the input record is compared to the record in the work unit. You can choose more than one transformation for each attribute in the match rule.

Suggestion: Use the fewest transformations possible in your match rule. Using more transformations than necessary could affect the time required for staging and the performance of your search.

If the match rule has the Bulk Duplicate Identification purpose, then only transformations marked for Bulk Acquisition on the Define Attributes and Transformations page are available for the corresponding attribute. See: Defining Attributes and Transformations.
For scoring attributes, optionally assign weights and, if it is made available through personalization, similarity matching. You also specify whether each attribute and transformation combination is used in the acquisition or scoring process when the match rule runs.
- Weight: Assign percentage weights to the transformations according to how closely the transformed value of the attribute resembles the original value. For example, you should assign more weight to the Exact transformation than to the Cleanse transformation because Exact makes fewer changes to the original data.
- Similarity: The Similarity matching option does not require an exact match, letting you create fuzzier matches by applying the similarity algorithm to transformed attribute values. The similarity algorithm compensates for unanticipated errors that the transformations do not catch.
  
  If the computed percentage is greater than or equal to the similarity percentage that you define in the match rule, the attribute is considered a match. If you select the Similarity option, you must enter this similarity percentage.
  
  Note: The Similarity option requires additional computing resources and time.
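
The scoring rules described above can be sketched in Python as a simplified model. It covers the attribute score, a weight for each transformation, the highest weighted score per attribute, the optional Similarity comparison, and the sum across attributes. The model rests on several assumptions:

- The `exact` and `cleanse` functions are hypothetical stand-ins, not DQM's actual EXACT and CLEANSE transformations.
- The edit distance is assumed to be Levenshtein.
- All class and function names are hypothetical.
- Acquisition and filter attributes are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed edit-distance variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def similarity_percentage(a: str, b: str) -> int:
    """Integer similarity percentage, rounded half up (assumed rounding)."""
    longer = max(len(a), len(b))
    if longer == 0:
        return 100
    return (200 * (longer - edit_distance(a, b)) + longer) // (2 * longer)


@dataclass
class TransformationRule:
    name: str
    transform: Callable[[str], str]   # hypothetical stand-in for a DQM transformation
    weight: int = 100                 # weight percentage; 100 means the full score
    similarity: Optional[int] = None  # minimum similarity %; None means exact comparison


@dataclass
class ScoringAttribute:
    name: str
    score: int
    rules: list[TransformationRule] = field(default_factory=list)


def attribute_score(attr: ScoringAttribute, input_value: str, candidate_value: str) -> float:
    """Highest weighted score among transformations that evaluate as a match; 0 if none do."""
    best = 0.0
    for rule in attr.rules:
        x, y = rule.transform(input_value), rule.transform(candidate_value)
        if rule.similarity is None:
            matched = x == y
        else:
            matched = similarity_percentage(x, y) >= rule.similarity
        if matched:
            best = max(best, attr.score * rule.weight / 100)
    return best


def record_match_score(attrs: list[ScoringAttribute], input_record: dict, candidate: dict) -> float:
    """Match score of the record: the sum of its weighted attribute scores."""
    return sum(attribute_score(a, input_record[a.name], candidate[a.name]) for a in attrs)


# Hypothetical stand-ins for the EXACT and CLEANSE transformations.
def exact(value: str) -> str:
    return value.strip().upper()


def cleanse(value: str) -> str:
    return "".join(ch for ch in value.upper() if ch.isalnum())


party_type = ScoringAttribute("Party Type", 50, [
    TransformationRule("EXACT", exact, weight=80),
    TransformationRule("CLEANSE", cleanse, weight=50),
])
party_name = ScoringAttribute("Party Name", 100, [
    TransformationRule("CLEANSE", cleanse, similarity=60),
])

print(attribute_score(party_type, "Organization", "ORGANIZATION"))   # 40.0 (both match; EXACT wins)
print(attribute_score(party_type, "Organi-zation", "ORGANIZATION"))  # 25.0 (only CLEANSE matches)
print(record_match_score(
    [party_type, party_name],
    {"Party Type": "Organization", "Party Name": "Smythe"},
    {"Party Type": "ORGANIZATION", "Party Name": "Smith"},
))  # 140.0
```

The Party Type values reproduce the Weight example from the terminology table. The Party Name comparison matches because the similarity of SMYTHE and SMITH (67) meets the assumed 60% similarity percentage.
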
Define match rule thresholds.

Note: Make sure that:
- You do not set any threshold too low. Low thresholds might let insignificant combinations of attributes pass as matches.
- Each threshold is less than the sum of the possible scores of all attributes.

- Match Threshold: To determine the value to enter, identify the minimum set of attributes required for a match. The total of the attribute scores of this minimum set is the maximum value of the match threshold.
- Automerge Threshold: You can enter this threshold only if the match rule is allowed for Automerge. See: Defining Single Match Rules.
  
  To compute the Automerge threshold, determine the minimum set of attributes required for considering two parties for merge. The total of the attribute scores of this minimum set is the maximum value for the Automerge threshold.
  
  The Automerge threshold must be greater than or equal to the match threshold.
  
  Caution: You cannot unmerge records that are automatically merged. Set the automatic merge threshold high enough to prevent merging records that are not definite duplicates.
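
The threshold guidelines above can be sketched in Python. The sketch does three things:

- It derives a threshold from the minimum set of attributes.
- It checks the two rules in the note and the Automerge rule.
- It classifies a record's match score.

The attribute names, scores, and minimum sets are hypothetical. The check follows the note's wording and compares each threshold with the sum of the raw attribute scores. If weights reduce the achievable scores, the real ceiling may be lower.

```python
from typing import Optional


def max_threshold(scores: dict[str, int], minimum_set: set[str]) -> int:
    """Maximum threshold value: total score of the minimum set of attributes that must match."""
    return sum(scores[name] for name in minimum_set)


def check_thresholds(scores: dict[str, int], match_threshold: int,
                     automerge_threshold: Optional[int] = None) -> list[str]:
    """Return violations of the threshold guidelines (an empty list if none)."""
    problems = []
    total = sum(scores.values())
    for label, value in (("Match", match_threshold), ("Automerge", automerge_threshold)):
        if value is not None and value >= total:
            problems.append(f"{label} threshold {value} is not less than the total score {total}")
    if automerge_threshold is not None and automerge_threshold < match_threshold:
        problems.append("Automerge threshold is lower than the match threshold")
    return problems


def classify(match_score: float, match_threshold: int,
             automerge_threshold: Optional[int] = None) -> str:
    """Compare a record's match score with the thresholds (equal to counts as reached)."""
    if automerge_threshold is not None and match_score >= automerge_threshold:
        return "automerge candidate"
    if match_score >= match_threshold:
        return "match"
    return "no match"


# Hypothetical scoring attributes and scores.
scores = {"Party Name": 100, "Address": 60, "Party Type": 50, "Postal Code": 40}
match_t = max_threshold(scores, {"Party Name", "Postal Code"})                 # 140
automerge_t = max_threshold(scores, {"Party Name", "Address", "Postal Code"})  # 200

print(check_thresholds(scores, match_t, automerge_t))  # []
print(classify(150, match_t, automerge_t))             # match
print(classify(210, match_t, automerge_t))             # automerge candidate
```

Because automatically merged records cannot be unmerged, a cautious setup would choose a larger minimum set for Automerge than for the match threshold, as in this sketch.
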
You can save the match rule definition and compile it later. A new or updated match rule cannot be used until it is compiled. See: Compiling Match Rules.