Creating and Updating Word Replacement Lists

Although DQM provides over 3,000 word replacement pairs in American English, you can modify the provided lists or create lists of word replacement pairs with words that users often enter with errors or as shortcuts.

The fuzzy key generation program uses only the following three word lists for fuzzy search:

  1. ADDRESS_DICTIONARY - for address

  2. ORGANIZATION_NAME_DICTIONARY - for organization

  3. PERSON_NAME_DICTIONARY - for person

For fuzzy search, you cannot create your own replacement list. Instead, you must update one of the applicable lists above.

Attention: A new word list is not used until you create custom transformations that use the list. See: Creating Custom Transformations.

Procedure to Select the Word Identification Method

When you create or copy a word list, you must specify the word identification method. You cannot change the method when you update a list.

Note: The Nondelimited method is typically used for languages, such as Japanese, that are based on characters rather than on words separated by spaces.

Example

Suppose that John is the original word, Jonathan is the replacement word, and the attribute value is John Johnson. With the Delimited method, the attribute value becomes Jonathan Johnson, because John is replaced only where it stands as a separate, space-delimited word. With the Nondelimited method, the value becomes Jonathan Jonathanson, because John is replaced wherever it appears.
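The difference between the two methods can be sketched in a few lines of Python. This is only an illustration of the behavior described in the example, not DQM's implementation. The function names replace_delimited and replace_nondelimited are hypothetical, and the sketch assumes case-sensitive matching and words separated by single spaces.

```python
def replace_delimited(value: str, original: str, replacement: str) -> str:
    """Replace only whole words, that is, tokens separated by spaces."""
    words = value.split(" ")
    return " ".join(replacement if w == original else w for w in words)


def replace_nondelimited(value: str, original: str, replacement: str) -> str:
    """Replace the original string wherever it appears, even inside a word."""
    return value.replace(original, replacement)


print(replace_delimited("John Johnson", "John", "Jonathan"))     # Jonathan Johnson
print(replace_nondelimited("John Johnson", "John", "Jonathan"))  # Jonathan Jonathanson
```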

Procedure to Enter Word List Information

This table describes some terms in the pages used for this procedure.

Selected Terminology

Term         Description
Condition    Criterion that must be met for the word replacement to occur. Conditions are particularly useful for country-specific word replacements. For example, in the UK, LTD or Limited is a common organization name suffix. You can specify that either word is replaced with a blank space only if it appears at the end of a string.

  1. Enter a unique word list name, and optionally define the source of the list, for example to identify a list that you created or obtained from a third party. When you update an existing list, you can change the name and source, but not the language.

  2. Define word replacement pairs.

  3. For any word pair, optionally enter a condition.

    You can use the same original word multiple times in a list only if the replacement words and conditions are different. For example, you can enter St. twice as an original word to be replaced by the replacement words Street and Saint, with a condition for each case.

    Attention: If you use original words multiple times, the conditions are applied in the order defined, and a word is replaced according to the first condition that is met. For example, if the St. and Street word pair is defined first, and that condition is met, then the word replacement occurs. The condition for the St. and Saint word pair is skipped.

  4. If the field after the condition is not disabled, you must enter a value in it. If multiple values are possible, for example for the seeded If Country Equals condition, separate the values with commas.

  5. After you add or modify word replacement pairs, run the DQM Staging program to update the staged schema to include the new or revised word replacement pairs. In the Original Word column, Staging Required indicates the word pairs that still need to be staged.

    For any record that you add to or update in the TCA Registry, the word replacement pairs become immediately effective after the DQM Staging program finishes. See: DQM Staging Program.
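The ordering rule in step 3 and the comma-separated values in step 4 can be illustrated with a short Python sketch. It is not DQM's implementation. The WordPair structure, the condition names at_end_of_string, at_start_of_string, and country_equals, the apply_pairs function, and the country codes are all hypothetical stand-ins chosen for the example. Only the behavior of applying the first condition that is met comes from the steps above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WordPair:
    original: str
    replacement: str
    condition: Optional[str] = None  # hypothetical condition name
    value: str = ""                  # comma-separated values, as in step 4


def condition_met(pair: WordPair, words: List[str], index: int, country: str) -> bool:
    """Evaluate the (hypothetical) condition for the word at position index."""
    if pair.condition is None:
        return True
    if pair.condition == "at_end_of_string":
        return index == len(words) - 1
    if pair.condition == "at_start_of_string":
        return index == 0
    if pair.condition == "country_equals":
        allowed = {v.strip() for v in pair.value.split(",")}
        return country in allowed
    return False


def apply_pairs(value: str, pairs: List[WordPair], country: str = "") -> str:
    words = value.split()
    result = []
    for i, word in enumerate(words):
        for pair in pairs:  # pairs are checked in the order defined
            if word == pair.original and condition_met(pair, words, i, country):
                word = pair.replacement
                break  # the first condition that is met wins; later pairs are skipped
        result.append(word)
    # Drop words that were replaced with an empty string
    return " ".join(w for w in result if w)


street_saint = [
    WordPair("St.", "Street", "at_end_of_string"),
    WordPair("St.", "Saint", "at_start_of_string"),
]
print(apply_pairs("St. John St.", street_saint))  # Saint John Street
print(apply_pairs("St.", street_saint))           # Street (both conditions hold; the first wins)

ltd = [WordPair("Ltd", "", "country_equals", "GB, IE")]
print(apply_pairs("Acme Ltd", ltd, country="GB"))  # Acme
print(apply_pairs("Acme Ltd", ltd, country="US"))  # Acme Ltd
```

In the second call, the single word St. is both the first and the last word, so both conditions are met. Because the St. and Street pair is defined first, it is applied and the St. and Saint pair is skipped.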

Related Topics