Fuzzy Match Tool

The Fuzzy Match tool identifies non-identical duplicates in a dataset by comparing the match fields you specify against similarity thresholds. A pair of records qualifies as a match when its Match Score falls within the user-specified or default thresholds established in the configuration properties.
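
As a rough illustration of the idea, and not the tool's actual algorithm, the sketch below scores pairs of values with Python's standard `difflib.SequenceMatcher` and keeps the pairs that reach a threshold. The function names `similarity` and `find_fuzzy_duplicates` and the sample records are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score between two strings (case-insensitive)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_fuzzy_duplicates(records: dict[int, str], threshold: float = 80.0):
    """Return (id_a, id_b, score) for every pair scoring at or above threshold."""
    matches = []
    for (id_a, val_a), (id_b, val_b) in combinations(records.items(), 2):
        score = similarity(val_a, val_b)
        if score >= threshold:
            matches.append((id_a, id_b, round(score, 1)))
    return matches


# Hypothetical sample data keyed by record ID.
records = {1: "Jonathan Smith", 2: "Jonathon Smith", 3: "Maria Garcia"}
print(find_fuzzy_duplicates(records))
```

With the sample data, only records 1 and 2 qualify: they differ by a single character, while neither comes close to record 3.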

The most effective way to build a fuzzy match is to perform the match process on multiple fields within the input file. Each field should be individually configured with either a predefined or custom Match Style, set through the Fuzzy Match Edit Match Options dialog.

Fuzzy matching only works with Latin character sets, and some of the match capabilities are only compatible with the English language.

Configuration Properties

A unique identifier for each data record is necessary for the Fuzzy Match tool to work. Inspect your data; if there is no such key field, add a Record ID Tool one step upstream.
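
Outside the tool, the same requirement can be pictured as attaching a sequential key to each row before any matching happens. The sketch below is only an analogy for what a Record ID step provides; the function name `add_record_id` and the field name `RecordID` are assumptions made for illustration.

```python
def add_record_id(rows: list[dict], field: str = "RecordID", start: int = 1) -> list[dict]:
    """Return copies of rows with a unique sequential ID stored under `field`."""
    if any(field in row for row in rows):
        raise ValueError(f"field {field!r} already exists in the input")
    return [{field: i, **row} for i, row in enumerate(rows, start=start)]


# Hypothetical input rows with no key field.
rows = [{"Name": "Jonathan Smith"}, {"Name": "Jonathon Smith"}]
keyed = add_record_id(rows)
print(keyed)
```

Every record now carries a distinct key, so a match result can refer back to the exact rows it paired.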

  1. Choose the preferred match mode:

  2. Specify the unique Record ID field.

  3. Specify the Match Threshold. The default value is 80%. If the Match Score generated by the Fuzzy Match tool is less than the specified threshold, the record will not qualify as a match.

    The Match Score takes into account every setting in the configuration properties of the Fuzzy Match tool: each field, its match style, its match weight, and the resulting field match score all contribute to the score, which is then compared against the specified Match Threshold.

  4. Select the Field Name to match on. Any field already in the input connection is available from this drop-down list.

  5. Select the Match Style from the drop-down list. Choices include:

  6. Edit the Match Style as necessary by clicking the Edit button. The Fuzzy Match Edit Match Options dialog will display.

  7. Specify Advanced Options:
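
The scoring described in step 3 can be pictured with a short sketch. It assumes the overall score is a weighted average of per-field scores, which this page does not confirm, and it uses `difflib.SequenceMatcher` as a stand-in for a match style. The names `field_score`, `record_score`, and `is_match`, the weights, and the sample records are all hypothetical.

```python
from difflib import SequenceMatcher


def field_score(a: str, b: str) -> float:
    """Stand-in for a match style: 0-100 similarity of two field values."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_score(rec_a: dict, rec_b: dict, weights: dict[str, float]) -> float:
    """Weighted average of per-field scores (an assumed combination rule)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one field needs a positive weight")
    weighted = sum(w * field_score(rec_a[f], rec_b[f]) for f, w in weights.items())
    return weighted / total_weight


def is_match(rec_a: dict, rec_b: dict, weights: dict[str, float],
             threshold: float = 80.0) -> bool:
    """A pair qualifies only if its overall score is not below the threshold."""
    return record_score(rec_a, rec_b, weights) >= threshold


# Hypothetical records and relative field weights.
a = {"Name": "Jonathan Smith", "City": "Boulder"}
b = {"Name": "Jonathon Smith", "City": "Boulder"}
c = {"Name": "Maria Garcia", "City": "Denver"}
weights = {"Name": 2.0, "City": 1.0}

print(is_match(a, b, weights))
print(is_match(a, c, weights))
```

Here records `a` and `b` clear the default 80 threshold, while `a` and `c` do not. Raising the weight of a field makes its score count for more in the overall result.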

For additional information regarding Fuzzy Match use, see the Fuzzy Match FAQ.