The Fuzzy Match tool identifies non-identical duplicates in a dataset by matching on parameters you specify. Values need not be identical to match; they only need to fall within the user-specified or predefined parameters set in the configuration properties.
The most effective way to configure the Fuzzy Match tool is to assign the match process to multiple fields within the input file. Each field should be individually configured using either a predefined or custom Match Style, which is configured through the Edit Match Options dialog.
Fuzzy matching works only with Latin character sets, and some match capabilities are compatible only with the English language.
The input data stream MUST include a unique identifier for each record. If the input has no such key field, add a Record ID tool one step upstream.
Choose the mode in which to run the Fuzzy Match tool. Choices are:
Purge Mode: All records from a single source are compared to identify duplicates.
Merge Mode: Records from different sources are compared, with the intent to identify duplicates across different input files. Each source must contain a Source ID field. A Source ID field can be appended on input by choosing the Output File Name as Field option, which appends to each record a field containing either the file name or the entire file path.
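The difference between the two modes comes down to which record pairs are compared. The following is a minimal Python sketch of that idea only, not of how Alteryx implements it. It assumes each record is a dict with hypothetical `record_id` and `source_id` keys, and that Merge Mode pairs only records whose sources differ.

```python
from itertools import combinations

def candidate_pairs(records, mode):
    """Return the record-ID pairs a deduplication pass would compare.

    `records`, `record_id`, and `source_id` are illustrative names,
    not Alteryx identifiers.
    """
    pairs = []
    for a, b in combinations(records, 2):
        if mode == "purge":
            # Purge: every record is compared with every other record.
            pairs.append((a["record_id"], b["record_id"]))
        elif mode == "merge":
            # Merge (assumed): compare only records from different sources.
            if a["source_id"] != b["source_id"]:
                pairs.append((a["record_id"], b["record_id"]))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return pairs

records = [
    {"record_id": 1, "source_id": "customers_a.csv"},
    {"record_id": 2, "source_id": "customers_a.csv"},
    {"record_id": 3, "source_id": "customers_b.csv"},
]
purge_pairs = candidate_pairs(records, "purge")
merge_pairs = candidate_pairs(records, "merge")
```

In this sketch, records 1 and 2 share a source, so they are paired under "purge" but skipped under "merge".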
Specify the unique Record ID field. This field must be unique to each individual record and be unique across different sources. A record ID can be easily appended to each input via the Record ID tool.
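As an illustration of what "unique across different sources" means, the sketch below combines several inputs and numbers the records with a single running counter, so no ID repeats between files. The function name, the `sources` layout, and the `SourceID` and `RecordID` field names are assumptions made for this example.

```python
def tag_sources(sources):
    """Combine several record lists into one, adding a source ID and a
    record ID that stays unique across all sources.

    `sources` maps a file name to a list of record dicts (hypothetical layout).
    """
    combined = []
    next_id = 1
    for file_name, rows in sources.items():
        for row in rows:
            tagged = dict(row)                # copy so the input is not modified
            tagged["SourceID"] = file_name    # which input the record came from
            tagged["RecordID"] = next_id      # one counter shared by all sources
            next_id += 1
            combined.append(tagged)
    return combined

combined = tag_sources({
    "customers_a.csv": [{"Name": "Andrew Smith"}, {"Name": "Jane Doe"}],
    "customers_b.csv": [{"Name": "Andy Smith"}],
})
record_ids = [rec["RecordID"] for rec in combined]
```

Restarting the counter for each file would give both files a record 1, which is exactly the collision this step is meant to avoid.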
Specify the Match Threshold. Default is 80%. If the Match score generated from the Fuzzy Match tool is less than the specified threshold, the record will not qualify as a match.
The Match Threshold specified here takes into account every specification in the configuration properties of the Fuzzy Match tool: each field, its match style, and the resulting match are all considered when the threshold is applied.
Select the Field Name to Match on. Any field already in the input file will be available from this drop-down list.
Select the Match Style from the drop-down list. Choices include:
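The sketch below shows how a threshold on a combined score behaves. It is not the scoring Alteryx uses: the per-field similarity comes from Python's `difflib.SequenceMatcher`, and averaging the field scores is an assumption made purely for illustration. The function and field names are hypothetical.

```python
from difflib import SequenceMatcher

def field_score(a, b):
    """Similarity of two strings on a 0-100 scale (stand-in metric)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()

def record_score(rec_a, rec_b, fields):
    """Combine the per-field scores; a plain average is assumed here."""
    scores = [field_score(rec_a[f], rec_b[f]) for f in fields]
    return sum(scores) / len(scores)

def is_match(rec_a, rec_b, fields, threshold=80.0):
    # A score below the threshold does not qualify as a match.
    return record_score(rec_a, rec_b, fields) >= threshold

same_a = {"Name": "Acme", "City": "Boulder"}
same_b = {"Name": "ACME", "City": "boulder"}
other_a = {"Name": "abc", "City": "def"}
other_b = {"Name": "xyz", "City": "uvw"}

identical_is_match = is_match(same_a, same_b, ["Name", "City"])
different_is_match = is_match(other_a, other_b, ["Name", "City"])
```

Because every configured field contributes to the combined score, one weak field can pull an otherwise strong pair below the threshold.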
Address: A predefined match style configured to find Address Matches. This style incorporates Double Metaphone algorithms combined with a digit match to identify matching addresses.
Apply this style to Commercial Addresses
Address No Suite: A predefined match style configured to find Address Matches where the input data has no Suite information in the Address field. This style incorporates Double Metaphone algorithms combined with a digit match to identify matching addresses.
Apply this style to Residential Addresses
AddressPart: A predefined match style configured to find Address Matches. This style incorporates Double Metaphone algorithms combined with a digit match to identify matching addresses. AddressPart differs from a normal Address match style in that it does not use word frequency analysis and the match threshold is 5% lower.
Company Name: A predefined match style configured to find Company Name Matches. This style identifies matches based on Double Metaphone algorithms.
Phone: A predefined match style configured to find Phone Matches. This style looks only at the digits in a phone field and matches on the last 10 digits (reading from the end of the number), ignoring dashes, parentheses, and any leading 1 that may be contained within the field.
ZIP Code: A predefined match style configured to find ZIP Code Matches. This style looks at the 5 digits of a ZIP field and assigns a match accordingly.
Exact: This field will have to match exactly to be considered a match. This logic is not fuzzy at all.
Name: A predefined match style configured to find Name Matches. This style incorporates Double Metaphone algorithms.
Name with Nicknames: A predefined match style configured to find Name Matches. This style incorporates Double Metaphone algorithms. Additionally this style utilizes a common Nicknames table to check against to further identify duplicates.
For example, the name Andrew may match Andy and/or Drew.
Custom: Allows the user to define their own match parameters, so that the match can be rerun without reconfiguring the match properties. Custom match styles can also be reconfigured and overwritten, and new custom styles can be created.
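Several of the styles above amount to normalizing a value before comparing it. The sketch below illustrates the Phone, ZIP Code, and nickname ideas in plain Python. It makes three assumptions: "the 5 digits" of a ZIP field are taken to be the first five, the nickname table is a tiny hypothetical stand-in for the one the tool uses, and the Double Metaphone step of the name styles is left out. All function names are illustrative.

```python
import re

def phone_key(value):
    """Keep digits only and return the last 10, which drops a leading 1."""
    digits = re.sub(r"\D", "", value)
    return digits[-10:]

def zip_key(value):
    """Return the first 5 digits of a ZIP field (assumes ZIP or ZIP+4 input)."""
    digits = re.sub(r"\D", "", value)
    return digits[:5]

# Hypothetical nickname table mapping a nickname to its formal name.
NICKNAMES = {"andy": "andrew", "drew": "andrew"}

def canonical_name(value):
    """Map a first name to its formal form when the table knows it."""
    name = value.strip().lower()
    return NICKNAMES.get(name, name)

phone_a = phone_key("1 (303) 555-0100")
phone_b = phone_key("303-555-0100")
zip_a = zip_key("80301-1234")
names = {canonical_name(n) for n in ["Andrew", "Andy", "Drew"]}
```

After normalization, both phone strings reduce to the same key, and Andrew, Andy, and Drew collapse to a single name.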
Edit the Match Style as necessary, by clicking the Edit button. The Edit Match Options dialog will display.
Specify additional output fields and settings:
Output Match Score: The match score is written to an additional output field.
Output Generated Keys: Adds one more output field containing the key generated by the match styles.
Output Unmatched Records: Any record that doesn't match any other record is output as unmatched; otherwise, these records are not output at all.
If you checked "Ignore if empty" on any of the "Edit Match Options" dialogs for any match column, then any record with an empty value in that match column will not be output at all, regardless of the setting on this "Output Unmatched Records" checkbox.
Don't Compare Records already in a Group: Check this box if you will be using a Make Group tool downstream. Doing so will make Alteryx do less work and process the records faster.
Generate Keys Only: All records will be returned with the generated keys only - no matching will take place.
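The interaction between Output Unmatched Records and "Ignore if empty" can be sketched as a simple filter. This is only an illustration of the rule described above. The record layout, the `matched_ids` set, the function name, and the "matched"/"unmatched" labels are all assumptions, and the tool's real output format is not reproduced here.

```python
def select_output(records, matched_ids, match_field,
                  output_unmatched=False, ignore_if_empty=False):
    """Decide which records are emitted, and with which label."""
    out = []
    for rec in records:
        if ignore_if_empty and not rec[match_field]:
            # Empty match value: dropped regardless of output_unmatched.
            continue
        if rec["RecordID"] in matched_ids:
            out.append((rec["RecordID"], "matched"))
        elif output_unmatched:
            out.append((rec["RecordID"], "unmatched"))
    return out

records = [
    {"RecordID": 1, "Phone": "3035550100"},
    {"RecordID": 2, "Phone": "3035550100"},
    {"RecordID": 3, "Phone": "7205550199"},
    {"RecordID": 4, "Phone": ""},
]
matched_ids = {1, 2}

default_out = select_output(records, matched_ids, "Phone")
with_unmatched = select_output(records, matched_ids, "Phone",
                               output_unmatched=True)
with_both = select_output(records, matched_ids, "Phone",
                          output_unmatched=True, ignore_if_empty=True)
```

Record 4 has an empty phone value, so it appears as unmatched in `with_unmatched` but is missing from `with_both`.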
Click Apply to have the configurations accepted.
For information regarding Input, Output, Annotation and Error Properties, see Tool Properties.