Fuzzy Match Tool
Docs are available before the release of Designer Cloud so you can get a sneak peek. This content might change between now and the official release.
One Tool Example
Fuzzy Match has a One Tool Example. Go to Sample Workflows to learn how to access this and many other examples directly in Alteryx Designer Cloud.
Use Fuzzy Match to identify non-identical duplicates in a dataset by specifying match columns and similarity thresholds. Match scores only need to fall within the user-specified or default thresholds established in the configuration properties.
The most effective way to create a fuzzy match is to perform the match process on multiple columns within the input file. Each column should be individually configured using either a predefined or custom Match Style, configured through the Fuzzy Match Edit Match Options.
Fuzzy matching only works with Latin character sets, and some of the match capabilities are only compatible with the English language.
Configure the Tool
A unique identifier for each data row is necessary for the Fuzzy Match tool to work. Inspect your data—if there is no key column, add a Record ID tool 1 step upstream.
- Choose the preferred Find Method:
- Find Values to Purge: All rows from a single source are compared to identify duplicates.
- Find Values to Merge: Rows from different sources are compared, with the intent to identify duplicates across different input files. When you use Find Values to Merge, also select the Column with a Source ID.
- Set up your Match Criteria:
- Basic Fuzzy Match: Supports 1 column and default match styles.
- Advanced Fuzzy Match: Supports multiple column matches and custom match styles. The incoming data has to have a unique row ID to use this method. This method can be used for data rehydration.
- Select the Column Header to match on. Any column already in the input connection will be available from this dropdown list.
- Select the Match Style from the dropdown list. Choices include:
- Address: A predefined match style configured to find address matches. This style incorporates Double Metaphone algorithms combined with a digit match to identify matching addresses. Apply this style to commercial addresses.
- Address No Suite: A predefined match style configured to find address matches where the input data has no suite information in the
Addresscolumn. This style incorporates Double Metaphone algorithms combined with a digit match to identify matching addresses. Apply this style to residential addresses.
- AddressPart: A predefined match style configured to find address matches. This style incorporates Double Metaphone algorithms combined with a digit match to identify matching addresses. AddressPart differs from a traditional address match style in that it does not use word frequency analysis and the match threshold is 5% lower.
- Company Name: A predefined match style configured to find company name matches. This style identifies matches based on Double Metaphone algorithms.
- Exact: This field has to match exactly. The logic isn't fuzzy at all.
- Name: A predefined match style configured to find name matches. This style incorporates Double Metaphone algorithms.
- Name w/ Nicknames: A predefined match style configured to find name matches. This style incorporates Double Metaphone algorithms. Additionally, this style utilizes a Nicknames table to check against to further identify duplicates. For example, the name Andrew may match Andy and/or Drew.
- Phone: A predefined match style configured to find phone matches. This style looks at the digits only in a phone field and matches on the reverse 10 digits, ignoring dashes, parenthesis, and leading 1s that may be contained within the field.
- ZIP Code: A predefined match style configured to find ZIP code matches. This style looks at the 5 digits of a ZIP field and assigns a match accordingly.
- Custom: Allows the user to define their own match parameters, so that the match can run repeatedly without needing reconfiguration. Of course, you can reconfigure and overwrite these custom match styles, or you can create new custom styles.
- Use Edit to edit the Match Style as necessary. This opens the Fuzzy Match Edit Match Options configuration window.
- Specify the Output Options:
- Match Score: The Match Score appears in an additional output column. The Match Score is a percentage that is generated for matched columns.
- Generated Keys: Output the key from the resulting match styles as an additional column.
- Unmatched Rows: Rows that don't match any others are output as additional rows. Occasionally, the output unmatched rows report a match score, which should be ignored. That might be addressed in a future release. The Ignore If Empty checkbox from the Fuzzy Match Edit Match Options is prioritized over this option.
- Specify the Advanced Options:
- Don't Compare Rows Already in a Group: Rows that have been matched aren't compared to other rows, reducing computational expense and time. For example, if row 1 matches to row 2 and row 3, row 2 is not matched against row 3. Use a Make Group tool downstream to link groups together.
- Generated Keys Only: All rows are returned with the generated keys as an additional column. No matching takes place.
For additional information regarding Fuzzy Match use, visit the Fuzzy Match FAQ.