In the Fuzzy Match Tool Configuration window, select Edit to open the Edit Match Options.
The Match Style is a predetermined method of finding an appropriate match between records of an input file. The individual match style choices are defined on the Fuzzy Match Tool page.
Any predefined or custom, user-defined match styles will appear in this list. The subsequent specifications in the dialog box will be selected based on the match style chosen.
If you edit a predefined match style, it will change to "Custom" in the drop-down list. The settings specified in this custom match style are saved with the workflow.
Add new custom match styles rather than deleting or editing default options.
You can delete a match style by selecting it from the drop-down list and clicking Delete. You can add a match style by typing in a new name and clicking OK.
The Preprocess is a procedure that runs before Generate Keys and the Match function, with the aim of producing better matches. The choices from this list include:
You can define a custom Preprocess by editing the FuzzyMatchStyles.xml file. This file is located in the Alteryx Runtime directory: \Program Files\Alteryx\bin\RuntimeData\FuzzyMatch. Only edit this file if you are familiar with XML and Regular Expressions.
Generate Keys is the method by which a potential match will be identified. Alteryx reads through the specified field and assigns Keys to the components of that field. Once all keys are generated, Alteryx compares the concatenated keys for every match field. If the keys generated are equal for two records, a potential match is identified and the pair will proceed to the next phase of the match process. Function choices are:
1-(303)440-8896 would not match 303-440-8896.
Even though non-digit characters are ignored, these phone numbers still do not match because there is a leading 1 in the first record.
1-(303)440-8896 would match 303-440-8896.
Non-digit characters are ignored and the digits are compared from last (6) to first (3 or 1). For these records to match, set Maximum Key Length to 10 so that the leading 1 is ignored.
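The two phone-number examples above can be reproduced with a minimal Python sketch. The function names `digits_key` and `digits_reverse_key` and the `max_key_length` parameter are hypothetical, chosen for illustration. This is not Alteryx's implementation; it only mimics the behavior described above: drop non-digit characters, optionally read the digits from last to first, and truncate to a maximum key length.

```python
import re


def digits_key(value, max_key_length=None):
    """Keep only the digits of value, in their original order."""
    key = re.sub(r"\D", "", value)
    return key if max_key_length is None else key[:max_key_length]


def digits_reverse_key(value, max_key_length=None):
    """Keep only the digits of value, read from last to first."""
    key = re.sub(r"\D", "", value)[::-1]
    return key if max_key_length is None else key[:max_key_length]


a, b = "1-(303)440-8896", "303-440-8896"

# Forward keys differ because of the leading 1 in the first record.
print(digits_key(a) == digits_key(b))  # False

# Reversed keys cut to 10 digits never reach the leading 1, so they agree.
print(digits_reverse_key(a, 10) == digits_reverse_key(b, 10))  # True
```

Reversing before truncating is what makes the length limit useful here: the cut removes the country-code prefix rather than the last digits of the number.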
1234 5th St.
The "1234" would be the key.
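As a rough illustration of the address example above, the sketch below takes the first run of digits in a field as the key. The name `first_number_key` is hypothetical, and returning an empty string when no digits are found is an assumption, not documented Alteryx behavior.

```python
import re


def first_number_key(value):
    """Return the first run of digits in value, or "" if there is none."""
    match = re.search(r"\d+", value)
    return match.group(0) if match else ""


print(first_number_key("1234 5th St."))  # 1234
```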
Alteryx automatically replaces the following leading letters and letter combinations prior to generating the match key:
Leading letter(s) | Replacement |
---|---|
AV | AF |
AH | A |
AW | A |
CAAN | TAAN |
DG | G |
D | G |
HA | A |
KN | K |
K | C |
MAC | MC |
M | N |
NST | NS |
PF | F |
PH | F |
Q | G |
SCH | SH |
Z | S |
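The table can be read as a list of prefix rewrites. The following Python sketch is one possible interpretation, not Alteryx's code: it assumes the input is uppercased, that rows are tried in the order the table lists them, and that only the first matching row is applied. The names `LEADING_REPLACEMENTS` and `replace_leading` are hypothetical.

```python
# (leading letters, replacement) pairs, in the order the table lists them.
# Longer prefixes such as DG, KN, and MAC appear before D, K, and M,
# so trying rows in order lets the longer prefix win.
LEADING_REPLACEMENTS = [
    ("AV", "AF"), ("AH", "A"), ("AW", "A"), ("CAAN", "TAAN"),
    ("DG", "G"), ("D", "G"), ("HA", "A"), ("KN", "K"), ("K", "C"),
    ("MAC", "MC"), ("M", "N"), ("NST", "NS"), ("PF", "F"), ("PH", "F"),
    ("Q", "G"), ("SCH", "SH"), ("Z", "S"),
]


def replace_leading(word):
    """Apply the first table row whose leading letters start the word."""
    upper = word.upper()
    for prefix, replacement in LEADING_REPLACEMENTS:
        if upper.startswith(prefix):
            return replacement + upper[len(prefix):]
    return upper  # no row applies


for name in ["Knight", "Philip", "Schmidt", "MacDonald", "Smith"]:
    print(name, "->", replace_leading(name))
```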
The algorithm was devised to code names recorded in US census records. The standard algorithm works best on European names. Variants have been devised for names from other cultures. For more information, see Soundex.
Double Metaphone is the preferred algorithm.
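For readers unfamiliar with Soundex, here is a short Python sketch of the standard American Soundex rules (first letter kept, consonants mapped to digits, code padded to four characters). It is offered only to show what a phonetic key looks like; the variant Alteryx uses may differ, and the name `soundex` is illustrative.

```python
# Digit assigned to each consonant group in standard American Soundex.
CODES = {}
for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character Soundex code, or "" if name has no letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels and Y have no code
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Ashcraft"))  # A261
```

Names that sound alike, such as Robert and Rupert, receive the same key and would therefore be compared as a potential match.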
Generate Keys for each word: Generates a separate key for each word.
"john smith" and "smith john" will be able to line up as a potential match even though words are out of order.
Ignore if Empty: Ignores an empty value of the specified match field. If the field is empty, no key will be generated and the record will be thrown out.
Maximum Key Length: Specify the maximum length of the key to consider for the match.
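To see why generating a key for each word helps with reordered names, consider the sketch below. It uses the lowercased word as a stand-in for whatever key function is selected, and it assumes that two records become a potential match when they share at least one word key; both choices are simplifications for illustration, and the function names are hypothetical.

```python
def whole_field_key(value):
    """One key for the entire field (stand-in: the lowercased text)."""
    return value.lower()


def word_keys(value):
    """One key per word (stand-in: each lowercased word)."""
    return {word.lower() for word in value.split()}


def share_a_key(a, b):
    """True when the two values have at least one word key in common."""
    return bool(word_keys(a) & word_keys(b))


a, b = "john smith", "smith john"

# A single key for the whole field is sensitive to word order.
print(whole_field_key(a) == whole_field_key(b))  # False

# Per-word keys are not, so the pair can proceed to the Match function.
print(share_a_key(a, b))  # True
```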
The Match function is a more granular process by which a match is identified, and a score is applied. This differs from keys, which must match exactly. Choices are:
Match "Albert Commette" to "Albert Commette MD."
The Word Frequency Statistics table for "Name" includes the word "MD." When Word Frequency: Name is specified, the resulting match score is roughly 5 points higher than if Word Frequency: Name is not specified.
Word Frequency Statistics are stored in Alteryx Database (.yxdb) files, which are located in the Runtime Data directory:
\Program Files\Alteryx\bin\RuntimeData\FuzzyMatch\
You can also create your own Word Frequency Statistics by editing the workflow CollectStats.yxmd located in the same directory.
Add additional nicknames and abbreviations:
Match Threshold: Set the minimum match score, as a percentage, that a particular field must reach for a match to be returned.
If the threshold for field 1 is 60% and the field only matches with 55% confidence, the record will be thrown out.
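The threshold check in this example can be sketched as follows. The function name `passes_thresholds` and the field names are hypothetical, and treating a score exactly equal to the threshold as a pass is an assumption, since the text above does not cover that case.

```python
def passes_thresholds(field_scores, thresholds):
    """Return True only if every field's score reaches its threshold.

    Both arguments map a field name to a percentage (0-100).
    """
    return all(field_scores[field] >= thresholds[field]
               for field in thresholds)


thresholds = {"field 1": 60}

print(passes_thresholds({"field 1": 55}, thresholds))  # False: thrown out
print(passes_thresholds({"field 1": 72}, thresholds))  # True: kept
```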
Match Weight: Apply importance to the field, causing the field to be considered more or less strongly during a match.
If "Company Name" is twice as important as "Contact Name," set the Match Weight for Company Name to twice the value of the Match Weight for Contact Name. The weights are used when calculating the overall Match Score.
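The exact formula for the overall Match Score is not given here. As an illustration only, the sketch below assumes a weighted average of the per-field scores; the function name `overall_score` and the sample scores are made up for the example.

```python
def overall_score(field_scores, weights):
    """Weighted average of per-field scores (assumed formula, see above)."""
    total_weight = sum(weights[field] for field in field_scores)
    weighted_sum = sum(field_scores[field] * weights[field]
                       for field in field_scores)
    return weighted_sum / total_weight


scores = {"Company Name": 90.0, "Contact Name": 60.0}

# Company Name counts twice as much as Contact Name.
print(overall_score(scores, {"Company Name": 2.0, "Contact Name": 1.0}))  # 80.0

# With equal weights the result is the plain average.
print(overall_score(scores, {"Company Name": 1.0, "Contact Name": 1.0}))  # 75.0
```

Under this assumption only the ratio between weights matters: weights of 2 and 1 give the same result as 50 and 25.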
For additional information regarding Fuzzy Match use, see the Fuzzy Match FAQ.
©2018 Alteryx, Inc., all rights reserved. Allocate®, Alteryx®, Guzzler®, and Solocast® are registered trademarks of Alteryx, Inc.