EXAMPLE - Double Metaphone Functions
This example illustrates how to use the Double Metaphone functions to generate phonetic spellings and compare strings phonetically in Dataprep by Trifacta.
Functions:
Item | Description |
---|---|
DOUBLEMETAPHONE Function | Returns a two-element array of primary and secondary phonetic encodings for an input string, based on the Double Metaphone algorithm. |
DOUBLEMETAPHONEEQUALS Function | Compares two input strings using the Double Metaphone algorithm. An optional threshold parameter can be modified to adjust the tolerance for matching. |
Source:
The following table contains some example strings to be compared.
string1 | string2 | notes |
---|---|---|
My String | my string | comparison is case-insensitive |
judge | juge | typo |
knock | nock | silent letters |
white | wite | missing letters |
record | record | same spelling, two different English words (noun and verb); the encodings match |
pair | pear | these match but are different words. |
bookkeeper | book keeper | spaces cause failures in comparison |
test1 | test123 | digits are not compared |
the end. | the end…. | punctuation differences do not matter. |
a elephant | an elephant | a and an are treated differently. |
Transformation:
You can use the DOUBLEMETAPHONE function to generate phonetic spellings, as in the following:
Transformation Name | |
---|---|
Parameter: Formula type | Single row formula |
Parameter: Formula | DOUBLEMETAPHONE(string1) |
Parameter: New column name | 'dblmeta_s1' |
You can compare string1 and string2 using the DOUBLEMETAPHONEEQUALS function:
Transformation Name | |
---|---|
Parameter: Formula type | Single row formula |
Parameter: Formula | DOUBLEMETAPHONEEQUALS(string1, string2, 'normal') |
Parameter: New column name | 'compare' |
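The comparison step can be pictured as matching the two-element encodings of each string under a tolerance setting. The following Python sketch is illustrative only: the helper name encodings_match, the threshold values 'strong' and 'weak', and the exact rule applied at each level are assumptions made for this sketch and are not confirmed by this page. Only the 'normal' value appears in the formula above.

```python
from typing import Tuple

# A (primary, secondary) pair, like the arrays DOUBLEMETAPHONE returns.
Encoding = Tuple[str, str]


def encodings_match(a: Encoding, b: Encoding, threshold: str = "normal") -> bool:
    """Compare two encoding pairs under an ASSUMED threshold rule.

    Hypothetical helper: these rules are a plausible reading of
    "tolerance for matching", not the product's documented behavior.
    """
    if threshold == "strong":
        # Assumed: the primary encodings must be identical.
        return a[0] == b[0]
    if threshold == "normal":
        # Assumed: the primary of one string matches either encoding of the other.
        return a[0] in b or b[0] in a
    if threshold == "weak":
        # Assumed: any encoding of one string matches any encoding of the other.
        return bool(set(a) & set(b))
    raise ValueError(f"unknown threshold: {threshold!r}")


judge = ("JJ", "AJ")        # encoding pair listed for "judge" in the Results table
made_up = ("AJ", "AJ")      # hypothetical pair, for illustration only

for level in ("strong", "normal", "weak"):
    print(level, encodings_match(judge, made_up, level))
```

Under these assumed rules, a stricter setting accepts fewer pairs: the two pairs above differ in their primary encoding, so they would fail a 'strong' comparison but pass a 'normal' or 'weak' one.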
Results:
The following table shows the phonetic encodings generated for string1 and the result of comparing string1 to string2.
string1 | dblmeta_s1 | string2 | compare | Notes |
---|---|---|---|---|
My String | ["MSTRNK","MSTRNK"] | my string | TRUE | comparison is case-insensitive |
judge | ["JJ","AJ"] | juge | TRUE | typo |
knock | ["NK","NK"] | nock | TRUE | silent letters |
white | ["AT","AT"] | wite | TRUE | missing letters |
record | ["RKRT","RKRT"] | record | TRUE | same spelling, two different English words (noun and verb); the encodings match |
pair | ["PR","PR"] | pear | TRUE | these match but are different words. |
bookkeeper | ["PKPR","PKPR"] | book keeper | FALSE | spaces cause failures in comparison |
test1 | ["TST","TST"] | test123 | TRUE | digits are not compared |
the end. | ["0NT","TNT"] | the end…. | TRUE | punctuation differences do not matter. |
a elephant | ["ALFNT","ALFNT"] | an elephant | FALSE | a and an are treated differently. |
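If the dblmeta_s1 column is exported and processed outside the application, each cell can be split into its primary and secondary encodings. The sketch below assumes the exported cell is the JSON-style array text shown in the table; the helper name split_encodings is hypothetical.

```python
import json
from typing import Tuple


def split_encodings(cell: str) -> Tuple[str, str]:
    """Split a two-element array string into (primary, secondary).

    Assumes the cell holds JSON array text such as '["JJ","AJ"]'.
    """
    values = json.loads(cell)
    if len(values) != 2:
        raise ValueError(f"expected two encodings, got {len(values)}")
    primary, secondary = values
    return primary, secondary


# Cell values copied from the Results table above.
for cell in ('["JJ","AJ"]', '["0NT","TNT"]', '["PR","PR"]'):
    primary, secondary = split_encodings(cell)
    print(f"primary={primary} secondary={secondary} differ={primary != secondary}")
```

In this table, only judge and the end. have a secondary encoding that differs from the primary.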