Returns a two-element array of primary and secondary phonetic encodings for an input string, based on the Double Metaphone algorithm.
The Double Metaphone algorithm processes an input string to render a primary and secondary spelling for it. For English language words, the algorithm removes silent letters, normalizes combinations of characters to a single definition, and removes vowels, except from the beginnings of words. In this manner, the algorithm can normalize inconsistencies between spellings for better matching. For more information, see https://en.wikipedia.org/wiki/Metaphone.
Tip
This function is useful for performing fuzzy matching between string values, such as between potential join key values.
Source values can be string literals, column references, or expressions that evaluate to strings.
Wrangle vs. SQL: This function is part of Wrangle, a proprietary data transformation language. Wrangle is not SQL. For more information, see Wrangle Language.
String literal reference example:
doublemetaphone('My String')
Output: See below.
["MSTRNK","MSTRNK"]
Column reference example:
doublemetaphone(string1)
Output: Generates a new double_metaphone
column containing the evaluation of string1
column values through the Double Metaphone algorithm.
doublemetaphone(string_ref)
Argument | Required? | Data Type | Description |
---|---|---|---|
string_ref | Y | string | Name of column or string literal to apply to the function |
For more information on syntax standards, see Language Documentation Syntax Notes.
String literal, column reference, or expression whose elements you want to filter through the Double Metaphone algorithm.
Usage Notes:
Required? | Data Type | Example Value |
---|---|---|
Yes | String literal, column reference, or expression evaluating to a string | myString1 |
Tip
For additional examples, see Common Tasks.
This example illustrates how to use double metaphone functions to generate phonetic spellings in Dataprep by Trifacta.
Functions:
Item | Description |
---|---|
DOUBLEMETAPHONE Function | Returns a two-element array of primary and secondary phonetic encodings for an input string, based on the Double Metaphone algorithm. |
DOUBLEMETAPHONEEQUALS Function | Compares two input strings using the Double Metaphone algorithm. An optional threshold parameter can be modified to adjust the tolerance for matching. |
Source:
The following table contains some example strings to be compared.
string1 | string2 | notes |
---|---|---|
My String | my string | comparison is case-insensitive |
judge | juge | typo |
knock | nock | silent letters |
white | wite | missing letters |
record | record | two different words in English but match the same |
pair | pear | these match but are different words. |
bookkeeper | book keeper | spaces cause failures in comparison |
test1 | test123 | digits are not compared |
the end. | the end…. | punctuation differences do not matter. |
a elephant | an elephant | a and an are treated differently. |
Transformation:
You can use the DOUBLEMETAPHONE
function to generate phonetic spellings, as in the following:
Transformation Name |
|
---|---|
Parameter: Formula type | Single row formula |
Parameter: Formula | DOUBLEMETAPHONE(string1) |
Parameter: New column name | 'dblmeta_s1' |
You can compare string1
and string2
using the DOUBLEMETAPHONEEQUALS
function:
Transformation Name |
|
---|---|
Parameter: Formula type | Single row formula |
Parameter: Formula | DOUBLEMETAPHONEEQUALS(string1, string2, 'normal') |
Parameter: New column name | 'compare' |
Results:
The following table contains some example strings to be compared.
string1 | dblmeta_s1 | string2 | compare | Notes |
---|---|---|---|---|
My String | ["MSTRNK","MSTRNK"] | my string | TRUE | comparison is case-insensitive |
judge | ["JJ","AJ"] | juge | TRUE | typo |
knock | ["NK","NK"] | nock | TRUE | silent letters |
white | ["AT","AT"] | wite | TRUE | missing letters |
record | ["RKRT","RKRT"] | record | TRUE | two different words in English but match the same |
pair | ["PR","PR"] | pear | TRUE | these match but are different words. |
bookkeeper | ["PKPR","PKPR"] | book keeper | FALSE | spaces cause failures in comparison |
test1 | ["TST","TST"] | test123 | TRUE | digits are not compared |
the end. | ["0NT","TNT"] | the end…. | TRUE | punctuation differences do not matter. |
a elephant | ["ALFNT","ALFNT"] | an elephant | FALSE | a and an are treated differently. |