SUBSTITUTE Function
Replaces found string literal or pattern or column with a string, column, or function returning strings.
Input can be specified as a column reference, a function returning a string, or a string literal, although string literal usage is rare.
A column reference can refer to a column of String type.
If no match is found, the function returns the source string.
If multiple matches are found in a single string, all replacements are made.
Wrangle vs. SQL: This function is part of Wrangle, a proprietary data transformation language. Wrangle is not SQL. For more information, see Wrangle Language.
Basic Usage
Column reference example:
substitute(myURL,`{ip-address}`,myDomain)
Output: Searches the myURL column values for sub-strings that match valid IP addresses. Where matches are found, they are replaced with the corresponding value in the myDomain column.
Function reference example:
substitute(upper(companyName),'ACME','New ACME')
Output: Searches the uppercase version of values from the companyName column for the string literal ACME. When found, these matches are replaced by New ACME in the companyName column.
Syntax and Arguments
substitute(string_source,string_pattern,string_replacement[,ignore_case, pattern_before, pattern_after])
Argument | Required? | Data Type | Description |
|---|---|---|---|
string_source | Y | string | Name of the column, a function returning a string, or string literal to be applied to the function |
string_pattern | Y | string | String literal or pattern or a column or a function returning strings to find |
string_replacement | Y | string | String literal, column or function returning a string to use as replacement |
ignore_case | N | string | When |
pattern_before | N | string | String literal or pattern to find before finding the string_pattern value. |
pattern_after | N | string | String literal or pattern to find after finding the string_pattern value. |
For more information on syntax standards, see Language Documentation Syntax Notes.
string_source
Name of the item to be searched. Valid values can be:
String literals must be quoted (
'Hello, World').Column reference to any type that can be inferred as a string, which encompasses all values
Functions that return string values
Multiple values and wildcards are not supported.
Usage Notes:
Required? | Data Type | Example Value |
|---|---|---|
Yes | String literal or column reference (String, Array, or Object) | myColumn |
string_pattern
String to find. This value can be a string literal, a Wrangle , a regular expression, a column, or a function returning a String value.
String literals must be quoted (
'Hello, World').Multiple values and wildcards are not supported.
Usage Notes:
Required? | Data Type | Example Value |
|---|---|---|
Yes | String value or pattern or column reference (String) | 'Hello' |
string_replacement
Value with which to replacement any matched patterns. Value can be a string, a function returning string values, or a column reference containing strings.
String literals must be quoted (
'Hello, World').Column reference to any type that can be inferred as a string, which encompasses all values.
Multiple values and wildcards are not supported.
Usage Notes:
Required? | Data Type | Example Value |
|---|---|---|
Yes | String literal, column reference (String, Array, or Object), or function returning String value | '##REDACTED##' |
ignore_case
When true, matches are case-insensitive. Default is false.
Note
This argument is not required. By default, matches are case-sensitive.
Usage Notes:
Required? | Data Type | Example Value |
|---|---|---|
No | String value | 'false' |
pattern_before
String literal or pattern to find in a position before the pattern to match.
Note
This argument is not permitted when string_ pattern or string_replacement is of column data type.
Tip
Use this argument if there are potentially multiple instances of the pattern to match in the source.
Usage Notes:
Required? | Data Type | Example Value |
|---|---|---|
No | String literal or pattern | `{digit}{3}` |
pattern_after
String literal or pattern to find in a position after the pattern to match.
Note
This argument is not permitted when string_ pattern or string_replacement is of column data type.
Tip
Use this argument if there are potentially multiple instances of the pattern to match in the source.
Usage Notes:
Required? | Data Type | Example Value |
|---|---|---|
No | String literal or pattern | ' ' |
Examples
Tip
For additional examples, see Common Tasks.
Example - Partial obfuscation of credit card numbers
Source:
Suppose you have the following transactional data, which contains customer credit card numbers.
TransactionId | CreditCardNum | AmtDollars |
|---|---|---|
T001 | 4111-1111-1111-1111 | 100.29 |
T002 | 5500-0000-0000-0004 | 510.21 |
T003 | 3400-0000-0000-009 | 162.13 |
T004 | 3000-0000-0000-04 | 294.12 |
For security purposes, you wish to redact the first three sets of digits, so only the last set of digits appears.
Transformation:
To make the substitution, you must first change the type of the column to be a string:
Transformation Name |
|
|---|---|
Parameter: Columns | CreditCardNum |
Parameter: New type | 'String' |
You can then use the following transformation to perform the pattern-based replacement of four-digit sets that end in a dash with XXXX:
Transformation Name |
|
|---|---|
Parameter: Columns | CreditCardNum |
Parameter: Formula | substitute(CreditCardNum, `{digit}+\-`, 'XXXX-') |
To indicate that the column no longer contains valid information, you might choose to rename it like in the following:
Transformation Name |
|
|---|---|
Parameter: Option | Manual rename |
Parameter: Column | CreditCardNum |
Parameter: New column name | 'CreditCardNumOBSCURED' |
Results:
TransactionId | CreditCardNumOBSCURED | AmtDollars |
|---|---|---|
T001 | XXXX-XXXX-XXXX-1111 | 100.29 |
T002 | XXXX-XXXX-XXXX-0004 | 510.21 |
T003 | XXXX-XXXX-XXXX-009 | 162.13 |
T004 | XXXX-XXXX-XXXX-04 | 294.12 |