SUBSTITUTE Function
Replaces found string literal or pattern or column with a string, column, or function returning strings.
Input can be specified as a column reference, a function returning a string, or a string literal, although string literal usage is rare.
A column reference can refer to a column of String type.
If no match is found, the function returns the source string.
If multiple matches are found in a single string, all replacements are made.
Wrangle vs. SQL: This function is part of Wrangle, a proprietary data transformation language. Wrangle is not SQL. For more information, see Wrangle Language.
Basic Usage
Column reference example:
substitute(myURL,`{ip-address}`,myDomain)
Output: Searches the myURL
column values for sub-strings that match valid IP addresses. Where matches are found, they are replaced with the corresponding value in the myDomain
column.
Function reference example:
substitute(upper(companyName),'ACME','New ACME')
Output: Searches the uppercase version of values from the companyName
column for the string literal ACME
. When found, these matches are replaced by New ACME
in the companyName
column.
Syntax and Arguments
substitute(string_source,string_pattern,string_replacement[,ignore_case, pattern_before, pattern_after])
Argument | Required? | Data Type | Description |
---|---|---|---|
string_source | Y | string | Name of the column, a function returning a string, or string literal to be applied to the function |
string_pattern | Y | string | String literal or pattern or a column or a function returning strings to find |
string_replacement | Y | string | String literal, column or function returning a string to use as replacement |
ignore_case | N | string | When |
pattern_before | N | string | String literal or pattern to find before finding the string_pattern value. |
pattern_after | N | string | String literal or pattern to find after finding the string_pattern value. |
For more information on syntax standards, see Language Documentation Syntax Notes.
string_source
Name of the item to be searched. Valid values can be:
String literals must be quoted (
'Hello, World'
).Column reference to any type that can be inferred as a string, which encompasses all values
Functions that return string values
Multiple values and wildcards are not supported.
Usage Notes:
Required? | Data Type | Example Value |
---|---|---|
Yes | String literal or column reference (String, Array, or Object) | myColumn |
string_pattern
String to find. This value can be a string literal, a Wrangle , a regular expression, a column, or a function returning a String value.
String literals must be quoted (
'Hello, World'
).Multiple values and wildcards are not supported.
Usage Notes:
Required? | Data Type | Example Value |
---|---|---|
Yes | String value or pattern or column reference (String) | 'Hello' |
string_replacement
Value with which to replacement any matched patterns. Value can be a string, a function returning string values, or a column reference containing strings.
String literals must be quoted (
'Hello, World'
).Column reference to any type that can be inferred as a string, which encompasses all values.
Multiple values and wildcards are not supported.
Usage Notes:
Required? | Data Type | Example Value |
---|---|---|
Yes | String literal, column reference (String, Array, or Object), or function returning String value | '##REDACTED##' |
ignore_case
When true
, matches are case-insensitive. Default is false
.
Nota
This argument is not required. By default, matches are case-sensitive.
Usage Notes:
Required? | Data Type | Example Value |
---|---|---|
No | String value | 'false' |
pattern_before
String literal or pattern to find in a position before the pattern to match.
Nota
This argument is not permitted when string_ pattern
or string_replacement
is of column data type.
Suggerimento
Use this argument if there are potentially multiple instances of the pattern to match in the source.
Usage Notes:
Required? | Data Type | Example Value |
---|---|---|
No | String literal or pattern | `{digit}{3}` |
pattern_after
String literal or pattern to find in a position after the pattern to match.
Nota
This argument is not permitted when string_ pattern
or string_replacement
is of column data type.
Suggerimento
Use this argument if there are potentially multiple instances of the pattern to match in the source.
Usage Notes:
Required? | Data Type | Example Value |
---|---|---|
No | String literal or pattern | ' ' |
Examples
Suggerimento
For additional examples, see Common Tasks.
Example - Partial obfuscation of credit card numbers
Source:
Suppose you have the following transactional data, which contains customer credit card numbers.
TransactionId | CreditCardNum | AmtDollars |
---|---|---|
T001 | 4111-1111-1111-1111 | 100.29 |
T002 | 5500-0000-0000-0004 | 510.21 |
T003 | 3400-0000-0000-009 | 162.13 |
T004 | 3000-0000-0000-04 | 294.12 |
For security purposes, you wish to redact the first three sets of digits, so only the last set of digits appears.
Transformation:
To make the substitution, you must first change the type of the column to be a string:
Transformation Name | |
---|---|
Parameter: Columns | CreditCardNum |
Parameter: New type | 'String' |
You can then use the following transformation to perform the pattern-based replacement of four-digit sets that end in a dash with XXXX
:
Transformation Name | |
---|---|
Parameter: Columns | CreditCardNum |
Parameter: Formula | substitute(CreditCardNum, `{digit}+\-`, 'XXXX-') |
To indicate that the column no longer contains valid information, you might choose to rename it like in the following:
Transformation Name | |
---|---|
Parameter: Option | Manual rename |
Parameter: Column | CreditCardNum |
Parameter: New column name | 'CreditCardNumOBSCURED' |
Results:
TransactionId | CreditCardNumOBSCURED | AmtDollars |
---|---|---|
T001 | XXXX-XXXX-XXXX-1111 | 100.29 |
T002 | XXXX-XXXX-XXXX-0004 | 510.21 |
T003 | XXXX-XXXX-XXXX-009 | 162.13 |
T004 | XXXX-XXXX-XXXX-04 | 294.12 |