Supported Special Regular Expression Characters
Dataprep by Trifacta supports a set of special characters for regular expressions that are common to all of the execution engines supported by the platform.
Slashes
The forward slash character is used to denote the boundaries of the regular expression:
/this_is_my_regular_expression/
The backslash character (\) is the escaping character. It can be used to denote an escaped character, a string literal, or one of the set of supported special characters. Use a double backslash (\\) to denote an escaped string literal. For more information, see Escaping Strings in Transformations.
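As a rough illustration of the two roles of the backslash, the following sketch uses Python's standard `re` module as a stand-in for the platform's execution engines. That substitution is an assumption, and behavior may differ in detail. Python does not use slash delimiters, so the helper `strip_delimiters` is hypothetical and exists only to remove them.

```python
import re

def strip_delimiters(pattern: str) -> str:
    """Hypothetical helper: remove the /.../ delimiters used in this document."""
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        return pattern[1:-1]
    return pattern

# /\d+/ : the backslash introduces the special character \d (any digit).
digits = re.compile(strip_delimiters(r"/\d+/"))

# /\\/ : a double backslash matches one literal backslash character.
backslash = re.compile(strip_delimiters(r"/\\/"))

digit_match = digits.search("order 1234")
path_parts = backslash.split(r"C:\data\in.csv")

print(digit_match.group())
print(path_parts)
```

The raw-string prefix (`r"..."`) is a Python detail that keeps the language itself from interpreting the backslashes before the regular expression engine sees them.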
Supported Special RegEx Characters
The table below identifies the special characters that are supported in the platform.
Special Characters | Description |
---|---|
\\ | String literal match for the \ character. |
\b | Matches any zero-width word boundary, such as between a letter and a space. |
\B | Matches any zero-width non-word boundary, such as between two letters or two spaces. |
\cX | Matches a control character (Ctrl + X), where X is a letter from A to Z. |
\d | Matches any digit. |
\D | Matches any non-digit. |
\f | Matches a form feed. |
\n | Matches a line feed. Note: These characters are not supported in inputs for Object and Array data types. |
\r | Matches a carriage return. |
\s | Matches any whitespace character, such as a space, a tab, or a line feed. |
\S | Matches any character that is not one of the supported whitespace characters. |
\t | Matches a horizontal tab. Note: These characters are not supported in inputs for Object and Array data types. |
\v | Matches a vertical tab. |
\w | Matches any alphanumeric character, including the underscore. Tip: Column names must match the same set of characters. |
\W | Matches any character that is neither alphanumeric nor the underscore. |
\xHH | Matches the ASCII character code as expressed by the hexadecimal value HH. |
\uHHHH | Matches the Unicode character code as expressed by the hexadecimal value HHHH. |
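To make a few of the table entries concrete, here is a minimal sketch using Python's standard `re` module. Using Python in place of the platform's execution engines is an assumption made for illustration. The sample strings are invented, and the control-character entry is omitted because Python's `re` has no equivalent escape.

```python
import re

text = "cat concat cat_2"

# \b: "cat" must have a word boundary on both sides, so only the first word matches.
whole_word = re.findall(r"\bcat\b", text)

# \B: "cat" must NOT start at a word boundary, so only the one inside "concat" matches.
inner = re.search(r"\Bcat", text)

# \d matches single digits; \w+ matches runs of letters, digits, and underscores.
digits = re.findall(r"\d", "a1b22")
words = re.findall(r"\w+", "first_name, last-name")

# \W matches characters that are neither alphanumeric nor the underscore.
non_word = re.findall(r"\W", "a_b-c d")

# \xHH and \uHHHH match a character by its hexadecimal code.
hex_match = re.search(r"\x41", "ABC")
unicode_match = re.search(r"\u00e9", "caf\u00e9")

print(whole_word, inner.start(), digits, words, non_word)
print(hex_match.group(), unicode_match.group())
```

In Python, `\d` and `\w` also match non-ASCII digits and letters unless the `re.ASCII` flag is set. Whether the platform's engines behave the same way is not stated here, so ASCII-only input is used above.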
Required Escaped Characters
The following characters have special meaning within a regular expression.
. ^ $ * + - ? ( ) [ ] { } \ | — /
To match one of these characters literally, you must escape it with a backslash within the regular expression, as in:
/\./
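The difference between an escaped and an unescaped special character can be sketched as follows, again using Python's `re` module as an assumed stand-in for the platform's engines. The `re.escape` call is a Python convenience for escaping literal text and is not part of the platform.

```python
import re

# Unescaped, "." is a wildcard that matches any character except a newline.
wildcard_hits = re.findall(r".", "a.b")

# Escaped, "\." matches only the literal period, as in /\./.
literal_hits = re.findall(r"\.", "a.b")

# re.escape() inserts the backslashes needed to treat a whole string literally.
escaped = re.escape("price (USD)?")
literal_match = re.search(escaped, "What is the price (USD)?")

print(wildcard_hits)
print(literal_hits)
print(literal_match.group())
```

Escaping every special character in user-supplied text, rather than only the ones that look risky, avoids patterns that silently match more than intended.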