
Supported Special Regular Expression Characters

Dataprep by Trifacta supports a set of special characters for regular expressions that are common to all of the execution engines supported by the platform.

Slashes

The forward slash character is used to denote the boundaries of the regular expression:

/this_is_my_regular_expression/
  • The backslash character (\) is the escaping character. It can be used to denote an escaped character, a string literal, or one of the supported special characters.

  • Use a double backslash (\\) to denote an escaped string literal. For more information, see Escaping Strings in Transformations.
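The delimiter and escaping rules above can be checked quickly in TypeScript. This sketch uses JavaScript's built-in RegExp as a stand-in for the platform's engines, which is an assumption: JavaScript shares the slash-delimited syntax, but edge cases may differ between engines. The variable names are illustrative only.

```typescript
// Sketch using JavaScript's RegExp as a stand-in for the platform's engines.
// Inside the slash delimiters, an unescaped / would end the expression,
// so a literal forward slash is written as \/.
const andOr: RegExp = /and\/or/;
// \\ is an escaped backslash: it matches one literal backslash character.
const backslashN: RegExp = /\\n/;

console.log(andOr.test("and/or"));           // true
console.log(backslashN.test("C:\\new"));     // true: the string is C:\new (backslash then n)
console.log(backslashN.test("line\nbreak")); // false: a real line feed, not a backslash
```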

Supported Special RegEx Characters

The table below identifies the special characters that are supported in the platform.

Special Characters

Description

\\

String literal match for the \ character.

\b

Matches any zero-width word boundary, such as between a letter and a space.

Example: /\bre/ does not match re in tire, since re is not on a word boundary. /re\b/ does match.
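A short TypeScript check of \b, assuming JavaScript's RegExp behaves like the platform's engines for these simple cases. It shows a common use of \b, matching a whole word, and then repeats the tire example from the table.

```typescript
// Sketch using JavaScript's RegExp as a stand-in for the platform's engines.
// \b on both sides restricts the match to "cat" as a whole word.
const wholeWord: RegExp = /\bcat\b/;
console.log(wholeWord.test("the cat sat")); // true
console.log(wholeWord.test("concatenate")); // false: "cat" sits inside a word

// The example from the table: "re" is at the end of "tire", not the start.
console.log(/\bre/.test("tire")); // false
console.log(/re\b/.test("tire")); // true
```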

\B

Matches any zero-width non-word boundary, such as between two letters or two spaces.

Example: /\Bre/ matches re in tire. It does not match re in respect, since that instance of re is on a word boundary.
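The same kind of TypeScript check for \B, again treating JavaScript's RegExp as an approximation of the platform's behavior. \B is the mirror image of \b: it matches only where a word boundary is absent.

```typescript
// Sketch using JavaScript's RegExp as a stand-in for the platform's engines.
// \B on both sides requires "cat" to be embedded inside a longer word.
const insideWord: RegExp = /\Bcat\B/;
console.log(insideWord.test("concatenate")); // true
console.log(insideWord.test("the cat sat")); // false

// The example from the table.
console.log(/\Bre/.test("tire"));    // true
console.log(/\Bre/.test("respect")); // false: "re" starts the word
```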

\cX

Matches a control character (CTRL + A-Z), where X is the corresponding letter in the alphabet.
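CTRL+A is character code 1, CTRL+B is 2, and so on. This TypeScript sketch relies on that numbering to show \cJ and \cI matching familiar control characters. It uses JavaScript's RegExp, which supports \cX; treat it as an illustration rather than a statement about every engine the platform uses.

```typescript
// Sketch using JavaScript's RegExp as a stand-in for the platform's engines.
// J is the 10th letter, so \cJ is code 10: the line feed.
const ctrlJ: RegExp = /\cJ/;
// I is the 9th letter, so \cI is code 9: the horizontal tab.
const ctrlI: RegExp = /\cI/;

console.log(ctrlJ.test("line1\nline2")); // true
console.log(ctrlI.test("a\tb"));         // true
console.log(ctrlJ.test("no breaks"));    // false
```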

\d

Matches any digit.

\D

Matches any non-digit.
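\d and \D are often used together to clean numeric text. This TypeScript sketch uses JavaScript's RegExp as a stand-in for the platform's engines, and the phone-number string is made-up sample data. It strips every non-digit with \D and extracts digit runs with \d+.

```typescript
// Sketch using JavaScript's RegExp as a stand-in for the platform's engines.
const phone: string = "(555) 123-4567"; // made-up sample value

// \D matches every non-digit; removing them all leaves only the digits.
const digitsOnly: string = phone.replace(/\D/g, ""); // "5551234567"

// \d+ matches each run of one or more consecutive digits.
const groups: string[] = phone.match(/\d+/g) ?? []; // ["555", "123", "4567"]

console.log(digitsOnly, groups.length); // 5551234567 3
```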

\f

Matches a form feed.

\n

Matches a line feed.

Note

These characters are not supported in inputs for Object and Array data types.

\r

Matches a carriage return.
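Text can mix Windows (carriage return + line feed), Unix (line feed), and old Mac (carriage return) line endings. \r and \n can split all three. This TypeScript sketch uses JavaScript's RegExp as a stand-in for the platform's engines. The order of the alternatives is a design choice: trying \r\n first keeps a Windows line ending from being split twice.

```typescript
// Sketch using JavaScript's RegExp as a stand-in for the platform's engines.
const text: string = "first\r\nsecond\nthird\rfourth";

// Try \r\n before the single characters so a carriage return + line feed
// pair counts as one break instead of producing an empty line.
const lines: string[] = text.split(/\r\n|\r|\n/);
// lines is ["first", "second", "third", "fourth"]

console.log(lines.length); // 4
```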

\s

Matches any whitespace character. These characters include:

  • space

  • tab

  • form feed

  • line feed

  • other Unicode space characters

\S

Matches any character that is not one of the supported whitespace characters.
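\s and \S complement each other. \S+ picks out tokens, and \s+ finds runs of whitespace to normalize. This TypeScript sketch uses JavaScript's RegExp as a stand-in for the platform's engines; the sample string is made up. JavaScript's \s set may not be identical to every engine's, so treat the exact whitespace coverage as an assumption.

```typescript
// Sketch using JavaScript's RegExp as a stand-in for the platform's engines.
const messy: string = "  alpha\t beta\n\ngamma  ";

// \S+ matches each run of non-whitespace characters, i.e. each token.
const tokens: string[] = messy.match(/\S+/g) ?? []; // ["alpha", "beta", "gamma"]

// \s+ matches each run of whitespace (spaces, tabs, line feeds, ...);
// after trimming the ends, collapse every run to a single space.
const collapsed: string = messy.trim().replace(/\s+/g, " "); // "alpha beta gamma"

console.log(tokens.length, collapsed); // 3 alpha beta gamma
```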

\t

Matches a horizontal tab.

Note

These characters are not supported in inputs for Object and Array data types.

\v

Matches a vertical tab.
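Each single-character escape (\f, \n, \r, \t, \v) corresponds to one fixed character code. This TypeScript sketch checks each pattern against the character built from its code. It uses JavaScript's RegExp as a stand-in for the platform's engines, and the list of codes is standard ASCII.

```typescript
// Sketch using JavaScript's RegExp as a stand-in for the platform's engines.
// Each entry: the escape as written, the pattern, and the ASCII code it matches.
const escapes: Array<[string, RegExp, number]> = [
  ["\\f", /\f/, 0x0c], // form feed
  ["\\n", /\n/, 0x0a], // line feed
  ["\\r", /\r/, 0x0d], // carriage return
  ["\\t", /\t/, 0x09], // horizontal tab
  ["\\v", /\v/, 0x0b], // vertical tab
];

for (const [name, pattern, code] of escapes) {
  const ch: string = String.fromCharCode(code);
  console.log(name, pattern.test(ch)); // prints the escape name followed by true
}
```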

\w

Matches any alphanumeric character, including the underscore.

Tip

Column names must match the same set of characters.

\W

Matches any character that is not alphanumeric or an underscore.
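The underscore is the detail that separates \w from \W: it counts as a word character, so \W never matches it. This TypeScript sketch uses JavaScript's RegExp as a stand-in for the platform's engines. In JavaScript, \w covers ASCII letters, digits, and the underscore; other engines may also include Unicode letters. The sample label is made up.

```typescript
// Sketch using JavaScript's RegExp as a stand-in for the platform's engines.
const label: string = "order_id-2024 (draft)";

// \w+ matches runs of letters, digits, and underscores.
const words: string[] = label.match(/\w+/g) ?? []; // ["order_id", "2024", "draft"]

// \W+ matches runs of everything else; replace each run with "_".
const safe: string = label.replace(/\W+/g, "_"); // "order_id_2024_draft_"

console.log(/\W/.test("_")); // false: the underscore is a word character
console.log(words.length, safe); // 3 order_id_2024_draft_
```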

\xHH

Matches the ASCII character whose code is the hexadecimal value HH.

\uHHHH

Matches the Unicode character whose code is the hexadecimal value HHHH.
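Hexadecimal escapes let a pattern name a character by its code instead of typing it. This TypeScript sketch uses JavaScript's RegExp as a stand-in for the platform's engines. The example code points (A = 0x41, π = U+03C0, € = U+20AC) are standard character assignments.

```typescript
// Sketch using JavaScript's RegExp as a stand-in for the platform's engines.
// \x41 is hexadecimal 41 (decimal 65), the code for "A".
const hexA: RegExp = /\x41/;
// \u03c0 is GREEK SMALL LETTER PI; \u20ac is the EURO SIGN.
const piOrEuro: RegExp = /\u03c0|\u20ac/;

console.log(hexA.test("ABC"));      // true
console.log(hexA.test("abc"));      // false: "a" is \x61, not \x41
console.log(piOrEuro.test("2π r")); // true
console.log(piOrEuro.test("€25"));  // true
```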

Required Escaped Characters

The following characters have special meaning within a regular expression.

. ^ $ * + - ? ( ) [ ] { } \ | /

To reference the literal character, you must escape it within the regular expression, as in:

/\./
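When a whole string must be matched literally, escaping each listed character by hand is error-prone. The TypeScript sketch below defines a hypothetical helper, `escapeRegex`, that escapes all of them at once. It uses JavaScript's RegExp as a stand-in for the platform's engines, and it assumes the resulting pattern is compiled without the `u` flag, because JavaScript rejects `\-` outside a character class in that mode.

```typescript
// Sketch using JavaScript's RegExp as a stand-in for the platform's engines.
// Hypothetical helper: prefix every special character from the list above
// with a backslash. "$&" in the replacement inserts the matched character.
function escapeRegex(literal: string): string {
  return literal.replace(/[.^$*+\-?()\[\]{}\\|\/]/g, "\\$&");
}

const price: string = "$9.99 (USD)";
const escaped: string = escapeRegex(price); // \$9\.99 \(USD\)
const pricePattern: RegExp = new RegExp(escaped); // no "u" flag, see lead-in

console.log(pricePattern.test("Total: $9.99 (USD)")); // true
// Without escaping, "." matches any character:
console.log(/9.99/.test("9x99"));  // true
console.log(/9\.99/.test("9x99")); // false
```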