
RegEx Tool

One Tool Example

RegEx has a One Tool Example. Go to Sample Workflows to learn how to access this and many other examples directly in Alteryx Designer.

Use the RegEx (Regular Expression) tool to parse, match, replace, or tokenize data with regular expression syntax.

RegEx Support

While regular expressions are supported in Designer, users are responsible for their own expressions and how the expressions impact their data.

For help writing regular expressions, go to https://regex101.com/, a site unaffiliated with Alteryx, or use the RegEx Coach, an unaffiliated graphical Windows application for experimenting interactively with Perl-compatible regular expressions.

For additional information about Boost RegEx, refer to these resources:

Configure the Tool

  1. Select the Column to Parse.

  2. In Format to Convert...

    • Enter your Regular Expression. Use the "+" button to access common regular expressions that you may need while creating your expression. Find more information on the Perl Regular Expression Syntax website.

    • Case Insensitive: Selected by default. Searches do not distinguish between uppercase and lowercase letters.

  3. In Output, select the Output Method to use when parsing. Then configure the related Properties.

    • Replace: Replace the expression you searched for with a second expression.

      • Replacement Text: Enter the text that replaces each match. To reuse part of the matched text, reference a Marked Group (for example, $1 for the first group). Use the "+" button to access common regular expressions that you may need while creating your expression.

      • Copy Unmatched Text to Output

    • Tokenize: Split the incoming data using a regular expression. This option works similarly to the Text To Columns tool, except that instead of matching and removing what you do not want, you match what you want to keep. Write the expression to match the whole token; if it contains a marked group, only the text captured by that group is returned. Go to Tokenize Method Examples below.

      • Split to Columns: Split a single column of data into multiple columns, one column per token matched by the expression.

        • Number of Columns: Set how many columns are created.

        • Extra Columns: Select the behavior that is applied to extra columns.

          • Drop Extra with Warning: Data that extends past the split is dropped and a warning is generated indicating that there was excess information.

          • Drop Extra without Warning: Data that extends past the split is dropped and no warning is generated.

          • Error: Data that extends past the split causes an error and the workflow stops processing.

        • Output Root Name: Enter the name that the newly generated columns should be based on. The new columns are named as the root name with a serially increasing integer appended.

      • Split to Rows: Split a single column of data into multiple rows, one row per token matched by the expression. Include a key column in your records so you can track which original row each value came from.

    • Parse: Output each marked group in the expression as a new column, and set the Name, Type, and Size of the new columns. Each marked group adds a row to the Output Columns table, which has these columns:

      • Name: Select the column name to enter a new name.

      • Type: Use the dropdown to select the new data type.

      • Size: Select data size to enter a new size.

      • Expression: Populated automatically.

    • Match: Append a column containing a number: 1 if the expression matched, 0 if it did not.

      • Column name for match status: Provide a name for the appended column.

      • Error if not Matched: Select to generate an error, which stops workflow processing, when the expression does not match the string.
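To make the four output methods concrete, here is a minimal sketch that approximates them with Python's re module. It is not Designer's implementation: Designer uses the Boost regex library, whose Perl-compatible syntax is close to, but not identical to, Python's. The helper names (regex_replace, regex_tokenize, split_to_columns, regex_parse, regex_match) are hypothetical. Three further points are assumptions made for illustration: missing columns are filled with None, Parse searches for the first match, and Match requires the whole field to match. Case-insensitive matching is on by default to mirror the tool's default setting.

```python
import re
import warnings
from typing import List, Optional, Tuple

# Hypothetical helpers that approximate the RegEx tool's output methods with
# Python's re module. Designer uses Boost regex, so edge cases may differ.


def _flags(case_insensitive: bool) -> int:
    # Case Insensitive is selected by default in the tool, so default to True.
    return re.IGNORECASE if case_insensitive else 0


def regex_replace(value: str, pattern: str, replacement: str,
                  case_insensitive: bool = True) -> str:
    # Replace every match; Python's re.sub refers to marked groups as \1, \2, ...
    return re.sub(pattern, replacement, value, flags=_flags(case_insensitive))


def regex_tokenize(value: str, pattern: str,
                   case_insensitive: bool = True) -> List[str]:
    # Every match is a token; with one marked group, only that group is kept.
    tokens = []
    for m in re.finditer(pattern, value, flags=_flags(case_insensitive)):
        tokens.append(m.group(1) if m.re.groups else m.group(0))
    return tokens


def split_to_columns(tokens: List[str], num_columns: int,
                     extra: str = "warn") -> List[Optional[str]]:
    # Mirrors Number of Columns and Extra Columns ("warn", "drop", "error").
    # Assumption: columns with no token are filled with None.
    if len(tokens) > num_columns:
        if extra == "error":
            raise ValueError("more tokens than columns")
        if extra == "warn":
            warnings.warn("extra tokens were dropped")
    kept: List[Optional[str]] = list(tokens[:num_columns])
    return kept + [None] * (num_columns - len(kept))


def regex_parse(value: str, pattern: str,
                case_insensitive: bool = True) -> Tuple[Optional[str], ...]:
    # Each marked group becomes one output column; no match gives all None.
    compiled = re.compile(pattern, _flags(case_insensitive))
    m = compiled.search(value)
    return m.groups() if m else (None,) * compiled.groups


def regex_match(value: str, pattern: str, case_insensitive: bool = True,
                error_if_not_matched: bool = False) -> int:
    # Assumption: the whole field must match, hence re.fullmatch.
    matched = re.fullmatch(pattern, value,
                           flags=_flags(case_insensitive)) is not None
    if not matched and error_if_not_matched:
        raise ValueError(f"{value!r} does not match {pattern!r}")
    return 1 if matched else 0


if __name__ == "__main__":
    print(regex_replace("2024-01-31", r"(\d+)-(\d+)-(\d+)", r"\3/\2/\1"))  # 31/01/2024
    print(split_to_columns(regex_tokenize("123456789", "..."), 2, extra="drop"))  # ['123', '456']
    print(regex_parse("John Smith", r"(\w+)\s+(\w+)"))  # ('John', 'Smith')
    print(regex_match("ABC123", r"[a-z]+\d+"))  # 1
```

Keeping each method as a separate function mirrors the tool's Output Method choice: Replace and Match return one value per record, while Tokenize and Parse return several values that would become new columns or rows.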

Tokenize Method Examples

These use cases rely on the RegEx tool's Tokenize method.

  • Parse a 9-character string 123456789 into 3 fields. The regular expression is ... (three periods), which returns 123, 456, and 789.

  • Parse a 9-character string into 3 fields, returning only the second character of each field. For 123456789, the regular expression .(.). returns 2, 5, and 8.

  • Parse a field with the delimiter Ctrl-A. The regular expression for tokenizing a Ctrl-A delimited string is [^\cA]+.

    • [^...] The brackets specify a match to a single character in a set of characters. Starting the set with ^ changes it to match any character not in the set.

    • \cA Matches the Ctrl-A character.

    • + Matches 1 or more of the preceding item.

  • Allow blank tokens to preserve empty entries, as in abc,,def. The regular expression is ([^,]*)(?:,|$).

    • (...) Parentheses create a marked group. The Tokenize method lets you match a larger part of the input field yet return only the marked subset, which keeps the delimiter out of the result. You may have only 1 marked expression.

    • [^,] Starting the set with ^ changes it to match any character not in the set, in this case a ,.

    • * Matches 0 or more of the preceding item, which allows an empty token. The expression cannot end here: a pattern that can match zero characters matches an empty string at every position, so the match must be terminated by something, here the delimiter or the end of the string.

    • (?:...) An unmarked (non-capturing) group. It groups the alternatives separated by | without adding a second marked group.

    • | Matches either the expression before it or the expression after it. It almost always needs to be used inside a marked or unmarked group.

    • $ Matches the end of the string. Hence (?:,|$) matches either a , or the end of the string.

  • Parse HTML links from a home page. The regular expression is <a .*?>.*?</a>. This pulls every link out of a large HTML document into a series of records.

    • <a This is a literal match for the text <a.

    • .*? The . matches any character and * means 0 or more. The ? modifies the * so that it matches the shortest possible text. Without it, the expression might find a single token running from the beginning of the first link to the end of the last.

    • > This is a literal match for the text >.

    • .*? The shortest possible match of any characters that still satisfies the entire regex.

    • </a> This is a literal match for the text </a>. This ends the match.
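The five Tokenize examples can be tried outside Designer with Python's re module. This is a minimal sketch, not Designer's engine: tokenize is a hypothetical stand-in for the Tokenize method that returns every match, or only the marked group when the pattern has one. Two adaptations are needed for Python. First, Python's re has no \cA escape, so the Ctrl-A example uses the equivalent \x01. Second, Python's . does not match a newline unless re.DOTALL is set, so the HTML example sets it to catch a link split across lines. re.IGNORECASE is on by default to mirror the tool's Case Insensitive default.

```python
import re
from typing import List


def tokenize(value: str, pattern: str, flags: int = re.IGNORECASE) -> List[str]:
    # Hypothetical stand-in for the Tokenize method: every match is a token,
    # and a single marked group, if present, narrows what is returned.
    out = []
    for m in re.finditer(pattern, value, flags):
        out.append(m.group(1) if m.re.groups else m.group(0))
    return out


# 1. Three fixed-width fields.
fields = tokenize("123456789", r"...")            # ['123', '456', '789']

# 2. Only the second character of each field.
seconds = tokenize("123456789", r".(.).")         # ['2', '5', '8']

# 3. Ctrl-A delimited text; \x01 replaces \cA, which Python's re lacks.
ctrl_a = tokenize("abc\x01def\x01ghi", r"[^\x01]+")  # ['abc', 'def', 'ghi']

# 4. Blank tokens preserved between consecutive commas.
blanks = tokenize("abc,,def", r"([^,]*)(?:,|$)")

# 5. HTML links, including an upper-case tag that spans a line break.
html = '<p><a href="/a">A</a> text <A HREF="/b">\nB</A></p>'
links = tokenize(html, r"<a .*?>.*?</a>", re.IGNORECASE | re.DOTALL)
```

In example 4, the first three tokens are abc, an empty string, and def. Python 3.7 and later also report a zero-length match at the very end of the input, so blanks ends with one more empty string; this section does not say whether Designer behaves the same way.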