Countpattern Transform
Nota
Transforms are a part of the underlying language, which is not directly accessible to users. This content is maintained for reference purposes only. For more information on the user-accessible equivalent to transforms, see Transformation Reference.
Counts the number of instances of a specified pattern in a column and writes that value into a newly generated column. Source column is unchanged.
Basic Usage
countpattern col: myCol on: 'honda'
Output: Generates a new column containing the number of instances of the string honda
that appear in each row of the column, myCol
.
Syntax and Parameters
countpattern col:column_ref [ignoreCase:true|false] [after:start_point | from: start_point] [before:end_point | to:end_point] [on:'exact_match']
Token | Required? | Data Type | Description |
---|---|---|---|
countpattern | Y | transform | Name of the transform |
col | Y | string | Source column name |
ignoreCase | N | boolean | If |
Matching parameters:
Nota
At least one of the following parameters must be included to specify the pattern to count: after
, before
, from
, on
, to
.
Token | Required? | Data Type | Description |
---|---|---|---|
after | N | string | String literal or pattern that precedes the pattern to match |
before | N | string | String literal or pattern that appears after the pattern to match |
from | N | string | String literal or pattern that identifies the start of the pattern to match |
on | N | string | String literal or pattern that identifies the pattern to match. |
to | N | string | String literal or pattern that identifies the end of the pattern to match |
For more information on syntax standards, see Language Documentation Syntax Notes.
col
Identifies the column to which to apply the transform. You can specify only one column.
countpattern col: MyCol on: 'MyString'
Output: Counts the number of instances of the value MyString
in the MyCol
column and writes this value to a new column.
Usage Notes:
Required? | Data Type |
---|---|
Yes | String (column name) |
after
countpattern col: MyCol after: 'entry:'
Output: Counts 1
if there is anything that appears in the MyCol
column value after the string entry:
. If the value entry:
does not appear in the column, the output value is 0
.
A pattern identifier that precedes the value or pattern to match. Define the after
parameter value using string literals, regular expressions, or Wrangle .
Usage Notes:
Required? | Data Type |
---|---|
No | String (string literal or pattern) |
The
after
andfrom
parameters are very similar.from
includes the matching value as part of the extracted string.after
can be used with eitherto
,on
, orbefore
. See Pattern Clause Position Matching
before
A pattern identifier that occurs after the value or pattern to match. Define the pattern using string literals, regular expressions, or Wrangle .
countpattern col: MyCol before: '|'
Output:
Counts
1
if there is a value that appears before the pipe character (|
) in theMyCol
column, and no other pattern parameter is specified. If thebefore
value does not appear in the column, the output value is0
.If another pattern parameter such as
after
is specified, the total count of instances is written to the new column.
Usage Notes:
Required? | Data Type |
---|---|
No | String or pattern |
The
before
andto
parameters are very similar.to
includes the matching value as part of the extracted string.before
can be used with eitherfrom,
on
, orafter
. See Pattern Clause Position Matching.
from
Identifies the pattern that marks the beginning of the value to match. It can be a string literal, Wrangle , or regular expression. The from
value is included in the match.
countpattern col: MyCol from: 'go:'
Output:
Counts
1
if contents fromMyCol
that occur fromgo:
, to the end of the cell when no other pattern parameter is specified. Ifgo:
does not appear in the column, the output value is blank.If another pattern parameter such as
to
is specified, the total count of instances is written to the new column.
Usage Notes:
Required? | Data Type |
---|---|
No | String or pattern |
The
after
andfrom
parameters are very similar.from
includes the matching value as part of the extracted string.from
can be used with eitherto
orbefore
. See Pattern Clause Position Matching.
on
Identifies the pattern to match, which can be a string literal, Wrangle , or regular expression.
countpattern col: MyCol on: `###ERROR`
Suggerimento
You can insert the Unicode equivalent character for this parameter value using a regular expression of the form /\uHHHH/
. For example, /\u0013/
represents Unicode character 0013
(carriage return). For more information, see Supported Special Regular Expression Characters.
Usage Notes:
Required? | Data Type |
---|---|
No | String (literal, regular expression, or Wrangle |
to
Identifies the pattern that marks the ending of the value to match. Pattern can be a string literal, Wrangle , or regular expression. The to
value is included in the match.
countpattern col:MyCol from:'note:' to: `/`
Output:
Counts instances from
MyCol
column of all values that begin withnote:
up to a backslash character.If a second pattern parameter is not specified, then this value is either
0
or1
.
Usage Notes:
Required? | Data Type |
---|---|
No | String or pattern |
The
before
andto
parameters are very similar.to
includes the matching value as part of the extracted string.to
can be used with eitherfrom
orafter
. See Pattern Clause Position Matching.
ignoreCase
Indicates whether the match should ignore case or not.
Set to
true
to ignore case matching.(Default) Set to
false
to perform case-sensitive matching.
countpattern col: MyCol on: 'My String' ignoreCase: true
Output: Counts the instances of the following values if they appear in the MyCol
column: My String
, my string
, My string
, etc.
Usage Notes:
Required? | Data Type |
---|---|
No | Boolean |
Examples
Suggerimento
For additional examples, see Common Tasks.
Example - counting patterns in tweets
This example demonstrates how to count the number of occurrences of text patterns in a column.
Functions:
Item | Description |
---|---|
IF Function | The |
Source:
The dataset below contains fictitious tweet information shortly after the release of an application called, "Myco ExampleApp".
Date | twitterId | isEmployee | tweet |
---|---|---|---|
11/5/15 | lawrencetlu38141 | FALSE | Just downloaded Myco ExampleApp! Transforming data in 5 mins! |
11/5/15 | petramktng024 | TRUE | Try Myco ExampleApp, our new free data wrangling app! See www.example.com. |
11/5/15 | joetri221 | TRUE | Proud to announce the release of Myco ExampleApp, the free version of our enterprise product. Check it out at www.example.com. |
11/5/15 | datadaemon994 | FALSE | Great start with Myco ExampleApp. Super easy to use, and actually fun. |
11/5/15 | 99redballoons99 | FALSE | Liking this new ExampleApp! Good job, guys! |
11/5/15 | bigdatadan7182 | FALSE | @support, how can I find example datasets for use with your product? |
There are two areas of analysis:
For non-employees, you want to know if they are mentioning the new product by name.
For employees, you want to know if they are including cross-references to the web site as part of their tweet.
Transformation:
The following counts the occurrences of the string ExampleApp
in the tweet
column. Note the use of the ignoreCase
parameter to capture capitalization differences:
Transformation Name | |
---|---|
Parameter: Column | tweet |
Parameter: Option | Text or pattern |
Parameter: Text or pattern to count | 'ExampleApp' |
Parameter: Ignore case | true |
For non-employees, you want to track if they have mentioned the product in their tweet:
Transformation Name | |
---|---|
Parameter: Formula type | Single row formula |
Parameter: Formula | if(isEmployee=='FALSE' && countpattern_tweet=='1',true,false) |
Parameter: New column name | 'nonEmployeeExampleAppMentions' |
The following counts the occurrences of example.com
in their tweets:
Transformation Name | |
---|---|
Parameter: Column | tweet |
Parameter: Option | Text or pattern |
Parameter: Text or pattern to count | 'example.com' |
Parameter: Ignore case | true |
For employees, you want to track if they included the above cross-reference in their tweets:
Transformation Name | |
---|---|
Parameter: Formula type | Single row formula |
Parameter: Formula | if(isEmployee=='TRUE' && countpattern_tweet1 == 1, true, false) |
Parameter: New column name | 'employeeWebsiteCrossRefs' |
Results:
After you delete the two columns tabulating the counts, you end up with the following:
Date | twitterId | isEmployee | tweet | employeeWebsiteCrossRefs | nonEmployeeExampleAppMentions |
---|---|---|---|---|---|
11/5/15 | lawrencetlu38141 | FALSE | Just downloaded Myco ExampleApp! Transforming data in 5 mins! | false | true |
11/5/15 | petramktng024 | TRUE | Try Myco ExampleApp, our new free data wrangling app! See www.example.com. | true | false |
11/5/15 | joetri221 | TRUE | Proud to announce the release of Myco ExampleApp, the free version of our enterprise product. Check it out at www.example.com. | true | false |
11/5/15 | datadaemon994 | FALSE | Great start with Myco ExampleApp. Super easy to use, and actually fun. | false | true |
11/5/15 | 99redballoons99 | FALSE | Liking this new ExampleApp! Good job, guys! | false | true |
11/5/15 | bigdatadan7182 | FALSE | @support, how can I find example datasets for use with your product? | false | false |