Returns true
if a value contains a string or pattern. The value to search can be a string literal, a function returning a string, or a reference to a column of String type. You can also search an array of columns.
Since the MATCHES
function returns a Boolean value, it can be used as both a function and as a conditional.
Suggerimento
When you select values in a histogram for a column of Array type, the function that identifies the values on which to perform a transform is typically MATCHES
.
Suggerimento
If you need the location of the matched string within the source, use the FIND
function. See FIND Function.
Wrangle vs. SQL: This function is part of Wrangle, a proprietary data transformation language. Wrangle is not SQL. For more information, see Wrangle Language.
Column reference example:
matches(ProdId, 'Fun Toy')
Output: Returns true
when the value in the ProdId
column value contains the string literal Fun Toy
.
matches([ProdId,ProdName], 'Fun Toy')
Output: Returns true
when the value in the ProdId
or ProdName
column values contain the string literal Fun Toy
.
String literal example:
matches('Hello, World', 'Hello')
Output: Returns true
.
matches(column_string,string_pattern <span>[,ignore_case]</span>)
Argument | Required? | Data Type | Description |
---|---|---|---|
column_string | Y | string | Name of column or string literal to be searched. This value can also be an array of column names to search, combined in a logical OR. |
string_pattern | Y | string | Name of column, function returning a string ,or string literal or pattern to find |
ignore_case | N | string | When |
For more information on syntax standards, see Language Documentation Syntax Notes.
Name of the column or string literal to be searched.
Missing string or column values generate missing string results.
String constants must be quoted (
'Hello, World'
).
Multiple columns can be specified as an array (
matches([Col1,Col2],'hello'
).
Usage Notes:
Required? | Data Type | Example Value |
---|---|---|
Yes | String | MyColumn |
Column of strings, function returning a string, or string literal. Value can be a column reference, string literal, Wrangle , or regular expression to match against the source column-string.
Usage Notes:
Required? | Data Type | Example Value |
---|---|---|
Yes | Column reference or string literal or pattern | 'home page' |
When true
, matches are case-insensitive. Default is false
.
Nota
This argument is not required. By default, matches are case-sensitive.
Required? | Data Type | Example Value |
---|---|---|
No | String literal evaluating to a Boolean | 'true' |
Suggerimento
For additional examples, see Common Tasks.
In downloaded log files, you might see error messages of the following type:
INFO
- status information on the processWARNING
- system encountered a non-fatal error during executionERROR
- system encountered an error, which might have caused the job to fail.
For purposes of analysis, you might want to filter out the data for INFO
and WARNING
messages.
Source:
Here is example data from a log file of a failed job:
log |
---|
2016-01-29T00:14:24.924Z com.example.hadoopdata.monitor.spark_runner.ProfilerServiceClient [pool-13-thread-1] INFO com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner - Spark Profiler URL - http://localhost:4006/ |
2016-01-29T00:14:40.066Z com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner [pool-13-thread-1] INFO com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner - Spark process ID was null. |
2016-01-29T00:14:40.067Z com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner [pool-13-thread-1] INFO com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner - --------------------------------END SPARK JOB------------------------------- |
2016-01-29T00:14:44.961Z com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] ERROR com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Job '128' threw an exception during execution |
2016-01-29T00:14:44.962Z com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] INFO com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Making sure async worker is stopped |
2016-01-29T00:14:44.962Z com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] INFO com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Notifying monitor for job '128', code 'FAILURE' |
2016-01-29T00:14:44.988Z com.example.hadoopdata.monitor.client.MonitorClient [pool-4-thread-2] INFO com.example.hadoopdata.monitor.client.MonitorClient - Request succeeded to monitor ip-0-0-0-0.example.com:8001 |
Transformation:
When the above data is loaded into the application, you might want to break up the data into separate columns, which splits them on the Z
character at the end of the timestamp:
Transformation Name | |
---|---|
Parameter: Column | column1 |
Parameter: Option | On pattern |
Parameter: Match pattern | `Z ` |
Then, you can rename the two columns: Timestamp
and Log_Message
. To filter out the INFO
and WARNING
messages, you can use the following transforms, which match on the string literals to identify these messages:
Transformation Name | |
---|---|
Parameter: Condition | Custom formula |
Parameter: Type of formula | Custom single |
Parameter: Condition | matches(Log_Message, '] INFO ') |
Parameter: Action | Delete matching rows |
Transformation Name | |
---|---|
Parameter: Condition | Custom formula |
Parameter: Type of formula | Custom single |
Parameter: Condition | matches(Log_Message, '] WARNING ') |
Parameter: Action | Delete matching rows |
Results:
After the above steps, the data should look like the following:
Timestamp | Log_Message |
---|---|
2016-01-29T00:14:44.961 | com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] ERROR com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Job '128' threw an exception during execution |