
Data Quality Rules Reference

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

This section contains reference information on the data quality rule types and input types that are available in Dataprep by Trifacta.

  • Data quality rules can be applied to your dataset through the Transformer page.

  • Input types identify the calculated metric types that can be used as inputs for a data quality rule.

Rule Types

Name          Description
Unique        Column values must be unique.
Implies       Source column values imply the values of a target column: each unique source value must map to exactly one target value.
Not Missing   Column values must not be missing. Null values and empty strings are not allowed.
Not Null      Column values must not be null. Empty strings are allowed.
Valid         Column values must be valid instances of a specified data type.
Match         Column values must match a pattern.
Not Match     Column values must not match a pattern.
Starts With   Column values must start with a pattern.
Ends With     Column values must end with a pattern.
Equal         Column values must equal a provided value.
Not Equal     Column values must not equal a provided value.
In Range      Column values must lie between the provided minimum and maximum values.
Greater Than  Column values must be greater than a minimum value.
Less Than     Column values must be less than a maximum value.
In Set        Column values must be one of a set of acceptable values.
Not In Set    Column values must not be one of a set of unacceptable values.
Formula       Applies a custom data quality rule formula.
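To make the row-level semantics of a few rule types concrete, here is a minimal Python sketch. It is not how Dataprep by Trifacta implements rules and not its API: the helper names (`not_missing`, `not_null`, `unique`, `implies`, `in_range`, `in_set`) are hypothetical, a column is modeled as a plain list, `None` stands for a null value, and each helper returns the indices of the rows that violate the rule. Whether In Range bounds are inclusive and how value-based rules treat nulls are assumptions, labeled in the comments.

```python
from collections import Counter, defaultdict

# Hypothetical helpers, not part of Dataprep by Trifacta. A column is a plain
# Python list; None models a null value. Each helper returns the indices of
# the rows that violate the rule.


def not_missing(col):
    """Not Missing: nulls and empty strings both fail."""
    return [i for i, v in enumerate(col) if v is None or v == ""]


def not_null(col):
    """Not Null: only nulls fail; empty strings pass."""
    return [i for i, v in enumerate(col) if v is None]


def unique(col):
    """Unique: every row whose value occurs more than once fails."""
    counts = Counter(col)
    return [i for i, v in enumerate(col) if counts[v] > 1]


def implies(source, target):
    """Implies: rows fail when their source value maps to 2+ distinct targets."""
    targets = defaultdict(set)
    for s, t in zip(source, target):
        targets[s].add(t)
    return [i for i, s in enumerate(source) if len(targets[s]) > 1]


def in_range(col, minimum, maximum):
    """In Range: bounds ASSUMED inclusive; nulls ASSUMED to be skipped."""
    return [i for i, v in enumerate(col)
            if v is not None and not (minimum <= v <= maximum)]


def in_set(col, allowed):
    """In Set: non-null values outside the allowed set fail (nulls ASSUMED skipped)."""
    return [i for i, v in enumerate(col) if v is not None and v not in allowed]


city = ["Paris", "Lyon", "", None, "Paris"]
print(not_missing(city))  # [2, 3]
print(not_null(city))     # [3]
print(unique(city))       # [0, 4]

zip_code = ["75001", "69001", "75001"]
print(implies(zip_code, ["Paris", "Lyon", "Paris"]))  # []
print(implies(zip_code, ["Paris", "Lyon", "Lyon"]))   # [0, 2]

print(in_range([3, 7, None, 12], 0, 10))                # [3]
print(in_set(["S", "M", "XL", None], {"S", "M", "L"}))  # [2]
```

The `city` example shows why Not Missing and Not Null are separate rules: the empty string at index 2 fails Not Missing but passes Not Null.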

Metric Input Types

The following metric input types can be selected as the source of a data quality rule.

Note

These input types are available for selection from the Column drop-down.

Metric input types are supported for the following rules:

  • In Range

  • Greater Than

  • Less Than

  • Equal

  • Not Equal

  • In Set

  • Not In Set
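The difference between a row-level rule and a rule fed by a metric input can be sketched in Python. This is an illustration under stated assumptions, not the product's behavior or API: it assumes that when a metric input type such as Average or Count Distinct is selected, the rule tests one computed value for the whole column rather than each row, that nulls (`None`) are excluded before the metric is computed, and that In Range bounds are inclusive. The helper names are hypothetical.

```python
import statistics

# Hypothetical sketch: when a metric input type is chosen, the rule is ASSUMED
# to test one computed value for the whole column instead of each row.
# None models a null; nulls are ASSUMED to be excluded before the metric is
# computed.


def non_null(col):
    return [v for v in col if v is not None]


def count_distinct(values):
    """Count Distinct: the number of unique values."""
    return len(set(values))


def metric_in_range(col, metric, minimum, maximum):
    """In Range on a metric input (bounds ASSUMED inclusive)."""
    return minimum <= metric(non_null(col)) <= maximum


def metric_equal(col, metric, expected):
    """Equal on a metric input."""
    return metric(non_null(col)) == expected


prices = [9.0, 10.0, None, 14.0]
print(metric_in_range(prices, statistics.mean, 10, 12))  # True: the average is 11.0
print(metric_in_range(prices, max, 0, 10))               # False: the maximum is 14.0
print(metric_equal(["S", "M", "S", None], count_distinct, 2))  # True
```

Note that a single out-of-range row (14.0) does not fail the Average rule, because only the aggregate is tested.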

Name                 Description
Average              The average (arithmetic mean) of column values.
Count Distinct       The number of unique column values.
Maximum              The maximum column value.
Minimum              The minimum column value.
Sum                  The sum of column values.
Standard Deviation   The sample standard deviation of column values.
Variance             The sample variance of column values.
Count                The number of rows.
Correlation          The Pearson correlation coefficient between two columns.
Z-Score              The distance of a value from the column mean, in units of standard deviations.
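The statistical metrics above can be reproduced with Python's standard library, which helps when checking a rule threshold by hand. This is a sketch, not the product's implementation: the function names are hypothetical, and the Z-Score helper ASSUMES the sample standard deviation (the table does not say which one Dataprep by Trifacta uses). "Sample" means dividing by n - 1 rather than n.

```python
import math
import statistics

# Sketch of the metric definitions using only the standard library.
# Sample statistics divide by n - 1; statistics.pvariance / pstdev would
# give the population versions (divide by n) instead.


def sample_variance(xs):
    return statistics.variance(xs)


def sample_stdev(xs):
    return statistics.stdev(xs)


def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations.

    Raises ZeroDivisionError if either column is constant.
    """
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def z_scores(xs):
    """Distance of each value from the mean, in (ASSUMED sample) standard deviations."""
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)
    return [(x - mean) / sd for x in xs]


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data))          # 32 / 7, about 4.571
print(statistics.pvariance(data))     # population variance: 32 / 8 = 4
print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0, up to floating-point rounding
print(z_scores([1, 2, 3]))            # [-1.0, 0.0, 1.0]
```

The `data` example shows how much the n - 1 divisor matters on small columns: the sample variance is about 4.571, while the population variance is 4.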