Recipe Development Terms
Terminology applicable to Designer Cloud.
Note
This list is not comprehensive.
These terms pertain to building recipes in Wrangle in the Transformer page.
argument
An input to a function.
binning
Several functions can be used to group values in a column into bins, which can assist in preparing your data for downstream use.
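As a rough illustration of the idea, the following Python sketch groups numeric values into fixed-width bins. The `bin_value` helper and the sample ages are hypothetical and are not part of Wrangle; the product's own binning functions may behave differently.

```python
def bin_value(value, bin_size):
    """Return the lower bound of the fixed-width bin that contains value."""
    return (value // bin_size) * bin_size

# Hypothetical column of ages, grouped into bins that are 20 wide.
ages = [3, 17, 25, 42, 58]
binned = [bin_value(age, 20) for age in ages]
print(binned)  # [0, 0, 20, 40, 40]
```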
data type
A data type is the set of constraints on expected values in a column. When you specify the data type for a column, you provide a means for the platform to identify the values in the column that do not match the selected type, which assists in wrangling the mismatched values.
Data types can be selected from the column menus.
dependency
An input to a recipe that is not the primary datasource for the recipe. For example, if your recipe includes a join step, the dataset that is joined into your recipe is an upstream dependency. Recipe steps and changes outside of the Trifacta Application can create dependency errors, in which an upstream object can no longer be found and the reference to it cannot be resolved. These issues must be fixed prior to successful execution of a job.
file encoding
A file's encoding defines the set of characters that are in use in the file. There are many different encoding systems in use around the world. To represent the English language, which uses a 26-character alphabet, a small character set such as ASCII is sufficient. However, to represent Asian character sets, which may contain thousands of characters, a broader encoding such as UTF-8 is required.
When a file is imported, Designer Cloud assumes that the file is in the default encoding type. As needed, you can change the encoding type that is used to import the file.
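To see why the encoding used at import matters, the Python sketch below decodes the same bytes with two different encodings. It is a general illustration only and does not show how Designer Cloud reads files; the sample text is an assumption.

```python
text = "café"

# Encode the text as UTF-8 bytes, as a file on disk might store it.
raw = text.encode("utf-8")

# Decoding with the matching encoding recovers the original text.
decoded_ok = raw.decode("utf-8")

# Decoding with a different encoding succeeds but garbles the accented character.
decoded_wrong = raw.decode("latin-1")

print(decoded_ok)     # café
print(decoded_wrong)  # cafÃ©
```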
full scan
A full scan sample is generated across the entire dataset on the default running environment. Full scan samples are more representative of the total dataset. However, they can take a while to generate.
function
A function in Wrangle is an action that is applied to a set of values as part of a transformation step. A function can take 0 or more parameters as inputs, yielding a single output of a specific data type.
initial structure
When a file-based dataset is imported, Designer Cloud attempts to detect the format and structure of the data and then to apply a set of initial parsing steps to transform the data for display in tabular form in the data grid. These steps may vary depending on the file format.
These steps do not appear in the recipe. As needed, you can disable the detection of structure on import. When disabled, these steps are added as the first steps of the recipe, where you can edit or remove them as needed.
join
This database concept can be applied to datasets. In a join, two datasets are merged into one, based on a set of key columns. Values in these columns that match across the datasets are used to determine the values from each dataset to include in the joined dataset.
Joins are created as steps in your recipe.
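The matching behavior can be sketched in plain Python. This is a minimal inner join on a single key column, written under the assumption that each dataset is a list of dictionaries; the `inner_join` helper and the sample data are hypothetical and do not reflect how the product executes joins.

```python
def inner_join(left, right, key):
    """Combine rows from left and right whose key values match."""
    # Index the right-hand rows by key value for fast matching.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined

customers = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben"}, {"id": 3, "name": "Cy"}]
orders = [{"id": 1, "total": 30}, {"id": 3, "total": 12}, {"id": 4, "total": 99}]

joined_rows = inner_join(customers, orders, "id")
print(joined_rows)
# [{'id': 1, 'name': 'Ana', 'total': 30}, {'id': 3, 'name': 'Cy', 'total': 12}]
```

Rows whose key appears in only one dataset (ids 2 and 4 here) are dropped in this inner-join sketch.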
lookup
A retrieval of a row of values from another dataset based on common values in columns in each dataset. A lookup is useful for bringing in reference information based on values in one of the columns of your dataset.
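Conceptually, a lookup resembles a dictionary lookup. The Python sketch below enriches each row with a value from a reference table; the column names and data are assumptions made for illustration.

```python
# Hypothetical reference dataset, keyed by country code.
country_names = {"US": "United States", "FR": "France"}

rows = [{"order": 1, "country": "FR"}, {"order": 2, "country": "BR"}]

# Add the reference value to each row; codes with no match yield None.
enriched = [
    {**row, "country_name": country_names.get(row["country"])}
    for row in rows
]
print(enriched)
# [{'order': 1, 'country': 'FR', 'country_name': 'France'},
#  {'order': 2, 'country': 'BR', 'country_name': None}]
```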
mismatched
Values in a column that do not conform to the range or format of expected values for the column's data type.
missing
Cell values in the dataset that are empty.
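The distinction between valid, mismatched, and missing values can be sketched as follows for a column typed as Integer. The `classify` helper is hypothetical, and the platform's own type validation rules may differ from Python's `int` parsing.

```python
def classify(value):
    """Label a cell as 'missing', 'mismatched', or 'valid' for an integer column."""
    if value is None or value == "":
        return "missing"
    try:
        int(value)
    except ValueError:
        return "mismatched"
    return "valid"

column = ["12", "", "abc", "7", "3.5"]
labels = [classify(v) for v in column]
print(labels)  # ['valid', 'missing', 'mismatched', 'valid', 'mismatched']
```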
multi-dataset operation
A multi-dataset (MDS) operation refers to any step in your recipe that uses two or more datasets. Joins and unions are examples of multi-dataset operations.
nested expression
An expression that is inside another expression. Example:
POWER(ABS(colA),colB)
Designer Cloud supports the use of nested expressions in your recipe steps.
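For comparison, a Python equivalent of the nested expression above might look like the following. The `power_abs` name is hypothetical; the inner call is evaluated first, and its result becomes an argument to the outer call.

```python
def power_abs(col_a, col_b):
    """Mirror POWER(ABS(colA), colB): take the absolute value, then raise it to a power."""
    return pow(abs(col_a), col_b)

print(power_abs(-3, 2))  # 9
```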
null
A value that does not exist in the dataset.
operator
A single character that represents an arithmetic function or comparison. For example, the plus sign (+) represents the add function.
Operator Category | Description |
---|---|
Logical Operators | and, or, and not operators |
Numeric Operators | Add, subtract, multiply, and divide |
Comparison Operators | Compare two values with greater than, equals, not equals, and less than operators |
Ternary Operators | Use ternary operators to create if/then/else logic in your transforms. |
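
The if/then/else logic of a ternary operator can be illustrated with a Python conditional expression. This is an analogy only and does not show Wrangle syntax; the `size_label` helper and the threshold of 100 are assumptions.

```python
def size_label(order_total):
    """Return 'large' if the comparison is true, else 'small'."""
    return "large" if order_total >= 100 else "small"

size_labels = [size_label(total) for total in [250, 40, 100]]
print(size_labels)  # ['large', 'small', 'large']
```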
outliers
In statistics, an outlier is a value that lies unusually far above or below the mean. In Designer Cloud, an outlier is a value that is 4 standard deviations away from the mean.
You can review outliers for column values.
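A minimal sketch of this kind of check in Python is shown below. The threshold of 4 comes from the definition above; the use of the population standard deviation and a strict greater-than comparison are assumptions, and the sample data is invented.

```python
import statistics

def find_outliers(values, threshold=4):
    """Return values that lie more than threshold standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    if std == 0:
        return []
    return [v for v in values if abs(v - mean) / std > threshold]

# Thirty typical values and one extreme value.
values = [10] * 30 + [1000]
outliers = find_outliers(values)
print(outliers)  # [1000]
```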
parameter (language)
An input to a transform in Wrangle.
pattern
In Designer Cloud, a pattern is an object that describes a sub-string within a value. Patterns can be described using regular expressions, a common standard, or Alteryx patterns, a proprietary simplification of regular expressions.
Patterns are widely used in the product for identifying and extracting values from data, data type validation, and supporting pattern-based suggestions.
See regular expression.
plan metadata reference
A plan metadata reference is a programmatic reference to some aspect of a plan, its tasks, or results of the execution. These metadata references can be inserted into the requests and responses of tasks in the plan for delivery to other systems.
quick scan
A quick scan sample is generated using an appropriate selection of rows from the dataset. Since these samples are generated in Trifacta Photon, they are faster to produce.
range join
A range join is a type of join in which key values may be matched with a range of values in the joined-in dataset. For example, you can create a range join based on the source key value being greater than values in the key column of the joined-in dataset. A range join can explode the size of your resulting dataset.
Joins are created as steps in your recipe.
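The following Python sketch shows how a greater-than range join can multiply rows. The datasets and column names are hypothetical; each source row is paired with every joined-in row whose key is smaller.

```python
events = [{"event": "a", "ts": 5}, {"event": "b", "ts": 15}]
windows = [{"window": "w1", "start": 0}, {"window": "w2", "start": 10}]

# Keep every pairing where the source key is greater than the joined-in key.
range_joined = [
    {**event, **window}
    for event in events
    for window in windows
    if event["ts"] > window["start"]
]

print(len(range_joined))  # 3
```

Two source rows produce three output rows here because event "b" matches both windows; with larger datasets the growth can be much steeper.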
regular expression
Regular expressions are a powerful yet complex method of describing patterns of values for matching purposes.
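As a short illustration, the Python sketch below uses the standard `re` module to find a sub-string that matches a pattern of three digits, a hyphen, and four digits. The sample text is invented, and regular expression syntax in the product may differ in detail from Python's.

```python
import re

# Three digits, a hyphen, then four digits.
phone_pattern = re.compile(r"\d{3}-\d{4}")

match = phone_pattern.search("Call 555-0199 today")
extracted = match.group() if match else None
print(extracted)  # 555-0199
```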
source row number
The row number for a record as it appeared in the original dataset. Source row number information can be obtained by using a function. This function may return a null value if multi-dataset operations, such as union and join, have been performed on the dataset.
source metadata reference
A source metadata reference is a programmatic reference to some aspect of the source file for your dataset. Using these programmatic references, you can write source information for your original datasource into your dataset for future reference.
standardize
Designer Cloud provides multiple mechanisms to standardize column values using patterns, clustering algorithms, or functions.
string collation
String collation refers to a method of comparing strings based on a set of rules. Designer Cloud includes functions to perform string collation-based comparisons.
transformation
A transformation is the unit of action in a recipe step. A transformation applies one or more actions on a set of rows or columns. Transformations are specified in the Transformer page through the Transform Builder.
transform
A transform in Wrangle is an action that is applied to rows or columns of your dataset. A transform can take zero or more parameters as inputs. A parameter may contain a reference to a column, a literal value, or a function.
Note
Transforms are not available through the Trifacta Application. Instead, you build transformations, which are more complex steps that reference transforms from the underlying language.
Alteryx pattern
A simplification of regular expressions, Alteryx patterns are custom selectors for patterns in your data and provide a simpler and more readable alternative to regular expressions.
union
A union combines two or more datasets such that the rows of the second and later datasets are appended to the end of the first dataset. In a union operation, the columns must be matched up, or the results are a ragged dataset.
Unions are created as steps in your recipe.
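The sketch below illustrates in Python how mismatched columns lead to a ragged result. The `union` helper and the sample data are hypothetical; rows are appended in order, and any column missing from a row is filled with `None`.

```python
def union(*datasets):
    """Append the rows of all datasets, filling absent columns with None."""
    # Collect every column name in first-seen order.
    columns = []
    for dataset in datasets:
        for row in dataset:
            for col in row:
                if col not in columns:
                    columns.append(col)
    return [
        {col: row.get(col) for col in columns}
        for dataset in datasets
        for row in dataset
    ]

first = [{"id": 1, "name": "Ana"}]
second = [{"id": 2, "full_name": "Ben"}]

unioned = union(first, second)
print(unioned)
# [{'id': 1, 'name': 'Ana', 'full_name': None},
#  {'id': 2, 'name': None, 'full_name': 'Ben'}]
```

Because `name` and `full_name` were not matched up, each row has a gap in one of the two columns.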
wrangling
An informal term for the process of data preparation. Data wrangling was invented by the co-founders of Alteryx.