The Preparation category includes tools that prepare data for downstream analysis.
Auto Field Tool: The Auto Field tool reads through an input file and sets each field to the smallest type and size that can hold the data in that column.
Create Samples Tool: The Create Samples tool splits the input records into two or three random samples. Specify the percentage of records for the estimation and validation samples; if the two total less than 100%, the remaining records go to the holdout sample.
Data Cleansing Tool: The Data Cleansing tool fixes common data quality issues using a variety of parameters.
Filter Tool: The Filter tool queries records and splits the data into two outputs, True (where the data meets the specified criteria) and False (where it does not).
Formula Tool: The Formula tool creates or updates columns using one or more expressions to perform a broad variety of calculations or operations.
Generate Rows Tool: The Generate Rows tool creates new rows of data at the record level. It is useful for creating a sequence of numbers, transactions, or dates.
Imputation Tool: The Imputation tool updates specific values in a numeric data field with another selected value. It is useful for replacing NULL values.
Multi-Field Binning Tool: The Multi-Field Binning tool groups multiple numeric fields into tiles or bins, especially for use in predictive analysis.
Multi-Field Formula Tool: The Multi-Field Formula tool makes it easy to execute a single function on multiple fields.
Multi-Row Formula Tool: The Multi-Row Formula tool creates or updates a column using an expression that can reference columns in a subsequent or prior row. It is useful for parsing complex data and for creating running totals, averages, percentages, and other mathematical calculations.
Oversample Field Tool: The Oversample Field tool samples incoming data so that the values of a field are equally represented, which lets the data be used effectively in a predictive model.
Random % Sample Tool: The Random % Sample tool returns a random sample of the incoming data stream with an expected number of records.
Record ID Tool: The Record ID tool creates a new column in the data and assigns each record a unique identifier that increases sequentially.
Sample Tool: The Sample tool extracts a specified portion of the records in the data stream.
Select Tool: The Select tool includes, excludes, and reorders the columns of data that pass through a workflow. With the Select tool you can also modify the type and size of data, rename a column, or add a description.
Select Records Tool: The Select Records tool selects specific records, ranges of records, or both, including discontinuous ranges. It is useful for troubleshooting and sampling.
Sort Tool: The Sort tool arranges the records in a table in alphanumeric order, based on the values of the specified data fields.
Tile Tool: The Tile tool assigns a value (tile) based on ranges in the data.
Unique Tool: The Unique tool distinguishes whether a data record is unique or a duplicate by grouping on one or more specified fields, then sorting on those fields. The first record in each group is sent to the Unique output stream while the remaining records are sent to the Duplicate output stream.
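
The split described for the Unique tool can be sketched in a few lines of Python. This is an illustration of the described behavior only, not the tool's implementation: the function name split_unique, the record layout (a list of dictionaries), and the sample data are all hypothetical, and the sketch assumes that ties within a group keep their incoming order.

```python
from typing import Any, Dict, List, Sequence, Tuple

Record = Dict[str, Any]


def split_unique(
    records: Sequence[Record], key_fields: Sequence[str]
) -> Tuple[List[Record], List[Record]]:
    """Hypothetical helper: split records into (unique, duplicates).

    Records are sorted on the key fields, then the first record of each
    key group goes to `unique` and the rest go to `duplicates`.
    """

    def key_of(record: Record) -> tuple:
        return tuple(record[field] for field in key_fields)

    # sorted() is stable, so records with equal keys keep their input order
    # (an assumption of this sketch, not a documented guarantee of the tool).
    ordered = sorted(records, key=key_of)

    seen = set()
    unique: List[Record] = []
    duplicates: List[Record] = []
    for record in ordered:
        key = key_of(record)
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            unique.append(record)
    return unique, duplicates


# Made-up sample data for illustration.
rows = [
    {"id": 1, "city": "Denver"},
    {"id": 2, "city": "Austin"},
    {"id": 3, "city": "Denver"},
]
unique, duplicates = split_unique(rows, ["city"])
print([r["id"] for r in unique])      # ids sent to the Unique output
print([r["id"] for r in duplicates])  # ids sent to the Duplicate output
```

Grouping on more than one field works the same way: pass every field name in key_fields, and two records count as duplicates only when all of those fields match.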