Transformation Tool

Use the Transformation tool to perform these data-prep tasks:

  • Set Data Types
  • Clean Up Missing Values
  • Select Features
  • One Hot Encoding

You can perform these tasks in any order. The order depends on how you want to prep the data.

You have to place Transformation tools between an Assisted Modeling tool and a Classification or Regression tool.

Set Data Types

  1. Select Set Data Types from the dropdown in the Transformer section.
  2. In the Parameters section, the names of features are listed in the Feature column. You can select what data type a feature should be from the dropdown in the Set Data Types column. Current options are Numeric, Categorical, Boolean, and ID.
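
To make the effect of a data type concrete, here is a minimal sketch in plain Python. It is not how Assisted Modeling is implemented. The function name `cast_feature`, the sample feature names, and the rule for what counts as a true Boolean value are all assumptions made for illustration.

```python
# Hypothetical illustration only: none of these names come from the product.
def cast_feature(values, data_type):
    """Convert a list of raw string values to the chosen data type."""
    if data_type == "Numeric":
        return [float(v) for v in values]
    if data_type == "Boolean":
        # Assumed convention for which strings count as true.
        return [v.strip().lower() in ("true", "1", "yes") for v in values]
    if data_type in ("Categorical", "ID"):
        # Categories and identifiers stay as labels, not quantities.
        return [str(v) for v in values]
    raise ValueError(f"Unsupported data type: {data_type}")


rows = {
    "age": ["34", "51"],
    "member": ["true", "false"],
    "customer_id": ["0012", "0047"],
}
types = {"age": "Numeric", "member": "Boolean", "customer_id": "ID"}

typed = {name: cast_feature(col, types[name]) for name, col in rows.items()}
print(typed)
```

In this sketch, treating `customer_id` as an ID rather than a number keeps the leading zeros and stops the values from being read as quantities.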

Clean Up Missing Values

  1. Select Clean Up Missing Values from the dropdown in the Transformer section.
  2. Check the boxes next to features with missing values you want to clean up.
  3. Select a method you want to use to clean up the missing values:
    • Replace with Mean: If you select this option, Assisted Modeling replaces missing values with the mean of the feature, that is, the sum of the feature's values divided by the number of values. Only use this method for numeric data. We recommend this option if your data is normally distributed and has no outliers.
    • Replace with Median: If you select this option, Assisted Modeling replaces missing values with the number that represents the midpoint in the distribution of your feature. We recommend this option if your data is skewed or contains outliers.
    • Replace with Mode: If you select this option, Assisted Modeling replaces missing values with the value that occurs most often. We recommend this option if a feature contains categorical values and you don't want to drop it. You can also use the mode to fill in missing numeric values.
    • Replace with Constant: If you select this option, Assisted Modeling reads empty fields as missing values. Select this option if you think the modeling algorithm could find meaning in the missing values themselves, because sometimes it can find patterns in the absence of data. You can also select this option if you think other methods of handling missing data could bias your model.

The tool won't clean up missing values for features with unchecked boxes. If a feature contains missing values and you don't choose a clean-up method, an error occurs downstream in the machine-learning pipeline.
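
The clean-up methods above can be sketched in plain Python with the standard `statistics` module. This is an illustration under stated assumptions, not the tool's implementation: the function name `fill_missing` is hypothetical, missing values are represented as `None`, and the statistics are computed over the non-missing values only.

```python
from statistics import mean, median, mode


def fill_missing(values, method, constant=None):
    """Return a copy of values with None entries replaced.

    Assumption: the fill value is computed from the non-missing entries.
    """
    present = [v for v in values if v is not None]
    if method == "mean":
        fill = mean(present)
    elif method == "median":
        fill = median(present)
    elif method == "mode":
        fill = mode(present)
    elif method == "constant":
        fill = constant
    else:
        raise ValueError(f"Unknown method: {method}")
    return [fill if v is None else v for v in values]


income = [40, 42, None, 44, 200]  # 200 is an outlier
color = ["red", None, "blue", "red"]

print(fill_missing(income, "mean"))    # the outlier pulls the fill value to 81.5
print(fill_missing(income, "median"))  # the median fill value is 43.0
print(fill_missing(color, "mode"))     # fills with "red"
print(fill_missing(color, "constant", constant="missing"))
```

The sample data shows why the median is the safer choice when outliers are present: a single large value moves the mean far from the typical value, while the median stays close to it.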

Select Features

  1. Select Select Features from the dropdown in the Transformer section.
  2. If you don't want to include a feature in the model, uncheck the box next to its name.
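
As a rough sketch of what unchecking a box does, the plain-Python example below drops unchecked features from a table. The function name `select_features` and the sample feature names are hypothetical, and the sketch assumes every feature is checked unless you uncheck it.

```python
def select_features(table, checked):
    """Keep only the columns whose box is checked.

    Assumption: a feature that is not listed in `checked` counts as checked.
    """
    return {name: col for name, col in table.items() if checked.get(name, True)}


table = {"age": [34, 51], "zip": ["80301", "10001"], "income": [40, 42]}
checked = {"zip": False}  # the box next to "zip" is unchecked

selected = select_features(table, checked)
print(list(selected))
```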

One Hot Encoding

  1. Select One Hot Encoding from the dropdown in the Transformer section.
  2. Use the toggle to Hide Un-encodable Features.
  3. Check the boxes next to the features you want to encode in machine-readable format.
  4. Use the dropdown to select how you want the tool to treat unknown values in the encoded features:
    • Ignore allows the Transformation tool to score the data, treating unknown values as constants.
    • Error tells the Transformation tool to return an error if it encounters unknown values.
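
The following plain-Python sketch shows what one hot encoding produces and how the two unknown-value options differ. It is not the tool's implementation. The function names are hypothetical, and encoding an unknown value as a row of zeros under "ignore" is one common convention assumed here, which may differ from what the tool does.

```python
def fit_categories(values):
    """Record the distinct categories seen in the training data, sorted."""
    return sorted(set(values))


def one_hot(values, categories, unknown="error"):
    """Encode each value as a list of 0s and 1s, one position per category."""
    encoded = []
    for v in values:
        if v not in categories and unknown == "error":
            raise ValueError(f"Unknown category: {v!r}")
        # Under "ignore", an unknown value matches no category,
        # so its row is all zeros (assumed convention).
        encoded.append([1 if v == c else 0 for c in categories])
    return encoded


categories = fit_categories(["red", "green", "red"])
print(categories)

# "blue" was never seen during training.
print(one_hot(["red", "blue"], categories, unknown="ignore"))

try:
    one_hot(["red", "blue"], categories, unknown="error")
except ValueError as err:
    print(err)
```

With "ignore", scoring continues even when new data contains a category the model never saw. With "error", the same input stops the run, which is useful when an unseen category signals a data problem you want to catch.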