Missing Value Imputation Transformer
Use the Missing Value Imputation transformation tool to clean up (impute) missing values. The tool recommends a clean-up method for each column in your dataset that has missing values. You can override any recommendation and choose your own clean-up method.
Before using the tool
Start with an existing workflow. Clean and prep your dataset first. Once the dataset contains only the data relevant to your business use case, start building a pipeline using the Machine Learning tools.
Add the tool
- Click the Transformation tool in the Machine Learning tool palette. Drag it to the workflow canvas, and connect it to your workflow. The transformation requires a start pipeline tool to function, so your workflow should contain one, such as the Start Pipeline tool or the Assisted Modeling tool, before the data transformation.
- In Transformer, select the transformation type you want to configure.
- Configure the tool.
Configure the tool
Configure the parameters. Understand the parameters before changing them. As a best practice, avoid making assumptions, and use a test dataset to assess the performance of your model, whether or not your objective is prediction.
The tool suggests an imputation method for each column in the dataset that has rows with missing or null values. Replacing missing values with substitute values is known as imputation.
For more information about why datasets with missing values are incompatible with Scikit-learn, see Imputation of missing values.
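To illustrate what "columns where there are rows with missing or null values" means in practice, here is a minimal sketch in plain Python. It is not how the tool itself detects missing values; the `missing_counts` helper, the use of `None` to mark a missing value, and the sample rows are all assumptions made for illustration.

```python
def missing_counts(rows):
    """Count missing (None) values per column in a list of dict rows."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            # Adding a bool to an int adds 1 when the value is missing, else 0.
            counts[name] = counts.get(name, 0) + (value is None)
    return counts


# Hypothetical sample data: None marks a missing value.
rows = [
    {"age": 34, "city": "Austin"},
    {"age": None, "city": "Denver"},
    {"age": 51, "city": None},
    {"age": None, "city": "Austin"},
]

# Columns that would need an imputation method chosen for them.
columns_to_review = [name for name, n in missing_counts(rows).items() if n > 0]
```

Only columns with a nonzero count need a clean-up method; columns with no missing values can be left alone.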
1. Review each column with null values
2. Select an imputation method
Accept the recommendation or select a different clean-up method from the drop-down list. Clean-up methods include the following:
- Drop column - Alteryx will drop the column.
- Impute with mean - Alteryx will replace missing values with the mean of all values in the column. The mean is calculated as the sum of all non-missing values in the column divided by the number of non-missing values.
- Impute with median - Alteryx will replace missing values with the median of all values in the column. The median is the middle value when the values in the column are arranged from smallest to largest. If the column contains an even number of values, the median is calculated as the mean of the two middle values.
- Impute with mode - Alteryx will replace missing values with the mode. The mode is the value that occurs most often. If no values repeat, then there is no mode.
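The mean, median, and mode calculations described above can be sketched in plain Python. This is an illustrative sketch only, not Alteryx's implementation: the `impute` function, the use of `None` for missing values, and the choice to leave gaps untouched when no value repeats are assumptions.

```python
from collections import Counter
from statistics import mean, median


def impute(values, method):
    """Return a copy of `values` with None entries replaced per `method`."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # no observed values to compute a fill from
    if method == "mean":
        fill = mean(present)  # sum of observed values / number of observed values
    elif method == "median":
        fill = median(present)  # middle value, or mean of the two middle values
    elif method == "mode":
        value, count = Counter(present).most_common(1)[0]
        if count == 1:
            # Assumption: no value repeats, so there is no mode; leave gaps as-is.
            return list(values)
        fill = value
    else:
        raise ValueError(f"unknown method: {method}")
    return [fill if v is None else v for v in values]


# Hypothetical column with two missing values.
column = [2, None, 4, 4, None, 10]
mean_filled = impute(column, "mean")
median_filled = impute(column, "median")
mode_filled = impute(column, "mode")
```

For this sample column the observed values are 2, 4, 4, and 10, so the mean fill differs from the median and mode fills. A skewed column like this one is a case where the choice of method changes the result.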
Run the workflow to apply the configuration.