Use the Assisted option to get help building machine-learning models. It guides you through a step-by-step process that includes selecting a target and machine-learning method, setting data types, cleaning up missing values, choosing features, and selecting the best algorithm. If you're unsure which algorithm is best, Assisted Modeling lets you compare algorithms in the leaderboard. You can then add a pipeline to the Designer canvas that contains all the Machine Learning tools you've used to train the model.
Before you can use Assisted Modeling, you have to use the Input Data tool to bring your data into Designer, then connect it to the Assisted Modeling tool. After you select Run, you can select Start Assisted Modeling in the configuration window.
1. Select Target and Machine-Learning Method
Select a target, and Assisted Modeling picks the machine-learning method used to predict it.
- In the Available Targets section, the names of features in the dataset are listed. Select the feature you want to set as the target.
- Assisted Modeling automatically detects whether the target contains categorical or numeric data and selects the appropriate machine-learning method: classification for a categorical target or regression for a numeric target.
- Select Next to go to Step 2: Select Automation Level.
Assisted Modeling asks you to confirm your target before you go to the next step, because after that you can't change the target without restarting the whole process. Select Continue if you've picked the correct target.
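Assisted Modeling doesn't document the exact rule it uses to tell categorical targets from numeric ones. The Python sketch below shows one plausible heuristic, purely for illustration: `detect_method` and its `max_numeric_classes` threshold are hypothetical names, not part of Alteryx. It treats a target as numeric (regression) only when every non-missing value parses as a number and there are more than two distinct values, so a 0/1 flag still counts as categorical (classification).

```python
def is_number(value):
    """Return True if the value can be parsed as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def detect_method(target_values, max_numeric_classes=2):
    """Hypothetical heuristic: choose 'classification' or 'regression' for a target.

    Non-missing values that are all numeric and have more than
    max_numeric_classes distinct values are treated as continuous.
    """
    present = [v for v in target_values if v is not None and v != ""]
    if present and all(is_number(v) for v in present):
        distinct = {float(v) for v in present}
        if len(distinct) > max_numeric_classes:
            return "regression"
    return "classification"


print(detect_method(["yes", "no", "yes"]))     # a text label
print(detect_method([12.5, 40.0, 33.1, 8.9]))  # a continuous amount
print(detect_method([0, 1, 1, 0]))             # a 0/1 flag
```

A real detector would likely weigh more signals, such as the field's declared type; the threshold of two distinct values here is an arbitrary assumption.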
2. Select Automation Level
Select whether you want Assisted Modeling to build the machine-learning pipeline for you automatically or to walk you through the process step by step.
Select Step-by-Step or Automatic.
If you choose Step-by-Step, Assisted Modeling proceeds to Step 3: Set Data Types.
If you choose Automatic, Assisted Modeling automatically walks through the steps to build the machine-learning pipeline: it sets data types, cleans up missing values, selects features, and selects algorithms. When the tool finishes that process, you can see the output in the leaderboard.
3. Set Data Types
Assisted Modeling sets the data type for each feature. It displays a recommended data type in the Data Type column. The recommended option is labeled (for example, Numeric (Recommended)).
- Select a feature to view info about it in the Column Details section. There, you can see Data Type Probabilities, which displays how confident Assisted Modeling is that a feature is a certain data type. You can also see a Preview, which contains a sample of the data. Use this info to make sure data types are set correctly.
- If a feature is the wrong data type, use the dropdown in the Data Type column to select the correct data type.
- Select Next to go to Step 4: Clean Up Missing Values.
If you're unsure what a term means, check the Glossary section in Assisted Modeling. The section contains helpful info about many of the common terms used by data scientists.
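Data Type Probabilities can be thought of as a confidence score for each candidate type. As an illustration only (this is not how Assisted Modeling computes its probabilities), the sketch below scores a column by the share of its non-empty values that look like a date, a number, or plain text, and recommends the highest-scoring type. The function names are hypothetical, and `date.fromisoformat` recognizes only ISO-style dates such as `2024-01-05`.

```python
from collections import Counter
from datetime import date


def classify_value(text):
    """Guess a single value's type; dates are checked before numbers."""
    try:
        date.fromisoformat(text)
        return "Date"
    except ValueError:
        pass
    try:
        float(text)
        return "Numeric"
    except ValueError:
        return "String"


def data_type_scores(values):
    """Share of non-empty values that look like each type (shares sum to 1.0)."""
    present = [v.strip() for v in values if v and v.strip()]
    counts = Counter(classify_value(v) for v in present)
    total = len(present)
    return {dtype: count / total for dtype, count in counts.items()} if total else {}


def recommend(values):
    """Return the highest-scoring type, or None for an empty column."""
    scores = data_type_scores(values)
    return max(scores, key=scores.get) if scores else None


print(data_type_scores(["12", "7.5", "n/a", "3", ""]))
print(recommend(["12", "7.5", "n/a", "3"]))
```

A column like this one, mostly numbers with a stray `n/a`, scores high for Numeric, which is the kind of case where checking the Preview before accepting the recommendation pays off.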
4. Clean Up Missing Values
Assisted Modeling cleans up the missing values in the data. For each feature that contains missing values, it displays a recommended method for cleaning up missing values in the Method column. The recommended option is labeled (for example, Replace with Median (Recommended)).
- Select a feature to view info about it in the Column Details section. There, you can see the Clean-Up Method, which explains how Assisted Modeling picks the method to clean up the missing data. You can also see a Preview, which contains a sample of the data. Use this info to make sure you're using the correct method to handle missing values.
- If you want to use a different clean-up method, use the dropdown in the Method column to select the correct clean-up method.
- Select Next to go to Step 5: Select Features.
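Replace with Median fills each missing entry with the median of the values that are present. A minimal sketch, assuming missing values are represented as `None` (the function name `replace_with_median` is hypothetical, not an Alteryx API):

```python
from statistics import median


def replace_with_median(values):
    """Replace missing entries (None) with the median of the observed values.

    Returns the cleaned list and the fill value that was used.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values], fill


cleaned, fill = replace_with_median([4, None, 1, 10, None])
print(cleaned, fill)
```

The median is less sensitive to outliers than the mean, which is one common reason to prefer it for skewed numeric features. In a trained pipeline, the fill value is usually computed from the training data and reused for new data, which is why this sketch also returns it.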
5. Select Features
Assisted Modeling selects the features that result in the best model. In the Feature Info column, it indicates whether each feature is a good predictor.
- Select a feature to view info about it in the Column Details section. There, you can see Predictor Details, which provides two measures of how well the feature performs: Gini and GKT. Assisted Modeling uses both measures to determine whether the feature associates too much or too little with the target. You can also see a Preview, which contains a sample of the data. Use this info to make sure features are good predictors.
- If you don't want to use a feature, uncheck the box next to the name of that feature.
- Select Next to go to Step 6: Select Algorithms.
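Assisted Modeling doesn't spell out its formulas here. Assuming that Gini refers to Gini impurity and GKT to Goodman and Kruskal's tau, both measured between a categorical feature and a categorical target, the sketch below computes them with the standard library. Tau is the proportional reduction in the target's Gini impurity once you know the feature's value: a score near 0 suggests a weak predictor, and a score near 1 may mean the feature associates suspiciously closely with the target (for example, it leaks the answer). The function names are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity, 1 - sum(p_k ** 2), of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def conditional_gini(feature, target):
    """Weighted Gini impurity of the target within each feature category."""
    n = len(target)
    groups = {}
    for x, y in zip(feature, target):
        groups.setdefault(x, []).append(y)
    return sum(len(ys) / n * gini_impurity(ys) for ys in groups.values())


def goodman_kruskal_tau(feature, target):
    """Proportional reduction in target Gini impurity from knowing the feature."""
    base = gini_impurity(target)
    if base == 0:
        return 0.0  # constant target: there is no impurity left to explain
    return (base - conditional_gini(feature, target)) / base


target = ["y", "y", "n", "n"]
print(goodman_kruskal_tau(["a", "a", "b", "b"], target))  # feature mirrors the target
print(goodman_kruskal_tau(["a", "b", "a", "b"], target))  # feature unrelated to the target
```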
6. Select Algorithms
Assisted Modeling offers a selection of algorithms to choose from and recommends different ones depending on the kind of problem you want to solve. In this step, you select which algorithms you want to evaluate in the leaderboard.
- The card for each algorithm displays its pros and cons, a description, and some use cases. Use this info to make sure you want to evaluate the algorithm.
For categorical targets (classification), the following algorithms are available:
- Logistic Regression
- Decision Tree
- Random Forest
For continuous (numeric) targets (regression), the following algorithms are available:
- Linear Regression
- Decision Tree
- Random Forest
- To evaluate an algorithm, check the box next to its name. If you don't want to evaluate an algorithm, uncheck the box.
- Select Run Selected Algorithms.
Assisted Modeling generates the leaderboard, which you can use to compare the performance of the algorithms you've selected.
The output in the leaderboard differs depending on the kind of problem you're solving, but here are the basics of navigating the UI:
- To view info specific to an algorithm, select its card in the Leaderboard section.
- To view info about how an algorithm performed against the other algorithms, select the Comparison tab.
- To view info about an algorithm's individual performance, select the Overview tab.
- To see what features an algorithm valued most, select the Interpretation tab.
- To remind yourself of the choices you made throughout the Assisted Modeling process, select the Configuration tab.
- To start over with Assisted Modeling but retain the info that's in the leaderboard, select Create New Model.
- To show or hide the leaderboard, select Hide Leaderboard or View Leaderboard.
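Conceptually, the leaderboard scores each selected algorithm on the same data and ranks the results. The sketch below illustrates that idea with accuracy on a small, made-up holdout set; Assisted Modeling's actual metrics depend on the problem type, and the names `accuracy` and `leaderboard` are hypothetical.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def leaderboard(actual, predictions_by_model):
    """Return (model name, accuracy) pairs sorted from best to worst."""
    scores = {name: accuracy(actual, preds) for name, preds in predictions_by_model.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Illustrative holdout labels and predictions, not real model output.
actual = ["y", "n", "y", "y", "n"]
predictions = {
    "Logistic Regression": ["y", "n", "n", "y", "n"],
    "Decision Tree": ["y", "y", "n", "y", "n"],
    "Random Forest": ["y", "n", "y", "y", "n"],
}
for name, score in leaderboard(actual, predictions):
    print(f"{name}: {score:.2f}")
```

For a regression target, you would swap in an error metric such as root mean squared error and sort in ascending order instead.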
After you've determined which algorithms perform best, you can select the ones you want to add to the Designer canvas as part of a machine-learning pipeline that trains the model:
- Check the box next to the name of the algorithm you want to add to the canvas. You should see a count of how many algorithms you've selected.
- Select Add Models and Continue to Workflow.
You can export a report that contains the results of the model.
- Select the vertical-ellipsis icon.
- From the dropdown, select Export HTML Report.
- Use File Explorer to select where you want to save the report.
You can export the code for the machine-learning pipeline to a Jupyter Notebook in a Python tool.
- Select the vertical-ellipsis icon.
- From the dropdown, select Export Model to Python.
- When you exit the Assisted Modeling window, a Python tool appears in the workflow. It contains an annotated Jupyter Notebook with all the code for the machine-learning pipeline.