Use AutoML as part of a machine learning pipeline to automatically build a model of your data. The tool provides several algorithms for both classification and regression, evaluates the algorithms against each other, and then outputs the best-performing trained model.
The AutoML tool has 2 anchors.
- Input anchor: The input anchor connects to the data that you want to model using the AutoML tool.
- Output anchor: The output anchor passes the model object with associated performance metrics downstream.
Configure the Tool
To use the AutoML tool, select the target you want to predict and the machine learning method you want to use.
1. Target
Select the column you want to predict from the dropdown. The choices include all columns from the data you've input. The data type of each column displays next to its name.
2. Machine Learning Method
By default, the AutoML tool selects a machine learning method, either regression or classification, based on the target you've selected. You can also select the method manually.
The regression method solves problems where the goal is to find a trend line in the data, like forecasting GDP growth. You can also use regression algorithms to describe associations between events. For example, you could use this method to find out whether a company’s sales go up in relation to the number of salespeople the company employs.
The classification method solves problems where the goal is to figure out what category a piece of data belongs to, like what species a flower is. Classification problems are either binary, having 2 categories, or multiclass, having more than 2 categories. Often, different algorithms are used to solve each kind of classification problem.
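The choice between the two methods depends on the target column. The following is a minimal sketch of how such a choice could be made, not the rule the AutoML tool actually applies. The function name `infer_method` and the `max_classes` cutoff are assumptions made for illustration.

```python
from numbers import Number


def infer_method(target_values, max_classes=20):
    """Guess a machine learning method from a target column (hypothetical heuristic)."""
    values = [v for v in target_values if v is not None]
    if not values:
        raise ValueError("target column has no non-missing values")
    distinct = set(values)
    # bool is a subclass of int in Python, so exclude it from the numeric test.
    is_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )
    # Many distinct numeric values suggest a continuous quantity to predict.
    if is_numeric and len(distinct) > max_classes:
        return "regression"
    if len(distinct) == 2:
        return "binary classification"
    if len(distinct) > 2:
        return "multiclass classification"
    raise ValueError("target column has only one distinct value")


print(infer_method([1.5 * i for i in range(50)]))
print(infer_method(["setosa", "versicolor", "virginica", "setosa"]))
print(infer_method([True, False, True]))
```

A heuristic like this can guess wrong, for example when numeric codes stand for categories, which is one reason to select the method manually.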
Configure Advanced Parameters
The AutoML tool has advanced options that change how it evaluates algorithms and selects the one used to build the final machine learning model.
1. Objective Function
From the dropdown, select the objective function you want the tool to optimize for.
An objective function is a measure of how well a model suits your use case. The tool uses it to rank the models it evaluates.
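To see why the choice of objective matters, consider the sketch below. It ranks two hypothetical models on the same imbalanced data using accuracy and then F1. The model names, predictions, and the `rank_models` helper are made up for illustration and do not reflect how the tool scores pipelines internally.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rank_models(predictions, y_true, objective):
    """Return model names sorted from best to worst objective score."""
    scores = {name: objective(y_true, y_pred) for name, y_pred in predictions.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Imbalanced data: only 2 of 10 rows belong to the positive class.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
predictions = {
    "always_negative": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "eager": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
}

print(rank_models(predictions, y_true, accuracy))
print(rank_models(predictions, y_true, f1_score))
```

Accuracy prefers the model that never predicts the positive class, while F1 prefers the model that finds both positives. Both measures here are higher-is-better; an error measure such as mean squared error would be ranked in ascending order instead.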
2. Algorithms
Check the box next to each type of algorithm you want to evaluate as part of the automodeling process. You can select more than 1 option. The more types you select, the longer the workflow takes to run.
Random Forest: Random forest algorithms train models using the results of an ensemble of randomly generated decision trees. The algorithm performs best when modeling nonlinear associations between classes. The ensemble method helps avoid problems of overfitting and underfitting, but is computationally expensive.
XGBoost: XGBoost algorithms train models using the results of an ensemble of decision trees built in sequence. Boosting is a method in which each new tree corrects the errors of the trees before it, which reduces underfitting, and XGBoost adds regularization to limit overfitting. The XGBoost algorithm is most useful when you want to use many different features to train the model.
Linear: Linear algorithms train the model by fitting lines to the data, either as a line of best fit (regression) or as a boundary that divides the data into groups (classification). These algorithms perform best when modeling linear associations or trends. They tend to be computationally efficient, but are prone to underfitting.
CatBoost: CatBoost algorithms train the model using the results of an ensemble of decision trees. This algorithm uses boosting methods similar to XGBoost, but tends to be even less prone to overfitting. It can also be more computationally expensive than XGBoost.
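The boosting idea behind XGBoost and CatBoost can be illustrated with a small sketch. This is plain gradient boosting for squared error using one-split trees (stumps) on a single feature. It leaves out the regularization and other refinements of those libraries, and the names `fit_stump` and `boost` are assumptions for this example.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that minimizes squared error on the residuals."""
    best = None
    # Skip the largest x so the right-hand side of a split is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, rounds=20, learning_rate=0.5):
    """Fit stumps one after another, each on the errors left by the previous ones."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    def mse():
        return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

    history = [mse()]  # training error of the constant baseline
    for _ in range(rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
        history.append(mse())
    return predict, history


xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]  # a nonlinear (quadratic) trend
predict, history = boost(xs, ys)
print(history[0], history[-1])
```

The training error never increases from one round to the next because each stump is fit to what the earlier stumps got wrong. A random forest, by contrast, would train its trees independently and average their predictions.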
3. Max Model Pipelines to Evaluate
Enter the number of pipelines you want the AutoML tool to build from the chosen algorithms and evaluate against the objective function. You can evaluate 1–50 pipelines.
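The pipeline limit acts as a budget on the search. As a rough sketch, assuming each pipeline is a pairing of an algorithm with some setting, the budget could be enforced as follows. The `select_pipelines` helper and the setting names are hypothetical, and the tool's real pipeline generation is not described here.

```python
from itertools import islice, product

MIN_PIPELINES, MAX_PIPELINES = 1, 50  # range stated in the text above


def candidate_pipelines(algorithms, settings):
    """Return (algorithm, setting) pairs, cycling through the algorithms.

    A stand-in for real pipeline generation; the settings are hypothetical.
    """
    return ((algorithm, setting) for setting, algorithm in product(settings, algorithms))


def select_pipelines(algorithms, settings, max_pipelines):
    """Take at most max_pipelines candidates, rejecting out-of-range budgets."""
    if not MIN_PIPELINES <= max_pipelines <= MAX_PIPELINES:
        raise ValueError(
            f"max_pipelines must be between {MIN_PIPELINES} and {MAX_PIPELINES}"
        )
    return list(islice(candidate_pipelines(algorithms, settings), max_pipelines))


chosen = select_pipelines(
    ["Random Forest", "Linear"], ["small", "medium", "large"], max_pipelines=4
)
for pipeline in chosen:
    print(pipeline)
```

Cycling through the algorithms before moving to the next setting means a small budget still gives every selected algorithm a turn.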
4. Enable Data Checks
To enable data checks, check the box in this section. The tool uses the default data checks from EvalML.
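Data checks look for problems in the input before any model is trained. The sketch below shows the general idea with a few simple checks written in plain Python. The specific checks, the `null_threshold` value, and the `run_data_checks` function are illustrative assumptions and are not EvalML's API or its list of default checks.

```python
def run_data_checks(rows, target, null_threshold=0.95):
    """Return a list of warning strings for a table given as a list of dicts."""
    warnings = []
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        null_fraction = sum(v is None for v in values) / len(values)
        distinct = {v for v in values if v is not None}
        if column == target:
            # A target with gaps or a single value cannot be modeled reliably.
            if null_fraction > 0:
                warnings.append(f"target '{column}' has missing values")
            if len(distinct) < 2:
                warnings.append(f"target '{column}' has fewer than 2 distinct values")
        elif null_fraction >= null_threshold:
            warnings.append(f"column '{column}' is {null_fraction:.0%} missing")
        elif len(distinct) == 1:
            warnings.append(f"column '{column}' has no variance")
    return warnings


rows = [
    {"id": 1, "region": "west", "notes": None, "sold": 1},
    {"id": 2, "region": "west", "notes": None, "sold": 0},
    {"id": 3, "region": "west", "notes": None, "sold": None},
]
for warning in run_data_checks(rows, target="sold"):
    print(warning)
```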