Evaluate Model

Last modified: September 14, 2022

During the Evaluate Model stage, you apply holdout data, review general information about the model pipeline, assess performance, gain insights, and simulate potential cases your model might encounter in production.

To use holdout data we’ve prepared for you, built from the original data you provided, select Evaluate Model.

You must apply your holdout data by selecting Evaluate Model before you can see evaluations specific to your model in the General, Performance, Insights, and Simulations panels.


The General panel contains information that allows you to review your model at a high level.

  • The validation and holdout scores show how well the model has performed when scored against the validation and holdout data, based on the ranking metric you've selected.
  • The positive or negative score shows how much better or worse your model performed when compared with a baseline model, which simulates random guessing.
  • The Pipeline shows the different operations we used to build the model, along with the sequence of those operations. You can drag elements in the visualization to better understand them.
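To make the baseline comparison concrete, here is a minimal sketch in Python. It assumes accuracy as the ranking metric and a plain difference between the two scores. The product may compute the comparison differently, and all names and values below are hypothetical.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual values."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def improvement_over_baseline(model_score, baseline_score):
    """Positive when the model beats the baseline, negative when it does worse."""
    return model_score - baseline_score


# Hypothetical holdout labels and predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
model_predictions = [1, 0, 1, 0, 0, 0, 1, 1]
# Stand-in for a baseline that guesses at random (fixed here for repeatability).
baseline_predictions = [0, 1, 1, 0, 0, 1, 1, 0]

model_score = accuracy(actual, model_predictions)
baseline_score = accuracy(actual, baseline_predictions)
delta = improvement_over_baseline(model_score, baseline_score)
print(f"model={model_score:.2f} baseline={baseline_score:.2f} delta={delta:+.2f}")
```

A positive delta indicates the model outperformed the baseline on this metric.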


The Performance panel shows you in-depth information about how well your model performed.


For classification problems, the Metrics panel shows you a comparison of how the model performed based on different ranking metrics. To learn more about specific ranking metrics, select the book icon in the header or Learn More from the Performance panel.

The panel also displays a confusion matrix. The confusion matrix shows how frequently an algorithm's predicted values match the actual values in the training data. Use the confusion matrix to identify which categories the model predicts accurately.
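As an illustration of what a confusion matrix counts, the following sketch tallies (actual, predicted) pairs with the Python standard library. The labels and predictions are made up for the example and do not come from the product.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) pair occurs."""
    return Counter(zip(actual, predicted))


# Hypothetical labels and predictions.
actual = ["cat", "dog", "cat", "bird", "dog", "cat"]
predicted = ["cat", "dog", "dog", "bird", "dog", "cat"]

matrix = confusion_matrix(actual, predicted)
labels = sorted(set(actual) | set(predicted))

# Rows are actual categories, columns are predicted categories.
print("actual \\ predicted:", labels)
for a in labels:
    print(a, [matrix[(a, p)] for p in labels])
```

Counts on the diagonal are correct predictions. Off-diagonal counts show which categories the model confuses with one another.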

For binary classification problems, you can see an ROC Curve. The ROC (receiver operating characteristic) curve shows how well your model performs compared to a random guess. Use the ROC curve with the AUC (area under the curve) measure to identify how well your model makes predictions across classification thresholds. The AUC measure ranges from 0 to 1. 0 means all model predictions are incorrect, 1 means all model predictions are correct, and 0.5 corresponds to random guessing.
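One way to build intuition for AUC is its pairwise interpretation: the share of (positive, negative) pairs in which the model scores the positive case higher, with ties counted as half. The sketch below uses that definition with hypothetical scores. It is not how the product computes the value.

```python
def roc_auc(actual, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for a, s in zip(actual, scores) if a == 1]
    negatives = [s for a, s in zip(actual, scores) if a == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted probabilities.
actual = [1, 0, 1, 0]
auc_mixed = roc_auc(actual, [0.9, 0.2, 0.6, 0.7])
auc_perfect = roc_auc(actual, [0.9, 0.1, 0.8, 0.2])
auc_reversed = roc_auc(actual, [0.1, 0.9, 0.2, 0.8])
print(auc_mixed, auc_perfect, auc_reversed)
```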


For regression problems, the Metrics panel shows you a comparison of how the model performed based on different ranking metrics. To learn more about specific ranking metrics, select the book icon in the header or Learn More from the Performance panel.

Time series regression is an experimental feature. Experimental features offer early access to new functionality. We test and update experimental features on a regular basis.


The Insights panel helps you identify which features matter most to your model.

Feature Importance

We measure the importance of each feature by evaluating features against the holdout data. Use this measure to determine what features are most important. You can also use this measure to identify features that could put your model at risk of generalization error by associating too weakly or too strongly with the target.
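The page does not say which importance method is used. As one common approach, the sketch below shows permutation importance: shuffle one feature's values in the holdout data and measure how much the score drops. The model, feature names, and data are all hypothetical.

```python
import random


def model_predict(row):
    # Hypothetical stand-in model: predicts 1 when "signal" exceeds 0.5.
    return 1 if row["signal"] > 0.5 else 0


def accuracy(rows, targets):
    return sum(model_predict(r) == t for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(rows, targets, feature, seed=0):
    """Drop in accuracy after shuffling one feature's values across rows."""
    baseline = accuracy(rows, targets)
    shuffled = [r[feature] for r in rows]
    random.Random(seed).shuffle(shuffled)
    # Copy each row, replacing only the shuffled feature.
    permuted = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled)]
    return baseline - accuracy(permuted, targets)


# Hypothetical holdout data: "signal" drives the target, "noise" does not.
pairs = [(0.9, 3), (0.1, 7), (0.8, 1), (0.2, 5), (0.7, 2), (0.3, 9)]
holdout_rows = [{"signal": s, "noise": n} for s, n in pairs]
holdout_targets = [1, 0, 1, 0, 1, 0]

importances = {
    f: permutation_importance(holdout_rows, holdout_targets, f)
    for f in ("signal", "noise")
}
print(importances)
```

A feature the model ignores scores zero. A feature with a very large score relative to the others may deserve a closer look.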

Partial Dependence

The Partial Dependence plot uses the trained model to show the association between the feature you select and the target. For numeric features, the plot shows what happens to the target as the value of the feature changes. For categorical features, the plot shows the association between the categories and the target. Use the plot to learn what kinds of associations individual features have with the target. Switch from Absolute to Relative to make the visualization fit the data, rather than display the whole plot.
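The idea behind a partial dependence curve for a numeric feature can be sketched as follows: for each candidate value, set the feature to that value in every row, then average the model's predictions. The model and data here are hypothetical stand-ins, not the product's implementation.

```python
def model_predict(row):
    # Hypothetical stand-in for a trained regression model.
    return 2 * row["x"] + row["z"]


def partial_dependence(rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        # Override the feature in every row, leaving other features as they are.
        predictions = [model_predict(dict(r, **{feature: value})) for r in rows]
        curve.append(sum(predictions) / len(predictions))
    return curve


# Hypothetical data rows.
rows = [{"x": 5, "z": 1}, {"x": 7, "z": 2}, {"x": 9, "z": 3}]
grid = [0, 1, 2]
curve = partial_dependence(rows, "x", grid)
print(list(zip(grid, curve)))
```

Plotting the grid values against the averaged predictions gives the curve for that feature.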

Prediction Explanations

The Prediction Explanations panel tells you how the feature values for a single row explain the prediction. The panel shows you some representative rows along with the most important features for each of those rows. Different combinations of features can influence each prediction.

Note: Currently, Prediction Explanations are only available for binary classification problems.


The Simulations panel allows you to choose a row of data, then manipulate different features to see how they affect the prediction the model makes for that row.

You can select a specific row by its number, or you can have us pick a Random row for you.

Make sure to select Run each time you make a change to a feature.
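Conceptually, a simulation is a what-if query: copy a row, change one or more feature values, and score the modified row again. The sketch below illustrates this with a hypothetical scoring rule and made-up feature names.

```python
def predict(row):
    # Hypothetical scoring rule standing in for a trained model.
    score = row["tenure_years"] - 3 * row["late_payments"]
    return "renew" if score > 0 else "churn"


def simulate(row, **changes):
    """Return the prediction for a copy of `row` with some features changed."""
    modified = dict(row, **changes)
    return predict(modified)


# Hypothetical row selected for the simulation.
row = {"tenure_years": 5, "late_payments": 1}

original_prediction = predict(row)
simulated_prediction = simulate(row, late_payments=3)
print(original_prediction, "->", simulated_prediction)
```

The original row is left unchanged, so each simulated change can be compared against the same starting point.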
