Stepwise Tool

The Stepwise tool determines the best predictor variables to include in a model out of a larger set of potential predictor variables for linear, logistic, and other traditional regression models.

There are two basic approaches to implementing stepwise regression*. The first (known as backward selection) starts with a model that includes all variables thought to potentially influence the target variable, and then removes the least important variable from that model based on a goodness-of-fit measure that adjusts for the number of variables included in the model. This process continues, with other variables being removed in subsequent backward steps, until there are no further improvements in the adjusted fit measure.

The second basic approach (known as forward selection) starts with a model that includes only a constant, and then adds the one variable from the set of potential variables that provides the greatest improvement in the adjusted fit measure. This process is repeated in additional forward steps, and ends when there is no further improvement in the adjusted fit measure.

In backward selection, a variable that is removed never re-enters in subsequent steps, while in forward selection a variable is never removed once it has been added. A hybrid approach starts with a large ("maximal") initial model and a first backward step, but then evaluates both forward and backward movements in each subsequent step.
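The backward procedure described above can be sketched in a few lines of Python. This is an illustration only, not the tool's implementation (the tool itself relies on R). The function names `ols_rss`, `criterion`, and `backward_select` are hypothetical, the data is synthetic, and the fit measure is assumed to take the Gaussian linear-model form n * ln(RSS / n) + penalty * (number of coefficients), which equals the AIC or BIC up to an additive constant. The hybrid approach, which would also try re-adding dropped variables at each step, is not shown.

```python
import math
import random

def ols_rss(rows, y, cols):
    """Residual sum of squares for least squares of y on an intercept plus cols."""
    X = [[1.0] + [row[c] for c in cols] for row in rows]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * t for x, t in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(p):
        pivot = max(range(i, p), key=lambda r: abs(A[r][i]))
        A[i], A[pivot] = A[pivot], A[i]
        for r in range(p):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * b for a, b in zip(A[r], A[i])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((t - sum(b * v for b, v in zip(beta, x))) ** 2
               for x, t in zip(X, y))

def criterion(rows, y, cols, penalty):
    """Adjusted fit measure (lower is better); penalty=2 gives AIC, ln(n) gives BIC."""
    n = len(y)
    return n * math.log(ols_rss(rows, y, cols) / n) + penalty * (len(cols) + 1)

def backward_select(rows, y, cols, use_bic=False):
    """Drop one variable per step until no removal improves the criterion."""
    penalty = math.log(len(y)) if use_bic else 2.0
    current = list(cols)  # start from the "maximal" model
    best = criterion(rows, y, current, penalty)
    while current:
        # Score every model obtained by removing a single variable.
        score, drop = min(
            (criterion(rows, y, [c for c in current if c != d], penalty), d)
            for d in current
        )
        if score >= best:
            break  # no removal improves the adjusted fit measure
        best = score
        current.remove(drop)  # a removed variable never re-enters
    return current

# Synthetic data: y depends on x1 and x2; x3 is pure noise.
random.seed(0)
rows = [{"x1": random.gauss(0, 1), "x2": random.gauss(0, 1),
         "x3": random.gauss(0, 1)} for _ in range(200)]
y = [3 + 2 * r["x1"] - 1.5 * r["x2"] + random.gauss(0, 1) for r in rows]

selected = backward_select(rows, y, ["x1", "x2", "x3"])
print(selected)
```

Passing `use_bic=True` swaps the per-coefficient penalty from 2 to ln(n), which tends to remove more variables. Whether the noise variable `x3` is dropped in a given run depends on the sample drawn.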

The Alteryx R-based Stepwise tool supports both backward variable selection and mixed backward and forward variable selection. To use the tool, first create a "maximal" regression model that includes all of the variables you believe could matter, and then use the Stepwise tool to determine which of these variables should be removed based on an adjusted fit measure. The tool offers a choice of two adjusted fit measures: the Akaike information criterion** (AIC) and the Bayesian information criterion*** (BIC). These two measures are similar, but the BIC places a larger penalty on the number of variables included in the model, typically resulting in a final model with fewer variables than when the AIC is used.
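To make the difference in penalties concrete, the standard textbook definitions of the two criteria can be written out as below. This is a minimal sketch: the function names `aic` and `bic` are illustrative, and the log-likelihood, parameter count, and sample sizes are made-up numbers rather than output from the tool.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: each of the k parameters costs 2."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion: each parameter costs ln(n)."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical model: log-likelihood of -100 with 5 estimated parameters.
loglik, k = -100.0, 5

print(aic(loglik, k))        # penalty term is 2 * 5
print(bic(loglik, k, 1000))  # penalty term is 5 * ln(1000)
```

Because ln(n) exceeds 2 whenever n is 8 or more, the BIC charges more per added variable than the AIC on all but the smallest datasets, which is why it usually leads to a smaller final model.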

If the input data comes from a regular Alteryx data stream, the applicable open source R function is used for model estimation. If the input comes from either an XDF Output Tool or an XDF Input Tool, the appropriate Revo ScaleR function is used instead. The advantage of the Revo ScaleR based function is that it allows much larger (out of memory) datasets to be analyzed, at the cost of not being able to create some of the model diagnostic output that is available with the open source R functions.

This tool uses the R programming language. Go to Options > Download Predictive Tools to install R and the packages used by the R Tool.

Inputs

Configuration Properties

Graphics Options

Output

*https://en.wikipedia.org/wiki/Stepwise_regression
**https://en.wikipedia.org/wiki/Akaike_information_criterion
***https://en.wikipedia.org/wiki/Bayesian_information_criterion