Importance Weights Tool
The Importance Weights tool provides methods for selecting a set of variables to use in a predictive model, based on how strongly each potential predictor is related to the target variable of the model to be created.
The final set can consist of the N predictors most strongly related to the target, or you can set a cutoff importance weight level so that only variables whose weight exceeds the cutoff are included in the model.
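The two selection rules can be sketched as follows. This is a minimal illustration, assuming the weights have already been computed and that a larger weight means a stronger relationship. The function names and example weights are hypothetical and are not part of the tool.

```python
from typing import Dict, List


def top_n(weights: Dict[str, float], n: int) -> List[str]:
    """Return the names of the n predictors with the largest importance weights."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    return ranked[:n]


def above_cutoff(weights: Dict[str, float], cutoff: float) -> List[str]:
    """Return the predictors whose weight exceeds the cutoff, strongest first."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    return [name for name in ranked if weights[name] > cutoff]


# Hypothetical weights for four candidate predictors.
weights = {"age": 0.42, "income": 0.31, "region": 0.05, "tenure": 0.18}
print(top_n(weights, 2))            # ['age', 'income']
print(above_cutoff(weights, 0.10))  # ['age', 'income', 'tenure']
```

The cutoff test uses a strict comparison because the rule keeps only variables that exceed the cutoff.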
One drawback to this approach is that it only looks at the strength of each potential predictor's relationship with the target in isolation, ignoring possible interaction effects and correlation between predictors. Despite this limitation, this type of variable filtering method is frequently used in practice.
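A small worked example of the interaction problem: a target defined as the XOR of two binary predictors. Each predictor on its own has zero Pearson correlation with the target, so a filter that scores predictors one at a time would rank both as useless, even though together they determine the target exactly. The data and the helper function are illustrative only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Every combination of two binary predictors; the target is their XOR.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [a ^ b for a, b in zip(x1, x2)]  # [0, 1, 1, 0]

print(pearson(x1, y))  # 0.0: x1 alone looks unrelated to y
print(pearson(x2, y))  # 0.0: x2 alone looks unrelated to y
# Yet the pair of predictors reproduces y exactly.
print(all((a ^ b) == t for a, b, t in zip(x1, x2, y)))  # True
```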
There are a number of different importance weight measures, and the applicability of a particular measure typically depends on the types of both the target and the predictor (numeric or categorical). One drawback of this is that the measures used to determine the relative importance of possible predictors differ for numeric and categorical variables, so their weights cannot be compared directly. The exception is the Relief method, but its performance is not as robust as that of methods specific to a particular combination of target type and predictor type.
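The dependence of the measure on the target and predictor types can be written as a lookup table. The mapping below is transcribed from the measure lists in this document's configuration section (it is not queried from the tool), and the helper names are hypothetical. Intersecting the lists shows that Relief is the only measure offered for every combination.

```python
from typing import Dict, List, Set, Tuple

# (target type, predictor type) -> measures listed for that combination.
MEASURES: Dict[Tuple[str, str], List[str]] = {
    ("continuous", "continuous"): [
        "Pearson correlation",
        "Spearman (rank order) correlation",
        "Relief",
    ],
    ("continuous", "categorical"): [
        "Conditional mean (Pearson) correlation",
        "Relief",
    ],
    ("categorical", "continuous"): [
        "Entropy information gain",
        "Entropy gain ratio",
        "Entropy symmetric uncertainty",
        "Relief",
    ],
    ("categorical", "categorical"): ["Cramer's V (chi-squared)", "Relief"],
}


def available_measures(target_type: str, predictor_type: str) -> List[str]:
    """Look up the measures offered for one target/predictor type combination."""
    return MEASURES[(target_type, predictor_type)]


def common_measures() -> Set[str]:
    """Measures offered for every combination of target and predictor type."""
    return set.intersection(*(set(m) for m in MEASURES.values()))


print(available_measures("continuous", "categorical"))
print(common_measures())  # {'Relief'}
```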
Most of the measures are provided by the FSelector R package. This package makes use of some methods written in Java, so to use this macro, you will need to have a Java 7 runtime environment on the machine where Alteryx is installed.
Gallery tool
This tool is not automatically installed with Alteryx Designer or the R tools. To use this tool, download it from the Alteryx Analytics Gallery.
Connect an input
The tool requires an Alteryx data stream that contains both the desired target variable and a set of potential predictor variables that will be used to estimate a predictive model.
Configure the tool
- Continuous target: Select this option if the target variable you want to predict is numeric. You then select the target field and choose whether to consider continuous (numeric) or categorical (string variables with category labels) potential predictors. Finally, select the set of predictors of that type you want to examine and one or more importance weight measures.
  For a continuous target and continuous predictors, the available measures are:
  - Pearson correlation
  - Spearman (rank order) correlation
  - Relief, which uses the RRELIEFF algorithm. The user can set both the number of near neighbors (Neighbor's count) and the sample size (Sample size) used to calculate the RRELIEFF measure.
  For a continuous target and categorical predictors, the available measures are:
  - Conditional mean (Pearson) correlation. This measure calculates the mean of the target variable for each level (category) of the categorical predictor, and then calculates the Pearson correlation between the actual target values and the corresponding category means.
  - Relief, which uses the RRELIEFF algorithm, with the same Neighbor's count and Sample size options.
- Categorical target: Select this option if the target variable you want to predict is categorical. You then select the target field and choose whether to consider continuous (numeric) or categorical (string variables with category labels) potential predictors. Finally, select the set of predictors of that type you want to examine and one or more importance weight measures.
  For a categorical target and continuous predictors, the available measures are:
  - Entropy information gain
  - Entropy gain ratio
  - Entropy symmetric uncertainty
  - Relief, which uses the RRELIEFF algorithm, with the same Neighbor's count and Sample size options.
  For a categorical target and categorical predictors, the available measures are:
  - Cramer's V (chi-squared)
  - Relief, which uses the RRELIEFF algorithm, with the same Neighbor's count and Sample size options.
Columns containing unique identifiers, such as surrogate primary keys and natural primary keys, should not be used in statistical analyses. They have no predictive value and can cause runtime exceptions.
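The measures for a continuous target can be sketched in plain Python. This is a minimal illustration of the textbook definitions (Spearman as the Pearson correlation of ranks, with tied values given their average rank), not the FSelector code the tool calls. The function names and sample data are hypothetical, and Relief is omitted.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank-order correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def conditional_mean_correlation(categories, y):
    """Pearson correlation between y and the mean of y within each row's category."""
    totals, counts = defaultdict(float), defaultdict(int)
    for c, v in zip(categories, y):
        totals[c] += v
        counts[c] += 1
    fitted = [totals[c] / counts[c] for c in categories]
    return pearson(fitted, y)


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson(x, y), 3))   # 0.981: strong but not perfectly linear
print(round(spearman(x, y), 3))  # 1.0: perfectly monotonic
print(round(conditional_mean_correlation(["a", "a", "b", "b"], [1.0, 3.0, 10.0, 12.0]), 3))  # 0.976
```

The example shows why both correlations are offered: a monotonic but curved relationship scores a perfect Spearman correlation while its Pearson correlation is below 1.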
View the output
- D anchor: Consists of a table that provides the selected importance weight value for each potential predictor.
- R anchor: Consists of report snippets that indicate the target field (and its type) and the type of the potential predictor fields along with the table of the selected importance weight value for each potential predictor.
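The D output is a table with one importance weight per potential predictor. The sketch below assembles a similar table for a categorical target using textbook definitions of the entropy-based measures (logarithms in base 2 here) and Cramer's V. It is not the FSelector code the tool runs; the function names and sample data are hypothetical. These measures operate on discrete values, so a continuous predictor would have to be binned first, and how the tool does that is not described here.

```python
from collections import Counter
from math import log2, sqrt


def entropy(values):
    """Shannon entropy H(V) in bits."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(y, x):
    """H(Y | X): entropy of y within each level of x, weighted by level frequency."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    return sum(len(g) / len(y) * entropy(g) for g in groups.values())


def information_gain(y, x):
    return entropy(y) - conditional_entropy(y, x)


def gain_ratio(y, x):
    hx = entropy(x)
    return information_gain(y, x) / hx if hx > 0 else 0.0


def symmetric_uncertainty(y, x):
    denom = entropy(x) + entropy(y)
    return 2 * information_gain(y, x) / denom if denom > 0 else 0.0


def cramers_v(y, x):
    """Cramer's V from the chi-squared statistic of the y-by-x contingency table."""
    n = len(y)
    joint, y_counts, x_counts = Counter(zip(y, x)), Counter(y), Counter(x)
    chi2 = 0.0
    for a in y_counts:
        for b in x_counts:
            expected = y_counts[a] * x_counts[b] / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    k = min(len(y_counts), len(x_counts)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


target = ["yes", "yes", "no", "no", "yes", "no"]
predictors = {
    "plan": ["gold", "gold", "basic", "basic", "gold", "basic"],
    "region": ["n", "s", "n", "s", "n", "s"],
}

print(f"{'predictor':<10}{'info_gain':>10}{'gain_ratio':>11}{'sym_unc':>9}{'cramers_v':>10}")
for name, x in predictors.items():
    print(f"{name:<10}{information_gain(target, x):>10.3f}{gain_ratio(target, x):>11.3f}"
          f"{symmetric_uncertainty(target, x):>9.3f}{cramers_v(target, x):>10.3f}")
```

In this toy data, "plan" determines the target completely and scores 1.0 on every measure, while "region" is only weakly associated with it.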