Predictive Analytics

Version: Current
Last modified: March 26, 2020

Designer includes a suite of predictive tools that use R, an open-source programming language used for statistical and predictive analysis.

The tools cover data exploration, specialized elements of data preparation for predictive analytics, predictive modeling, tools to compare and assess the efficacy of different models, tools to group records and fields in systematic ways, and tools to help in deploying predictive analytics solutions.

To install R and the packages used by the R Tool, go to Options > Download Predictive Tools and sign in to the Alteryx Downloads and Licenses portal.

In-Database Support for Predictive Analytics Tools

Six predictive tools have in-database support.

When a predictive tool with in-database support is placed on the canvas with another In-DB tool, the predictive tool automatically changes to the In-DB version. To change the version of the tool, right-click the tool, point to Choose Tool Version, and select a different version of the tool. See In-Database Overview for more information about in-database support and tools.

Tool                      Microsoft SQL Server 2016  Oracle  Teradata
Boosted Model Tool        Yes                        -       -
Decision Tree Tool        Yes                        -       -
Forest Model Tool         Yes                        -       -
Linear Regression Tool    Yes                        Yes     Yes
Logistic Regression Tool  Yes                        Yes     Yes
Score Tool                Yes                        Yes     Yes

Predictive Analytics Tools

Data Investigation Tools

This category contains tools for better understanding the data to be used in a predictive analytics project, and tools for specialized data sampling tasks for predictive analytics. The tools for understanding the data include both visualization tools and tools that provide tables of descriptive statistics.

The tools that help a user better understand the data through visual methods are:

Field Summary Tool
Heat Plot Tool
Histogram Tool
Plot of Means Tool
Scatterplot Tool
Violin Plot Tool

The tools that provide useful summary statistics to help the user to better understand the data being analyzed are:

Association Analysis Tool
Basic Data Profile Tool
Contingency Table Tool
Distribution Analysis Tool
Frequency Table Tool
Importance Weights Tool
Pearson Correlation Tool
Spearman Correlation Tool

Predictive Tools

This category includes tools for general predictive modeling for both classification (categorical target field) and regression (numeric target field) models, as well as tools for model comparison and for hypothesis testing that is relevant for predictive modeling. The set of tools for general predictive modeling can be further broken down into traditional statistical models, and more modern statistical learning methods. A single Score tool provides a mechanism for obtaining model predictions from both types of general predictive modeling tools.

An important distinction between the traditional statistical models and the more modern statistical learning methods is the level of direct user intervention in the modeling process. The traditional statistical models require a much greater level of user intervention and expertise in order to develop a model with an adequate level of predictive efficacy. Specifically, the user must pre-select the important predictor fields, and will likely need to apply appropriate transformations to the numeric fields to capture non-linear effects between the target field and the continuous predictors. Selecting the important predictors (ignoring possible problems due to non-linear relationships) can be assisted through the use of stepwise regression for the traditional models. In contrast, the modern statistical learning methods make use of algorithms that internally address both predictor selection and possible non-linear relationships between the target and numeric predictors.
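The need to transform a numeric predictor by hand can be illustrated with a minimal sketch. It is written in Python rather than R purely for illustration and does not reflect how any Designer tool is implemented. The data are made up, and `fit_line` is a hypothetical helper: a straight-line fit on the raw predictor misses the curvature, while the same fit on the log-transformed predictor captures it.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot

# Made-up data: the target grows with the logarithm of the predictor.
xs = [1, 2, 4, 8, 16, 32, 64]
ys = [3.0 + 2.0 * math.log(x) for x in xs]

# Fit on the raw predictor, then on the user-chosen log transformation.
_, _, r2_raw = fit_line(xs, ys)
_, slope_log, r2_log = fit_line([math.log(x) for x in xs], ys)

print(f"R-squared, raw predictor: {r2_raw:.3f}")
print(f"R-squared, log predictor: {r2_log:.3f}")
```

In a traditional model the analyst has to spot the need for the log transformation; the statistical learning methods are designed to pick up this kind of curvature without it.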

The traditional statistical models differ from one another based on the nature of the target field that is being predicted. All of them are based on estimating (generalized) linear models. While all of the statistical learning algorithms share the property of internally handling predictor selection and non-linear effects, they differ in their approaches. As a result, no single method outperforms all others across the set of problems a user might encounter.

Tools for Traditional Statistical Models

Count Regression Tool
Gamma Regression Tool
Linear Regression Tool
Logistic Regression Tool
Naive Bayes Classifier Tool
Neural Network Tool
Stepwise Tool
Support Vector Machine Tool

Tools for Modern Statistical Learning Methods

Boosted Model Tool
Decision Tree Tool
Forest Model Tool
Spline Model Tool

Tools for Predictive Model Comparison and Hypothesis Testing

Cross-Validation Tool
Lift Chart Tool
Model Coefficients Tool
Model Comparison Tool
Nested Test Tool
Test of Means Tool
Variance Inflation Factors Tool

Tool for Predicting Values for All General Predictive Modeling Tools

Score Tool

Tool for Creating Interactive Network Visualizations and Key Summary Statistics

Network Analysis Tool

Tools for Generating Survival Models and Estimating Relative Risk and Restricted Mean Survival Time

Survival Analysis Tool
Survival Score Tool

AB Testing Tools

The AB Testing tools assist the user in carrying out A/B Testing (also known as test and learn) experiments, such as examining the effect of a new marketing communications campaign on sales, or the effect of changing store staffing levels. The tools can help in determining market areas for a test (usually for one that involves mass media advertising where everyone residing in that area can potentially be exposed to the advertising), matching one or more control units to each treatment unit, developing trend and seasonality measures upon which the matching of controls to treatments is often based, and doing the actual analysis of the experimental results. The tools associated with this subcategory are:

AB Analysis Tool
AB Controls Tool
AB Treatments Tool
AB Trend Tool
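The idea of matching control units to each treatment unit on trend and seasonality measures can be sketched as a nearest-distance lookup. This is an illustrative Python sketch under simplifying assumptions, not the algorithm used by the AB Controls tool: the store names, the two measures, and the `match_controls` helper are all hypothetical, and the same control may be matched to more than one treatment.

```python
import math

def match_controls(treatments, candidates, per_treatment=2):
    """Pick the closest candidate units for each treatment unit.

    Both arguments map a unit name to a tuple of matching measures
    (here: a trend measure and a seasonality measure).
    """
    matches = {}
    for name, measures in treatments.items():
        # Rank candidates by Euclidean distance to the treatment unit.
        ranked = sorted(candidates, key=lambda c: math.dist(measures, candidates[c]))
        matches[name] = ranked[:per_treatment]
    return matches

# Made-up measures for one treatment store and four candidate controls.
treatments = {"store_07": (0.12, 0.80)}
candidates = {
    "store_01": (0.11, 0.78),
    "store_02": (0.40, 0.20),
    "store_03": (0.13, 0.83),
    "store_04": (-0.05, 0.60),
}

matches = match_controls(treatments, candidates)
print(matches)
```

In practice the measures would usually be put on a common scale before distances are computed, so that no single measure dominates the match.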

Time Series Tools

This category contains a number of univariate time series plotting and forecasting tools for regular data (regular in terms of the time interval, such as monthly). Central among these are tools for creating ARIMA and extended exponential smoothing forecasting models, which can be used to create items such as a weekly sales forecasting model. Both methods develop forecasts based on systematic, time-related elements in the values of the target variable. Specifically, they pick up elements of trend (longer-term, fairly consistent upward or downward movement in the target variable) and seasonality (cyclical patterns that repeat over time).

To provide a concrete example of these elements, a time series model of tablet computer sales would likely reveal a positive trend in sales along with a strong seasonal pattern of higher sales near Christmas and before the start of the school year. If neither trend nor seasonality is present in the target variable, the forecast values are likely to fall on a straight line based on a weighted mean of the most recent values of the target. This is likely to be an unsatisfying finding for a user, but it indicates that there is no real structure in the data with respect to purely time-related elements (trend and seasonality). In these cases, more general predictive modeling methods may be more useful in developing forecasts than the time series tools.
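The flat forecast described above is what simple exponential smoothing produces when it has no trend or seasonal component. The following Python sketch is illustrative only and is not the implementation behind the ETS or ARIMA tools; the sales figures and the smoothing weight `alpha` are made-up assumptions.

```python
def simple_exponential_smoothing(values, alpha, horizon):
    """Return `horizon` forecasts from simple exponential smoothing.

    The smoothed level is a weighted mean of the observations in which
    recent values receive the most weight.
    """
    level = values[0]
    for value in values[1:]:
        level = alpha * value + (1 - alpha) * level
    # With no trend or seasonal component, every future period gets the
    # same forecast: the final level.
    return [level] * horizon

weekly_sales = [102, 98, 101, 99, 100, 103, 97, 100]  # made-up numbers
forecast = simple_exponential_smoothing(weekly_sales, alpha=0.3, horizon=4)
print(forecast)
```

Every forecast period carries the same value, which is the straight line a user would see for a series with no time-related structure.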

In addition to tools for creating forecasts, there are tools to help the user compare the relative efficacy of different time series forecasting models. The complete set of time series tools includes:

ARIMA Tool
ETS Tool
TS Compare Tool
TS Covariate Forecast Tool
TS Filler Tool
TS Forecast Tool
TS Forecast Factory Tool
TS Model Factory Tool
TS Plot Tool

Predictive Grouping Tools

This category contains tools to group either records or fields into a smaller number of groups. Common applications of grouping records are creating customer segments based on purchasing patterns and creating a set of store groups. The ultimate objective of grouping in these two areas is to create a smaller number of groups that allows programs and activities to be customized in a way that is feasible from a business perspective.

For example, a retailer that has 500 outlets in their network would likely find it overwhelming to develop a merchandising and pricing program that was specific for each of the 500 outlets. However, if the outlets are placed into a smaller set of store groups (say 10) based on the similarity of the outlets with respect to their sales patterns, creating 10 different merchandising and pricing programs is something the retailer can successfully implement. Similarly, many organizations have database tables they wish to analyze that are very wide, with many of the fields highly correlated with one another. In these cases, dealing with a large number of highly correlated measures greatly complicates any analyses done with these data. As a result, it may make sense to reduce the original set of fields into a smaller set of composite fields that more readily lend themselves to analysis. In both these instances, there is a need to reduce the dimensionality of the data to make it actionable.

The most common method used to group records together is cluster analysis. There are many different types of cluster analysis, but by far the most commonly used clustering methods in business applications are based on K-Centroids algorithms. Alteryx provides tools for determining the appropriate number of clusters (groups) to form, creating the final set of clusters, and appending to the data the cluster each record belongs to (regardless of whether the record was used in determining the set of clusters). A related tool (Find Nearest Neighbors) allows the user to form ad hoc groups of a given size around one or more specific records. For instance, the tool lets the user find the five customers most like customer "X" based on past purchase behavior. The method available for grouping fields is principal components.
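As a rough illustration of the clustering workflow, the sketch below runs K-means (one member of the K-Centroids family) on made-up two-dimensional store data, then assigns a record that was not used to build the clusters. It is a plain Python sketch with fixed starting centroids, not the code behind the K-Centroids Cluster Analysis or Append Cluster tools; `nearest` and `k_means` are hypothetical helper names.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep it in place
        # if no record was assigned to it.
        centroids = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

# Made-up store measures, for example two scaled sales-pattern scores.
stores = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.2), (5.3, 4.9), (4.8, 5.1)]
centroids = k_means(stores, centroids=[stores[0], stores[3]])

# Append a cluster label to every record, including one that was not
# used to determine the clusters.
labels = [nearest(s, centroids) for s in stores]
new_store_label = nearest((4.9, 5.0), centroids)
print(labels, new_store_label)
```

The final step mirrors the append step described above: once the centroids are fixed, any record can be labeled with its closest cluster.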

The Market Basket Analysis tools help determine which items go together in point-of-sale data, or which combinations of problems tend to co-occur in defect reporting and work order systems. The tools in the category determine the set of "rules" in the data (such as "Product defect A is more likely to be present when product defects B and C are also observed"), and provide filtering tools that narrow down the list of possible rules using criteria that indicate which rules are likely to be of practical importance.
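Rules of this kind are commonly screened with three measures: support, confidence, and lift. The Python sketch below computes them for the example rule "defects B and C imply defect A" on made-up defect reports. It assumes these standard definitions and is not taken from the MB Rules or MB Inspect tools; `rule_metrics` is a hypothetical helper that expects the antecedent to occur at least once.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    n = len(transactions)
    antecedent, consequent = set(antecedent), set(consequent)
    n_a = sum(antecedent <= t for t in transactions)   # antecedent present
    n_c = sum(consequent <= t for t in transactions)   # consequent present
    n_both = sum((antecedent | consequent) <= t for t in transactions)
    support = n_both / n                # share of records matching the rule
    confidence = n_both / n_a           # P(consequent | antecedent)
    lift = confidence / (n_c / n)       # confidence relative to the base rate
    return support, confidence, lift

# Made-up defect reports: each set lists the defects seen on one unit.
reports = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"B", "C"},
    {"A"},
    {"B"},
    {"C"},
    {"A", "B", "C"},
    {"D"},
]

support, confidence, lift = rule_metrics(reports, {"B", "C"}, {"A"})
print(support, confidence, lift)
```

A lift above 1 means the consequent is more likely when the antecedent is present than it is overall, which is one way to filter a long list of candidate rules.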

Tools in this category include:

Append Cluster Tool
Find Nearest Neighbors Tool
K-Centroids Cluster Analysis Tool
K-Centroids Diagnostics Tool
MB Affinity Tool
MB Inspect Tool
MB Rules Tool
Multidimensional Scaling Tool
Principal Components Tool

Prescriptive Tools

This category includes tools that help determine the best course of action or outcome for a particular situation or set of scenarios. They can augment the output of predictive models by prescribing an optimal action.

Optimization Tool
Simulation Sampling Tool
Simulation Scoring Tool
Simulation Summary Tool
