Predictive Analytics

Designer includes a suite of predictive tools that use R, an open-source code base used for statistical and predictive analysis.

The tools cover data exploration, specialized elements of data preparation for predictive analytics, predictive modeling, tools to compare and assess the efficacy of different models, tools to group records and fields in systematic ways, and tools to help in deploying predictive analytics solutions.

The predictive tools use the R programming language. Go to Options > Download Predictive Tools and sign in to the Alteryx Downloads and Licenses portal to install R and the packages used by the R Tool.

In-Database Support for Predictive Analytics Tools

There are six predictive tools that have in-database support.

When a predictive tool with in-database support is placed on the canvas with another In-DB tool, the predictive tool automatically changes to the In-DB version. To change the version of the tool, right-click the tool, point to Choose Tool Version, and click a different version of the tool. See In-Database Overview for more information about in-database support and tools.

	Microsoft SQL Server 2016	Oracle	Teradata
Boosted Model Tool	Yes
Decision Tree Tool	Yes
Forest Model Tool	Yes
Linear Regression Tool	Yes	Yes	Yes
Logistic Regression Tool	Yes	Yes	Yes
Score Tool	Yes	Yes	Yes

Predictive Analytics Tools

Data Investigation

This tool category contains tools for both better understanding the data to be used in a predictive analytics project, and tools for doing specialized data sampling tasks for predictive analytics. The tools to better understand the data being used in a predictive analytics project include both visualization tools and tools that provide tables of descriptive statistics.

The tools that help a user to better understand the data to be analyzed using visual methods are:

Field Summary
Heat Plot
Histogram
Plot of Means
Scatterplot
Violin Plot

The tools that provide useful summary statistics to help the user to better understand the data being analyzed are:

Association Analysis
Basic Data Profile
Contingency Table
Distribution Analysis
Frequency Table
Importance Weights
Pearson Correlation
Spearman Correlation

Predictive

This category includes tools for general predictive modeling for both classification (categorical target field) and regression (numeric target field) models, as well as tools for model comparison and for hypothesis testing that is relevant for predictive modeling. The set of tools for general predictive modeling can be further broken down into traditional statistical models, and more modern statistical learning methods. A single Score tool provides a mechanism for obtaining model predictions from both types of general predictive modeling tools.

An important distinction between the traditional statistical models and the more modern statistical learning methods is the level of direct user intervention in the modeling process. The traditional statistical models require a much greater level of user intervention and expertise in order to develop a model with an adequate level of predictive efficacy. Specifically, the user must pre-select the important predictor fields, and will likely need to apply appropriate transformations to the numeric fields to capture non-linear effects between the target field and the continuous predictors. Selecting the important predictors (ignoring possible problems due to non-linear relationships) can be assisted through the use of stepwise regression for the traditional models. In contrast, the modern statistical learning methods make use of algorithms that internally address both predictor selection and possible non-linear relationships between the target and numeric predictors.

The traditional statical models differ from one another based on the nature of the target field that is being predicted. All of them are based on estimating (generalized) linear models. While all of the statistical learning algorithms do have the same property of internally handling predictor selection and non-linear effects, they do differ in their approaches. As a result, no single method outperforms all others across the set of problems a user might encounter.

Tools for traditional statistical models

Count Regression
Gamma Regression
Linear Regression
Logistic Regression
Naïve Bayes Classifier
Neural Network
Stepwise
Support Vector Machine

Tools for the modern statistical learning method

Boosted Model
Decision Tree Model
Forest Model
Spline Model

Tools for predictive model comparison and hypothesis testing

Cross-Validation
Lift Chart (applicable to binary classification models)
Model Coefficients
Model Comparison
Nested Test
Test of Means
Variance Inflation Factors

Tool for predicting values for all general predictive modeling tools

Score

Tool for creating interactive network visualizations and key summary statistics

Network Analysis

Tools for generating survival models and estimating relative risk and restricted mean survival time

Survival Analysis
Survival Score

AB Testing

The AB Testing tools assist the user in carrying out A/B Testing (also known as test and learn) experiments, such as examining the effect of a new marketing communications campaign on sales, or the effect of changing store staffing levels. The tools can help in determining market areas for a test (usually for one that involves mass media advertising where everyone residing in that area can potentially be exposed to the advertising), matching one or more control units to each treatment unit, developing trend and seasonality measures upon which the matching of controls to treatments is often based, and doing the actual analysis of the experimental results. The tools associated with this subcategory are:

AB Analysis
AB Controls
AB Treatments
AB Trend

Time Series

This category contains a number of regular (in terms of the data time interval, such as monthly), univariate times series plotting and forecasting tools. Central among these are tools for creating ARIMA and extended exponential smoothing forecasting models which can be used to create items such as a weekly sales forecasting model. Both of these methods develop forecasts based on systematic, time related elements in the values of the target variable. Specifically, they pick up elements of trend (longer term, fairly consistent upward or downward movement in the target variable) and seasonality (cyclical patterns that repeat over time).

To provide a concrete example of these elements, a time series model of tablet computer sales would likely reveal a positive trend in sales along with a strong seasonal pattern of higher sales near Christmas and before the start of the school year. If neither trend or seasonality is present in the target variable, the forecast values of the target variable are likely to fall on a straight line based on the weighted mean value of the target for the most recent values of the target. This is likely to be an unsatisfying finding for a user, but it indicates that there is no real structure in the data with respect to only time related elements (trend and seasonality). In these cases, more general predictive modeling methods may be more useful in developing forecasts than the time series tools.

In addition to tools for creating forecasts, there are tools to help the user compare the relative efficacy of different times series forecasting models. The complete set of time series tools includes:

ARIMA
ETS (exponential smoothing)
TS Compare
TS Covariate Forecast
TS Filler
TS Forecast
TS Forecast Factory
TS Model Factory
TS Plot

Predictive Grouping

This category contains tools to group either records or fields into a smaller number of groups. Common applications for grouping records together is to create customer segments based on purchasing patterns or creating a set of store groups. The ultimate objective of grouping in these two areas is to create a smaller number of groups that allows for the customization of programs and activities in a way that is feasible from a business perspective.

For example, a retailer that has 500 outlets in their network would likely find it overwhelming to develop a merchandising and pricing program that was specific for each of the 500 outlets. However, if the outlets are placed into a smaller set of store groups (say 10) based on the similarity of the outlets with respect to their sales patterns, creating 10 different merchandising and pricing programs is something the retailer can successfully implement. Similarly, many organizations have database tables they wish to analyze that are very wide, with many of the fields highly correlated with one another. In these cases, dealing with a large number of highly correlated measures greatly complicates any analyses done with these data. As a result, it may make sense to reduce the original set of fields into a smaller set of composite fields that more readily lend themselves to analysis. In both these instances, there is a need to reduce the dimensionality of the data to make it actionable.

The most common method used to group records together is cluster analysis. There are actually many different types of cluster analysis, but by far the most commonly used clustering methods in business applications are based on K-Centroids algorithms. Alteryx provides tools to help determine the appropriate number of clusters (groups) that should be formed, creating the final set of clusters, and appending the cluster a particular record belongs to (regardless of whether the record was used in determining the set of clusters) to the data. A related tool (Find Nearest Neighbors) allows the user to forms ad hoc groups of a given size around one or more specific records. For instance, the tools provides the user with the ability to find the five customers most like customer "X" based on past purchase behavior. The method available for grouping fields is principal components.

The Market Basket Analysis tools help determine what items go together in point of sales data, or the combination of problems tend to co-occur in defect reporting and work order systems. The tools in the category determine the set of "rules" in the data (such as "Product defect A is more likely to be present when product defects B and C are also observed"), and provide filtering tools to help narrow down the list of possible rules based on a set of criteria that are associated with rules that are more likely to make them practically more important.

Tools in this category include:

Append Cluster
Find Nearest Neighbors
K-Centroids Cluster Analysis
K-Centroids Diagnostics
MB Affinity
MB Inspect
MB Rules
Multidimensional Scaling
Principal Components

Prescriptive

This category includes tools that can assist with determining the best course of action or outcome for a particular situation or set of scenarios. It can help augment the output of predictive models by prescribing an optimal action.

Optimization
Simulation Sampling
Simulation Scoring
Simulation Summary