Predictive Analytics

Alteryx Designer includes a suite of predictive tools that use R, an open-source programming language for statistical and predictive analysis.

The tools cover data exploration, specialized elements of data preparation for predictive analytics, predictive modeling, tools to compare and assess the efficacy of different models, tools to group records and fields in systematic ways, and tools to help in deploying predictive analytics solutions.

To install R and the packages used by the R tool, go to Options > Download Predictive Tools and sign in to the Alteryx Downloads and Licenses portal.

In-Database Support

There are six predictive tools that have in-database support.

When a predictive tool with in-database support is placed on the canvas with another In-Database (In-DB) tool, the predictive tool automatically changes to the In-DB version. To change the version of the tool, right-click the tool, point to Choose Tool Version, and select a different version. See In-Database Overview for more information about in-database support and tools.

Tool	Microsoft SQL Server 2016	Oracle	Teradata
Boosted Model Tool	Yes	-	-
Decision Tree Tool	Yes	-	-
Forest Model Tool	Yes	-	-
Linear Regression Tool	Yes	Yes	Yes
Logistic Regression Tool	Yes	Yes	Yes
Score Tool	Yes	Yes	Yes

Predictive Analytics Tools

Data Investigation Tools

This category contains tools for understanding the data to be used in a predictive analytics project and for performing specialized data sampling tasks. The tools for understanding the data include visualization tools and tools that provide tables of descriptive statistics.

The tools that help the user understand the data through visual methods are:

Field Summary Tool

Heat Plot Tool

Histogram Tool

Plot of Means Tool

Scatterplot Tool

Violin Plot Tool

The tools that provide summary statistics to help the user understand the data are:

Association Analysis Tool

Basic Data Profile Tool

Contingency Table Tool

Distribution Analysis Tool

Frequency Table Tool

Importance Weights Tool

Pearson Correlation Tool

Spearman Correlation Tool

Predictive Tools

This category includes tools for general predictive modeling for both classification (categorical target field) and regression (numeric target field) models, as well as tools for model comparison and for hypothesis testing relevant to predictive modeling. The general predictive modeling tools can be further divided into traditional statistical models and more modern statistical learning methods. A single Score tool provides a mechanism for obtaining model predictions from both types of general predictive modeling tools.

An important distinction between the traditional statistical models and the more modern statistical learning methods is the level of direct user intervention in the modeling process. The traditional statistical models require a much greater level of user intervention and expertise in order to develop a model with an adequate level of predictive efficacy. Specifically, the user must pre-select the important predictor fields, and will likely need to apply appropriate transformations to the numeric fields to capture non-linear effects between the target field and the continuous predictors. Selecting the important predictors (ignoring possible problems due to non-linear relationships) can be assisted through the use of stepwise regression for the traditional models. In contrast, modern statistical learning methods make use of algorithms that internally address both predictor selection and possible non-linear relationships between the target and numeric predictors.

Traditional statistical models differ from one another based on the nature of the target field being predicted. All of them are based on estimating (generalized) linear models. While all statistical learning algorithms have the same property of internally handling predictor selection and non-linear effects, they differ in their approaches. As a result, no single method outperforms all others across the set of problems a user might encounter.
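
The role of predictor transformations in traditional models can be illustrated with a small sketch. It is not how the Alteryx tools are implemented; the data is made up, and the names fit_line, spend, and sales are hypothetical. It fits a one-predictor linear model twice, once on the raw predictor and once on its logarithm, and compares the fit.

```python
import math

def fit_line(x, y):
    """Ordinary least squares with one predictor.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot

# Made-up data: the target grows with the logarithm of the predictor.
spend = [1, 2, 4, 8, 16, 32, 64, 128]
sales = [3 + 2 * math.log(s) for s in spend]

# Fit on the raw predictor, then on the log-transformed predictor.
_, _, r2_raw = fit_line(spend, sales)
_, log_slope, r2_log = fit_line([math.log(s) for s in spend], sales)

print(f"R-squared, raw predictor: {r2_raw:.3f}")
print(f"R-squared, log predictor: {r2_log:.3f}")
```

With the raw predictor the linear model misses part of the relationship; after the log transformation the same model fits this data almost perfectly. In a traditional model the user has to find and apply such a transformation, whereas the statistical learning methods are designed to pick up this kind of non-linearity internally.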

Tools for Traditional Statistical Models

Count Regression Tool

Gamma Regression Tool

Linear Regression Tool

Logistic Regression Tool

Naive Bayes Classifier Tool

Neural Network Tool

Stepwise Tool

Support Vector Machine Tool

Tools for Modern Statistical Learning Methods

Boosted Model Tool

Decision Tree Tool

Forest Model Tool

Spline Model Tool

Tools for Predictive Model Comparison and Hypothesis Testing

Cross-Validation Tool

Lift Chart Tool

Model Coefficients Tool

Model Comparison Tool

Nested Test Tool

Test of Means Tool

Variance Inflation Factors Tool

Tool for Predicting Values for All General Predictive Modeling Tools

Score Tool

Tool for Creating Interactive Network Visualizations and Key Summary Statistics

Network Analysis Tool

Tools for Generating Survival Models and Estimating Relative Risk and Restricted Mean Survival Time

Survival Analysis Tool

Survival Score Tool

AB Testing Tools

The AB Testing tools assist the user in carrying out A/B Testing (also known as test and learn) experiments, such as examining the effect of a new marketing communications campaign on sales, or the effect of changing store staffing levels. The tools can help in determining market areas for a test (usually for one that involves mass media advertising where everyone residing in that area can potentially be exposed to the advertising), matching one or more control units to each treatment unit, developing trend and seasonality measures upon which the matching of controls to treatments is often based, and doing the actual analysis of the experimental results. The tools associated with this subcategory are:

AB Analysis Tool

AB Controls Tool

AB Treatments Tool

AB Trend Tool
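
The idea of matching control units to treatment units on trend and seasonality measures can be sketched as follows. This is only an illustration under stated assumptions: the greedy nearest-match rule, the function match_controls, the store names, and the measure values are all hypothetical and are not taken from the AB Controls tool.

```python
def match_controls(treatments, candidates, n_controls=1):
    """For each treatment unit, pick the closest unused candidate units.

    Units are described by tuples of measures (here: trend, seasonality).
    Closeness is squared Euclidean distance; each candidate is used once.
    """
    available = dict(candidates)
    matches = {}
    for name, measures in treatments.items():
        ranked = sorted(
            available,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(measures, available[c])),
        )
        chosen = ranked[:n_controls]
        matches[name] = chosen
        for c in chosen:
            del available[c]  # do not reuse a control for another treatment
    return matches

# Hypothetical (trend, seasonality) measures per store.
treatments = {"store_1": (0.05, 1.30), "store_2": (-0.02, 1.10)}
candidates = {
    "store_7": (0.04, 1.28),
    "store_8": (-0.01, 1.12),
    "store_9": (0.10, 1.60),
}

matches = match_controls(treatments, candidates)
print(matches)
```

In this made-up data, store_1 is matched to store_7 and store_2 to store_8, because their measures are closest. In practice the measures would usually be standardized first so that no single measure dominates the distance.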

Time Series Tools

This category contains a number of plotting and forecasting tools for regular (in terms of the data time interval, such as monthly), univariate time series. Central among these are tools for creating ARIMA and extended exponential smoothing forecasting models, which can be used to create items such as a weekly sales forecasting model. Both of these methods develop forecasts based on systematic, time-related elements in the values of the target variable. Specifically, they pick up elements of trend (longer-term, fairly consistent upward or downward movement in the target variable) and seasonality (cyclical patterns that repeat over time).

To provide a concrete example of these elements, a time series model of tablet computer sales would likely reveal a positive trend in sales along with a strong seasonal pattern of higher sales near Christmas and before the start of the school year. If neither trend nor seasonality is present in the target variable, the forecast values are likely to fall on a straight line set at a weighted mean of the most recent values of the target. This is likely to be an unsatisfying finding for a user, but it indicates that there is no real structure in the data with respect to time-related elements alone (trend and seasonality). In these cases, more general predictive modeling methods may be more useful in developing forecasts than the time series tools.
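
The flat forecast described above can be illustrated with simple exponential smoothing, the most basic member of the exponential smoothing family. This is a minimal sketch, not the ETS tool's implementation; the function names, the smoothing weight alpha, and the data are assumptions chosen for illustration.

```python
def smoothed_level(values, alpha):
    """Simple exponential smoothing with no trend or seasonal component.

    The result is a weighted mean of the series in which recent values
    carry the most weight (weights shrink by a factor of 1 - alpha).
    """
    level = values[0]
    for value in values[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

def flat_forecast(values, alpha, horizon):
    """Without trend or seasonality, every future period gets the same value."""
    return [smoothed_level(values, alpha)] * horizon

# Made-up series that fluctuates around 100 with no trend or seasonality.
history = [100, 102, 98, 101, 99, 100]
print(flat_forecast(history, alpha=0.5, horizon=3))
# prints [99.8125, 99.8125, 99.8125]
```

Every forecast period receives the same value, which is the straight line the paragraph above refers to.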

In addition to tools for creating forecasts, there are tools to help the user compare the relative efficacy of different time series forecasting models. The complete set of time series tools includes:

ARIMA Tool

ETS Tool

TS Compare Tool

TS Covariate Forecast Tool

TS Filler Tool

TS Forecast Tool

TS Forecast Factory Tool

TS Model Factory Tool

TS Plot Tool

Predictive Grouping Tools

This category contains tools to combine records or fields into a smaller number of groups. Common applications for grouping records together are creating customer segments based on purchasing patterns or creating a set of store groups. The ultimate objective of grouping in these two areas is to create a smaller number of groups that allow for the customization of programs and activities in a way that is feasible from a business perspective.

For example, a retailer that has 500 outlets in its network would likely find it overwhelming to develop a merchandising and pricing program specific to each of the 500 outlets. However, if the outlets are placed into a smaller set of store groups (say 10) based on the similarity of their sales patterns, creating 10 different merchandising and pricing programs is something the retailer can successfully implement. Similarly, many organizations have database tables they wish to analyze that are very wide, with many of the fields highly correlated with one another. In these cases, dealing with a large number of highly correlated measures greatly complicates any analyses done with these data. As a result, it may make sense to reduce the original set of fields into a smaller set of composite fields that more readily lend themselves to analysis. In both these instances, there is a need to reduce the dimensionality of the data to make it actionable.

The most common method used to group records together is cluster analysis. There are many different types of cluster analysis, but by far the most commonly used clustering methods in business applications are based on K-Centroids algorithms. Alteryx provides tools to help determine the appropriate number of clusters (groups) to form, create the final set of clusters, and append to the data the cluster a particular record belongs to (regardless of whether the record was used in determining the set of clusters). A related tool (Find Nearest Neighbors) allows the user to form ad hoc groups of a given size around one or more specific records. For instance, the tool gives the user the ability to find the five customers most like customer "X" based on past purchase behavior. The method available for grouping fields is principal components analysis.
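
The clustering workflow above (form clusters, append the cluster to each record, find the records nearest to a given record) can be sketched with K-Means, one member of the K-Centroids family. This is an illustrative sketch under simplifying assumptions, not the Alteryx implementation: the function names and data are hypothetical, and the centroids are seeded with the first k records rather than chosen by a more robust method.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))

def k_means(points, k, iterations=20):
    # Assumption: seed with the first k records to keep the sketch deterministic.
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        updated = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def nearest_neighbors(target, points, n):
    """The n records most similar to the target record."""
    return sorted(points, key=lambda p: squared_distance(target, p))[:n]

# Made-up store measures, for example two standardized sales metrics.
stores = [(1.0, 1.2), (8.0, 8.5), (0.8, 1.0), (1.1, 0.9), (8.2, 7.9), (7.9, 8.1)]

centroids = k_means(stores, k=2)
labels = [nearest(s, centroids) for s in stores]  # cluster appended to each record
new_label = nearest((7.5, 8.0), centroids)        # record not used to form the clusters
neighbors = nearest_neighbors(stores[0], stores[1:], 2)

print(labels, new_label, neighbors)
```

The labels list corresponds to appending a cluster to each record, new_label shows that a record not used to build the clusters can still be assigned to one, and nearest_neighbors forms an ad hoc group around a single record.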

The Market Basket Analysis tools help determine which items go together in point-of-sale data, or which combinations of problems tend to co-occur in defect reporting and work order systems. The tools in the category determine the set of "rules" in the data (such as "Product defect A is more likely to be present when product defects B and C are also observed") and provide filtering tools to help narrow down the list of possible rules, based on criteria associated with rules that are more likely to be of practical importance.
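
Rules of this kind are commonly screened with three standard measures: support, confidence, and lift. The following sketch computes them for the example rule "defects B and C imply defect A". The defect reports are made up, the function names are hypothetical, and the sketch does not describe the criteria the Alteryx tools actually apply.

```python
def support(transactions, items):
    """Share of transactions that contain every item in items."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent.

    Assumes the antecedent and consequent each occur at least once.
    """
    both = support(transactions, set(antecedent) | set(consequent))
    confidence = both / support(transactions, antecedent)
    lift = confidence / support(transactions, consequent)
    return {"support": both, "confidence": confidence, "lift": lift}

# Made-up defect reports; each set lists the defects seen on one unit.
reports = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"B", "C"},
    {"A"},
    {"B"},
    {"C"},
    {"A", "B", "C"},
    {"B", "C", "D"},
]

metrics = rule_metrics(reports, antecedent={"B", "C"}, consequent={"A"})
print(metrics)
```

For this data the rule has support 0.375 (three of eight reports), confidence 0.6 (three of the five reports with B and C also have A), and lift 1.2 (A is 1.2 times as likely when B and C are present as it is overall). A filter could then keep only rules whose measures exceed chosen thresholds.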

Tools in this category include: