
Distribution Analysis Tool

The Distribution Analysis tool fits one or more distributions to the input data and compares them using several goodness-of-fit* statistics. Based on the p-values of these tests, you can determine which of the selected distributions best represents the data.
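The fit-and-compare idea can be sketched in Python with only the standard library. This is not the tool's R implementation. It is a minimal sketch that assumes the Kolmogorov-Smirnov statistic as the goodness-of-fit measure, fits only the Normal and Lognormal distributions (by maximum likelihood), and uses hypothetical helper names (`ks_statistic`, `ks_pvalue`, `fit_candidates`, `compare`). Because the parameters are estimated from the same data being tested, the p-values are approximate.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def ks_statistic(sorted_values, cdf):
    """Two-sided Kolmogorov-Smirnov statistic D: the largest gap between the
    empirical CDF of `sorted_values` and the fitted `cdf`."""
    n = len(sorted_values)
    d = 0.0
    for i, x in enumerate(sorted_values):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue(d, n):
    """Approximate p-value from the asymptotic Kolmogorov distribution with a
    common small-sample correction. Estimating parameters from the same data
    makes this p-value too large (conservative), so treat it as a guide."""
    sqrt_n = math.sqrt(n)
    lam = (sqrt_n + 0.12 + 0.11 / sqrt_n) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here and the p-value is essentially 1
    total = sum(2 * (-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2) for k in range(1, 101))
    return min(max(total, 0.0), 1.0)


def fit_candidates(values):
    """Maximum-likelihood fits for Normal and, when every value is positive, Lognormal."""
    fits = {"Normal": NormalDist(fmean(values), pstdev(values)).cdf}
    if all(v > 0 for v in values):  # the log transform needs strictly positive values
        logs = [math.log(v) for v in values]
        log_dist = NormalDist(fmean(logs), pstdev(logs))
        fits["Lognormal"] = lambda x: log_dist.cdf(math.log(x))
    return fits


def compare(values):
    """Return (best_name, {name: (D, p_value)}); the best fit has the smallest D."""
    xs = sorted(values)
    results = {}
    for name, cdf in fit_candidates(xs).items():
        d = ks_statistic(xs, cdf)
        results[name] = (d, ks_pvalue(d, len(xs)))
    best = min(results, key=lambda name: results[name][0])
    return best, results


if __name__ == "__main__":
    rng = random.Random(42)
    sample = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]
    best, results = compare(sample)
    for name, (d, p) in results.items():
        print(f"{name}: D={d:.3f}, p~{p:.3g}")
    print("Best fit:", best)
```

Ranking by the smallest D gives the same order as ranking by the largest p-value when every candidate is tested on the same sample, and it avoids ties when several p-values round to 1.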

The Distribution Analysis tool can help you understand the overall nature of your data and decide how to analyze it. For instance, a target field that fits a Normal distribution is likely well suited to Linear Regression, while a Gamma-distributed target may be better suited to the Gamma Regression tool.

This tool uses the R programming language. Go to Options > Download Predictive Tools to install R and the packages used by the R Tool.

Input

An Alteryx data stream with continuous data.

Configuration Properties

Configuration

  1. Select a field for analysis: Select a field from the incoming data for analysis.
  2. Select distributions for comparison: Select one or more distributions to compare. The distribution options are Normal, Lognormal, Weibull, and Gamma.

Columns containing unique identifiers, such as surrogate primary keys and natural primary keys, should not be used in statistical analyses. They have no predictive value and can cause runtime exceptions.

The Lognormal, Weibull, and Gamma distributions work only with non-negative data.
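The two input rules above (avoid identifier columns; Lognormal, Weibull, and Gamma need non-negative data) can be checked before choosing a field and distributions. The sketch below is hypothetical: `looks_like_identifier` and `eligible_distributions` are illustrative names, not part of the tool, and the identifier check is only a rough heuristic that flags distinct whole numbers forming one consecutive run, as surrogate keys often do. It will not catch natural keys.

```python
CANDIDATES = ("Normal", "Lognormal", "Weibull", "Gamma")
NON_NEGATIVE_ONLY = {"Lognormal", "Weibull", "Gamma"}


def looks_like_identifier(values):
    """Rough heuristic (hypothetical): flags a field whose values are distinct
    whole numbers forming one consecutive run, such as 101, 102, 103."""
    if len(values) < 2:
        return False
    if not all(float(v).is_integer() for v in values):
        return False
    return len(set(values)) == len(values) and max(values) - min(values) + 1 == len(values)


def eligible_distributions(values):
    """Drop the distributions that this documentation says need non-negative data."""
    if any(v < 0 for v in values):
        return [name for name in CANDIDATES if name not in NON_NEGATIVE_ONLY]
    return list(CANDIDATES)


if __name__ == "__main__":
    print(looks_like_identifier([101, 102, 103, 104]))
    print(eligible_distributions([-1.5, 0.0, 2.5]))
```

Depending on the fitting routine, values of exactly zero can still cause Lognormal, Weibull, or Gamma fits to fail. That is an assumption worth checking against your own data.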

Graphics Options

Output

A set of report snippets that includes a histogram, basic summary statistics of the test results, goodness-of-fit statistics, data quantiles per distribution, and the distribution parameters.
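To illustrate what "data quantiles per distribution" compares, the sketch below computes summary statistics and sets the data's quartiles beside the quartiles implied by a fitted distribution. It is an assumption-laden illustration, not the tool's report logic: only a maximum-likelihood Normal fit is shown, and `normal_report` is a hypothetical name.

```python
from statistics import NormalDist, fmean, pstdev, quantiles

PROBS = (0.25, 0.5, 0.75)


def normal_report(values):
    """Summary statistics plus empirical vs. fitted-Normal quartiles (illustrative only)."""
    xs = list(values)
    fitted = NormalDist(fmean(xs), pstdev(xs))  # maximum-likelihood Normal fit
    summary = {"n": len(xs), "mean": fitted.mean, "sd": fitted.stdev, "min": min(xs), "max": max(xs)}
    empirical = quantiles(xs, n=4, method="inclusive")  # quartiles of the data
    theoretical = [fitted.inv_cdf(p) for p in PROBS]  # quartiles implied by the fit
    return summary, list(zip(PROBS, empirical, theoretical))


if __name__ == "__main__":
    summary, rows = normal_report([2.1, 2.9, 3.4, 3.8, 4.0, 4.6, 5.3, 6.8])
    print(summary)
    for p, data_q, fit_q in rows:
        print(f"p={p:.2f}  data={data_q:.2f}  normal={fit_q:.2f}")
```

Large gaps between the data and fitted columns, especially in the tails, suggest that the distribution fits poorly. Repeating the comparison for each selected distribution gives one quantile table per distribution.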

*D'Agostino, R. B., and Stephens, M. A. (1986). Goodness-of-Fit Techniques. New York: Marcel Dekker.