The Distribution Analysis tool fits one or more distributions to the input data and compares them using a number of goodness-of-fit* statistics. Based on the statistical significance (p-values) of these tests, you can determine which distribution best represents the data.
The Distribution Analysis tool can help you understand the overall nature of your data and decide how to analyze it. For instance, data that fits a Normal distribution would likely be well suited to a Linear Regression, while Gamma-distributed data might be better suited to analysis via the Gamma Regression tool.
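The fit-and-compare idea described above can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not the R code the tool runs. It fits a Normal and a Lognormal distribution by maximum likelihood and compares them with a single goodness-of-fit measure, the Kolmogorov-Smirnov distance (smaller is better). The function names and the sample data are assumptions made for the example, and no p-values are computed.

```python
import math
from statistics import NormalDist, fmean, pstdev


def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF of the data and a fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Check the gap just after and just before each step of the empirical CDF.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def fit_normal(data):
    """Maximum-likelihood Normal fit; returns the fitted CDF."""
    dist = NormalDist(fmean(data), pstdev(data))
    return dist.cdf


def fit_lognormal(data):
    """Maximum-likelihood Lognormal fit (a Normal fit on the logs); needs data > 0."""
    logs = [math.log(x) for x in data]
    dist = NormalDist(fmean(logs), pstdev(logs))
    return lambda x: dist.cdf(math.log(x))


def compare_fits(data):
    """Return the KS distance for each candidate distribution."""
    fits = {"Normal": fit_normal(data), "Lognormal": fit_lognormal(data)}
    return {name: ks_statistic(data, cdf) for name, cdf in fits.items()}


# Illustrative, deterministic sample: 200 evenly spaced quantiles of a standard Lognormal.
sample = [math.exp(NormalDist().inv_cdf((i + 0.5) / 200)) for i in range(200)]
scores = compare_fits(sample)
best = min(scores, key=scores.get)
print(best, scores)
```

Because the sample is built from Lognormal quantiles, the Lognormal fit yields the smaller distance. With real data the candidates are often closer, which is why looking at several statistics and their p-values side by side is useful.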
This tool uses the R programming language. Go to Options > Download Predictive Tools to install R and the packages used by the R Tool.
The tool accepts an Alteryx data stream with continuous data as input.
Columns containing unique identifiers, such as surrogate primary keys and natural primary keys, should not be used in statistical analyses. They have no predictive value and can cause runtime exceptions.
The Lognormal, Weibull, and Gamma distributions work only for non-negative data.
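Before running the tool, it can help to check which distributions a column is eligible for. The sketch below is a hypothetical pre-check in Python, not part of the tool. The helper name `eligible_distributions` and the candidate list (the four distributions named on this page) are assumptions for illustration. It applies the non-negative rule exactly as stated above. Because a Lognormal fit takes logarithms, columns containing zeros may need separate handling.

```python
# Hypothetical helper: the names below are illustrative, not part of the tool.
NON_NEGATIVE_ONLY = {"Lognormal", "Weibull", "Gamma"}


def eligible_distributions(data, candidates=("Normal", "Lognormal", "Weibull", "Gamma")):
    """Drop the distributions that require non-negative data if any value is negative."""
    has_negative = any(x < 0 for x in data)
    return [
        name
        for name in candidates
        if not (has_negative and name in NON_NEGATIVE_ONLY)
    ]


print(eligible_distributions([1.0, 2.5, 3.0]))
print(eligible_distributions([-1.0, 2.0]))
```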
Graph resolution: Select the resolution of the graph in dots per inch: 1x (96 dpi); 2x (192 dpi); or 3x (288 dpi). Lower resolution creates a smaller file and is best for viewing on a monitor. Higher resolution creates a larger file with better print quality.
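To see why higher resolution produces larger files, consider how pixel dimensions scale with dpi. The sketch below is illustrative arithmetic only. The 6 x 4 inch graph size is an assumed value, not a documented default, and `pixel_size` is a hypothetical helper.

```python
BASE_DPI = 96  # the 1x setting


def pixel_size(width_in, height_in, multiplier):
    """Pixel dimensions of a graph of the given size in inches at 1x, 2x, or 3x."""
    dpi = BASE_DPI * multiplier
    return round(width_in * dpi), round(height_in * dpi)


# Assumed graph size of 6 x 4 inches, for illustration only.
for m in (1, 2, 3):
    w, h = pixel_size(6, 4, m)
    print(f"{m}x ({BASE_DPI * m} dpi): {w} x {h} pixels")
```

The pixel count grows with the square of the multiplier, so a 3x graph holds nine times as many pixels as the same graph at 1x.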
The tool outputs a set of report snippets that includes a histogram, basic summary statistics of the test results, goodness-of-fit statistics, data quantiles per distribution, and the distribution parameters.
*D'Agostino, R., Stephens, M.A. (1986) Goodness of Fit Techniques.
©2018 Alteryx, Inc., all rights reserved. Allocate®, Alteryx®, Guzzler®, and Solocast® are registered trademarks of Alteryx, Inc.