Gamma Regression Tool
The Gamma Regression tool relates a gamma distributed, strictly positive variable of interest (target variable) to one or more variables (predictor variables) that are expected to have an influence on the target variable.
In a number of applications, the values of the target variable are always strictly positive (never zero or negative) and tend to cluster toward the lower range of the observed values, while a small minority of cases take on large values. Target variables of this nature come from a data-generating process that is not consistent with the Normality assumptions underlying the traditional linear regression model. Because the values are strictly positive and are not always integers, they also do not follow a process based on a Poisson or Negative Binomial distribution. They are consistent with a process based on a Gamma distribution, and the model can be estimated using methods similar to linear regression via the generalized linear model framework.
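The shape described above can be illustrated with a short simulation. This is a minimal sketch using only the Python standard library; the shape and scale values are hypothetical and chosen only for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical parameters: shape 2 and scale 50 give a mean of shape * scale = 100.
SHAPE, SCALE = 2.0, 50.0
values = [random.gammavariate(SHAPE, SCALE) for _ in range(10_000)]

mean = statistics.fmean(values)
median = statistics.median(values)

print(f"min    = {min(values):.2f}")  # strictly positive
print(f"median = {median:.2f}")       # most values sit in the lower range
print(f"mean   = {mean:.2f}")         # pulled above the median by the long right tail
print(f"max    = {max(values):.2f}")  # a few cases are very large
```

A mean that sits well above the median is the typical signature of the right-skewed targets this tool is meant for.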
With this tool, if the input data is from a regular Alteryx data stream, then the open source R glm function is used for model estimation. If the input comes from either an XDF Input Tool or XDF Output Tool, then the Revo ScaleR rxGlm function is used for model estimation. The advantage of the Revo ScaleR based function is that it allows much larger (out of memory) datasets to be analyzed. The costs are the additional overhead of creating an XDF file and the loss of some of the model diagnostic output that is available with the open source R functions.
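To give a sense of what the estimation step does, the following is a minimal sketch of iteratively reweighted least squares (IRLS), the general method used to fit generalized linear models, for a Gamma model with a log link and a single predictor. It is not the tool's code: the function name `fit_gamma_log_link`, the simulated data, and the coefficient-based stopping rule are all assumptions made for illustration.

```python
import math
import random


def fit_gamma_log_link(xs, ys, max_iter=25, tol=1e-8):
    """Fit log(E[y]) = b0 + b1 * x by IRLS and return (b0, b1)."""
    n = len(xs)
    x_mean = sum(xs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    # Start from the observed values, so the initial linear predictor is log(y).
    eta = [math.log(y) for y in ys]
    b0 = b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(e) for e in eta]
        # Working response for the log link: z = eta + (y - mu) / mu.
        z = [e + (y - m) / m for e, y, m in zip(eta, ys, mu)]
        # With Gamma variance mu^2 and a log link the IRLS weights are all 1,
        # so each step is an ordinary least squares fit of z on x.
        z_mean = sum(z) / n
        new_b1 = sum((x - x_mean) * (zi - z_mean) for x, zi in zip(xs, z)) / sxx
        new_b0 = z_mean - new_b1 * x_mean
        converged = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        eta = [b0 + b1 * x for x in xs]
        if converged:
            break
    return b0, b1


# Simulated data with hypothetical true coefficients 0.5 and 0.3.
random.seed(0)
TRUE_B0, TRUE_B1, SHAPE = 0.5, 0.3, 5.0
xs = [random.uniform(0.0, 3.0) for _ in range(2000)]
ys = [random.gammavariate(SHAPE, math.exp(TRUE_B0 + TRUE_B1 * x) / SHAPE) for x in xs]

b0, b1 = fit_gamma_log_link(xs, ys)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

The estimates should land close to the simulated coefficients. A full implementation would also handle multiple predictors, sampling weights, and the other link functions.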
This tool uses the R Tool. Go to Options > Download Predictive Tools and sign in to the Alteryx Downloads and Licenses portal to install R and the packages used by the R Tool. See Download and Use Predictive Tools.
Connect an input
An Alteryx data stream or XDF metadata stream that includes a target field of interest along with one or more possible predictor fields.
Configure the tool
Use the Configuration tab to set the controls for your Gamma regression.
- Model name: Each model needs to be given a name so it can later be identified. Model names must start with a letter and may contain letters, numbers, and the special characters period (".") and underscore ("_"). No other special characters are allowed, and R is case sensitive.
- Select the target variable: Select the field from the data stream you want to predict.
- Select the predictor variables: Choose the fields from the data stream you believe "cause" changes in the value of the target variable.
- Model type: A drop-down list with the options log, inverse, and identity. This option determines the link function used with the Gamma family when estimating the generalized linear model.
- Use sampling weights in model estimation? (Optional): Select the check box and then select a weight field from the data stream to estimate a model that uses sampling weights.
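Two of the settings above can be made concrete in a short sketch: the model naming rule and the three link options. The helper names `is_valid_model_name` and `predicted_mean` are hypothetical, and the pattern assumes that "letters" means ASCII letters; neither is part of the tool itself.

```python
import math
import re

# Naming rule as described above: a leading letter, then letters, digits,
# periods, or underscores. ASCII letters only is an assumption of this sketch.
MODEL_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9._]*")


def is_valid_model_name(name):
    return MODEL_NAME_PATTERN.fullmatch(name) is not None


# Each link g maps the mean mu to the linear predictor eta = g(mu).
# The second function in each pair is the inverse, mapping eta back to mu.
LINKS = {
    "log": (math.log, math.exp),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "identity": (lambda mu: mu, lambda eta: eta),
}


def predicted_mean(link_name, eta):
    """Convert a linear predictor value into a predicted mean."""
    _, inverse_link = LINKS[link_name]
    return inverse_link(eta)


print(is_valid_model_name("Gamma_Model.1"))  # starts with a letter, allowed characters
print(is_valid_model_name("1st model"))      # starts with a digit and contains a space

# The same linear predictor value gives a different mean under each link.
for link_name in LINKS:
    print(link_name, predicted_mean(link_name, 0.5))
```

Only the log link guarantees a positive predicted mean for every value of the linear predictor. With the inverse and identity links, some combinations of coefficients and predictor values can produce a zero or negative mean, which is not valid for a Gamma distributed target.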
Columns containing unique identifiers, such as surrogate primary keys and natural primary keys, should not be used in statistical analyses. They have no predictive value and can cause runtime exceptions.
Use the Graphics Options tab to set the controls for the graphical output.
- Graph resolution: Select the resolution of the graph in dots per inch: 1x (96 dpi); 2x (192 dpi); or 3x (288 dpi). Lower resolution creates a smaller file and is best for viewing on a monitor. Higher resolution creates a larger file with better print quality.
View the output
- O anchor: Consists of a table of the serialized model with its model name.
- R anchor: Consists of the report snippets generated by the Gamma Regression tool: a statistical summary, a Type II Analysis of Deviance (ANOD), and Basic Diagnostic Plots. The Type II Analysis of Deviance table and the Basic Diagnostic Plots are not produced when the model input comes from an XDF Output or XDF Input tool.
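As background for reading the Analysis of Deviance table, the sketch below computes the deviance of a Gamma model, the quantity such a table compares between models with and without a given predictor. The function name `gamma_deviance` and all of the numbers are hypothetical; the fitted means are made up for illustration and do not come from an estimated model.

```python
import math


def gamma_deviance(ys, mus):
    """Gamma deviance: 2 * sum((y - mu) / mu - log(y / mu))."""
    return 2.0 * sum((y - m) / m - math.log(y / m) for y, m in zip(ys, mus))


# Hypothetical observed values.
ys = [1.2, 0.8, 3.5, 0.6, 9.0]

# Intercept-only model: every observation is predicted by the overall mean.
null_fit = [sum(ys) / len(ys)] * len(ys)

# Made-up fitted means standing in for a model that includes a predictor.
fuller_fit = [1.0, 1.0, 3.0, 0.7, 8.0]

null_deviance = gamma_deviance(ys, null_fit)
fuller_deviance = gamma_deviance(ys, fuller_fit)

print(f"null deviance   = {null_deviance:.4f}")
print(f"fuller deviance = {fuller_deviance:.4f}")
print(f"reduction       = {null_deviance - fuller_deviance:.4f}")
```

A large drop in deviance when a predictor is added suggests that the predictor explains a meaningful share of the variation in the target. The table in the report attaches a formal test to each such comparison.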