Association Analysis Tool

The Association Analysis tool lets a user determine which fields in a database have a bivariate association with one another. The assessment can be based on Pearson product-moment ("regular") correlation coefficients, Spearman rank-order correlation coefficients, or Hoeffding's D statistic (a non-parametric measure that can detect non-monotonic relationships such as inverted U-shapes). The tool also reports the statistical significance of each association measure.
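As a rough illustration of why the choice of measure matters, the sketch below computes Pearson and Spearman coefficients in plain Python for two made-up data sets: a monotonic but non-linear one, and an inverted U-shape. The function names (`pearson`, `average_ranks`, `spearman`) and the data are illustrative assumptions, not the tool's actual R implementation.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data, for illustration only.
x = [-3, -2, -1, 0, 1, 2, 3]
monotone = [math.exp(v) for v in x]      # increasing, but not linear
inverted_u = [-(v ** 2) for v in x]      # rises, then falls

r_mono, rho_mono = pearson(x, monotone), spearman(x, monotone)
r_u, rho_u = pearson(x, inverted_u), spearman(x, inverted_u)

print(f"monotone:   Pearson={r_mono:.3f}  Spearman={rho_mono:.3f}")
print(f"inverted U: Pearson={r_u:.3f}  Spearman={rho_u:.3f}")
```

For the monotonic data, Spearman reaches 1 while Pearson stays below 1. For the inverted U, both coefficients are zero even though the fields are clearly related, which is the kind of case Hoeffding's D is meant to catch.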

The tool always provides the full set of relationships, and can optionally provide an in-depth analysis of a target field of interest and its relationship to other numeric variables. The target field can be either a numeric variable or a binary categorical variable. A binary categorical target is converted to a zero-one numeric field: records whose value matches the selected target level are assigned one, and all other records are assigned zero.
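The zero-one conversion can be sketched in a few lines of Python. The function name `recode_binary_target` and the sample values are hypothetical; the check for more than two distinct values mirrors the error the tool is documented to return for non-binary categorical targets.

```python
def recode_binary_target(values, target_level):
    """Recode a binary categorical field as 1 (target level) or 0 (other level)."""
    levels = set(values)
    if len(levels) > 2:
        # A categorical target with more than two values is not supported.
        raise ValueError(
            f"target field has {len(levels)} distinct values; expected at most 2"
        )
    return [1 if v == target_level else 0 for v in values]

# Hypothetical field values, for illustration only.
responded = ["Yes", "No", "No", "Yes", "No"]
print(recode_binary_target(responded, "Yes"))
```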

This tool uses the R programming language. Go to Options > Download Predictive Tools to install R and the packages used by the R Tool.

Input

The Association Analysis tool accepts input from an Alteryx data stream.

Configuration Properties

  1. Target a field for more detailed analysis: Select this option to run a focused analysis of one field of interest against the other fields in the data. This is particularly useful when the goal is to determine which fields to use in a subsequent predictive model. If this option is selected, provide the name of the target field, which can be either numeric or binary categorical. For a binary categorical field, select the value that will be recoded as one; the other value is coded as zero. If the field is categorical and contains more than two distinct values, an error is returned.
  2. Columns containing unique identifiers, such as surrogate primary keys and natural primary keys, should not be used in statistical analyses. They have no predictive value and can cause runtime exceptions.

  3. Fields (select two or more): Select the fields for the association analysis. If a "target" field has been selected, it will automatically be included in this list. The non-target fields must be numeric.
  4. Measure of association: Select one of Pearson product-moment correlation, Spearman rank-order correlation, or Hoeffding's D statistic.

Output

R Output: The report output includes three tables. For a Pearson correlation analysis, for example, these are: Focused Analysis of Field Trans, Full Correlation Matrix, and Matrix of Corresponding p-values.
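To make the relationship between a correlation matrix and its significance figures more concrete, here is a minimal Python sketch. It builds a Pearson correlation matrix for a few made-up columns and computes the t statistic commonly used to test a Pearson coefficient, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The column names and helper functions are hypothetical, and whether the tool derives its p-values in exactly this way is an assumption.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Nested dict of pairwise Pearson correlations, keyed by column name."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)  # perfect correlation
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical numeric fields, for illustration only.
data = {
    "spend":  [1, 2, 3, 4, 5],
    "visits": [2, 1, 4, 3, 5],
    "age":    [5, 3, 4, 1, 2],
}

matrix = correlation_matrix(data)
n = len(data["spend"])
for a in data:
    print(a, [round(matrix[a][b], 3) for b in data])

t_spend_visits = t_statistic(matrix["spend"]["visits"], n)
print(f"t(spend, visits) = {t_spend_visits:.4f} with {n - 2} degrees of freedom")
```

The t statistic is then compared against a t distribution (or a table of critical values) to obtain the p-value for each pair of fields.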

I Output: Interactive report includes a Correlation Matrix with Scatterplot that changes based on your mouse position.

Table of Critical Values for Pearson's r