Pearson Correlation Tool

The Pearson Correlation tool uses the Pearson product-moment correlation coefficient (sometimes referred to as the PMCC, and typically denoted by r) to measure the correlation (linear dependence) between two variables X and Y, giving a value between +1 and −1 inclusive. It is widely used in the sciences as a measure of the strength of linear dependence between two variables.*

Correlation (often measured as a correlation coefficient, ρ) indicates the strength and direction of a linear relationship between two random variables. Correlation values range from −1.00 (a perfect negative correlation) to +1.00 (a perfect positive correlation). Zero indicates no linear correlation; the variables may still be related in a nonlinear way.

The Pearson coefficient is obtained by dividing the covariance of the two variables by the product of their standard deviations.*
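
The calculation described above can be sketched in a few lines of Python. This is only an illustration of the formula, not the tool's actual implementation; the function name pearson_r and the sample values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson r: covariance divided by the product of the standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors in the covariance and the standard deviations cancel,
    # so plain sums of deviations are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical sample data: y is exactly 2 * x, a perfect positive linear relationship.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

A constant column has a standard deviation of zero, so the division is undefined; the sketch raises an error in that case rather than returning a value.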

This tool replaces the Pearson Correlation Coefficient tool that has been deprecated.

Configuration Properties

  1. Generate correlation for selected variables: Select two or more fields from the input stream to run the correlation on. Fields must be numeric.

    Columns containing unique identifiers, such as surrogate primary keys and natural primary keys, should not be used in statistical analyses. They have no predictive value and can cause runtime exceptions.

  2. Specify the type of calculation to run. Choices are:

The Pearson Correlation tool expects non-null values. If the data contains nulls, use the Impute Values tool to replace them first.
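
As a rough illustration of what replacing nulls can look like, the Python sketch below substitutes the column mean for each missing value. Mean replacement is an assumption chosen for the example, nulls are represented as None, and the function name impute_mean is hypothetical; none of this describes how the Impute Values tool itself works.

```python
def impute_mean(values):
    """Replace None entries with the mean of the non-null entries."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("cannot impute a column with no non-null values")
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]

# Hypothetical column with one null; the mean of the remaining values is 2.0.
print(impute_mean([1.0, None, 3.0]))
```

Filling a gap with the mean adds a point that contributes nothing to the covariance, so this kind of replacement tends to pull the resulting correlation toward zero when many values are missing.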

*http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient