Contingency Table Tool
The Contingency Table tool serves a similar purpose to the Frequency Table tool, but instead of looking at each field individually, it looks at up to four variables (fields) and how they relate to each other. The tool produces a data output, which lists every combination of values across the selected fields with a frequency and a percent column, and a report output, which presents those combinations as tables and adds row and column percentages.
If only two fields are being analyzed, the chi-square statistic can also be included in the report. A chi-square statistic is used to investigate whether the distributions of categorical variables differ from one another. R must be installed for this option to run successfully.
This tool uses the R tool. Go to Options > Download Predictive Tools and sign in to the Alteryx Downloads and Licenses portal to install R and the packages used by the R Tool.
Configure the tool
- Include chi-squared statistic: A chi-square (χ²) statistic is used to investigate whether the distributions of categorical variables differ from one another. The statistic is included in the report output. Select the two fields to analyze.
- Do not include chi-squared statistic: Select at least two and up to four fields.
When selecting fields for either option, the following rules apply:
- Each variable must have unique values. If the values are not unique across the fields, an error will be thrown.
- Certain field types cannot be selected: FixedDecimal, Float, Double, Date, Time, DateTime, Blob and SpatialObj. Integer field types are allowed but should only be used if the field is truly categorical.
View the output
D anchor: Data output includes the following fields:
Name | Description |
---|---|
InputField_SelectedField1 (2, 3, 4) | Original field name of the input data. Depending on how many fields are selected, InputField_SelectedField3 and InputField_SelectedField4 may not be present. The SelectedField portion of the name is replaced with the actual selected field name. |
Frequency | Count of times the value is present in the input data for the given Field Name. |
Percent | (Frequency / Total Records) * 100 |
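To make the Frequency and Percent columns concrete, here is a minimal Python sketch of the same calculation. It is an illustration only, not the tool's R implementation: the function name `combination_frequencies`, the sample records, and the plain field names used as column headers are all assumptions. This sketch lists only the combinations that actually occur in the input.

```python
from collections import Counter

def combination_frequencies(records, fields):
    """Count each observed combination of values across the given fields.

    Hypothetical helper for illustration; not part of Alteryx.
    """
    combos = Counter(tuple(rec[f] for f in fields) for rec in records)
    total = sum(combos.values())  # Total Records
    rows = []
    for combo, freq in sorted(combos.items()):
        row = dict(zip(fields, combo))
        row["Frequency"] = freq
        row["Percent"] = freq / total * 100  # (Frequency / Total Records) * 100
        rows.append(row)
    return rows

# Made-up sample data with two categorical fields.
records = [
    {"Region": "East", "Plan": "Basic"},
    {"Region": "East", "Plan": "Basic"},
    {"Region": "East", "Plan": "Premium"},
    {"Region": "West", "Plan": "Basic"},
]
rows = combination_frequencies(records, ["Region", "Plan"])
for row in rows:
    print(row)
```

The Percent values of all rows sum to 100, because every input record falls into exactly one combination.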
R anchor: Report Output includes a Contingency table for each field selected.
The first record in this output shows any field type warnings: if any of the selected fields have a numeric data type, a warning is shown. The rest of the report shows a contingency table for each combination of field values. The table header shows the fields selected by the user and the values of any fields that are not shown in the table. Each table also shows a Total column and rows for Frequency, Percent, Row Percent and Column Percent.
If the chi-square statistic option is selected, the following values are displayed underneath the table: Chi-squared, df, and p-value. Chi-squared is the calculated chi-square value, df is the degrees of freedom, and p-value is the p-value returned by R. The lower the p-value, the stronger the evidence that the variables are dependent on each other.
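The four row labels can be illustrated with a short Python sketch for the two-field case. It assumes the conventional definitions (Row Percent is the cell count divided by its row total, Column Percent is the cell count divided by its column total); the report's exact formulas are not spelled out here, and the function name `cell_statistics` and the sample data are hypothetical.

```python
from collections import Counter

def cell_statistics(pairs):
    """Per-cell statistics for a two-field contingency table.

    `pairs` is a list of (row_value, column_value) tuples, one per record.
    Assumes the conventional definitions of row and column percent.
    """
    cell = Counter(pairs)
    row_total = Counter(r for r, _ in pairs)
    col_total = Counter(c for _, c in pairs)
    n = len(pairs)
    stats = {}
    for (r, c), freq in cell.items():
        stats[(r, c)] = {
            "Frequency": freq,
            "Percent": freq / n * 100,
            "Row Percent": freq / row_total[r] * 100,
            "Column Percent": freq / col_total[c] * 100,
        }
    return stats

# Made-up sample: 8 records across Region (rows) and Plan (columns).
pairs = (
    [("East", "Basic")] * 2
    + [("East", "Premium")] * 2
    + [("West", "Basic")] * 1
    + [("West", "Premium")] * 3
)
stats = cell_statistics(pairs)
print(stats[("East", "Basic")])
```

In this sample the East/Basic cell holds 2 of 8 records, 2 of the 4 East records, and 2 of the 3 Basic records, which is why its three percentages differ.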
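For readers who want to see where the three reported values come from, the following Python sketch computes Pearson's chi-square statistic, the degrees of freedom, and the p-value for two categorical fields. It is a sketch under stated assumptions, not the tool's implementation: the tool delegates to R, and whether R applies a continuity correction for 2x2 tables there is not stated, so none is applied here. The function names and the sample counts are hypothetical.

```python
import math
from collections import Counter

def chi2_survival(x, df):
    """Upper-tail probability of a chi-square distribution (integer df >= 1)."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # closed form for df = 1
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def chi_square_test(pairs):
    """Pearson chi-square test of independence, no continuity correction.

    `pairs` is a list of (row_value, column_value) tuples, one per record.
    Returns (chi_squared, df, p_value).
    """
    cell = Counter(pairs)
    row_total = Counter(r for r, _ in pairs)
    col_total = Counter(c for _, c in pairs)
    n = len(pairs)
    df = (len(row_total) - 1) * (len(col_total) - 1)
    if df < 1:
        raise ValueError("each field needs at least two distinct values")
    statistic = 0.0
    for r in row_total:
        for c in col_total:
            expected = row_total[r] * col_total[c] / n  # count expected under independence
            statistic += (cell[(r, c)] - expected) ** 2 / expected
    return statistic, df, chi2_survival(statistic, df)

# Made-up 2x2 sample: counts of 10, 20, 30 and 40 records.
pairs = (
    [("A", "X")] * 10 + [("A", "Y")] * 20
    + [("B", "X")] * 30 + [("B", "Y")] * 40
)
chi_squared, df, p_value = chi_square_test(pairs)
print(chi_squared, df, p_value)
```

The degrees of freedom equal (number of row values - 1) * (number of column values - 1), so a 2x2 table always has df = 1.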
I anchor: Interactive Output includes a chart that the viewer can customize with a series of drop-down options.