The Principal Components tool can reduce the dimensions (the number of numeric fields) in a database. It does this by transforming the original set of fields into a smaller set that accounts for most of the variance (i.e., information) in the data. The new fields are called factors, or principal components.
The principal components are extracted sequentially, with the first principal component accounting for the most variance in the data. Intuitively, the first principal component is a vector that points in the direction in which the data are most "spread out." The second principal component is constructed similarly, but with the additional constraint that it must be uncorrelated with the first. Each subsequent principal component captures a progressively smaller share of the variation in the data and is uncorrelated with the previously extracted principal components. There can be as many principal components as there are numeric fields in the data. However, it is typically possible to capture most of the variance in the data using the first few principal components instead of the full set of original numeric fields. Each principal component is a weighted linear combination of the original numeric fields. Together, the principal components form a new coordinate system in which each dimension is uncorrelated with the others.
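The extraction described above can be illustrated with a minimal sketch. It is written in Python with NumPy purely for illustration and is not the tool's own R implementation. The function name principal_components and the synthetic data are assumptions made for this example.

```python
import numpy as np

def principal_components(X):
    """Illustrative PCA via eigendecomposition of the covariance matrix.

    Returns (variances, weights, scores), with components ordered from the
    largest to the smallest variance.
    """
    Xc = X - X.mean(axis=0)                  # center each field
    cov = np.cov(Xc, rowvar=False)           # covariance between fields
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # reorder: largest variance first
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]              # column k holds the weights of PC k+1
    scores = Xc @ eigvecs                    # weighted linear combinations of the fields
    return eigvals, eigvecs, scores

# Synthetic data (an assumption): two strongly related fields plus one independent field.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([
    base + 0.1 * rng.normal(size=(200, 1)),
    2 * base + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

variances, weights, scores = principal_components(X)
explained = variances / variances.sum()      # share of total variance per component
print(np.round(explained, 3))
```

In this sketch there are as many components as input fields, the explained shares are non-increasing, and the columns of scores are mutually uncorrelated.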
Principal components can be used instead of the original fields in predictive models. This avoids the problems that can occur when highly correlated variables are used, but at the cost of making model interpretation more difficult. In addition, the method can be used to determine which groups of fields are likely to be highly related to one another, which can help guide decisions about which fields to omit from a predictive model. Finally, the ability to "collapse" a large number of fields into a small number of principal components is often a benefit when visualizing relationships in the data.
This tool uses the R programming language. Go to Options > Download Predictive Tools to install R and the packages used by the R Tool.
Fields (select two or more): Select the numeric fields to be used in the principal components analysis.
Scale each field to have unit variance?: Select this option to standardize the data and use the correlation matrix instead of the covariance matrix as the basis for the analysis.
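The effect of scaling can be seen in a small sketch (Python with NumPy, for illustration only; this is not the tool's R code). The field names income and age, their distributions, and the helper explained_share are hypothetical. Without scaling, the field with the largest numeric range dominates the first component. After scaling, the covariance matrix of the standardized data is the correlation matrix of the original fields.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical fields measured on very different scales.
income = rng.normal(50000, 15000, size=300)
age = rng.normal(40, 12, size=300)
X = np.column_stack([income, age])

def explained_share(X, scale):
    """Share of variance per component, with or without unit-variance scaling."""
    Xc = X - X.mean(axis=0)
    if scale:
        Xc = Xc / Xc.std(axis=0, ddof=1)     # standardize each field
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending
    return eigvals / eigvals.sum()

unscaled = explained_share(X, scale=False)   # dominated by the large-scale field
scaled = explained_share(X, scale=True)      # both fields contribute comparably

# Standardized covariance equals the correlation matrix of the original data.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
same = np.allclose(np.cov(Z, rowvar=False), np.corrcoef(X, rowvar=False))
print(np.round(unscaled, 4), np.round(scaled, 4), same)
```

Scaling is generally worth considering when the fields are measured in different units, since otherwise the result mostly reflects the choice of units.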
The highest number of principal components to include in biplots: A biplot is a means of visualizing a principal components solution, two components at a time. This option sets the upper limit on the principal components that are included in the biplots. For example, if this parameter is set to "3", then biplots will include the first and second, first and third, and second and third principal components in three separate figures.
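The pairing in the example above can be enumerated with a short sketch (Python, illustrative only). It assumes, based on that example, that one biplot is drawn for every pair of components up to the limit. The function name biplot_pairs is hypothetical.

```python
from itertools import combinations

def biplot_pairs(max_components):
    """List the (i, j) component pairs, one per biplot, for a given upper limit."""
    return list(combinations(range(1, max_components + 1), 2))

pairs = biplot_pairs(3)
print(pairs)  # [(1, 2), (1, 3), (2, 3)]
```

Under this assumption, a limit of n yields n(n-1)/2 figures, so the number of biplots grows quickly as the limit increases.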
Append principal components to the data stream: When checked, the tool outputs the original data along with additional fields that contain the principal components. The added fields are labeled PC1, PC2, and so on.
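The shape of the appended output can be sketched as follows (plain Python, illustrative only). The function append_components, the sample records, and the weights are hypothetical. The sketch also assumes that each field is centered on its mean before the weights are applied, which may differ from what the tool does.

```python
def append_components(records, fields, weights):
    """Return copies of the records with PC1..PCk fields appended.

    weights[k] holds the weights of component k+1, in the order of `fields`.
    Centering on the field means is an assumption of this sketch.
    """
    n = len(records)
    means = {f: sum(r[f] for r in records) / n for f in fields}
    out = []
    for r in records:
        row = dict(r)  # keep every original field unchanged
        for k, w in enumerate(weights, start=1):
            row[f"PC{k}"] = sum(wi * (r[f] - means[f]) for wi, f in zip(w, fields))
        out.append(row)
    return out

# Hypothetical records in which y is exactly 2 * x.
records = [
    {"id": "a", "x": 1.0, "y": 2.0},
    {"id": "b", "x": 3.0, "y": 6.0},
    {"id": "c", "x": 5.0, "y": 10.0},
]
s = 5 ** 0.5
# Hypothetical weights: the first component points along (1, 2) / sqrt(5),
# the second is perpendicular to it.
weights = [(1 / s, 2 / s), (-2 / s, 1 / s)]

result = append_components(records, ["x", "y"], weights)
for row in result:
    print(row)
```

Because the two sample fields are perfectly related, all of the variation lands in PC1 and PC2 is zero for every record.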
Graph resolution: Select the resolution of the graph in dots per inch: 1x (96 dpi); 2x (192 dpi); or 3x (288 dpi). Lower resolution creates a smaller file and is best for viewing on a monitor. Higher resolution creates a larger file with better print quality.
O: Consists of the input data stream with the Principal Components appended.
R: Consists of the report snippets generated by the Principal Components tool: a statistical summary, basic plots, and biplots.
*https://en.wikipedia.org/wiki/Principal_component_analysis
©2018 Alteryx, Inc., all rights reserved. Allocate®, Alteryx®, Guzzler®, and Solocast® are registered trademarks of Alteryx, Inc.