Principal Components Tool

The Principal Components tool reduces the dimensions (the number of numeric fields) in a database. It does this by transforming the original set of fields into a smaller set of new fields that accounts for most of the variance (i.e., information) in the data. The new fields are called factors, or principal components.

The principal components are extracted sequentially, with the first principal component accounting for the most variance in the data. Intuitively, the first principal component is a vector that points in the direction in which the data are most “spread out.” The second principal component is defined in the same way, but with the additional constraint that it must be uncorrelated with the first. Each subsequent principal component captures a progressively smaller share of the variation in the data and is uncorrelated with all previously extracted principal components. There can be as many principal components as there are numeric fields in the data. However, it is typically possible to capture most of the variance in the data using the first few principal components instead of the full set of original numeric fields. Each principal component is a weighted linear combination of the original numeric fields. Together, the principal components form a new coordinate system in which each dimension is uncorrelated with the others.
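The ideas above can be illustrated with a minimal sketch for the simplest case of two numeric fields. This is not the tool's own implementation (the tool runs R); it is a hand-rolled Python illustration that uses the closed-form eigen decomposition of a 2x2 covariance matrix. The function name pca_2d and the sample fields field_a and field_b are hypothetical and exist only for this example.

```python
import math

def pca_2d(xs, ys):
    """Principal components of two numeric fields (illustrative sketch only)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix: the variance of each component.
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    variances = (half_trace + radius, half_trace - radius)
    # Direction of greatest spread, and the direction perpendicular to it.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    w1 = (math.cos(theta), math.sin(theta))
    w2 = (-math.sin(theta), math.cos(theta))
    # Each component is a weighted linear combination of the centered fields.
    pc1 = [w1[0] * (x - mx) + w1[1] * (y - my) for x, y in zip(xs, ys)]
    pc2 = [w2[0] * (x - mx) + w2[1] * (y - my) for x, y in zip(xs, ys)]
    return variances, (w1, w2), (pc1, pc2)

# Hypothetical sample data: two strongly correlated fields.
field_a = [1.0, 2.0, 3.0, 4.0, 5.0]
field_b = [2.1, 3.9, 6.2, 7.8, 10.1]

variances, weights, scores = pca_2d(field_a, field_b)
share = variances[0] / sum(variances)
print("Share of variance captured by the first component:", round(share, 4))
```

Because the two sample fields are almost perfectly correlated, the first component alone carries nearly all of the variance, and the two columns of scores are uncorrelated with each other. With more than two fields, the same idea applies but a general eigen decomposition routine is needed.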

Principal components can be used instead of the original fields in predictive models. This avoids the problems that can occur when highly correlated variables are used, but at the cost of making model interpretation more difficult. In addition, the method can be used to determine which groups of fields are likely to be highly related to one another, which can help guide decisions about which fields to omit from a predictive model. Finally, the ability to "collapse" a large number of fields into a small number of principal components is often a benefit in visualizing relationships in the data.
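As a rough illustration of how component weights can point to groups of related fields, the sketch below approximates the weights of the first principal component for three fields. It is not the tool's implementation: it assumes the fields are standardized first (so the correlation matrix is used) and it uses simple power iteration rather than a full eigen decomposition. The function name first_component_weights and the sample fields are hypothetical.

```python
import math

def first_component_weights(columns, iterations=200):
    """Approximate the first principal component's weights (illustrative sketch)."""
    n = len(columns[0])
    # Standardize each field to zero mean and unit variance (an assumption here).
    z = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        z.append([(v - mean) / sd for v in col])
    k = len(z)
    # Correlation matrix of the fields.
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(k)]
            for i in range(k)]
    # Power iteration: repeatedly multiply by the matrix and renormalize.
    w = [1.0] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * w[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w

# Hypothetical fields: field_a and field_b move together, field_c does not.
field_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
field_b = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]
field_c = [3.0, 1.0, 4.0, 1.0, 5.0, 2.0]

weights = first_component_weights([field_a, field_b, field_c])
for name, weight in zip(["field_a", "field_b", "field_c"], weights):
    print(name, round(weight, 3))
```

In this example field_a and field_b receive large weights of similar size on the first component, while field_c receives a much smaller one. That pattern suggests field_a and field_b carry largely the same information, so one of them might be a candidate to omit from a predictive model.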

This tool uses the R programming language. Go to Options > Download Predictive Tools to install R and the packages used by the R Tool.


Configuration Properties

Graphics Options