What is PCA?

Purpose - The purpose of principal component analysis (PCA) is to reduce the dimensionality of data while preserving as much information as possible, making the data easier to visualize and analyze. Principal components are directions, or vectors, onto which the data can be projected. They are ordered by how much of the data's variance they capture, and variance is the measure of information here: data points projected onto the top principal component have the highest variance of any single direction. PCA differs from techniques like linear regression, where the goal is to model a relationship between inputs and outputs. In contrast, the dimensions used in PCA are all inputs, or features, and the relationship to the outputs is not considered when reducing dimensions. By combining correlated features into fewer components, PCA removes redundancy in the input data and makes it easier to analyze.
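
The idea of projecting data onto the directions of highest variance can be illustrated with a short NumPy sketch. This is one common way to compute PCA (centering the data and taking a singular value decomposition), not the only one. The function name pca, the synthetic data, and the random seed are assumptions made for illustration.

```python
import numpy as np

def pca(X, n_components):
    """Sketch of PCA via SVD for X of shape (n_samples, n_features)."""
    # Center each feature so that variance is measured around the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Sample variance of the data along each principal direction.
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    # Coordinates of each data point in the reduced space.
    projected = X_centered @ components.T
    return components, explained_variance[:n_components], projected

# Illustrative data: the second feature is roughly twice the first,
# so most of the variance lies along a single direction.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200)])

components, variances, projected = pca(X, n_components=2)
```

In this sketch, variances[0] is the variance of the data projected onto the top principal component, and it is the largest entry. Keeping only the first column of projected would reduce the data from two dimensions to one.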

Real World Examples

Path of a ball - A ball is thrown forward and does not move left or right, so one of its coordinates remains constant. This redundant coordinate does not need to be used for analysis since it does not change.

Demographics of a City - A survey collects data about a city's residents. The survey asks for both the date of birth and the age of each resident. These values are directly related, so one of them is redundant and can be removed.

Colors - Images can be converted to grayscale by assigning each pixel a brightness value computed from its R, G, and B values. This reduces three color dimensions to one, in a way loosely analogous to projecting onto a single component. The grayscale image preserves features such as edges and shapes while removing color.
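
The first two examples can be checked numerically. The sketch below measures how much of the total variance each principal direction carries. The function name explained_variance_ratio, the trajectory values, the birth years, and the survey year are all hypothetical, and age is simplified to the survey year minus the birth year.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance along each principal direction of X."""
    X_centered = X - X.mean(axis=0)
    # Singular values only; their squares are proportional to the variances.
    S = np.linalg.svd(X_centered, compute_uv=False)
    var = S ** 2
    return var / var.sum()

# Ball thrown forward: x is forward distance, y is sideways position
# (constant), z is height. All values are made up for illustration.
t = np.linspace(0.0, 1.0, 50)
ball = np.column_stack([5.0 * t, np.zeros_like(t), 5.0 * t - 4.9 * t ** 2])
ball_ratio = explained_variance_ratio(ball)

# Survey: age is fully determined by birth year (hypothetical survey year).
survey_year = 2024
birth_year = np.array([1950, 1964, 1978, 1985, 1992, 2001], dtype=float)
age = survey_year - birth_year
residents = np.column_stack([birth_year, age])
resident_ratio = explained_variance_ratio(residents)
```

In both cases the last entry of the ratio is zero up to floating-point error: the ball data needs only two dimensions and the survey data only one, so the remaining dimension can be dropped without losing information.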