6.3 Biplot and PCA
The so-called biplot is a general method for simultaneously representing the rows and columns of a data table. It consists of approximating the data table by a matrix product of rank 2, so that both rows and columns can be displayed in a plane. The technique behind a biplot involves a matrix decomposition, namely the singular value decomposition (SVD) that also underlies PCA. Usually, the biplot is carried out on mean-centered and scaled data.
Recall that PCA provides three types of graphics to visualize the active elements:
- The “circle of correlations”, where the continuous variables are represented (the cosine of the angle between two variables equals the correlation between those variables).
- The configuration of the individuals in the factorial plane; the distance used is the classic Euclidean distance.
- The simultaneous representation, in the orthonormal basis, of the original variables at the center of gravity of the cloud of individuals.
We should keep in mind that the aim of a biplot is to get a projection of the individuals on the directions of the original variables that respects as much as possible the distribution of the initial data.
In a biplot, we overlay the rows and the columns in the same graphic, according to three types of simultaneous representations:

- In the space of variables: the cosine of the angle between two variables approximates the correlation between these two variables; likewise, the distance between two individuals approximates the Mahalanobis distance (not the usual Euclidean distance of PCA).
- In the space of individuals: the distance between individuals approximates the Euclidean distance, but the distance between variables is not directly interpretable.
- In an intermediate space: the distances, between individuals and between variables, are not directly interpretable, but we obtain a “balanced” plot.
Every matrix \(\mathbf{Y}\) can be decomposed into the following product:
\[ \mathbf{Y} = \mathbf{AB^\mathsf{T}} \]
with dimensions: \((n,p) = (n,k) \times (k,p)\), where \(k\) is the rank of \(\mathbf{Y}\).
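As a quick numerical illustration, such a full-rank factorization can be obtained from the singular value decomposition. This is only a sketch: the random matrix stands in for a real data table, and the variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 4))            # an (n, p) data table; rank k = 4 here

# Y = V @ diag(s) @ Ut, with Ut the transpose of U
V, s, Ut = np.linalg.svd(Y, full_matrices=False)
k = np.linalg.matrix_rank(Y)

A = V[:, :k]                           # (n, k)
Bt = np.diag(s[:k]) @ Ut[:k, :]        # (k, p)
# A @ Bt recovers Y exactly (up to floating point)
```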
In a biplot, like in PCA, we graphically represent the individuals as points, and the variables as vectors (i.e. arrows). The biplot involves approximating \(\mathbf{Y}\) by the product:
\[ \mathbf{Y} \approx \mathbf{\hat{Y}} = \mathbf{AB}^\mathsf{T} \]
with dimensions: \((n,p) = (n,2) \times (2,p)\). The rows of the matrix \(\mathbf{A}\) represent the individuals, and the rows of \(\mathbf{B}\) represent the variables. To achieve this decomposition, we use the same decomposition as in PCA, that is, the singular value decomposition of \(\mathbf{Y}\):
\[ \mathbf{Y} = \mathbf{V \Lambda U^\mathsf{T}} \]
where \(\mathbf{U}\) contains the eigenvectors of \(\mathbf{Y^\mathsf{T} Y}\), and \(\mathbf{\Lambda}\) is the diagonal matrix of singular values (i.e. the square roots of the eigenvalues of \(\mathbf{Y^\mathsf{T}Y}\)). We have:
\[ \mathbf{V} = \mathbf{YU\Lambda}^{-1} \]
Retaining only the first two singular values, we obtain the rank-2 approximation of \(\mathbf{Y}\):
\[ \mathbf{Y} \approx \hat{\mathbf{Y}} = \underset{(n,2)}{\mathbf{V}} \, \underset{(2,2)}{\mathbf{\Lambda}} \, \underset{(2,p)}{\mathbf{U}^\mathsf{T}} \]
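The rank-2 approximation above can be computed directly with a truncated SVD. A minimal sketch in NumPy, assuming standardized data (the random table is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(10, 4))
Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)   # mean-center and scale, as usual for a biplot

# Y = V @ diag(s) @ Ut, with singular values s sorted in decreasing order
V, s, Ut = np.linalg.svd(Y, full_matrices=False)

# keep only the two largest singular values: the rank-2 approximation
Y_hat = V[:, :2] @ np.diag(s[:2]) @ Ut[:2, :]
```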
We can define three decompositions of \(\mathbf{Y}\) in terms of \(\mathbf{AB}^\mathsf{T}\), depending on how we allocate the singular values between the individuals (\(\mathbf{V}\)) and the variables (\(\mathbf{U}\)).
| Representation | \(\mathbf{A}\) | \(\mathbf{B}^\mathsf{T}\) |
|---|---|---|
| Space of variables | \(\mathbf{V}\) | \(\mathbf{\Lambda U}^\mathsf{T}\) |
| Balanced | \(\mathbf{V\Lambda}^{1/2}\) | \(\mathbf{\Lambda}^{1/2} \mathbf{U}^\mathsf{T}\) |
| Space of individuals | \(\mathbf{V\Lambda}\) | \(\mathbf{U}^\mathsf{T}\) |
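The three rows of the table can be written with a single exponent \(\alpha\) applied to the singular values: \(\mathbf{A} = \mathbf{V\Lambda}^{\alpha}\) and \(\mathbf{B}^\mathsf{T} = \mathbf{\Lambda}^{1-\alpha}\mathbf{U}^\mathsf{T}\), with \(\alpha = 0\), \(1/2\), or \(1\). A sketch of this parameterization (the function name `biplot_coords` is ours, not a library routine):

```python
import numpy as np

def biplot_coords(Y, alpha):
    """Rank-2 biplot coordinates: rows A and column part B^T.

    alpha = 0   -> space of variables   (A = V,          B^T = L U^T)
    alpha = 1/2 -> balanced             (A = V L^{1/2},  B^T = L^{1/2} U^T)
    alpha = 1   -> space of individuals (A = V L,        B^T = U^T)
    """
    V, s, Ut = np.linalg.svd(Y, full_matrices=False)
    A = V[:, :2] * s[:2] ** alpha                    # scale columns of V
    Bt = (s[:2] ** (1 - alpha))[:, None] * Ut[:2, :] # scale rows of U^T
    return A, Bt
```

Whatever the value of \(\alpha\), the product \(\mathbf{AB}^\mathsf{T}\) is the same rank-2 approximation; only the display of rows and columns changes.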
Consider the expression \(y_{ij} \approx \mathbf{a}_i^{\mathsf{T}} \mathbf{b}_j\).
This scalar product shows that the projections of the points \(\mathbf{a}_i\) on the directions defined by \(\mathbf{b}_j\) approximate the distribution of the initial data of variable \(\mathbf{y}_j\), regardless of the chosen decomposition.
Simultaneous Representation in the Variables Space
\[ \mathbf{Y} \approx \mathbf{AB^\mathsf{T}} = (\mathbf{V})(\mathbf{\Lambda U^\mathsf{T}}) \]
The cosine of the angle formed by the vectors \(\mathbf{b}_j\) and \(\mathbf{b}_l\) corresponds to the correlation between variables \(\mathbf{y}_j\) and \(\mathbf{y}_l\). As in PCA, this property holds for the active variables; for the supplementary variables, it holds only with respect to the axes.
The Euclidean distance between the individuals \(\mathbf{a}_i\) and \(\mathbf{a}_h\) is proportional to the Mahalanobis distance between the individuals \(\mathbf{y}_i\) and \(\mathbf{y}_h\) of the original data table.
The Mahalanobis distance takes into account the correlations between the variables: it transforms the cloud of row points, usually elliptical in shape, into a circular one. The Mahalanobis distance is given by:
\[ \delta^2 (i, h) = (\mathbf{y_i} - \mathbf{y_h})^\mathsf{T} \mathbf{W}^{-1} (\mathbf{y_i} - \mathbf{y_h}) \]
where \(\mathbf{W}\) is the variance-covariance matrix.
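This proportionality can be checked numerically. A small sketch, assuming \(\mathbf{W}\) is estimated on centered data with denominator \(n-1\); in that case the squared Mahalanobis distance equals \((n-1)\) times the squared Euclidean distance between the corresponding rows of \(\mathbf{V}\):

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.normal(size=(12, 4))
Y = Y - Y.mean(axis=0)                      # center the data

V, s, Ut = np.linalg.svd(Y, full_matrices=False)
W = Y.T @ Y / (len(Y) - 1)                  # variance-covariance matrix
Winv = np.linalg.inv(W)

i, h = 0, 5
d = Y[i] - Y[h]
maha2 = d @ Winv @ d                        # squared Mahalanobis distance
# squared Euclidean distance between rows of V, scaled by (n - 1)
eucl2 = (len(Y) - 1) * np.sum((V[i] - V[h]) ** 2)
```

With the full-rank \(\mathbf{V}\) the equality is exact; keeping only two axes gives the approximation used in the biplot.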
Simultaneous Representation in the Individuals Space
\[ \mathbf{Y} \approx \mathbf{AB^\mathsf{T}} = (\mathbf{V \Lambda})(\mathbf{U}^\mathsf{T}) \]
The Euclidean distance between two individuals \(\mathbf{a}_i\) and \(\mathbf{a}_h\) approximates the Euclidean distance between the individuals \(\mathbf{y}_i\) and \(\mathbf{y}_h\) of the original data table. In this case there are no special properties relative to the proximity between variables: those distances are not directly interpretable.
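A quick check of this property, under the same illustrative setup as before: with the full-rank coordinates \(\mathbf{A} = \mathbf{V\Lambda}\) the distances are preserved exactly, since \(\mathbf{U}\) is orthogonal; retaining only two axes makes them approximate.

```python
import numpy as np

rng = np.random.default_rng(2)
Y = rng.normal(size=(10, 4))
Y = Y - Y.mean(axis=0)

V, s, Ut = np.linalg.svd(Y, full_matrices=False)
A = V * s                                   # A = V @ diag(s), full rank

i, h = 2, 7
d_rows = np.linalg.norm(A[i] - A[h])        # distance between row points
d_data = np.linalg.norm(Y[i] - Y[h])        # distance in the original table
# d_rows equals d_data because Y = A @ Ut and Ut is orthogonal
```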
Balanced Simultaneous Representation
\[ \mathbf{Y} \approx \mathbf{AB^\mathsf{T}} = (\mathbf{V \Lambda}^{1/2})(\mathbf{\Lambda}^{1/2} \mathbf{U^\mathsf{T}}) \]
This option tends to balance the representation between the rows and the columns in the sense that, for each axis, the sum of the squared distances to the axis is the same for the cloud of individuals as for the cloud of variables.
We obtain a “balanced” graphic. Apart from the property common to all the decompositions (i.e. the projection of the individuals onto the variables approximates the data table), there are no specific properties for interpreting the proximities between individuals, nor the proximities between variables.
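The balancing property can be verified directly: with \(\mathbf{A} = \mathbf{V\Lambda}^{1/2}\) and \(\mathbf{B} = \mathbf{U\Lambda}^{1/2}\), the per-axis sum of squared coordinates equals the corresponding singular value for both clouds. A sketch, again on illustrative standardized data:

```python
import numpy as np

rng = np.random.default_rng(3)
Y = rng.normal(size=(10, 4))
Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)

V, s, Ut = np.linalg.svd(Y, full_matrices=False)
A = V[:, :2] * np.sqrt(s[:2])       # individuals: V L^{1/2}
B = Ut[:2, :].T * np.sqrt(s[:2])    # variables:   U L^{1/2} (so B^T = L^{1/2} U^T)

# per-axis sum of squared coordinates: identical for rows and columns
row_ss = (A ** 2).sum(axis=0)
col_ss = (B ** 2).sum(axis=0)
```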