3.13 Analysis of a Binary Table

Sometimes the data under analysis consist exclusively of features that record the presence or absence of a certain attribute in individuals. For example, a variable Sex with categories “female” and “male” can be mapped into this presence/absence setting: “female” = presence, and “male” = absence. Given a set of such variables, we can encode the categories numerically with 0 and 1, e.g. “female” = 1, “male” = 0. We call these features binary variables. Alternatively, we can think of this type of variable as a nominal variable with two categories.
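As a minimal sketch of this encoding, the snippet below maps a small, hypothetical Sex variable (the five values are made up for illustration) to a 0/1 column with NumPy. It also builds the complementary column to show that the two categories of a binary variable always sum to 1.

```python
import numpy as np

# Hypothetical observations of the variable Sex (illustrative values only)
sex = np.array(["female", "male", "female", "female", "male"])

# Presence/absence encoding: "female" = 1, "male" = 0
is_female = (sex == "female").astype(int)

# The complementary category is fully determined by the first one
is_male = 1 - is_female

print(is_female)            # 0/1 column for "female"
print(is_female + is_male)  # every entry equals 1
```

Because one column determines the other, keeping both adds no information. This is the redundancy that appears when the two categories are treated as separate columns.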

When we have a table of binary variables, we could perform a Multiple Correspondence Analysis (MCA). However, when applying MCA to a binary table, each category comes with its complementary category. It can be proved that performing PCA on the binary table, after removing one of the two category columns for each binary variable, produces the same result as an MCA on the full binary table.

There is a scaling difference between the PCA results and the MCA results, but the binary variables can be multiplied by a suitable coefficient so that both sets of results become identical (Lebart et al., 1977).
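The equivalence can be checked numerically. The sketch below rests on several assumptions that are not in the text: the data are random 0/1 values, the PCA is a normed PCA (columns standardized with the population standard deviation), and MCA is computed as a correspondence analysis of the full binary table through a singular value decomposition. All variable names are illustrative. Under these assumptions, with p binary variables, the MCA eigenvalues equal the PCA eigenvalues divided by p, and the PCA scores equal the MCA row coordinates multiplied by the square root of p, up to the sign of each axis.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 4

# Reduced table: one 0/1 column per binary variable (random, illustrative data)
X = rng.integers(0, 2, size=(n, p)).astype(float)

# Full binary table: each column together with its complementary category
Z = np.hstack([X, 1.0 - X])

# MCA computed as a correspondence analysis of Z
P = Z / Z.sum()                       # correspondence matrix
r = P.sum(axis=1)                     # row masses
c = P.sum(axis=0)                     # column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
U, sv, _ = np.linalg.svd(S, full_matrices=False)
mca_eig = sv[:p] ** 2                                  # only p nonzero eigenvalues
mca_rows = (U[:, :p] * sv[:p]) / np.sqrt(r)[:, None]   # row principal coordinates

# Normed PCA of the reduced table
Y = (X - X.mean(axis=0)) / X.std(axis=0)   # population standard deviation (ddof=0)
Uy, svy, _ = np.linalg.svd(Y, full_matrices=False)
pca_eig = svy ** 2 / n                     # eigenvalues of the correlation matrix
pca_scores = Uy * svy                      # principal components (scores)

# Same results up to a scaling coefficient (and the sign of each axis)
print(np.allclose(mca_eig, pca_eig / p))
print(np.allclose(np.abs(mca_rows) * np.sqrt(p), np.abs(pca_scores)))
```

Absolute values are compared because the orientation of each axis returned by an SVD is arbitrary.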

Minimal Encoding

In the example of the table of individual consumptions, we could encode the observed information as follows: “1” if the expenses are positive, and “0” if the expenses are zero. Suppose that the analysis of this binary table provides a configuration of expenses in the 7 products similar to the one obtained from the analysis of the original variables. This would indicate that the relationships revealed depend only on access to these consumptions and not on their volume.
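The minimal encoding itself is a one-line thresholding step. The sketch below uses a small, hypothetical expense table (three individuals and three products, with made-up values, not the 7-product data of the example) to show how volumes are reduced to access indicators.

```python
import numpy as np

# Hypothetical expenses: rows are individuals, columns are products
expenses = np.array([
    [12.5, 0.0, 3.0],
    [0.0,  0.0, 7.2],
    [4.1,  9.9, 0.0],
])

# Minimal encoding: 1 if the expense is positive, 0 if it is zero
access = (expenses > 0).astype(int)

print(access)
```

The resulting table keeps only whether each individual consumes each product, so any structure that survives this encoding cannot be due to the amounts spent.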