Terminology

This section briefly introduces the terms used throughout this book.

  • Data table: a rectangular array formed by rows and columns. Each entry of the table (i.e., the intersection of one row and one column) holds a datum, typically coded in numeric form. We also use the term data matrix as a synonym. We denote a data table with an uppercase bold letter, e.g. \(\mathbf{X}\), with \(n\) rows and \(p\) columns.

  • Individual: a row of the data table. The word “individual” can refer to any type of object: person, animal, country, company, planet, etc.

  • Variable: we use the word “variable” to refer to any column of the data table. A variable records the same measure, the same aggregate, the same question, etc., for every individual. Instead of “variable” we may also say measure, attribute, characteristic, property, etc.

  • Continuous Variable: a variable is called continuous when its measurement is quantitative. More precisely, a variable is continuous when computing its mean (or average) makes sense.

  • Nominal Variable: a variable is nominal when its values are names of categories (or qualities). Examples of nominal variables are civil status (e.g., single, married, divorced, widowed) and geographic region (e.g., north, south, west). We also use the terms categorical and qualitative variable as synonyms of nominal.

  • Modality: the modalities (or categories) are the values taken by a nominal variable. For example, the variable gender is typically coded with two modalities: male and female. A modality is also known as a group, class, or category.

  • Cloud of Points: on a plane or in three-dimensional space, a “cloud of points” is a set of points positioned by their coordinates with respect to a set of orthogonal axes. Given these coordinates, it is easy to calculate the distance between any two points. For instance, the \(n\) rows of a data table with \(p\) continuous variables form a cloud of \(n\) points in a \(p\)-dimensional space. With more than three axes the cloud of points still exists, but we cannot visualize it directly.

  • Distance: the distance between two points of a cloud is the usual (Euclidean) distance, calculated from the coordinates of the points via the Pythagorean theorem. For individuals \(i\) and \(j\) of \(\mathbf{X}\): \(d(i, j) = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}\).

  • Inertia: a term borrowed from mechanics (physics) that is equivalent to the statistical concept of variance. The inertia of a cloud of weighted points is the weighted average of the squared distances between the points and their center of gravity, and it measures the spread of the cloud. If all individuals have the same weight (same importance), the direction along which the cloud has the largest inertia is the direction of its major axis.

  • Center of Gravity: the average point (or central point) of a cloud of weighted points, i.e., the point whose coordinates are the weighted means of the coordinates of all the points. The mechanical notion of center of gravity is thus equivalent to the statistical notion of average point.

  • Factorial Analysis: an optimal (in a certain sense) visualization of a cloud of points lying in a multidimensional space, obtained by projecting the cloud onto a small number of axes.

  • Factorial Axes: the axes, sorted by importance, that we use to visualize the cloud of points in a factorial analysis. They are defined by the directions of greatest stretching (inertia) of the cloud of points.

  • PCA: Principal Component(s) Analysis.

  • Active Variables: the set of variables used to compute the factorial axes (and hence the factorial planes).

  • Supplementary Variables: the set of variables that are NOT used to compute the factorial axes, but that can be projected afterwards to aid in interpreting the results.

  • Contribution: measures the participation of an element (e.g., category, variable, frequency, or individual) in the construction of a factorial axis. The contributions of all active elements to a given axis add up to 100%.

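  • Squared Cosine: measures the quality of the representation of an element (e.g., category, variable, frequency, or individual) on a factorial axis. It is the squared cosine of the angle between the element (as a vector from the center of gravity) and the axis; values close to 1 indicate that the element is well represented by that axis.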

  • V-test: expresses, as a number of standard deviations, the gap between an observed value and its expected value under a null hypothesis. It is used to characterize axes, categories, classes, etc.
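The geometric terms above (data table, center of gravity, distance, inertia) can be made concrete with a small computation. The following is a minimal Python sketch, assuming NumPy and a hypothetical 4 × 2 data table whose individuals all carry the same weight 1/n; the function names are illustrative only and are not taken from any particular library. It shows that, with equal weights summing to 1, the inertia of the cloud equals the sum of the variances of its columns.

```python
import numpy as np

def center_of_gravity(X, w):
    """Weighted average point of the cloud formed by the rows of X."""
    return w @ X

def distance(a, b):
    """Usual Euclidean distance (Pythagorean theorem in p dimensions)."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

def inertia(X, w):
    """Weighted sum of squared distances from each point to the center of gravity."""
    g = center_of_gravity(X, w)
    return float(np.sum(w * np.sum((X - g) ** 2, axis=1)))

# Hypothetical data table: n = 4 individuals (rows), p = 2 continuous variables (columns)
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 0.0],
              [3.0, 2.0]])
w = np.full(X.shape[0], 1 / X.shape[0])  # equal weights summing to 1

print(center_of_gravity(X, w))   # → [3. 2.]
print(inertia(X, w))             # → 4.0
print(X.var(axis=0).sum())       # → 4.0 (sum of the column variances)
```

With unequal weights the same functions still apply; the inertia then matches a weighted variance rather than `X.var`.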
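Factorial axes, contributions, squared cosines, and supplementary variables can be illustrated with a non-normalized PCA. The Python sketch below assumes NumPy, equal weights by default, and a hypothetical 5 × 3 table of active variables; the helpers `pca` and `supplementary_correlations` are hypothetical names, and the computation is not necessarily the exact one used later in the book. It takes the factorial axes as the eigenvectors of the weighted covariance matrix, sorted by decreasing eigenvalue (the inertia along each axis). With coordinates \(F_{ik}\) and eigenvalues \(\lambda_k\), the contribution of individual \(i\) to axis \(k\) is computed as \(w_i F_{ik}^2 / \lambda_k\) (a proportion, not a percentage), and its squared cosine as \(F_{ik}^2\) divided by its squared distance to the center of gravity.

```python
import numpy as np

def pca(X, w=None):
    """Non-normalized PCA of the rows of X (active variables = columns of X)."""
    n = X.shape[0]
    if w is None:
        w = np.full(n, 1 / n)                # equal weights summing to 1
    g = w @ X                                 # center of gravity
    Xc = X - g                                # cloud centered at g
    C = Xc.T @ (Xc * w[:, None])              # weighted covariance matrix (p x p)
    eigvals, eigvecs = np.linalg.eigh(C)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # sort axes by decreasing inertia
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    F = Xc @ eigvecs                          # coordinates of individuals on the axes
    # Contribution of individual i to axis k: w_i * F_ik^2 / lambda_k (columns sum to 1)
    contrib = w[:, None] * F ** 2 / eigvals
    # Squared cosine: share of the squared distance to g captured by axis k (rows sum to 1)
    cos2 = F ** 2 / np.sum(Xc ** 2, axis=1, keepdims=True)
    return eigvals, eigvecs, F, contrib, cos2

def supplementary_correlations(z, F):
    """Correlation of a supplementary continuous variable z with each factorial axis.
    z plays no role in computing the axes; np.corrcoef assumes equal weights."""
    return np.array([np.corrcoef(z, F[:, k])[0, 1] for k in range(F.shape[1])])

# Hypothetical data table: n = 5 individuals, p = 3 active continuous variables
X = np.array([[2.0, 0.0, 1.0],
              [4.0, 2.0, 0.0],
              [1.0, 3.0, 2.0],
              [5.0, 1.0, 4.0],
              [3.0, 5.0, 3.0]])
eigvals, eigvecs, F, contrib, cos2 = pca(X)
print(eigvals.sum(), X.var(axis=0).sum())    # total inertia, computed two ways
print(contrib.sum(axis=0))                   # per axis: contributions sum to 1
print(cos2.sum(axis=1))                      # per individual: squared cosines sum to 1
```

Keeping all \(p\) axes makes the squared cosines of each individual sum to 1; in practice only the first few axes are retained, and the partial sum measures how well the corresponding factorial plane represents that individual.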
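The V-test takes different forms depending on what is being tested. One common form characterizes a category (or class) of \(n_k\) individuals by a continuous variable, such as the coordinates on a factorial axis: it compares the group mean \(\bar{x}_k\) with the overall mean \(\bar{x}\), dividing by the standard deviation of the mean of \(n_k\) individuals drawn at random without replacement from the \(n\) individuals, i.e. \(v = (\bar{x}_k - \bar{x}) / \sqrt{(s^2/n_k)\,(n-n_k)/(n-1)}\), where \(s^2\) is the variance of the variable over all \(n\) individuals. The Python sketch below assumes NumPy and this particular null hypothesis; the function name `v_test` and the data are hypothetical. Because the statistic is approximately standard normal under the null hypothesis, a common rule of thumb reads \(|v| > 2\) as significant at roughly the 5% level.

```python
import numpy as np

def v_test(x, in_group):
    """Number of standard deviations separating the mean of x within a group from
    the overall mean, assuming the group's n_k individuals were drawn at random
    without replacement from all n individuals (null hypothesis)."""
    x = np.asarray(x, dtype=float)
    in_group = np.asarray(in_group, dtype=bool)
    n, n_k = x.size, int(in_group.sum())
    if not 0 < n_k < n:
        raise ValueError("the group must be a non-empty proper subset")
    overall_mean = x.mean()
    group_mean = x[in_group].mean()
    s2 = x.var()  # variance over all n individuals (divisor n)
    # Variance of a mean of n_k individuals sampled without replacement
    var_group_mean = (s2 / n_k) * (n - n_k) / (n - 1)
    return float((group_mean - overall_mean) / np.sqrt(var_group_mean))

# Hypothetical example: 8 individuals, the last 4 form the group being characterized
x = np.arange(1, 9)
group = np.array([False] * 4 + [True] * 4)
print(round(v_test(x, group), 3))   # → 2.309
```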