7.5 Clustering of Factors

We can use clustering techniques to partition the individuals into a small number of groups that are homogeneous and well separated.

Advantages

  • A clustering technique can work with a considerable number of the dimensions produced by a dimension reduction procedure (e.g. PCA).

  • It also allows us to reduce the amount of “noise” in the data, because only the most important dimensions are used.

  • It also enables us to cluster the individuals based on the active variables.
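As an illustration of the points above, here is a minimal sketch of how the coordinates used for clustering might be obtained. It assumes a normalized PCA computed through a singular value decomposition with NumPy. The function name `pca_scores` and the simulated data are hypothetical and serve only as an example.

```python
import numpy as np

def pca_scores(X, n_components):
    """Coordinates of the individuals on the first principal components."""
    X = np.asarray(X, dtype=float)
    # Standardize the active variables (normalized PCA).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Singular value decomposition of the standardized table.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # Coordinates of the individuals on the retained components.
    scores = U[:, :n_components] * s[:n_components]
    # Proportion of the total inertia carried by each retained component.
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, explained

rng = np.random.default_rng(0)
# Hypothetical data: 100 individuals, 10 variables driven by 2 latent factors.
latent = rng.normal(size=(100, 2))
loadings = rng.normal(size=(2, 10))
X = latent @ loadings + 0.3 * rng.normal(size=(100, 10))

scores, explained = pca_scores(X, n_components=2)
print(scores.shape)
print(explained.round(3))
```

The array `scores` is what a clustering method would then take as input in place of the original table. The remaining components, which mostly carry noise, are left out.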

Clustering approaches

Different clustering techniques can be used.

  • Hierarchical clustering methods:
    • Ward’s criterion
    • Single linkage
    • Complete (maximum) linkage
  • Moving centroids:
    • K-means

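With K-means, the final partition can depend on the initial centers of the groups. To overcome this limitation, we can run the algorithm from several different initializations and keep the strong (or stable) clusters: the sets of individuals that are always assigned to the same group.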
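One possible way to obtain strong clusters is sketched below: K-means is run several times from random initial centers, and two individuals are placed in the same strong cluster only when they receive the same label in every run. The functions `kmeans` and `strong_clusters`, the number of runs, and the simulated data are assumptions made for this example, not a reference implementation.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain K-means with initial centers drawn at random among the individuals."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each individual to its nearest center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its group (keep it if the group is empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def strong_clusters(X, k, n_runs=5, seed=0):
    """Cross the partitions of several K-means runs."""
    rng = np.random.default_rng(seed)
    # One column of labels per run.
    runs = np.column_stack([kmeans(X, k, rng) for _ in range(n_runs)])
    # Individuals with identical rows were grouped together in every run.
    _, strong = np.unique(runs, axis=0, return_inverse=True)
    return runs, strong.ravel()

rng_data = np.random.default_rng(42)
# Hypothetical data: three groups of 30 individuals in two dimensions.
X = np.vstack([
    rng_data.normal(loc=c, scale=0.5, size=(30, 2))
    for c in ([0, 0], [5, 0], [0, 5])
])

runs, strong = strong_clusters(X, k=3, n_runs=5, seed=0)
print(np.bincount(strong))  # sizes of the strong clusters
```

Crossing the partitions can only refine them, so there may be more strong clusters than the requested number of groups. Small strong clusters typically correspond to individuals whose assignment changes from one run to another.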

  • Hybrid clustering:
    • We can combine a hierarchical approach with a moving centroid approach.
    • This hybrid strategy is recommended for large data sets.

Consolidation of the groups

We can run a K-means algorithm whose initial centroids are the centroids of the clusters obtained by cutting (pruning) the dendrogram. This improves the quality and stability of the resulting partition.
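The consolidation step can be sketched as follows, assuming SciPy is available: a Ward hierarchical clustering is cut into k groups, and the centroids of those groups are passed as the starting centers of a K-means run. The simulated coordinates, the value of k, and the helper `within_inertia` are hypothetical choices made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(1)
# Hypothetical factor coordinates: three groups of 40 individuals on 2 components.
X = np.vstack([
    rng.normal(loc=c, scale=1.0, size=(40, 2))
    for c in ([0, 0], [4, 0], [2, 4])
])

k = 3
# Hierarchical clustering with Ward's criterion, then cut the tree into k groups.
tree = linkage(X, method="ward")
hc_labels = fcluster(tree, t=k, criterion="maxclust")

# Centroids of the groups obtained from the cut.
init_centers = np.vstack([
    X[hc_labels == g].mean(axis=0) for g in np.unique(hc_labels)
])

# K-means started from those centroids (minit="matrix" reads them as initial centers).
centers, km_labels = kmeans2(X, init_centers, minit="matrix", iter=20)

def within_inertia(X, labels):
    """Sum of squared distances of the individuals to their group centroid."""
    return sum(
        ((X[labels == g] - X[labels == g].mean(axis=0)) ** 2).sum()
        for g in np.unique(labels)
    )

print(within_inertia(X, hc_labels), within_inertia(X, km_labels))
```

Because each K-means iteration cannot increase the within-group inertia, the consolidated partition is at least as compact as the one read directly from the dendrogram.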

Description of the clusters

To better understand and describe the obtained clusters, we can use both the active and supplementary variables.

Using the v-tests:

  • Categorical variable: a category characterizes a group if it is significantly over-represented or under-represented in that cluster compared to the overall population (comparison based on percentages).

  • Continuous variable: a continuous variable characterizes a group if its mean in that cluster is significantly higher or lower than its mean in the overall population (comparison based on the means).
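The two comparisons above can be sketched with one common formulation of the v-test. It compares the cluster with what would be expected if its individuals were drawn at random, without replacement, from the whole population. For the categorical case the sketch uses a normal approximation rather than an exact hypergeometric probability. The function names and the small data sets are hypothetical.

```python
import math

def v_test_continuous(values, in_cluster):
    """v-test of a continuous variable for one cluster.

    values: the variable for all n individuals.
    in_cluster: booleans, True for the individuals of the cluster.
    """
    n = len(values)
    cluster = [x for x, flag in zip(values, in_cluster) if flag]
    n_k = len(cluster)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / n
    mean_k = sum(cluster) / n_k
    # Standard error of the mean of n_k individuals drawn without replacement.
    se = math.sqrt(var / n_k * (n - n_k) / (n - 1))
    return (mean_k - mean) / se

def v_test_category(n, n_k, n_j, n_kj):
    """v-test of a category for one cluster (normal approximation).

    n: all individuals, n_k: individuals in the cluster,
    n_j: individuals with the category, n_kj: individuals with both.
    """
    p = n_j / n
    expected = n_k * p
    var = n_k * p * (1 - p) * (n - n_k) / (n - 1)
    return (n_kj - expected) / math.sqrt(var)

# Hypothetical example: the cluster holds the three largest values.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
in_cluster = [False] * 7 + [True] * 3
v_cont = v_test_continuous(values, in_cluster)

# Hypothetical counts: 12 of the 20 individuals in the cluster have the
# category, against 30 of the 100 individuals overall.
v_cat = v_test_category(n=100, n_k=20, n_j=30, n_kj=12)
print(round(v_cont, 2), round(v_cat, 2))
```

A positive value means that the mean is higher, or the category more frequent, in the cluster than overall. Absolute values above roughly 2 are usually read as significant at the 5% level.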