How does hierarchical clustering work?

Hierarchical cluster analysis (HCA) is a method of cluster analysis that seeks to build a hierarchy of clusters. This can be done using two approaches:

Agglomerative: This is a “bottom-up” approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.

Divisive: This is a “top-down” approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
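Both approaches are available in R: hclust() in base R performs agglomerative clustering, while diana() from the cluster package performs divisive clustering. A minimal sketch, using the built-in USArrests data purely as an example:

    # Agglomerative vs. divisive clustering on the built-in USArrests data
    library(cluster)             # provides diana() for divisive clustering

    d <- dist(scale(USArrests))  # Euclidean distances on standardised variables

    # Agglomerative ("bottom-up"): every state starts in its own cluster, pairs are merged
    hc_agg <- hclust(d, method = "complete")

    # Divisive ("top-down"): all states start in one cluster, which is split recursively
    hc_div <- diana(d)

    par(mfrow = c(1, 2))
    plot(hc_agg, main = "Agglomerative (hclust)", cex = 0.6)
    plot(hc_div, main = "Divisive (diana)", cex = 0.6, which.plots = 2)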

Algorithm Steps:

  1. Start by assigning each item to its own cluster, so that if we have N data points, we start with N clusters, each containing just one item.
  2. Find the closest (most similar) pair of clusters and merge them into a single cluster, so that now we have one cluster less.
  3. Compute distances (similarities) between the new cluster and each of the old clusters.
  4. Repeat steps 2 and 3 until all items are merged into a single cluster of size N (see the R sketch below).
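These steps can be traced on a tiny example in R. In the sketch below the five two-dimensional points are made up purely for illustration; the merge and height components of the hclust() result show which pair of clusters was joined at each step and at what distance:

    # Trace the agglomerative steps on a small made-up data set (N = 5 points)
    x <- data.frame(a = c(1, 1.5, 5, 5.2, 9),
                    b = c(1, 1.2, 5, 5.1, 9))

    d  <- dist(x)                        # step 1: N singleton clusters and their pairwise distances
    hc <- hclust(d, method = "average")  # steps 2-4: repeatedly merge the closest pair

    hc$merge   # one row per merge; negative entries are original points, positive ones earlier merges
    hc$height  # distance at which each merge took place
    plot(hc)   # dendrogram of the resulting hierarchy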

Advantages:

  1. No need to know the number of clusters before starting the algorithm.
  2. Often gives good results in practice.
  3. Often used to estimate the number of clusters before running the k-means algorithm (see the sketch below).
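A common workflow for the third point is to inspect the dendrogram, cut it into a chosen number of groups with cutree(), and then pass that number to kmeans(). A minimal sketch, using the built-in iris measurements purely as an example:

    # Use the hierarchy to choose k, then run k-means with that k
    X  <- scale(iris[, 1:4])                  # numeric columns of the built-in iris data
    hc <- hclust(dist(X), method = "ward.D2")

    plot(hc)                                  # inspect the dendrogram; here 3 groups look reasonable
    k  <- 3
    groups_hc <- cutree(hc, k = k)            # cluster labels taken from the hierarchy

    km <- kmeans(X, centers = k, nstart = 25)
    table(groups_hc, km$cluster)              # compare the hierarchical and k-means partitions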

Disadvantages:

  1. The algorithm can never undo a merge (or split) made in an earlier step.
  2. When there are many data points, the dendrogram becomes hard to read and it is difficult to identify the correct number of clusters.
  3. The algorithm is time-consuming for large datasets.

R Code:

The full code, with an explanation, for doing hierarchical clustering in R can be found here.
