Hierarchical cluster analysis (HCA) is a method of cluster analysis that seeks to build a hierarchy of clusters. There are two approaches:
- Agglomerative: This is a “bottom up” approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
- Divisive: This is a “top down” approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
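As a rough illustration of the divisive (top-down) direction, here is a minimal Python sketch. The splitting rule is an assumption of this sketch and is not a standard named algorithm such as DIANA. It picks the two members of the current cluster that are farthest apart as seeds, sends every member to the nearer seed, and recurses until every cluster holds a single point. The name `divisive` and the nested-tuple output are illustrative choices, and Euclidean distance is assumed.

```python
import math
from itertools import combinations


def divisive(points, idx=None):
    """Build a hierarchy top down; returns nested tuples of point indices."""
    if idx is None:
        idx = list(range(len(points)))  # all observations start in one cluster
    if len(idx) == 1:
        return idx[0]  # a single observation is a leaf
    # Seeds (heuristic): the two members of this cluster that are farthest apart
    a, b = max(combinations(idx, 2),
               key=lambda pair: math.dist(points[pair[0]], points[pair[1]]))
    # Send every member to the nearer seed (ties go to seed a)
    left = [i for i in idx
            if math.dist(points[i], points[a]) <= math.dist(points[i], points[b])]
    right = [i for i in idx if i not in left]
    if not right:  # all members coincide, so split off one arbitrarily
        left, right = idx[:1], idx[1:]
    # Recurse: keep splitting each part until every cluster is a singleton
    return (divisive(points, left), divisive(points, right))


print(divisive([(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]))
# (((0, 1), (2, 3)), 4)
```

The isolated point 4 is split off first. Each remaining pair of nearby points is separated only at the bottom of the hierarchy.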
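The agglomerative algorithm works as follows:
- Start by assigning each item to its own cluster, so that if we have N data points, we have N clusters, each containing just one item.
- Find the closest (most similar) pair of clusters and merge them into a single cluster, so that we now have one cluster fewer.
- Compute the distances (similarities) between the new cluster and each of the old clusters. How this distance is defined is the linkage criterion. For example, single linkage uses the closest pair of members, complete linkage uses the farthest pair, and average linkage uses the mean over all pairs.
- Repeat the previous two steps until all items are clustered into a single cluster of size N.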
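The steps above can be sketched in Python as follows. This is a deliberately naive version that assumes Euclidean distance and points given as coordinate tuples. `agglomerative` and `cluster_distance` are hypothetical helper names. Rather than updating a distance matrix incrementally, the sketch recomputes every cluster-to-cluster distance on each pass. That keeps the code short but makes it slow.

```python
import math
from itertools import combinations


def cluster_distance(points, c1, c2, linkage="single"):
    """Distance between two clusters (lists of point indices) under a linkage rule."""
    dists = [math.dist(points[i], points[j]) for i in c1 for j in c2]
    if linkage == "single":    # closest pair of members
        return min(dists)
    if linkage == "complete":  # farthest pair of members
        return max(dists)
    if linkage == "average":   # mean over all member pairs
        return sum(dists) / len(dists)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerative(points, linkage="single"):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples."""
    # Step 1: every point starts in its own cluster
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Steps 2-3: (re)compute all cluster distances and pick the closest pair
        d, a, b = min(
            (cluster_distance(points, clusters[a], clusters[b], linkage), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        # Replace the pair with their union: one cluster fewer than before
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return merges


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
for a, b, d in agglomerative(points):
    print(a, b, round(d, 2))
# [0] [1] 1.0
# [2] [3] 1.0
# [0, 1] [2, 3] 6.4
# [4] [0, 1, 2, 3] 7.07
```

Each entry of the returned list records one merge and the distance at which it happened. This is the information a dendrogram plots. For real data, library implementations such as R's `hclust` or SciPy's `scipy.cluster.hierarchy.linkage` are much faster.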
Advantages:
- There is no need to know the number of clusters before starting the algorithm.
- It often gives good results in practice.
- It is frequently used to estimate the number of clusters before running the k-means algorithm.
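One common way to read a cluster count off the hierarchy is to cut the dendrogram just below the largest jump between consecutive merge heights. The sketch below uses that "largest gap" heuristic, which is an assumption here and not the only possible rule. `suggest_k` is a hypothetical name. The heights in the usage line are made-up illustrative values.

```python
def suggest_k(heights):
    """Suggest a number of clusters from agglomerative merge heights.

    heights: the N - 1 merge distances, in the order the merges happened.
    Assumes they never decrease (true for single, complete and average linkage).
    """
    if len(heights) < 2:
        raise ValueError("need merge heights for at least 3 points")
    n_points = len(heights) + 1
    # Jump between each merge height and the next one
    gaps = [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]
    # Largest jump: where well-separated groups start being merged together
    i = max(range(len(gaps)), key=lambda j: gaps[j])
    # i + 1 merges happen below the jump, leaving n_points - (i + 1) clusters
    return n_points - (i + 1)


print(suggest_k([1.0, 1.0, 6.4, 7.07]))  # 3
```

The resulting value could then be passed as k to a k-means run.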
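Disadvantages:
- The algorithm can never undo what was done previously: once two clusters are merged (or split), that decision is final.
- If there are too many data points, it becomes difficult to identify the correct number of clusters, because the dendrogram becomes hard to read.
- It is a time-consuming algorithm: a naive implementation needs O(N^3) time and O(N^2) memory for N data points.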
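A back-of-the-envelope count shows where the time goes. This assumes a naive implementation that rescans every pair of clusters before each merge; optimized implementations avoid this. With k clusters there are \binom{k}{2} pairs to compare, and k runs from N down to 2:

```latex
\sum_{k=2}^{N} \binom{k}{2} \;=\; \binom{N+1}{3} \;=\; \frac{(N+1)\,N\,(N-1)}{6} \;=\; O(N^3)
```

For N = 10,000 points, that is about 1.67 × 10^11 pair comparisons.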
Find the full code, with an explanation, for doing hierarchical clustering in R here.