How does k-means clustering work?

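K-means is one of the simplest unsupervised learning algorithms for solving the well-known clustering problem: grouping data points so that points in the same cluster are more similar to each other than to points in other clusters.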

Algorithm Steps:

Step 1: Decide the number of clusters, k, that you want to create.

Step 2: Randomly choose an initial centre for each of the k clusters.

Step 3: Calculate the distance from every data point to each of the k centres, and assign each point to the cluster whose centre is closest.

Step 4: After all data points have been assigned, recalculate each cluster's centre as the mean of the points assigned to it.

Step 5: Repeat steps 3 and 4 with the new centres until convergence, i.e. until the cluster assignments no longer change.
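
The steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the R code referred to in this post. It assumes Euclidean distance, takes the initial centres from randomly sampled data points, and leaves a centre where it is if its cluster ends up empty. The names `kmeans`, `squared_distance`, `mean_point`, `max_iter` and `seed` are made up for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster a list of numeric tuples; returns (centres, labels)."""
    rng = random.Random(seed)
    # Step 2: pick k data points at random as the initial centres (an assumption).
    centres = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 3: assign every point to the cluster with the nearest centre.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centres[c]))
            for p in points
        ]
        # Step 5: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centre to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = mean_point(members)
    return centres, labels


# Two well-separated groups of three points each (made-up data).
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centres, labels = kmeans(data, k=2)
print(centres)
print(labels)
```

Which cluster gets label 0 and which gets label 1 depends on the random initial centres, so only the grouping itself is meaningful, not the label numbers.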

Advantages:

  1. Fast, robust, and easy to understand
  2. Gives the best results when the clusters in the data set are distinct or well separated from each other

Disadvantages:

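  1. The number of clusters must be specified before the algorithm starts, and it is often hard to determine
  2. Applicable only when the mean is defined, so it fails for categorical data
  3. Performs poorly when the clusters have non-linear (non-convex) shapes, such as concentric rings
  4. Sensitive to outliers, because a single extreme point can pull a cluster mean away from the rest of the cluster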
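
The outlier problem comes straight from using the mean as the cluster centre. The small Python sketch below uses made-up points and a hypothetical helper, `mean_point`, to show how far one extreme value can move the centre of an otherwise tight cluster.

```python
def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


# A tight cluster of four points plus one extreme point (made-up data).
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
outlier = (50.0, 50.0)

centre_without = mean_point(cluster)
centre_with = mean_point(cluster + [outlier])

print(centre_without)  # (1.5, 1.5)
print(centre_with)     # (11.2, 11.2)
```

With the outlier included, the centre lands far from every one of the four original points, so the next assignment step measures distances against a centre that no longer represents the cluster.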

[Figure: K-means algorithm]

R Code:

Find full code with explanation here for doing k-means clustering in R.
