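K-means is one of the simplest unsupervised learning algorithms for the well-known clustering problem: it partitions the data points into k clusters so that each point belongs to the cluster with the nearest centre.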
Step 1: Decide the number of clusters (say we want to create k clusters).
Step 2: Randomly choose an initial centre for each of the k clusters (for example, k randomly picked data points).
Step 3: Calculate the distance from each data point to each of the k centres and assign the point to the cluster whose centre is nearest.
Step 4: After all data points are assigned, recalculate each cluster's centre as the mean of the points in that cluster.
Step 5: Repeat steps 3 and 4 until convergence, i.e. until the assignments (and therefore the centres) no longer change.
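The steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the R code this post refers to: the function name `kmeans`, the parameters `max_iters` and `seed`, the choice of Euclidean distance, and the toy data are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Step 2: pick k of the data points at random as the initial centres.
    centres = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Step 3: assign each point to its nearest centre (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        # Step 5: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centre to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster ends up empty
                centres[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centres, labels


# Toy data (made up for illustration): two well-separated groups of three points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centres, labels = kmeans(points, k=2)
print(centres)
print(labels)
```

On this toy data the first three points end up in one cluster and the last three in the other. Which cluster gets index 0 depends on the random initial centres, so the label numbers themselves carry no meaning.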
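Advantages:
- Fast, robust on clean data and easy to understand
- Gives the best results when the clusters are distinct or well separated from each other
Disadvantages:
- Requires the number of clusters to be specified up front, which is often hard to determine
- Applicable only when a mean is defined, so it cannot be applied directly to categorical data
- Fails on data that is not linearly separable, for example clusters with non-convex shapes
- Sensitive to outliers and noisy data, because a single extreme value can pull a cluster's mean far from the other points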
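The outlier problem comes directly from using the mean as the cluster centre. A small worked example, with made-up one-dimensional values chosen only for illustration, shows how far a single extreme point can drag the centre:

```python
from statistics import mean

# A tight one-dimensional cluster (illustrative values, not real data).
cluster = [1.0, 2.0, 3.0, 4.0, 5.0]
# The same cluster with one extreme value added.
with_outlier = cluster + [100.0]

centre_clean = mean(cluster)         # 15 / 5 = 3.0
centre_outlier = mean(with_outlier)  # 115 / 6, roughly 19.17

print(centre_clean, centre_outlier)
```

With the outlier included, the centre sits far from every one of the original five points, so the distances used in the assignment step no longer reflect where most of the cluster actually lies.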
Find full code with explanation here for doing k-means clustering in R.