K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem.

### Algorithm Steps:

Step 1: Decide the number of clusters, k, that you want to create.

Step 2: Randomly choose k initial cluster centres (for example, k distinct data points).

Step 3: Calculate the distance from each data point to each of the k centres, and assign every point to the cluster whose centre is nearest.

Step 4: After all points are assigned, recompute each cluster's centre as the mean of the points in that cluster. Then repeat Step 3 with these new centres, reassigning each point to its nearest centre.

Step 5: Repeat Steps 3 and 4 until convergence, i.e. until the cluster assignments no longer change.
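The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the R code referenced later in the post: the function name `kmeans` and its `max_iter` and `seed` parameters are hypothetical, distances are assumed to be Euclidean, and initial centres are assumed to be k randomly sampled data points.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Hypothetical sketch of k-means (Lloyd's algorithm) on a list of tuples."""
    rng = random.Random(seed)
    # Step 2: pick k distinct data points as the initial centres (assumed strategy)
    centres = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 3: assign each point to the centre with the smallest Euclidean distance
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centres[j]))
            for p in points
        ]
        # Step 5: converged once no point changes cluster
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centre to the mean of the points assigned to it
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster ends up empty
                centres[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centres

# Two well-separated groups of 2-D points (made-up illustrative data)
data = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
labels, centres = kmeans(data, k=2)
print(labels)   # the first three points share one label, the last three the other
print(centres)
```

The `max_iter` cap is a safety net in case the assignments keep changing; on well-separated data like this the loop stops after a few iterations. Which cluster gets label 0 depends on the random initialisation.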

### Advantages:

- Fast, robust and easy to understand
- Gives the best results when the clusters in the data set are distinct, i.e. well separated from each other

### Disadvantages:

- The number of clusters, k, must be specified at the start of the algorithm, and a good value is often hard to determine
- Applicable only when a mean is defined, so it fails for categorical data
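- Fails on data sets whose clusters have non-linear shapes: because every point is assigned to its nearest centre, the clusters k-means produces are always convex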
- Sensitive to outliers, since a few extreme points can pull a cluster's mean far away from the bulk of its points
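The outlier problem follows directly from Step 4, where each centre is recomputed as a mean. A tiny 1-D sketch with made-up values shows how a single extreme point drags a cluster's centre away from where most of its points lie:

```python
def mean(values):
    """Arithmetic mean, as used in the k-means centre update (Step 4)."""
    return sum(values) / len(values)

cluster = [2.0, 3.0, 4.0]          # a compact cluster (illustrative values)
with_outlier = cluster + [100.0]   # the same cluster plus one extreme point

print(mean(cluster))       # 3.0
print(mean(with_outlier))  # 27.25
```

The centre moves from 3.0 to 27.25, far from all three original points, which in turn distorts the distances used in the next assignment step.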

### R Code:

The full R code for k-means clustering, with an explanation, can be found here.
