Code for Clustering in R Using the Iris Dataset

> View(iris)

To find the optimal number of clusters, we first use hierarchical clustering:

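> d=dist(scale(iris[,-5]))            # distances on the standardized measurements (column 5, Species, is dropped)
> h=hclust(d,method='ward.D')         # agglomerative clustering with Ward's criterion
> plot(h,hang=-1)                     # draw the dendrogram
> rect.hclust(h,h=35,border="blue")   # outline the clusters obtained by cutting the tree at height 35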
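The boxes drawn by rect.hclust() can also be turned into cluster labels with cutree() from base R's stats package. A minimal sketch, assuming we cut the same Ward tree into k = 3 groups (the number chosen below) instead of at height 35; the name groups is chosen here for illustration:

```r
# Hierarchical clustering on the standardized measurements, as above
d <- dist(scale(iris[, -5]))
h <- hclust(d, method = "ward.D")

# Cut the tree into 3 groups; cutree() returns one integer label per flower
groups <- cutree(h, k = 3)

# Group sizes, and how the groups line up with the known species
print(table(groups))
print(table(iris$Species, groups))
```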

This produces the following dendrogram:

Selecting 3 as the optimal number of clusters, we apply k-means to get the centers of these 3 clusters:

> k=kmeans(iris[,-5],3,nstart=20)

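Setting nstart=20 makes kmeans() try 20 different random sets of starting centers and keep the solution with the lowest total within-cluster sum of squares. This makes the result far more stable than a single random start, which can get stuck in a poor local optimum. It does not fix the starting point, though: the cluster numbering (and, occasionally, the solution itself) can still change from run to run unless you call set.seed() first.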
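A minimal sketch of making the run exactly reproducible with set.seed(); the seed 123 and the names fit_a and fit_b are arbitrary choices for illustration, so the snippet does not overwrite k:

```r
# Two runs with the same seed draw the same random starts,
# so they return identical clusterings
set.seed(123)
fit_a <- kmeans(iris[, -5], centers = 3, nstart = 20)

set.seed(123)
fit_b <- kmeans(iris[, -5], centers = 3, nstart = 20)

print(identical(fit_a$cluster, fit_b$cluster))

# nstart = 20 keeps the best of the 20 starts, i.e. the one with the
# smallest total within-cluster sum of squares
print(fit_a$tot.withinss)
```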

> k

K-means clustering with 3 clusters of sizes 62, 50, 38

Cluster means:
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     5.901613    2.748387     4.393548    1.433871
2     5.006000    3.428000     1.462000    0.246000
3     6.850000    3.073684     5.742105    2.071053

Clustering vector:
  [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [36] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [71] 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 3 3 3
[106] 3 1 3 3 3 3 3 3 1 1 3 3 3 3 1 3 1 3 1 3 3 1 1 3 3 3 3 3 1 3 3 3 3 1 3
[141] 3 3 1 3 3 3 1 3 3 1

Within cluster sum of squares by cluster:
[1] 39.82097 15.15100 23.87947
 (between_SS / total_SS =  88.4 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"
[5] "tot.withinss" "betweenss"    "size"         "iter"
[9] "ifault"
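The between_SS / total_SS percentage can be recomputed from the components listed above: totss splits into tot.withinss (the sum of withinss) plus betweenss. A short sketch; it refits the model under the name fit with an arbitrary seed so that it runs on its own and leaves k untouched:

```r
set.seed(123)  # arbitrary seed, only so that the snippet is reproducible
fit <- kmeans(iris[, -5], 3, nstart = 20)

# Total SS decomposes into within-cluster SS plus between-cluster SS
print(c(total = fit$totss, within = sum(fit$withinss), between = fit$betweenss))

# Share of the total variation explained by the clustering, in percent
ratio <- 100 * fit$betweenss / fit$totss
print(round(ratio, 1))
```

For the fit printed above, the three within-cluster values add up to 39.82 + 15.15 + 23.88 ≈ 78.85; this is the part of the total sum of squares left inside the clusters, and the remaining 88.4 % is between-cluster variation.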

Let's look at the contingency table of species against cluster:

> table(iris$Species, k$cluster)

              1  2  3
  setosa      0 50  0
  versicolor 48  0  2
  virginica  14  0 36
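One simple way to summarize this table (not part of the original walkthrough) is cluster purity: label each cluster with the species that dominates it and count how many flowers match that label. From the table above, that is (48 + 50 + 36) / 150 = 134 / 150 ≈ 0.893. A sketch of the same calculation, refitting under the name fit with an arbitrary seed so it runs standalone:

```r
set.seed(123)  # arbitrary seed for a reproducible fit
fit <- kmeans(iris[, -5], 3, nstart = 20)

tab <- table(iris$Species, fit$cluster)

# For each cluster (column), the count of its most common species
majority <- apply(tab, 2, max)

# Fraction of all flowers that belong to their cluster's majority species
purity <- sum(majority) / sum(tab)
print(purity)
```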

> iris$cluster=k$cluster

> View(iris)
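With a cluster label on every flower, per-cluster summaries come straight from base R. For example, aggregate() recomputes the cluster means, which should match the centers reported by kmeans() because each k-means center is the mean of the points assigned to it. A minimal sketch; fit and iris_cl are names chosen here, and working on a copy keeps iris itself unchanged:

```r
set.seed(123)  # arbitrary seed for a reproducible fit
fit <- kmeans(iris[, -5], 3, nstart = 20)

iris_cl <- iris
iris_cl$cluster <- fit$cluster

# Mean of each measurement within each cluster
means <- aggregate(iris_cl[, 1:4], by = list(cluster = iris_cl$cluster), FUN = mean)
print(means)
```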

Plotting the clusters:

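The clusplot() function comes from the cluster package, so install and load it before running the command. Pass only the four numeric measurements; the full iris data frame now also contains Species and cluster, which would otherwise be mixed into the plot's principal components:

> install.packages("cluster")   # only needed once
> library(cluster)
> clusplot(iris[,1:4], iris$cluster, color=TRUE, shade=TRUE, labels=2, lines=0)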
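clusplot() shows the observations on the first two principal components and outlines each cluster with an ellipse. A rough base-R approximation of that view (points only, no ellipses) is sketched below, assuming prcomp() on the standardized measurements; the axes will not necessarily match clusplot()'s exactly, and fit and pc are names chosen here:

```r
set.seed(123)  # arbitrary seed for a reproducible fit
fit <- kmeans(iris[, -5], 3, nstart = 20)

# Principal components of the standardized measurements
pc <- prcomp(iris[, 1:4], scale. = TRUE)

# Share of variance carried by the first two components, in percent
explained <- summary(pc)$importance["Proportion of Variance", 1:2]
print(round(100 * explained, 1))

# Scatter plot of the first two component scores, coloured by cluster
plot(pc$x[, 1], pc$x[, 2], col = fit$cluster, pch = 19,
     xlab = "Component 1", ylab = "Component 2",
     main = "k-means clusters on the first two principal components")
legend("topright", legend = paste("Cluster", 1:3), col = 1:3, pch = 19)
```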