Code for Clustering in R using Iris dataset

> View(iris)


To find the optimal number of clusters, we first use hierarchical clustering:

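> d=dist(scale(iris[,-5]))             # distances between flowers on standardized measurements (Species column dropped)
> h=hclust(d,method="ward.D")          # hierarchical clustering with Ward linkage
> plot(h,hang=-1)                      # draw the dendrogram
> rect.hclust(h,h=35,border="blue")    # draw boxes around the groups obtained by cutting the tree at height 35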

The following dendrogram appears:

[Figure: dendrogram of hierarchical clustering on the iris dataset]
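As a rough numeric complement to reading the dendrogram by eye, the sketch below (not from the original post) cuts the same Ward tree into 3 groups with cutree() and prints the largest merge heights. The rule of thumb in the comments, cutting where the gap between successive merge heights is large, is a common heuristic rather than the author's stated method.

```r
# Rebuild the tree exactly as in the commands above.
d <- dist(scale(iris[, -5]))        # Euclidean distances on standardized measurements
h <- hclust(d, method = "ward.D")   # Ward linkage

# Cut the tree into 3 groups: one label (1, 2 or 3) per flower.
groups <- cutree(h, k = 3)
print(table(groups))                # size of each group

# The five highest merges, largest first. Heuristic: a big drop between
# consecutive heights suggests cutting the tree between those two merges.
top_heights <- sort(h$height, decreasing = TRUE)[1:5]
print(round(top_heights, 2))
print(round(-diff(top_heights), 2)) # gaps between successive merge heights

# How the 3 hierarchical groups line up with the species labels.
print(table(iris$Species, groups))
```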

Choosing 3 as the optimal number of clusters, we apply k-means to get the centers of these 3 clusters:

> k=kmeans(iris[,-5],3,nstart=20)

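Setting nstart=20 makes kmeans() run the algorithm from 20 different random sets of initial centers and keep the solution with the lowest total within-cluster sum of squares. This makes the result much more stable than a single random start, but it does not fix the starting point: the initial centers are still chosen at random, so the cluster numbering (and, occasionally, the solution itself) can change between runs. To get exactly the same output every time, set the random seed first, e.g. set.seed(123).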
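A minimal sketch of both points, with an arbitrary seed value chosen only for illustration: set.seed() is what makes two runs identical, while the second half imitates nstart by hand, fitting 20 single-start models and keeping the one with the smallest tot.withinss (the criterion kmeans() uses to pick among its starts).

```r
x <- iris[, -5]   # the four numeric measurements

# Same seed -> same random starts -> identical clustering.
set.seed(123)     # arbitrary seed, for illustration only
run_a <- kmeans(x, centers = 3, nstart = 20)
set.seed(123)
run_b <- kmeans(x, centers = 3, nstart = 20)
print(identical(run_a$cluster, run_b$cluster))   # TRUE

# Imitating nstart = 20 by hand: 20 fits, each from one random start,
# then keep the fit with the smallest total within-cluster sum of squares.
set.seed(123)
fits <- lapply(1:20, function(i) kmeans(x, centers = 3, nstart = 1))
wss <- sapply(fits, function(f) f$tot.withinss)
best <- fits[[which.min(wss)]]
print(round(wss, 2))              # different starts can end in different solutions
print(round(best$tot.withinss, 2))
```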

> k
K-means clustering with 3 clusters of sizes 62, 50, 38

Cluster means:
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     5.901613    2.748387     4.393548    1.433871
2     5.006000    3.428000     1.462000    0.246000
3     6.850000    3.073684     5.742105    2.071053

Clustering vector:
  [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [36] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [71] 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 3 3 3
[106] 3 1 3 3 3 3 3 3 1 1 3 3 3 3 1 3 1 3 1 3 3 1 1 3 3 3 3 3 1 3 3 3 3 1 3
[141] 3 3 1 3 3 3 1 3 3 1

Within cluster sum of squares by cluster:
[1] 39.82097 15.15100 23.87947
 (between_SS / total_SS =  88.4 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"    
[5] "tot.withinss" "betweenss"    "size"         "iter"        
[9] "ifault"      
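The 88.4 % figure is betweenss / totss. The sketch below, assuming the same unscaled iris[, -5] input and an arbitrary seed, recomputes totss, withinss and betweenss directly from their definitions and checks them against the components kmeans() returns.

```r
x <- as.matrix(iris[, -5])
set.seed(1)   # arbitrary seed, only for reproducibility
k <- kmeans(x, centers = 3, nstart = 20)

# Total SS: squared distances of all points to the overall mean.
totss <- sum(sweep(x, 2, colMeans(x))^2)

# Within-cluster SS: squared distances of points to their own cluster center
# (row j of k$centers is the center of cluster j).
withinss <- sapply(seq_len(nrow(k$centers)), function(j) {
  xj <- x[k$cluster == j, , drop = FALSE]
  sum(sweep(xj, 2, k$centers[j, ])^2)
})

# Between-cluster SS is what the clustering "explains".
betweenss <- totss - sum(withinss)

print(round(c(totss = totss, tot.withinss = sum(withinss), betweenss = betweenss), 3))
print(round(100 * betweenss / totss, 1))   # the "between_SS / total_SS" percentage
```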

Let's look at the contingency table of species against cluster:

> table(iris$Species, k$cluster)

              1  2  3
  setosa      0 50  0
  versicolor 48  0  2
  virginica  14  0 36
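One simple way to summarize this table in a single number (not used in the original post) is cluster purity: for each cluster, count the flowers of its most common species, sum these counts and divide by the number of flowers. The helper name cluster_purity is hypothetical; here it is applied to the table above, typed in by hand.

```r
# Hypothetical helper: purity of a species-by-cluster contingency table.
# Columns are clusters; each cluster is credited with its majority species.
cluster_purity <- function(tab) {
  sum(apply(tab, 2, max)) / sum(tab)
}

# The contingency table printed above (rows = species, columns = cluster).
tab <- matrix(c( 0, 50,  0,
                48,  0,  2,
                14,  0, 36),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("setosa", "versicolor", "virginica"),
                              c("1", "2", "3")))

print(cluster_purity(tab))   # (48 + 50 + 36) / 150 = 0.8933333
```

With a fitted model, cluster_purity(table(iris$Species, k$cluster)) gives the same kind of number directly; purity does not depend on how the clusters happen to be numbered.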

> iris$cluster=k$cluster

> View(iris)


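Plotting the clusters:

The clusplot() function comes from the "cluster" package (a recommended package that ships with standard R installations; install it with install.packages("cluster") if it is missing), so load it before running the command. Pass only the four numeric measurement columns, so that the Species factor and the cluster column added above are not treated as variables in the plot:

> library(cluster)

> clusplot(iris[,1:4], iris$cluster, color=TRUE, shade=TRUE, labels=2, lines=0)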
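clusplot() draws the observations on their first two principal components and outlines each cluster. The sketch below reproduces only that core idea with base R, using prcomp() on standardized measurements and an arbitrary seed; it is an approximation, and clusplot()'s own defaults (for example its shaded ellipses) differ.

```r
set.seed(1)                              # arbitrary seed, for a reproducible k-means run
x <- iris[, 1:4]                         # numeric measurements only
k <- kmeans(x, centers = 3, nstart = 20)

pc <- prcomp(x, scale. = TRUE)           # PCA on standardized variables
explained <- pc$sdev^2 / sum(pc$sdev^2)  # share of variance per component

# Points on PC1/PC2, colored by k-means cluster.
plot(pc$x[, 1], pc$x[, 2], col = k$cluster, pch = 19,
     xlab = sprintf("PC1 (%.1f%%)", 100 * explained[1]),
     ylab = sprintf("PC2 (%.1f%%)", 100 * explained[2]),
     main = "k-means clusters on the first two principal components")
```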

Roma
