Text Analytics Part VI – Classification using SVM in R

I hope you have read my previous posts on Text Analytics. If not, please go through them first, starting from here, because this post continues where they left off.
Classification is a data mining technique used to predict group membership for data instances. The following are examples of cases where the data analysis task is classification:

  • A bank loan officer wants to analyze the data in order to know which customers (loan applicants) are risky and which are safe.
  • A marketing manager at a company needs to predict whether a customer with a given profile will buy a new computer.

In both of the above examples, a model or classifier is constructed to predict categorical labels: risky or safe for the loan application data, and yes or no for the marketing data.

Similarly, here I will classify each rated document on the basis of satisfaction, using two labels (categories):

  1. Satisfied (rating given = 3, 4 and 5)
  2. Dissatisfied (rating given = 1 and 2)
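The labelling rule above can be checked on its own before touching any real data. This is a minimal sketch: the `ratings` vector is hypothetical and only stands in for the post's `finalratings1`. The result is wrapped in `factor()` because e1071's `svm()` picks classification by default only when the response is a factor; the explicit `levels` just fix the label order.

```r
# Hypothetical ratings on the 1-5 scale, including one missing value
ratings <- c(5, 1, 3, 2, 4, NA)

# Ratings 3, 4, 5 -> "satisfied"; ratings 1, 2 -> "dissatisfied"; NA stays NA
satis <- factor(ifelse(ratings > 2, "satisfied", "dissatisfied"),
                levels = c("dissatisfied", "satisfied"))

# Count each label, showing the NA as its own cell
print(table(satis, useNA = "ifany"))
```

Missing ratings stay NA, which is why the main code calls na.omit() after attaching the labels.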

What is a Support Vector Machine?

A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane, the one that separates the classes with the largest margin, and uses it to categorize new examples.
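For the linear case, the hyperplane and the word weights used later can be written out explicitly. This is the standard textbook formulation, not something specific to this post: x_i are the support vectors, y_i in {-1, +1} their labels, alpha_i >= 0 the learned dual coefficients and b the intercept.

```latex
f(\mathbf{x}) = \operatorname{sign}\!\left(\mathbf{w}^{\top}\mathbf{x} + b\right),
\qquad
\mathbf{w} = \sum_{i \in \mathrm{SV}} \alpha_i \, y_i \, \mathbf{x}_i
```

Each component w_j belongs to one input column, here one word, so words with a large positive or large negative w_j push a document most strongly towards one class or the other. In e1071's svm(), the fitted object stores the products alpha_i * y_i in coefs and the (possibly scaled) support vectors in SV, so t(coefs) %*% SV computes w. This reading only holds for a linear kernel; with the default radial kernel the hyperplane lives in an implicit feature space and w has no per-word meaning.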

I have used SVM both to do the classification and to find the words with the largest negative and the largest positive weights, i.e. the words that pull a document most strongly towards one class or the other.
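The weight extraction can be illustrated without fitting a model. This is a minimal sketch with made-up numbers: `coefs` and `SV` are hypothetical stand-ins for the `coefs` and `SV` components of a fitted e1071 model with a linear kernel, and the four word names are invented for illustration.

```r
# Hypothetical fitted quantities for a linear SVM with 3 support vectors
# and 4 words; in e1071 these would come from model$coefs and model$SV.
coefs <- matrix(c(0.8, -0.5, -0.3), ncol = 1)   # alpha_i * y_i per support vector
SV <- matrix(c(1, 0, 2, 0,
               0, 1, 0, 3,
               1, 1, 0, 0),
             nrow = 3, byrow = TRUE,
             dimnames = list(NULL, c("good", "bad", "great", "late")))

# w = sum_i (alpha_i * y_i) * x_i, giving one weight per word (1 x 4 matrix)
w <- t(coefs) %*% SV

# Rank the words from most negative to most positive weight
weights <- data.frame(words = colnames(w), Importance = as.vector(w),
                      stringsAsFactors = FALSE)
weights <- weights[order(weights$Importance), ]
print(head(weights, 2))  # words at the negative end
print(tail(weights, 2))  # words at the positive end
```

Which end corresponds to "satisfied" depends on how the model encodes the two labels, so it is worth checking a few known words before interpreting the lists.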

SVM

R Code to do this:

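#Classification considering two levels of satisfaction: "satisfied" if rating >= 3
library(e1071)   # provides svm()
View(data)
data3 = data[1:2282]   # first 2282 columns of data
# factor() so that svm() treats this as classification, not regression
satis = factor(ifelse(finalratings1 > 2, "satisfied", "dissatisfied"))
data3 = cbind(data3, satis)
data3 = na.omit(data3)
# drop words that never occur; the trailing TRUE keeps the satis column
keep = c(colSums(data3[, -ncol(data3)]) > 0, TRUE)
data3 = data3[, keep]
# linear kernel: the per-word weights below are only meaningful for it
svm_model = svm(satis ~ ., data = data3, kernel = "linear")

# w = t(coefs) %*% SV, one weight per word (coefs already holds alpha_i * y_i)
w = t(svm_model$coefs) %*% svm_model$SV
coef_imp1 = data.frame(words = colnames(w), Importance = as.vector(w))
coef_imp1 = coef_imp1[order(coef_imp1$Importance), ]
head(coef_imp1)   # words with the most negative weights
tail(coef_imp1)   # words with the most positive weights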

This is a chunk of the code for classifying the documents into two satisfaction categories with SVM; find the full R code here.
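Dropping words that never occur needs some care, because the label column must survive the filter. The logical vector from colSums() covers only the word columns, so it is one element shorter than the data frame and does not line up with the label column; appending an explicit TRUE keeps the lengths equal. A minimal sketch on a hypothetical toy data frame:

```r
# Toy word-count data: "unused" never occurs, the last column is the label
toy <- data.frame(unused = c(0, 0, 0),
                  good   = c(2, 0, 1),
                  bad    = c(0, 3, 0),
                  satis  = factor(c("satisfied", "dissatisfied", "satisfied")))

# One TRUE/FALSE per word column, then an explicit TRUE for the label column
keep <- c(colSums(toy[, -ncol(toy)]) > 0, TRUE)
filtered <- toy[, keep]
print(names(filtered))
```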

In my next post, I will do sentiment analysis and a polarity check. Till then,
have a nice day!
BBye!
Keep learning!

Roma
