Hope you have gone through all my previous posts on Text Analytics. If not, please go through them, because this post is a continuation of that series, starting from here.
Classification is a data mining technique used to predict group membership for data instances. The following are examples of cases where the data analysis task is classification:
- A bank loan officer wants to analyze the data in order to know which customers (loan applicants) are risky and which are safe.
- A marketing manager at a company needs to analyze the data to guess whether a customer with a given profile will buy a new computer.
In both of the above examples a model or classifier is constructed to predict categorical labels. These labels are risky or safe for loan application data and yes or no for marketing data.
Similarly, here I will try to classify terms on the basis of satisfaction, using two labels (categories):
- Satisfied (rating given = 3, 4 and 5)
- Dissatisfied (rating given = 1 and 2)
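The two labels above can be derived from the numeric ratings in one step. The following is a minimal sketch in base R; the `ratings` vector is made-up illustration data, not the data set used in this series.

```r
# Hypothetical ratings on a 1-5 scale; NA marks a review with no rating.
ratings <- c(5, 1, 3, 2, 4, NA, 3)

# Ratings of 3, 4 and 5 count as satisfied; 1 and 2 as dissatisfied.
satisfaction <- factor(
  ifelse(ratings > 2, "satisfied", "dissatisfied"),
  levels = c("dissatisfied", "satisfied")
)

# Count the reviews in each class, keeping the missing one visible.
label_counts <- table(satisfaction, useNA = "ifany")
print(label_counts)
```

Storing the labels as a factor matters later on, because a classifier needs a categorical response rather than plain text.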
What is a Support Vector Machine?
A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples.
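For the linear case, the separating hyperplane can be written down explicitly. The notation below is my own shorthand and is only a sketch: x is the vector of term counts for one review, w holds one weight per term, b is an offset, and the alpha_i and y_i are the multipliers and the +1/-1 labels of the support vectors.

```latex
% Illustrative notation, not taken from the post.
% Decision function and predicted label for a new example x:
f(x) = w^{\top} x + b, \qquad \hat{y} = \operatorname{sign}\big(f(x)\big)

% With a linear kernel, the weight vector is a combination of the support vectors:
w = \sum_{i \in \mathrm{SV}} \alpha_i \, y_i \, x_i
```

A term with a large positive or negative entry in w pushes a review strongly towards one of the two labels, which is why the weights can be read as word importance.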
I have used SVM to do the classification and to find the words which have the highest weights for both the negative and the positive meaning.
R Code to do this:
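# Classification considering two levels of satisfaction: "satisfied" means rating >= 3
library(e1071)  # provides svm()
View(data)  # optional: inspect the data interactively
data3 = data[1:2282]  # keep the first 2282 columns
satis = factor(ifelse(finalratings1 > 2, "satisfied", "dissatisfied"))
data3 = cbind(data3, satis)
data3 = na.omit(data3)
# Drop terms that never occur, but always keep the label column (the last one)
keep = c(colSums(data3[, -ncol(data3)]) > 0, TRUE)
data3 = data3[, keep]
# Linear kernel, so that t(coefs) %*% SV gives one weight per word
svm_fit = svm(satis ~ ., data = data3, kernel = "linear")
coef_imp = as.data.frame(t(svm_fit$coefs) %*% svm_fit$SV)
coef_imp1 = data.frame(words = names(coef_imp), Importance = t(coef_imp))
coef_imp1 = coef_imp1[order(coef_imp1$Importance), ]
head(coef_imp1)  # words with the lowest weights
tail(coef_imp1)  # words with the highest weights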
This is only a chunk of the code; find the full R code here.
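Since the code above depends on data prepared in the earlier posts, here is a small self-contained sketch of the same idea that can be run on its own. It assumes the e1071 package is installed; the `toy` data frame and its three words are invented for illustration.

```r
library(e1071)  # provides svm()

# Hypothetical document-term counts: 8 reviews, 3 made-up words.
toy <- data.frame(
  good  = c(3, 2, 4, 3, 0, 0, 1, 0),
  bad   = c(0, 1, 0, 0, 3, 2, 4, 3),
  phone = c(1, 1, 2, 1, 1, 2, 1, 1)
)
toy$satis <- factor(c(rep("satisfied", 4), rep("dissatisfied", 4)))

# Linear kernel, so the weight vector lives in the space of the words.
fit <- svm(satis ~ ., data = toy, kernel = "linear")

# w = sum over support vectors of (alpha_i * y_i) * x_i
w <- drop(t(fit$coefs) %*% fit$SV)

# Sort so that the two ends of the vector hold the most telling words.
word_weights <- sort(w)
print(word_weights)
```

Which class ends up with the positive sign depends on how the library orders the labels internally, so it is safer to check a few known words than to assume a direction. What should hold is that "good" and "bad" land at opposite ends of the sorted weights.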
In my next post, I will do sentiment analysis and a polarity check. Till then,
have a nice day!