Text Analytics Part VI – Classification using SVM using R

I hope you have gone through my previous posts on Text Analytics. If not, please read them first, starting from here, because this post continues that series.
Classification is a data mining technique used to predict group membership for data instances. The following are examples of cases where the data analysis task is classification:

  • A bank loan officer wants to analyze the data in order to know which customers (loan applicants) are risky and which are safe.
  • A marketing manager at a company needs to analyze the data to guess whether a customer with a given profile will buy a new computer.

In both of the above examples, a model or classifier is constructed to predict categorical labels. These labels are "risky" or "safe" for the loan application data and "yes" or "no" for the marketing data.

Similarly, here I will try to classify terms on the basis of satisfaction, which has two labels (categories):

  1. Satisfied (rating given = 3, 4 and 5)
  2. Dissatisfied (rating given = 1 and 2)
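
As a minimal sketch of this labelling step, assuming the ratings sit in a numeric vector: the `ratings` values below are made up for illustration and are not the data used in this series.

```r
# Hypothetical ratings on a 1-5 scale (illustrative values only)
ratings <- c(5, 1, 3, 2, 4, 4, 1, 5)

# Ratings of 3 and above count as "Satisfied", 1 and 2 as "Dissatisfied"
satisfaction <- factor(
  ifelse(ratings >= 3, "Satisfied", "Dissatisfied"),
  levels = c("Dissatisfied", "Satisfied")
)

# Contingency table of rating against satisfaction label
label_table <- table(rating = ratings, satisfaction = satisfaction)
print(label_table)
```

Fixing the factor levels explicitly keeps the order of the two categories stable, whichever label happens to appear first in the data.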

What is Support Vector Machine?

A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples.

I have used SVM to do the classification and to find the words that carry the highest weights for both negative and positive meaning.
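
One way this could look is sketched below. It is not the code from this series: it assumes the `e1071` package is installed, and the tiny document-term matrix `dtm`, its four terms and the `labels` vector are invented for illustration. With a linear kernel the hyperplane has one weight per term, which can be recovered from the support vectors and their coefficients.

```r
library(e1071)  # assumed to be installed; provides svm()

# Hypothetical document-term matrix: rows are reviews, columns are term counts
dtm <- matrix(
  c(2, 0, 1, 0,
    1, 0, 2, 0,
    3, 0, 0, 0,
    0, 2, 0, 1,
    0, 1, 0, 2,
    0, 3, 0, 0),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("good", "bad", "helpful", "slow"))
)
labels <- factor(c("Satisfied", "Satisfied", "Satisfied",
                   "Dissatisfied", "Dissatisfied", "Dissatisfied"))

# Linear kernel so that the model has one weight per term;
# scale = FALSE keeps the weights in the units of the raw counts
model <- svm(x = dtm, y = labels, kernel = "linear", scale = FALSE)

# Weight vector of the separating hyperplane: sum of coefficient * support vector
w <- drop(t(model$coefs) %*% model$SV)
names(w) <- colnames(dtm)

# Terms at the two ends of the weight scale
top_end    <- head(sort(w, decreasing = TRUE), 2)
bottom_end <- head(sort(w), 2)
print(top_end)
print(bottom_end)

# Which class the positive end points to depends on how the package orders
# the classes, so check it: score a review made only of the top-weighted term
probe <- matrix(0, nrow = 1, ncol = ncol(dtm),
                dimnames = list(NULL, colnames(dtm)))
probe[1, names(which.max(w))] <- 3
probe_class <- predict(model, probe)
print(probe_class)
```

The weights are only directly interpretable for a linear kernel. With other kernels the hyperplane lives in a transformed feature space and there is no single weight per word.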


R Code to do this:

#Classification considering two levels for satisfaction "Satisfied" >=3


This is a chunk of the code, for creating a contingency table of ratings with 3 categories of satisfaction; find the full code in R here.

In my next post, I will do sentiment analysis and a polarity check. Till then,
have a nice day!
Keep learning!
