Analysis of Tweets on Twitter

We will compare the popularity of the FIFA World Cup 2014 and the IPL on Twitter using R.


Steps to be followed:

Step 1 – Create Twitter App

This is an important step because it is how a software application proves to Twitter who you are when you search for (or post) tweets. Twitter adopted an industry standard for this process known as OAuth, which provides a method for obtaining two pieces of information: a “secret” and a “key”.
Here are the steps that need to be followed:
  1. Sign in to Twitter.
  2. Go to Twitter’s development page and sign in with the same credentials.
  3. Click on My Application, then on Create New Application, and fill in the required form to create your new app.

  4. You can enter your homepage link, or any website you have, in the “website” field; the Callback URL can be left blank. Under Settings, tick the checkbox that allows the application to be used to sign in with Twitter.

  5. The “TEST OAuth” button shows you the consumer key and consumer secret. These strings will be used later to get your application running in R. They look like long strings of gibberish because they are randomly generated secrets rather than readable text.

Step 2 – Getting R ready to work with Twitter

  1. Create a “New Project” in whichever directory you want.

  2. twitteR depends on several other packages, which we need to install first. Go to Tools > Install Packages and install the following one by one: “bitops”, “RCurl”, “RJSONIO”, “twitteR”, “ROAuth”

  3. After installation, R reports that each package was installed successfully.
  4. To cross-check the installation, we can create a function that takes a package name as an argument and makes sure that package is installed.
  5. We’ll make a new function, “PrepareTwitter”, that will load all of our packages for us.

  6. With the function created, we now run it.

  7. Windows users should download fresh SSL certificates before R tries to contact Twitter for authentication. Certificates help maintain secure communication across the Internet, and most computers keep an up-to-date copy on file. Run the following line of code:
  8. Before fetching data from Twitter, check the Packages pane in the bottom-right corner to confirm that each of these packages is ticked (loaded):
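The helper functions described in this step can be sketched as follows. This is a minimal sketch, not the article's original code: the name EnsurePackage and the DownloadCACert helper are hypothetical, PrepareTwitter follows the article's naming, and the certificate URL (the curl project's Mozilla CA bundle) is an assumption about where to obtain cacert.pem.

```r
# EnsurePackage() and DownloadCACert() are hypothetical helper names;
# PrepareTwitter() follows the article's naming.

# Install a package from CRAN if it is missing, then attach it.
# Returns TRUE if the package is attached afterwards, FALSE otherwise.
EnsurePackage <- function(pkg) {
  pkg <- as.character(pkg)
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))
}

# Load every package the Twitter analysis depends on.
# Returns a named logical vector showing which packages were attached.
PrepareTwitter <- function() {
  pkgs <- c("bitops", "RCurl", "RJSONIO", "twitteR", "ROAuth")
  vapply(pkgs, EnsurePackage, logical(1))
}

# Assumption: fetch the CA certificate bundle published by the curl project
# and save it as cacert.pem in the working directory (mainly for Windows).
DownloadCACert <- function(destfile = "cacert.pem") {
  download.file("https://curl.se/ca/cacert.pem", destfile = destfile, mode = "wb")
  invisible(destfile)
}
```

Typical usage is `PrepareTwitter()` followed by `DownloadCACert()`; checking the returned logical vector is a scripted alternative to eyeballing the tick marks in the Packages pane.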

Step 3 – Using Your OAuth Tokens (Consumer key and secret) in R

  1. Before retrieving data from Twitter, we need to connect to it using the consumer key and secret. Use the following command:
    credential <- OAuthFactory$new

  2. We can inspect what the variable “credential” contains:
  3. On Windows machines, perform the credential handshake by typing the following command. You will get a response like the one below:
  4. To enable the connection, open the link given in the response, authorize the app, and enter the PIN shown there back into R:
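The OAuth steps above can be wrapped in one function. This is a sketch under stated assumptions: ConnectTwitter is a hypothetical name, the endpoint URLs are Twitter's OAuth 1.0a endpoints as commonly used with ROAuth (check them against your app settings), and it assumes your developer account still has access to them. Newer releases of twitteR also provide `setup_twitter_oauth()`, which handles this in a single call.

```r
# Hypothetical helper: build an ROAuth credential and run the PIN-based handshake.
# Needs the ROAuth package, network access, and your app's consumer key/secret.
ConnectTwitter <- function(consumerKey, consumerSecret, cainfo = "cacert.pem") {
  # Assumption: Twitter's OAuth 1.0a endpoint URLs
  credential <- ROAuth::OAuthFactory$new(
    consumerKey    = consumerKey,
    consumerSecret = consumerSecret,
    requestURL     = "https://api.twitter.com/oauth/request_token",
    accessURL      = "https://api.twitter.com/oauth/access_token",
    authURL        = "https://api.twitter.com/oauth/authorize"
  )
  # Prints an authorization link; approve the app there and type the PIN into R.
  # cainfo points curl at the downloaded certificate bundle.
  credential$handshake(cainfo = cainfo)
  credential
}
```

Usage would look like `credential <- ConnectTwitter("<your consumer key>", "<your consumer secret>")`, after which `credential` can be inspected as described above.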

Step 4 – We are ready to compare the popularity of two major sports tournaments using tweets from Twitter

  1. Test your connectivity:
  2. Create a function that searches Twitter and returns the results as a data frame. We will use this function to get the data from Twitter:
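    TweetFrame <- function(searchTerm, maxTweets) {
      # Search Twitter for up to maxTweets tweets matching searchTerm
      twtLst <- searchTwitter(searchTerm, n = maxTweets, cainfo = "cacert.pem")
      # Convert the list of status objects into a single data frame
      twListToDF(twtLst)
    }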

  3. Sort this data from latest to oldest and assign the result to a new data frame:

  4. Now that the tweets are arranged in order of their timestamps, let’s write another line of code to get the differences between their creation times:
  5. difftime_f and difftime_i contain the inter-arrival times between successive tweets about FIFA and IPL, respectively:

  6. We can then calculate the mean of the time differences, i.e. the average inter-arrival time:
  7. This means that the average delay before the next tweet is only about 0.14 seconds for FIFA, compared with 76 seconds for IPL. This data alone suggests who has won the popularity race, but we will go one step further and back this up with the help of the Poisson distribution.

  8. Another way of looking at the statistics is with the commands below.
  9. This clearly shows who the winner is: for FIFA, 169 out of 500 tweets arrived less than 0.14 seconds after the previous tweet, while for IPL only 4 out of 500 did.

    Doing a little arithmetic:
      For FIFA, 169/500 = 0.338 (33.8%)
      For IPL, 4/500 = 0.008 (0.8%)

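  10. Now let’s find the 95% confidence interval for both:

  11. For the fraction of inter-arrival times shorter than 0.14 seconds, the 95% confidence interval is 0.289 to 0.392 for FIFA and 0.002 to 0.020 for IPL. Roughly speaking, if we repeatedly took samples of 500 tweets, intervals built this way would contain the true fraction 95% of the time. Because the two intervals do not overlap, the gap between FIFA and IPL is very unlikely to be due to chance.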

  12. Install the gplots package so that we can use the barplot2 function:
  13. Type the command below to draw the plot:
  14. The arguments are:
    c(...) holds the fraction of tweets within the inter-arrival threshold, one value per bar
    ci.l holds the lower limits of the confidence intervals
    ci.u holds the upper limits of the confidence intervals
    plot.ci = TRUE asks the barplot2() function to draw confidence-interval whiskers on each bar
    names.arg labels the bars
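The inter-arrival and confidence-interval calculations in Step 4 can be sketched end to end on synthetic data, so they run without a Twitter connection. The simulated timestamps, the exponential gap model, and the helper names SimulateCreated and InterArrival are assumptions for illustration only; with real data you would pass the `created` column of the data frame returned by TweetFrame. `poisson.test()` from base R's stats package gives an exact Poisson confidence interval for a count over a fixed number of tweets.

```r
# Sketch of the Step 4 statistics on synthetic timestamps (no Twitter access needed).
set.seed(2014)

# Hypothetical helper: n fake tweet times with exponential gaps of mean meanGap seconds
SimulateCreated <- function(n, meanGap) {
  start <- as.POSIXct("2014-06-12 20:00:00", tz = "UTC")
  start + cumsum(rexp(n, rate = 1 / meanGap))
}

# Hypothetical helper: sort latest to oldest, then take the gap (in seconds)
# between each tweet and the one before it
InterArrival <- function(created) {
  created <- sort(created, decreasing = TRUE)
  as.numeric(difftime(head(created, -1), tail(created, -1), units = "secs"))
}

fifa <- data.frame(created = SimulateCreated(500, meanGap = 0.5))
ipl  <- data.frame(created = SimulateCreated(500, meanGap = 60))

difftime_f <- InterArrival(fifa$created)
difftime_i <- InterArrival(ipl$created)

# Average inter-arrival time for each topic
mean(difftime_f)
mean(difftime_i)

# Count gaps shorter than the article's 0.14-second threshold and express
# them as a fraction of the 500 tweets, as the article does
threshold <- 0.14
count_f <- sum(difftime_f < threshold)
count_i <- sum(difftime_i < threshold)
c(FIFA = count_f, IPL = count_i) / 500

# Exact 95% Poisson confidence interval for each rate (count per tweet)
poisson.test(count_f, 500)$conf.int
poisson.test(count_i, 500)$conf.int

# The same calculation for the counts reported in the article
poisson.test(169, 500)$conf.int
poisson.test(4, 500)$conf.int
```

The last two lines can be compared with the intervals quoted in the article to check that a Poisson interval is what produced them.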
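The bar plot described above can be sketched as follows, using the article's counts (169 and 4 out of 500) and computing the whisker limits with `poisson.test()` instead of typing them in. Assumptions: the gplots package may not be installed, so the plot call is guarded, and the axis label is illustrative.

```r
# Sketch of the bar plot with confidence-interval whiskers (needs gplots to draw).
counts   <- c(FIFA = 169, IPL = 4)  # tweets with inter-arrival time < 0.14 s (from the article)
n_tweets <- 500

fractions <- counts / n_tweets
# Exact Poisson 95% intervals as a 2 x 2 matrix: row 1 = lower, row 2 = upper
ci <- sapply(counts, function(k) poisson.test(k, n_tweets)$conf.int)

if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::barplot2(
    fractions,
    plot.ci   = TRUE,     # draw the confidence-interval whiskers
    ci.l      = ci[1, ],  # lower limits
    ci.u      = ci[2, ],  # upper limits
    names.arg = names(fractions),
    ylab      = "Fraction of inter-arrival times < 0.14 s"
  )
} else {
  message("Install gplots with install.packages(\"gplots\") to draw the plot.")
}
```

Deriving `ci.l` and `ci.u` from the counts keeps the whiskers consistent with the fractions if the data is refreshed.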

