# Boruta Algorithm

When working on machine learning or predictive modelling problems, feature selection is an important step. In practical model building, datasets often come with too many variables, not all of which are relevant to the problem, and we don’t know in advance which ones matter. Also, there are some disadvantages of using all given[…]

# Demonetization in India: Public Reaction Analysis

On Twitter, users are mostly talking about black money, the currency ban, Modi fighting corruption, etc. On Facebook, users are mostly talking about PM Modi’s master stroke, banks, money, currency, etc.

# Descriptive Vs Inferential Statistics

Descriptive statistics is the term for analysis that summarises data and reveals meaningful insights and patterns present in it. However, it doesn’t allow us to draw any conclusions beyond the given data points. For example, suppose the higher management of a company asks for revenue data. Then directly giving them[…]
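As a rough illustration of the descriptive side, the sketch below condenses a list of figures into a handful of summary numbers using Python's standard `statistics` module. The function name `describe` and the sample revenue values are made up for this example and do not come from the original post.

```python
from statistics import mean, median, stdev


def describe(values):
    """Summarise a list of numbers (hypothetical helper for illustration)."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation, needs >= 2 values
    }


# Made-up quarterly revenue figures, only to show the call.
summary = describe([100, 120, 90, 110])
print(summary)
```

These numbers describe only the data that was passed in; saying anything about quarters that were not observed would be the job of inferential statistics.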

# Topic Modelling

What is it? I came across this technique while working with text. I was trying to analyse tweets and Facebook page posts after the Reliance Jio launch. The analysis involved data collection, data cleaning, word cloud creation, and sentiment analysis. After this I wanted to try something else, and while searching the net I found this new[…]

# Connecting R with SQL Server

This post describes the steps to connect RStudio with SQL Server, so that we can directly access tables and analyse data stored in SQL Server.

System-related settings:

1. Go to the Control Panel of your system.
2. Click on Administrative Tools.
3. Select User DSN -> click on “Add” -> “Sql[…]

# Use of Slicer in MS Excel

Suppose we are analysing a dataset using a pivot table, and the column selected as the filter has many values (here we want month-wise detail, so it has 12 values). Selecting a value from the drop-down each time is a little tedious. A slicer can be used to simplify this scenario, as it[…]

# Missing Value imputation – advanced way

Imputing missing or junk values with the mean/median/mode is the most basic part of data cleaning [Read This], as these methods give accuracy only up to a certain level. Also, mean/median/mode are applicable when our data is in some traditional format, but in most practical scenarios it is not.[…]

# Missing Value imputation – basic way

Data cleaning is the most important part of data analysis, and missing values in our dataset make the task more tedious. If the share of observations with missing values is <=5%, we can simply delete those observations, but what if there are many more? Then we can’t[…]
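The rule of thumb above can be sketched in a few lines of Python. This is a minimal sketch, not the post's own method: the function name `handle_missing`, the use of `None` as the missing-value marker, and the single-column list input are all assumptions made for illustration. Only the 5% threshold and the mean/median/mode choices come from the text.

```python
from statistics import mean, median, mode


def handle_missing(values, strategy="mean", drop_threshold=0.05):
    """Drop missing entries when they are rare, otherwise impute them.

    `None` marks a missing observation (an assumption of this sketch).
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to work with")

    missing = len(values) - len(observed)
    # At or below the threshold (5% by default): simply delete the rows.
    if missing <= drop_threshold * len(values):
        return observed

    # Otherwise fill each gap with the chosen summary of the observed values.
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]


print(handle_missing([10, None, 30, None, 20]))              # imputes with the mean
print(handle_missing([1, None, 2, 100], strategy="median"))  # imputes with the median
```

The median is usually the safer choice when a column has outliers, since a single extreme value can pull the mean far from the typical observation.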