What is this?
Big data analytics refers to the process of collecting, organizing, and analyzing large sets of data ("big data") to discover patterns and other useful information. It not only helps you understand the information contained within the data, but also helps identify the data that matters most to the business and to future business decisions. We get data in crude form (unstructured, unrefined) from various sources: audio, video, and text from social networking sites, browser cookies, and so on. The key is to extract powerful insights from this pile of data. The first step is the refinement of this crude data; drawing inferences from it is what is known as analytics.
Steps involved in business analytics:
- The first step is Descriptive Analytics: here we take the raw data and break it into pieces that are more practical and useful. The real value lies in turning that data into actionable information. In simpler words, it is about understanding the data and answering the question "What happened?"
- Next is Diagnostic Analytics: here we try to find out why it happened, that is, why this data was generated. We try to diagnose the reasons.
- The third step is Predictive Analytics: it is the next step in data reduction. It uses a variety of statistical, modeling, data mining, and machine learning techniques to study recent (sometimes even real-time) and historical data, thereby allowing analysts to make predictions about the future. It does not say what will happen in the future; in fact, no one can say that. Predictive analytics merely forecasts what may happen, and those forecasts are probabilistic in nature. Questions like "What is likely to happen in the future?" get answered here.
- The final phase is Prescriptive Analytics: once we have the future prediction, the question becomes "What should we do now?" It takes in new data to re-prescribe and re-predict, so prediction accuracy increases over time. It suggests decision options for how to take advantage of a future opportunity or mitigate a future risk, and illustrates the implications of each decision option. This helps decision makers act on the analysis.
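The four phases above can be sketched in a few lines of Python. The sales figures, ad-spend numbers, and the decision rule at the end are all made-up illustrations, not data from the text:

```python
# A minimal sketch of the four analytics phases on hypothetical monthly
# sales data. All figures and thresholds here are illustrative assumptions.
from statistics import mean

sales = [100, 120, 90, 130, 150, 170]   # units sold per month (toy data)
ad_spend = [10, 12, 8, 13, 15, 17]      # marketing spend per month (toy data)

# 1. Descriptive: "What happened?" -- summarize the raw data.
avg_sales = mean(sales)

# 2. Diagnostic: "Why did it happen?" -- look for a driver, e.g. the
#    correlation between ad spend and sales.
def correlation(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

r = correlation(ad_spend, sales)

# 3. Predictive: "What is likely to happen?" -- fit a trend line over the
#    months and extrapolate one month ahead (a probabilistic forecast).
months = list(range(len(sales)))
mx, my = mean(months), mean(sales)
slope = sum((m - mx) * (s - my) for m, s in zip(months, sales)) / \
        sum((m - mx) ** 2 for m in months)
forecast = my + slope * (len(sales) - mx)

# 4. Prescriptive: "What should we do now?" -- turn the forecast into a
#    decision option (this rule is a made-up example).
action = "increase ad spend" if forecast > avg_sales else "hold spend"

print(f"avg={avg_sales:.1f}, r={r:.2f}, forecast={forecast:.1f}, action={action}")
```

Each phase consumes the output of the previous one: the summary feeds the diagnosis, the diagnosed trend feeds the forecast, and the forecast feeds the recommendation.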
How big is “Big Data”?
We can describe this using the 6 'V's model:
- Volume: It refers to the vast amount of data generated every second (zettabytes or even brontobytes of data). Today, data is generated by machines, networks, and human interaction on systems like social media.
- Velocity: It refers to the speed at which data is generated and the speed at which it moves around. Just think of Facebook: you can imagine the activity on it every second!
- Variety: It refers to the different types of data being generated these days. Thanks to smartphones and social media! Previously there was only structured data, but now we get both structured data (from spreadsheets, tables, and databases) as well as unstructured data (in the form of audio, video, photos, etc.).
- Veracity: It refers to the biases, noise, and abnormality in data. We need to remove this messiness from the data to increase its quality and accuracy. Examples include typos and "hashtags" on Twitter.
- Volatility: It refers to how long data remains valid and how long it should be stored. In this world of real-time data, you need to determine at what point data is no longer relevant to the current analysis.
- Value: It is all well and good having access to big data, but unless we can turn it into value it is useless. So you can safely argue that 'value' is the most important V of big data.
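The veracity point above, removing messiness such as typos and hashtags from tweets, can be sketched with a small cleaning pass. The sample tweets, the typo dictionary, and the cleaning rules are all illustrative assumptions:

```python
import re

# Hypothetical raw tweets: noisy, with hashtags, typos, and a near-duplicate.
raw_tweets = [
    "Big Data is teh future!! #bigdata #analytics",
    "   loving this new phone   #tech",
    "big data is teh future!! #bigdata #analytics",  # same tweet, retyped
]

TYPO_FIXES = {"teh": "the"}  # a tiny, made-up typo dictionary

def clean(tweet):
    tweet = re.sub(r"#\w+", "", tweet)       # strip hashtags (noise)
    tweet = re.sub(r"[^\w\s]", "", tweet)    # drop punctuation
    words = tweet.lower().split()            # normalize case and whitespace
    words = [TYPO_FIXES.get(w, w) for w in words]  # fix known typos
    return " ".join(words)

# Deduplicate after cleaning, so near-identical records collapse into one.
cleaned = sorted({clean(t) for t in raw_tweets})
print(cleaned)
```

After cleaning, the two retyped copies of the first tweet collapse into a single record, which is exactly the kind of quality improvement the veracity 'V' is about.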