Big data is the newest buzzword in the industry. Executives and information technology experts have all dropped the cloud computing buzz and hopped onto the big data bandwagon. Generally, excitement and buzz in the market lead to misconceptions about a new idea, and it takes a few iterations before the key concept behind it is widely understood.
Is big data a new concept? – No. The concept has been around for four decades under the name enterprise data warehouse (EDW), and the focus of the EDW is primarily on internal structured data.
The objective of this post is to bring out the key concept of big data by comparing it with the enterprise data warehouse.
The simplest view of a data warehouse is that it takes all the operational data to one place, as a single point of truth for the organization, and every combination of analytical reports is generated from it. A typical enterprise data warehouse data flow is given in the figure above. If the EDW already exists, what is big data, and why this big data, big data di? (I mean: why now?)
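To make the single-point-of-truth idea concrete before moving on, here is a minimal sketch of it. It is an illustration only, not how a real EDW is built: the two operational sources, the table names and the sample rows are all made up, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical extracts from two operational systems (made-up sample data).
SALES_ORDERS = [("C001", 120.0), ("C002", 75.5), ("C001", 30.0)]
SUPPORT_TICKETS = [("C001", "late delivery"), ("C003", "billing error")]


def build_warehouse():
    """Load both operational extracts into one place (an in-memory database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id TEXT, amount REAL)")
    conn.execute("CREATE TABLE tickets (customer_id TEXT, issue TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", SALES_ORDERS)
    conn.executemany("INSERT INTO tickets VALUES (?, ?)", SUPPORT_TICKETS)
    conn.commit()
    return conn


def customer_report(conn):
    """One analytical report drawn from the consolidated data:
    total spend and number of support tickets per customer."""
    query = """
        SELECT c.customer_id,
               COALESCE((SELECT SUM(o.amount) FROM orders o
                         WHERE o.customer_id = c.customer_id), 0.0),
               (SELECT COUNT(*) FROM tickets t
                WHERE t.customer_id = c.customer_id)
        FROM (SELECT customer_id FROM orders
              UNION
              SELECT customer_id FROM tickets) c
        ORDER BY c.customer_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in customer_report(build_warehouse()):
        print(row)
```

Note that everything in this sketch is internal and structured, which is exactly the territory the EDW covers.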
What is it? – To go back to my last article on the Money Ball architect, big data is a collection of internal and external information that is required by Money Ball architects. Based on my definition, a Money Ball architect (otherwise called a data architect or data scientist) works to identify a set of differentiating data within a massive data set. Differentiating data is modeled and derived once the product, service, consumer and partner trends have been studied and understood. The consumer, partner, product and economic data is unstructured and lies in uncharted territory. A massive data set in uncharted territory includes internal and external data, both structured and unstructured. This massive data set is called big data.
Why is it now? – The need for big data arose with the emergence of social media and other unstructured data that is widely used both inside and outside an organization. This unstructured data includes customer status updates on Facebook and Twitter, YouTube video uploads, picture uploads from smartphones, and requests to voice assistants like Siri. Consumer behavior, actual end-user experience, and product acceptance and adoption are viral, unstructured and paradoxical. With the rapid adoption and growth of mobile technology, consumer interactions, purchasing habits and product reviews have gone viral. A simpler way for the consumer to engage in an experience has increased the complexity of analysis from the service provider's perspective.
An unsatisfied customer no longer calls a “1-800-sup-port” number to file a complaint. They tweet, or update their Facebook status, about their experience. Companies that try to measure customer satisfaction by analyzing only the internal customer complaint database will surely miss the reality. Traditional and trivial data analytics are not good enough anymore. The availability of technologies like Hadoop, HDFS, Avro, MapReduce, ZooKeeper, Pig, Chukwa, Hive, HBase and the R programming language makes the big data concept practical. The emergence of massive unstructured data through social media, its use in daily activities, and the availability of these technologies are what led to big data now.
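As a rough illustration of the MapReduce idea applied to the complaint example, here is a minimal sketch in plain Python. It does not use Hadoop or its API; it only mimics the map, shuffle and reduce steps in a single process. The sample posts, the keyword list and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample posts; real input would come from a social media feed.
POSTS = [
    "My new phone keeps dropping calls, terrible support",
    "Love the camera on this phone",
    "Support never answered, terrible experience",
]

# Hypothetical keywords assumed to signal dissatisfaction.
COMPLAINT_WORDS = {"terrible", "dropping", "never"}


def map_phase(post):
    """Emit a (word, 1) pair for every complaint keyword found in one post."""
    for word in post.lower().replace(",", " ").split():
        if word in COMPLAINT_WORDS:
            yield word, 1


def shuffle_phase(pairs):
    """Group the emitted values by key, between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one keyword."""
    return key, sum(values)


def count_complaints(posts):
    """Run map, shuffle and reduce over all posts and return keyword counts."""
    pairs = (pair for post in posts for pair in map_phase(post))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(count_complaints(POSTS))
```

In a real cluster the map and reduce steps would run in parallel on many machines over far larger inputs, and simple keyword matching would be replaced by a more serious text analysis.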
All of the core technologies for big data are open source tools. With minimal hiccups over the Easter weekend, Hadoop and MapReduce were successfully installed and configured, and are now functional, on Ubuntu Linux running in VirtualBox on a Windows 7 host.
There are many commercial versions and open source tools available to run an enterprise big data infrastructure. I will write about the big data technology landscape as my next topic related to big data.