Traditional EDW vs Big Data

Big data is the newest buzzword in the industry. Executives and information technology experts have dropped off the cloud computing buzz and hopped onto the big data bandwagon. Generally, the excitement and buzz in the market lead to misconceptions about a new idea, and it takes a few iterations before its key concept is widely understood.

Is big data a new concept? – No. The concept has been around for four decades under the name enterprise data warehouse (EDW), although the focus of an EDW is primarily on internal structured data.

The objective of this blog post is to bring out the key concept of big data by comparing it with the enterprise data warehouse.

The simplest view of a data warehouse is that it takes all the operational data to one place as a single point of truth for the organization, and every combination of analytical reports is generated from it. A typical enterprise data warehouse data flow is given in the figure above. If the EDW already exists, what is big data, and why this big data, big data di? (I mean: why now?)
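To make the "one place, many reports" idea concrete, here is a minimal sketch in Python. It is not a real EDW: the two feeds, the table names and the column names are all made up for illustration, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical operational extracts. In practice these would come from
# separate source systems (order entry, support desk, and so on).
orders = [("c1", 120.0), ("c2", 75.5), ("c1", 30.0)]
complaints = [("c1", "late delivery"), ("c3", "damaged box")]


def build_warehouse(orders, complaints):
    """Load both operational feeds into one in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id TEXT, amount REAL)")
    conn.execute("CREATE TABLE complaints (customer_id TEXT, issue TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    conn.executemany("INSERT INTO complaints VALUES (?, ?)", complaints)
    conn.commit()
    return conn


def customer_report(conn):
    """One analytical report drawn from the consolidated store:
    total spend and number of complaints per ordering customer."""
    sql = """
        SELECT o.customer_id,
               SUM(o.amount) AS total_spend,
               (SELECT COUNT(*) FROM complaints c
                 WHERE c.customer_id = o.customer_id) AS complaint_count
        FROM orders o
        GROUP BY o.customer_id
        ORDER BY o.customer_id
    """
    return conn.execute(sql).fetchall()


warehouse = build_warehouse(orders, complaints)
report = customer_report(warehouse)
for row in report:
    print(row)
```

Note what the sketch cannot see: anything that is not already sitting in a table. That is exactly the limitation of internal structured data discussed in the rest of this post.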

What is it? – To go back to my last article on the Money Ball architect, big data is the collection of internal and external information that is required by Money Ball architects. By my definition, a Money Ball architect (otherwise called a data architect or data scientist) works to identify a set of differentiating data from a massive data set. The differentiating data is modeled and derived once the product, service, consumer and partner trends are studied and understood. The consumer, partner, product and economic data is unstructured and lies in uncharted territory. A massive data set in uncharted territory includes internal and external data, both structured and unstructured. That massive data set is called big data.

Why is it now? – The need for big data arose with the emergence of social media and other unstructured data widely used both inside and outside an organization. The unstructured data includes customer status updates on Facebook and Twitter, video uploads to YouTube, picture uploads from a smartphone and voice assistants like Siri. Consumer behavior, actual end-user experience, and product acceptance and adoption are viral, unstructured and paradoxical. With the rapid adoption and growth of mobile technology, consumer interactions, purchasing habits and product reviews go viral. The simpler it has become for consumers to engage in an experience, the more complex the analysis has become from a service provider’s perspective.

“Consumer behavior, actual end-user experience, and product acceptance and adoption are viral, unstructured and paradoxical”

An unsatisfied customer no longer calls a “1-800-sup-port” number to file a complaint. They tweet, or update their Facebook status, about their experience. Companies that try to measure customer satisfaction by analyzing only the internal customer complaint database will surely miss the reality. Traditional and trivial data analytics are not good enough anymore. The availability of technologies like Hadoop, HDFS, Avro, MapReduce, ZooKeeper, Pig, Chukwa, Hive, HBase and R makes the big data concept practical. The emergence of massive unstructured data through social media, its use in daily activities and the availability of these technologies are why big data is happening now.

All of the core technologies for big data are open source tools. With minimal hiccups over the Easter weekend, I successfully installed and configured Hadoop and MapReduce, and had them functional on Ubuntu Linux running in VirtualBox on a Windows 7 host.
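For readers who have not met MapReduce, the programming model itself is small enough to sketch in plain Python. The snippet below is only an illustration of the map, shuffle and reduce steps running in a single process; it does not use the Hadoop API, and the sample posts and function names are invented for the example.

```python
from collections import defaultdict

# Made-up customer posts standing in for unstructured social media text.
posts = [
    "support line never answers",
    "great product but support is slow",
    "slow delivery and slow support",
]


def map_phase(post):
    """Map step: emit a (word, 1) pair for every word in one post."""
    return [(word.lower(), 1) for word in post.split()]


def shuffle(pairs):
    """Shuffle step: group the emitted values by key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one word."""
    return key, sum(values)


def word_count(posts):
    """Run map, shuffle and reduce over all posts in a single process."""
    pairs = [pair for post in posts for pair in map_phase(post)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(posts)
print(counts["support"], counts["slow"])
```

On a real cluster the map and reduce steps run in parallel on many machines and the framework handles the shuffle; the shape of the two functions you write stays roughly the same.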

There are lots of commercial versions and open source tools available to run an enterprise big data infrastructure. I will write about the big data technology landscape as my next big data topic.

MoneyBall Architect

Yesterday, I had a coffee talk with one of my external mentees (from outside my organization), who is joining a new employer next week as a data architect. He asked for my advice. I started with a disclaimer: my views are not just for a data architect. I expect any architect who joins a new organization to do the following. It can also be generalized as mentoring advice for anyone who joins a new organization. The following was my spontaneous response to him.

1. Understand the core business of the organization. If it is a for-profit organization, understand how the company makes money. Translate the business model into a cash flow diagram at the highest level. Do not make assumptions based on generalized business practices or models. For instance, increasing customer traffic may increase sales and profit in the retail sector, but that may not be the case for an organization offering a boutique luxury product or service. There, the focus may be on retaining existing customers rather than increasing the customer base, since supply is very limited and cannot even meet current demand.
2. Understand the culture of the organization. Is the company culture innovative, fast-following, conservative, aggressively risk-taking, collaborative, bureaucratic, autocratic, open, hierarchical (control), etc.?
3. Do due diligence: investigate, and communicate, communicate, communicate with all the key stakeholders in the organization to accomplish 1 and 2.

“It is easy to complicate a thing but it is damn hard to simplify it”

After the short 30-minute meeting, while driving to work and rushing to take my 8:30 am call in my car, I was thinking the following.

There are terms like canonical data model, meta model, master data management, enterprise data flow, enterprise data bus, enterprise service bus, big data, etc. in the realm of data architecture. Quite often, I hear a passionate data architect talk about these terms in a way that leaves me struggling to understand the tangible benefit. For instance, I hear enterprise data flow defined like this: enterprise data flow is a structured method that records, analyzes, summarizes, organizes and explains the key information which is illustrative of the bottom-line core business process, with inbound and outbound flows, and that is intended for the understanding, enrichment, enhancement and education of key decision makers to make the right business decision at the right time to improve the overall objective of the business. I didn’t hear that exact definition; I exaggerated a bit to make my point, using Raju Hirani’s idea. The main goal of enterprise data flow is to show critical information to improve the ultimate business purpose (like profit). I see architects engage in prolonged discussions to define taxonomy, framework, methodology, process, tools, governance, stewardship, data quality, reference model, etc. All are great topics and lead to intellectual discussion but, sometimes, I notice that the discussion fails to address the ultimate purpose.

It is easy to complicate a thing but it is damn hard to simplify it. My expectation of an architect, including a data architect, is to work really hard to simplify the architectural work.

I visualize a data architect as a money-ball architect. For those who have not seen the movie Moneyball, it is about the real-life experience of a baseball team, the Oakland Athletics, whose general manager hired a Yale economics graduate who was passionate about the game and the league. He studies the league rules and player profiles, and creates a near-optimal data model and analytics to run a successful professional baseball team in the league with the lowest investment.

Any successful data architect is a money-ball architect. A money-ball architect follows the rule, breaks the rule, creates a new rule and breaks it until the money-ball is identified in the massive multi-dimensional data domain, then models the money-ball sub-domain data, identifies the key business differentiator from the sub-domain and uses it to improve the ultimate business purpose.

A money-ball architect will start by using the canonical data model, meta model, master data management, enterprise data flow, enterprise data bus, enterprise service bus, big data, taxonomy, framework, methodology, process, tools, governance and reference model (follow the rule). Next, identify the areas which do not directly contribute to identifying the money-ball and drop them (break the rule). Then introduce a new concept which contributes more directly to identifying the money-ball (create new rules), and repeat until the money-ball is identified, modeled and used to improve the ultimate business purpose.

To become a successful data architect, create a path for yourself to become a money-ball architect for your organization.

Future of analytics

Analytics, simply defined as the discipline of analysis, has been in use for centuries. I was invited to an IT leadership symposium organized by secure24, a hosting provider in Michigan, USA. The event was choreographed by Thornton A. May. It was attended by a select group of senior IT executives from various industries in the region. The event kicked off with a great opening presentation by Mr. May. The presentation was essentially storytelling on how the IT industry evolved to add tangible business benefit, with simplified historical and anthropological examples. I really enjoyed it.

The second part of the event was a panel discussion. The panel members provided very intriguing ideas, messages and concepts. I learned a few new things from the panel discussion.

The third part of the event: each table was given a topic and asked to discuss it and present it to everyone. My table got the topic “Future of analytics” and I was nominated unanimously to represent our table. My contribution to the table topic was that the future will depend on social media and social networks. The others’ contributions were mainly on geo-fencing and its role in analytics. Since I was nominated to represent the table, I was structuring my thoughts on how to present our views while listening to the other tables’ topics. Mr. May ran out of time, our table was omitted and our views on the topic were not heard. I decided to share my structured thoughts on my blog. This is how I would have presented.

The panel discussion provided great insights and I learned a few new things from the panel members. The top two things on which I fully agree with the panel members are:

1. The success of an IT organization is measured by its capability and capacity to execute and innovate.
2. The most critical differentiating factor of a successful IT organization is not adopting the latest technology trends like cloud computing, mobile computing, service-oriented architecture or integrated identity management. It is having the “right people at the right job”. People are the ones who make things happen in an organization.

“Vision without a plan is a dream and a plan without a vision is a run around”

I would like to add my view to the above points: vision without a plan is a dream and a plan without a vision is a run around. People are the ones who make things happen, not technology.

To be a futurist and strategist, these are a few concepts to keep in mind (a repeat from my last blog post):
1. Timing is everything
2. Learn history and study current (identify the driving factors)
3. Unleash the core and its dependencies – Understand what really matters and its dependencies.
4. Connect the dots

Timing is everything:
Imagine if the iPhone had been launched 12 years ago. We would have connected to the Apple store via dial-up (AOL) to download Angry Birds. A data plan would have cost us $500 a month and the connection speed would have frustrated us a lot. All the other external factors would have put the iPhone in the Palm category.
Learn history and study current:
As far as we know, when Grog in 5000 BC used two sticks and rocks to graph the upward trend in sales of his new invention, the wheel, the concept of “analytics” was born. It took almost seven thousand years to make a leap in this area. What did we learn from history? Analytics played a significant role in mega successes like the Romans, Henry Ford and others. Those who understood the deep meaning of analytics made an everlasting impact. The current expectation is to let the system make decisions and receive confirmation from the end user to execute the plan. Trust in the system and acceptance of system-generated decisions have been increasing. Adoption of this model is accelerating. The navigation system in a car is a prime example of the current state.
Unleash the core and its dependencies:
People are the ones who make things happen, and in most for-profit and non-profit organizations, people are the direct or indirect consumers of goods and services. It is essential to understand people in order to define or approximate the future. How do people view the world? People define the world based on what they see and hear. The world is blue when they view it through blue glass and red when they view it through red glass. A deeper view: for decades, the world has been defined numerous times every day by each individual through the social network. Historically, the social network ran through snail mail, family gatherings, corporate functions, bars and other occasional events. With the advancement of technologies like smartphones, global networks, wireless networks and software tools, social networking happens instantaneously. We are defining and redefining our world based on instantaneous connection through an invisible cosmic social network fiber. It becomes an addiction because we want to know what is happening around the world we have defined. So Facebook is addictive.
Connect the dots:
The advancement of wireless networks, mobile platforms, social media and end-user computing devices has led to higher sophistication, and at the same rate people’s mechanical, monotonous lifestyle has gone up by a few notches. They don’t have the interest or time to view a product or service’s sales offering when they don’t need it or are not in the mood. At the same time, they want to execute a decision instantaneously when a systematic analysis has already been performed and the decision is presented with the highest level of confidence.
Conclusion:
The future of analytics will be presenting a decision to you so that, by the click of a button (or the slide of a screen), you can execute it. For example, based on social interaction, the system will identify a consumer’s interest, capacity and capability: likes skiing in the January time frame, received a hefty bonus at Christmas time, and has carry-over vacation from the previous year that should be taken before the first quarter. The system will present an offer: a 5-day ski trip to the best place at the lowest possible rate with the best possible quality. Once you confirm acceptance of the recommendation/decision, everything will be taken care of by the system. Once the consumer enters Denver for skiing, geo-fencing will kick in and, based on interest, pattern and spending behavior, the most suitable offers applicable in Denver during that vacation time will be presented by the consumer’s car while the consumer is driving from Denver airport to the Breckenridge ski resort.
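The geo-fencing part of this scenario is the easiest piece to sketch. The snippet below is a rough illustration, assuming a simple circular fence and the haversine great-circle distance; the coordinates are approximate, and the 50 km radius and the function names are my own choices for the example, not part of any real product.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, good enough for a sketch

# Approximate (latitude, longitude) pairs, used for illustration only.
DENVER_CENTER = (39.7392, -104.9903)
DENVER_AIRPORT = (39.8561, -104.6737)
BRECKENRIDGE = (39.4817, -106.0384)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def inside_geofence(point, center, radius_km):
    """True when the point lies within radius_km of the fence center."""
    return haversine_km(*point, *center) <= radius_km


# Hypothetical 50 km fence around Denver: offers switch on while the
# consumer is inside it and switch off once they have driven out.
print(inside_geofence(DENVER_AIRPORT, DENVER_CENTER, 50.0))
print(inside_geofence(BRECKENRIDGE, DENVER_CENTER, 50.0))
```

A real system would layer the consumer’s interest, pattern and spending behavior on top of this check to pick which offers to show; the fence only answers the question "is the consumer here right now?".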
