Tag Archives: big data

5 Keys to Analytics


Big Data technologies and analytics promise to unleash an unprecedented amount and depth of insight from huge, unstructured, ever-growing datasets. But it is not all about technology. Big data needs big brains, and here are 5 keys to structure your analytics designs and experiments:

1. Questions. Without the right question you won’t get the right answer. Knowing what to ask is the #1 key success factor for any analysis.
2. Listening. It is important to acquire and capture all data that might be relevant to your questions. You need to identify all sources and make use of technology to cope with real-time data in any amount that is required to reach your goal.
3. Structure the data so that you can model it. The model should enable you to pose diverse questions to your dataset, so that you can test different hypotheses.
4. Categorize. Find categories, discover attributes, similarities or abstractions that impose a higher layer of structure on the data.
5. Predict. Find patterns that will hold true in new situations not yet encountered; this is the essence of predictive analytics.
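Keys 3 to 5 can be sketched in a few lines of code. The snippet below uses hypothetical clickstream data and made-up field names purely for illustration: it structures raw events into records, categorizes them with a higher-level attribute, and extracts a simple pattern that could be tested on unseen data.

```python
# A minimal sketch of keys 3-5 on hypothetical clickstream data.
raw_events = [
    "2015-07-20T10:01 user=alice action=view page=/pricing",
    "2015-07-20T10:02 user=alice action=buy  page=/checkout",
    "2015-07-20T10:05 user=bob   action=view page=/blog",
]

# Key 3: structure the data so it can be queried and modeled.
def parse(line):
    ts, *pairs = line.split()
    record = dict(p.split("=") for p in pairs)
    record["ts"] = ts
    return record

events = [parse(line) for line in raw_events]

# Key 4: categorize -- add a higher-level attribute on top of raw fields.
for e in events:
    e["intent"] = "commercial" if e["page"] in ("/pricing", "/checkout") else "content"

# Key 5: look for a pattern expected to hold for new data,
# e.g. "buyers were previously on commercial pages."
buyers = {e["user"] for e in events if e["action"] == "buy"}
print(buyers)  # {'alice'}
```

The same loop scales conceptually: structure, categorize, then search for the pattern that generalizes.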

Analytics is a circular process in which the analyst constantly searches for patterns, forms hypotheses, and then drives experiments to validate them.

Computer scientists are interested in finding the needle in the haystack.
Social scientists are interested in characterizing the haystack.

Both approaches have a big role in data science and analytics. Data scientists must understand which of the two they are pursuing when they design the strategy and the questions that will reach the goal of their analysis.

With so much data to mine, you need the right tools and methodology to find your gems. Still, big data needs big brains, because “a fool with a tool is still a fool.”



Lean Analytics: Why is Big Data so Disruptive?


Big Data technologies have changed the way we collect data, enabling us to handle infinite amounts of data.

In the old model, with relational databases, you first define the schema for what you collect and then put data into that schema, before analyzing it with BI tools. This is how data warehousing and data mining have worked for decades. You first needed to figure out the question; only then did you collect the data.


Big Data enables you to change that order. Now, you collect unstructured data first, and you ask the question later.


Modern analytics start by collecting everything, and then formulating your question.
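This reversal is often called schema-on-read, as opposed to the warehouse's schema-on-write. A minimal sketch with hypothetical event data: everything is collected as raw lines with no schema at all, and structure is imposed only when a question is finally asked.

```python
import json

raw_store = []  # "collect everything": no schema, just append

def collect(line):
    raw_store.append(line)

# Events of completely different shapes land in the same store.
collect('{"user": "alice", "clicked": "/pricing"}')
collect('{"sensor": 7, "temp_c": 21.5}')
collect('{"user": "bob", "clicked": "/blog"}')

# Later, a question arrives: "which pages were clicked?"
# The schema (user, clicked) is applied only now, at read time.
records = [json.loads(r) for r in raw_store]
pages = [r["clicked"] for r in records if "clicked" in r]
print(pages)  # ['/pricing', '/blog']
```

A relational warehouse would have rejected the sensor row at load time; here it simply waits in the store until someone asks a question it can answer.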

This is what Alistair Croll explains in his book and in the slides and charts above.

The new paradigm allows you to search for the “unknown unknowns.” In analytics, most good answers will lead to another question. Data driven decisions depend on the ability to ask better questions and then ask again.

Implications for Business

In times of rapid market changes, what differentiates your business is how fast you experiment, how fast you can ask iterative questions and measure your progress, and how fast you readjust your business based on your learnings.

These are the principles behind the lean startup, and behind innovation. The most important metric for modern companies becomes “how fast does your organization learn?”

Big Data changes the cost of making data-driven decisions. It is an enabler for a more disciplined and empirical thinking about innovation and strategy.  In that sense, Big Data analytics becomes one more enabler for disruption.


Big Data in 5 words

Manage Infinite Data. Get answers.

Many CIOs and CMOs may feel overwhelmed by the hype on Big Data. The sentence above gives some more clarity on top of the traditional 3Vs, which IBM even extends to four:

  • Volume: mass quantity of data that technologies today can handle. Virtually unlimited.
  • Variety: integrate and analyze data from an array of structured and unstructured data sources, including databases, sensors, video, log files, clicks and more.
  • Velocity: the high speed at which data is created, processed and analyzed, allowing for real-time answers based on real-time streaming sources.
  • Veracity: managing the reliability and predictability of data sources.

Here is the landscape of companies in different segments of Big Data, from Dave Feinleib for Forbes. The two bottom layers take care of “managing infinite data”; the two top layers are the ones to “get answers.”



Leaders in Big Data

Learn what experts think about big data
From Google Tech Talks: Discussing the evolution, current opportunities and future trends in big data.

Moderator: Hal Varian,  Chief Economist at Google.
Theo Vassilakis, Principal Engineer/Engineering Director at Google
Gustav Horn, Senior Global Consulting Engineer, Hadoop at NetApp
Charles Fan, Senior Vice President at VMware in strategic R&D

Quick Notes from the panel

What’s big data in short?
A virtually unlimited amount of unstructured data, captured in real time, from which you can get meaningful answers fast.

The promise:
Never delete data again. Keep it all as it comes. Simply focus on asking the right question to get answers.

An example:
In Healthcare, you could store the complete human genome of all cancer patients combined with other possible relevant patient info (habits, living conditions, etc) throughout their lives. The correlation of such a massive dataset would provide wonderful insight with the right analytics.
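The mechanics of that correlation can be shown in miniature. The numbers and variable names below are entirely made up for illustration; a real genomic study would need far more care (confounders, multiple testing), but at its core it computes correlations like this one between a lifestyle variable and an outcome across patients.

```python
from statistics import mean

# Hypothetical patient data: years of a habit vs. a risk score.
smoking_years = [0, 5, 10, 20, 30, 40]
risk_score    = [1.0, 1.2, 1.9, 2.5, 3.1, 3.8]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# A value near 1.0 suggests a strong positive association.
print(round(pearson(smoking_years, risk_score), 3))
```

At Big Data scale the same computation runs over millions of patients and thousands of variables at once, which is where the infrastructure earns its keep.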

More and more devices, more servers, cameras, sensors, etc. Big Data techniques now allow all that information to be stored economically. And MapReduce techniques allow you to query and analyze those huge amounts of unstructured data and get meaningful results.
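The MapReduce pattern itself is simple; what Hadoop adds is running it across thousands of machines. A toy single-process version, counting status codes in hypothetical server logs:

```python
from collections import defaultdict

logs = [
    "GET /index 200",
    "GET /missing 404",
    "POST /login 200",
    "GET /index 200",
]

# Map: emit (key, 1) pairs from each unstructured line.
mapped = [(line.split()[-1], 1) for line in logs]

# Shuffle: group emitted values by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: aggregate each group into a result.
counts = {key: sum(values) for key, values in groups.items()}
print(counts)  # {'200': 3, '404': 1}
```

In a real cluster the map step runs where the data lives and the shuffle moves only the intermediate pairs, which is what makes querying huge unstructured datasets economical.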

Open source gives people choice. Hadoop is becoming the de facto standard for data storage and data management. It enables disparate datasets from unrelated systems to be dumped into a Hadoop cluster, which makes it possible to look for correlations between them. (XML, CSV… there will be chaos for a little while.)
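The "dump disparate formats first, correlate later" idea looks like this in miniature. The two exports and their field names are hypothetical: one system produces CSV, another JSON, and they are joined on a shared id only after both land in the same store.

```python
import csv
import io
import json

# Two hypothetical exports of customer data from unrelated systems.
csv_export = "id,city\n1,Madrid\n2,Oslo\n"
json_export = '[{"id": 1, "spend": 120}, {"id": 2, "spend": 80}]'

# Parse each format into a common record shape, keyed by id.
customers = {row["id"]: row for row in csv.DictReader(io.StringIO(csv_export))}

# Correlate: enrich the CSV records with the JSON ones on the shared key.
for record in json.loads(json_export):
    customers[str(record["id"])]["spend"] = record["spend"]

print(customers["1"])  # {'id': '1', 'city': 'Madrid', 'spend': 120}
```

The chaos the panel mentions is visible even here: the CSV id arrives as a string and the JSON id as a number, so the join needs an explicit conversion.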

We are getting better at natural language programming. That means more and more natural language logic and less programming.

Four layers of functionality in Big Data:

  • applications
  • analytics
  • management: query engines
  • data infrastructure: data storage

Lower layers standardize earlier. A common standard will emerge.

The biggest problem  for enterprises
A big problem for companies today is the number of heterogeneous databases with different formats. Connecting them to Hadoop is key, but vendors will need to provide connectors, which do not help their business of selling licenses.
Google has some advantage compared to a normal enterprise, as in Google all their data is standardized.

For some, a bigger problem for enterprises is knowing what to do with so much data. Experimentation is key to mine and find meaningful insights.

Is the #1 problem the amount of data, or not knowing how to use it?

Model for hardware infrastructure: buy, or lease in the cloud?
There is a place for both. Cloud is a good place to get started.

Will SQL be made obsolete by NoSQL?
People using NoSQL databases are looking for more flexible, easier schemas; they relax the requirements of the relational database. Many developers are embracing them, but there is no clear winner yet.
The good thing about SQL is that it is a declarative language: it does not state how to compute the result, just what you want to get.
But SQL will be around for a long time, even if just for legacy databases.
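A quick illustration of declarative versus imperative querying, using Python's built-in SQLite with a made-up orders table: the SQL statement says only *what* is wanted, while the equivalent loop spells out *how* to compute it.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?)",
               [("alice", 30.0), ("bob", 10.0), ("alice", 5.0)])

# Declarative: state what you want; the engine decides how.
declarative = db.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
).fetchall()

# Imperative: spell out how to build the same answer step by step.
totals = {}
for customer, amount in db.execute("SELECT customer, amount FROM orders"):
    totals[customer] = totals.get(customer, 0.0) + amount

print(sorted(declarative))       # [('alice', 35.0), ('bob', 10.0)]
print(sorted(totals.items()))    # same result
```

The declarative form also leaves the database free to pick indexes and execution plans, which is part of why SQL has endured.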

In summary, Big Data will:

  • provide answers fast
  • get infinite in volume and size
  • never forget anything

My two cents after listening to the experts: a definition of Big Data in five words.

“Manage infinite data. Get answers.”
