Leaders in Big Data

Learn what experts think about big data
From Google Tech Talks: a panel discussion on the evolution, current opportunities, and future trends in big data.

SPEAKERS:
Moderator: Hal Varian, Chief Economist at Google.
Panelists:
Theo Vassilakis, Principal Engineer/Engineering Director at Google
Gustav Horn, Senior Global Consulting Engineer, Hadoop at NetApp
Charles Fan, Senior Vice President of Strategic R&D at VMware

Quick Notes from the panel

What’s big data in short?
A virtually unlimited amount of unstructured data, captured in real time, from which you can get meaningful answers fast.

The promise:
Never delete data again. Keep it all as it comes. Simply focus on asking the right question to get answers.

An example:
In healthcare, you could store the complete genome of every cancer patient, combined with other potentially relevant patient information (habits, living conditions, etc.) collected throughout their lives. With the right analytics, correlating such a massive dataset would provide valuable insight.
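To make the idea of correlating patient data concrete, here is a toy sketch in Python. The field names and numbers are entirely made up; a real analysis would run over billions of records on a cluster, not a four-element list.

```python
import math

# Hypothetical patient records (made-up fields and values); in practice this
# would be genomic plus lifestyle data for millions of patients on a cluster.
patients = [
    {"smoker_years": 0,  "tumor_growth_rate": 0.8},
    {"smoker_years": 12, "tumor_growth_rate": 1.9},
    {"smoker_years": 30, "tumor_growth_rate": 2.7},
    {"smoker_years": 5,  "tumor_growth_rate": 1.1},
]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [p["smoker_years"] for p in patients]
ys = [p["tumor_growth_rate"] for p in patients]
print(f"smoking vs tumor growth correlation: {pearson(xs, ys):.2f}")
```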

Trends
More and more devices: servers, cameras, sensors, etc. Big Data techniques now make it economical to store all of that information, and MapReduce techniques let you query and analyse those huge amounts of unstructured data and get meaningful results.
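To illustrate what MapReduce actually does, here is a minimal sketch that simulates the map, shuffle, and reduce phases locally in plain Python, counting status codes in unstructured log lines. The log format is hypothetical; a real job would be distributed across a Hadoop cluster.

```python
from collections import defaultdict

# Hypothetical unstructured log lines; a real job would read these from HDFS.
log_lines = [
    "2012-03-01 10:00:01 GET /index.html 200",
    "2012-03-01 10:00:02 GET /missing 404",
    "2012-03-01 10:00:03 POST /login 200",
]

def mapper(line):
    # Map phase: emit (status_code, 1) for every log line.
    status = line.split()[-1]
    yield status, 1

def reducer(key, values):
    # Reduce phase: sum the counts for one key.
    yield key, sum(values)

# Shuffle phase: group mapper output by key (Hadoop does this for you).
groups = defaultdict(list)
for line in log_lines:
    for key, value in mapper(line):
        groups[key].append(value)

for key in sorted(groups):
    for k, total in reducer(key, groups[key]):
        print(k, total)
```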

Standards
Open source gives people choice. Hadoop is becoming the de facto standard for data storage and data management. It lets disparate datasets from unrelated systems be dumped into a single Hadoop cluster, which makes it possible to look for correlations between them. (XML, CSV… there will be chaos in formats for a little while.)
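As a small illustration of "dumping disparate datasets into one place", here is a sketch that normalises a CSV extract and an XML extract from two unrelated (hypothetical) systems into a single common record format, ready to be joined on a shared key.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical extracts from two unrelated systems: a CRM export in CSV and
# an order feed in XML. In practice these files would land in HDFS as-is.
csv_data = "customer_id,city\n42,Zurich\n43,Boston\n"
xml_data = "<orders><order customer='42' amount='99.5'/></orders>"

records = []

for row in csv.DictReader(io.StringIO(csv_data)):
    records.append({"source": "crm_csv",
                    "customer_id": row["customer_id"],
                    "city": row["city"]})

for order in ET.fromstring(xml_data).iter("order"):
    records.append({"source": "orders_xml",
                    "customer_id": order.get("customer"),
                    "amount": float(order.get("amount"))})

# Once everything is in one common shape, you can join on shared keys
# (here: customer_id) and start looking for correlations.
for record in records:
    print(record)
```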

Natural language programming is getting better. That means more and more natural-language logic and less hand-written code.

Four layers of functionality in Big Data:

  • applications
  • analytics
  • management: query engines
  • data infrastructure: data storage

Lower layers standardize earlier. A common standard will emerge.

The biggest problem for enterprises
A big problem for companies today is the number of heterogeneous databases in different formats. Connecting them to Hadoop is key, but vendors will need to provide connectors, which do not help their business of selling licenses.
Google has an advantage over a typical enterprise here, because internally all of its data is standardized.
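Here is a minimal sketch of what such a connector does, reduced to its essence: read rows out of a relational database and flatten them into newline-delimited JSON that a Hadoop cluster could ingest. Tools like Apache Sqoop do this at scale; the table and column names are hypothetical, and SQLite stands in for the enterprise database.

```python
import json
import sqlite3

# SQLite stands in for a (hypothetical) enterprise database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Ada", "London"), (2, "Linus", "Helsinki")])

# Export every row as one JSON object per line, a format Hadoop handles well.
with open("customers.jsonl", "w") as out:
    cursor = conn.execute("SELECT id, name, city FROM customers")
    columns = [c[0] for c in cursor.description]
    for row in cursor:
        out.write(json.dumps(dict(zip(columns, row))) + "\n")
```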

For some, a bigger problem for enterprises is knowing what to do with so much data. Experimentation is key to mining it and finding meaningful insights.

Is the #1 problem the amount of data, or not knowing how to use it?

Model for hardware infrastructure: buy, or lease in the cloud?
There is a place for both. Cloud is a good place to get started.

Will SQL be made obsolete by NoSQL?
People using NoSQL are looking for more flexible, easier schemas; it relaxes the requirements on the database. Many developers are embracing these systems, but there is no clear winner yet.
SQL's strength is that it is a declarative language: it doesn't state how to do something, just what you want to get.
But SQL will be around for a long time, even if only for legacy databases.
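To show the contrast, here is a toy sketch: the same question asked declaratively in SQL (via SQLite) and imperatively over schemaless, NoSQL-style documents. The data is made up; the point is that the SQL query states what you want, while the document version leaves both schema and query logic to the application.

```python
import sqlite3

# Declarative SQL: state *what* you want, not how to compute it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, country TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [("Ana", "ES", 31), ("Bob", "US", 24)])
print(conn.execute(
    "SELECT name FROM users WHERE country = 'ES' AND age > 30").fetchall())

# Schemaless, NoSQL-style documents: no fixed schema, every record may carry
# different fields, and the query logic lives in application code.
users = [
    {"name": "Ana", "country": "ES", "age": 31, "twitter": "@ana"},
    {"name": "Bob", "country": "US"},  # missing fields are fine
]
print([u["name"] for u in users
       if u.get("country") == "ES" and u.get("age", 0) > 30])
```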

In summary, Big Data will:

  • provide answers fast
  • grow without limit in volume and size
  • never forget anything

My two cents after listening to the experts: a definition of Big Data in five words.

“Manage infinite data. Get answers.”
