July 12, 2017
I totally geeked out when I saw first-hand all the latest developments taking place in the big data space. Even though I have been working in the Business Intelligence (BI) industry for several years, this was my first time attending the Big Data Toronto conference. Let me share some of what I learned, through the lens of a Product Manager.
Big Data is Finally Not Alone
This year's Big Data Toronto was three events combined into one; the other two events were Connected Plus Toronto (focusing on IoT) and Artificial Intelligence (AI) Toronto. No longer is Big Data just a silo by itself – it now consumes data from more devices than ever, and there is a much bigger appetite for AI initiatives to consume data in both structured and unstructured forms.
Big Data Grows Up
According to a Harvard Business Review article, 48.4% of Fortune 1000 companies achieve measurable results from their big data investments. It does not stop there: tools and software offerings are maturing enough for widespread adoption. Gone are the days when you had to set up each individual component from scratch. Now organizations are free to choose between a free community version and a paid enterprise version from multiple vendors. I wouldn’t be surprised to see more and more companies implementing Hadoop to solve big data problems, and even medium data problems (where there is too much data to process on one machine, but not yet petabytes of it).
Data Ingestion Becomes a Harder Problem
The traditional approach of ingesting source data files one after another, using a single system and method, is no longer feasible – we are now dealing with many IoT devices that feed in data in many different formats, and business users demand the flexibility to look at data in many ways. We are at a stage where we need to process huge quantities of data, and we need to process it faster than ever.
Several talks at the conference touched on Kafka, a stream processing platform originally developed at LinkedIn, which addresses the problem described above. It is a distributed system that treats data ingestion as a messaging problem. Producers publish messages to a named stream called a topic, and consumers subscribe to a topic to receive its messages. Topics are partitioned and replicated across multiple machines for scalability and durability. Kafka does not make the data ingestion problem simpler; rather, it makes the problem more palatable. You can read more about it on the Cloudera blog.
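To make the topic idea a bit more concrete, here is a minimal in-memory sketch in Python. It is a toy model, not the Kafka client API: the `Topic` and `Consumer` classes, the `sensor-readings` name, and the CRC32-based partition choice are all assumptions made up for illustration, and replication across machines is left out entirely.

```python
from zlib import crc32


class Topic:
    """Toy stand-in for a partitioned topic (hypothetical, not Kafka's API)."""

    def __init__(self, name, num_partitions=3):
        self.name = name
        # Each partition is an append-only list of (key, value) messages.
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # Assumption: pick the partition by hashing the key, so messages
        # with the same key always land in the same partition, in order.
        index = crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append((key, value))
        return index


class Consumer:
    """Toy subscriber that remembers how far it has read in each partition."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = [0] * len(topic.partitions)

    def poll(self):
        # Return every message published since the previous poll.
        messages = []
        for i, log in enumerate(self.topic.partitions):
            messages.extend(log[self.offsets[i]:])
            self.offsets[i] = len(log)
        return messages


readings = Topic("sensor-readings")
dashboard = Consumer(readings)
archiver = Consumer(readings)

readings.publish("thermostat-1", "21.5C")
readings.publish("thermostat-1", "21.7C")
readings.publish("door-7", "open")

print(dashboard.poll())  # all three messages
print(archiver.poll())   # the same three, read independently
print(dashboard.poll())  # nothing new since the last poll
```

The point of the sketch is that each subscriber tracks its own position in the topic, so a dashboard and an archiver can read the same device data at their own pace without interfering with each other.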
Big Data is no longer just hype. People are taking it seriously, and eventually it will reach a tipping point and become a corporate standard, just like relational databases. IoT and AI are still in their infancy, but they are rapidly gaining traction as the input and output points of a data-driven organization.
About the Author: Edmond Chan