So what's really driving the adoption of Hadoop?
The answer is quite simple: big data. This is data in unstructured and semi-structured form, arriving from many different sources in huge amounts and at an unprecedented pace.
The three V's are driving the business case for Hadoop: Volume, Variety and Velocity. Traditional tools and technologies cannot handle all three V's at the same time, so Apache Hadoop comes to the rescue. Hadoop is able to store exabytes of data from a variety of sources in its raw form. It's better to store data in Hadoop in its raw form and apply a schema only on read. This lets us ingest data at very high speed with minimal friction. The goal should be to ingest data as fast as possible in its raw form and worry about defining the schema later, when we are ready to read it.
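To make the schema-on-read idea concrete, here is a minimal sketch in plain Python. It is not Hadoop code: the in-memory list stands in for raw storage, and the sample events, field names and the `ingest` / `read_with_schema` helpers are all made up for illustration. The point is only that the write path stores lines untouched, while the schema is applied when the data is read.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no schema on write).
RAW_LINES = [
    '{"user": "alice", "amount": "19.99", "ts": "2015-03-01T10:00:00"}',
    '{"user": "bob", "amount": "5", "device": "mobile"}',
    'not valid json at all',
]


def ingest(line, store):
    """Write path: append the raw line untouched, so ingestion does no parsing."""
    store.append(line)


def read_with_schema(store, schema):
    """Read path: apply the schema only now, skipping records that do not fit."""
    for line in store:
        try:
            record = json.loads(line)
        except ValueError:
            continue  # malformed line: it stays in the raw store, this read skips it
        if not isinstance(record, dict):
            continue
        try:
            # Keep only the fields the reader asked for, cast to the requested types.
            yield {field: cast(record[field]) for field, cast in schema.items()}
        except (KeyError, ValueError, TypeError):
            continue  # record lacks a field or a value cannot be cast


store = []  # stand-in for raw storage (an assumption, not HDFS)
for line in RAW_LINES:
    ingest(line, store)

# Two readers can project two different schemas over the same raw data.
purchases = list(read_with_schema(store, {"user": str, "amount": float}))
devices = list(read_with_schema(store, {"user": str, "device": str}))
```

Nothing is rejected at ingest time, so even the malformed line is kept. Each reader decides later which fields and types it cares about.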
ERP Systems = Gigabytes of data
CRM = Terabytes of data
Web = Petabytes of data
Big Data = Exabytes
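To get a feel for the jump between these scales, the arithmetic can be sketched in a few lines of Python. The unit sizes are the standard decimal (SI) definitions. The 4 TB disk size is only an assumed figure for illustration, and the count ignores replication and overhead.

```python
# Decimal (SI) unit sizes in bytes.
UNITS = {
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
}


def ratio(larger, smaller):
    """How many of the smaller unit fit into one of the larger unit."""
    return UNITS[larger] // UNITS[smaller]


def disks_needed(total_bytes, disk_bytes):
    """Ceiling division: whole disks needed to hold total_bytes."""
    return -(-total_bytes // disk_bytes)


exabyte_in_gigabytes = ratio("exabyte", "gigabyte")
# Assumed 4 TB disks, no replication, no filesystem overhead.
disks_for_one_exabyte = disks_needed(UNITS["exabyte"], 4 * UNITS["terabyte"])

print(exabyte_in_gigabytes)   # 1000000000
print(disks_for_one_exabyte)  # 250000
```

One exabyte is a billion gigabytes, so a system sized for ERP-scale data is off by many orders of magnitude when it meets big data.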
So the bottom line is that the volume and variety of data are now so great that our traditional systems hit a wall: they can neither store nor process it.
Hence the 3 V's are the driving factors for the adoption of Hadoop.
High storage costs and the inability to analyze big data quickly are also leading factors in Hadoop adoption. As data grows in your traditional systems, existing queries keep slowing down, and reports and dashboards no longer render in a timely manner. Eventually the business is unable to make key decisions in time.
Volume:
Volume refers to the amount of data being generated, measured in terabytes, petabytes and zettabytes. Factors contributing to this increase in volume include data from social sites, network sensors, web logs, machine sensors, RFIDs, etc.
Issues:
- Storage is very expensive
- Fast analysis of data is difficult
Velocity:
Velocity refers to the pace at which data is being created. The reasons for such high velocity are social media, machine sensors, RFIDs, devices, IoT (Internet of Things), etc.
Issues:
- Not able to react fast enough to the incoming stream of data to make key decisions. Examples: credit card fraud, system failures, etc.
Variety:
Variety refers to the proliferation of data sources. Data is coming from everywhere now. A single person these days owns multiple devices and produces data through social media, videos, audio and logs.
Apache Hadoop makes big data easily available to analytics applications.