Hadoop is an open source platform for distributed storage and distributed processing of large datasets on computer clusters built from commodity hardware. It is meant to run on an entire cluster of machines, typically mounted in racks in a data center, and it leverages the combined power of those machines to handle big data. An important advantage of Hadoop’s distributed storage is that users can keep adding computers, and their hard drives, to the cluster. The other key aspect of Hadoop is distributed processing: not only can it store vast amounts of data across an entire cluster of computers, it can also distribute the processing of that data.
HISTORY OF HADOOP
Hadoop was not the first solution for processing and analyzing big data. Google pioneered the approach and published two papers in this area, in 2003 and 2004. The first described the Google File System (GFS), which is the foundation of the ideas behind how Hadoop does its distributed storage; the second described MapReduce. In short, GFS inspired Hadoop’s distributed data storage, and MapReduce inspired Hadoop’s distributed processing.
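To make the MapReduce idea concrete, here is a minimal single-process sketch of the classic word-count example in Python. It is only an illustration of the map, shuffle, and reduce steps, not Hadoop’s actual API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and each string in `chunks` merely stands in for a block of a file that a real cluster would store and process on a different machine.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(chunk: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one chunk of input."""
    for word in chunk.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, the step that sits between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks: List[str]) -> Dict[str, int]:
    # Assumption: each chunk represents data held on a separate machine.
    # Here the chunks are simply processed one after another in one process.
    mapped = (pair for chunk in chunks for pair in map_phase(chunk))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data on a cluster"]))
```

Because `map_phase` looks at one chunk at a time and `reduce_phase` looks at one key at a time, both steps could in principle run on many machines at once, which is the property a distributed framework exploits.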
Hadoop grew out of Nutch, an open source web search engine that was being built at the time, and much of its early development took place at Yahoo. Primarily Doug Cutting and Tom White started putting Hadoop together in 2006. The name comes from a toy elephant, apparently a yellow one, that belonged to Doug Cutting’s child.