Hadoop was a product of the trend of history: as theoretical exploration and engineering experiments continued to develop, Hadoop finally emerged in 2006 and took the world by storm.
The prototype of Hadoop dates back to 2002, with Apache Nutch. Nutch is an open-source search engine implemented in Java. It provides all the tools needed to run a search engine, including full-text search and a web crawler.
In 2003, Google published a technical paper on the Google File System (GFS). GFS is a proprietary distributed file system designed by Google to store the massive amounts of data behind its search engine.
In 2004, Nutch founder Doug Cutting, drawing on Google's GFS paper, implemented a distributed file storage system named NDFS (the Nutch Distributed File System).
Also in 2004, Google published another technical paper, on MapReduce. MapReduce is a programming model for parallel processing and analysis of large data sets (larger than 1 TB).
In 2005, Doug Cutting again drew on Google's work and implemented MapReduce inside the Nutch search engine.
In 2006, Yahoo hired Doug Cutting, who upgraded NDFS and MapReduce and renamed the combination Hadoop. Yahoo set up an independent team for Doug Cutting that specialized in developing Hadoop. It has to be said that both Google and Yahoo contributed to Hadoop's success.
In short, Hadoop is a software platform that makes it easier to develop and run software that processes large-scale data. Its core components are HDFS and MapReduce.
HDFS (Hadoop Distributed File System) is a highly fault-tolerant system designed to be deployed on inexpensive machines. HDFS provides high-throughput data access for applications with very large data sets. Summed up in one sentence: HDFS is well suited to storing large amounts of data (typically at the TB level). MapReduce is a programming model for extracting analysis results from massive source data: the data is first distributed across the disks of the cluster as files, and pulling out of that massive data exactly what we need is MapReduce's job. In one sentence: MapReduce makes it convenient to compute over large amounts of data.
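To make the model concrete, here is a minimal word-count sketch written against the standard org.apache.hadoop.mapreduce API. The class names (WordCount, TokenizerMapper, IntSumReducer) and the input/output paths are illustrative assumptions, not anything prescribed by the platform: each mapper reads one split of the input stored in HDFS and emits (word, 1) pairs, and the reducer sums the counts for each word.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Illustrative sketch of the MapReduce programming model; names are hypothetical.
public class WordCount {

  // Map phase: each mapper processes one split of the HDFS input
  // and emits a (word, 1) pair for every token it sees.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: all counts for the same word arrive grouped together
  // and are summed into the final (word, total) result.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each mapper
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a jar, such a job would typically be launched with something like hadoop jar wordcount.jar WordCount /input /output, where /input and /output are HDFS paths.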
As for the significance and value of big data, the most concise summary I can give is this: big data can get straight at the truth of an event. Data volumes have grown steadily in the Internet era; Baidu's search pages alone are said to reach the TB level. Before the Apache frameworks, data could be computed and stored, but far from at modern scale, so new technologies had to be developed specifically for big-data processing. This is the background against which technologies such as Spark and Hadoop emerged.
To a certain extent, Hadoop makes it possible to build a multi-dimensional analysis system. Its subject can be a consumer, a website, or an app, and by analyzing large amounts of data along many dimensions it can approach the truth about that subject. A concrete example of what this looks like: in the future, a consumer may not know what he likes, but big data can tell him what he should like based on his historical behavior.
What is Hadoop?