Here is a general introduction to Hadoop.
Most of this article draws on Hadoop's official website, in particular a PDF document introducing HDFS that gives a comprehensive overview of Hadoop. This series of Hadoop learning notes follows that material step by step, supplemented by many articles from the web, and summarizes the problems I encountered while learning Hadoop.
Anyway, let's start with the ins and outs of Hadoop. When it comes to Hadoop, you have to mention Lucene and Nutch. Lucene is not an application but a pure-Java, high-performance full-text indexing engine toolkit that can be easily embedded in practical applications to add full-text search and indexing capabilities. Nutch, on the other hand, is an application: a search engine built on top of Lucene. Lucene provides Nutch with its text search and indexing APIs, and Nutch adds not only search but also data-crawling functionality. Before version 0.8.0, Hadoop was part of Nutch; starting with Nutch 0.8.0, the NDFS and MapReduce implementations inside it were stripped out to create a new open source project, which became Hadoop. The fundamental architectural change in Nutch 0.8.0 compared with earlier versions is that it is built entirely on top of Hadoop. Hadoop implements Google's GFS and MapReduce designs, making it a distributed computing platform.
In fact, Hadoop is not just a distributed file system for storage; it is a framework designed to run distributed applications on large clusters of general-purpose computing hardware.
Hadoop contains two parts:
1. HDFS
Hadoop Distributed File System (Hadoop distributed filesystem)
HDFS is highly fault tolerant and can be deployed on low-cost hardware. It is well suited to applications with large data sets and provides high-throughput access for reading and writing data. HDFS has a master/slave architecture: in a typical deployment, a single NameNode runs on the master and one DataNode runs on each slave.
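As a rough sketch, such a deployment is wired together through Hadoop's XML configuration files. The property names below are the standard ones from Hadoop's configuration reference, but the host name `master`, the port, and the replication value are illustrative assumptions, not values from this article:

```xml
<!-- core-site.xml: tells every node and client where the NameNode runs.
     "master:9000" is a hypothetical host:port for this example. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: dfs.replication controls how many DataNodes
     store a copy of each block (3 is the common default). -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```

With this in place, each slave listed in the cluster's configuration starts a DataNode that registers with the NameNode at the configured address.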
HDFS supports a traditional hierarchical file organization, similar to existing file systems: you can create and delete files, move a file from one directory to another, rename files, and so on. The NameNode manages the entire distributed file system, and file system operations (such as creating or deleting files and folders) are controlled by the NameNode.
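The hierarchical operations described above can be tried from Hadoop's file system shell. This is a minimal sketch assuming a running HDFS cluster; the paths and file names are hypothetical:

```shell
# Create a directory in the distributed namespace
hadoop fs -mkdir /user/alice/input

# Copy a file from the local disk into HDFS
hadoop fs -put localfile.txt /user/alice/input/

# Move (rename) a file from one directory to another
hadoop fs -mv /user/alice/input/localfile.txt /user/alice/archive.txt

# List the directory, then delete the file
hadoop fs -ls /user/alice
hadoop fs -rm /user/alice/archive.txt
```

Each command is sent to the NameNode, which updates the namespace; the file's data blocks themselves live on the DataNodes.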
The following is the structure of HDFS: