A Brief Introduction to Hadoop: Learning Notes

Source: Internet
Author: User
Tags: file system

Here is a general introduction to Hadoop.

Most of this article comes from the official Hadoop website. One of the documents there is a PDF introducing HDFS, which also serves as a comprehensive introduction to Hadoop. This series of Hadoop learning notes works through that material step by step, draws on many articles from the Web, and summarizes the problems I ran into while learning Hadoop.

Let's start with where Hadoop came from. You cannot talk about Hadoop without mentioning Lucene and Nutch. Lucene is not an application; it is a high-performance full-text indexing engine toolkit written in pure Java, which can be embedded easily in all kinds of applications to provide full-text search and indexing. Nutch is an application: a search engine built on Lucene. Lucene supplies Nutch with its text search and indexing API, and Nutch adds data crawling on top of search. Before Nutch 0.8.0, Hadoop was part of Nutch. Starting with Nutch 0.8.0, the NDFS and MapReduce implementations inside Nutch were split out into a new open source project, Hadoop. The architecture of Nutch 0.8.0 therefore changed fundamentally from earlier versions: it is built entirely on Hadoop. Hadoop implements the designs of Google's GFS and MapReduce, which makes it a distributed computing platform.

In fact, Hadoop is more than a distributed file system for storage: it is a framework for running distributed applications on large clusters of general-purpose hardware.

Hadoop contains two parts:

1. HDFS

HDFS is the Hadoop Distributed File System.

HDFS is highly fault tolerant and can be deployed on low-cost hardware. It suits applications with large datasets and provides high throughput for reading and writing data. HDFS has a master/slave structure: in a typical deployment, a single NameNode runs on the master and one DataNode runs on each slave.
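To make the master/slave split concrete, here is a minimal toy sketch in Python. It is not Hadoop code: the class names, the round-robin placement, and the tiny block size are all assumptions made for illustration. Real HDFS also replicates blocks and lets clients exchange block data with DataNodes directly, both of which this sketch leaves out. It only shows the idea that the master keeps metadata while the slaves keep the data.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Toy slave: stores block contents keyed by block id."""
    name: str
    blocks: dict = field(default_factory=dict)


class NameNode:
    """Toy master: keeps only metadata (which blocks form a file, and where they live)."""

    def __init__(self, datanodes, block_size=4):
        self.datanodes = list(datanodes)
        self.block_size = block_size   # tiny on purpose; real HDFS blocks are far larger
        self.file_blocks = {}          # path -> list of (block_id, datanode name)
        self._next_block_id = 0

    def write(self, path, data):
        """Split data into blocks and place them round-robin on the DataNodes."""
        locations = []
        for offset in range(0, len(data), self.block_size):
            node = self.datanodes[self._next_block_id % len(self.datanodes)]
            node.blocks[self._next_block_id] = data[offset:offset + self.block_size]
            locations.append((self._next_block_id, node.name))
            self._next_block_id += 1
        self.file_blocks[path] = locations

    def read(self, path):
        """Look up block locations, then fetch each block from its DataNode."""
        by_name = {node.name: node for node in self.datanodes}
        return b"".join(by_name[name].blocks[bid] for bid, name in self.file_blocks[path])


slaves = [DataNode("slave1"), DataNode("slave2")]
master = NameNode(slaves)
master.write("/logs/a.txt", b"hello hadoop")
print(master.file_blocks["/logs/a.txt"])  # metadata held by the master
print(master.read("/logs/a.txt"))         # data reassembled from the slaves
```

Note that the master's `file_blocks` table never holds file contents, only block ids and locations, which is why a single NameNode can serve as the coordinator for many DataNodes.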

HDFS supports a traditional hierarchical file organization, similar to existing file systems: you can create and delete files, move a file from one directory to another, rename it, and so on. The NameNode manages the entire distributed file system, and file system operations (such as creating and deleting files and directories) are controlled by the NameNode.
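The namespace operations described above can be sketched with a small in-memory model. This is a hypothetical illustration, not the NameNode's actual data structure or the Hadoop API: the `Namespace` class and its method names are invented here, and it tracks only whether each path is a file or a directory.

```python
class Namespace:
    """Toy hierarchical namespace: every path maps to 'dir' or 'file'."""

    def __init__(self):
        self.entries = {"/": "dir"}

    @staticmethod
    def _parent(path):
        head = path.rstrip("/").rsplit("/", 1)[0]
        return head or "/"

    def _require_dir(self, path):
        if self.entries.get(path) != "dir":
            raise FileNotFoundError(f"no such directory: {path}")

    def _add(self, path, kind):
        # The parent directory must exist and the path must be unused.
        self._require_dir(self._parent(path))
        if path in self.entries:
            raise FileExistsError(path)
        self.entries[path] = kind

    def mkdir(self, path):
        self._add(path, "dir")

    def create(self, path):
        self._add(path, "file")

    def delete(self, path):
        if self.entries.get(path) != "file":
            raise FileNotFoundError(path)
        del self.entries[path]

    def rename(self, src, dst):
        """Move or rename a file; the destination directory must exist."""
        if self.entries.get(src) != "file":
            raise FileNotFoundError(src)
        self._add(dst, "file")
        del self.entries[src]

    def listdir(self, path):
        self._require_dir(path)
        return sorted(p for p in self.entries if p != "/" and self._parent(p) == path)


ns = Namespace()
ns.mkdir("/data")
ns.mkdir("/archive")
ns.create("/data/part-0")
ns.rename("/data/part-0", "/archive/part-0")    # move to another directory
ns.rename("/archive/part-0", "/archive/old-0")  # rename in place
print(ns.listdir("/archive"))
```

In this model a move and a rename are the same metadata update, which is one reason it is natural for a single coordinating node to own all namespace operations.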

The following is the structure of HDFS:
