Hadoop Learning Notes (1): Hadoop Architecture

Source: Internet
Author: User

HDFS and MapReduce are the core of Hadoop. The Hadoop architecture relies on HDFS for the underlying support for distributed storage, and on MapReduce for programming support for distributed parallel task processing.

 

I. HDFS Architecture

HDFS uses a master/slave architecture. An HDFS cluster consists of one NameNode and several DataNodes. The NameNode acts as the master server: it manages the file system namespace and client access to files. The DataNodes in the cluster manage the stored data. In a typical deployment, the NameNode runs on a dedicated machine and each of the other machines in the cluster runs one DataNode. It is also possible to run a DataNode on the same machine as the NameNode, or to run multiple DataNodes on one machine. Having only one NameNode per cluster greatly simplifies the system architecture.

 

From the end user's perspective, HDFS behaves like a traditional file system: files can be created, read, updated, and deleted (CRUD) through their directory paths.

 

The NameNode manages the file system metadata, and the DataNodes store the actual data. A client accesses the file system by interacting with both: it contacts the NameNode to obtain a file's metadata, while the actual file I/O takes place directly between the client and the DataNodes.

 

Figure: HDFS architecture

File write (client file upload):

1. The client sends a file write request to the NameNode.

2. Based on the file size and the configured block size, the NameNode returns to the client information about the DataNodes it manages.

3. The client splits the file into blocks and, using the DataNode address information, writes them to the DataNodes in sequence.
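
The three write steps can be sketched as a small in-memory simulation. This is only an illustration, not the Hadoop API: `ToyNameNode`, `write_file`, the round-robin placement, and the 4-byte block size are all assumptions made for the example, and replication is left out.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Holds metadata only: which DataNode stores each block of each file."""
    datanodes: list
    block_size: int
    block_map: dict = field(default_factory=dict)  # path -> [(block_id, datanode)]

    def allocate(self, path, file_size):
        # Step 2: from the file size and block size, work out how many blocks
        # are needed and pick a DataNode for each (round-robin, an assumption).
        n_blocks = max(1, -(-file_size // self.block_size))  # ceiling division
        plan = [(f"{path}#blk{i}", self.datanodes[i % len(self.datanodes)])
                for i in range(n_blocks)]
        self.block_map[path] = plan
        return plan


def write_file(namenode, storage, path, data):
    # Step 1: the client asks the NameNode where to write.
    plan = namenode.allocate(path, len(data))
    # Step 3: the client splits the data and writes each block in order,
    # talking to the DataNodes directly.
    for i, (block_id, datanode) in enumerate(plan):
        start = i * namenode.block_size
        storage[datanode][block_id] = data[start:start + namenode.block_size]
    return plan


# Hypothetical cluster: DataNode name -> {block_id: bytes}
storage = {"dn1": {}, "dn2": {}, "dn3": {}}
nn = ToyNameNode(datanodes=list(storage), block_size=4)
plan = write_file(nn, storage, "/demo.txt", b"hello hdfs!")
print(plan)
```

Note that the toy NameNode never touches the file bytes; it only records where each block lives, which mirrors the metadata/data split described earlier.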

File read:

1. The client sends a file read request to the NameNode.

2. The NameNode returns information about the DataNodes that store the file.

3. The client reads the file data from those DataNodes.
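
The read path can be sketched in the same spirit. The block map and the per-DataNode storage below are hypothetical in-memory stand-ins, not real HDFS structures; the point is only that the client gets locations from the metadata and then fetches the bytes from the DataNodes itself.

```python
def read_file(block_map, storage, path):
    # Steps 1-2: look up the file's block locations in the NameNode metadata.
    locations = block_map[path]
    # Step 3: fetch each block directly from its DataNode, in order, and join.
    return b"".join(storage[datanode][block_id]
                    for block_id, datanode in locations)


# Hypothetical cluster state: metadata on the NameNode, bytes on the DataNodes.
block_map = {"/demo.txt": [("blk0", "dn1"), ("blk1", "dn2"), ("blk2", "dn1")]}
storage = {
    "dn1": {"blk0": b"hell", "blk2": b"fs!"},
    "dn2": {"blk1": b"o hd"},
}
content = read_file(block_map, storage, "/demo.txt")
print(content)
```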

Client: splits files and uploads the blocks in sequence; interacts with the NameNode to obtain file location information; interacts with the DataNodes to read or write files; manages and accesses HDFS.

 

 

II. MapReduce Architecture

MapReduce is a parallel programming model that makes it easy for developers to write distributed parallel programs. Within the Hadoop architecture, MapReduce is a simple, easy-to-use software framework: it distributes tasks across a cluster of up to thousands of machines and processes large datasets in parallel in a reliable, fault-tolerant way, which is how Hadoop implements parallel task processing.
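
To make the programming model concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases using word counting. It is plain Python rather than the Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`) are chosen for this example only; in a real cluster the framework runs these phases across many machines.

```python
from collections import defaultdict


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all values by key, which the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


lines = ["Hadoop stores data in HDFS", "MapReduce processes data in Hadoop"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, both phases can be spread over many workers without coordination.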

 

The MapReduce framework consists of a single JobTracker running on the master node and one TaskTracker running on each slave node of the cluster. The master node schedules all the tasks that make up a job and distributes them across the slave nodes. It monitors their execution and re-executes any tasks that fail. A slave node is responsible only for running the tasks the master node assigns to it.
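
The master's schedule, monitor, and re-execute loop might look roughly like the sketch below. It is a toy model under stated assumptions: `run_job`, the round-robin assignment, the `max_attempts` limit, and the deliberately flaky worker are all invented for illustration and do not reflect the actual JobTracker implementation.

```python
from collections import deque


def run_job(tasks, trackers, run_task, max_attempts=3):
    """Toy master loop: assign tasks round-robin, re-queue any that fail."""
    pending = deque((task, 1) for task in tasks)
    results, turn = {}, 0
    while pending:
        task, attempt = pending.popleft()
        tracker = trackers[turn % len(trackers)]
        turn += 1
        try:
            results[task] = run_task(tracker, task)
        except RuntimeError:
            if attempt >= max_attempts:
                raise
            # The master notices the failure and schedules the task again.
            pending.append((task, attempt + 1))
    return results


attempt_log = []


def flaky_task(tracker, task):
    # Hypothetical worker: "tt2" fails the first time it is asked to run a task.
    attempt_log.append((tracker, task))
    if tracker == "tt2" and sum(1 for t, _ in attempt_log if t == "tt2") == 1:
        raise RuntimeError("task failed")
    return f"{task} done on {tracker}"


results = run_job(["map-0", "map-1", "map-2"], ["tt1", "tt2"], flaky_task)
print(results)
```

The workers here know nothing about the job as a whole; all scheduling and retry decisions sit in the master loop, matching the division of responsibilities described above.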

 

When a job is submitted, the JobTracker receives the job and its configuration information, distributes the configuration to the slave nodes, schedules the tasks, and monitors the TaskTrackers as they execute them.

 

 

 

These notes will be improved and expanded over time.

 

 

 

 

 

 
