Hadoop Basic Architecture


Hadoop consists of two parts: a distributed file system and the distributed computing framework MapReduce. The distributed file system is mainly used for distributed storage of large-scale data, while MapReduce is built on top of the distributed file system and performs distributed computation on the data stored in it. This article focuses on MapReduce, but since some of its functions are related to the underlying storage mechanism, the distributed file system is introduced first.

In Hadoop, the distributed file system underlying MapReduce is a standalone module. Users can implement their own distributed file system against an agreed set of interfaces, and after some simple configuration, the data stored on that file system can be processed by MapReduce. The distributed file system Hadoop uses by default is HDFS (Hadoop Distributed File System), which is tightly integrated with the MapReduce framework. This section first introduces the architecture of the distributed storage system HDFS and then introduces the MapReduce computing framework.
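Because the file system is pluggable, a MapReduce program never hard-codes a particular storage implementation; it asks Hadoop for whichever FileSystem the configuration points to. The following minimal Java sketch illustrates that abstraction. It assumes a Hadoop 1.x-style setup where the property fs.default.name selects the default file system, so treat it as an illustration rather than a complete program.

// Minimal sketch: obtaining the configured file system (HDFS by default).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsAbstractionSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // reads core-site.xml, etc.
        // fs.default.name (Hadoop 1.x) decides which implementation is returned,
        // e.g. hdfs://namenode:9000 for HDFS or file:/// for the local file system.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Working against: " + fs.getUri());
        System.out.println("Root exists: " + fs.exists(new Path("/")));
    }
}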

HDFS Architecture

HDFS is a highly fault-tolerant distributed file system that is suitable for deployment on inexpensive machines. It provides high-throughput data access and is well suited to applications on large-scale datasets. Overall, HDFS adopts a master/slave architecture and consists mainly of the following components: Client, NameNode, Secondary NameNode, and DataNode. The following sections describe each of these components.

(1) Client
The client (acting on behalf of the user) accesses the files in HDFS by interacting with the NameNode and the DataNodes. The client provides a POSIX-like file system interface for users to call.
(2) NameNode
There is only one NameNode in an entire Hadoop cluster. It is the manager of the whole system and is responsible for maintaining the HDFS directory tree and the related file metadata. This information is stored on the local disk in two files, fsimage (the HDFS metadata image file) and editlog (the HDFS change log), which are used to reconstruct the metadata when HDFS restarts. In addition, the NameNode monitors the health of each DataNode; once a DataNode is found to be down, the NameNode removes it from HDFS and re-replicates the data that was stored on it.
(3) Secondary NameNode
The most important task of the Secondary NameNode is not to keep a hot backup of the NameNode's metadata, but to periodically merge the fsimage and editlog files and push the merged result to the NameNode. Note that, to reduce the load on the NameNode, the NameNode itself does not merge fsimage and editlog and write the result to disk; this work is delegated to the Secondary NameNode.
(4) DataNode
In general, a DataNode is installed on each slave node. It is responsible for the actual data storage and periodically reports its data to the NameNode. A DataNode organizes file contents in fixed-size blocks, with a default block size of 64 MB. When a user uploads a large file to HDFS, the file is split into blocks that are stored on different DataNodes; to keep the data reliable, the same block is written to several (3 by default, configurable) different DataNodes. The process of splitting and storing the file is transparent to the user.
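As a hedged illustration of how a client writes a file and where replication and block size come in, the Java sketch below creates a file through the FileSystem API and sets the replication factor and block size explicitly for that file. The path and buffer size are arbitrary example values, not anything mandated by the text.

// Sketch: writing a file to HDFS with an explicit replication factor and block size.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/tmp/example.txt");   // example path
        short replication = 3;                      // 3 copies, the HDFS default
        long blockSize = 64L * 1024 * 1024;         // 64 MB, the Hadoop 1.x default
        int bufferSize = 4096;

        // The client only sees a stream; splitting the data into blocks and
        // placing the replicas on different DataNodes is handled by HDFS.
        FSDataOutputStream out =
                fs.create(file, true, bufferSize, replication, blockSize);
        out.writeUTF("hello hdfs");
        out.close();
    }
}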

MapReduce Architecture

Like HDFS, Hadoop MapReduce also adopts a master/slave (M/S) architecture. Specifically, it consists mainly of the following components: Client, JobTracker, TaskTracker, and Task. The following sections describe each of these components.


(1) Client
A user-written MapReduce program is submitted to the JobTracker through the client, and the user can check the status of the job through interfaces provided by the client. Inside Hadoop, a "job" is used to represent a MapReduce program: one MapReduce program can correspond to several jobs, and each job is decomposed into a number of Map/Reduce tasks.
(2) Jobtracker
The JobTracker is mainly responsible for resource monitoring and job scheduling. It monitors the health of all TaskTrackers and jobs and, once a failure is found, transfers the affected tasks to other nodes. The JobTracker also tracks task progress and resource usage and passes this information to the Task Scheduler, which picks suitable tasks to use resources as they become idle. In Hadoop, the Task Scheduler is a pluggable module, so users can design their own scheduler to suit their needs.
(3) Tasktracker
The TaskTracker periodically reports the resource usage of its node and the progress of its tasks to the JobTracker via heartbeats, and it receives and carries out commands sent by the JobTracker (such as starting new tasks or killing tasks). The TaskTracker divides the resources of its node (CPU, memory, and so on) into equal-sized units called "slots". A task can run only after it has been assigned a slot, and the job of the Hadoop scheduler is to assign the free slots on each TaskTracker to tasks. Slots come in two kinds, map slots and reduce slots, which are used by Map tasks and Reduce tasks respectively. The TaskTracker limits the concurrency of tasks through the number of slots (a configurable parameter).
(4) Task
Tasks are divided into Map tasks and Reduce tasks, both started by the TaskTracker. As we know from the previous section, HDFS stores data with fixed-size blocks as the basic unit, whereas for MapReduce the basic processing unit is the split. The relationship between splits and blocks is shown in Figure 2-6 of the book. A split is a logical concept that contains only metadata, such as the starting position of the data, its length, and the nodes where the data resides. How splits are created is entirely up to the user. Note, however, that the number of splits determines the number of Map tasks, because each split is handed to exactly one Map task.

The Map task execution process is shown in Figure 2-7 of the book. The Map task first parses its split, iterating over it to produce key/value pairs, and calls the user-defined map() function on each pair. The intermediate results are eventually stored on the local disk, where the temporary data is divided into several partitions; each partition will be processed by one Reduce task.

The Reduce task execution process is shown in Figure 2-8 of the book. It is divided into three stages: ① reading the Map task intermediate results from remote nodes (the "shuffle" stage); ② sorting the key/value pairs by key (the "sort" stage); ③ reading the <key, value list> pairs in order, calling the user-defined reduce() function on each, and storing the final results on HDFS (the "reduce" stage).
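To make the user-defined map() and reduce() functions mentioned above concrete, here is a hedged word-count sketch written against the Hadoop Java API (org.apache.hadoop.mapreduce). The class names and the choice of word counting as the example are illustrative and not taken from the book.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// map(): parse each input record into <word, 1> pairs.
public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);   // intermediate <key, value> pair
        }
    }
}

// reduce(): receives <word, list of counts> after the shuffle and sort stages.
class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));   // final result, written to HDFS
    }
}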

Life cycle of Hadoop MapReduce jobs

Suppose the user writes a MapReduce program, packages it into a file xxx.jar, and then submits the job with the following command:

$HADOOP_HOME/bin/hadoop jar xxx.jar \
  -D mapred.job.name="xxx" \
  -D mapred.map.tasks=3 \
  -D mapred.reduce.tasks=2 \
  -D input=/test/ \
  -D output=/test/output
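For the -D options above to take effect, the main class inside the jar typically parses them with Hadoop's generic option handling. The driver below is a hypothetical sketch of what such a main class might look like, reusing the mapper and reducer classes from the earlier sketch and using ToolRunner so that -D key=value pairs end up in the job Configuration; the class name and the way input/output are read back from the configuration are assumptions, not the book's code.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver: ToolRunner parses the generic -D options before run() is called.
public class WordCountDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();   // already contains the -D values
        Job job = new Job(conf, conf.get("mapred.job.name", "wordcount"));
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // "input" and "output" here are plain user-defined properties set via -D.
        FileInputFormat.addInputPath(job, new Path(conf.get("input")));
        FileOutputFormat.setOutputPath(job, new Path(conf.get("output")));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new WordCountDriver(), args));
    }
}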

The job then runs. The process is divided into the following five steps:
Step 1: Job submission and initialization. After the user submits the job, a JobClient instance first uploads the job-related information, such as the program jar, the job configuration file, and the split metadata file, to the distributed file system (usually HDFS); the split metadata file records the logical location information of each input split. The JobClient then notifies the JobTracker via RPC. When the JobTracker receives the new job submission request, its job scheduling module initializes the job: it creates a JobInProgress object to track the job's status, and the JobInProgress creates a TaskInProgress object for each task to track that task's running state; a TaskInProgress may need to manage multiple "task run attempts" (called "task attempts").
Step 2: Task scheduling and monitoring. As mentioned earlier, task scheduling and monitoring are handled by the JobTracker. TaskTrackers periodically report the resource usage of their nodes to the JobTracker via heartbeats; once idle resources appear, the JobTracker chooses suitable tasks to use them according to a certain policy, and this choice is the work of the Task Scheduler. The Task Scheduler is a pluggable standalone module with a two-tier design: it first selects a job and then selects a task from that job, paying particular attention to data locality when selecting the task. In addition, the JobTracker tracks the entire running process of the job and provides safeguards for its successful completion: first, when a TaskTracker or a task fails, the computation is transferred elsewhere; second, when the progress of a task falls far behind that of other tasks in the same job, an identical task is started for it (speculative execution) and the result of whichever instance finishes first is taken as the final result.
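As a hedged illustration of the two knobs mentioned in this step, the snippet below shows how these behaviors are typically configured in Hadoop 1.x: the scheduler class is a JobTracker-side setting (normally placed in mapred-site.xml rather than in job code), while speculative execution can be toggled per job. The property names follow the Hadoop 1.x naming convention; treat the values as illustrative.

import org.apache.hadoop.conf.Configuration;

public class SchedulingConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // JobTracker-side setting (usually set in mapred-site.xml, shown here only
        // for illustration): which pluggable Task Scheduler implementation to use.
        conf.set("mapred.jobtracker.taskScheduler",
                 "org.apache.hadoop.mapred.FairScheduler");

        // Per-job settings: enable or disable speculative execution, i.e. launching
        // a duplicate task when one instance lags far behind the others.
        conf.setBoolean("mapred.map.tasks.speculative.execution", true);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", true);
    }
}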
Step 3: Preparing the task execution environment. Preparing the runtime environment includes starting the JVM and isolating resources, both implemented by the TaskTracker. The TaskTracker starts a separate JVM for each task to prevent different tasks from interfering with each other, and it uses operating-system processes to isolate resources and prevent tasks from abusing them.
Step 4: Task execution. The TaskTracker starts the task once its environment is ready. While the task runs, it reports its latest progress to the TaskTracker via RPC, and the TaskTracker in turn reports it to the JobTracker.
Step 5: Job completion. After all the tasks have finished, the entire job has executed successfully.

PS: This article is based on the book Hadoop Technology Insider: In-depth Understanding of MapReduce Architecture Design and Implementation Principles. It is not original work; it is intended as personal notes for easy reference, so experienced readers, please go easy!

Resources

Hadoop Technology Insider: In-depth Understanding of MapReduce Architecture Design and Implementation Principles

