Big Data IMF L38: MapReduce Internals Explained, Lecture Notes and Summary

Source: Internet
Author: User
Tags hadoop mapreduce

Contents of this issue:

1. MapReduce architecture explained

2. How MapReduce runs on a cluster

3. Programming MapReduce in Java


From Hadoop 2.0 onward, MapReduce jobs run on YARN; Hadoop 1.0 had nothing to do with YARN.

We now cover YARN-based MapReduce (MR). This is the introductory stage; the zero-background basics are behind us.


Starting tomorrow: walkthroughs of a collection of about 20 MapReduce programs.


One: YARN-based MapReduce architecture

1. An MR program is built on two phases, Mapper and Reducer. The Mapper phase decomposes a computation into many small tasks that run in parallel; the Reducer phase performs the final aggregation.
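The two phases can be imitated in plain Java without any Hadoop classes. The following is only a minimal sketch under that assumption: the class `WordCountSketch` and its methods `map`, `reduce`, and `wordCount` are hypothetical names chosen for illustration, and a parallel stream stands in for map tasks running in parallel on a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java imitation of the two MapReduce phases; not the Hadoop API.
class WordCountSketch {

    // Map phase: one small task per input line, emitting (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum the values emitted for each key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Runs the map tasks in parallel, then reduces everything they emitted.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = lines.parallelStream()
                .flatMap(line -> map(line).stream())
                .collect(Collectors.toList());
        return reduce(emitted);
    }

    public static void main(String[] args) {
        // prints {hello=2, mapreduce=1, yarn=1}
        System.out.println(wordCount(List.of("hello yarn", "hello mapreduce")));
    }
}
```

In a real Hadoop job the framework, not the program, moves the emitted pairs from mappers to reducers (the shuffle); the sketch collapses that step into a single in-memory list.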


2. Starting with Hadoop 2.x, MapReduce runs on YARN.


YARN manages all the resources of the cluster (such as memory and CPU) through the ResourceManager (RM). A JVM process, the NodeManager, runs on each node; it accepts requests and packages these resources as containers.


3. When the ResourceManager receives a program submitted by a client, it picks a NodeManager according to the state of the cluster's resources and instructs it to start the program's first container on its node. That container hosts the program's ApplicationMaster, which is responsible for scheduling the program's tasks. The ApplicationMaster registers itself with the ResourceManager and, once registered, requests the specific container resources it needs from the ResourceManager.

4. How does the ApplicationMaster know how many containers a program needs?

When the application starts, the program's main method defines the data input and related configuration, from which the required number of containers can be determined;


(A container is a unit of computing resources. Based on the client's request, the cluster analyzes the computation job, and the result of that analysis includes the container resources required.)

From the main method, the application knows how many splits the input is divided into. Each split corresponds to one container, and further resources are then allocated for other needs such as shuffle.
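The arithmetic behind this can be sketched as follows. This is a simplified illustration, not Hadoop's actual split logic: the class `ContainerEstimateSketch` and its methods are hypothetical, the split size is treated as a fixed number of bytes, one container is assumed per split, and the extra resources for shuffle are modeled as one container per reduce task plus one for the ApplicationMaster itself.

```java
// Back-of-the-envelope container estimate; all names here are hypothetical.
class ContainerEstimateSketch {

    // Number of splits = ceiling(inputBytes / splitBytes).
    static long countSplits(long inputBytes, long splitBytes) {
        if (splitBytes <= 0) {
            throw new IllegalArgumentException("splitBytes must be positive");
        }
        if (inputBytes <= 0) {
            return 0;
        }
        return (inputBytes + splitBytes - 1) / splitBytes;
    }

    // Assumption: one container per split (map task), one per reduce task,
    // and one more for the ApplicationMaster.
    static long estimateContainers(long inputBytes, long splitBytes, int reduceTasks) {
        return countSplits(inputBytes, splitBytes) + reduceTasks + 1;
    }

    public static void main(String[] args) {
        long oneGiB = 1024L * 1024 * 1024;
        long split128MiB = 128L * 1024 * 1024;
        // 8 splits + 2 reduce tasks + 1 ApplicationMaster, prints 11
        System.out.println(estimateContainers(oneGiB, split128MiB, 2));
    }
}
```

The estimate is a total over the life of the job; the containers need not all be running at the same moment, since the cluster grants them as resources become available.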


5. Summary of MapReduce running on YARN

Master-slave structure

Master node, only one: ResourceManager

Control node, one per job: MRAppMaster

Slave nodes, many of them: YarnChild

The ResourceManager is responsible for:

Receiving computation jobs submitted by clients

Assigning each job to an MRAppMaster for execution

Monitoring the MRAppMaster's execution

The MRAppMaster is responsible for:

Scheduling the tasks of one job

Assigning those tasks to YarnChild processes for execution

Monitoring the YarnChild processes' execution

Each YarnChild is responsible for:

Executing the compute tasks assigned by the MRAppMaster
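The chain of delegation in this summary can be modeled with a toy program. The classes below only borrow the role names; they are hypothetical in-memory stand-ins written for illustration, not the Hadoop or YARN API, and they leave out monitoring, containers, and everything distributed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the three roles; every class here is a hypothetical stand-in.
class MasterSlaveSketch {

    // Slave role: executes one task assigned by the MRAppMaster.
    static class YarnChild {
        String run(String task) {
            return task + ":done";
        }
    }

    // Per-job control role: schedules the job's tasks onto YarnChild workers.
    static class MRAppMaster {
        List<String> runJob(List<String> tasks) {
            List<String> results = new ArrayList<>();
            for (String task : tasks) {
                results.add(new YarnChild().run(task));
            }
            return results;
        }
    }

    // Cluster-wide master role: accepts a job and hands it to a new MRAppMaster.
    static class ResourceManager {
        List<String> submit(List<String> tasks) {
            MRAppMaster appMaster = new MRAppMaster(); // one per job
            return appMaster.runJob(tasks);
        }
    }

    public static void main(String[] args) {
        ResourceManager rm = new ResourceManager();
        // prints [map-0:done, map-1:done, reduce-0:done]
        System.out.println(rm.submit(List.of("map-0", "map-1", "reduce-0")));
    }
}
```

The point of the model is the shape of the hierarchy: the single master never touches individual tasks, and each job gets its own control process that does.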


In a production environment, the ResourceManager should be set up for high availability (HA).


6. The MRAppMaster in Hadoop MapReduce is equivalent to the Driver in Spark; the YarnChild processes in Hadoop MapReduce correspond to CoarseGrainedExecutorBackend in Spark;


(Compared with Spark, Hadoop MapReduce incurs considerable resource overhead.)

This article is from the "in the Cloud" blog, be sure to keep this source http://ymzhang.blog.51cto.com/3126300/1741453
