Contents of this issue:
1. MapReduce architecture decryption
2. Research on MapReduce running on clusters
3. Working with MapReduce in Java
From Hadoop 2.0 onward, MapReduce has to run on YARN; Hadoop 1.0 did not involve YARN at all. MapReduce is now YARN-based, and the introductory, foundation-laying phase of this course has passed.
Starting tomorrow: a collection of around 20 MapReduce code examples will be explained.
One: YARN-based MapReduce architecture
1. An MR program is implemented in two phases, Mapper and Reducer: the Mapper decomposes the computation into many small tasks that are executed in parallel, and the Reducer performs the final aggregation.
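As a concrete illustration of the two phases, here is a minimal plain-Java word-count sketch (no Hadoop dependencies; the class and method names are hypothetical, chosen only for this example): map emits (word, 1) pairs, an in-memory grouping step stands in for the shuffle, and reduce sums the counts per key.

```java
import java.util.*;

// Plain-Java sketch of the two MapReduce phases (hypothetical names,
// no Hadoop dependencies): map decomposes the input into many small
// per-record tasks, reduce performs the final aggregation.
public class MapReduceSketch {

    // Map phase: turn one input line into intermediate (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Reduce phase: aggregate all intermediate values for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        String[] lines = {"hello world", "hello yarn"};

        // Shuffle stand-in: group intermediate pairs by key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
            }
        }

        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            System.out.println(e.getKey() + "\t" + reduce(e.getKey(), e.getValue()));
        }
    }
}
```

In real Hadoop code the same roles are played by subclasses of org.apache.hadoop.mapreduce.Mapper and Reducer, and the framework itself performs the shuffle between the two phases.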
2. From Hadoop 2.x onward, MapReduce runs on YARN.
YARN manages all of the cluster's resources (such as memory and CPU). The ResourceManager schedules resources across the cluster, while a NodeManager runs as a JVM process on each node, receiving requests and wrapping local resources into containers.
3. When the ResourceManager receives a program submitted by a client, it looks at the state of the cluster's resources and commands a NodeManager to start the program's first container on the node where that NodeManager resides. This container runs the program's ApplicationMaster, which is responsible for scheduling the execution of the program's tasks. The ApplicationMaster registers itself with the ResourceManager, and after registration it requests the specific container compute resources it needs from the ResourceManager.
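The startup sequence just described can be sketched as a tiny plain-Java simulation, just to make the ordering of the steps explicit (the class name and message strings here are hypothetical; the real RM, NM, and AM are separate JVM processes that communicate over RPC):

```java
import java.util.*;

// Hypothetical simulation of the YARN application-startup handshake
// described above; the real components are separate daemons using RPC.
public class YarnStartupSim {

    static List<String> submit(String app) {
        List<String> steps = new ArrayList<>();
        steps.add("client -> ResourceManager: submit " + app);
        steps.add("ResourceManager -> NodeManager: start container #1 (ApplicationMaster)");
        steps.add("ApplicationMaster -> ResourceManager: register");
        steps.add("ApplicationMaster -> ResourceManager: request task containers");
        steps.add("ResourceManager -> ApplicationMaster: allocate containers on NodeManagers");
        return steps;
    }

    public static void main(String[] args) {
        submit("wordcount job").forEach(System.out::println);
    }
}
```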
4. How many containers does the ApplicationMaster request for a program?
At startup the ApplicationMaster runs the program's main method, which specifies the data input and related configuration; from these it can determine how many containers are needed.
(A container is a unit of compute resources. The cluster analyzes the computation job described by the client's request, and the result of that analysis includes the container resources required.)
Running the main method tells the ApplicationMaster how many input splits the parser produces; each split corresponds to one container, and further resources are allocated on top of that for other work such as shuffle.
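Under the common assumption that the split size equals the HDFS block size (128 MB by default in Hadoop 2.x), the number of map containers can be estimated with a ceiling division; the class name and input size below are hypothetical:

```java
// Back-of-the-envelope estimate of map containers, assuming one
// container per input split and split size == HDFS block size.
public class ContainerEstimate {

    // ceil(totalInputBytes / splitBytes) using integer arithmetic
    static long numSplits(long totalInputBytes, long splitBytes) {
        return (totalInputBytes + splitBytes - 1) / splitBytes;
    }

    public static void main(String[] args) {
        long input = 1L * 1024 * 1024 * 1024;  // hypothetical 1 GB of input
        long split = 128L * 1024 * 1024;       // 128 MB default block size
        System.out.println("map containers: " + numSplits(input, split)); // prints 8
    }
}
```

As the notes say, the ApplicationMaster would request additional containers on top of this for the reduce tasks and shuffle-related work.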
5. Summary of MapReduce running on YARN
Master-slave structure:
Master node, only one: ResourceManager
Controller node, one per job: MRAppMaster
Slave nodes, there are many: YarnChild
ResourceManager is responsible for:
Receiving computation jobs submitted by clients
Assigning each job to an MRAppMaster for execution
Monitoring the execution of the MRAppMaster
MRAppMaster is responsible for:
Scheduling the tasks of the job it executes
Assigning those tasks to YarnChild processes for execution
Monitoring the execution of the YarnChild processes
YarnChild is responsible for:
Executing the compute tasks assigned by the MRAppMaster
In a production environment, the ResourceManager is deployed with HA (high availability).
6. The MRAppMaster in Hadoop MapReduce is equivalent to the Driver in Spark, and the YarnChild processes in MapReduce correspond to the CoarseGrainedExecutorBackend in Spark.
(Relative to Spark, Hadoop MapReduce wastes a considerable amount of resources, partly because each task runs in its own short-lived JVM while Spark reuses executor processes.)
This article is from the "In the Cloud" blog; please keep this source: http://ymzhang.blog.51cto.com/3126300/1741453
Big Data IMF - Lecture 38 - MapReduce insider decryption: lecture notes and summary