Morning Course: 6:00am
Hadoop MapReduce Internals Decrypted:
MR architecture decrypted
-
Hands-on MR operations in Java
Class notes:
One: YARN-based MapReduce architecture
1. A MapReduce program is implemented in two phases, Mapper and Reducer: the Mapper decomposes the computation into many small tasks that run in parallel, and the Reducer performs the final aggregation (a minimal word-count sketch follows below);
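To make the two phases concrete, here is a minimal sketch of a word-count Mapper and Reducer using the standard org.apache.hadoop.mapreduce API. The class names and the word-count logic are illustrative assumptions added to these notes, not part of the original lecture.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper phase: break each input line into words and emit (word, 1) pairs,
// so the work can be spread across many parallel map tasks.
class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}

// Reducer phase: aggregate the per-word counts emitted by all mappers.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```

The two classes are shown in one listing for brevity; in a real project each would live in its own source file.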
2. Starting with Hadoop 2.x, MapReduce runs on YARN (the 1.x releases did not involve YARN).
YARN manages all of the cluster's resources (such as memory and CPU). The ResourceManager is the single scheduling authority; on every node a JVM process called the NodeManager runs, receives requests, and wraps that node's resources into Containers. What happens when the ResourceManager receives a job request is described in point 3.
3. When the ResourceManager receives a program submitted by a client, it instructs a NodeManager, chosen according to the resource status of the cluster, to start the program's first Container on the node where that NodeManager resides. This Container hosts the program's ApplicationMaster, which is responsible for scheduling the program's tasks. The ApplicationMaster then registers itself with the ResourceManager and, once registered, applies to the ResourceManager for the specific Container compute resources it needs (a sketch of a driver performing such a client-side submission follows below).
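Continuing the illustration, here is a hedged sketch of the client-side driver that submits such a job to YARN. The Job API calls are standard Hadoop MapReduce; the memory settings, input/output paths, and the WordCountMapper/WordCountReducer classes (from the sketch above) are assumptions for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Optional: request a specific amount of memory per map/reduce Container.
        // The values here are assumed for illustration; real defaults depend on the cluster.
        conf.set("mapreduce.map.memory.mb", "1024");
        conf.set("mapreduce.reduce.memory.mb", "2048");

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // waitForCompletion submits the job to the ResourceManager, which launches the
        // first Container for the MRAppMaster; the MRAppMaster then requests the
        // remaining Containers (YarnChild tasks) and monitors them.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```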
4. How does the ApplicationMaster know how many Containers a program needs?
The application runs the program's main method at startup; that method carries the data input and related configuration, from which the number of Containers needed can be determined.
(A Container is a unit of compute resources; based on the client's request, the cluster works out the computation job, and the result of that calculation includes the Container resources required.)
When the main method runs, it learns how many input splits the parser produced; each split corresponds to one Container, and additional resources are then allocated for other stages such as shuffle (see the split-sizing sketch after this point).
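A small sketch of the split arithmetic described above, under the assumption that the split size is chosen as max(minSize, min(maxSize, blockSize)) and that each split maps to one map Container. setMinInputSplitSize/setMaxInputSplitSize are existing FileInputFormat methods; the block size, file size, and the simple ceiling arithmetic are hypothetical values for illustration, not the framework's exact internal code.

```java
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizing {
    // Assumed split-size rule: max(minSize, min(maxSize, blockSize)).
    static long splitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Each split becomes one map task, i.e. one Container requested by the MRAppMaster.
    static long numberOfSplits(long fileSizeBytes, long splitSizeBytes) {
        return (fileSizeBytes + splitSizeBytes - 1) / splitSizeBytes; // ceiling division
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        // These setters bound the split size the framework will choose; the bounds are assumptions.
        FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);   // 64 MB
        FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);  // 128 MB

        long block = 128L * 1024 * 1024;              // typical HDFS block size
        long split = splitSize(64L * 1024 * 1024, 128L * 1024 * 1024, block);
        long file  = 1L * 1024 * 1024 * 1024;         // hypothetical 1 GB input file
        System.out.println("map Containers needed: " + numberOfSplits(file, split)); // prints 8
    }
}
```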
5. Summary of MapReduce running on YARN
Master-slave structure:
- Master node (only one): ResourceManager
- Controller node (one per job): MRAppMaster
- Slave nodes (many): YarnChild
Responsibilities:
- ResourceManager: receives the computation jobs submitted by clients, hands each job to an MRAppMaster for execution, and monitors the MRAppMaster's execution status.
- MRAppMaster: schedules the tasks of one job, distributes the job to YarnChild processes for execution, and monitors the YarnChilds' execution status.
- YarnChild: executes the computation tasks assigned by the MRAppMaster.
6. The MRAppMaster in Hadoop MapReduce is the counterpart of the Driver in Spark, and the YarnChild processes in Hadoop MapReduce correspond to the CoarseGrainedExecutorBackend in Spark;
(Hadoop incurs considerably more resource overhead than Spark.)
This article is from the "In the Cloud" blog; please keep this source: http://ymzhang.blog.51cto.com/3126300/1741452
Big Data IMF L38 - MapReduce Internals Decrypted: lecture notes and summary