Hadoop's official documentation includes the classic guide "Hadoop MapReduce Next Generation – Writing YARN Applications", which explains how to write an application on Hadoop 2.0 YARN (a Chinese translation is also available). This article mainly describes the execution flow of a YARN program and offers a few thoughts on development.
Original address: http://www.rigongyizu.com/how-to-write-hadoop-0-23-yarn-mapreduce-and-other-applications/

YARN Program Execution Flow
YARN is a resource management system responsible for managing and allocating the resources of an entire cluster. To run a program on a YARN cluster, you first need a client that submits the job to the ResourceManager (RM) to request resources. The client communicates with the RM via the ClientRMProtocol protocol, passing along the information needed to run the application, such as local files/jars, the command to execute, its arguments, and environment variables. The first container the RM provides is used to run the ApplicationMaster (AppMaster): once the resource request is granted, the RM starts the AppMaster in that first container.

The AppMaster then communicates with the ResourceManager via the AMRMProtocol protocol; it registers itself and then continues to request resources. When containers are granted, the AppMaster talks to the NodeManager through the ContainerManager interface and launches containers for its tasks, supplying the information needed to start each container, such as the command line and environment variables. After the tasks complete, the AppMaster notifies the RM of completion via AMRMProtocol::finishApplicationMaster. Meanwhile, the client can obtain the job's status by querying the RM, or query the AppMaster directly if the AppMaster supports it. If necessary, the client can kill the application via ClientRMProtocol::forceKillApplication.
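As a concrete illustration, the client-side submission step can be sketched with the YarnClient helper library that Hadoop 2.x provides on top of ClientRMProtocol. This is a minimal sketch, not a complete client: the application name, AppMaster command, and memory figures are placeholder assumptions, and a real client would also ship local resources (jars/files) and environment variables.

```java
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SimpleYarnSubmitter {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    // Ask the RM for a new application id and submission context.
    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
    ctx.setApplicationName("demo-app");

    // Describe the first container: this is what launches the AppMaster.
    // "my.example.AppMaster" is a hypothetical class name.
    ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
        Collections.emptyMap(),                       // local resources
        Collections.emptyMap(),                       // environment
        Collections.singletonList(
            "java -Xmx256m my.example.AppMaster 1>stdout 2>stderr"),
        null, null, null);
    ctx.setAMContainerSpec(amContainer);
    ctx.setResource(Resource.newInstance(512, 1));    // 512 MB, 1 vcore

    ApplicationId appId = yarnClient.submitApplication(ctx);

    // The client can now poll the RM for the job's status...
    System.out.println("Submitted " + appId + ", state: "
        + yarnClient.getApplicationReport(appId).getYarnApplicationState());
    // ...or kill the application if necessary:
    // yarnClient.killApplication(appId);
  }
}
```

Under the hood, YarnClient performs the ClientRMProtocol calls described above, so the sketch mirrors the submit/query/kill flow one-to-one.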
The entire execution process is illustrated in the following diagram (source: found online):
Three roles:

Client: responsible for submitting the application to the RM.
AppMaster: the core of the entire application; it communicates with the RM, requests resources, starts containers, monitors their execution, and performs failover handling when a container fails.
Container: does the concrete work, running the processing logic specific to the application.

Three RPC protocols:

ClientRMProtocol (Client <–> ResourceManager): the protocol the client uses to talk to the RM; through it the client can launch, query, or kill the AppMaster.
AMRMProtocol (ApplicationMaster <–> ResourceManager): the protocol the AppMaster uses to talk to the RM; the AppMaster can register with and unregister from the RM, and request resources from the RM to start containers.
ContainerManager (ApplicationMaster <–> NodeManager): the protocol the AppMaster uses to talk to the NM; through it the AppMaster can start or stop a container and query a container's execution status.

Distributed Shell
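Hadoop ships the distributed shell example as a ready-made jar. A typical invocation looks like the following; the jar path and version wildcard are assumptions that depend on your installation layout:

```shell
# Run "date" in two containers across the cluster via the distributed shell.
hadoop jar $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-*.jar \
  org.apache.hadoop.yarn.applications.distributedshell.Client \
  -jar $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-*.jar \
  -shell_command date \
  -num_containers 2 \
  -container_memory 350 \
  -master_memory 350
```

The `-jar` argument tells the distributed shell's client which jar to ship as the AppMaster, and the output of each container ends up in that container's stdout log.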
Detailed steps for writing a YARN application can be found directly in the distributed shell example that ships with the source code. Distributed shell executes a shell command or script on each node, which is helpful for understanding the basic concepts.

Development of a YARN Programming Framework
As you can see, writing a YARN application means writing a lot of code in the client and the AppMaster. The AppMaster in particular has to handle resource requests and the starting and monitoring of containers, and especially container failover, which is the part that truly deserves attention. For many applications the AppMaster works in much the same way, so a common AppMaster framework can be abstracted out: users of the framework would only need to care about the application-specific logic running in the containers, which could greatly reduce development costs.
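For reference, the AppMaster-side cycle described above (register with the RM, request containers, launch tasks via the NM, unregister) can be sketched with the AMRMClient and NMClient libraries that later Hadoop 2.x releases provide as wrappers over AMRMProtocol and the NM's container-management protocol. This is a minimal sketch under assumed sizes and a placeholder task command; a real AppMaster would also track completed containers and re-request failed ones.

```java
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SimpleAppMaster {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();

    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(conf);
    rmClient.start();

    NMClient nmClient = NMClient.createNMClient();
    nmClient.init(conf);
    nmClient.start();

    // 1. Register with the RM (AMRMProtocol under the hood).
    rmClient.registerApplicationMaster("", 0, "");

    // 2. Ask the RM for one 256 MB / 1 vcore container.
    Resource capability = Resource.newInstance(256, 1);
    rmClient.addContainerRequest(
        new ContainerRequest(capability, null, null, Priority.newInstance(0)));

    int launched = 0;
    while (launched < 1) {
      // allocate() doubles as the AM -> RM heartbeat.
      for (Container c : rmClient.allocate(0.0f).getAllocatedContainers()) {
        // 3. Launch a task in the granted container via the NM.
        ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
            Collections.emptyMap(), Collections.emptyMap(),
            Collections.singletonList("sh -c date 1>stdout 2>stderr"),
            null, null, null);
        nmClient.startContainer(c, ctx);
        launched++;
      }
      Thread.sleep(1000);
    }

    // 4. Tell the RM the application finished
    //    (finishApplicationMaster under the hood).
    rmClient.unregisterApplicationMaster(
        FinalApplicationStatus.SUCCEEDED, "done", "");
  }
}
```

A reusable AppMaster framework would wrap exactly this loop and expose only the container launch context and the failover policy to the user.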
In fact, YARN already provides a client (MRClientService) and an AppMaster (MRAppMaster) that can be used directly. MapReduce itself is just one generic framework on YARN, so it is entirely possible to refer to MRAppMaster when implementing your own framework: a streaming computation framework like Storm's, a framework for scheduling RPC services, or a framework supporting MPI. Similar projects already exist on GitHub, and it seems likely that common frameworks of this kind will soon emerge.