YARN application development and design process

Source: Internet
Author: User

From the business point of view, an application is developed in two parts. The first is integration with the YARN platform: the application implements three protocols so that it can obtain cluster resources through YARN. The second is the implementation of the business functions, which has little to do with YARN itself. This article describes how to connect an application to the YARN platform.

Developing for YARN means developing a client and an ApplicationMaster.

YARN mainly consists of the ResourceManager (RM) and the NodeManager (NM). The ResourceManager is responsible for managing and allocating resources; the NodeManager is responsible for isolating the resources on each node. In YARN, resources are encapsulated in containers. When you develop an application on YARN, you need to implement the following three modules:

    • Application client: the client submits the application to YARN so that it runs there, monitors the application's running state, and controls its execution;

    • ApplicationMaster (AM): the AM is responsible for controlling the whole application, including registering the application with YARN, requesting resources, and starting containers. The actual work of the application is carried out inside the containers;

    • Application worker: the code that does the application's actual work. Not every application needs its own worker. The NodeManager starts the containers that the AM sends over; each container encapsulates the resources and the startup command that the worker needs in order to run.

Implementing the modules above involves the following three RPC protocols:

    • ApplicationClientProtocol: the protocol between the client and the ResourceManager (RM), used mainly to submit applications;

    • ApplicationMasterProtocol: the protocol between the ApplicationMaster (AM) and the RM, through which the AM registers with the RM and requests resources;

    • ContainerManagementProtocol: the protocol between the AM and the NodeManager (NM), through which the AM instructs the NM to start containers.
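The three protocols can be summarized as a small lookup table. The sketch below is only an illustration in Python: the `Protocol` record and the `protocols_used_by` helper are hypothetical names introduced here, not part of YARN, whose real interfaces are Java.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protocol:
    name: str      # protocol name as used in this article
    caller: str    # component that initiates the RPC
    callee: str    # component that serves the RPC
    purpose: str


PROTOCOLS = [
    Protocol("ApplicationClientProtocol", "Client", "ResourceManager",
             "submit applications"),
    Protocol("ApplicationMasterProtocol", "ApplicationMaster", "ResourceManager",
             "register and request resources"),
    Protocol("ContainerManagementProtocol", "ApplicationMaster", "NodeManager",
             "start containers"),
]


def protocols_used_by(component):
    """Return the names of the protocols in which `component` is the caller."""
    return [p.name for p in PROTOCOLS if p.caller == component]


# The AM is the only component that talks to both the RM and the NM.
print(protocols_used_by("ApplicationMaster"))
```

Read this way, the client only needs the first protocol, while the ApplicationMaster needs the other two.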

Part I: Developing the client (its main roles are to submit (deploy) the application and to monitor its execution)

Client design process (4 steps)

Step 1: The client obtains a unique application ID from the ResourceManager through the RPC function ApplicationClientProtocol#getNewApplication.

Step 2: The client submits the ApplicationMaster to the ResourceManager through the RPC function ApplicationClientProtocol#submitApplication. All of the information is encapsulated in the parameter of this call: how many resources the ApplicationMaster needs, which jar package it runs from, what its startup command is, and so on.

Step 3: The RM starts the AM based on the contents of the ApplicationSubmissionContext.
Step 4: The client obtains the application's running state from the AM or the RM and controls the application's execution.
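The four client steps can be traced with a toy model. The sketch below is not the Hadoop API: `ToyResourceManager`, `SubmissionContext`, the state strings, the jar path, and the startup command are all hypothetical stand-ins that only mirror the order of the calls described above.

```python
import itertools
from dataclasses import dataclass


@dataclass
class SubmissionContext:
    """Toy stand-in for the information the client packs into the submission."""
    app_id: int
    am_memory_mb: int
    am_vcores: int
    am_jar: str        # hypothetical HDFS path of the AM jar
    am_command: str    # hypothetical startup command of the AM


class ToyResourceManager:
    """In-memory model of the client-facing side of the RM (assumption, not YARN code)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._state = {}   # app_id -> state string

    def get_new_application(self):
        app_id = next(self._ids)
        self._state[app_id] = "NEW"
        return app_id

    def submit_application(self, ctx):
        if self._state.get(ctx.app_id) != "NEW":
            raise ValueError("unknown or already submitted application")
        # Step 3: a real RM would now launch the AM described by ctx;
        # this toy model only records the state change.
        self._state[ctx.app_id] = "RUNNING"

    def get_application_report(self, app_id):
        return self._state[app_id]

    def kill_application(self, app_id):
        self._state[app_id] = "KILLED"


rm = ToyResourceManager()
app_id = rm.get_new_application()                       # Step 1: get an ID
ctx = SubmissionContext(app_id, 1024, 1,
                        "hdfs:///apps/demo/am.jar",
                        "java -cp am.jar DemoAM")
rm.submit_application(ctx)                              # Step 2: submit the AM
state = rm.get_application_report(app_id)               # Step 4: monitor
rm.kill_application(app_id)                             # Step 4: control
```

The point of the sketch is the ordering: an ID must exist before anything can be submitted, and monitoring and control only make sense after submission.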
    

Part II: Writing the ApplicationMaster. The work falls into two parts, each with three steps.

First, AM and RM interaction (mainly to request resources)

1. The ApplicationMaster registers with the ResourceManager through the RPC function ApplicationMasterProtocol#registerApplicationMaster.
   (When registering, the ApplicationMaster tells the ResourceManager its own IP address and port.)
   (When registration completes, some information is returned, for example how many resources this ApplicationMaster can obtain and what its token is.)

2. The ApplicationMaster requests resources (in the form of containers) from the ResourceManager through the RPC function ApplicationMasterProtocol#allocate.
   (allocate is an RPC function. Once the ApplicationMaster has started, it already knows how many tasks it contains and how many resources each task needs; it aggregates these requirements and sends the resource request to the ResourceManager through allocate.)
   (The ApplicationMaster calls allocate periodically, for two reasons. First, the call acts as a heartbeat that tells the ResourceManager the ApplicationMaster is still alive. Second, every call returns some information from the ResourceManager, for example the resources that have just been allocated to it.)
   (The ApplicationMaster therefore has to keep polling to see whether new resources are available.)
   (If some tasks die, the ResourceManager also reports this through allocate.)
   (Each time resources are obtained, the ApplicationMaster communicates with the NodeManager to start the corresponding tasks.)

3. The ApplicationMaster tells the ResourceManager that the application has completed, and then exits, through the RPC function ApplicationMasterProtocol#finishApplicationMaster.
   (The ApplicationMaster keeps requesting resources and starting tasks until all of the tasks have finished running.)
   (At that point the ResourceManager removes the ApplicationMaster's information from memory.)
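The register, allocate, finish cycle can be illustrated with a small simulation. This is a sketch under stated assumptions, not the Hadoop API: `ToyResourceManager`, its `grants_per_heartbeat` limit, the returned `max_memory_mb` value, and the `am-host` address are hypothetical, chosen only to show why allocate has to be called repeatedly.

```python
from dataclasses import dataclass


@dataclass
class Container:
    container_id: int
    memory_mb: int


class ToyResourceManager:
    """In-memory model of the AM-facing side of the RM (assumption, not YARN code).

    It grants at most `grants_per_heartbeat` containers per allocate call."""

    def __init__(self, grants_per_heartbeat=2):
        self.grants_per_heartbeat = grants_per_heartbeat
        self.registered = {}   # (host, port) -> number of pending requests
        self.heartbeats = 0
        self._next_id = 1

    def register_application_master(self, host, port):
        self.registered[(host, port)] = 0
        return {"max_memory_mb": 8192}   # hypothetical per-container limit

    def allocate(self, am, new_requests, memory_mb):
        self.heartbeats += 1             # every call doubles as a heartbeat
        self.registered[am] += new_requests
        granted = min(self.registered[am], self.grants_per_heartbeat)
        self.registered[am] -= granted
        containers = [Container(self._next_id + i, memory_mb)
                      for i in range(granted)]
        self._next_id += granted
        return containers

    def finish_application_master(self, am):
        del self.registered[am]          # the RM forgets the AM


def run_application_master(rm, num_tasks, memory_per_task_mb):
    am = ("am-host", 8030)                                   # hypothetical address
    limits = rm.register_application_master(*am)             # step 1: register
    if memory_per_task_mb > limits["max_memory_mb"]:
        raise ValueError("task does not fit in one container")
    obtained = []
    to_request = num_tasks
    while len(obtained) < num_tasks:                         # step 2: poll
        obtained += rm.allocate(am, to_request, memory_per_task_mb)
        to_request = 0   # later calls only poll; a real AM would sleep in between
    rm.finish_application_master(am)                         # step 3: finish
    return obtained


rm = ToyResourceManager(grants_per_heartbeat=2)
containers = run_application_master(rm, num_tasks=5, memory_per_task_mb=1024)
print(len(containers), "containers after", rm.heartbeats, "allocate calls")
```

In this toy run the first allocate call carries the whole request, and the following calls carry nothing new: they serve only as heartbeats and as a way to collect containers that were granted later.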
Second, AM and NM interaction (mainly to start containers, query container state, and stop containers; steps 1 and 2 below run concurrently)

1. The ApplicationMaster performs a second level of allocation, assigning the resources it has obtained to its internal tasks, and communicates with the corresponding NodeManager through the RPC function ContainerManagementProtocol#startContainer to start the container. (The request also tells the NodeManager what the container holds: the task description, the resource description, and so on.)
   (For example, suppose the ApplicationMaster obtains 1 CPU and 1 GB of memory and has 10 tasks. Which task receives the resources is decided by a scheduling policy that you must implement yourself: for example, assign the resources to an arbitrary task, or prefer a task that is local to that node.)

2. The ApplicationMaster asks the NodeManager for the container's running state through the RPC function ContainerManagementProtocol#getContainerStatus. If a container fails (the failure is detected by the ApplicationMaster, not by the ResourceManager), the ApplicationMaster can try to request resources for the corresponding task again.

3. Once a container has finished running, the ApplicationMaster can release it through the RPC function ContainerManagementProtocol#stopContainer.
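The three AM and NM steps, including the second-level scheduling policy, can be sketched as follows. This is a toy model, not the Hadoop API: `ToyNodeManager`, `Task`, `assign`, `run_round`, the status strings, and the host names are hypothetical, and the status check is synchronous here, whereas a real ApplicationMaster polls while containers are running.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    preferred_host: str
    attempts: int = 0


@dataclass
class Container:
    container_id: int
    host: str


class ToyNodeManager:
    """In-memory model of one NodeManager (assumption, not YARN code)."""

    def __init__(self, host, failing_ids=()):
        self.host = host
        self.failing_ids = set(failing_ids)   # containers that report failure
        self.running = {}                     # container_id -> launch command

    def start_container(self, container, command):
        self.running[container.container_id] = command

    def get_container_status(self, container):
        failed = container.container_id in self.failing_ids
        return "FAILED" if failed else "COMPLETE"

    def stop_container(self, container):
        self.running.pop(container.container_id, None)


def assign(tasks, containers):
    """Second-level scheduling: prefer a container on the task's preferred
    host, otherwise take any free container."""
    free = list(containers)
    plan = []
    for task in tasks:
        if not free:
            break
        match = next((c for c in free if c.host == task.preferred_host), free[0])
        free.remove(match)
        plan.append((task, match))
    return plan


def run_round(tasks, containers, node_managers):
    """Start, check and stop one round of containers. Returns the tasks that
    still need resources: failed ones plus those that got no container."""
    plan = assign(tasks, containers)
    retry = []
    for task, container in plan:
        nm = node_managers[container.host]
        task.attempts += 1
        nm.start_container(container, f"run {task.name}")      # step 1
        if nm.get_container_status(container) == "FAILED":     # step 2
            retry.append(task)
        nm.stop_container(container)                           # step 3
    return retry + list(tasks[len(plan):])


nms = {"node1": ToyNodeManager("node1"),
       "node2": ToyNodeManager("node2", failing_ids={2})}
tasks = [Task("t1", "node2"), Task("t2", "node1")]
first = run_round(tasks, [Container(1, "node1"), Container(2, "node2")], nms)
second = run_round(first, [Container(3, "node1")], nms)
```

In this run, task t1 lands on its preferred host, fails there, and is retried on whatever container is granted next. Swapping the body of `assign` is where a different scheduling policy would go.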

Suppose the client and the ApplicationMaster have been developed. Running the application then takes three steps:
Step 1: Upload the dependent jar packages to HDFS.
Step 2: Submit the job to the ResourceManager.
Step 3: The ResourceManager receives the job and starts the ApplicationMaster you wrote. Execution begins in your main function, which communicates with the ResourceManager to request resources and, once resources are granted, communicates with the NodeManager to start the tasks.

In short, YARN is a resource management platform and does not involve business logic; the specific business logic must be implemented by the user. YARN's core role is to allocate resources and ensure resource isolation.
