XV: An Introductory Comparison of YARN and MRv1

Source: Internet
Author: User

The main problem with MRv1 is that, at runtime, the JobTracker is responsible for both resource management and task scheduling, which leads to poor scalability and low resource utilization. This problem stems from its original design, as the following figure shows:

[Figure: 1.png (http://s5.51cto.com/wyfs02/M02/78/39/wKioL1Z4OtTDDVXGAABQR2uPSWg265.png)]

As the figure shows, MRv1 is built around MapReduce and gives little consideration to the other data processing frameworks that appeared later. Under this design, every time a new data processing framework is developed (such as Spark), it has to implement its own cluster resource management alongside its data processing. YARN emerged naturally as a response.

YARN's biggest improvement over MRv1 is that it separates resource management from task scheduling, so that different data processing frameworks can share the same resource management, as shown:

[Figure: 2.png (http://s5.51cto.com/wyfs02/M00/78/39/wKioL1Z4O1fSajM1AABwPLJIlFQ541.png)]

As we can see, YARN is a unified resource management layer split out of the JobTracker in MRv1. The benefits are clear: resource sharing, scalability, and so on.

The main difference between MRv1 and YARN: in MRv1, the JobTracker is responsible for both resource management and job control, while in YARN the JobTracker's role is split into two parts, the ResourceManager (RM) and the ApplicationMaster (AM), as shown in the following figure:

[Figure: 3.png (http://s5.51cto.com/wyfs02/M01/78/3A/wKiom1Z4O4-QuZeXAAD5FsLmI1I196.png)]

From this we can clearly see that in MRv1 both resource management and task scheduling are done by the JobTracker. This makes the JobTracker's load too heavy and makes it hard to manage and scale. In YARN, resource management and task scheduling are split between two components: the RM and the AM.

The effect of YARN on programming compared with MRv1: MRv1 consists of three main parts: the programming model (API), the data processing engine (MapTask and ReduceTask), and the runtime environment (JobTracker and TaskTracker). YARN inherits the programming model and data processing engine of MRv1 and changes only the runtime environment, so it has no effect on programming.

To better illustrate YARN's resource management, first look at YARN's architecture, as shown:

[Figure: 4.png (http://s4.51cto.com/wyfs02/M02/78/39/wKioL1Z4POHyNdo7AALRPiQGBnc399.png)]

As you can see, after a client submits a job to the RM, the AM is responsible for requesting resources from the RM and asking the NodeManager (NM) to execute tasks. In other words, in this process the RM is responsible for resource scheduling and the AM is responsible for task scheduling. Important notes: the RM is responsible for resource management and scheduling across the whole cluster; the NM is responsible for resource management and scheduling on a single node; the NM communicates with the RM periodically through heartbeats, reporting the node's health status and memory usage; the AM obtains resources by interacting with the RM, and then starts the compute tasks by interacting with the NM.


The following explains the above in more detail using memory resource configuration as an example:

The RM's memory resources are configured mainly through the following two parameters (both are YARN platform properties and should be set in yarn-site.xml):

yarn.scheduler.minimum-allocation-mb

yarn.scheduler.maximum-allocation-mb

Description: these are the minimum and maximum memory that a single container can request. An application cannot request more than the maximum at run time, and a request below the minimum is given the minimum; in this respect the minimum is somewhat like a page in an operating system. The minimum has another use: calculating the maximum number of containers on a node. Note: once set, these two values cannot be changed dynamically (dynamically here means while an application is running).
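The rule described above can be sketched in a few lines of Python. This is only a simplified model of the behavior the text describes, not YARN's scheduler code; the function name `normalize_request` is hypothetical, and real schedulers may additionally round requests up to a multiple of an allocation increment, which this sketch ignores.

```python
def normalize_request(requested_mb: int, min_mb: int, max_mb: int) -> int:
    """Return the memory (in MB) a container is granted under the rule above.

    min_mb and max_mb stand for yarn.scheduler.minimum-allocation-mb and
    yarn.scheduler.maximum-allocation-mb.
    """
    if requested_mb > max_mb:
        # A request above the maximum cannot be satisfied.
        raise ValueError(
            f"request of {requested_mb} MB exceeds the maximum of {max_mb} MB"
        )
    # A request below the minimum is raised to the minimum.
    return max(requested_mb, min_mb)


if __name__ == "__main__":
    print(normalize_request(2000, 3000, 10000))  # 3000
    print(normalize_request(4096, 3000, 10000))  # 4096
```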


The NM's memory resources are configured mainly through the following two parameters (both are YARN platform properties and should be set in yarn-site.xml):

yarn.nodemanager.resource.memory-mb

yarn.nodemanager.vmem-pmem-ratio

Description: the first parameter is the maximum memory available to YARN on each node; the two RM values above should not exceed it. It can be used to calculate the maximum number of containers on a node: divide this value by the minimum container memory set in the RM. The second parameter is the virtual memory ratio, that is, how much virtual memory a task may use relative to its physical memory; the default is 2.1. Note: the first parameter cannot be modified dynamically once set; it stays fixed for the whole run. Its default is 8 GB, and YARN assumes 8 GB even if the machine has less physical memory than that.
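As a rough illustration of the two calculations just described, the sketch below computes the maximum number of containers on a node and the virtual memory limit of a container. The function names are hypothetical, and the input values are taken from the example configuration later in this article rather than from any recommended setup.

```python
def max_containers_per_node(node_memory_mb: int, min_allocation_mb: int) -> int:
    """Node memory divided by the minimum container size, rounded down."""
    return node_memory_mb // min_allocation_mb


def virtual_memory_limit_mb(container_mb: int, vmem_pmem_ratio: float = 2.1) -> float:
    """Virtual memory a container may use: physical size times the ratio."""
    return container_mb * vmem_pmem_ratio


if __name__ == "__main__":
    # yarn.nodemanager.resource.memory-mb = 100000, minimum allocation = 3000
    print(max_containers_per_node(100000, 3000))  # 33
    print(virtual_memory_limit_mb(3000))
```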


The AM's memory-related parameters are described here using MapReduce as an example (these two values are AM properties and should be set in mapred-site.xml):

mapreduce.map.memory.mb

mapreduce.reduce.memory.mb

Description: these two parameters specify the memory size of the two kinds of MapReduce task (map task and reduce task); their values should lie between the minimum and maximum container sizes set in the RM. If they are not configured, they are obtained by the following simple formula:

max(MIN_CONTAINER_SIZE, (Total Available RAM) / containers)

In general, the reduce value should be twice the map value. Note: these two values can be changed through parameters when the application is started.
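The formula and the "reduce is twice map" rule of thumb can be written out as a small Python sketch. The function name and the sample numbers (48 GB of available RAM, 24 containers, a 1024 MB minimum) are assumptions chosen only to make the arithmetic visible.

```python
def default_task_memory_mb(min_container_mb: int,
                           total_available_ram_mb: int,
                           containers: int) -> tuple[int, int]:
    """Apply max(MIN_CONTAINER_SIZE, total RAM / containers) for the map task,
    then take twice that value for the reduce task."""
    map_mb = max(min_container_mb, total_available_ram_mb // containers)
    reduce_mb = 2 * map_mb
    return map_mb, reduce_mb


if __name__ == "__main__":
    map_mb, reduce_mb = default_task_memory_mb(1024, 49152, 24)
    print(map_mb, reduce_mb)  # 2048 4096
```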


Other memory-related parameters in the AM, along with JVM-related parameters, can be set through the following options:

mapreduce.map.java.opts

mapreduce.reduce.java.opts

Note: these two parameters are intended for tasks that run on the JVM (programs written in Java, Scala, and so on). They pass options to the JVM; the memory-related ones are -Xmx, -Xms, and similar. The heap size set here should stay below the corresponding mapreduce.map.memory.mb or mapreduce.reduce.memory.mb value, since the JVM runs inside that container.
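One way to keep the heap below the container size is to derive -Xmx from the container memory. The sketch below assumes a heap fraction of 0.8, which is a common rule of thumb and not something stated in this article; the function name `heap_opts` is also hypothetical.

```python
def heap_opts(container_mb: int, heap_fraction: float = 0.8) -> str:
    """Build a -Xmx option that leaves headroom inside the container.

    heap_fraction is an assumed value: the remaining memory is left for
    JVM overhead outside the heap (stacks, metaspace, native buffers).
    """
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    heap_mb = int(container_mb * heap_fraction)
    return f"-Xmx{heap_mb}m"


if __name__ == "__main__":
    # Candidate values for mapreduce.map.java.opts and mapreduce.reduce.java.opts
    print(heap_opts(2048))  # -Xmx1638m
    print(heap_opts(4096))  # -Xmx3276m
```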


-------------------------------------------------------------------------------------------

The following uses a concrete error to illustrate the memory settings. The error is:

Container [pid=41884,containerID=container_1405950053048_0016_01_000284] is running beyond virtual memory limits. Current usage: 314.6 MB of 2.9 GB physical memory used; 8.7 GB of 6.2 GB virtual memory used. Killing container.

The configuration is as follows:

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>100000</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>10000</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>3000</value>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>2000</value>
    </property>

From the configuration we can see that the container's minimum and maximum memory are 3000 MB and 10000 MB. The reduce task is set to 2000 MB and the map task is left at its default; both fall below the minimum, so both are given 3000 MB, which is the "2.9 GB physical memory" in the log. Because the default virtual memory ratio (2.1) is in effect, the virtual memory limit for each map or reduce task is 3000 MB * 2.1 = 6300 MB, about 6.2 GB. The application's virtual memory usage (8.7 GB) exceeded this limit, hence the error. Workaround: adjust the virtual memory ratio (yarn.nodemanager.vmem-pmem-ratio) when starting YARN, or adjust the memory size when running the application.
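The numbers in the log line can be reproduced with a short calculation. This is a sketch of the arithmetic only, using the values from the configuration and log above; the function name `check_container` and the returned field names are hypothetical.

```python
MB_PER_GB = 1024


def check_container(requested_mb: int, min_mb: int, max_mb: int,
                    vmem_pmem_ratio: float, used_virtual_gb: float) -> dict:
    """Recompute the limits reported in the log for one container."""
    # A request below the minimum is raised to the minimum (2000 -> 3000).
    physical_mb = min(max(requested_mb, min_mb), max_mb)
    virtual_mb = physical_mb * vmem_pmem_ratio
    used_virtual_mb = used_virtual_gb * MB_PER_GB
    return {
        "physical_limit_gb": round(physical_mb / MB_PER_GB, 1),
        "virtual_limit_gb": round(virtual_mb / MB_PER_GB, 1),
        "killed": used_virtual_mb > virtual_mb,
        # Smallest ratio that would have kept this container under the limit.
        "ratio_needed": used_virtual_mb / physical_mb,
    }


if __name__ == "__main__":
    result = check_container(2000, 3000, 10000, 2.1, 8.7)
    print(result["physical_limit_gb"])  # 2.9
    print(result["virtual_limit_gb"])   # 6.2
    print(result["killed"])             # True
    print(f"{result['ratio_needed']:.2f}")  # 2.97
```

With these inputs, the ratio would have to be raised to roughly 3, or the container memory increased, for the task to stay under the virtual memory limit.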

-------------------------------------------------------------------------------------------


In YARN's resource management framework, whether the AM is requesting resources from the RM or the NM is managing the resources of its own node, everything is done through containers. The container is YARN's resource abstraction, where resources include memory and CPU. The following introduces containers in more detail. To give a more concrete picture of a container, first look at the figure:

[Figure: 5.png (http://s1.51cto.com/wyfs02/M01/78/39/wKioL1Z4QBCBRz_RAAFCKU10og8024.png)]

From the figure we can see that the AM first requests resources from the RM by packaging the request as a ResourceRequest. Once the resources are granted, the AM wraps them in a ContainerLaunchContext object, and through this object the AM communicates with the NM to start the task.
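The request-and-launch flow can be pictured with a toy model in Python. The classes below only borrow the names ResourceRequest and ContainerLaunchContext to mirror the description; they are not the Hadoop API, and their fields, the `ToyResourceManager` class, and the first-fit allocation are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceRequest:
    """What the AM asks the RM for (toy version)."""
    memory_mb: int
    vcores: int
    num_containers: int


@dataclass
class Container:
    """A granted slice of one node's resources."""
    container_id: int
    node: str
    memory_mb: int
    vcores: int


@dataclass
class ContainerLaunchContext:
    """What the AM hands to the NM to start a task (toy version)."""
    container: Container
    command: str
    environment: dict = field(default_factory=dict)


class ToyResourceManager:
    """Tracks free memory per node and grants containers first-fit."""

    def __init__(self, free_mb_by_node: dict):
        self.free_mb = dict(free_mb_by_node)
        self._next_id = 1

    def allocate(self, request: ResourceRequest) -> list:
        granted = []
        for node in self.free_mb:
            while (len(granted) < request.num_containers
                   and self.free_mb[node] >= request.memory_mb):
                self.free_mb[node] -= request.memory_mb
                granted.append(Container(self._next_id, node,
                                         request.memory_mb, request.vcores))
                self._next_id += 1
        return granted


def launch_tasks(rm: ToyResourceManager, request: ResourceRequest,
                 command: str) -> list:
    """AM side: obtain containers from the RM, then wrap each for the NM."""
    return [ContainerLaunchContext(c, command) for c in rm.allocate(request)]


if __name__ == "__main__":
    rm = ToyResourceManager({"node1": 8192, "node2": 4096})
    contexts = launch_tasks(rm, ResourceRequest(3000, 1, 3), "run-map-task")
    print([ctx.container.node for ctx in contexts])  # ['node1', 'node1', 'node2']
```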


This article is from the blog "in order to finger that direction"; reprinting is declined.
