Spark Yarn-cluster and Yarn-client

Source: Internet
Author: User

Summary

Spark can run on YARN in two modes, yarn-client and yarn-cluster. yarn-cluster is typically used in production environments, while yarn-client is used for interactive work and debugging. The differences between the two are described below.
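As a rough illustration of how the two modes are selected at submission time, the sketch below builds the `spark-submit` argument list for each. It assumes a Spark version that uses the `--master yarn --deploy-mode <mode>` form (older releases also accepted `--master yarn-cluster` and `--master yarn-client`). The helper name `spark_submit_args` and the file `my_job.py` are hypothetical.

```python
from typing import List


def spark_submit_args(deploy_mode: str, app: str) -> List[str]:
    """Build a spark-submit argument list for running an application on YARN."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unsupported deploy mode: {deploy_mode}")
    return ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode, app]


# my_job.py is a hypothetical application file used only for illustration.
print(" ".join(spark_submit_args("cluster", "my_job.py")))  # production-style run
print(" ".join(spark_submit_args("client", "my_job.py")))   # interactive or debug run
```

The only difference between the two commands is the value passed to `--deploy-mode`.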

Spark pluggable resource management

Spark supports three cluster deployment modes: YARN, Mesos, and standalone. They share a common structure. A master service (the YARN ResourceManager, the Mesos master, or the Spark standalone master) decides which applications can run, and when and where they run. A slave service (for example, the YARN NodeManager) runs on each node. The nodes actually run the executor processes, and the slave service monitors their running state and resource consumption.
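From the application's point of view, the cluster manager is chosen through the master URL. The sketch below maps the URL forms Spark accepts for these three managers to a manager name. The host names are placeholders, and `cluster_manager_for` is a hypothetical helper, not part of Spark. For YARN the URL is just `yarn`, because the ResourceManager address is read from the Hadoop configuration rather than from the URL.

```python
def cluster_manager_for(master_url: str) -> str:
    """Name the cluster manager implied by a Spark master URL."""
    if master_url == "yarn":
        return "YARN"
    if master_url.startswith("mesos://"):
        return "Mesos"
    if master_url.startswith("spark://"):
        return "standalone"
    raise ValueError(f"not one of the three cluster managers discussed here: {master_url}")


# Host names below are placeholders; 5050 and 7077 are the usual default ports.
for url in (
    "yarn",
    "mesos://mesos-master.example.com:5050",
    "spark://spark-master.example.com:7077",
):
    print(url, "->", cluster_manager_for(url))
```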

Advantages of Spark on YARN

1. Resources are shared dynamically: all frameworks running on YARN share one centrally configured pool of resources.

2. YARN's scheduling features can be used to classify, isolate, and prioritize workloads, which allows more flexible scheduling policies.

3. With YARN, the number of executors can be chosen freely.

4. YARN is the only cluster manager that supports Spark security. With YARN, Spark can run on Kerberized Hadoop and use secure authentication between its processes.
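Some of these points correspond to settings that can be passed at submission time. The sketch below turns a small settings dictionary into `--conf key=value` arguments for `spark-submit`. The keys `spark.yarn.queue`, `spark.executor.instances`, and `spark.executor.cores` are standard Spark properties; the queue name `analytics`, the numeric values, and the helper `conf_args` are assumptions made for illustration.

```python
from typing import Dict, List


def conf_args(conf: Dict[str, str]) -> List[str]:
    """Render Spark properties as spark-submit --conf arguments, sorted by key."""
    args: List[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


yarn_conf = {
    "spark.yarn.queue": "analytics",   # hypothetical YARN queue (isolation and priority)
    "spark.executor.instances": "4",   # number of executors, chosen freely
    "spark.executor.cores": "2",       # cores per executor
}

print(" ".join(["spark-submit", "--master", "yarn"] + conf_args(yarn_conf)))
```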

yarn-cluster vs. yarn-client

In Spark on YARN mode, each Spark executor runs as a YARN container, and multiple tasks can run in the same container, which greatly reduces task startup time.
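To make "multiple tasks in the same container" concrete, here is a small worked calculation. It assumes the usual relationship in which each executor offers `spark.executor.cores / spark.task.cpus` task slots; the function name `task_slots` and the sample numbers are illustrative only.

```python
def task_slots(num_executors: int, executor_cores: int, task_cpus: int = 1) -> int:
    """Upper bound on tasks that can run at the same time across all executors."""
    if min(num_executors, executor_cores, task_cpus) < 1:
        raise ValueError("all arguments must be positive")
    # Each executor (one YARN container) can run this many tasks concurrently.
    slots_per_executor = executor_cores // task_cpus
    return num_executors * slots_per_executor


# 4 executors with 2 cores each, 1 CPU per task: 8 tasks can run at once.
print(task_slots(num_executors=4, executor_cores=2))
```

Because these tasks reuse executor processes that are already running, no new container has to be started for each task.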

Application Master

To better understand the difference between the two modes, first consider YARN's application master concept. In YARN, every application has an application master process, which runs in the first container started for the application. It is responsible for requesting resources from the ResourceManager and, once they are allocated, for telling NodeManagers to start containers on the application's behalf. Because of the application master, no active client is needed to keep the application going: the client that launched the application can exit at any time, while the processes managed by YARN continue to run in the cluster.

Yarn-cluster

In yarn-cluster mode, the driver runs in the application master. The application master process, which runs in a YARN container, is therefore responsible both for driving the application and for requesting resources from YARN. The client that launched the application can exit right away and does not need to stay alive for the application's lifetime. The following figure shows yarn-cluster mode.

Job execution flow in Yarn-cluster mode:

1. The client generates the job information and submits it to the ResourceManager (RM).

2. The RM starts a container on one NodeManager (NM), chosen by YARN, and assigns the application master (AM) to that NM.

3. The NM receives the allocation from the RM, starts the application master, and initializes the job. The driver now runs on this NM.

4. The application master requests resources from the RM. Once they are allocated, it notifies other NodeManagers to start the corresponding executors.

5. The executors register with the application master on that NM, report to it, and complete their tasks.
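The sequence above can be traced with a toy model. This is not how YARN is implemented: it only produces an ordered log of the steps. The function name, the node names, and the placement rules (first node for the application master, round-robin for executors) are all assumptions for illustration, since real placement is decided by YARN.

```python
from typing import List


def yarn_cluster_flow(node_managers: List[str], num_executors: int) -> List[str]:
    """Return an ordered, human-readable log of the yarn-cluster steps (toy model)."""
    if not node_managers:
        raise ValueError("at least one NodeManager is required")
    am_node = node_managers[0]  # toy choice; real YARN picks the node itself
    log = [
        "client -> RM: submit application",
        f"RM -> {am_node}: allocate container for application master",
        f"{am_node}: start application master (driver runs inside it)",
        f"AM -> RM: request {num_executors} executor containers",
    ]
    for i in range(num_executors):
        node = node_managers[i % len(node_managers)]  # round-robin, toy placement only
        log.append(f"{node}: start executor {i}, which registers with the driver")
    return log


for line in yarn_cluster_flow(["nm1", "nm2"], num_executors=3):
    print(line)
```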

Yarn-client

In yarn-client mode, the application master only requests executors from YARN. The client then communicates with the containers to schedule the job, so the client must keep running until the application finishes. The following figure shows yarn-client mode.

Job execution flow in yarn-client mode:

1. The client generates the job information and submits it to the ResourceManager (RM).

2. The RM starts a container on a NodeManager (NM) and assigns the application master (AM) to that NM.

3. The NM receives the allocation from the RM and starts the application master. In this mode the driver is not on this NM: it runs in the local client process, which initializes the job.

4. The application master requests resources from the RM. Once they are allocated, it notifies other NodeManagers to start the corresponding executors.

5. The executors register with the driver in the local client process, report to it, and complete their tasks.
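The practical difference between the two flows comes down to where the driver lives and whether the client may exit. The sketch below records that comparison as data. The `Placement` class and the `placement` function are hypothetical names introduced only for this summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    driver_location: str   # where the driver process runs
    am_role: str           # what the application master is responsible for
    client_can_exit: bool  # whether the submitting client may exit early


def placement(deploy_mode: str) -> Placement:
    """Summarize driver placement for the two Spark on YARN deploy modes."""
    if deploy_mode == "cluster":
        return Placement(
            driver_location="application master container",
            am_role="runs the driver and requests executors",
            client_can_exit=True,
        )
    if deploy_mode == "client":
        return Placement(
            driver_location="client process",
            am_role="requests executors only",
            client_can_exit=False,
        )
    raise ValueError(f"unsupported deploy mode: {deploy_mode}")


for mode in ("cluster", "client"):
    print(mode, placement(mode))
```

This is also why yarn-client suits interactive sessions and debugging: the driver's output appears directly in the client process.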

The following table compares Spark standalone mode with Spark on YARN.
