Summary
Spark can run on YARN in two modes: yarn-client and yarn-cluster. yarn-cluster is usually used in production environments, while yarn-client is used for interactive and debugging sessions. Their differences are described below.
Spark's pluggable resource management
Spark supports three cluster deployment modes: YARN, Mesos, and Standalone. They share a common structure: a master service (the YARN ResourceManager, the Mesos master, or the Spark Standalone master) decides which applications may run and when, while a slave service (such as the YARN NodeManager) runs on each node, launches the actual executor processes, and monitors their running state and resource consumption.
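As a rough illustration, the same application code can target any of these cluster managers just by changing the master URL given to SparkConf (the hostnames and ports below are placeholders, and yarn-client is the Spark 1.x master string):

import org.apache.spark.{SparkConf, SparkContext}

// Selecting a cluster manager via the master URL (hosts are placeholders).
val conf = new SparkConf()
  .setAppName("ClusterManagerDemo")
  .setMaster("yarn-client")             // Spark on YARN (Spark 1.x master string)
  // .setMaster("spark://master:7077")  // Spark Standalone master
  // .setMaster("mesos://master:5050")  // Mesos master
val sc = new SparkContext(conf)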
Advantages of Spark on YARN
1. Spark supports dynamic resource sharing: all frameworks running on YARN draw from the same centrally configured pool of resources
2. YARN's resource-scheduling features can be used to categorize, isolate, and prioritize workloads, allowing more flexible scheduling strategies
3. The number of executors can be chosen freely when running on YARN (see the sketch after this list)
4. YARN is the only cluster manager that supports Spark security: with YARN, Spark can run on a Kerberized Hadoop cluster with secure authentication between its processes
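For point 3, here is a minimal sketch of fixing the executor count and size on YARN; the property names are standard Spark settings, while the values are arbitrary examples:

import org.apache.spark.SparkConf

// Requesting a specific number and size of executors (YARN containers).
val conf = new SparkConf()
  .setAppName("ExecutorSizingDemo")
  .set("spark.executor.instances", "4")  // number of executors to request
  .set("spark.executor.memory", "2g")    // heap size per executor
  .set("spark.executor.cores", "2")      // CPU cores per executor

The same values can be passed on the spark-submit command line with the --num-executors, --executor-memory, and --executor-cores flags.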
Yarn-cluster vs. yarn-client
In Spark on YARN mode, each Spark executor runs as a YARN container, and a single container can run many tasks over its lifetime, which greatly reduces per-task startup time
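As a rough sketch of why this matters (the values are examples), an executor exposes a fixed number of task slots and keeps running new tasks in the same JVM, so only the container itself pays a startup cost:

import org.apache.spark.SparkConf

// One executor (one YARN container) with 4 cores runs up to 4 tasks at a
// time and reuses the same JVM for every task it is assigned.
val conf = new SparkConf()
  .set("spark.executor.cores", "4")  // concurrent task slots per container
  .set("spark.task.cpus", "1")       // cores each task occupies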
Application Master
To understand the difference between the two modes, it helps to first look at YARN's Application Master concept. In YARN, every application has an Application Master process, which runs in the first container started for that application. The Application Master is responsible for requesting resources from the ResourceManager and, once they are allocated, for telling the NodeManagers to start containers. Having the Application Master manage the application removes the need for an active client: the client that launched the application can exit at any time, and the application, managed by YARN, continues to run in the cluster
Yarn-cluster
In yarn-cluster mode, the driver runs inside the Application Master. The Application Master process is responsible both for driving the application and for requesting resources from YARN, and it runs inside a YARN container, so the client that launches the application can exit immediately instead of staying alive for the application's whole lifecycle. The following figure shows yarn-cluster mode
Job execution flow in Yarn-cluster mode:
1. The client generates the job information and submits it to the ResourceManager (RM)
2. The RM starts a container on a NodeManager (NM) chosen by YARN and launches the Application Master (AM) in it
3. The NM carries out the RM's allocation and starts the AM, which initializes the job; in this mode the AM also acts as the driver
4. The AM requests resources from the RM and, once they are allocated, notifies the other NodeManagers to start the corresponding executors
5. The executors register with the AM and report to it as they complete their assigned tasks
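The sketch below, a hypothetical word-count job, shows what this looks like from the application side (the class name, jar, and input path are placeholders); it would be submitted with something like spark-submit --master yarn-cluster --class example.WordCount wordcount.jar /input:

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // No setMaster call: the master is supplied at submission time, so the
    // same jar runs unchanged in yarn-cluster or yarn-client mode.
    val sc = new SparkContext(new SparkConf().setAppName("WordCount"))
    val counts = sc.textFile(args(0))
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    // In yarn-cluster mode the driver runs in the AM, so this output lands
    // in the AM container's log rather than on the client's console.
    counts.collect().foreach(println)
    sc.stop()
  }
}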
Yarn-client
In yarn-client mode, the Application Master only requests executor resources from YARN; the driver runs in the client, which schedules the jobs itself and communicates directly with the containers. The following figure shows yarn-client mode
Job execution flow in Yarn-client mode:
1. The client generates the job information and submits it to the ResourceManager (RM)
2. The RM starts a container on a NodeManager (NM) and launches the Application Master (AM) in it
3. The NM carries out the RM's allocation and starts the AM, which initializes the job; in this mode, however, the driver runs in the client process, not in the AM
4. The AM requests resources from the RM and, once they are allocated, notifies the other NodeManagers to start the corresponding executors
5. The executors register with the driver running in the client and report to it as they complete their assigned tasks
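Since the driver stays in the client process, yarn-client mode suits interactive work. Below is a minimal sketch of a session in a shell started with spark-shell --master yarn-client (the HDFS path is a placeholder):

// In the shell, sc is created for you; the shell JVM itself is the driver,
// and only the executors run in YARN containers.
val lines = sc.textFile("hdfs:///tmp/access.log")
val errors = lines.filter(_.contains("ERROR"))
// Results are collected back to the local driver, so they print directly
// in the client console, which is convenient for debugging.
errors.take(10).foreach(println)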
The following table compares Spark Standalone mode with Spark on YARN mode