A Full Analysis of YARN, Hadoop's Data Operating System


"Hadoop 2.0 introduces yarn, greatly improving the resource utilization of the cluster and reducing the cost of cluster management." How is it used in heterogeneous clusters? What are the successful practices of Hulu to share?

To enable unified management and scheduling of cluster resources, Hadoop 2.0 introduced the data operating system YARN. YARN greatly improves the resource utilization of a cluster and reduces the cost of managing it. First, YARN allows multiple applications to run in one cluster and allocates resources to them on demand, which substantially raises resource utilization. Second, YARN allows short jobs and long-running services to be deployed together in a single cluster, and provides support for fault tolerance, resource isolation, and load balancing, which greatly simplifies the cost of deploying and managing jobs and services.

YARN uses a master/slave architecture, shown in Figure 1, in which the master is called the ResourceManager and the slaves are called NodeManagers. The ResourceManager is responsible for the unified management and scheduling of the resources on every NodeManager. When a user submits an application, an ApplicationMaster is provided to track and manage the program; it requests resources from the ResourceManager and asks NodeManagers to start containers that occupy a certain amount of resources. Because different ApplicationMasters are distributed onto different nodes and separated by an isolation mechanism, they do not affect one another.


Figure 1: The basic architecture of Apache YARN
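To make this submission flow concrete, here is a minimal sketch (not taken from any system described in this article) of a client handing an application to the ResourceManager through the standard YarnClient API; the application name, AM main class, and resource sizes are illustrative:

import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class SubmitExample {
  public static void main(String[] args) throws Exception {
    // Connect to the ResourceManager.
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
    ctx.setApplicationName("demo-app");

    // The container in which the ApplicationMaster itself will run.
    ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
    amContainer.setCommands(Collections.singletonList(
        "java com.example.MyApplicationMaster" // hypothetical AM main class
            + " 1><LOG_DIR>/am.stdout 2><LOG_DIR>/am.stderr"));
    ctx.setAMContainerSpec(amContainer);

    // Resources the ApplicationMaster needs; the AM later requests more containers itself.
    ctx.setResource(Resource.newInstance(1024, 1)); // 1024 MB, 1 vcore
    ctx.setQueue("default");

    ApplicationId appId = yarnClient.submitApplication(ctx);
    System.out.println("Submitted application " + appId);
  }
}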

Resource management and scheduling in YARN is the job of the resource scheduler, one of the core components of Hadoop YARN and a pluggable service component inside the ResourceManager. YARN organizes and divides resources through hierarchical queues, and it ships several multi-tenant resource schedulers that let administrators group users or applications according to their requirements and assign different amounts of resources to different groups, while imposing various constraints to prevent a single user or application from monopolizing resources, so that diverse QoS needs can be met. The typical representatives are Yahoo!'s Capacity Scheduler and Facebook's Fair Scheduler.

As a general-purpose data operating system, YARN can run short jobs such as MapReduce and Spark jobs, and it can also host long-running services such as web servers and MySQL servers, truly making one cluster serve many purposes. We often call it a lightweight elastic computing platform: lightweight because YARN uses the lightweight cgroups isolation scheme, and elastic because YARN can adjust the resources occupied by each computing framework or application according to its load or demand, achieving cluster-wide resource sharing and elastic expansion and contraction of resources.


Figure 2: The ecosystem with YARN at its core

Applications of Hadoop YARN in Heterogeneous Clusters

Starting with version 2.6.0, YARN introduced a new scheduling strategy: a label-based scheduling mechanism. Its main motivation is to let YARN run better in heterogeneous clusters and to better manage and schedule mixed types of applications.

1. What is label-based scheduling

As the name implies, label-based scheduling is a scheduling strategy: like priority-based scheduling, it is one of many strategies available in the scheduler and can be combined with other scheduling policies. The basic idea is that users can attach labels to each NodeManager, such as highmem or highdisk, as basic attributes of that NodeManager, and can also set several labels on each queue in the scheduler, restricting the queue to resources on nodes that carry the corresponding labels, so that jobs submitted to a given queue run only on specific nodes. Through labels, users can divide a Hadoop cluster into several subclusters and direct applications to nodes with particular characteristics, for example running memory-intensive applications such as Spark on large-memory nodes.
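As a sketch of the application side (assuming a Hadoop 2.6+ cluster where an administrator has already defined a spark-node label and granted a queue named spark access to it; both names are illustrative), the submission context from the client sketch above can carry a node label expression:

// Continuing the client sketch above: restrict this application's containers to
// nodes labeled "spark-node"; the queue must be configured to access that label.
ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
ctx.setQueue("spark");
ctx.setNodeLabelExpression("spark-node");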

2. Hulu Application Case

The label-based scheduling strategy is widely used within Hulu. The mechanism was enabled primarily for the following three reasons:

The cluster is heterogeneous. As a Hadoop cluster evolves, newly added machines usually have better configurations than the old ones, so the cluster eventually becomes heterogeneous. Many mechanisms in Hadoop's original design assume a homogeneous cluster, and even now Hadoop's support for heterogeneous clusters is still imperfect; for example, MapReduce's speculative execution mechanism does not yet take heterogeneous clusters into account.

Applications are diverse. Hulu deploys many types of applications on top of its YARN cluster, such as MapReduce, Spark, Spark Streaming, and Docker services. When multiple classes of applications run in a heterogeneous cluster, differences in machine configuration cause large variance in the completion times of parallel tasks, which is very detrimental to the efficient execution of distributed programs. In addition, because YARN cannot isolate resources completely, multiple applications running on the same node can easily interfere with each other, which latency-sensitive applications often cannot tolerate.

Individual machines have special requirements. Because of dependencies on particular environments, some applications can only run on specific nodes of a large cluster. Typical examples are Spark and Docker: Spark MLlib may use native libraries, which are installed on only a few nodes to avoid polluting the rest of the system, while running Docker containers depends on the Docker engine, so to streamline operational costs we let Docker run only on a number of designated nodes.

To solve these problems, Hulu enabled label-based scheduling on top of the Capacity Scheduler. As shown in Figure 3, we attach labels to the nodes in the cluster according to machine configuration and application requirements, including:

• spark-node: machines used to run Spark jobs; these are usually highly configured, especially with large memory;

• mr-node: machines running MapReduce jobs; their configurations are diverse;

• docker-node: machines running Docker applications; these are equipped with the Docker engine;

• streaming-node: machines running Spark Streaming applications.


Figure 3: YARN deployment example
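For reference, labels like these are defined and attached on the administrator side with the yarn rmadmin tool; a sketch follows (the exact argument syntax varies across Hadoop versions, and the hostname is illustrative). On the queue side, the Capacity Scheduler's accessible-node-labels properties control which queues may use which labels.

yarn rmadmin -addToClusterNodeLabels "spark-node,mr-node,docker-node,streaming-node"
yarn rmadmin -replaceLabelsOnNode "node17.example.com=spark-node"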

It is worth noting that YARN allows a single node to carry several labels at once, so one machine can run multiple classes of applications (within Hulu, we allow some nodes to be shared by multiple applications). On the surface, introducing labels divides the cluster into several physical subclusters, but unlike traditional, completely isolated clusters, these subclusters are both independent and interconnected, and users can easily repurpose a node dynamically just by modifying its labels.

Hadoop YARN Application Cases and Lessons Learned

1. Hadoop YARN Application Cases

As a data operating system, Hadoop YARN provides a rich API for users to develop applications against. Hulu has done a great deal of exploration and practice in YARN application design, and has developed several distributed computing frameworks and computing engines that run directly on YARN, typified by Voidbox and Nesto.

(1) Docker-based container computing framework Voidbox

Docker has been a very popular container virtualization technology over the past two years. It automates the deployment of most applications, lets any program run in a resource-isolated container environment, and provides a more elegant solution for building, publishing, and running projects.

To combine the distinctive strengths of YARN and Docker, the Hulu Beijing big data team developed Voidbox. Voidbox is a distributed computing framework that uses YARN as its resource management module and Docker as its task execution engine, so that YARN can schedule not only traditional MapReduce and Spark applications but also applications packaged as Docker images.

Voidbox supports Docker-container-based DAG (directed acyclic graph) tasks and long-running services (such as web services), and it provides several ways to submit applications, such as from the command line or from an IDE, meeting the needs of both production and development environments. In addition, Voidbox can work with Jenkins, GitLab, and a private Docker registry to form a complete development, testing, and automated release pipeline.


Figure 4: Voidbox system architecture

In Voidbox, YARN is responsible for cluster resource scheduling, and Docker acts as the execution engine that pulls images from the Docker registry. Voidbox requests resources for the container-based DAG tasks and runs the Docker tasks. As shown in Figure 4, each black box represents a machine with several modules running on it, as follows:

Voidbox components:

Voidbox Client: the client program. It lets users manage Voidbox applications (a Voidbox application contains one or more Docker jobs, and a job contains one or more Docker tasks), for example submitting or killing Voidbox applications.

Voidbox Master: actually a YARN ApplicationMaster, responsible for requesting resources from YARN and allocating the obtained resources to the internal Docker tasks.

Voidbox Driver: the task scheduler for a single Voidbox application. Voidbox supports scheduling Docker-container-based DAG tasks and can insert additional user code between tasks; the Voidbox Driver handles the dependency-aware scheduling of the DAG tasks and runs the user code.

Voidbox Proxy: a bridge between YARN and the Docker engine, relaying commands from YARN to the Docker engine, such as starting or killing Docker containers.

StateServer: maintains health information about each Docker engine and provides the Voidbox Master with a list of machines that can run Docker containers, enabling the Voidbox Master to request resources more effectively.

Docker components:

Docker Registry: stores Docker images, serving as the version management tool for internal Docker images.

Docker Engine: the Docker container execution engine; it obtains the required Docker image from the Docker registry and executes Docker-related commands.

Jenkins: works with GitLab for application version management. When an application's version is updated, Jenkins compiles and packages it, generates a Docker image, and uploads the image to the Docker registry, completing the automated release process.

Similar to Spark on YARN, Voidbox offers two application run modes: yarn-cluster mode and yarn-client mode. In yarn-cluster mode, the control and resource management components of the application run inside the cluster, so after the Voidbox application has been submitted successfully the client can exit at any time without affecting the application running in the cluster; this mode is suitable for submitting applications in production environments. In yarn-client mode, the application's control component runs on the client while the other components run in the cluster; the client can see more information about the application's running state, and when the client exits, the application running in the cluster exits as well. The yarn-client mode therefore makes debugging easier.

(2) Parallel computing engine Nesto

Nesto is Hulu's Presto/Impala-like MPP computing engine, designed to handle complex nested data. It supports complex data processing logic that is difficult to express in SQL, and it uses optimization techniques such as columnar storage and code generation to speed up data processing. Nesto's architecture is similar to Presto/Impala: there is no central node, and the Nesto servers discover each other through ZooKeeper.

To reduce Nesto's deployment and management costs, Hulu deploys Nesto directly on YARN. This makes the installation process very simple: the Nesto distribution (configuration files and jar packages) is placed into a single archive on HDFS, and the user runs one submit command specifying the number of Nesto servers to start and the resources each server needs, so that a Nesto cluster can be deployed quickly.

The Nesto on YARN program consists of one ApplicationMaster and multiple executors; the ApplicationMaster requests resources from YARN and starts the executors, and each executor's job is to start a Nesto server. The ApplicationMaster is the key design point, and its functions include the following (see the sketch after this list):

Communicating with the ResourceManager to request resources; these requests must guarantee that the granted resources come from different nodes, so that each node starts only one executor;

Communicating with NodeManagers to start executors and to monitor their health; once an executor fails, restarting a new executor on another node;

Providing an embedded web server to show the state of the tasks on each Nesto server.
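The following is a minimal sketch, in the spirit of the description above rather than Hulu's actual Nesto code, of such an ApplicationMaster loop built on the standard AMRMClient/NMClient APIs; the resource sizes and executor count are illustrative, and the one-executor-per-node check is omitted:

import java.util.Collections;

import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class ExecutorManager {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();

    // Register this process with the ResourceManager as the ApplicationMaster.
    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(conf);
    rmClient.start();
    rmClient.registerApplicationMaster("", 0, ""); // host/port/URL of the embedded web UI would go here

    NMClient nmClient = NMClient.createNMClient();
    nmClient.init(conf);
    nmClient.start();

    Resource capability = Resource.newInstance(8192, 4); // per-executor memory (MB) and vcores
    Priority prio = Priority.newInstance(0);
    int target = 10; // number of Nesto servers to run
    for (int i = 0; i < target; i++) {
      rmClient.addContainerRequest(new ContainerRequest(capability, null, null, prio));
    }

    while (true) {
      AllocateResponse resp = rmClient.allocate(0.1f);

      // Launch one executor in every granted container. A real AM would also ensure
      // that no two executors land on the same node, e.g. by releasing duplicates.
      for (Container c : resp.getAllocatedContainers()) {
        ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
        ctx.setCommands(Collections.singletonList(
            "java com.example.NestoServer 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr"));
        nmClient.startContainer(c, ctx);
      }

      // Fault tolerance: when an executor dies, request a replacement container.
      for (ContainerStatus s : resp.getCompletedContainersStatuses()) {
        rmClient.addContainerRequest(new ContainerRequest(capability, null, null, prio));
      }
      Thread.sleep(1000);
    }
  }
}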

2. Hadoop YARN Development Experience Summary

(1) Make good use of the resource-request API

Hadoop YARN provides rich resource-request semantics that allow users to request resources on a specific node or rack, or to blacklist a node so that no more resources are accepted from it.
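Continuing the ApplicationMaster sketch above, a brief illustration of these semantics (the host and rack names are made up): ContainerRequest can express node and rack preferences, and updateBlacklist() tells the scheduler to stop offering resources on given nodes.

// Prefer a container on node17 (or elsewhere on rack /rack3); relaxLocality=true
// lets the scheduler fall back to other nodes if the preferred ones are busy.
rmClient.addContainerRequest(new ContainerRequest(
    capability,
    new String[] {"node17.example.com"}, // preferred nodes
    new String[] {"/rack3"},             // preferred racks
    prio,
    true));                              // relaxLocality

// Accept no further resources on a node we consider unhealthy.
rmClient.updateBlacklist(Collections.singletonList("node42.example.com"), null);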

(2) Note memory overhead

A container's memory consists of the Java heap, JVM overhead, and non-Java memory. If the user sets the application's heap size to X GB (-XmxXg), the container memory that the ApplicationMaster requests should be X + D, where D is the JVM overhead; otherwise the container may be killed by YARN because its total memory exceeds the limit.
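In code this is a one-line adjustment when sizing the request; the 10%-with-384 MB-floor headroom below is only an illustrative rule of thumb (similar to Spark's default memory overhead), not a constant defined by YARN:

// Request heap + D, not just the heap, or YARN may kill the container.
int heapMb = 4096; // passed to the task JVM as -Xmx4096m
int overheadMb = Math.max(384, (int) (heapMb * 0.10)); // D: illustrative JVM overhead estimate
Resource capability = Resource.newInstance(heapMb + overheadMb, 1);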

(3) Log rotation

Log rotation is especially important for long-running services, whose logs accumulate continuously. Since an application cannot know where its logs will live before it starts (for example, which directory on which node), YARN provides a macro that stands for the log directory: when the macro <LOG_DIR> appears in a launch command, YARN automatically replaces it with the container's actual log directory, for example:

echo $LOG4J_CONTENT > $PWD/log4j.properties && java -Dlog4j.configuration=log4j.properties ... com.example.NestoServer 1>><LOG_DIR>/server.log 2>><LOG_DIR>/server.log

where the variable LOG4J_CONTENT holds the log4j configuration. A minimal sketch of such a configuration, assuming a rolling file appender is wanted and that the launch command also passes the log directory to the JVM (for example via -Dlog.dir=<LOG_DIR>), might look like this:
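log4j.rootLogger=INFO, R
# Illustrative rolling appender: keep up to 10 files of 100 MB each in the container log dir.
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=${log.dir}/server.log
log4j.appender.R.MaxFileSize=100MB
log4j.appender.R.MaxBackupIndex=10
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n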

(4) Debugging skills

Before a NodeManager starts a container, it writes the container's environment variables, launch command, and other information into a shell script (launch_container.sh) and then starts the container by running that script. In some cases a container fails to start because the launch command was written incorrectly (for example, some special characters were escaped). To diagnose this, you can check whether the launch command is the problem by looking at the script that was actually executed: add a command that prints the script's contents before the container runs its real command, for example, by prefixing the launch command shown above:

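cat launch_container.sh && java -Dlog4j.configuration=log4j.properties ... com.example.NestoServer 1>><LOG_DIR>/server.log 2>><LOG_DIR>/server.log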

(5) Performance issues associated with shared clusters

When multiple applications run in a YARN cluster at the same time, the load differs from node to node, so tasks on some nodes run more slowly than on others, which is unacceptable for applications with OLAP-style requirements. There are usually two solutions: (1) use labels to give the application a set of dedicated nodes; (2) implement a speculative execution mechanism inside the application, similar to those in MapReduce and Spark, which starts one or more duplicates of each slow task and trades space for time, preventing slow tasks from dragging down the application's overall running efficiency.

Hadoop YARN Development Trends

YARN will keep evolving toward general-purpose resource management and scheduling, not limited to the big data processing area: it already supports short jobs such as MapReduce and Spark as well as long-running services such as web services.
