Llama: an Intermediate Coordination Service for Impala on YARN


This article covers Hadoop YARN and Impala as shipped in the CDH distribution.


In earlier versions of Impala, using Impala meant starting the impala-server, impala-state-store, and impala-catalog services on each cluster node in a client/server arrangement, and the memory and CPU allocations could not be adjusted dynamically after startup. Since CDH 5, Impala supports an Impala-on-YARN mode, in which an intermediate coordinator between YARN and Impala called Llama (Long-Lived Application Master) requests compute resources from the Hadoop YARN resource manager.


1. Llama introduction

Llama (Long-Lived Application Master) is a service that sits between Cloudera Impala and Hadoop YARN and coordinates resource management between them. In a Hadoop cluster, Impala can schedule, use, and release resource allocations through Llama, reducing the resource-management overhead incurred when executing Impala queries. The Llama service only takes effect when resource management is enabled in Impala.
By default, YARN allocates resources incrementally as a MapReduce job needs them, whereas Impala requires all of a query's resources to be available at the same time, so that intermediate results can be exchanged between nodes without the query stalling to wait for new allocations. Llama's purpose is to ensure that the required resources are available before each Impala query starts executing.
After a query executes, Llama caches its resources so that they can be reused by subsequent Impala queries. This caching mechanism avoids issuing a fresh resource request before every query. If YARN needs those resources for other work, Llama returns them to YARN.
Note that Llama supports only YARN and cannot be used with MRv1; Llama also integrates with Hadoop through YARN's configuration files.


2. Controlling compute resource estimates

When we submit SQL to Impala, the estimate of the compute resources the query will consume is sometimes wrong, so Impala lets the user set default memory and CPU resource request sizes. While the SQL is running, if resources become scarce, Impala requests more from Llama to expand the current reservation; once the query completes, Llama normally returns the resources to YARN. Users can add the -rm_always_use_defaults parameter (required), plus the -rm_default_memory=size and -rm_default_cpu_cores parameters (optional), when starting the impalad process. Cloudera officially recommends running Impala on YARN with these startup parameters so that query resources can be expanded dynamically.
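As a sketch, the startup parameters described above might be passed to impalad like this; the memory size and core count shown are placeholder values for illustration, not recommendations:

```shell
# Sketch only: starting impalad with the resource-management defaults from the text.
# -rm_always_use_defaults makes every query start from the default reservation;
# -rm_default_memory and -rm_default_cpu_cores set that default (4G / 2 are examples).
impalad -rm_always_use_defaults \
        -rm_default_memory=4G \
        -rm_default_cpu_cores=2
```

With these defaults in place, a query starts from the default reservation and Impala expands it through Llama only if the query turns out to need more.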


3. Verifying estimated and actual resource usage

To make it easy to verify the cluster resources a query statement will use, the EXPLAIN statement reports the estimated memory and virtual-core requirements of a query. Running EXPLAIN does not actually submit the query for execution.
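For illustration, the estimates can be inspected from impala-shell without running the query; the table and column names here are hypothetical:

```shell
# Sketch: ask Impala for the query plan and resource estimates only.
# "sales" is a hypothetical table; no data is read and no query is executed.
impala-shell -q "EXPLAIN SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id;"
```

In CDH-era Impala, the plan output begins with an estimated per-host requirements line showing the memory and VCores figures discussed above, which can then be compared against what the query actually consumes.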


4. The principle of resource limitation

The CPU limit relies on the Linux cgroups mechanism: on each node, YARN starts container processes inside cgroups.

The memory limit is enforced by restricting Impala's per-query memory: once a query request is authorized, Impala sets the memory limit before execution.
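As a rough sketch of the cgroups mechanism, the following caps a process's CPU share in the same spirit as YARN's container cgroups. This assumes a cgroup v1 hierarchy mounted at /sys/fs/cgroup/cpu and requires root; the group name and PID are hypothetical:

```shell
# Sketch of cgroup v1 CPU capping, similar in spirit to YARN's container cgroups.
# Requires root; assumes /sys/fs/cgroup/cpu is a mounted cgroup v1 hierarchy.
mkdir /sys/fs/cgroup/cpu/container_demo
# Cap the group at 50% of one CPU: 50000us of quota per 100000us period.
echo 100000 > /sys/fs/cgroup/cpu/container_demo/cpu.cfs_period_us
echo 50000  > /sys/fs/cgroup/cpu/container_demo/cpu.cfs_quota_us
# Place a process (PID 1234, hypothetical) into the group.
echo 1234 > /sys/fs/cgroup/cpu/container_demo/tasks
```

YARN automates this bookkeeping per container, which is why the CPU limit holds regardless of what the containerized process itself does.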
