Llama: an Intermediate Coordination Service for Impala on YARN


This article covers Hadoop YARN and Impala as shipped in the CDH distribution.


In earlier versions of Impala, using Impala meant starting the impala-server, impala-state-store, and impala-catalog services on each cluster node in a client/server arrangement, and the memory and CPU allocations could not be adjusted dynamically after startup. Since CDH 5, Impala supports an Impala-on-YARN mode, in which an intermediate coordinator between YARN and Impala called Llama (Long-Lived Application Master) requests compute resources from the Hadoop YARN resource manager.


1. Llama introduction

Llama (Long-Lived Application Master) is a service that sits between Cloudera Impala and Hadoop YARN and coordinates resource management between them. In a Hadoop cluster, Impala can schedule, use, and release resource allocations through Llama, reducing the resource-management overhead incurred when executing Impala queries. The Llama service only takes effect when resource management is enabled in Impala.
By default, YARN allocates resources incrementally as a MapReduce job needs them, whereas Impala requires all of a query's resources to be available at the same time, so that intermediate results can be exchanged between nodes without the query stalling to wait for new allocations. Llama's purpose is to ensure that the required resources are available before each Impala query starts executing.
After a query executes, Llama caches its resources so that they can be reused by subsequent Impala queries. This caching mechanism avoids issuing a fresh resource request before every query. If YARN needs those resources for other work, Llama returns them to YARN.
Note that Llama supports only YARN and cannot be used with MRv1; Llama also integrates with Hadoop through YARN's configuration files.


2. Controlling compute resource estimates

When we submit SQL to Impala, the estimate of the compute resources the query will consume is sometimes wrong, so Impala lets the user set default memory and CPU resource request sizes. While the SQL is running, if resources become scarce, Impala requests more from Llama to expand the current reservation; once the query completes, Llama normally returns the resources to YARN. Users can add the -rm_always_use_defaults parameter (required), plus the -rm_default_memory=size and -rm_default_cpu_cores parameters (optional), when starting the impalad process. Cloudera officially recommends running Impala on YARN with these startup parameters so that query resources can be expanded dynamically.
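As a sketch, the startup parameters described above might be passed to impalad like this; the memory size and core count shown are placeholder values for illustration, not recommendations:

```shell
# Sketch only: starting impalad with the resource-management defaults from the text.
# -rm_always_use_defaults makes every query start from the default reservation;
# -rm_default_memory and -rm_default_cpu_cores set that default (4G / 2 are examples).
impalad -rm_always_use_defaults \
        -rm_default_memory=4G \
        -rm_default_cpu_cores=2
```

With these defaults in place, a query starts from the default reservation and Impala expands it through Llama only if the query turns out to need more.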


3. Verifying estimated and actual resource usage

To make it easy to verify the cluster resources a query statement will use, the EXPLAIN statement reports the estimated memory and virtual-core requirements of a query. Running EXPLAIN does not actually submit the query for execution.
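For illustration, the estimates can be inspected from impala-shell without running the query; the table and column names here are hypothetical:

```shell
# Sketch: ask Impala for the query plan and resource estimates only.
# "sales" is a hypothetical table; no data is read and no query is executed.
impala-shell -q "EXPLAIN SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id;"
```

In CDH-era Impala, the plan output begins with an estimated per-host requirements line showing the memory and VCores figures discussed above, which can then be compared against what the query actually consumes.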


4. The principle of resource limitation

The CPU limit relies on the Linux cgroups mechanism: on each node, YARN starts container processes inside cgroups.

The memory limit is enforced by restricting Impala's per-query memory: once a query request is authorized, Impala sets the memory limit before execution.
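As a rough sketch of the cgroups mechanism, the following caps a process's CPU share in the same spirit as YARN's container cgroups. This assumes a cgroup v1 hierarchy mounted at /sys/fs/cgroup/cpu and requires root; the group name and PID are hypothetical:

```shell
# Sketch of cgroup v1 CPU capping, similar in spirit to YARN's container cgroups.
# Requires root; assumes /sys/fs/cgroup/cpu is a mounted cgroup v1 hierarchy.
mkdir /sys/fs/cgroup/cpu/container_demo
# Cap the group at 50% of one CPU: 50000us of quota per 100000us period.
echo 100000 > /sys/fs/cgroup/cpu/container_demo/cpu.cfs_period_us
echo 50000  > /sys/fs/cgroup/cpu/container_demo/cpu.cfs_quota_us
# Place a process (PID 1234, hypothetical) into the group.
echo 1234 > /sys/fs/cgroup/cpu/container_demo/tasks
```

YARN automates this bookkeeping per container, which is why the CPU limit holds regardless of what the containerized process itself does.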
