HADOOP resource/storage isolation

Tags: hdfs, dfs
1. Resource isolation

1.1. Current situation

A. Each queue has minResources and maxResources set. When a queue is idle, other queues can take resources from it, going beyond its minimum resource guarantee; busy queues can likewise exceed their maximum resources. If the idle queue then submits many jobs at once and its resources are insufficient, preemption kicks in: if the borrowing queues do not release resources within a certain period, their tasks are forcibly killed and the resources are returned to the idle queue.

B. Users can set mapreduce.job.queuename to another queue's name and submit jobs across queues, as illustrated below.
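For illustration, this is what such a cross-queue submission looks like; the example jar, input, and output paths are placeholders, not values from this cluster:

hadoop jar hadoop-mapreduce-examples.jar wordcount -Dmapreduce.job.queuename=dd001 /input /output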

1.2. Solution

1.2.1 Prohibit cross-queue job submission, i.e., block the mapreduce.job.queuename parameter.

1.2.2 Steps for modifying the configuration file

A. Add the following parameter to the fair_scheduler.xml file (dd001 is user dd001):

<queue name="dd001">
  <aclSubmitApps>dd001</aclSubmitApps>
</queue>

Description: aclSubmitApps is the list of Linux users or user groups that can submit applications to the queue. The default value is "*", meaning any user can submit applications to the queue.

Note that this attribute is inherited: a sub-queue's list inherits its parent queue's list. In the value, multiple users (or groups) are separated by commas, and the user list is separated from the group list by a space, for example "user1,user2 group1,group2".

aclAdministerApps: the list of administrators for this queue. A queue administrator can manage the queue's resources and applications, for example killing any application in it.
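For context, a minimal sketch of a complete queue definition in fair_scheduler.xml; the minResources and maxResources figures are placeholders, not values from this cluster:

<allocations>
  <queue name="dd001">
    <minResources>10000 mb,10 vcores</minResources>   <!-- placeholder guarantee -->
    <maxResources>90000 mb,100 vcores</maxResources>  <!-- placeholder cap -->
    <aclSubmitApps>dd001</aclSubmitApps>              <!-- only user dd001 may submit -->
    <aclAdministerApps>hadp</aclAdministerApps>       <!-- only admin user hadp may manage/kill -->
  </queue>
</allocations>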

2. Disable cross-queue job killing

2.1. Current situation

A. The value of yarn.admin.acl is "*", meaning any user can kill other users' jobs.

2.2. Solution

2.2.1 Prohibit killing jobs across queues; ensure that, apart from the super user, users can only kill jobs in their own queue.

2.2.2 Configuration modification steps

A. Add the following parameter to the mapred-site.xml file:

<property>
  <name>mapreduce.cluster.acls.enabled</name>
  <value>true</value>
</property>

B. Add the following parameters to the yarn-site.xml file:

<property>
  <name>yarn.acl.enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.admin.acl</name>
  <value>hadp</value>
</property>

C. Add the following parameter to the core-site.xml file to prevent permission problems when accessing the application cluster web UI:

<property>
  <name>hadoop.http.staticuser.user</name>
  <value>hadp</value>
</property>
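As a quick sanity check after these changes (the application ID below is a placeholder):

yarn application -kill application_1400000000000_0001   --- rejected for ordinary users outside the queue, allowed for the admin user hadp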
3. Storage isolation

3.1. Current situation

A. Each user has write permission only on the directories under their own user directory, but directory size is not limited. Some users may write without bound, leaving others no space to write.

3.2. Solution

3.2.1 Configure each user's directory size quota according to its business volume.

A. File attributes when no quota is set:

[dd001@test_12123 ~]$ hdfs dfs -count -q hdfs://ns1/user/dd001/warehouse/test_lh
none inf none inf 1 0 0 hdfs://ns1/user/dd_edw/warehouse/test_lh

The columns are: name quota, remaining name quota, space quota, remaining space quota, directory count, file count, content size, and file/directory name.

B. Command to set a space quota:

[dd001@test_12123 ~]$ hdfs dfsadmin -setSpaceQuota 400 hdfs://ns1/user/dd001/warehouse/test_lh
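The 400-byte quota above is deliberately tiny so the test in step D fails immediately. For a real directory you would normally pass a size with a unit suffix, which -setSpaceQuota accepts (the 2t figure is only an example):

[dd001@test_12123 ~]$ hdfs dfsadmin -setSpaceQuota 2t hdfs://ns1/user/dd001/warehouse/test_lh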

C. Attribute values after the quota is set:

[dd001@test_12123 ~]$ hdfs dfs -count -q hdfs://ns1/user/dd001/warehouse/test_lh
none inf 400 400 1 0 0 hdfs://ns1/user/dd_edw/warehouse/test_lh

D. What happens after the test directory quota is exceeded?

[dd001@test_12123 ~]$ hdfs dfs -cp hdfs://ns1/user/dd001/warehouse/export26_0.lzo hdfs://ns1/user/dd001/warehouse/test_lh
14/10/04 17:54:14 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota of /user/dd_edw/warehouse/test_lh is exceeded: quota = 400 B = 400 B but diskspace consumed = 402653184 B = 384 MB
at org.apache.hadoop.hdfs.server.namenode.INodeDirectoryWithQuota.verifyQuota(INodeDirectoryWithQuota.java:191)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:2054)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:1789)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:1764)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addBlock(FSDirectory.java:357)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:2847)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2508)
at org.apache.hadoop. ...

Copying a file larger than the quota reports an error. Note that the consumed figure counts replicated bytes: allocating a single 128 MB block with replication factor 3 already charges 3 * 128 MB = 384 MB against the 400 B quota.

E. Command to clear the quota:

[dd001@test_12123 ~]$ hdfs dfsadmin -clrSpaceQuota hdfs://ns1/user/dd001/warehouse/test_lh
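Space quotas limit bytes (replicated). HDFS also supports a name quota, not used in this test, which caps the number of files and directories in a tree; it is managed with the analogous commands (the 10000 count is only an example):

hdfs dfsadmin -setQuota 10000 hdfs://ns1/user/dd001/warehouse/test_lh --- limit the tree to 10000 names
hdfs dfsadmin -clrQuota hdfs://ns1/user/dd001/warehouse/test_lh --- remove the name quota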

3.3 Monitoring

Adding a quota takes just one command, but limiting storage is a means, not an end. The final goal is to make full use of resources and prevent quotas from being exceeded, rather than letting tasks fail with errors. Monitoring is therefore the primary task.
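A minimal monitoring sketch along these lines, assuming the example path below and the 0.8 alarm factor defined in 3.3.1: it parses the space-quota columns of hdfs dfs -count -q and prints an alert when usage passes the threshold.

#!/bin/bash
# Alert when a directory has consumed more than 80% of its space quota.
DIR=hdfs://ns1/user/dd001/warehouse/test_lh    # directory to watch (example)
hdfs dfs -count -q "$DIR" | awk '{
  quota = $3; remaining = $4;        # space quota and remaining space quota
  if (quota == "none") exit 0;       # no space quota set on this directory
  used = quota - remaining;
  if (used / quota > 0.8)
    printf "ALERT: %s has used %.1f%% of its space quota\n", $8, 100 * used / quota;
}'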

3.3.1. Resource allocation

Queue name: dd001
User machine count: 20
Total machine quota (T): 21
Total machines allocated in the cluster: 20
Average quota (T) (= total quota / total machines allocated in the cluster): 20.9715
Hard disk reserved value (T): 0.0488
Actual quota (T) (= (average quota - hard disk reserved value) * number of machines): 418.454
A. Average quota = total quota / total number of machines allocated to the cluster.

Actual quota = (average quota - hard disk reserved value) * number of machines.

B. Alarm value = actual quota * 0.8.
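Plugging the table's numbers into these formulas (awk used here simply as a calculator):

awk -v avg=20.9715 -v reserved=0.0488 -v machines=20 'BEGIN {
  actual = (avg - reserved) * machines;          # actual quota in T
  printf "actual quota = %.3f T, alarm value = %.3f T\n", actual, actual * 0.8;
}'
# prints: actual quota = 418.454 T, alarm value = 334.763 T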

3.3.2 Handling disk alarms

A. Delete redundant data.

B. Add machines.

When adding machines, the memory and CPU also need to be adjusted: the queue's resource settings (minResources and maxResources, both "401 mb,19vcores" here) must be raised to match.

Adjust these two parameters accordingly; the quota adjustment commands are as follows:

A. hdfs dfsadmin -clrSpaceQuota hdfs://ns1/user/dd001/warehouse/test_lh --- delete the old quota

B. hdfs dfsadmin -setSpaceQuota 'actual quota' hdfs://ns1/user/dd001/warehouse/test_lh --- set the new quota

C. Estimating the number of machines to add

C.1 Average daily growth of directory storage usage = sum(daily growth) / number of days.

C.2 Number of machines = (days of storage to provision for * average daily growth of directory storage usage) / (average quota - hard disk reserved value).

C.3 Example:

Assume average daily growth of directory storage usage = 0.5 TB and 90 days of storage to provision for.

Number of machines = (90 * 0.5) / (18.4279 - 0.0488) ≈ 2.45, rounded up to 3.
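The same estimate as a quick calculation (awk as a calculator; rounding up because a fraction of a machine must become a whole one):

awk -v days=90 -v growth=0.5 -v avg=18.4279 -v reserved=0.0488 'BEGIN {
  n = (days * growth) / (avg - reserved);   # about 2.45 machines
  printf "machines to add: %d\n", (n == int(n)) ? n : int(n) + 1;
}'
# prints: machines to add: 3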

