Configuration items of Hadoop 1

Last Update: 2014-09-17 Source: Internet

Author: User

mapred.min.split.size

The meaning is what the name says: the minimum size of an input split. It took me a long time to discover that this has to be configured on the machine that launches the job, rather than on the master host.
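As a rough illustration of what the minimum split size does, the sketch below mirrors the clamp used by the old-API FileInputFormat in Hadoop 1, where the split size is max(minSize, min(goalSize, blockSize)). The function name and the sample numbers (a hypothetical 450 MB goal size and a raised minimum of 512 MB) are assumptions made for this example, not values from the article.

```python
MB = 1024 * 1024  # "MB" is treated as 2**20 bytes in this sketch


def compute_split_size(goal_size: int, min_size: int, block_size: int) -> int:
    """Cap the goal size at the block size, then enforce the minimum split size."""
    return max(min_size, min(goal_size, block_size))


block_size = 256 * MB   # dfs.block.size = 268435456
goal_size = 450 * MB    # hypothetical: total input size / mapred.map.tasks

# With a minimum of 1 byte, the block size caps the split.
default_split = compute_split_size(goal_size, 1, block_size)

# Raising mapred.min.split.size above the block size forces larger splits.
raised_split = compute_split_size(goal_size, 512 * MB, block_size)

print(default_split // MB, raised_split // MB)
```

Under these assumptions the first call gives a 256 MB split and the second a 512 MB split, which is why raising mapred.min.split.size is a common way to reduce the number of maps.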

mapred.map.tasks

The number of map tasks in a job. I had assumed that the number of maps was simply the total file size divided by the actual split size, so I did not see what this setting was for. The following example, however, shows how it matters:

Several parameter settings from the Hive production environment at my company are as follows:
dfs.block.size = 268435456
hive.merge.mapredfiles = true
hive.merge.mapfiles = true
hive.merge.size.per.task = 256000000
mapred.map.tasks = 2

Because small files are merged by default (the merge settings are "true"), and because of the combination of dfs.block.size and hive.merge.size.per.task, most of the merged files end up at about 300 MB.

Case 1:

Now suppose we have three 300 MB files (900 MB in total). Then goalsize = min(900 MB / 2, 256 MB) = 256 MB (see http://blog.sina.com.cn/s/blog_6ff05a2c010178qd.html for details on how the number of maps is calculated).
Therefore, the entire job has 6 maps: three of them process 256 MB of data each, and the other three process 44 MB each.
This is where the barrel effect (the weakest-link principle) comes in. The execution time of the job's map stage is determined not by the fastest map but by the slowest one. So although the three maps that process only 44 MB finish quickly, they still have to wait for the other three maps to process 256 MB each. Clearly, the three 256 MB maps slow down the entire job.
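The split arithmetic above can be reproduced with a short sketch. It assumes three 300 MB files, takes the split size as min(total size / mapred.map.tasks, block size) as in the article, and applies the 1.1 slack factor that Hadoop 1's FileInputFormat uses when cutting the last split. The function names are hypothetical, and the minimum split size is assumed to be left at its default.

```python
from typing import List

MB = 1024 * 1024  # "MB" is treated as 2**20 bytes in this sketch
SPLIT_SLOP = 1.1  # slack Hadoop 1's FileInputFormat allows on the last split


def split_file(file_size: int, split_size: int) -> List[int]:
    """Return the sizes of the splits cut from one file."""
    splits = []
    remaining = file_size
    # Cut full-size splits while more than 1.1 splits' worth of data remains.
    while remaining / split_size > SPLIT_SLOP:
        splits.append(split_size)
        remaining -= split_size
    if remaining:
        splits.append(remaining)  # the tail becomes one final split
    return splits


def plan_maps(file_sizes: List[int], map_tasks: int, block_size: int) -> List[int]:
    """Split sizes for a job, with split size = min(total / map_tasks, block size)."""
    split_size = min(sum(file_sizes) // map_tasks, block_size)
    return [s for size in file_sizes for s in split_file(size, split_size)]


files = [300 * MB] * 3
case1 = plan_maps(files, map_tasks=2, block_size=256 * MB)
print([s // MB for s in case1])
```

Calling plan_maps(files, map_tasks=6, block_size=256 * MB) with the same files yields six splits of 150 MB each, so the same helper can be used to try other values of mapred.map.tasks.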

Case 2:

If we set mapred.map.tasks to 6, let's see what changes:
goalsize = min(900 MB / 6, 256 MB) = 150 MB
The entire job is again allocated 6 maps, but now each map processes 150 MB of data, which is very even. No map holds the others back, resources are allocated sensibly, and the execution time is about 59% (150/256) of that in Case 1.
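The 59% figure can be checked with a small sketch. It assumes that all six maps run in parallel and that a map's run time is proportional to the amount of data it reads, so the map stage lasts as long as its largest split takes. The function name and the seconds_per_mb parameter are hypothetical.

```python
def map_stage_time(split_sizes_mb, seconds_per_mb=1.0):
    """Map-stage duration when all maps run in parallel: the slowest map decides."""
    return max(split_sizes_mb) * seconds_per_mb


case1 = [256, 44, 256, 44, 256, 44]  # mapred.map.tasks = 2
case2 = [150] * 6                    # mapred.map.tasks = 6

ratio = map_stage_time(case2) / map_stage_time(case1)
print(f"Case 2 takes about {ratio:.0%} of Case 1's map-stage time")
```

The estimate ignores task startup cost, data locality and slot contention, so on a real cluster the ratio will only be approximate.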

Case Analysis:

Although mapred.map.tasks was raised from 2 to 6, Case 2 does not use more map resources than Case 1: both use six maps. Yet the execution time of Case 2 is only about 59% of that of Case 1.
From this case we can see that an automatically optimized setting for mapred.map.tasks can significantly improve job execution efficiency.

Ref: http://blog.sina.com.cn/s/blog_6ff05a2c0101aqvv.html

PS: I often have trouble finding the official configuration reference for Hadoop 1, so I am bookmarking it here. I hope this address does not change again:

https://hadoop.apache.org/docs/r1.0.4/mapred-default.html

This article is from the "Dream footprints" blog; please keep this source attribution: http://daisy8867.blog.51cto.com/1043582/1554424
