Adjust the inputsplit size in Hive to increase the number of maps.

Http://boylook.itpub.net/post/43144/531420


A data-cleaning job on our production Hive (CDH 4.2.0) failed. The log showed an out-of-memory error (OOME) in a map task:

[Image: hive1.png]

According to the log, the HQL joins two tables, and splits = 2 means that two maps were started, one for each table. The large table is 123 MB (less than dfs.block.size = 128 MB) with around […] rows, so the whole table goes to a single map. The OOME is most likely because that data volume exceeds the map's memory. Comparing with the previous day's log confirms this:

[Image: hive yesterday.png]

Because the large table is a temporary table, we set mapred.reduce.tasks = 20 and re-ran the job that generates it. The join and cleaning then succeeded:

[Image: hive2.png]

Because the MapReduce input split size is max(minSplitSize, min(maxSplitSize, blockSize)), setting mapred.max.split.size to 32 MB should have produced more maps. After trying it, there were still only two maps. Searching Google turned up nothing, so we fell back on the last resort: reading the source code.

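    // CDH 4.2.0: there is no maxSize parameter, so mapred.max.split.size is never consulted
    protected long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }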

So the max split size parameter has no effect here at all. Now look at the code in CDH3u1:

    // CDH3u1: maxSize (mapred.max.split.size) caps the split size
    protected long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }
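To make the difference concrete, here is a minimal sketch that plugs the numbers from this post (a 123 MB file, a 128 MB block size, and a 32 MB max split size) into the two formulas quoted above. The class and method names are made up for illustration. The min split size of 1 byte and the goal size equal to the file size (one requested map for this file) are assumptions. The split count helper is a plain ceiling division and ignores the small slop factor that Hadoop applies to the last split.

```java
final class SplitSizeComparison {
    static final long MB = 1024L * 1024L;

    // Formula from the CDH 4.2.0 snippet: mapred.max.split.size does not appear.
    static long oldApiSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Formula from the CDH3u1 snippet: maxSize caps the split size.
    static long newApiSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Simplified split count for one file (ceiling division, no slop factor).
    static long splitCount(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long fileSize = 123 * MB;   // large table from the post
        long blockSize = 128 * MB;  // dfs.block.size from the post
        long maxSize = 32 * MB;     // mapred.max.split.size tried in the post
        long minSize = 1;           // assumption: effectively no minimum
        long goalSize = fileSize;   // assumption: one map requested for this file

        long oldSize = oldApiSplitSize(goalSize, minSize, blockSize);
        long newSize = newApiSplitSize(blockSize, minSize, maxSize);

        System.out.println("first formula:  " + splitCount(fileSize, oldSize) + " split(s)");
        System.out.println("second formula: " + splitCount(fileSize, newSize) + " split(s)");
    }
}
```

Under these assumptions the first formula keeps the 123 MB file in one split, while the second cuts it into four, which matches the behaviour described in this post.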

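The parameter takes effect in CDH3, so this seems to be a regression in CDH4. In the end we set mapred.map.tasks instead: the requested number of maps determines goalSize in the CDH 4.2.0 code above, so raising it shrinks the split size, and the number of maps increased as expected.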
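The sketch below shows why mapred.map.tasks helps. It assumes that the goal size is the total input size divided by the requested number of maps, which is how the old-API FileInputFormat computes it as far as we can tell. The class name, the helper names, and the map counts tried in main are hypothetical. The post does not say which value was actually used.

```java
final class GoalSizeSketch {
    static final long MB = 1024L * 1024L;

    // Assumed derivation: total input size divided by the requested map count
    // (mapred.map.tasks); a request of 0 is treated as 1.
    static long goalSize(long totalSize, int requestedMaps) {
        return totalSize / (requestedMaps == 0 ? 1 : requestedMaps);
    }

    // Same formula as the CDH 4.2.0 snippet in this post.
    static long splitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long totalSize = 123 * MB;  // large table from the post
        long blockSize = 128 * MB;  // dfs.block.size from the post
        long minSize = 1;           // assumption: effectively no minimum

        // Hypothetical values for mapred.map.tasks.
        for (int maps : new int[] {1, 2, 4, 8}) {
            long size = splitSize(goalSize(totalSize, maps), minSize, blockSize);
            System.out.printf("mapred.map.tasks=%d -> split size %.2f MB%n",
                    maps, size / (double) MB);
        }
    }
}
```

As long as the goal size stays below the block size and above the min split size, the split size simply follows the goal size, so asking for more maps yields proportionally smaller splits.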


This article is from the blog "MIKE's old blog". Please keep this source link: http://boylook.blog.51cto.com/7934327/1298637
