A Simple HQL Optimization

Online job migration: while migrating jobs from GP to Hadoop, we found that some jobs that ran in 2-3 minutes on GP took about 10 minutes on Hadoop. This hurts the migration. One clear example is the following query:

INSERT INTO table_big PARTITION (dt = today) SELECT xxx FROM table_hour_incremental a, table_big b WHERE a.id = b.id AND b.dt = yesterday;

Check grace:

[Figure: map1.png]


Obviously, the bottleneck is concentrated in the second map: it ran for 207 seconds, while the reduce shuffle took less than [...] seconds. table_big is an external table, and checking its file shows that it is a single gz file of about [...] MB. The reason is now clear: a gz file cannot be split, so one map task has to read the whole file by itself. To address this, mapred.reduce.tasks = 8 is set for this job:

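This serves two purposes. First, it reduces the computing time of each reduce task. Second, it increases the number of files written to the today partition, so that tomorrow's run, which reads this partition, gets more map tasks. That second effect will only be visible tomorrow :P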
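As a rough sketch, the setting and the query could be run together in one Hive session as shown below. `today`, `yesterday` and `xxx` are the same placeholders used in the query above, not real values. `mapred.reduce.tasks` is the classic property name; on newer Hadoop versions it is a deprecated alias of `mapreduce.job.reduces`.

```sql
-- Sketch only: today, yesterday and xxx are placeholders from the article.
-- Request 8 reduce tasks for the jobs launched from this session.
SET mapred.reduce.tasks=8;

-- Same query as above; each reduce task writes its own output file,
-- so the today partition is expected to end up with 8 files.
INSERT INTO table_big PARTITION (dt = today)
SELECT xxx
FROM table_hour_incremental a, table_big b
WHERE a.id = b.id
  AND b.dt = yesterday;
```

The setting only applies to the current session, so other jobs keep their own reducer counts.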

[Figure: map2.png]

As shown, the computing time of each reduce task has dropped to 30 seconds. At the same time, the today partition now consists of 8 small files of 30 MB each, which prepares the ground for more map tasks in the next run.
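One way to check the files in a partition from the Hive CLI is sketched below. The partition value `<today>` and the HDFS path are hypothetical placeholders, since the article does not give the table's location.

```sql
-- Print the partition's metadata, including its Location on HDFS.
-- '<today>' is a placeholder for the actual partition value.
DESCRIBE FORMATTED table_big PARTITION (dt = '<today>');

-- List the files under that location from inside the Hive CLI.
-- The path is a placeholder; substitute the Location printed above.
dfs -ls /path/to/table_big/dt=<today>;
```

The listing shows the number of files and their sizes, which is what determines how many map tasks the next run can use when the files are not splittable.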


This article is from the blog "MIKE's old blog". Please be sure to keep this source: http://boylook.blog.51cto.com/7934327/1301072
