Online job migration: while migrating jobs from GP (Greenplum) to Hadoop, we found that some jobs that ran in 2-3 minutes on GP took about 10 minutes on Hadoop, which hurts the migration. A typical offending query looks like this:
INSERT INTO TABLE table_big PARTITION (dt = today) SELECT xxx FROM table_hour_incremental a, table_big b WHERE a.id = b.id AND b.dt = yesterday;
Checking the job's execution trace:
[Figure: map1.png]
Clearly the bottleneck is concentrated in the second map stage: the reduce ran for 207 seconds, while its shuffle took only a few seconds. table_big is an external table, and inspecting its files showed the partition is a single gzip (.gz) file; since gzip is not splittable, the whole file can only be read by one mapper. With the cause clear, mapred.reduce.tasks=8 was set for this job:
This helps in two ways: first, it cuts the compute time of each reduce; second, it increases the number of files written into the today partition, so the next day's run gets more map tasks. The effect will show tomorrow :P
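Put together, the fix is just one extra line before the query. This is a sketch: `today`/`yesterday` stand for the actual date literals and `xxx` for the original select list, as in the query above.

```sql
-- Force 8 reducers: each reduce handles 1/8 of the work, and the
-- insert writes 8 output files into the today partition, so the
-- next day's scan of dt=yesterday runs with 8 map tasks.
SET mapred.reduce.tasks=8;

INSERT INTO TABLE table_big PARTITION (dt = 'today')
SELECT xxx
FROM table_hour_incremental a
JOIN table_big b ON a.id = b.id
WHERE b.dt = 'yesterday';
```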
[Figure: map2.png]
As the figure shows, each reduce's compute time has dropped to about 30 seconds, and the today partition now holds 8 small files of roughly 30 MB each, ready to provide more map tasks on the next run.
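One follow-up worth noting (my own addition, not from the original post): Hive can merge small output files after a job, which would undo this trick. If file merging is enabled in your environment, it would need to be switched off for this job so the 8 files survive:

```sql
-- Keep Hive from coalescing the 8 output files back into one;
-- we want them separate so tomorrow's scan gets 8 map tasks.
SET hive.merge.mapfiles=false;      -- merging of map-only job output
SET hive.merge.mapredfiles=false;   -- merging of map-reduce job output
```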
This article is from the blog "MIKE's old blog"; please be sure to keep this source: http://boylook.blog.51cto.com/7934327/1301072