Author: Liu Xuhui Raymond
Please credit the source when reprinting.
Email: colorant at 163.com
Blog: http://blog.csdn.net/colorant/
I recently ran an MR job on a 1 + 4 node cluster using the stable Hadoop 1.0.4 release and ran into latency problems. I am recording them here to share.
In Hadoop 1.0.4, the JobTracker's default minimum heartbeat interval is 3 seconds, and by default the TaskTracker reports task completion status and requests new tasks only in its heartbeat packets.
This setting keeps the JobTracker in a large cluster from being flooded with heartbeats and falling behind on task scheduling. In a small cluster, however, it makes task scheduling latency relatively high, so for jobs with a small data volume and many map tasks the overall overhead is very large.
In my 1 + 4 node test cluster, I scan an HBase table with 480 regions. Each machine runs 24 map tasks at a time, so five batches are required in total. Completing such an MR job takes about 64 seconds, which can be regarded as overhead of the MR framework itself.
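The batch count follows from simple slot arithmetic. A minimal sketch in Python, assuming that "1 + 4" means one master plus four worker nodes and that each region becomes one map task (both fit the numbers above but are not stated explicitly); the function name is made up for illustration:

```python
import math

def map_waves(num_tasks: int, workers: int, slots_per_worker: int) -> int:
    """Number of scheduling batches needed to run all map tasks."""
    total_slots = workers * slots_per_worker
    return math.ceil(num_tasks / total_slots)

# Figures from the test cluster; one map task per region is an assumption.
waves = map_waves(num_tasks=480, workers=4, slots_per_worker=24)
per_wave_seconds = 64 / waves

print(waves)             # 5
print(per_wave_seconds)  # 12.8
```

At roughly 12.8 seconds per batch for a scan over little data, most of the time goes to scheduling rather than to the map tasks.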
To speed up scheduling, you can set the following parameters in mapred-site.xml:
<property>
  <name>mapreduce.tasktracker.outofband.heartbeat</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.tasktracker.outofband.heartbeat.damper</name>
  <value>5</value>
</property>
In essence, this lets the TaskTracker report to the JobTracker sooner when a task completes. The larger the damper value, the greater the acceleration. However, there seems to be a bug: if the damper is not set, it defaults to 100000, and the TaskTracker then shows more than 80% CPU usage even when it is idle.
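To illustrate why a larger damper means faster reporting, here is a small Python model. It assumes the TaskTracker shortens its wait to the base interval divided by (finished tasks × damper + 1). That formula is my recollection of the Hadoop 1.x TaskTracker, not something verified here, so treat it as an assumption; the function name is hypothetical.

```python
def oob_heartbeat_wait_ms(base_interval_ms: int, finished_tasks: int, damper: int) -> int:
    """Assumed model of the wait before the next heartbeat, in milliseconds.

    With no finished tasks the divisor is 1, so the normal interval applies.
    """
    return base_interval_ms // (finished_tasks * damper + 1)

base = 3000  # the 3-second minimum heartbeat interval of Hadoop 1.0.4

# Wait after one task finishes, for a few damper values.
for damper in (1, 5, 100000):
    print(damper, oob_heartbeat_wait_ms(base, finished_tasks=1, damper=damper))
```

Under this model, a damper of 5 cuts the wait after a finished task from 3000 ms to 500 ms, while a very large damper drives it to 0 ms.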
In the 480-region example above, the completion time of the entire job drops to about 48 seconds.
However, since the JobTracker's minimum heartbeat interval is still 3 s, even the shortest task takes at least 3 s to complete.
In Hadoop 1.1.1 (or a nearby version), the JobTracker's minimum heartbeat interval was changed to 300 milliseconds, which improves the scheduling latency of small tasks.
With Hadoop 1.1.1, the job completion time in the same 480-region example drops to about 30 seconds, and the shortest task can now complete within 0.3 seconds.
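The three measurements can be put side by side. A short Python sketch that uses only the timings reported in this post (the labels and the helper name are mine):

```python
# Measured job times (seconds) for the 480-region scan, as reported above.
runs = [
    ("Hadoop 1.0.4, defaults", 64),
    ("Hadoop 1.0.4, out-of-band heartbeat", 48),
    ("Hadoop 1.1.1", 30),
]

baseline = runs[0][1]

def reduction_percent(seconds: float, baseline_seconds: float) -> float:
    """Percentage of the baseline time that was saved, to one decimal place."""
    return round(100 * (baseline_seconds - seconds) / baseline_seconds, 1)

for label, seconds in runs:
    saved = reduction_percent(seconds, baseline)
    print(f"{label}: {seconds} s ({saved}% less than the baseline)")
```

The out-of-band heartbeat saves 25% of the baseline time, and the shorter minimum heartbeat in 1.1.1 saves a little over half.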