Troubleshooting a JobTracker congestion problem


Hadoop version 1.0.3

Problem Description:

As the number of daily MapReduce jobs grew, users were often blocked when submitting jobs, which means the JobTracker was congested. Once this started happening frequently, we increased the number of RPC handler threads on the JobTracker side and periodically analyzed the JobTracker's stack traces: whenever all the RPC handler threads were blocked, we dumped the stack information and raised an alarm in time.

Causes and solutions:

After analyzing several JobTracker stack dumps, we found that every time the JobTracker became congested, the RPC handler threads were ultimately all contending for the lock on the dataQueue object.

For example, these RPC calls from compute nodes/clients into the JobTracker:

(1) org.apache.hadoop.mapred.JobTracker.getReduceTaskReports
(2) org.apache.hadoop.mapred.JobTracker.getMapTaskReports
(3) org.apache.hadoop.mapred.JobTracker.heartbeat
(4) org.apache.hadoop.mapred.JobTracker.getJobStatus
...

These methods all need to acquire the JobTracker object lock: either they are declared synchronized, or they synchronize on this inside the method body (see the JobTracker source for details).
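To make the contention concrete, here is a minimal, self-contained Java sketch of the two locking forms involved (the class and method bodies are invented for illustration, not the actual Hadoop source). Any two such calls on the same instance serialize on a single lock:

// Minimal sketch, NOT the real JobTracker: shows why these RPC methods
// serialize -- every call must hold the same object monitor.
public class JobTrackerLockSketch {

    // Form 1: a synchronized instance method locks `this` for its duration.
    public synchronized String heartbeat(String taskTracker) {
        return "heartbeat from " + taskTracker;
    }

    // Form 2: an explicit synchronized (this) block inside the method body.
    public String getJobStatus(String jobId) {
        synchronized (this) {
            return "status of " + jobId;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        JobTrackerLockSketch jt = new JobTrackerLockSketch();
        // Two "RPC handler" threads: while one holds the monitor,
        // the other blocks, just as the IPC handler threads do in
        // the stack dumps discussed below.
        Thread a = new Thread(() -> System.out.println(jt.heartbeat("tracker-1")));
        Thread b = new Thread(() -> System.out.println(jt.getJobStatus("job_001")));
        a.start(); b.start();
        a.join();  b.join();
    }
}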

The JobTracker thread stack dumps showed that all of these threads were fighting over the JobTracker object lock:

<0x000000050ce06ae8> (a org.apache.hadoop.mapred.JobTracker)


Meanwhile the thread that held the JobTracker object lock, IPC Handler 14, was itself blocked in a call to DFSClient's writeChunk method, which needs to write data into the dataQueue.

The dataQueue object lock <0x0000000753bd9a48> (a java.util.LinkedList) was held by the DataStreamer thread, which was writing a job history file to HDFS.
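For readers unfamiliar with DFSClient's internals, the following is a minimal sketch, with simplified names and types, of the producer/consumer pattern the stack traces point at: writeChunk enqueues packets under the dataQueue lock while a DataStreamer thread drains them toward the DataNode pipeline. This is illustrative only, not the real DFSClient code:

import java.util.LinkedList;

// Illustrative sketch of the dataQueue producer/consumer pattern.
public class DataQueueSketch {

    private static final int MAX_QUEUE = 80;        // cap on queued packets
    private final LinkedList<byte[]> dataQueue = new LinkedList<>();

    // Producer: in our incident this ran on an IPC handler thread that
    // was still holding the JobTracker object lock.
    public void writeChunk(byte[] packet) throws InterruptedException {
        synchronized (dataQueue) {
            while (dataQueue.size() >= MAX_QUEUE) {
                dataQueue.wait();                   // blocks until the streamer drains
            }
            dataQueue.addLast(packet);
            dataQueue.notifyAll();
        }
    }

    // Consumer: sends one packet at a time toward the DataNodes.
    public void startStreamer() {
        Thread streamer = new Thread(() -> {
            try {
                while (true) {
                    byte[] packet;
                    synchronized (dataQueue) {
                        while (dataQueue.isEmpty()) {
                            dataQueue.wait();
                        }
                        packet = dataQueue.removeFirst();
                        dataQueue.notifyAll();
                    }
                    sendToPipeline(packet);         // slow network I/O, outside the lock
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "DataStreamer");
        streamer.setDaemon(true);
        streamer.start();
    }

    private void sendToPipeline(byte[] packet) {
        // stand-in for writing the packet to the DataNode pipeline
    }
}

If sendToPipeline is slow, as HDFS writes are, the queue stays full and every writeChunk caller blocks on the dataQueue lock; in our case that caller was a thread still holding the JobTracker object lock, so every other RPC piled up behind it.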


Almost every time congestion occurred, the DataStreamer thread was writing a history file. Writing to HDFS is a fairly slow operation, so could the bottleneck be here?

We picked a job history file at random and inspected it through the HDFS web interface on port 50070:

To our surprise, its block size was only 3 KB.

The problem is that some job history files are quite large (MB-level; for example, when a user's job uploads dependency jar packages, the history file grows very large).

Suppose a job history file is 3 MB and the block size is 3 KB: the client has to request about 1K blocks from the NameNode and set up about 1K pipelines to the DataNodes. No wonder writeChunk spends so long waiting on the dataQueue object lock, since each pipeline only transfers 3 KB of data at a time.
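As a quick sanity check on that arithmetic (illustrative Java, nothing Hadoop-specific):

// 3 MB file divided into 3 KB blocks = 1024 block allocations.
public class BlockCount {
    public static void main(String[] args) {
        long fileSize  = 3L * 1024 * 1024;                    // 3 MB history file
        long blockSize = 3L * 1024;                           // misconfigured 3 KB blocks
        long blocks = (fileSize + blockSize - 1) / blockSize; // round up
        System.out.println(blocks + " block allocations");    // prints: 1024 block allocations
    }
}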

Reading the JobTracker code, we found that the block size of the job history file is set by the parameter mapred.jobtracker.job.history.block.size, whose default value is 3 MB. Some predecessor had, for reasons unknown, changed it to 3 KB; after adjusting it back to 3 MB, the JobTracker was no longer congested. In addition, we split the JobHistoryServer out of the JobTracker and ran it on a dedicated machine to further relieve pressure on the JobTracker. The corrected setting:

<property>
  <name>mapred.jobtracker.job.history.block.size</name>
  <value>3145728</value>
  <description>The block size of the job history file. Since job recovery
  uses job history, it is important to dump job history to disk as soon
  as possible. Note that this is an expert-level parameter. The default
  value is set to 3 MB.</description>
</property>

(PS: Anyone familiar with Hadoop 1.x knows that the JobTracker couples resource management and job monitoring in a single process, which limits its job throughput. Hadoop 2.x (YARN) splits the JobTracker's responsibilities into the ResourceManager and per-application ApplicationMasters, which fundamentally solves the JobTracker's low job throughput.)

