Hadoop O&M: jobtracker stops serving for no reason

Source: Internet
Author: User

This afternoon, a colleague submitted a query through Hive and it failed with an execution error.

Opening the jobtracker management page, I found that the number of running jobs was zero while the tasktracker heartbeats were all normal. This made me suspect that the jobtracker had stopped serving requests (the number of jobs running in this cluster is normally small), so I manually submitted a mapred job as a test. It failed with the following errors:

12/07/03 18:07:22 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
12/07/03 18:07:22 INFO hdfs.DFSClient: Abandoning block blk_-1772232086636991458_5671628
12/07/03 18:07:28 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
12/07/03 18:07:28 INFO hdfs.DFSClient: Abandoning block blk_-2108024038073283869_5671629
12/07/03 18:07:34 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink as 192.168.1.25:50010
12/07/03 18:07:34 INFO hdfs.DFSClient: Abandoning block blk_-6674020380591432013_5671629
12/07/03 18:07:40 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink as 192.168.1.26:50010
12/07/03 18:07:40 INFO hdfs.DFSClient: Abandoning block blk_-3788726859662311832_5671629
12/07/03 18:07:46 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3002)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
12/07/03 18:07:46 WARN hdfs.DFSClient: Error Recovery for block blk_-3788726859662311832_5671629 bad datanode[2] nodes == null
12/07/03 18:07:46 WARN hdfs.DFSClient: Could not get block locations. Source file "/tmp/hadoop-hadoop/mapred/staging/hadoop/.staging/job_201206270914_17301/job.jar" - Aborting...

From the namenode log, we can see that block blk_-2108024038073283869_5671629 belongs to the job's jar package:

2012-07-03 18:07:27,316 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /tmp/hadoop-hadoop/mapred/staging/hadoop/.staging/job_201206270914_17301/job.jar. blk_-2108024038073283869_5671629

Checking the log on the corresponding datanode shows that the block never made it there. This raises the question: the jobtracker requested storage for the job's files from the namenode, the namenode correctly allocated the resources (a list of datanodes), and yet the subsequent attempt to write to those datanodes failed, even though the datanodes themselves were still up and serving data normally. Why did the write to the datanodes fail?

Carefully checking the datanode log around the time of the failure turns up the following message:

2012-07-03 18:07:10,274 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.1.25:50010, storageID=DS-841642307-50010-1324273874581, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 257 exceeds the limit of concurrent xcievers 256

A Baidu search for the error message "xceiverCount 257 exceeds the limit of concurrent xcievers 256" shows that it is caused by this configuration item:

<property>
    <name>dfs.datanode.max.xcievers</name>
    <value>256</value>
</property>

dfs.datanode.max.xcievers is similar to the file-handle limit in Linux: it caps the number of concurrent connections (DataXceiver threads) a datanode will serve. Once the number of connections on a datanode reaches the configured limit, the datanode rejects further connections.
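To make the failure mode concrete, here is a minimal illustrative sketch (not Hadoop's actual implementation) of a server that admits at most a fixed number of concurrent connections and rejects the next one, mirroring the DataXceiver error in the log above:

```python
import threading

class XceiverServer:
    """Toy model of a datanode's connection-admission check.

    Mimics how DataXceiverServer refuses a connection once the number of
    active xceiver threads reaches dfs.datanode.max.xcievers.
    """

    def __init__(self, max_xceivers=256):
        self.max_xceivers = max_xceivers
        self.active = 0
        self.lock = threading.Lock()

    def accept(self):
        with self.lock:
            if self.active >= self.max_xceivers:
                # The 257th connection against a limit of 256 fails here.
                raise IOError(
                    f"xceiverCount {self.active + 1} exceeds the limit "
                    f"of concurrent xcievers {self.max_xceivers}")
            self.active += 1

    def release(self):
        with self.lock:
            self.active -= 1

server = XceiverServer(max_xceivers=256)
for _ in range(256):
    server.accept()   # the first 256 connections are admitted
try:
    server.accept()   # the 257th is refused, as in the datanode log
except IOError as e:
    print(e)          # xceiverCount 257 exceeds the limit of concurrent xcievers 256
```

This is why the client saw EOFException and "Bad connect ack" on each datanode it tried: the datanodes were healthy but simply refusing new connections.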

All right, the problem is found. All that remains is to find a maintenance window to update the configuration on every datanode in the cluster and raise dfs.datanode.max.xcievers to a larger value.
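As a sketch, the entry in hdfs-site.xml on each datanode could be raised like this (4096 is an illustrative value often suggested for write-heavy workloads; size it to your own cluster, and note that each datanode must be restarted for the change to take effect):

```xml
<!-- hdfs-site.xml on every datanode; requires a datanode restart -->
<property>
    <name>dfs.datanode.max.xcievers</name>
    <!-- 4096 is an example value, not a recommendation for all clusters -->
    <value>4096</value>
</property>
```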
