1. When multiple users operate on HDFS and HBase, the following exception occurs on the client side: it cannot connect to the datanode and therefore cannot obtain data.
INFO hdfs.DFSClient: Could not obtain block blk_-3181406624357578636_19200 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
13/07/23 09:06:39 WARN hdfs.DFSClient: Failed to connect to /192.168.3.4:50010, add to deadNodes and continue
java.net.SocketException:
2. Run hadoop fsck / to check the HDFS files. The result is HEALTHY, indicating that the block data is intact and the namenode and datanodes are consistent.
3. Check the datanode log.
The DataXceiver thread count turns out to be the problem: the number of concurrent DataXceiver threads exceeds the limit of 4096, so the datanode can no longer serve reads or writes. The limit had previously been raised to 4096, which now proves too small.
Raise the limit in hdfs-site.xml and restart the datanodes:
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>12288</value>
</property>
4. The problem persists, and the datanode log now reports the following error:
2012-06-18 17:47:13 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(x.x.x.x:50010, storageID=DS-196671195-10.10.120.67-50010-1334328338972, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected
According to https://issues.apache.org/jira/browse/HDFS-3555, this is a client problem: the client stops reading, so the datanode cannot write data to it. Re-checking the code (the files are large), we found that the input stream is never closed after each read.
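A minimal sketch of the fix, assuming the standard Hadoop FileSystem client API; HdfsReader and readChunk are hypothetical names, and try-with-resources needs Java 7+ (on older JVMs, close the stream in a finally block):

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReader {
        // Read the first len bytes of an HDFS file and close the stream.
        // Every open stream pins a DataXceiver thread on the datanode, so
        // leaked streams eventually exhaust dfs.datanode.max.xcievers.
        public static byte[] readChunk(FileSystem fs, Path path, int len) throws IOException {
            byte[] buf = new byte[len];
            try (FSDataInputStream in = fs.open(path)) {
                in.readFully(0, buf); // positional read; throws if the file is shorter
            }                         // stream is closed here, even on error
            return buf;
        }
    }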
5. After restarting the cluster, HBase cannot load the .META. table at startup:
org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region is not online: .META.,,1
http://www.zihou.me/html/2013/06/27/8673.html
Following that post solved the problem; after the restart, HBase worked fine.
6. When reading data from multiple threads, regions still stop serving after a while. This must again be a datanode problem, yet dfs.datanode.max.xcievers had already been raised. Checking the code further, we found that a new FileSystem instance is obtained for every read and never closed. After changing this, the problem is solved.
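A sketch of the corrected pattern: reuse one process-wide instance instead of opening a new one per read. SharedFileSystem and the namenode URI here are hypothetical:

    import java.io.IOException;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class SharedFileSystem {
        // One FileSystem per process is enough: the instance is thread-safe,
        // so all reader threads can share it.
        private static volatile FileSystem fs;

        public static FileSystem get(Configuration conf) throws IOException {
            if (fs == null) {
                synchronized (SharedFileSystem.class) {
                    if (fs == null) {
                        // hypothetical namenode address
                        fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
                    }
                }
            }
            return fs;
        }

        // Close once, at process shutdown, not after every read.
        public static synchronized void shutdown() throws IOException {
            if (fs != null) {
                fs.close();
                fs = null;
            }
        }
    }

Note that FileSystem.get() caches instances per URI and user by default, so calling close() on the instance one thread obtained would also break every other caller of get(); either share a single instance and close it only at shutdown, as above, or use FileSystem.newInstance() when a private copy is genuinely needed.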
Conclusion: this problem actually has nothing to do with the cluster; with correct client code, a dfs.datanode.max.xcievers of 4096 should have been enough.
The code had two problems:
First, the file stream was not closed after each read.
Second, a new FileSystem instance was obtained for every read instead of being reused.