HBase error records and how to fix them

HBase runs into various problems during operation. Most of them can be solved by adjusting the configuration files; in a few cases you may also need to modify the source code.

1. When the level of concurrency on HBase rises, it frequently hits the "too many open files" error. The log records are as follows:
16:05:22,776 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
16:05:22,776 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_3790131629645188816_18192

16:13:01,966 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read: java.io.IOException: Could not obtain block: blk_-299035636445664257843 file=/hbase/sendreport/logs/data/17703aa901934b39bd3b2e2d18c671b4.9a84770c805c78d2ff19ceff6fecb972
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1812)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1638)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1767)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1695)
    at java.io.DataInputStream.readBoolean(DataInputStream.java:242)
    at org.apache.hadoop.hbase.io.Reference.readFields(Reference.java:116)
    at org.apache.hadoop.hbase.io.Reference.read(Reference.java:149)
    at org.apache.hadoop.hbase.regionserver.StoreFile.<init>(StoreFile.java:216)
    at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:282)
    at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:221)
    at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:2510)
    at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:449)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3228)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3176)
    at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:331)
    at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:107)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)

Cause and fix: On Linux the default limit on the number of files a process can have open is 1024. It can be raised at runtime with ulimit -n 65535, but that setting is lost after a reboot. To make it permanent, there are three ways to change it (a quick way to verify the limits is sketched after the list):

1. Add a line "ulimit -SHn 65535" to /etc/rc.local.
2. Add a line "ulimit -SHn 65535" to /etc/profile.
3. Add the following two lines at the end of /etc/security/limits.conf:

* soft nofile 65535
* hard nofile 65535
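
Whether the new limit is actually in effect can be checked from the shell. This is a minimal sketch; it assumes the HBase/HDFS daemons run as the user hadoop and that jps is available to find the RegionServer process:

# limits that a fresh login shell for the hadoop user would get
su - hadoop -c 'ulimit -Sn; ulimit -Hn'

# limit and current file-descriptor usage of a running RegionServer
RS_PID=$(jps | awk '/HRegionServer/ {print $1}')
grep 'Max open files' /proc/$RS_PID/limits
ls /proc/$RS_PID/fd | wc -l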
2. Two timeout settings come into play during HDFS writes: dfs.socket.timeout and dfs.datanode.socket.write.timeout. It is tempting to assume that only dfs.datanode.socket.write.timeout needs to be changed, but the error that actually occurs is a read timeout (READ_TIMEOUT). The default values are:

// Timeouts for communicating with DataNode for streaming writes/reads

public static int READ_TIMEOUT = 60 * 1000; // the timeout actually observed exceeds this value
public static int READ_TIMEOUT_EXTENSION = 3 * 1000;
public static int WRITE_TIMEOUT = 8 * 60 * 1000;
public static int WRITE_TIMEOUT_EXTENSION = 5 * 1000; // for the write pipeline

The log:

11/10/12 10:50:44 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_8540857362443890085_4341_470 java.net.SocketTimeoutException: 66000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.*.*.*:14707 remote=/*.*.*.24:80010]
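
The 66000 ms in that message is consistent with how the HDFS client appears to derive the effective read timeout for a write pipeline: the base READ_TIMEOUT plus one READ_TIMEOUT_EXTENSION per DataNode in the pipeline. The snippet below is only an illustration of that arithmetic, not the actual DFSClient code, and the two-node pipeline is an assumption chosen to match the observed value:

// Hypothetical illustration: why the log reports 66000 ms instead of the 60000 ms default
public class TimeoutArithmetic {
    static final int READ_TIMEOUT = 60 * 1000;
    static final int READ_TIMEOUT_EXTENSION = 3 * 1000;

    public static void main(String[] args) {
        int datanodesInPipeline = 2; // assumption: a write pipeline of two DataNodes
        int effectiveTimeout = READ_TIMEOUT + READ_TIMEOUT_EXTENSION * datanodesInPipeline;
        System.out.println(effectiveTimeout); // 66000, matching the SocketTimeoutException
    }
}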

Cause and fix: Since the failure is caused by these timeouts, add the following configuration to the hadoop-site.xml configuration file:

<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>3000000</value>
</property>

<property>
  <name>dfs.socket.timeout</name>
  <value>3000000</value>
</property>
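
The values are in milliseconds, so 3000000 corresponds to 50 minutes. The settings have to be seen both by the DataNodes and by the HDFS client embedded in HBase, which normally means restarting HDFS and HBase after the file has been copied to every node. A minimal sketch, assuming the standard control scripts shipped with Hadoop and HBase:

# restart HDFS so the DataNodes pick up the new timeouts
$HADOOP_HOME/bin/stop-dfs.sh
$HADOOP_HOME/bin/start-dfs.sh

# restart HBase so its HDFS client re-reads the configuration
$HBASE_HOME/bin/stop-hbase.sh
$HBASE_HOME/bin/start-hbase.sh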

3. Hadoop reports "Incompatible namespaceIDs"

Workaround 1: Start from scratch

I can testify that the following steps solve this error, but the side effects won't make you happy (me neither). The crude workaround I have found is to:

1. Stop the cluster.

2. Delete the data directory on the problematic DataNode: the directory is specified by dfs.data.dir in conf/hdfs-site.xml; if you followed this tutorial, the relevant directory is /usr/local/hadoop-datastore/hadoop-hadoop/dfs/data.

3. Reformat the NameNode (NOTE: all HDFS data is lost during this process!).

4. Restart the cluster (a command-line sketch of these steps follows).
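
These steps map roughly onto the following commands. This is a minimal sketch that assumes the classic start-all.sh/stop-all.sh scripts and the data directory used in the tutorial; adjust the path to your own dfs.data.dir, and remember that reformatting destroys all HDFS data:

# 1. stop the cluster
$HADOOP_HOME/bin/stop-all.sh

# 2. delete the data directory on the problematic DataNode (tutorial path; adjust to your dfs.data.dir)
rm -rf /usr/local/hadoop-datastore/hadoop-hadoop/dfs/data

# 3. reformat the NameNode -- this destroys ALL data in HDFS
$HADOOP_HOME/bin/hadoop namenode -format

# 4. restart the cluster
$HADOOP_HOME/bin/start-all.sh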

If deleting all the HDFS data and starting from scratch does not sound like a good idea (it might be OK during the initial setup/testing), you might give the second approach a try.

Workaround 2: Updating the namespaceID of problematic DataNodes

Big thanks to Jared Stehler for the following suggestion. I have not tested it myself yet, but feel free to try it out and send me your feedback. This workaround is "minimally invasive", as you only have to edit one file on the problematic DataNodes:

1. Stop the DataNode.

2. Edit the value of namespaceID in <dfs.data.dir>/current/VERSION to match the value of the current NameNode.

3. Restart the DataNode.

If you followed the instructions in my tutorials, the full path of the relevant file is /usr/local/hadoop-datastore/hadoop-hadoop/dfs/data/current/VERSION (background: dfs.data.dir is by default set to ${hadoop.tmp.dir}/dfs/data, and we set hadoop.tmp.dir to /usr/local/hadoop-datastore/hadoop-hadoop).

If you wonder what the contents of VERSION look like, here is one of mine:

# contents of <dfs.data.dir>/current/VERSION
namespaceID=393514426
storageID=DS-1706792599-10.10.10.1-50010-1204306713481
cTime=1215607609074
storageType=DATA_NODE
layoutVersion=-13
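
The edit itself can be scripted. A minimal sketch, assuming the tutorial's data directory, the classic hadoop-daemon.sh control script, and that the NameNode's namespaceID (the example value 393514426 above) has already been looked up in the NameNode's own current/VERSION file:

DATA_DIR=/usr/local/hadoop-datastore/hadoop-hadoop/dfs/data   # dfs.data.dir on this DataNode
NN_NAMESPACE_ID=393514426                                     # value copied from the NameNode's VERSION file

# 1. stop the DataNode
$HADOOP_HOME/bin/hadoop-daemon.sh stop datanode

# 2. rewrite the namespaceID line in the DataNode's VERSION file
sed -i "s/^namespaceID=.*/namespaceID=$NN_NAMESPACE_ID/" "$DATA_DIR/current/VERSION"

# 3. restart the DataNode
$HADOOP_HOME/bin/hadoop-daemon.sh start datanode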
