HBase client access timeouts: causes and parameter tuning

Source: Internet
Author: User

The default HBase client configuration is not tuned, so for an HBase cluster that must respond with low latency, the client parameters need to be optimized.

1. hbase.rpc.timeout

The timeout for every HBase RPC, in milliseconds. The default is 60000 (60s).

This parameter is the timeout for a single RPC request. If an RPC takes longer than this value, the client actively closes the socket.

If the exception java.io.IOException: Connection reset by peer appears frequently, the HBase cluster is probably handling a large volume of highly concurrent reads and writes, or the server side is suffering serious full GCs, so some requests cannot be processed in time and exceed the configured timeout.

Depending on the actual situation, it can be changed to 5000, that is, 5s.

2. hbase.client.retries.number

The maximum number of client retries. It is the upper bound for all retryable operations, such as getting the root region from the root RegionServer, getting a cell's value, and starting a row update.

The default is 35; it can be set to 3.

3. hbase.client.pause

The general client pause value (the sleep time between retries). It is mostly used as the wait time before retrying a failed get or a failed region lookup.

Starting with HBase 1.1 this value defaults to 100ms, which is reasonable. If your version uses a different default, it is recommended to change it to 100ms.
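To get a feel for how hbase.client.retries.number and hbase.client.pause interact, the following rough estimate adds up the time a client could spend sleeping between retries. It is only a sketch under stated assumptions: the class and method names are hypothetical, the multiplier table is modeled on what HBase 1.x is understood to use (HConstants.RETRY_BACKOFF) and should be verified for your version, and the small random jitter HBase adds is ignored.

```java
final class RetryPauseEstimate {
    // Assumed backoff multipliers, modeled on HBase 1.x HConstants.RETRY_BACKOFF.
    // Verify against your HBase version before relying on the numbers.
    static final int[] BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    /** Total sleep in milliseconds across the given number of retries, ignoring jitter. */
    static long totalPauseMillis(long pauseMillis, int retries) {
        long total = 0;
        for (int attempt = 0; attempt < retries; attempt++) {
            // Attempts beyond the end of the table reuse the last multiplier.
            int idx = Math.min(attempt, BACKOFF.length - 1);
            total += pauseMillis * BACKOFF[idx];
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("3 retries:  " + totalPauseMillis(100, 3) + " ms");
        System.out.println("35 retries: " + totalPauseMillis(100, 35) + " ms");
    }
}
```

Under this model, a 100ms pause with 3 retries sleeps for 600ms in total, while 35 retries can sleep for 528100ms (close to 9 minutes), which is why a low-latency client usually wants a small retry count.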

4. zookeeper.recovery.retry

The number of ZooKeeper retries; it can be adjusted to 3. ZooKeeper itself rarely goes down, but when the HBase cluster has problems, every HBase retry also retries its ZooKeeper operations.

The total number of ZooKeeper retries is:

hbase.client.retries.number * zookeeper.recovery.retry

In addition, the sleep time doubles with each retry (exponential growth with base 2), and every HBase access is retried. If a single HBase operation accesses ZooKeeper several times while ZooKeeper is unavailable, ZooKeeper is retried many times, which wastes a lot of time.

5. zookeeper.recovery.retry.intervalmill

The sleep time between ZooKeeper retries. The default is 1s; it can be reduced, for example to 200ms.
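The effect of the two ZooKeeper parameters can be worked out with a little arithmetic. The sketch below assumes the sleep time starts at zookeeper.recovery.retry.intervalmill and simply doubles on each retry with no upper cap; the class and method names are hypothetical and only illustrate the calculation.

```java
final class ZkRetryEstimate {
    /** Total sleep in milliseconds when the interval doubles on every ZooKeeper retry (no cap assumed). */
    static long zkSleepMillis(long intervalMillis, int zkRetries) {
        long total = 0;
        long sleep = intervalMillis;
        for (int i = 0; i < zkRetries; i++) {
            total += sleep;
            sleep *= 2; // exponential growth with base 2
        }
        return total;
    }

    /** Upper bound on ZooKeeper retries when every HBase retry exhausts its own ZooKeeper retries. */
    static long totalZkRetries(int hbaseRetries, int zkRetries) {
        return (long) hbaseRetries * zkRetries;
    }

    public static void main(String[] args) {
        System.out.println("sleep with 1000ms interval, 3 retries: " + zkSleepMillis(1000, 3) + " ms");
        System.out.println("sleep with 200ms interval, 3 retries:  " + zkSleepMillis(200, 3) + " ms");
        System.out.println("ZooKeeper retries with 35 x 3: " + totalZkRetries(35, 3));
        System.out.println("ZooKeeper retries with 3 x 3:  " + totalZkRetries(3, 3));
    }
}
```

With a 1s interval and 3 retries, one round of ZooKeeper retries sleeps for 7s in this model; with 200ms it sleeps for 1.4s. Lowering hbase.client.retries.number from 35 to 3 reduces the worst-case number of ZooKeeper retries from 105 to 9.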

6. hbase.client.operation.timeout

This parameter is the total timeout from the moment the HBase client initiates a data operation until it receives the response. Data operations include get, append, increment, delete, put, and so on. In other words, hbase.rpc.timeout is the timeout for one RPC, while hbase.client.operation.timeout is the timeout for one operation, which may involve multiple RPC requests.

Take a put request as an example. The client first wraps the request in a caller object, which sends an RPC request to the server. If the server happens to be in a serious full GC at that moment and the RPC times out, the resulting SocketTimeoutException corresponds to hbase.rpc.timeout. If instead a network glitch occurs just as the caller object sends the RPC and a network exception is thrown, the HBase client retries. If, after several retries, the total operation time runs out and a SocketTimeoutException is thrown, that timeout corresponds to hbase.client.operation.timeout.

The default is 1200000 (20 minutes); it can be set to 30000, that is, 30s.
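The relationship between the per-RPC timeout and the per-operation timeout can be illustrated with a small simulation. This is a simplified model, not HBase's actual retry code: it assumes every attempt hangs until its RPC timeout, that the client then sleeps for a fixed pause (no backoff), and that an attempt is cut short when the operation budget runs out. The class and method names are hypothetical.

```java
final class OperationBudget {
    /**
     * Counts how many RPC attempts are started before the operation deadline,
     * assuming each attempt times out and is followed by a fixed pause.
     */
    static int attemptsBeforeDeadline(long rpcTimeoutMs, long operationTimeoutMs,
                                      long pauseMs, int maxAttempts) {
        long elapsed = 0;
        int attempts = 0;
        while (attempts < maxAttempts && elapsed < operationTimeoutMs) {
            attempts++;
            // A single attempt cannot outlive the remaining operation budget.
            long remaining = operationTimeoutMs - elapsed;
            elapsed += Math.min(rpcTimeoutMs, remaining);
            elapsed += pauseMs;
        }
        return attempts;
    }

    public static void main(String[] args) {
        // Tuned values from the text: 5s per RPC, 30s per operation.
        System.out.println(attemptsBeforeDeadline(5000, 30000, 100, 35));
        // Defaults from the text: 60s per RPC, 1200000ms per operation.
        System.out.println(attemptsBeforeDeadline(60000, 1200000, 100, 36));
    }
}
```

In this model the operation timeout, not the retry count, is what ends the operation when the server never answers: with 5s RPCs and a 30s budget only 6 attempts are started, however many retries are allowed.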

7. hbase.regionserver.lease.period

The timeout set by hbase.client.operation.timeout covers essentially all HBase data operations except scan. Yet scan is the operation most likely to time out, and the one users care about most. HBase takes this into account and provides a separate timeout parameter for it: hbase.client.scanner.timeout.period.

This parameter is the timeout for each interaction with the RegionServer during a scan.

The default is 60s; it usually does not need to be adjusted.

Starting with HBase 1.1, hbase.regionserver.lease.period is renamed to hbase.client.scanner.timeout.period.

To better understand this parameter, we demonstrate a scan example:

package com.zy.hbase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class KylinScan {

    public static void main(String[] args) throws IOException {
        scan("Kylin");
    }

    public static void scan(String tbl) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // Local Hadoop path from the original example; adjust or remove for your environment.
        conf.set("hadoop_home", "d:\\iangshouzhuang\\hadoop-2.6.0");
        conf.set("hbase.zookeeper.quorum", "10.20.18.24,10.20.18.25,10.20.18.28");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        conf.set("zookeeper.znode.parent", "/hbase114");
        // Client-side timeouts discussed above, in milliseconds.
        conf.setInt("hbase.rpc.timeout", 20000);
        conf.setInt("hbase.client.operation.timeout", 30000);
        conf.setInt("hbase.client.scanner.timeout.period", 20000);

        Scan scan = new Scan();
        scan.setMaxResultSize(10000);
        // The original call had no argument; 100 is an assumed placeholder.
        scan.setCaching(100);

        // try-with-resources closes the scanner, the table, and the connection.
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf(tbl));
             ResultScanner rs = table.getScanner(scan)) {
            for (Result r : rs) {
                for (Cell cell : r.rawCells()) {
                    System.out.println(String.format(
                        "row:%s,family:%s, qualifier:%s, qualifiervalue:%s, timestamp:%s.",
                        Bytes.toString(CellUtil.cloneRow(cell)),
                        Bytes.toString(CellUtil.cloneFamily(cell)),
                        Bytes.toString(CellUtil.cloneQualifier(cell)),
                        Bytes.toString(CellUtil.cloneValue(cell)),
                        cell.getTimestamp()));
                }
            }
        }
    }
}

The output is:

row:100001,family:info, qualifier:id, qualifiervalue:1, timestamp:1469930920802.

row:100001,family:info, qualifier:name, qualifiervalue:hadoop, timestamp:1469930934184.

Many people mistakenly believe that a scan operation is a single RPC request. In fact, a scan that requests a large amount of data in one go can have several serious consequences: the server may see high IO utilization from the large number of IO operations, which affects other normal business requests; the large data transfer may occupy network bandwidth and other system resources; and the client may run out of memory (OOM) because it cannot hold all the data. To avoid these problems, HBase splits a large scan into multiple RPC requests based on the configured conditions, and each request returns only a specified number of results. The for (Result r : rs) statement in the preceding code is equivalent to repeatedly calling Result r = rs.next(), and a next() call makes the client send an RPC request whenever it needs more results from the server. The parameter hbase.client.scanner.timeout.period is the timeout for each such RPC request. The default is 60000ms, and a SocketTimeoutException is thrown once a request times out.
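The idea of splitting one scan into several batched requests can be sketched without any HBase dependency. The class below is a hypothetical stand-in, not the real ResultScanner: an in-memory list plays the role of the rows held by the RegionServer, and a counter records how many simulated RPCs were needed.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

final class BatchedScanner {
    private final List<String> serverRows;   // stands in for rows stored on the server
    private final int caching;               // rows fetched per simulated RPC
    private final Deque<String> cache = new ArrayDeque<>();
    private int serverPos = 0;
    int rpcCalls = 0;

    BatchedScanner(List<String> serverRows, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        this.serverRows = serverRows;
        this.caching = caching;
    }

    /** Returns the next row, or null when the scan is exhausted. */
    String next() {
        // Only go back to the "server" when the local cache is empty.
        if (cache.isEmpty() && serverPos < serverRows.size()) {
            rpcCalls++;
            int end = Math.min(serverPos + caching, serverRows.size());
            cache.addAll(serverRows.subList(serverPos, end));
            serverPos = end;
        }
        return cache.poll();
    }
}
```

Iterating over 250 rows with a batch size of 100 takes three simulated RPCs in this sketch, and each of them would be subject to its own timeout.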

Based on the description above, we look at two questions.

1. How many RPCs a scan operation is split into

The number of RPC requests in one scan depends mainly on two factors: the total number of rows the scan retrieves, and the number of rows requested by a single RPC. The ratio of the two is the number of RPC requests.

The total number of rows retrieved by a scan is determined by the conditions the user sets. For example, if a user wants to fetch all of one user's operation records for the last month in a single scan, and there are 100,000 (10w) such records, then the scan covers 100,000 rows. To keep one scan from requesting too much data, there is an additional parameter, maxResultSize, which limits the amount of data a scan can fetch at a time. Note that in the HBase API, Scan.setMaxResultSize takes a size in bytes rather than a row count. The default is -1, which means no limit is set on the scan. If the user sets the parameter, the data returned is bounded by whichever is smaller: this limit or the amount actually retrieved.

The number of rows requested by a single RPC is set by the caching parameter, which defaults to 100. Because the rows returned by each RPC are cached on the client, setting the value too high may cause the client to run out of memory, since too much data is fetched at once; setting it too low splits a large scan into too many RPCs, which makes the network cost high.
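The ratio described above is a ceiling division. A minimal sketch follows, with hypothetical names; it ignores any extra calls a real client makes to open or close the scanner.

```java
final class ScanRpcCount {
    /** Number of data-fetching RPCs needed to return totalRows with the given caching value. */
    static long rpcCount(long totalRows, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        // Ceiling division: a partially filled last batch still costs one RPC.
        return (totalRows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        System.out.println(rpcCount(100000, 100));   // 100,000 rows, caching 100
        System.out.println(rpcCount(100000, 1000));  // same scan, caching 1000
    }
}
```

For the 100,000-row example, a caching value of 100 means 1000 RPCs, while a value of 1000 means 100 RPCs, at the price of holding ten times as many rows in client memory per batch.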

2. RegionServer occasionally throws LeaseException during a scan

LeaseException brings the lease mechanism to mind, and indeed HBase uses a lease mechanism internally for a complete scan operation. Why is a lease mechanism needed? The answer lies in how the whole scan works. As mentioned above, a complete scan is usually split into multiple RPC requests. In the actual implementation, when the RegionServer receives the first RPC request, it generates a globally unique ID for the scan operation, called the scan ID. The RegionServer also does a lot of preparation: it builds the whole scan context and constructs all the objects that will be needed. Subsequent RPC requests only need to carry the same scan ID to reuse these already-built resources.

In other words, for the duration of the scan the client is holding server-side resources. If the client crashes unexpectedly, does that mean these resources are never released? The lease mechanism exists to solve this problem. When the RegionServer receives the first RPC, in addition to the globally unique scan ID it also creates a lease with a timeout. The timeout can be configured through the parameter hbase.regionserver.lease.period. If the next RPC request does not arrive within the timeout (for example, because the client processes results too slowly), the RegionServer assumes the client has failed, destroys the lease, and releases all resources held by the scan. When the client finishes processing and sends its next RPC, the RegionServer finds that the corresponding lease no longer exists and throws a LeaseException.
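The lease bookkeeping described above can be sketched as a small table from scan ID to expiry time. This is an illustrative model only, not RegionServer code: the class name is hypothetical, time is passed in explicitly so the behavior is easy to follow, and an IllegalStateException stands in for LeaseException.

```java
import java.util.HashMap;
import java.util.Map;

final class ScannerLeases {
    private final long leasePeriodMs;
    private final Map<Long, Long> expiryByScanId = new HashMap<>();
    private long nextScanId = 1;

    ScannerLeases(long leasePeriodMs) {
        this.leasePeriodMs = leasePeriodMs;
    }

    /** First RPC of a scan: allocate a scan ID and start its lease. */
    long open(long nowMs) {
        long scanId = nextScanId++;
        expiryByScanId.put(scanId, nowMs + leasePeriodMs);
        return scanId;
    }

    /** Follow-up RPC: renew the lease, or fail if it has already expired. */
    void next(long scanId, long nowMs) {
        // Drop every lease whose timeout has passed, releasing its resources.
        expiryByScanId.values().removeIf(expiry -> expiry <= nowMs);
        if (!expiryByScanId.containsKey(scanId)) {
            // Stand-in for the LeaseException seen by a slow client.
            throw new IllegalStateException("lease for scanner " + scanId + " does not exist");
        }
        expiryByScanId.put(scanId, nowMs + leasePeriodMs);
    }
}
```

With a 60s lease period, a client that calls next within 60s of its previous request keeps the scan alive, while a client that spends longer than that processing one batch finds its lease gone on the following call.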
