Usage problems and precautions for the HBase Thrift interface


HBase provides a Thrift interface to support non-Java languages. This article summarizes some problems we encountered, and the related precautions, based on our experience with the HBase Thrift interface (HBase version 0.92.1).
1. Byte storage order
In HBase, rows are sorted lexicographically by row key, column family, column qualifier, and timestamp. For short, int, long, and similar data types, Bytes.toBytes(...) therefore produces a byte array in big-endian order (the high-order byte at the lowest address, the low-order byte at the highest address), so that for non-negative numbers a byte-wise comparison matches the numeric order. The same applies to values. When using the Thrift API (C++, PHP, Python, etc.), it is therefore best to pack and unpack row keys and values in big-endian order.
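For example, in C++, an int variable can be converted to its lexicographically ordered (big-endian) form as follows. Appending the raw bytes of the variable would store them in host order, which is little-endian on x86, so the value goes through htonl() first:
#include <arpa/inet.h>  // htonl, ntohl (POSIX)
#include <cstdint>
#include <cstring>      // memcpy
#include <string>

std::string key;
int32_t timestamp = 1352563200;
uint32_t be = htonl(static_cast<uint32_t>(timestamp));  // host order -> big-endian
key.append(reinterpret_cast<const char*>(&be), sizeof(be));

To convert the lexicographically ordered bytes back to an int:
const char* ts = key.c_str();
uint32_t be;
std::memcpy(&be, ts, sizeof(be));  // avoids an unaligned read
int32_t timestamp = static_cast<int32_t>(ntohl(be));  // big-endian -> host order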
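
The same conversion can also be written with shifts, which does not depend on the byte order of the host or on platform headers. The following is a minimal sketch; the helper names appendBigEndian32 and readBigEndian32 are made up for this illustration and are not part of the HBase or Thrift API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: append a 32-bit value to `key` in big-endian order
// (most significant byte first), regardless of the host byte order.
inline void appendBigEndian32(std::string& key, std::uint32_t value) {
    for (int shift = 24; shift >= 0; shift -= 8) {
        key.push_back(static_cast<char>((value >> shift) & 0xFF));
    }
}

// Hypothetical helper: read a 32-bit big-endian value starting at `offset`.
// The caller must make sure that at least 4 bytes are available there.
inline std::uint32_t readBigEndian32(const std::string& key, std::size_t offset = 0) {
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        value = (value << 8) | static_cast<unsigned char>(key[offset + i]);
    }
    return value;
}
```

With this encoding, 1352563200 (0x509E7A00) becomes the four bytes 50 9E 7A 00. For non-negative values the encoded keys compare in the same order as the numbers themselves; for example, the key for 1 sorts before the key for 256, which would not be the case with little-endian bytes.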

In PHP, the pack and unpack functions perform the conversion ("N" means an unsigned 32-bit integer in big-endian order). Note that unpack returns an array indexed from 1:
$key = pack("N", $num);
$data = unpack("N", $key);
$num = $data[1];

2. TScan pitfalls
In the PHP Thrift interface of HBase, attributes of TScan such as startRow, stopRow, columns, and filterString can be set directly. They are null by default and become non-null once set, either through the TScan constructor or by assigning the member variables directly. When write() serializes the object for the RPC to the Thrift Server, every non-null attribute is sent through the Thrift protocol.
In the C++ Thrift interface, however, TScan has a member of type _TScan__isset, whose definition is as follows:
typedef struct _TScan__isset {
  _TScan__isset() : startRow(false), stopRow(false), timestamp(false), columns(false), caching(false), filterString(false) {}
  bool startRow;
  bool stopRow;
  bool timestamp;
  bool columns;
  bool caching;
  bool filterString;
} _TScan__isset;

The write() method of TScan checks the corresponding bool flag in _TScan__isset for each attribute (startRow, stopRow, columns, filterString, and so on) to decide whether to send that attribute to the Thrift Server through the Thrift protocol. The attributes must therefore be set through the __set_xxx() methods to take effect! The default constructor of TScan does not set any of the corresponding __isset flags to true!
Therefore, if you initialize startRow, stopRow, columns, or filterString through the TScan constructor or by assigning the member variables directly, the attributes are never sent and the scan traverses the table from the beginning. Only after __set_xxx() has set the corresponding bool flag to true does the Thrift Server receive startRow, stopRow, columns, filterString, and the other attributes and use them for the scan.
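The effect can be reproduced without a Thrift installation. The sketch below uses MiniScan, a simplified stand-in invented for this illustration; it is not the generated TScan class. It only mimics the pattern described above: the generated code names the flag member __isset and the setters __set_startRow() and so on, while the stand-in uses isset and setStartRow() to avoid reserved identifiers. The function fieldsWritten() is likewise hypothetical and plays the role of write() by listing the fields that would be serialized.

```cpp
#include <string>
#include <vector>

// Simplified stand-in for a Thrift-generated struct with optional fields.
// NOT the real TScan; it only imitates the isset-flag pattern.
struct MiniScanIsset {
    bool startRow = false;
    bool stopRow = false;
};

struct MiniScan {
    std::string startRow;
    std::string stopRow;
    MiniScanIsset isset;  // the generated code calls this member __isset

    // Counterparts of the generated __set_startRow() / __set_stopRow():
    // they store the value AND mark the field as set.
    void setStartRow(const std::string& v) { startRow = v; isset.startRow = true; }
    void setStopRow(const std::string& v) { stopRow = v; isset.stopRow = true; }
};

// Plays the role of write(): only fields whose flag is true are serialized.
inline std::vector<std::string> fieldsWritten(const MiniScan& scan) {
    std::vector<std::string> out;
    if (scan.isset.startRow) out.push_back("startRow");
    if (scan.isset.stopRow) out.push_back("stopRow");
    return out;
}
```

Assigning scan.startRow = "row-100" directly leaves the flag false, so fieldsWritten() returns an empty list and the server would never see the start row. After scan.setStartRow("row-100"), the list contains "startRow".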
3. Number of concurrent access threads
First, to minimize the overhead of network transmission, the HBase Thrift Server should be deployed on the same machine as the application client. When starting the Thrift Server, you can configure the number of concurrent worker threads through command-line parameters; if the pool is too small, all threads may be busy and the server stops responding to the clients' read/write requests. The command is: bin/hbase-daemon.sh start thrift --threadpool -m 200 -w 500 (for the other parameters, see bin/hbase-daemon.sh start thrift -h).
4. Maximum heap memory configuration
If clients read data sequentially through scan operations and set a caching record count (through the int32_t caching variable of TScan), the cached records can occupy a considerable part of the Thrift Server's heap memory, especially when multiple clients access it concurrently.
Therefore, increase the maximum heap size before starting the Thrift Server. Otherwise a java.lang.OutOfMemoryError may kill the process, especially when a large caching value is set for scans. The heap size is set by export HBASE_HEAPSIZE=1000 (in MB, the default value) in conf/hbase-env.sh.
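
To get a feeling for the numbers, the memory held by cached scan records can be approximated as open scanners × caching × average row size. The following sketch is only a back-of-the-envelope estimate under that assumption; the function name and its parameters are hypothetical, and the real usage is higher because of JVM object overhead and other requests.

```cpp
#include <cstdint>

// Hypothetical helper: rough lower bound, in bytes, for the heap occupied by
// cached scan records, assuming every open scanner holds `cachingRows` rows
// of about `avgRowBytes` bytes each. JVM overhead is ignored.
inline std::uint64_t estimateScanCacheBytes(std::uint64_t openScanners,
                                            std::uint64_t cachingRows,
                                            std::uint64_t avgRowBytes) {
    return openScanners * cachingRows * avgRowBytes;
}
```

For example, 50 concurrent scanners with caching set to 1000 and rows of about 2048 bytes give 102,400,000 bytes, roughly 100 MB, which is already about a tenth of the 1000 MB default heap.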
