Common problems and caveats when using the HBase Thrift interface

HBase provides a Thrift interface to support non-Java languages. Based on experience with the HBase Thrift interface (HBase version), this article summarizes some of the problems encountered and the related considerations.
1. Byte storage order
In HBase, rows are sorted in lexicographic order (by row key, column family, column qualifier, and timestamp). For data of type short, int, long, and so on, the byte array produced by Bytes.toBytes(...) must therefore be stored in big-endian order (high-order byte at the low address, low-order byte at the high address). The same applies to values. When using the Thrift API (C++, PHP, Python, etc.), it is best to pack and unpack both row keys and values consistently in big-endian order.
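For example, in C++, an int variable can be converted to a lexicographically ordered (big-endian) key as follows:

#include <arpa/inet.h>  // htonl, ntohl
#include <cstdint>
#include <cstring>      // std::memcpy
#include <string>

std::string key;
int32_t timestamp = 1352563200;
// htonl converts to big-endian; copying the raw int would be little-endian on x86
uint32_t be = htonl(static_cast<uint32_t>(timestamp));
key.append(reinterpret_cast<const char*>(&be), sizeof(be));

The key is converted back to an int as follows:

uint32_t raw;
std::memcpy(&raw, key.data(), sizeof(raw));  // avoids an unaligned read
int32_t decoded = static_cast<int32_t>(ntohl(raw));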

PHP provides the pack and unpack functions for this conversion:

$key = pack("N", $num);       // "N": unsigned 32-bit integer, big-endian
$num = unpack("N", $key)[1];  // unpack returns an array indexed from 1
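
To illustrate why big-endian encoding matters for row keys, here is a minimal C++ sketch that encodes 32-bit values with shifts (so it does not depend on the host byte order) and compares the resulting keys byte by byte as unsigned values, which is how lexicographic ordering treats them. The helper names appendBigEndian32, appendLittleEndian32, readBigEndian32, and bytesLess are made up for this illustration and are not part of HBase or Thrift.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Append a 32-bit value most significant byte first (big-endian).
void appendBigEndian32(std::string& out, uint32_t v) {
    out.push_back(static_cast<char>((v >> 24) & 0xFF));
    out.push_back(static_cast<char>((v >> 16) & 0xFF));
    out.push_back(static_cast<char>((v >> 8) & 0xFF));
    out.push_back(static_cast<char>(v & 0xFF));
}

// Append the same value least significant byte first (for comparison only).
void appendLittleEndian32(std::string& out, uint32_t v) {
    out.push_back(static_cast<char>(v & 0xFF));
    out.push_back(static_cast<char>((v >> 8) & 0xFF));
    out.push_back(static_cast<char>((v >> 16) & 0xFF));
    out.push_back(static_cast<char>((v >> 24) & 0xFF));
}

// Read a 32-bit big-endian value starting at `offset`.
uint32_t readBigEndian32(const std::string& in, std::size_t offset = 0) {
    assert(in.size() >= offset + 4);
    uint32_t v = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        v = (v << 8) | static_cast<unsigned char>(in[offset + i]);
    }
    return v;
}

// Lexicographic comparison on unsigned bytes.
bool bytesLess(const std::string& a, const std::string& b) {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](char x, char y) {
            return static_cast<unsigned char>(x) < static_cast<unsigned char>(y);
        });
}
```

With these helpers, the big-endian key for 255 sorts before the key for 256, matching numeric order. The little-endian encodings compare the other way round, which is why a little-endian key would break range scans.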

2. Pitfalls when using TScan
In the HBase PHP Thrift interface, TScan lets you set properties such as startRow, stopRow, columns, and filterString directly. These properties default to null and can be set to non-null values either through the TScan constructor or by assigning to the TScan member variables. When the write() method serializes the request for the RPC to the Thrift server, it simply checks whether each property is non-null and, if so, transmits it to the Thrift server through the Thrift protocol.
In the C++ Thrift interface, however, TScan has a member variable __isset of type _TScan__isset, whose internal structure is as follows:

typedef struct _TScan__isset {
  _TScan__isset() : startRow(false), stopRow(false), timestamp(false), columns(false), caching(false), filterString(false) {}
  bool startRow;
  bool stopRow;
  bool timestamp;
  bool columns;
  bool caching;
  bool filterString;
} _TScan__isset;

TScan's write() method checks the bool flags under _TScan__isset to decide whether startRow, stopRow, columns, filterString, and the other properties have been set, and therefore whether they are transmitted through the Thrift protocol to the Thrift server. A property takes effect only when it is set through its __set_xxx() method. The default TScan constructor does not set the __isset flags of these properties to true.
Therefore, if startRow, stopRow, columns, filterString, and similar properties are initialized directly (for example, by assigning to the members of a default-constructed TScan), they are never sent and the scan traverses the table from the beginning. Only calling the __set_xxx() methods sets the corresponding bool flag to true, so that the Thrift server receives startRow, stopRow, columns, filterString, and so on for the scan.
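
The pitfall can be reproduced with a small stand-in class. The sketch below is not the Thrift-generated TScan. MockScan, its isset member, set_startRow(), set_stopRow(), and fieldsToSend() are hypothetical names that only mimic the pattern described above (the generated code uses __isset and __set_xxx()).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the generated _TScan__isset flags.
struct MockScanIsset {
    bool startRow = false;
    bool stopRow = false;
};

// Hypothetical stand-in for the generated TScan (not the real class).
struct MockScan {
    std::string startRow;
    std::string stopRow;
    MockScanIsset isset;

    // Mimics __set_startRow(): stores the value AND marks the field as set.
    void set_startRow(const std::string& v) {
        startRow = v;
        isset.startRow = true;
    }

    // Mimics __set_stopRow().
    void set_stopRow(const std::string& v) {
        stopRow = v;
        isset.stopRow = true;
    }

    // Mimics write(): only fields whose flag is true are serialized.
    std::vector<std::string> fieldsToSend() const {
        std::vector<std::string> sent;
        if (isset.startRow) sent.push_back("startRow");
        if (isset.stopRow) sent.push_back("stopRow");
        return sent;
    }
};

// Usage: direct assignment is silently dropped, the setter is not.
void demoIssetPitfall() {
    MockScan direct;
    direct.startRow = "row-100";            // flag stays false
    assert(direct.fieldsToSend().empty());  // nothing would be sent

    MockScan viaSetter;
    viaSetter.set_startRow("row-100");      // flag becomes true
    assert(viaSetter.fieldsToSend().size() == 1);
}
```

In this model, a scan built by direct assignment sends no startRow at all, which corresponds to the full-table traversal described above.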
3. Number of concurrent access threads
First, to minimize the overhead of network traffic, the HBase Thrift server is best deployed on the same machine as the application client. The number of concurrent threads can be configured through parameters when the Thrift server starts. If it is set too low, all Thrift server threads can easily become occupied and the server stops responding to the client's read and write requests. The specific command is: bin/hbase-daemon.sh start thrift -threadpool -m 200 -w 500 (for more parameters, see: bin/hbase-daemon.sh start thrift -h).
4. Maximum heap memory configuration
If clients read data through the Thrift server by scanning and set a certain number of cached records (through the int32_t caching member of TScan), those cached records may occupy a significant portion of the Thrift server's heap memory, especially when multiple clients access it concurrently.
Therefore, you can increase the maximum heap size before starting the Thrift server. Otherwise, the process may be killed because of java.lang.OutOfMemoryError exceptions, especially when the scan is configured to cache a large number of records (the default is 1000 MB; it can be changed through export HBASE_HEAPSIZE in conf/hbase-env.sh).
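
As a rough aid for sizing the heap, the reasoning above can be turned into a back-of-the-envelope estimate: concurrent scanners times rows cached per scanner times average row size. The C++ sketch below rests on that assumption. The function names and the idea of budgeting a fraction of the heap are illustrative rather than anything HBase provides, and the estimate ignores JVM object overhead, so real usage would be higher.

```cpp
#include <cassert>
#include <cstdint>

// Rough estimate, in bytes, of row data buffered for open scanners.
// Assumption: each scanner holds up to `cachingRows` rows of `avgRowBytes` each.
uint64_t estimateScanBufferBytes(uint64_t concurrentScanners,
                                 uint64_t cachingRows,
                                 uint64_t avgRowBytes) {
    return concurrentScanners * cachingRows * avgRowBytes;
}

// True if the estimate fits within `fraction` of a heap of `heapMb` megabytes.
bool fitsInHeap(uint64_t estimateBytes, uint64_t heapMb, double fraction) {
    assert(fraction > 0.0 && fraction <= 1.0);
    const double budget =
        static_cast<double>(heapMb) * 1024.0 * 1024.0 * fraction;
    return static_cast<double>(estimateBytes) <= budget;
}
```

For instance, with hypothetical figures of 50 concurrent scanners, caching set to 10000, and 1 KB rows, the estimate is 512,000,000 bytes. That already takes close to half of a 1000 MB heap, and doubling the number of scanners would exceed that budget.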
