Usage problems and precautions for the HBase Thrift interface

Source: Internet
Author: User
HBase provides a Thrift interface for non-Java languages. Based on experience with the HBase Thrift interface (HBase version 0.92.1), this article summarizes some of the problems encountered and the related precautions.
1. Byte order
In HBase, rows (row key, column family, column qualifier, timestamp) are sorted in lexicographic byte order. For this reason, data of type short, int, long, and so on is stored in big-endian form (high byte at the low address, low byte at the high address) after being converted to a byte array with Bytes.toBytes(...). The same is true for values. Therefore, when using the Thrift API (C++, PHP, Python, and so on), it is best to pack and unpack both row keys and values in big-endian order.
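For example, in C++, an int variable can be converted to a lexicographically sortable (big-endian) byte string as follows:
// Requires <arpa/inet.h> (htonl), <cstdint>, and <string>.
std::string key;
int32_t timestamp = 1352563200;
// Convert from host byte order to big-endian before copying the raw bytes.
uint32_t be = htonl(static_cast<uint32_t>(timestamp));
key.append(reinterpret_cast<const char*>(&be), sizeof(be));

The big-endian bytes are converted back to an int as follows:
// Requires <arpa/inet.h> (ntohl) and <cstring> (memcpy).
uint32_t be;
std::memcpy(&be, key.data(), sizeof(be));  // memcpy avoids an unaligned pointer cast
int32_t timestamp = static_cast<int32_t>(ntohl(be));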
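The text also mentions long values. A minimal sketch of a 64-bit conversion that does not depend on the host byte order is shown below; the helper names encode_be64 and decode_be64 are hypothetical and are not part of the HBase or Thrift API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: produce the 8 bytes of v, most significant byte first.
std::string encode_be64(uint64_t v) {
    std::string out(8, '\0');
    for (int i = 0; i < 8; ++i) {
        out[i] = static_cast<char>((v >> (56 - 8 * i)) & 0xFF);
    }
    return out;
}

// Hypothetical helper: read 8 big-endian bytes back into an integer.
// Assumes s holds at least 8 bytes.
uint64_t decode_be64(const std::string& s) {
    uint64_t v = 0;
    for (int i = 0; i < 8; ++i) {
        v = (v << 8) | static_cast<unsigned char>(s[i]);
    }
    return v;
}
```

Because the bytes are built with shifts rather than by copying memory, the resulting key should be identical on little-endian and big-endian hosts.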

In PHP, the pack and unpack functions handle the conversion ("N" denotes an unsigned 32-bit big-endian integer; unpack returns an array whose first element has index 1):
$key = pack("N", $num);
$arr = unpack("N", $key);
$num = $arr[1];
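To see why big-endian matters for row keys, the sketch below encodes two non-negative values both ways and checks whether byte-wise order matches numeric order. The helper names are hypothetical, and std::string comparison stands in here for HBase's lexicographic ordering of keys. It covers non-negative values only; with a plain two's-complement encoding, negative numbers would sort after positive ones.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: 4-byte big-endian encoding (most significant byte first).
std::string to_big_endian(uint32_t v) {
    std::string out(4, '\0');
    for (int i = 0; i < 4; ++i) out[i] = static_cast<char>((v >> (24 - 8 * i)) & 0xFF);
    return out;
}

// Hypothetical helper: 4-byte little-endian encoding, shown only for contrast.
std::string to_little_endian(uint32_t v) {
    std::string out(4, '\0');
    for (int i = 0; i < 4; ++i) out[i] = static_cast<char>((v >> (8 * i)) & 0xFF);
    return out;
}

// True when byte-wise (lexicographic) order of the keys matches numeric order.
bool order_preserved(const std::string& ka, const std::string& kb,
                     uint32_t a, uint32_t b) {
    return (ka < kb) == (a < b);
}
```

For the pair 255 and 256, the big-endian keys are 00 00 00 FF and 00 00 01 00, which sort in numeric order. The little-endian keys are FF 00 00 00 and 00 01 00 00, which sort the other way round.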

2. Pitfalls when using TScan
In the PHP Thrift interface of HBase, TScan properties such as startRow, stopRow, columns, and filterString can be set directly, and all of them are null by default. They can be set to non-null values either through the TScan constructor or by assigning the member variables of the TScan object directly. When the write() method is called to perform an RPC with the Thrift server, it simply checks whether each property is non-null and, if so, sends it to the Thrift server over the Thrift protocol.
However, in the C++ Thrift interface, TScan has a member variable named __isset of type _TScan__isset, with the following internal structure:
typedef struct _TScan__isset {
  _TScan__isset() : startRow(false), stopRow(false), timestamp(false), columns(false), caching(false), filterString(false) {}
  bool startRow;
  bool stopRow;
  bool timestamp;
  bool columns;
  bool caching;
  bool filterString;
} _TScan__isset;

TScan's write() method checks the corresponding bool flag in __isset to decide whether startRow, stopRow, columns, and filterString are sent to the Thrift server over the Thrift protocol. These properties must therefore be set through the __set_xxx() methods to take effect. The default TScan constructor does not set the __isset flags for these properties to true.
Therefore, if properties such as startRow, stopRow, columns, and filterString are initialized directly (for example, by assigning the member variables) rather than through __set_xxx(), they are never sent and the table is traversed from the beginning. Only calling the __set_xxx() methods sets the corresponding bool flags to true, so that the Thrift server receives startRow, stopRow, columns, and filterString and scans accordingly.
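The effect can be illustrated without a running cluster. The sketch below is a simplified, hypothetical stand-in for the generated class: MockScan, isset, set_startRow, and fields_to_send are invented names, not the real Thrift-generated TScan. It only mirrors the convention described above, in which a setter records that a field was set and serialization consults that flag.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a Thrift-generated struct with optional fields.
struct MockScan {
    struct Isset {
        bool startRow = false;
        bool stopRow = false;
    };

    std::string startRow;
    std::string stopRow;
    Isset isset;  // plays the role of __isset in the generated code

    // Mirrors the generated __set_xxx() methods: store the value and mark it as set.
    void set_startRow(const std::string& v) { startRow = v; isset.startRow = true; }
    void set_stopRow(const std::string& v) { stopRow = v; isset.stopRow = true; }

    // Mirrors write(): only fields whose flag is true would go on the wire.
    std::vector<std::string> fields_to_send() const {
        std::vector<std::string> sent;
        if (isset.startRow) sent.push_back("startRow=" + startRow);
        if (isset.stopRow) sent.push_back("stopRow=" + stopRow);
        return sent;
    }
};
```

In this mock, assigning startRow directly leaves fields_to_send() empty, whereas calling set_startRow("row100") yields one entry. With the real generated class, the equivalent calls are the __set_xxx() methods.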
3. Number of concurrent access threads
First, to minimize the time overhead of network transport, the HBase Thrift server is best deployed on the same machine as the application client. The number of concurrent threads can be configured when the Thrift server is started; otherwise the server's threads can easily be exhausted, leaving it unable to respond to client read and write requests. The command is: bin/hbase-daemon.sh start thrift --threadpool -m 200 -w 500 (for more parameters, see: bin/hbase-daemon.sh start thrift -h).
4. Maximum heap memory configuration
If a client scans data sequentially from the Thrift server and sets a number of records to cache (via TScan's int32_t caching variable), these cached records may consume a significant portion of the Thrift server's heap memory, especially when multiple clients access it concurrently.
Therefore, increase the maximum heap size before starting the Thrift server. Otherwise, the process may be killed because of a java.lang.OutOfMemoryError, especially when a large caching value is set for scans. The default is export HBASE_HEAPSIZE=1000 (in MB), and it can be changed in conf/hbase-env.sh.
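As a rough back-of-the-envelope aid for choosing a heap size, the sketch below multiplies the number of concurrent scanners by the caching value and an average row size. The function name and all three inputs are hypothetical, and the result ignores JVM object overhead and other server memory, so treat it as an order-of-magnitude estimate rather than a sizing rule.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical estimate of the bytes buffered for open scanners:
// concurrent scanners x rows cached per scanner x average bytes per row.
uint64_t estimate_scan_buffer_bytes(uint64_t concurrent_scanners,
                                    uint64_t caching,
                                    uint64_t avg_row_bytes) {
    return concurrent_scanners * caching * avg_row_bytes;
}
```

For example, 50 scanners with caching set to 1000 and rows of about 2048 bytes give 50 x 1000 x 2048 = 102,400,000 bytes, or roughly 98 MiB, which is already about a tenth of a 1000 MB heap.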
