[Repost] Application of the Huawei HBase index module: HBase secondary index module hindex research, October 16, 2014

Article source: http://www.batchfile.cn/?p=63

HBase secondary index module: hindex research

hindex is a secondary index solution for HBase. It provides declarative indexes and uses coprocessors to create and maintain index tables automatically, so the client does not have to write the data twice. In addition, hindex uses a clever rowkey design that distributes the index data and the actual data to the same region, which gives it high query performance.

Introduction: Huawei-hbase-secondary-index-implementations
Code: https://github.com/?wei-hadoop/hindex

Source code introduction
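Before looking at the coprocessors, the co-location idea can be illustrated with a small sketch. It assumes an index rowkey laid out as the data region's start key, then the index name, then the indexed value padded to a fixed length, then the user rowkey. The class `IndexRowKeySketch`, its helpers, and the zero-byte separator are hypothetical illustrations, not hindex's actual byte format. Because every index row begins with the start key of the data region it belongs to, it sorts into the index region that shares that start key, and the fixed-width value lets the user rowkey be cut out at a known offset.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of a region-local index rowkey; not hindex's real format.
public class IndexRowKeySketch {
    // Layout (assumed): [regionStartKey][indexName][0x00][value padded to maxLen][userRowKey]
    static byte[] indexRowKey(byte[] regionStartKey, String indexName,
                              byte[] value, int maxValueLength, byte[] userRowKey) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(regionStartKey, 0, regionStartKey.length);
        byte[] name = indexName.getBytes(StandardCharsets.UTF_8);
        out.write(name, 0, name.length);
        out.write(0); // separator between index name and value (assumption)
        // Pad with zero bytes (or truncate) so the user rowkey starts at a fixed offset
        byte[] padded = Arrays.copyOf(value, maxValueLength);
        out.write(padded, 0, padded.length);
        out.write(userRowKey, 0, userRowKey.length);
        return out.toByteArray();
    }

    // Recover the user rowkey from an index rowkey built with the same parameters
    static byte[] userRowKey(byte[] indexRowKey, int regionStartKeyLength,
                             String indexName, int maxValueLength) {
        int offset = regionStartKeyLength
                + indexName.getBytes(StandardCharsets.UTF_8).length + 1 + maxValueLength;
        return Arrays.copyOfRange(indexRowKey, offset, indexRowKey.length);
    }

    public static void main(String[] args) {
        byte[] start = "r1".getBytes(StandardCharsets.UTF_8);
        byte[] key = indexRowKey(start, "idx", "alice".getBytes(StandardCharsets.UTF_8),
                10, "row-42".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(userRowKey(key, start.length, "idx", 10),
                StandardCharsets.UTF_8)); // prints row-42
    }
}
```

The fixed-width padding is one simple way to make the value boundary unambiguous; the cost is that values longer than the declared maximum are truncated in the index.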

hindex is built on HBase 0.94.8 and runs on the HBase server side. It uses coprocessors to maintain and query the index tables:

  • org.apache.hadoop.hbase.index.coprocessor.master.IndexMasterObserver
    Intercepts DDL operations: it creates, modifies, and deletes index tables in step with the data tables when those are created, enabled, disabled, or dropped. It also intercepts the region balancing process and updates the index table accordingly when HFiles are merged or split. This ensures that index records and the corresponding data records always reside on the same region server, which speeds up queries.

  • org.apache.hadoop.hbase.index.coprocessor.regionserver.IndexRegionObserver
    Intercepts put/delete/get/scan/flush operations on data tables and updates the index tables synchronously.

  • org.apache.hadoop.hbase.index.coprocessor.wal.IndexWALObserver
    Synchronizes WAL operations. When an edit is written to the write-ahead log (WAL) on a region server, it determines whether the index table also needs to be synchronized and submits the corresponding WAL entries on that region server.
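To make the role of IndexRegionObserver more concrete, here is a minimal in-memory sketch of keeping an index in step with writes. It is a hypothetical model, not HBase or hindex code: `IndexedRegionSketch` stands in for a data region plus its co-located index region, and `put`/`delete` play the part of the intercepted operations. It assumes indexed values never contain the character '/', which the sketch uses as a separator. Dropping the stale entry on update is one way to keep the index correct; the real module may handle stale entries differently.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of a data region and its co-located index region.
public class IndexedRegionSketch {
    // Data region: user rowkey -> value of the indexed column
    final Map<String, String> data = new HashMap<>();
    // Index region: "<value>/<rowkey>" -> rowkey, kept sorted like an HBase region
    final NavigableMap<String, String> index = new TreeMap<>();

    // Stand-in for an intercepted put: write the data and keep the index in step
    void put(String rowKey, String value) {
        String old = data.put(rowKey, value);
        if (old != null) {
            index.remove(old + "/" + rowKey); // an update must drop the stale index entry
        }
        index.put(value + "/" + rowKey, rowKey);
    }

    // Stand-in for an intercepted delete: remove the data row and its index entry
    void delete(String rowKey) {
        String old = data.remove(rowKey);
        if (old != null) {
            index.remove(old + "/" + rowKey);
        }
    }

    // Equality lookup via the index: keys in ["<value>/", "<value>0") are exactly
    // those starting with "<value>/", because '0' is the character right after '/'.
    Collection<String> rowsWithValue(String value) {
        return index.subMap(value + "/", true, value + "0", false).values();
    }
}
```

A lookup such as `rowsWithValue("alice")` touches only a contiguous key range of the index, which is the property that makes index-based scans cheap compared with a full table scan.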

The source code contains a complete copy of hbase-0.94.8; the code for the secondary index lives in the /secondaryindex/src/main/java directory.

Index table synchronization, index table balancing, index data synchronization, and index scans are already implemented. According to the release notes, the following features are still to be implemented:

  • Dynamically add/drop indexes
  • Integrate secondary index management into the HBase shell
  • Optimize range scan scenarios
  • HBCK tool support for secondary index tables
  • WAL optimizations for secondary index table entries
  • Make scan evaluation intelligence pluggable

Instructions for use

Download the source code and use Maven to compile it:

mvn package -DskipTests=true 

Upload the build artifact to the HBase server:

scp target/hbase-0.94.8.jar <user>@<hbase-server>:$HBASE_HOME/conf/

HBase configuration (hbase-env.sh):

export HBASE_CLASSPATH=$HBASE_HOME/conf/hbase-0.94.8.jar 

HBase configuration (hbase-site.xml):

<property>
<name>hbase.coprocessor.master.classes</name>
<value>org.apache.hadoop.hbase.index.coprocessor.master.IndexMasterObserver</value>
</property>
<property>
<name>hbase.coprocessor.region.classes</name>
<value>org.apache.hadoop.hbase.index.coprocessor.regionserver.IndexRegionObserver</value>
</property>
<property>
<name>hbase.coprocessor.wal.classes</name>
<value>org.apache.hadoop.hbase.index.coprocessor.wal.IndexWALObserver</value>
</property>
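Because a missing or misspelled class name here means the observers are silently not loaded, a quick local check of hbase-site.xml can save a restart cycle. The following is a hypothetical helper, not an HBase tool: `CoprocessorConfigCheck` parses Hadoop-style `<property><name/><value/></property>` pairs with the JDK's DOM parser and reports which of the three coprocessor keys above are missing or do not list the expected class. It treats each `*.classes` value as a comma-separated list, since HBase accepts several coprocessor classes per key.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sanity check for the hindex coprocessor settings in hbase-site.xml.
public class CoprocessorConfigCheck {
    // Expected settings, copied from the hbase-site.xml fragment above
    static final Map<String, String> EXPECTED = Map.of(
            "hbase.coprocessor.master.classes",
            "org.apache.hadoop.hbase.index.coprocessor.master.IndexMasterObserver",
            "hbase.coprocessor.region.classes",
            "org.apache.hadoop.hbase.index.coprocessor.regionserver.IndexRegionObserver",
            "hbase.coprocessor.wal.classes",
            "org.apache.hadoop.hbase.index.coprocessor.wal.IndexWALObserver");

    // Collect <property><name>..</name><value>..</value></property> pairs
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new HashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse configuration XML", e);
        }
    }

    // Keys whose value is missing or does not list the expected observer class
    static List<String> missingCoprocessors(Map<String, String> props) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, String> e : EXPECTED.entrySet()) {
            String v = props.get(e.getKey());
            // A *.classes property may list several comma-separated coprocessors
            if (v == null || !Arrays.asList(v.split("\\s*,\\s*")).contains(e.getValue())) {
                bad.add(e.getKey());
            }
        }
        return bad;
    }
}
```

In practice the XML string would be read from `$HBASE_HOME/conf/hbase-site.xml`, whose `<configuration>` root element wraps the properties shown above.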

Restart HBase and visit:

  • http://hbase-server:60010/master-status
  • http://hbase-server:60030/rs-status

You can see that the coprocessors have been loaded successfully:

Coprocessors [IndexMasterObserver]
Coprocessors [IndexRegionObserver, IndexWALObserver]

Use the HBase and hindex Java APIs to create a table with an index on it:

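// Assumes admin is an HBaseAdmin and the table, index, family, and qualifier names are defined earlier
IndexedHTableDescriptor htd = new IndexedHTableDescriptor(usertableName);
IndexSpecification iSpec = new IndexSpecification(indexName);
HColumnDescriptor hcd = new HColumnDescriptor(columnFamily);
// Index this qualifier's values as strings; the last argument is the maximum value length
iSpec.addIndexColumn(hcd, indexColumnQualifier, ValueType.String, 10);
htd.addFamily(hcd);
// Attach the index definition; IndexMasterObserver is expected to create the index table
htd.addIndex(iSpec);
admin.createTable(htd);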

The expected result is that both the data table and its index table appear on the server side, and that when data is inserted into the data table, the corresponding reverse-index entries appear automatically in the index table according to the index definition. However, none of this happened. Why?

Version compatibility

The reason is that hindex is incompatible with the HBase version used on site. hindex is developed against hbase-0.94.8, but the site runs hbase-0.94.6-cdh4.3.0, which Cloudera built on top of hbase-0.94.6, and the coprocessor interfaces of these two versions differ. Taking MasterObserver as an example, the hooks that intercept the create table operation in hbase-0.94.8 are:

  • postCreateTable(ObserverContext<MasterCoprocessorEnvironment>, HTableDescriptor, HRegionInfo[])
  • postCreateTableHandler(ObserverContext<MasterCoprocessorEnvironment>, HTableDescriptor, HRegionInfo[])
  • preCreateTable(ObserverContext<MasterCoprocessorEnvironment>, HTableDescriptor, HRegionInfo[])
  • preCreateTableHandler(ObserverContext<MasterCoprocessorEnvironment>, HTableDescriptor, HRegionInfo[])

In hbase-0.94.6, only these exist:

  • postCreateTable(ObserverContext<MasterCoprocessorEnvironment>, HTableDescriptor, HRegionInfo[])
  • preCreateTable(ObserverContext<MasterCoprocessorEnvironment>, HTableDescriptor, HRegionInfo[])

Therefore, the coprocessor implemented in hindex cannot create the index table through the intended flow. The HBase master logs show that only preCreateTable and postCreateTable are called; preCreateTableHandler and postCreateTableHandler are never executed.
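The failure mode can be reproduced with a toy model. In the hypothetical sketch below, `MasterHooks` is a simplified stand-in for MasterObserver (not the real HBase interface, which takes ObserverContext, HTableDescriptor, and HRegionInfo[] arguments), and `createTableOn0946` mimics a master that, like hbase-0.94.6, only invokes preCreateTable and postCreateTable. An observer that does its work in postCreateTableHandler compiles and loads, but its method is simply never called; moving the same logic into postCreateTable, which is the kind of change described next, makes it run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the hook mismatch; names mirror MasterObserver but are simplified.
public class HookMismatchSketch {
    interface MasterHooks {
        default void preCreateTable(String table) {}
        default void postCreateTable(String table) {}
        default void preCreateTableHandler(String table) {}
        default void postCreateTableHandler(String table) {}
    }

    // A 0.94.6-style master only calls the two non-handler hooks
    static void createTableOn0946(MasterHooks hooks, String table) {
        hooks.preCreateTable(table);
        // ... the data table itself would be created here ...
        hooks.postCreateTable(table);
    }

    // Observer written for 0.94.8: creates the index table in the handler hook
    static class HandlerBasedObserver implements MasterHooks {
        final List<String> created = new ArrayList<>();
        @Override public void postCreateTableHandler(String table) { created.add(table + "_idx"); }
    }

    // Ported observer: the same logic moved to the hook that 0.94.6 actually calls
    static class PortedObserver implements MasterHooks {
        final List<String> created = new ArrayList<>();
        @Override public void postCreateTable(String table) { created.add(table + "_idx"); }
    }

    public static void main(String[] args) {
        HandlerBasedObserver a = new HandlerBasedObserver();
        PortedObserver b = new PortedObserver();
        createTableOn0946(a, "test");
        createTableOn0946(b, "test");
        System.out.println(a.created + " " + b.created); // prints [] [test_idx]
    }
}
```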

I modified IndexMasterObserver to implement the hooks under the interface names used by hbase-0.94.6, redeployed it, and created a data table with an index. This time the index table was created automatically:

User table | Online regions | Description
test | 1 | {NAME => 'test', FAMILIES => [{NAME => 'info'}]}
test_idx | 1 | {NAME => 'test_idx', SPLIT_POLICY => 'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy', MAX_FILESIZE => '000000', FAMILIES => [{NAME => 'd'}]}

However, the index table stays empty: no index data is written into it. Fixing that would require modifying IndexRegionObserver and IndexWALObserver as well, and also handling HFile splits and merges; in short, the whole secondary index feature would have to be ported to hbase-0.94.6-cdh4.3.0. I decided not to pursue that.

Is there a CDH release based on hbase-0.94.8? CDH has progressed to 5.0.0-beta-1, which is based on hbase-0.95.2 (see CDH Version and Packaging Information).

It seems that neither CDH 4.5.0 nor CDH 5.0.0-beta-1 is based directly on hbase-0.94.8, so using hindex would require porting and testing. Alternatively, you could use the Apache release of hbase-0.94.8, but that release is packaged for deployment on Hadoop 1 only, not Hadoop 2.

When the query requirements are known in advance, the index structure can be planned up front. I believe hindex is a good solution and worth pursuing; I will continue the study when I have time.

 
