HBase Secondary Index with Coprocessors

Source: Internet
Author: User
Tags: hadoop, mapreduce
Overview

HBase is a key-value database built on Hadoop that provides efficient random read and write access to data stored on HDFS, filling the gap left by Hadoop MapReduce, which only supports batch processing; it is being adopted by more and more users. Coprocessors, an important HBase feature, were added in HBase 0.92 and have become widely popular.

With coprocessors, users can write code that runs on the HBase server side. HBase supports two types of coprocessor: Endpoint and Observer. An Endpoint coprocessor is similar to a stored procedure in a traditional database: a client invokes it to execute a piece of server-side code, and the result is returned to the client for further processing. The most common use is aggregation. Without a coprocessor, finding the maximum value in a table (a max aggregation) requires a full table scan, with the client code traversing the scan results and computing the maximum itself. Such an approach cannot exploit the concurrency of the underlying cluster, and centralizing all computation on the client is inefficient. With a coprocessor, the user deploys the max code to the HBase server, and HBase runs it concurrently across the cluster's nodes: the code executes within each Region, each Region Server computes the maximum for its Regions, and only those per-Region maxima are returned to the client. The client then takes the maximum of that handful of values in a final step. This greatly improves overall execution efficiency.
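The division of labor is easy to see in plain Java. The sketch below is a toy model, not real HBase API: each list stands in for a Region, regionMax plays the role of the server-side Endpoint code, and clientMax plays the client merging the small per-Region results.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of Endpoint-style max aggregation (not real HBase code).
public class RegionMaxSketch {

    // "Server side": each region scans only its own rows and returns a local max.
    static long regionMax(List<Long> regionRows) {
        long max = Long.MIN_VALUE;
        for (long v : regionRows) {
            max = Math.max(max, v);
        }
        return max;
    }

    // "Client side": merge the per-region maxima instead of scanning every row.
    static long clientMax(List<List<Long>> regions) {
        long max = Long.MIN_VALUE;
        for (List<Long> region : regions) {
            max = Math.max(max, regionMax(region));
        }
        return max;
    }

    public static void main(String[] args) {
        List<List<Long>> regions = Arrays.asList(
                Arrays.asList(3L, 9L, 1L),   // region 1
                Arrays.asList(7L, 2L),       // region 2
                Arrays.asList(5L, 8L, 6L));  // region 3
        System.out.println(clientMax(regions)); // prints 9
    }
}
```

In the real cluster the per-Region maxima are computed in parallel on the Region Servers; only the merge step runs on the client.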

The other type, the Observer coprocessor, resembles a trigger in a traditional database: it is invoked by the server side when certain events occur. Observer coprocessors are hooks scattered throughout the HBase server-side code that fire when fixed events happen. For example, the prePut hook function is called by the Region Server before a put operation executes, and the postPut hook function is called after it.

Development Environment

Maven 3.3.9, JDK 1.7, CDH HBase 1.2.0, MyEclipse

Coprocessor Loading

Enter the HBase shell:

# hbase shell

hbase(main):> disable 'test'

hbase(main):> alter 'test', CONFIGURATION => {'hbase.table.sanity.checks' => 'false'}    # only needs to run once per table

hbase(main):> alter 'test', 'coprocessor' => 'hdfs:///code/jars/regionobserver-put5.jar|com.hbase.observer.App|1001'    # load the jar package

hbase(main):> alter 'test', METHOD => 'table_att_unset', NAME => 'coprocessor$1'    # unload the jar package

hbase(main):> desc 'test'    # view the table's property description

hbase(main):> enable 'test'
Complete Project Code
package com.hbase.observer;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * HBase secondary index
 * @author wing
 * @createTime 2017-4-7
 */
public class App extends BaseRegionObserver {

    // HTablePool is deprecated in newer HBase releases, but matches this CDH version.
    private HTablePool pool = null;
    private final static String SOURCE_TABLE = "test";

    @Override
    public void start(CoprocessorEnvironment env) throws IOException {
        pool = new HTablePool(env.getConfiguration(), 10);
    }

    // After a get on an index rowkey ("t" prefix), fetch and return the real row instead.
    @Override
    public void postGetOp(ObserverContext<RegionCoprocessorEnvironment> c, Get get,
            List<Cell> results) throws IOException {
        HTableInterface table = pool.getTable(Bytes.toBytes(SOURCE_TABLE));
        String newRowkey = Bytes.toString(get.getRow());
        String pre = newRowkey.substring(0, 1);
        if (pre.equals("t")) {
            // index rowkey layout: t + two-letter prefix + timestamp _ uid _ mid...
            String[] splits = newRowkey.split("_");
            String prePre = splits[0].substring(1, 3);
            String timestamp = splits[0].substring(3);
            String uid = splits[1];
            String mid = "";
            for (int i = 2; i < splits.length; i++) {
                mid += splits[i];
                mid += "_";
            }
            mid = mid.substring(0, mid.length() - 1);
            // real rowkey layout: prefix + uid _ timestamp _ mid...
            String rowkey = prePre + uid + "_" + timestamp + "_" + mid;
            System.out.println(rowkey);
            Get realGet = new Get(rowkey.getBytes());
            Result result = table.get(realGet);
            List<Cell> cells = result.listCells();
            results.clear();
            if (cells != null) {
                for (Cell cell : cells) {
                    results.add(cell);
                }
            }
        }
        table.close();
    }

    // After each put on a real row, write an index row with the transformed rowkey.
    @Override
    public void postPut(ObserverContext<RegionCoprocessorEnvironment> e, Put put,
            WALEdit edit, Durability durability) throws IOException {
        try {
            String rowkey = Bytes.toString(put.getRow());
            HTableInterface table = pool.getTable(Bytes.toBytes(SOURCE_TABLE));
            String pre = rowkey.substring(0, 2);
            if (pre.equals("aa") || pre.equals("ab") || pre.equals("ac")
                    || pre.equals("ba") || pre.equals("bb") || pre.equals("bc")
                    || pre.equals("ca") || pre.equals("cb") || pre.equals("cc")) {
                String[] splits = rowkey.split("_");
                String uid = splits[0].substring(2);
                String timestamp = splits[1];
                String mid = "";
                for (int i = 2; i < splits.length; i++) {
                    mid += splits[i];
                    mid += "_";
                }
                mid = mid.substring(0, mid.length() - 1);
                String newRowkey = "t" + pre + timestamp + "_" + uid + "_" + mid;
                System.out.println(newRowkey);
                Put indexPut2 = new Put(newRowkey.getBytes());
                // the index cell records the original rowkey
                indexPut2.addColumn("relation".getBytes(), "column10".getBytes(),
                        rowkey.getBytes());
                table.put(indexPut2);
            }
            table.close();
        } catch (Exception ex) {
            // ignore malformed rowkeys so the client's put is not failed
        }
    }

    // After a scanner batch, replace index rows with the real rows they point to.
    @Override
    public boolean postScannerNext(ObserverContext<RegionCoprocessorEnvironment> e,
            InternalScanner s, List<Result> results, int limit, boolean hasMore)
            throws IOException {
        HTableInterface table = pool.getTable(Bytes.toBytes(SOURCE_TABLE));
        List<Result> newResults = new ArrayList<Result>();
        for (Result result : results) {
            String newRowkey = Bytes.toString(result.getRow());
            String pre = newRowkey.substring(0, 1);
            if (pre.equals("t")) {
                String[] splits = newRowkey.split("_");
                String prePre = splits[0].substring(1, 3);
                String timestamp = splits[0].substring(3);
                String uid = splits[1];
                String mid = "";
                for (int i = 2; i < splits.length; i++) {
                    mid += splits[i];
                    mid += "_";
                }
                mid = mid.substring(0, mid.length() - 1);
                String rowkey = prePre + uid + "_" + timestamp + "_" + mid;
                Get realGet = new Get(rowkey.getBytes());
                Result newResult = table.get(realGet);
                newResults.add(newResult);
            }
        }
        results.clear();
        for (Result result : newResults) {
            results.add(result);
        }
        table.close();
        return hasMore;
    }

    @Override
    public void stop(CoprocessorEnvironment env) throws IOException {
        pool.close();
    }
}

After the Maven project is packaged, upload the jar to the HDFS directory and load it with the shell commands above.
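As a sketch, the packaging and upload steps might look like the following; the jar name and HDFS path are the ones referenced by the alter command earlier, and mvn plus the hadoop client are assumed to be on the PATH:

```shell
# Build the coprocessor jar (assumes the pom produces regionobserver-put5.jar).
mvn clean package

# Upload it to the HDFS path referenced by the alter 'test', 'coprocessor' => ... command.
hdfs dfs -mkdir -p /code/jars
hdfs dfs -put -f target/regionobserver-put5.jar /code/jars/
```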
This completes the secondary index. When the user performs a put, the original rowkey is transformed into a new rowkey under which an index row is saved. When the user performs a get on an index rowkey, it is mapped to the actual rowkey and the actual result is returned. When the user performs a scan, the scanner results are mapped to the results for the actual rowkeys before being returned to the user.
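The rowkey mapping itself is plain string manipulation, so it can be sketched standalone. The layout below (a two-letter prefix such as "ab" before the uid in real rows, and a "t" marker on index rows) is inferred from the coprocessor code above; the helper names are illustrative only.

```java
// Standalone sketch of the rowkey mapping used by the coprocessor
// (layout assumed from the article's code; helper names are illustrative).
public class IndexRowkeyMapper {

    // put direction: real rowkey ppuid_timestamp_mid... -> index rowkey tpptimestamp_uid_mid...
    static String toIndexRowkey(String rowkey) {
        String[] splits = rowkey.split("_");
        String pre = splits[0].substring(0, 2);   // two-letter prefix, e.g. "ab"
        String uid = splits[0].substring(2);
        String timestamp = splits[1];
        StringBuilder mid = new StringBuilder();
        for (int i = 2; i < splits.length; i++) {
            if (mid.length() > 0) {
                mid.append("_");
            }
            mid.append(splits[i]);
        }
        return "t" + pre + timestamp + "_" + uid + "_" + mid;
    }

    // get/scan direction: index rowkey back to the real rowkey
    static String toRealRowkey(String indexRowkey) {
        String[] splits = indexRowkey.split("_");
        String pre = splits[0].substring(1, 3);   // skip the leading "t" marker
        String timestamp = splits[0].substring(3);
        String uid = splits[1];
        StringBuilder mid = new StringBuilder();
        for (int i = 2; i < splits.length; i++) {
            if (mid.length() > 0) {
                mid.append("_");
            }
            mid.append(splits[i]);
        }
        return pre + uid + "_" + timestamp + "_" + mid;
    }
}
```

The two directions are exact inverses, so a get or scan on an index rowkey always lands back on the row the put indexed.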

With the HBase BaseRegionObserver coprocessor, many HBase operations can be wrapped in this way.

BaseRegionObserver Java API (note your HBase version):
https://hbase.apache.org/1.2/apidocs/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.html
