HBase Secondary Index with Coprocessors

Source: Internet
Author: User
Tags: hadoop, mapreduce
Overview

HBase is a key-value database built on Hadoop that provides efficient random read and write access to data stored on HDFS, filling the gap left by Hadoop MapReduce, which only supports batch processing; it is being adopted by more and more users. Coprocessors, an important HBase feature, were added in HBase 0.92 and have become widely popular.

With coprocessors, users can write code that runs on the HBase server side. HBase supports two types of coprocessor: Endpoint and Observer. An Endpoint coprocessor is similar to a stored procedure in a traditional database: a client invokes it to execute a piece of server-side code, and the result is returned to the client for further processing. The most common use is aggregation. Without a coprocessor, finding the maximum value in a table (a max aggregation) requires a full table scan, with the client code traversing the scan results and computing the maximum itself. Such an approach cannot exploit the concurrency of the underlying cluster, and centralizing all computation on the client is inefficient. With a coprocessor, the user deploys the max code to the HBase server, and HBase runs it concurrently across the cluster's nodes: the code executes within each Region, each Region Server computes the maximum for its Regions, and only those per-Region maxima are returned to the client. The client then takes the maximum of that handful of values in a final step. This greatly improves overall execution efficiency.
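The division of labor is easy to see in plain Java. The sketch below is a toy model, not real HBase API: each list stands in for a Region, regionMax plays the role of the server-side Endpoint code, and clientMax plays the client merging the small per-Region results.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of Endpoint-style max aggregation (not real HBase code).
public class RegionMaxSketch {

    // "Server side": each region scans only its own rows and returns a local max.
    static long regionMax(List<Long> regionRows) {
        long max = Long.MIN_VALUE;
        for (long v : regionRows) {
            max = Math.max(max, v);
        }
        return max;
    }

    // "Client side": merge the per-region maxima instead of scanning every row.
    static long clientMax(List<List<Long>> regions) {
        long max = Long.MIN_VALUE;
        for (List<Long> region : regions) {
            max = Math.max(max, regionMax(region));
        }
        return max;
    }

    public static void main(String[] args) {
        List<List<Long>> regions = Arrays.asList(
                Arrays.asList(3L, 9L, 1L),   // region 1
                Arrays.asList(7L, 2L),       // region 2
                Arrays.asList(5L, 8L, 6L));  // region 3
        System.out.println(clientMax(regions)); // prints 9
    }
}
```

In the real cluster the per-Region maxima are computed in parallel on the Region Servers; only the merge step runs on the client.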

The other type, the Observer coprocessor, resembles a trigger in a traditional database: it is invoked by the server side when certain events occur. Observer coprocessors are hooks scattered throughout the HBase server-side code that fire when fixed events happen. For example, the prePut hook function is called by the Region Server before a put operation executes, and the postPut hook function is called after it.

Development Environment

Maven 3.3.9, JDK 1.7, CDH HBase 1.2.0, MyEclipse

Coprocessor Loading

Enter the HBase shell:

# hbase shell

hbase(main):> disable 'test'

hbase(main):> alter 'test', CONFIGURATION => {'hbase.table.sanity.checks' => 'false'}    # only needs to run once per table

hbase(main):> alter 'test', 'coprocessor' => 'hdfs:///code/jars/regionobserver-put5.jar|com.hbase.observer.App|1001'    # load the jar package

hbase(main):> alter 'test', METHOD => 'table_att_unset', NAME => 'coprocessor$1'    # unload the jar package

hbase(main):> desc 'test'    # view the table's property description

hbase(main):> enable 'test'
Complete Project Code
package com.hbase.observer;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * HBase secondary index
 * @author wing
 * @createTime 2017-4-7
 */
public class App extends BaseRegionObserver {

    // HTablePool is deprecated in newer HBase releases, but matches this CDH version.
    private HTablePool pool = null;
    private final static String SOURCE_TABLE = "test";

    @Override
    public void start(CoprocessorEnvironment env) throws IOException {
        pool = new HTablePool(env.getConfiguration(), 10);
    }

    // After a get on an index rowkey ("t" prefix), fetch and return the real row instead.
    @Override
    public void postGetOp(ObserverContext<RegionCoprocessorEnvironment> c, Get get,
            List<Cell> results) throws IOException {
        HTableInterface table = pool.getTable(Bytes.toBytes(SOURCE_TABLE));
        String newRowkey = Bytes.toString(get.getRow());
        String pre = newRowkey.substring(0, 1);
        if (pre.equals("t")) {
            // index rowkey layout: t + two-letter prefix + timestamp _ uid _ mid...
            String[] splits = newRowkey.split("_");
            String prePre = splits[0].substring(1, 3);
            String timestamp = splits[0].substring(3);
            String uid = splits[1];
            String mid = "";
            for (int i = 2; i < splits.length; i++) {
                mid += splits[i];
                mid += "_";
            }
            mid = mid.substring(0, mid.length() - 1);
            // real rowkey layout: prefix + uid _ timestamp _ mid...
            String rowkey = prePre + uid + "_" + timestamp + "_" + mid;
            System.out.println(rowkey);
            Get realGet = new Get(rowkey.getBytes());
            Result result = table.get(realGet);
            List<Cell> cells = result.listCells();
            results.clear();
            if (cells != null) {
                for (Cell cell : cells) {
                    results.add(cell);
                }
            }
        }
        table.close();
    }

    // After each put on a real row, write an index row with the transformed rowkey.
    @Override
    public void postPut(ObserverContext<RegionCoprocessorEnvironment> e, Put put,
            WALEdit edit, Durability durability) throws IOException {
        try {
            String rowkey = Bytes.toString(put.getRow());
            HTableInterface table = pool.getTable(Bytes.toBytes(SOURCE_TABLE));
            String pre = rowkey.substring(0, 2);
            if (pre.equals("aa") || pre.equals("ab") || pre.equals("ac")
                    || pre.equals("ba") || pre.equals("bb") || pre.equals("bc")
                    || pre.equals("ca") || pre.equals("cb") || pre.equals("cc")) {
                String[] splits = rowkey.split("_");
                String uid = splits[0].substring(2);
                String timestamp = splits[1];
                String mid = "";
                for (int i = 2; i < splits.length; i++) {
                    mid += splits[i];
                    mid += "_";
                }
                mid = mid.substring(0, mid.length() - 1);
                String newRowkey = "t" + pre + timestamp + "_" + uid + "_" + mid;
                System.out.println(newRowkey);
                Put indexPut2 = new Put(newRowkey.getBytes());
                // the index cell records the original rowkey
                indexPut2.addColumn("relation".getBytes(), "column10".getBytes(),
                        rowkey.getBytes());
                table.put(indexPut2);
            }
            table.close();
        } catch (Exception ex) {
            // ignore malformed rowkeys so the client's put is not failed
        }
    }

    // After a scanner batch, replace index rows with the real rows they point to.
    @Override
    public boolean postScannerNext(ObserverContext<RegionCoprocessorEnvironment> e,
            InternalScanner s, List<Result> results, int limit, boolean hasMore)
            throws IOException {
        HTableInterface table = pool.getTable(Bytes.toBytes(SOURCE_TABLE));
        List<Result> newResults = new ArrayList<Result>();
        for (Result result : results) {
            String newRowkey = Bytes.toString(result.getRow());
            String pre = newRowkey.substring(0, 1);
            if (pre.equals("t")) {
                String[] splits = newRowkey.split("_");
                String prePre = splits[0].substring(1, 3);
                String timestamp = splits[0].substring(3);
                String uid = splits[1];
                String mid = "";
                for (int i = 2; i < splits.length; i++) {
                    mid += splits[i];
                    mid += "_";
                }
                mid = mid.substring(0, mid.length() - 1);
                String rowkey = prePre + uid + "_" + timestamp + "_" + mid;
                Get realGet = new Get(rowkey.getBytes());
                Result newResult = table.get(realGet);
                newResults.add(newResult);
            }
        }
        results.clear();
        for (Result result : newResults) {
            results.add(result);
        }
        table.close();
        return hasMore;
    }

    @Override
    public void stop(CoprocessorEnvironment env) throws IOException {
        pool.close();
    }
}

After the Maven project is packaged, upload the jar to the HDFS directory and load it with the shell commands above.
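As a sketch, the packaging and upload steps might look like the following; the jar name and HDFS path are the ones referenced by the alter command earlier, and mvn plus the hadoop client are assumed to be on the PATH:

```shell
# Build the coprocessor jar (assumes the pom produces regionobserver-put5.jar).
mvn clean package

# Upload it to the HDFS path referenced by the alter 'test', 'coprocessor' => ... command.
hdfs dfs -mkdir -p /code/jars
hdfs dfs -put -f target/regionobserver-put5.jar /code/jars/
```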
This completes the secondary index. When the user performs a put, the original rowkey is transformed into a new rowkey under which an index row is saved. When the user performs a get on an index rowkey, it is mapped to the actual rowkey and the actual result is returned. When the user performs a scan, the scanner results are mapped to the results for the actual rowkeys before being returned to the user.
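The rowkey mapping itself is plain string manipulation, so it can be sketched standalone. The layout below (a two-letter prefix such as "ab" before the uid in real rows, and a "t" marker on index rows) is inferred from the coprocessor code above; the helper names are illustrative only.

```java
// Standalone sketch of the rowkey mapping used by the coprocessor
// (layout assumed from the article's code; helper names are illustrative).
public class IndexRowkeyMapper {

    // put direction: real rowkey ppuid_timestamp_mid... -> index rowkey tpptimestamp_uid_mid...
    static String toIndexRowkey(String rowkey) {
        String[] splits = rowkey.split("_");
        String pre = splits[0].substring(0, 2);   // two-letter prefix, e.g. "ab"
        String uid = splits[0].substring(2);
        String timestamp = splits[1];
        StringBuilder mid = new StringBuilder();
        for (int i = 2; i < splits.length; i++) {
            if (mid.length() > 0) {
                mid.append("_");
            }
            mid.append(splits[i]);
        }
        return "t" + pre + timestamp + "_" + uid + "_" + mid;
    }

    // get/scan direction: index rowkey back to the real rowkey
    static String toRealRowkey(String indexRowkey) {
        String[] splits = indexRowkey.split("_");
        String pre = splits[0].substring(1, 3);   // skip the leading "t" marker
        String timestamp = splits[0].substring(3);
        String uid = splits[1];
        StringBuilder mid = new StringBuilder();
        for (int i = 2; i < splits.length; i++) {
            if (mid.length() > 0) {
                mid.append("_");
            }
            mid.append(splits[i]);
        }
        return pre + uid + "_" + timestamp + "_" + mid;
    }
}
```

The two directions are exact inverses, so a get or scan on an index rowkey always lands back on the row the put indexed.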

With the HBase BaseRegionObserver coprocessor, many HBase operations can be wrapped in this way.

BaseRegionObserver Java API (note your HBase version):
https://hbase.apache.org/1.2/apidocs/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.html
