Analysis of HBase coprocessor


The HBase coprocessor is one of the features that many people have been looking forward to in hbase-0.92. It allows offline analysis and online applications to be integrated well, and it greatly broadens the range of applications HBase can serve, so that HBase is no longer just a simple key-value store. The design of the HBase coprocessor goes back to the HBASE-2000 and HBASE-2001 issues. Several years on, how far has the HBase coprocessor come, and where can it be used? The content below comes mainly from a member of the Trend Micro Hadoop group who is also an author of the HBase coprocessor; readers who do not want to wade through the details can jump straight to the summary at the end.
https://blogs.apache.org/hbase/entry/coprocessor_introduction
The HBase coprocessor was released with 0.92, and its design comes from the coprocessor of Google BigTable (described in a talk by Jeff Dean). A coprocessor resembles a MapReduce analysis component, but it greatly simplifies the MapReduce model: a request simply runs independently in each region. It also provides a framework that lets users write customized coprocessors very flexibly. The biggest difference between HBase's coprocessor and Google's is that the HBase coprocessor is a framework embedded inside the region server and master processes that can execute user code dynamically at runtime, whereas Google's coprocessor runs in its own independent process and address space. It may seem that HBase's coprocessor is therefore more efficient, but that is not necessarily the case; the advantage of a separate address space is that compute resources can be bound to it, for example by using cgroups to pin it to a dedicated CPU.
There are now two quite different kinds of coprocessor in HBase, the observer and the endpoint, which correspond to the HBASE-2000 and HBASE-2001 issues respectively. An observer can be viewed as a trigger in a database, while an endpoint can be viewed as a stored procedure.
The coprocessor API can be understood from its class inheritance relationships, as shown in the following illustration:


There are three observer interfaces: MasterObserver, RegionObserver, and WALObserver. They work like hook functions: a pre() method runs before the actual operation and a post() method runs after it, so embedded logic can modify the behavior of the operation. The impact on efficiency depends solely on the cost of the embedded hook function itself, as shown in the following illustration:




For example, the following code implements a permission check before a Get operation:


public class AccessControlCoprocessor extends BaseRegionObserver {

  @Override
  public void preGet(final ObserverContext<RegionCoprocessorEnvironment> c,
      final Get get, final List<KeyValue> result) throws IOException {

    // check permissions...
    if (!permissionGranted()) {
      throw new AccessDeniedException("User is not allowed to access.");
    }
  }

  // override prePut(), preDelete(), etc.
}


In the HBase community, many people have proposed a variety of observer-based ideas for doing meaningful work. For example, a RegionObserver can implement the access control just shown, as well as fault isolation, priorities, and so on; a MasterObserver can monitor DDL operations; and a WALObserver can be used to implement secondary indexes, replication, and so on. Unfortunately, very few of these ideas have actually been implemented, mainly because everyone is fairly busy and the demand at this stage is not strong, so they have been set aside for later. Because the observer framework is flexible and extensible, users can easily build customized implementations that meet their own needs.
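As a rough illustration of the MasterObserver idea above, the sketch below logs DDL operations by hooking table creation and deletion. It is only a minimal sketch against the 0.92-era API (BaseMasterObserver, preCreateTable, preDeleteTable); the class name DdlAuditObserver is made up, and method signatures may differ slightly between HBase versions.

import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.coprocessor.BaseMasterObserver;
import org.apache.hadoop.hbase.coprocessor.MasterCoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch only: audit DDL operations on the master by logging them.
public class DdlAuditObserver extends BaseMasterObserver {
  private static final Log LOG = LogFactory.getLog(DdlAuditObserver.class);

  @Override
  public void preCreateTable(ObserverContext<MasterCoprocessorEnvironment> ctx,
      HTableDescriptor desc, HRegionInfo[] regions) throws IOException {
    // Runs on the master before the table is actually created.
    LOG.info("DDL audit: creating table " + desc.getNameAsString());
  }

  @Override
  public void preDeleteTable(ObserverContext<MasterCoprocessorEnvironment> ctx,
      byte[] tableName) throws IOException {
    // Runs on the master before the table is deleted.
    LOG.info("DDL audit: deleting table " + Bytes.toString(tableName));
  }
}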


The BaseEndpointCoprocessor class is a direct implementation of the Coprocessor interface. An endpoint also has a client-side part: the client wraps an HTable and packages operations such as getMin(), which are implemented internally by parallel scan operations across regions. This is similar to a simple MapReduce: the map side is the regions on the target region servers, and the reduce side is the client. The approach is very intuitive for people used to writing MapReduce code, and interfaces such as sum and min can be implemented easily. Because of this layer of packaging, many development details can be hidden away, letting application developers focus on the application and leaving the low-level implementation to the MapReduce-style programmer.


A simple endpoint example is as follows:


public <T, S> T getMin(ColumnInterpreter<T, S> ci, Scan scan)
    throws IOException {
  T min = null;
  T temp;
  // Scan only the rows of this region and track the smallest value seen.
  InternalScanner scanner = ((RegionCoprocessorEnvironment) getEnvironment())
      .getRegion().getScanner(scan);
  List<KeyValue> results = new ArrayList<KeyValue>();
  byte[] colFamily = scan.getFamilies()[0];
  byte[] qualifier = scan.getFamilyMap().get(colFamily).pollFirst();
  try {
    boolean hasMoreRows = false;
    do {
      hasMoreRows = scanner.next(results);
      for (KeyValue kv : results) {
        temp = ci.getValue(colFamily, qualifier, kv);
        min = (min == null || ci.compare(temp, min) < 0) ? temp : min;
      }
      results.clear();
    } while (hasMoreRows);
  } finally {
    scanner.close();
  }
  log.info("Minimum from this region is "
      + ((RegionCoprocessorEnvironment) getEnvironment()).getRegion()
          .getRegionNameAsString() + ": " + min);
  return min;
}
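The getMin() above runs inside each region (the "map" side). To complete the picture, here is a minimal client-side sketch of the "reduce" side, assuming the 0.92-era CoprocessorProtocol / HTable.coprocessorExec() API; the MinProtocol interface, the MinClient class name, and the column arguments are hypothetical, and the server-side endpoint would implement MinProtocol while extending BaseEndpointCoprocessor.

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;

// Hypothetical protocol exposed by the server-side endpoint.
interface MinProtocol extends CoprocessorProtocol {
  Long getMin(byte[] family, byte[] qualifier) throws IOException;
}

public class MinClient {
  // "Map": each region computes its local minimum in parallel.
  // "Reduce": the client folds the per-region results into a global minimum.
  public static Long globalMin(Configuration conf, String tableName,
      final byte[] family, final byte[] qualifier) throws Throwable {
    HTable table = new HTable(conf, tableName);
    try {
      Map<byte[], Long> perRegion = table.coprocessorExec(
          MinProtocol.class,
          HConstants.EMPTY_START_ROW, HConstants.EMPTY_END_ROW, // all regions
          new Batch.Call<MinProtocol, Long>() {
            public Long call(MinProtocol instance) throws IOException {
              return instance.getMin(family, qualifier);
            }
          });
      Long min = null;
      for (Long regionMin : perRegion.values()) {
        if (regionMin != null && (min == null || regionMin < min)) {
          min = regionMin;
        }
      }
      return min;
    } finally {
      table.close();
    }
  }
}

The call fans out to every region in parallel and the client does the final fold, which is exactly the simple map/reduce analogy described above.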


A coprocessor allows users to load their own code at any time without restarting the system. It can be loaded statically through configuration, or dynamically through the HBase shell or a Java program.
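As a concrete sketch of those loading options, the snippet below shows a static, cluster-wide entry in hbase-site.xml (read when the region servers start) and a dynamic, per-table load through the HBase shell. The property hbase.coprocessor.region.classes and the table_att alter syntax come from the HBase documentation; the jar path, observer class name, priority (1001), and arguments are placeholders, and depending on the version the table may need to be disabled before the alter.

<!-- hbase-site.xml: load a coprocessor into every region server at startup -->
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property>

# HBase shell: attach a coprocessor to a single table, no restart needed
hbase> alter 'mytable', METHOD => 'table_att',
  'coprocessor' => 'hdfs:///user/hbase/coprocessor.jar|com.example.MyRegionObserver|1001|arg1=1,arg2=2'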


Summary:


The HBase coprocessor is derived from the Google BigTable coprocessor; the difference is that the HBase coprocessor is not an independent process, but a framework embedded in the existing region server and master processes.

The implementation of the HBase coprocessor is divided into observers and endpoints: an observer is similar to a trigger and works mainly on the server side, while an endpoint is similar to a stored procedure and works mainly on the client side.


Observers can implement access control, priority setting, monitoring, DDL control, secondary indexing, and other functions, while endpoints can implement min, max, avg, sum, and similar functions.


Coprocessors can be loaded dynamically.

The performance of a coprocessor depends on the performance of the method being loaded, but to get an intuitive feel for it we still plan to run performance tests on some typical HBase coprocessor scenarios in the near future.
