[Reprinted] Analysis of the HBase Coprocessor


The HBase coprocessor is one of the most anticipated features of HBase 0.92. It brings offline analysis closer to the online application and greatly broadens what HBase can be used for, so that it is no longer just a simple key-value store. The design of the HBase coprocessor comes from two issues: HBASE-2000 and HBASE-2001. After several years, how far has the coprocessor come, and where can it be used? The material below comes mainly from members of the Trend Micro Hadoop group, who are also among the authors of the HBase coprocessor. Readers who do not want to wade through the details can jump straight to the summary at the end. The official introduction to the HBase coprocessor is at https://blogs.apache.org/hbase/entry/coprocessor_introduction

The HBase coprocessor is fully released with HBase 0.92. Its design comes from Google Bigtable's coprocessor (see Jeff Dean's talk). A coprocessor is essentially an analysis component similar to MapReduce, but it greatly simplifies the MapReduce model: requests simply run independently and in parallel in each region. It also provides a framework that lets users flexibly write their own coprocessors. The biggest difference between the HBase coprocessor and Google's is that the HBase coprocessor is a framework inside the region server and master processes that dynamically executes user code at runtime, whereas Google's coprocessor has its own independent process and address space. The HBase approach may look more efficient, but that is not necessarily the case: the advantage of an independent address space is that it can use hardware more effectively by being bound to computing resources, for example by pinning it to a dedicated CPU through a cgroup.

Currently, the HBase coprocessor has two quite different implementations: the observer mode and the endpoint mode, corresponding to the two issues HBASE-2000 and HBASE-2001. An observer can be thought of as a database trigger, while an endpoint can be thought of as a stored procedure.

The overall structure can be seen from the coprocessor class inheritance hierarchy:

[Figure: coprocessor class inheritance hierarchy]

There are three observer interfaces: MasterObserver, RegionObserver, and WALObserver. They work like hook functions: a pre() hook runs before the real operation is executed and a post() hook runs after it, so custom behavior can be wrapped around the operation. The impact on performance therefore depends only on the embedded hook code itself:

[Figure: pre() and post() hooks wrapped around a region operation]

For example, the following code implements a permission check before a Get:

public class AccessControlCoprocessor extends BaseRegionObserver {
  @Override
  public void preGet(final ObserverContext<RegionCoprocessorEnvironment> c,
      final Get get, final List<KeyValue> result) throws IOException {
    // check permissions..
    if (!permissionGranted()) {
      throw new AccessDeniedException("User is not allowed to access.");
    }
  }
  // override prePut(), preDelete(), etc.
}

In the HBase community, many people have proposed observer-based designs for useful features. For example, a RegionObserver can implement the access control just shown, as well as fault isolation and request prioritization; a MasterObserver can monitor DDL operations; and a WALObserver can be used to build secondary indexes and replication. Unfortunately, few of these ideas have been implemented so far, mainly because everyone is busy and the corresponding requirements are not pressing at this stage, so they are left for the future. Given the flexibility and extensibility of the observer framework, it is easy to customize an implementation that meets your own needs; a MasterObserver hook for auditing DDL, for instance, could be sketched as shown below.
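
The following is a minimal sketch, not part of the original article, of such a DDL audit hook written roughly against the 0.92-era MasterObserver API; the class name and log messages are purely illustrative.

import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.coprocessor.BaseMasterObserver;
import org.apache.hadoop.hbase.coprocessor.MasterCoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative DDL audit observer: it only logs, it does not change behavior.
public class DdlAuditObserver extends BaseMasterObserver {
  private static final Log LOG = LogFactory.getLog(DdlAuditObserver.class);

  // Runs in the master before a table is created.
  @Override
  public void preCreateTable(ObserverContext<MasterCoprocessorEnvironment> ctx,
      HTableDescriptor desc, HRegionInfo[] regions) throws IOException {
    LOG.info("DDL audit: creating table " + desc.getNameAsString());
  }

  // Runs in the master before a table is deleted.
  @Override
  public void preDeleteTable(ObserverContext<MasterCoprocessorEnvironment> ctx,
      byte[] tableName) throws IOException {
    LOG.info("DDL audit: deleting table " + Bytes.toString(tableName));
  }
}

Such a class would be listed under hbase.coprocessor.master.classes in hbase-site.xml so that the master loads it at startup.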

The BaseEndpointCoprocessor class directly implements the Coprocessor interface. The endpoint is really driven from the client side: the client wraps an HTable and exposes operations such as getMin(), implemented internally through parallel scans and similar operations across the regions. This resembles a simple MapReduce: the map runs in each region on the target region servers, while the reduce happens on the client. The approach is very intuitive for anyone used to writing MapReduce code, and interfaces such as sum() and min() are easy to implement. This layer of packaging hides many development details, so developers can focus on the application itself and leave the underlying MapReduce-style implementation to the framework.

A simple example of an endpoint (the server-side implementation of getMin()) is as follows:

public <T, S> T getMin(ColumnInterpreter<T, S> ci, Scan scan) throws IOException {
  T min = null;
  T temp;
  InternalScanner scanner = ((RegionCoprocessorEnvironment) getEnvironment())
      .getRegion().getScanner(scan);
  List<KeyValue> results = new ArrayList<KeyValue>();
  byte[] colFamily = scan.getFamilies()[0];
  byte[] qualifier = scan.getFamilyMap().get(colFamily).pollFirst();
  try {
    boolean hasMoreRows = false;
    do {
      hasMoreRows = scanner.next(results);
      for (KeyValue kv : results) {
        temp = ci.getValue(colFamily, qualifier, kv);
        min = (min == null || ci.compare(temp, min) < 0) ? temp : min;
      }
      results.clear();
    } while (hasMoreRows);
  } finally {
    scanner.close();
  }
  log.info("Minimum from this region is "
      + ((RegionCoprocessorEnvironment) getEnvironment()).getRegion()
          .getRegionNameAsString() + ": " + min);
  return min;
}
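
On the client side, the per-region results of an endpoint like this are gathered and reduced by the caller. The following is a minimal sketch, not part of the original article, that assumes a hypothetical MinProtocol interface exposing the getMin() call above (simplified to Long values) and uses the 0.92-era HTable.coprocessorExec() API; the table and column names are illustrative.

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;
import org.apache.hadoop.hbase.util.Bytes;

public class MinClient {

  // Hypothetical endpoint protocol; the server-side getMin() would be exposed
  // to clients through an interface of this shape.
  public interface MinProtocol extends CoprocessorProtocol {
    Long getMin(Scan scan) throws IOException;
  }

  public static void main(String[] args) throws Throwable {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "demo_table");
    final Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"));

    // "map": every region in the key range runs getMin() locally in parallel.
    Map<byte[], Long> regionMins = table.coprocessorExec(
        MinProtocol.class,
        HConstants.EMPTY_START_ROW, HConstants.EMPTY_END_ROW,
        new Batch.Call<MinProtocol, Long>() {
          public Long call(MinProtocol instance) throws IOException {
            return instance.getMin(scan);
          }
        });

    // "reduce": the client folds the per-region minimums into a global one.
    Long min = null;
    for (Long regionMin : regionMins.values()) {
      if (regionMin != null && (min == null || regionMin < min)) {
        min = regionMin;
      }
    }
    System.out.println("Global minimum: " + min);
    table.close();
  }
}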

A coprocessor allows users to load their own code at any time without restarting the system. Coprocessors can be loaded statically through configuration (hbase-site.xml) or dynamically, per table, through the HBase shell or a Java program.
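
As a concrete illustration, here is a minimal sketch, not part of the original article, of attaching a coprocessor to a table from Java by setting the table's coprocessor attribute, in the style of the loading example in the official blog linked above; the jar path, table name, and class name are illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Coprocessor;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class LoadCoprocessorExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // Table-level loading: the attribute names a jar on HDFS, the coprocessor
    // class inside it, and a load priority.
    Path jar = new Path("hdfs:///user/demo/coprocessor-demo.jar");
    HTableDescriptor htd = new HTableDescriptor("demo_table");
    htd.addFamily(new HColumnDescriptor("cf"));
    htd.setValue("COPROCESSOR$1",
        jar.toString() + "|com.example.AccessControlCoprocessor|"
        + Coprocessor.PRIORITY_USER);

    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.createTable(htd);

    // Alternatives: set the same 'coprocessor' table attribute from the HBase
    // shell with alter ... METHOD => 'table_att', or load a class for every
    // region at startup by listing it under hbase.coprocessor.region.classes
    // in hbase-site.xml.
  }
}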

Summary:

  • The HBase coprocessor originated from Google Bigtable's coprocessor. The difference is that the HBase coprocessor is not an independent process; it is a framework embedded in the existing region server and master processes.
  • The HBase coprocessor has two kinds of implementation: observer and endpoint. An observer is similar to a trigger and mainly works on the server side, while an endpoint is similar to a stored procedure and is mainly driven from the client side.
  • Observers can implement permission management, priority setting, monitoring, DDL control, secondary indexes, and other functions, while endpoints can implement functions such as min, max, avg, and sum.
  • Coprocessors can be loaded dynamically.
  • The performance of a coprocessor depends on the performance of the loaded code itself. To get an intuitive picture, we plan to benchmark some typical HBase coprocessor scenarios in the near future.

Source: http://walkoven.com/?p=77
