[Reprinted] Analysis of the HBase Coprocessor


The HBase coprocessor is one of the most anticipated features of HBase 0.92. It brings offline analysis closer to the online application and greatly broadens what HBase can be used for, so that it is no longer just a simple key-value store. The design of the HBase coprocessor comes from two issues: HBASE-2000 and HBASE-2001. After several years, how far has the coprocessor come, and where can it be used? The material below comes mainly from members of the Trend Micro Hadoop group, who are also among the authors of the HBase coprocessor. Readers who do not want to wade through the details can jump straight to the summary at the end. The official introduction to the HBase coprocessor is at https://blogs.apache.org/hbase/entry/coprocessor_introduction

The HBase coprocessor is fully released with HBase 0.92. Its design comes from Google Bigtable's coprocessor (see Jeff Dean's talk). A coprocessor is essentially an analysis component similar to MapReduce, but it greatly simplifies the MapReduce model: requests simply run independently and in parallel in each region. It also provides a framework that lets users flexibly write their own coprocessors. The biggest difference between the HBase coprocessor and Google's is that the HBase coprocessor is a framework inside the region server and master processes that dynamically executes user code at runtime, whereas Google's coprocessor has its own independent process and address space. The HBase approach may look more efficient, but that is not necessarily the case: the advantage of an independent address space is that it can use hardware more effectively by being bound to computing resources, for example by pinning it to a dedicated CPU through a cgroup.

Currently, the HBase coprocessor has two quite different implementations: the observer mode and the endpoint mode, corresponding to the two issues HBASE-2000 and HBASE-2001. An observer can be thought of as a database trigger, while an endpoint can be thought of as a stored procedure.

The overall structure can be seen from the coprocessor class inheritance hierarchy:

[Figure: coprocessor class inheritance hierarchy]

There are three observer interfaces: MasterObserver, RegionObserver, and WALObserver. They work like hook functions: a pre() hook runs before the real operation is executed and a post() hook runs after it, so custom behavior can be wrapped around the operation. The impact on performance therefore depends only on the embedded hook code itself:

[Figure: pre() and post() hooks wrapped around a region operation]

For example, the following code implements a permission check before a Get:

public class AccessControlCoprocessor extends BaseRegionObserver {
  @Override
  public void preGet(final ObserverContext<RegionCoprocessorEnvironment> c,
      final Get get, final List<KeyValue> result) throws IOException {
    // check permissions..
    if (!permissionGranted()) {
      throw new AccessDeniedException("User is not allowed to access.");
    }
  }
  // override prePut(), preDelete(), etc.
}

In the HBase community, many people have proposed observer-based designs for useful features. For example, a RegionObserver can implement the access control just shown, as well as fault isolation and request prioritization; a MasterObserver can monitor DDL operations; and a WALObserver can be used to build secondary indexes and replication. Unfortunately, few of these ideas have been implemented so far, mainly because everyone is busy and the corresponding requirements are not pressing at this stage, so they are left for the future. Given the flexibility and extensibility of the observer framework, it is easy to customize an implementation that meets your own needs; a MasterObserver hook for auditing DDL, for instance, could be sketched as shown below.
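
The following is a minimal sketch, not part of the original article, of such a DDL audit hook written roughly against the 0.92-era MasterObserver API; the class name and log messages are purely illustrative.

import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.coprocessor.BaseMasterObserver;
import org.apache.hadoop.hbase.coprocessor.MasterCoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative DDL audit observer: it only logs, it does not change behavior.
public class DdlAuditObserver extends BaseMasterObserver {
  private static final Log LOG = LogFactory.getLog(DdlAuditObserver.class);

  // Runs in the master before a table is created.
  @Override
  public void preCreateTable(ObserverContext<MasterCoprocessorEnvironment> ctx,
      HTableDescriptor desc, HRegionInfo[] regions) throws IOException {
    LOG.info("DDL audit: creating table " + desc.getNameAsString());
  }

  // Runs in the master before a table is deleted.
  @Override
  public void preDeleteTable(ObserverContext<MasterCoprocessorEnvironment> ctx,
      byte[] tableName) throws IOException {
    LOG.info("DDL audit: deleting table " + Bytes.toString(tableName));
  }
}

Such a class would be listed under hbase.coprocessor.master.classes in hbase-site.xml so that the master loads it at startup.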

The BaseEndpointCoprocessor class directly implements the Coprocessor interface. The endpoint is really driven from the client side: the client wraps an HTable and exposes operations such as getMin(), implemented internally through parallel scans and similar operations across the regions. This resembles a simple MapReduce: the map runs in each region on the target region servers, while the reduce happens on the client. The approach is very intuitive for anyone used to writing MapReduce code, and interfaces such as sum() and min() are easy to implement. This layer of packaging hides many development details, so developers can focus on the application itself and leave the underlying MapReduce-style implementation to the framework.

A simple example of an endpoint (the server-side implementation of getMin()) is as follows:

public <T, S> T getMin(ColumnInterpreter<T, S> ci, Scan scan) throws IOException {
  T min = null;
  T temp;
  InternalScanner scanner = ((RegionCoprocessorEnvironment) getEnvironment())
      .getRegion().getScanner(scan);
  List<KeyValue> results = new ArrayList<KeyValue>();
  byte[] colFamily = scan.getFamilies()[0];
  byte[] qualifier = scan.getFamilyMap().get(colFamily).pollFirst();
  try {
    boolean hasMoreRows = false;
    do {
      hasMoreRows = scanner.next(results);
      for (KeyValue kv : results) {
        temp = ci.getValue(colFamily, qualifier, kv);
        min = (min == null || ci.compare(temp, min) < 0) ? temp : min;
      }
      results.clear();
    } while (hasMoreRows);
  } finally {
    scanner.close();
  }
  log.info("Minimum from this region is "
      + ((RegionCoprocessorEnvironment) getEnvironment()).getRegion()
          .getRegionNameAsString() + ": " + min);
  return min;
}
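
On the client side, the per-region results of an endpoint like this are gathered and reduced by the caller. The following is a minimal sketch, not part of the original article, that assumes a hypothetical MinProtocol interface exposing the getMin() call above (simplified to Long values) and uses the 0.92-era HTable.coprocessorExec() API; the table and column names are illustrative.

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;
import org.apache.hadoop.hbase.util.Bytes;

public class MinClient {

  // Hypothetical endpoint protocol; the server-side getMin() would be exposed
  // to clients through an interface of this shape.
  public interface MinProtocol extends CoprocessorProtocol {
    Long getMin(Scan scan) throws IOException;
  }

  public static void main(String[] args) throws Throwable {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "demo_table");
    final Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"));

    // "map": every region in the key range runs getMin() locally in parallel.
    Map<byte[], Long> regionMins = table.coprocessorExec(
        MinProtocol.class,
        HConstants.EMPTY_START_ROW, HConstants.EMPTY_END_ROW,
        new Batch.Call<MinProtocol, Long>() {
          public Long call(MinProtocol instance) throws IOException {
            return instance.getMin(scan);
          }
        });

    // "reduce": the client folds the per-region minimums into a global one.
    Long min = null;
    for (Long regionMin : regionMins.values()) {
      if (regionMin != null && (min == null || regionMin < min)) {
        min = regionMin;
      }
    }
    System.out.println("Global minimum: " + min);
    table.close();
  }
}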

A coprocessor allows users to load their own code at any time without restarting the system. Coprocessors can be loaded statically through configuration (hbase-site.xml) or dynamically, per table, through the HBase shell or a Java program.
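
As a concrete illustration, here is a minimal sketch, not part of the original article, of attaching a coprocessor to a table from Java by setting the table's coprocessor attribute, in the style of the loading example in the official blog linked above; the jar path, table name, and class name are illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Coprocessor;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class LoadCoprocessorExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // Table-level loading: the attribute names a jar on HDFS, the coprocessor
    // class inside it, and a load priority.
    Path jar = new Path("hdfs:///user/demo/coprocessor-demo.jar");
    HTableDescriptor htd = new HTableDescriptor("demo_table");
    htd.addFamily(new HColumnDescriptor("cf"));
    htd.setValue("COPROCESSOR$1",
        jar.toString() + "|com.example.AccessControlCoprocessor|"
        + Coprocessor.PRIORITY_USER);

    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.createTable(htd);

    // Alternatives: set the same 'coprocessor' table attribute from the HBase
    // shell with alter ... METHOD => 'table_att', or load a class for every
    // region at startup by listing it under hbase.coprocessor.region.classes
    // in hbase-site.xml.
  }
}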

Summary:

  • The HBase coprocessor originated from Google Bigtable's coprocessor. The difference is that the HBase coprocessor is not an independent process; it is a framework embedded in the existing region server and master processes.
  • The HBase coprocessor has two kinds of implementation: observer and endpoint. An observer is similar to a trigger and mainly works on the server side, while an endpoint is similar to a stored procedure and is mainly driven from the client side.
  • Observers can implement permission management, priority setting, monitoring, DDL control, secondary indexes, and other functions, while endpoints can implement functions such as min, max, avg, and sum.
  • Coprocessors can be loaded dynamically.
  • The performance of a coprocessor depends on the performance of the loaded code itself. To get an intuitive picture, we plan to benchmark some typical HBase coprocessor scenarios in the near future.

Source: http://walkoven.com/?p=77
