Document directory
- 1. Cause (Why HBase Coprocessor)
- 2. Source of Inspiration
- 3. Detail profiling (Implementation)
- 4. Code Example
- 5. References
1. Cause (Why HBase Coprocessor)
HBase, as a column-family database, is most often criticized for two limitations: it is hard to build a "secondary index", and it is hard to run operations such as sum, count, and sort. For example, in earlier versions of HBase (before 0.92), counting the total number of rows in a table required running a MapReduce job that used a Counter. HBase integrates MapReduce at the data storage layer, so MapReduce can be used effectively for distributed computation over tables. In many cases, however, such as simple addition or aggregation, running the computation directly on the server side reduces communication overhead and gives a good performance improvement. As a result, HBase introduced coprocessors in version 0.92 to enable some exciting new features: easy creation of secondary indexes, complex filters (predicate push-down), and access control.
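To make the communication argument concrete, here is a minimal sketch in plain Java. It does not use the HBase API: RegionSketch and ServerSideCountSketch are hypothetical stand-ins that only model how many values cross the network when rows are counted on the client versus inside each region.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ServerSideCountSketch {

    // Client-side counting: every row is shipped to the client first.
    // Returns {rowCount, valuesTransferred}.
    static long[] countOnClient(List<RegionSketch> regions) {
        long count = 0;
        long transferred = 0;
        for (RegionSketch region : regions) {
            List<String> rows = region.scanAllRows();
            transferred += rows.size();
            count += rows.size();
        }
        return new long[] {count, transferred};
    }

    // Coprocessor-style counting: each region counts locally and
    // returns a single number. Returns {rowCount, valuesTransferred}.
    static long[] countOnServer(List<RegionSketch> regions) {
        long count = 0;
        long transferred = 0;
        for (RegionSketch region : regions) {
            count += region.countRowsLocally();
            transferred += 1;
        }
        return new long[] {count, transferred};
    }

    public static void main(String[] args) {
        List<RegionSketch> regions = Arrays.asList(
                new RegionSketch(Arrays.asList("a", "b", "c")),
                new RegionSketch(Arrays.asList("d", "e")));
        long[] client = countOnClient(regions);
        long[] server = countOnServer(regions);
        System.out.println("client-side: count=" + client[0] + ", values transferred=" + client[1]);
        System.out.println("server-side: count=" + server[0] + ", values transferred=" + server[1]);
    }
}

// Hypothetical stand-in for a region: it simply holds its row keys in memory.
class RegionSketch {
    private final List<String> rowKeys;

    RegionSketch(List<String> rowKeys) {
        this.rowKeys = rowKeys;
    }

    // Models a full scan whose results are sent to the caller.
    List<String> scanAllRows() {
        return new ArrayList<>(rowKeys);
    }

    // Models an aggregation executed next to the data.
    long countRowsLocally() {
        return rowKeys.size();
    }
}
```

Both approaches report the same count, but in this toy model the server-side variant transfers one value per region instead of one value per row.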
2. Source of Inspiration
The HBase coprocessor was inspired by a talk Jeff Dean gave in 2009 (pp. 66-67 of the slides). Based on that talk, HBase implements a coprocessor mechanism similar to Bigtable's, with the following features:
- Arbitrary code can run at each tablet on a tablet server.
- A high-level call interface for clients: the client addresses calls directly to rows of the table, and calls that span multiple rows are automatically split into multiple parallel RPC calls.
- A flexible model that can be used to build distributed services.
- Automatic scaling, load balancing, and request routing for applications.
The HBase coprocessor is inspired by Bigtable, but the implementation details differ. HBase provides a framework, consisting of a class library and a runtime environment, that lets user code run inside the HBase region server and master processes.
3. Detail profiling (Implementation)
Coprocessors come in two scopes. A system coprocessor is loaded globally and applies to every table on a region server. A table coprocessor is loaded only for the tables the user specifies. For flexibility, the coprocessor framework provides two kinds of plug-in. One is the observer, which resembles a trigger in a relational database. The other is the endpoint, a dynamic RPC extension that resembles a stored procedure.
3.1 Observer
Observers are designed to let users insert code by overriding the upcall methods of the coprocessor framework; the HBase core code then invokes these callbacks when specific events occur. The coprocessor framework handles all the details of invoking the callbacks, so a coprocessor only needs to supply the added or changed behavior.
Taking HBase 0.92 as an example, it provides three observer interfaces:
- RegionObserver: provides hooks for client data operations such as Get, Put, Delete, and Scan.
- WALObserver: provides hooks for WAL-related operations.
- MasterObserver: provides hooks for DDL-type operations such as creating, deleting, and modifying tables.
These interfaces can be used together in the same place and are executed in order of priority. Complex functional layers for HBase can be built on top of coprocessors. Many kinds of HBase events can trigger observer methods, and these events and methods have been part of the HBase API since 0.92. However, the API may change for various reasons, and the interfaces differ considerably between versions. For details, refer to the Javadoc.
The working principle of RegionObserver is shown in Figure 1. For more information about observers, see section 9.6.3 of the HBase Book.
Figure 1 Working Principle of RegionObserver
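The trigger-like behavior of an observer can be sketched in plain Java. This is a simplified model rather than the HBase API: the PutObserver interface, the Region class, and both observers are hypothetical, and the rule that a lower priority number runs first is an assumption of the sketch. It shows a framework calling pre and post hooks around a put, with one observer acting as an access check and another maintaining a small secondary index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ObserverSketch {

    // Hypothetical hook interface, loosely modelled on pre/post observer hooks.
    interface PutObserver {
        // Runs before the write; returning false vetoes the put.
        boolean prePut(String row, String value);

        // Runs after the write has been applied.
        void postPut(String row, String value);
    }

    // An observer together with the priority it was registered with.
    static class Registered {
        final int priority;
        final PutObserver observer;

        Registered(int priority, PutObserver observer) {
            this.priority = priority;
            this.observer = observer;
        }
    }

    // Toy "region" that plays the role of the framework: it owns the data
    // and calls every registered observer around each put.
    static class Region {
        private final Map<String, String> store = new HashMap<>();
        private final List<Registered> observers = new ArrayList<>();

        // Assumption of this sketch: a lower number means the observer runs earlier.
        void register(int priority, PutObserver observer) {
            observers.add(new Registered(priority, observer));
            observers.sort((a, b) -> Integer.compare(a.priority, b.priority));
        }

        boolean put(String row, String value) {
            for (Registered r : observers) {
                if (!r.observer.prePut(row, value)) {
                    return false; // vetoed: nothing is written, post hooks do not run
                }
            }
            store.put(row, value);
            for (Registered r : observers) {
                r.observer.postPut(row, value);
            }
            return true;
        }

        int rowCount() {
            return store.size();
        }
    }

    // Access-control style observer: rejects writes to rows with a reserved prefix.
    static class PrefixGuard implements PutObserver {
        public boolean prePut(String row, String value) {
            return !row.startsWith("secret-");
        }

        public void postPut(String row, String value) {
        }
    }

    // Secondary-index style observer: maintains a value-to-row lookup.
    static class ValueIndex implements PutObserver {
        final Map<String, String> rowByValue = new HashMap<>();

        public boolean prePut(String row, String value) {
            return true;
        }

        public void postPut(String row, String value) {
            rowByValue.put(value, row);
        }
    }

    public static void main(String[] args) {
        Region region = new Region();
        ValueIndex index = new ValueIndex();
        region.register(1, index);
        region.register(0, new PrefixGuard());

        System.out.println(region.put("row1", "blue"));       // true
        System.out.println(region.put("secret-row", "red"));  // false
        System.out.println(index.rowByValue.get("blue"));     // row1
        System.out.println(region.rowCount());                // 1
    }
}
```

The caller of put never sees the observers; only the framework knows which hooks exist and in which order they run, which is the division of labor described above.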
3.2 Endpoint
An endpoint is the interface for dynamic RPC plug-ins. Its implementation is installed on the server side and can be invoked through HBase RPC. The client library provides convenient methods for calling these dynamic interfaces: a client can call an endpoint at any time, the implementation is executed remotely at the target region, and the result is returned to the client. Users can combine these powerful plug-in interfaces to add new features to HBase. Using an endpoint involves the following steps:
- Define a new protocol interface, which must extend CoprocessorProtocol.
- Implement the endpoint interface; the implementation is loaded into the region environment and executed there.
- Extend the abstract class BaseEndpointCoprocessor.
- On the client side, call the endpoint through one of two new HBase client APIs. Single region: HTableInterface.coprocessorProxy(Class<T> protocol, byte[] row). Range of regions: HTableInterface.coprocessorExec(Class<T> protocol, byte[] startKey, byte[] endKey, Batch.Call<T,R> callable)
An example of the overall endpoint call process is shown in Figure 2:
Figure 2 Endpoint call process example
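The call flow for a range of regions can be sketched in plain Java. This is a simplified model and not the HBase client API: RowCountProtocol, Region, and exec are hypothetical names, keys are plain strings, and a thread pool stands in for parallel RPC. It shows the idea of picking the regions whose key range overlaps the requested range, invoking the same call on each of them in parallel, and returning one result per region for the client to combine.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class EndpointSketch {

    // Hypothetical protocol interface: the methods a client may call remotely.
    interface RowCountProtocol {
        long rowCount();
    }

    // Toy region covering the key range [startKey, endKey). It also plays the
    // role of the endpoint implementation that runs next to the data.
    static class Region implements RowCountProtocol {
        final String startKey;
        final String endKey;
        final List<String> rows;

        Region(String startKey, String endKey, List<String> rows) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.rows = rows;
        }

        public long rowCount() {
            return rows.size();
        }

        // True if [from, to) intersects this region's key range.
        boolean overlaps(String from, String to) {
            return startKey.compareTo(to) < 0 && from.compareTo(endKey) < 0;
        }
    }

    // Runs the call against every overlapping region in parallel and returns
    // the per-region results keyed by region start key.
    static <R> Map<String, R> exec(List<Region> regions, String startKey, String endKey,
                                   Function<RowCountProtocol, R> call) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            Map<String, Future<R>> pending = new TreeMap<>();
            for (Region region : regions) {
                if (region.overlaps(startKey, endKey)) {
                    Callable<R> task = () -> call.apply(region);
                    pending.put(region.startKey, pool.submit(task));
                }
            }
            Map<String, R> results = new TreeMap<>();
            for (Map.Entry<String, Future<R>> entry : pending.entrySet()) {
                results.put(entry.getKey(), entry.getValue().get());
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
                new Region("a", "m", Arrays.asList("a", "b", "c")),
                new Region("m", "z", Arrays.asList("m", "n")));

        Map<String, Long> perRegion = exec(regions, "a", "z", RowCountProtocol::rowCount);

        // The client combines the partial results itself.
        long total = 0;
        for (long count : perRegion.values()) {
            total += count;
        }
        System.out.println(perRegion + " total=" + total); // {a=3, m=2} total=5
    }
}
```

Returning one result per region, instead of a single merged value, leaves the choice of how to combine partial results (sum, max, concatenation) to the client.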
4. Code Example
In this example, we count the number of rows in an HBase table to get a real feel for the convenience and power of coprocessors. In earlier versions of HBase, we had to write MapReduce code to count the rows of a table. In HBase 0.92 and later, we only need to write client code, which makes it well suited to being wrapped in a web service.
4.1 Enable Aggregation (Enable Coprocessor Aggregation)
There are two methods.
1. Enable aggregation globally, so that it can operate on the data of all tables. Add the following to hbase-site.xml:
<property>
  <name>hbase.coprocessor.user.region.classes</name>
  <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property>
2. Enable aggregation for a specific table only, through the HBase shell.
(1) Disable the table:
hbase> disable 'mytable'
(2) Add the aggregation coprocessor:
hbase> alter 'mytable', METHOD => 'table_att', 'coprocessor' => '|org.apache.hadoop.hbase.coprocessor.AggregateImplementation|'
(3) Re-enable the table:
hbase> enable 'mytable'
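The value of the 'coprocessor' attribute is a pipe-separated string. According to the HBase reference guide, its fields are the jar path, the class name, the priority, and the arguments, and an empty jar path means the class is expected to be on the region server's classpath already. The plain-Java sketch below splits such a value into its fields. CoprocessorSpecSketch is a hypothetical helper for illustration and not part of HBase, whose own parsing may differ in detail.

```java
class CoprocessorSpecSketch {
    final String jarPath;
    final String className;
    final String priority;
    final String arguments;

    private CoprocessorSpecSketch(String jarPath, String className,
                                  String priority, String arguments) {
        this.jarPath = jarPath;
        this.className = className;
        this.priority = priority;
        this.arguments = arguments;
    }

    // Splits "jarPath|className|priority|arguments"; only className is required here.
    static CoprocessorSpecSketch parse(String spec) {
        // A limit of -1 keeps trailing empty fields, such as an omitted priority.
        String[] parts = spec.split("\\|", -1);
        if (parts.length < 2 || parts[1].trim().isEmpty()) {
            throw new IllegalArgumentException("class name is required: " + spec);
        }
        String jarPath = parts[0].trim();
        String className = parts[1].trim();
        String priority = parts.length > 2 ? parts[2].trim() : "";
        String arguments = parts.length > 3 ? parts[3].trim() : "";
        return new CoprocessorSpecSketch(jarPath, className, priority, arguments);
    }

    public static void main(String[] args) {
        CoprocessorSpecSketch spec =
                parse("|org.apache.hadoop.hbase.coprocessor.AggregateImplementation|");
        System.out.println("jar path  = '" + spec.jarPath + "'");
        System.out.println("class     = '" + spec.className + "'");
        System.out.println("priority  = '" + spec.priority + "'");
        System.out.println("arguments = '" + spec.arguments + "'");
    }
}
```

For the value used in the shell command, the jar path, priority, and arguments all come out empty and only the class name is set.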
4.2 Code for counting the number of rows (Code Snippet)
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.util.Bytes;

public class MyAggregationClient {

    private static final byte[] TABLE_NAME = Bytes.toBytes("mytable");
    private static final byte[] CF = Bytes.toBytes("vent");

    public static void main(String[] args) throws Throwable {
        Configuration customConf = new Configuration();
        customConf.setStrings("hbase.zookeeper.quorum", "node0,node1,node2");
        // Increase the RPC timeout (in milliseconds)
        customConf.setLong("hbase.rpc.timeout", 600000);
        // Set the scanner caching
        customConf.setLong("hbase.client.scanner.caching", 1000);
        Configuration configuration = HBaseConfiguration.create(customConf);
        AggregationClient aggregationClient = new AggregationClient(configuration);
        Scan scan = new Scan();
        // Restrict the scan to exactly one column family
        scan.addFamily(CF);
        long rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
        System.out.println("row count is " + rowCount);
    }
}
5. References
[1] Lai, et al., "Coprocessor Introduction", Apache HBase blog. Available: https://blogs.apache.org/hbase/entry/coprocessor_introduction
[2] Apache, "The Apache HBase Reference Guide". Available: http://hbase.apache.org/book.html#coprocessors