Main content:
1. Introduction to HBase coprocessors
2. Observers (Observer)
3. Endpoints (Endpoint)
--------------------------------------------------------------------------------
1. Introduction to HBase coprocessors
A system coprocessor is loaded globally for all tables on a region server, whereas a table coprocessor is loaded only for the table the user specifies. HBase has two types of coprocessor: Observer coprocessors and Endpoint coprocessors.
The former is similar to a trigger and fires when a particular event occurs; the latter resembles a stored procedure and performs computation on the data. Observer coprocessors are useful in many places, such as security and permission checks, foreign-key references or consistency checks, and secondary indexes. The main types are RegionObserver, RegionServerObserver, MasterObserver, and WALObserver.
2. Observers (Observer)
The observer is designed to let users insert code by overriding the upcall methods of the coprocessor framework; the callback methods for specific events are invoked by the core code of HBase. The coprocessor framework handles all the details of callback invocation, so the coprocessor itself only needs to supply the added or changed functionality.
Taking HBase 0.92 as an example, it provides three observer interfaces:
- RegionObserver: provides hooks for client data manipulation events: Get, Put, Delete, Scan, and so on.
- WALObserver: provides hooks for WAL-related operations.
- MasterObserver: provides hooks for DDL-type operations, such as creating, deleting, and modifying tables.
These interfaces can be used together in the same place and are executed in order of priority. Building on coprocessors, users can implement arbitrarily complex functional layers on top of HBase. HBase has many events that can trigger observer methods, and these have been integrated into the HBase API since version 0.92. However, these APIs may change for a variety of reasons, and the interfaces differ considerably between versions; refer to the Javadoc for details. How RegionObserver works is shown in Figure 1.
Figure 1 How RegionObserver works
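The hook mechanism described above can be modeled in a few lines of plain Java. The following is only a sketch of the idea and does not use the HBase API: MiniRegion, PutObserver, and the boolean return value of prePut are hypothetical names and simplifications introduced for illustration. It shows a framework that invokes the registered observers in priority order around a write, with one observer acting as a permission check and another maintaining a simple secondary index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of observer hooks. None of these types are HBase classes.
class ObserverSketch {

    /** Hypothetical hook interface, loosely modeled on pre/post hooks for Put. */
    interface PutObserver {
        /** Runs before the write; returning false rejects it (an assumption of this sketch). */
        default boolean prePut(String row, String value) { return true; }
        /** Runs after a successful write. */
        default void postPut(String row, String value) { }
    }

    /** Hypothetical stand-in for a region: it owns the data and drives the callbacks. */
    static class MiniRegion {
        private static class Registered {
            final int priority;
            final PutObserver observer;
            Registered(int priority, PutObserver observer) {
                this.priority = priority;
                this.observer = observer;
            }
        }

        private final Map<String, String> store = new TreeMap<>();
        private final List<Registered> observers = new ArrayList<>();

        /** Registers an observer; a lower priority number runs first. */
        void register(int priority, PutObserver observer) {
            observers.add(new Registered(priority, observer));
            observers.sort(Comparator.comparingInt(r -> r.priority));
        }

        /** The "core code": calls the pre hooks, performs the write, calls the post hooks. */
        boolean put(String row, String value) {
            for (Registered r : observers) {
                if (!r.observer.prePut(row, value)) {
                    return false; // an observer rejected the write
                }
            }
            store.put(row, value);
            for (Registered r : observers) {
                r.observer.postPut(row, value);
            }
            return true;
        }

        String get(String row) {
            return store.get(row);
        }
    }

    public static void main(String[] args) {
        MiniRegion region = new MiniRegion();
        Map<String, String> index = new TreeMap<>(); // value -> row, a toy secondary index

        region.register(2, new PutObserver() {
            @Override public void postPut(String row, String value) { index.put(value, row); }
        });
        region.register(1, new PutObserver() {
            @Override public boolean prePut(String row, String value) { return !row.startsWith("secret"); }
        });

        System.out.println(region.put("row1", "alice"));  // true
        System.out.println(region.put("secret1", "bob")); // false
        System.out.println(index);                        // {alice=row1}
    }
}
```

The observer code contains only the added behavior; the ordering and invocation of the hooks stay inside the framework, which is the division of labor the section describes.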
3. Endpoints (Endpoint)
HBase provides the client-side Java package org.apache.hadoop.hbase.client.coprocessor. It offers the following three ways to invoke the services provided by a coprocessor:
- Table.coprocessorService(byte[])
- Table.coprocessorService(Class, byte[], byte[], Batch.Call)
- Table.coprocessorService(Class, byte[], byte[], Batch.Call, Batch.Callback)
An Endpoint coprocessor runs in the context of a region, and an HBase table may have more than one region. The client can therefore either invoke the coprocessor on a single region, which processes the request and returns a result, or invoke it on a range of regions, which execute concurrently, and then aggregate the results. The following three approaches cover these different needs.
(1) Calling the coprocessor RPC on a single region
The first approach uses the API coprocessorService(byte[]), which calls the coprocessor on a single region only. This method uses a rowkey to specify the region. HBase clients rarely manipulate regions directly and generally do not need to know region names; moreover, region names in HBase can change at any time, so specifying the region by rowkey is the most reasonable way. A rowkey identifies exactly one region, and even if the given rowkey does not exist, it can still be used as long as it falls within the rowkey range of some region. For example, if Region1 handles the data in the interval [row1, row100], then rowkey=row1 is handled by Region1. In other words, we can use row1 to specify Region1, regardless of whether a record whose rowkey equals "row1" exists.
Figure 2 Calling the coprocessor on a single region
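To see why a rowkey that does not exist can still identify a region, the lookup can be modeled as a search over region start keys. This is a minimal sketch under stated assumptions, not HBase code: RegionLookupSketch and its methods are hypothetical, each region is assumed to cover the keys from its start key up to the next region's start key, and keys are compared as Java strings rather than as byte arrays.

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified model of rowkey-to-region lookup. Not an HBase class.
class RegionLookupSketch {
    // start key -> region name; a region covers keys from its start key
    // up to (but not including) the next region's start key (assumption).
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    /** Returns the region whose key range contains the rowkey, or null if none does. */
    String regionFor(String rowkey) {
        // The region is the one with the greatest start key <= rowkey.
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowkey);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion("", "Region0");
        table.addRegion("row1", "Region1");
        table.addRegion("row5", "Region2");

        System.out.println(table.regionFor("row1"));    // Region1
        System.out.println(table.regionFor("row3abc")); // Region1, although no such row was ever stored
        System.out.println(table.regionFor("row7"));    // Region2
    }
}
```

The lookup never consults stored rows, only the region boundaries, so any key inside a region's range selects that region.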
The coprocessorService method returns an object of type CoprocessorRpcChannel, which is connected to the region specified by the rowkey; through this channel, the coprocessor RPC deployed on that region can be called. We have already defined the RPC service through Protobuf. Calling the service's newBlockingStub() method with the CoprocessorRpcChannel as the input parameter yields the stub object for the RPC call, which is then used to call the remote RPC.
Code 1 Getting the rowcount of a single region
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.client.HConnection;
    import org.apache.hadoop.hbase.client.HConnectionManager;
    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel;

    long singleRegionCount(String tableName, String rowkey, boolean reCount)
    {
        long rowCount = 0;
        try {
            Configuration config = new Configuration();
            HConnection conn = HConnectionManager.createConnection(config);
            HTableInterface tbl = conn.getTable(tableName);
            // Get the channel connected to the region that contains rowkey
            CoprocessorRpcChannel channel = tbl.coprocessorService(rowkey.getBytes());
            org.ibm.developerworks.getRowCount.ibmDeveloperWorksService.BlockingInterface service =
                org.ibm.developerworks.getRowCount.ibmDeveloperWorksService.newBlockingStub(channel);
            // Set the RPC input parameters
            org.ibm.developerworks.getRowCount.getRowCountRequest.Builder request =
                org.ibm.developerworks.getRowCount.getRowCountRequest.newBuilder();
            request.setReCount(reCount);
            // Call the RPC
            org.ibm.developerworks.getRowCount.getRowCountResponse ret =
                service.getRowCount(null, request.build());
            // Parse the result
            rowCount = ret.getRowCount();
        }
        catch (Exception e) { e.printStackTrace(); }
        return rowCount;
    }
(2) Calling the coprocessor RPC on multiple regions without using a callback
Sometimes the client needs to invoke the same coprocessor on multiple regions, for example to count the rows of an entire table. In that case every region has to take part: each region counts its own rows and returns the number to the client, and the client then sums the results returned by all regions to obtain the rowcount of the whole table. This means that the client interacts with more than one region in a single batch.
One way to do this is to collect the start key of each region and then call the first coprocessorService method in a loop: use each region's start key as the input parameter, get the RPC channel, create the stub object, and call the coprocessor RPC on each region one by one. This approach requires a lot of code, so HBase provides two simpler coprocessorService methods for invoking a coprocessor on multiple regions.
Take the first of these, coprocessorService(Class, byte[], byte[], Batch.Call), which has four input parameters. The first parameter is the service class that implements the RPC, which is the ibmDeveloperWorksService class used in Code 1. With it, HBase can find the corresponding coprocessor deployed on the region. A region may have several coprocessors deployed, so the client must specify the service class to identify which coprocessor's service it wants to invoke. Which regions are involved is determined by a rowkey range, given as a start key and an end key. For this reason, the second and third parameters of coprocessorService are startKey and endKey, and every region that falls within the interval [startKey, endKey] takes part in the invocation. The fourth parameter is the interface Batch.Call. It defines how the coprocessor is invoked, and the user implements the client-side logic by implementing the call() method of this interface.
Within the call() method, the RPC can be invoked and its return value handled in any way; this is what Code 1 above does. coprocessorService is responsible for calling this call() method for each region. The return value of coprocessorService is a Map: the key is the region name, and the value is the return value of Batch.Call.call(). The map can be thought of as the result set returned by the coprocessor RPC across all regions, and client code can iterate over it to aggregate the results.
The overall workflow of this coprocessorService method is as follows. First, it analyzes startKey and endKey and finds all regions within that interval; assume they are stored in regionList. Then it iterates over regionList and invokes Batch.Call for each region, inside which the user has defined the specific RPC invocation logic. Finally, coprocessorService adds the return values of all Batch.Call.call() invocations to the result set and returns it. This is shown in Figure 3.
Figure 3 Calling a coprocessor on multiple regions without using a callback
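The workflow above (select the regions in the key range, run the call on each one, collect the results in a map keyed by region name, then aggregate on the client) can be sketched in plain Java. This is an illustrative model only, not the HBase client: BatchCallSketch, Region, callOnRegions, and sum are hypothetical names, regions are assumed to cover [startKey, endKey) with a null end key meaning "unbounded", and a java.util.function.Function stands in for Batch.Call.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Simplified model of a multi-region call without a callback. Not HBase code.
class BatchCallSketch {

    /** Hypothetical stand-in for a region and the row count its coprocessor would report. */
    static class Region {
        final String name;
        final String startKey; // inclusive
        final String endKey;   // exclusive; null means unbounded (assumption)
        final long rowCount;

        Region(String name, String startKey, String endKey, long rowCount) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
            this.rowCount = rowCount;
        }

        /** True if this region holds any key in the closed interval [from, to]. */
        boolean overlaps(String from, String to) {
            boolean startsBeforeTo = startKey.compareTo(to) <= 0;
            boolean endsAfterFrom = endKey == null || endKey.compareTo(from) > 0;
            return startsBeforeTo && endsAfterFrom;
        }
    }

    /** Runs call on every region in [startKey, endKey]; returns region name -> result. */
    static <R> Map<String, R> callOnRegions(List<Region> allRegions, String startKey,
                                            String endKey, Function<Region, R> call) {
        Map<String, R> results = new TreeMap<>();
        for (Region region : allRegions) {
            if (region.overlaps(startKey, endKey)) {
                results.put(region.name, call.apply(region));
            }
        }
        return results;
    }

    /** Client-side aggregation: iterate over the result set and add up the values. */
    static long sum(Map<String, Long> perRegion) {
        long total = 0;
        for (long count : perRegion.values()) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(
            new Region("r1", "", "row3", 10),
            new Region("r2", "row3", "row6", 20),
            new Region("r3", "row6", null, 5));

        Map<String, Long> perRegion = callOnRegions(regions, "row1", "row4", r -> r.rowCount);
        System.out.println(perRegion);      // {r1=10, r2=20}
        System.out.println(sum(perRegion)); // 30
    }
}
```

Note that the client holds the whole result map before it starts summing, which is the overhead the callback variant in (3) avoids.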
(3) Calling the coprocessor RPC on multiple regions using a callback
The third coprocessorService method has one more parameter than the second: a callback. Internally, the second method uses HBase's default callback, which adds the result returned by each region to a result set of type Map and returns that collection as the return value of coprocessorService. The key of the result set is the region name, and the value is the return value of the call method. With this approach, the client code has to store the RPC results in a collection and then loop over the result set for further processing. In some cases this overhead of using a collection is unnecessary, and it can be eliminated by processing the result of each region directly. The process is shown in Figure 4.
Figure 4 Calling a coprocessor on multiple regions using a callback
HBase provides this third coprocessorService method so that the user can define the callback behavior: coprocessorService invokes the callback for the result of each RPC, and the user can execute the required logic, such as summation, inside the callback. With the second method, the result of each region's coprocessor RPC is first put into a collection, and only after all regions have returned does the user code extract each result from it. With the third method, the results are accumulated directly in the callback, which saves the cost of creating a result set and traversing it, and is therefore more efficient. All we need to do is define an extra callback: the callback is an implementation of the Batch.Callback interface, and the user needs to implement its update method.
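The difference from the map-based variant can be illustrated with a small model in which each region's result is handed to a callback as soon as it is available. This is a sketch, not the HBase API: CallbackSketch and its Callback interface are hypothetical, the update method is simplified to take a region name and a result, the key-range selection is omitted, and the regions are represented as a map from region name to the row count the coprocessor would return.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Simplified model of a multi-region call with a callback. Not HBase code.
class CallbackSketch {

    /** Hypothetical stand-in for a callback with an update method. */
    interface Callback<R> {
        void update(String regionName, R result);
    }

    /** Runs call for every region and passes each result straight to the callback. */
    static <R> void callOnRegions(Map<String, Long> regionRowCounts,
                                  Function<Long, R> call, Callback<R> callback) {
        for (Map.Entry<String, Long> region : regionRowCounts.entrySet()) {
            R result = call.apply(region.getValue());
            callback.update(region.getKey(), result); // no result map is built
        }
    }

    /** Sums the per-region counts inside the callback instead of iterating over a result set. */
    static long totalRowCount(Map<String, Long> regionRowCounts) {
        // AtomicLong keeps the accumulation safe if callbacks were ever run from several threads.
        AtomicLong total = new AtomicLong();
        Function<Long, Long> call = count -> count;
        Callback<Long> sumUp = (regionName, result) -> total.addAndGet(result);
        callOnRegions(regionRowCounts, call, sumUp);
        return total.get();
    }

    public static void main(String[] args) {
        Map<String, Long> regions = new LinkedHashMap<>();
        regions.put("r1", 10L);
        regions.put("r2", 20L);
        regions.put("r3", 5L);
        System.out.println(totalRowCount(regions)); // 35
    }
}
```

The aggregation logic moves into update, so the client needs neither an intermediate collection nor a second loop over the results.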
HBase coprocessors in practice