1. Motivation (Why HBase Coprocessors)
One of the most frequent criticisms of HBase as a column-family database is that it cannot easily build a secondary index, which makes sums, counts, and sorts difficult. For example, in older versions of HBase (before 0.92), counting the total number of rows in a table required running a MapReduce job with a counter. HBase integrates MapReduce at the data storage layer, so distributed computation over tables works well. In many cases, however, a simple sum or aggregation performs much better when the computation is placed directly on the server side, because communication overhead is reduced. HBase therefore introduced coprocessors in 0.92, which enable some exciting new features: easy construction of secondary indexes, complex filters (predicate pushdown), and access control.
2. Source of Inspiration
The HBase coprocessor is inspired by Jeff Dean's 2009 talk (pp. 66-67), which described the coprocessors in Bigtable as having the following features:
(1) Arbitrary code can run on each tablet of a tablet server.
(2) A high-level call interface for clients: the client addresses rows of the table directly, and multi-row reads and writes are automatically split into multiple parallel RPC calls.
(3) A very flexible model for building distributed services.
(4) Automatic scaling, load balancing, and request routing for applications.
HBase's coprocessor is inspired by Bigtable, but the implementation details differ. HBase provides a framework, consisting of a class library and a runtime environment, so that user code can run on the HBase region servers and the master.
3. Detailed Analysis (Implementation)
There are two kinds of coprocessor. A system coprocessor is loaded globally for all tables on a region server, while a table coprocessor is attached by the user to a specific table. The coprocessor framework provides two kinds of plug-in to keep its behavior flexible. One is the observer, which is similar to a trigger in a relational database. The other is the endpoint, a dynamic RPC endpoint that is somewhat like a stored procedure.
3.1 Observer
The observer is designed to let users insert code by overriding the upcall methods of the coprocessor framework. The callbacks are triggered by specific events and are executed by the HBase core code. The framework handles all the details of invoking the callbacks, so the coprocessor itself only has to supply the added or changed functionality.
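The callback pattern described here can be illustrated with a small, self-contained model. This is only a sketch: `ToyObserver`, `TaggingObserver`, and `ToyRegion` are hypothetical names invented for illustration and are not part of the HBase API. The core class owns the call sequence, and the user overrides only the hook of interest.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an observer base class: every hook is a no-op by default.
class ToyObserver {
    void preGet(String row, List<String> results) { }
    void postGet(String row, List<String> results) { }
}

// User code overrides only the hook it cares about.
class TaggingObserver extends ToyObserver {
    @Override
    void preGet(String row, List<String> results) {
        results.add("observed:" + row);
    }
}

// Hypothetical stand-in for the core code that fires the callbacks.
class ToyRegion {
    private final ToyObserver observer;

    ToyRegion(ToyObserver observer) {
        this.observer = observer;
    }

    List<String> get(String row) {
        List<String> results = new ArrayList<>();
        observer.preGet(row, results);   // upcall before the core read
        results.add("stored:" + row);    // placeholder for the real read
        observer.postGet(row, results);  // upcall after the core read
        return results;
    }
}
```

Calling `new ToyRegion(new TaggingObserver()).get("row1")` returns the observer's entry followed by the stored entry, while a plain `ToyObserver` leaves the result untouched.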
Taking HBase 0.92 as an example, three observer interfaces are provided:
(1) RegionObserver: hooks for client data manipulation events such as Get, Put, Delete, and Scan.
(2) WALObserver: hooks for WAL-related operations.
(3) MasterObserver: hooks for DDL-type operations such as creating, deleting, and modifying tables.
These interfaces can be used together in the same place and are executed in priority order. Users can build arbitrarily complex HBase feature layers on top of coprocessors. Many events in HBase can trigger observer methods, and these hooks have been part of the HBase API since version 0.92. However, the APIs may change for various reasons and differ considerably between versions, so consult the Javadoc for specifics.
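The idea of several observers attached at the same place and run in priority order can be sketched as follows. `PrioritizedObserver` and `ObserverChain` are hypothetical names, not HBase types, and the rule that a lower number runs first is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical observer with a priority; not an HBase type.
interface PrioritizedObserver {
    int priority();
    void prePut(String row, List<String> log);
}

class ObserverChain {
    private final List<PrioritizedObserver> observers = new ArrayList<>();

    void register(PrioritizedObserver o) {
        observers.add(o);
        // Assumption of this sketch: a lower number means the observer runs earlier.
        observers.sort(Comparator.comparingInt(PrioritizedObserver::priority));
    }

    // Fires the prePut hook on every registered observer, in priority order.
    List<String> firePrePut(String row) {
        List<String> log = new ArrayList<>();
        for (PrioritizedObserver o : observers) {
            o.prePut(row, log);
        }
        return log;
    }

    // Convenience factory for an observer that records its name when called.
    static PrioritizedObserver named(String name, int priority) {
        return new PrioritizedObserver() {
            public int priority() { return priority; }
            public void prePut(String row, List<String> log) { log.add(name + ":" + row); }
        };
    }
}
```

Registering an "audit" observer with priority 200 and then an "acl" observer with priority 100 makes the "acl" hook run first, regardless of registration order.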
Figure 1 shows how a RegionObserver works. For more details on observers, see section 9.6.3 of the HBase book.
Figure 1 How RegionObserver works
3.2 Endpoint
An endpoint is the interface for dynamic RPC plug-ins. Its implementation code is installed on the server side and can therefore be invoked through HBase RPC. The client library provides convenient methods for calling these dynamic interfaces: a client can invoke an endpoint at any time, the implementation code is executed remotely at the target region, and the result is returned to the client. Users can combine these plug-in interfaces to add new features to HBase. An endpoint is used as follows:
(1) Define a new protocol interface, which must extend CoprocessorProtocol.
(2) Implement the endpoint interface. The implementation is loaded into the region environment and executed there, and it extends the abstract class BaseEndpointCoprocessor.
(3) On the client side, call the endpoint through one of two new HBase client APIs:
Single region: HTableInterface.coprocessorProxy(Class<T> protocol, byte[] row)
Range of regions: HTableInterface.coprocessorExec(Class<T> protocol, byte[] startKey, byte[] endKey, Batch.Call<T,R> callable)
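The difference between the single-region call and the range call can be modeled without HBase. The sketch below is a toy: `ToyEndpointRegion` and `ToyTable` are hypothetical names, the key range is assumed to be half-open with an empty string meaning "unbounded", and regions are visited sequentially rather than through parallel RPCs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical region holding the rows in [startKey, endKey); "" means unbounded.
class ToyEndpointRegion {
    final String name;
    final String startKey;
    final String endKey;
    final List<String> rows = new ArrayList<>();

    ToyEndpointRegion(String name, String startKey, String endKey, String... rows) {
        this.name = name;
        this.startKey = startKey;
        this.endKey = endKey;
        this.rows.addAll(Arrays.asList(rows));
    }

    boolean contains(String row) {
        return row.compareTo(startKey) >= 0 && (endKey.isEmpty() || row.compareTo(endKey) < 0);
    }

    boolean overlaps(String from, String to) {
        boolean startsBeforeRegionEnds = endKey.isEmpty() || from.compareTo(endKey) < 0;
        boolean endsAfterRegionStarts = to.isEmpty() || to.compareTo(startKey) > 0;
        return startsBeforeRegionEnds && endsAfterRegionStarts;
    }
}

class ToyTable {
    private final List<ToyEndpointRegion> regions = new ArrayList<>();

    void addRegion(ToyEndpointRegion r) {
        regions.add(r);
    }

    // Single-region call: run the function on the one region that holds the row.
    <R> R proxy(String row, Function<ToyEndpointRegion, R> call) {
        for (ToyEndpointRegion r : regions) {
            if (r.contains(row)) {
                return call.apply(r);
            }
        }
        throw new IllegalArgumentException("no region for row " + row);
    }

    // Range call: run the function on every region overlapping the key range
    // and collect one result per region, keyed by region name.
    <R> Map<String, R> exec(String startKey, String endKey, Function<ToyEndpointRegion, R> call) {
        Map<String, R> results = new LinkedHashMap<>();
        for (ToyEndpointRegion r : regions) {
            if (r.overlaps(startKey, endKey)) {
                results.put(r.name, call.apply(r));
            }
        }
        return results;
    }
}
```

In this model the function passed to `exec` plays the role of the callable: it runs once per matching region, next to that region's rows, and only its return value travels back to the caller.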
Figure 2 shows an example of the overall endpoint invocation process:
Figure 2 Example of an endpoint invocation
4. Programming Practice (Code Example)
This example counts the rows of an HBase table and shows how convenient and powerful coprocessors are. In older versions of HBase we had to write MapReduce code to count the rows of a table. In version 0.92 and later, client code alone is enough, which makes it well suited for wrapping in a web service.
4.1 Enabling Coprocessor Aggregation
There are two methods.
1. Enable aggregation globally, so that it applies to all tables. Add the following to hbase-site.xml:
<property>
<name>hbase.coprocessor.user.region.classes</name>
<value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property>
2. Enable aggregation for a single table only, through the HBase shell.
(1) Disable the table: hbase> disable 'mytable'
(2) Add the aggregation coprocessor: hbase> alter 'mytable', METHOD => 'table_att', 'coprocessor' => '|org.apache.hadoop.hbase.coprocessor.AggregateImplementation||'
(3) Re-enable the table: hbase> enable 'mytable'
4.2 Row Count Code Snippet
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.util.Bytes;

public class MyAggregationClient {
    private static final byte[] TABLE_NAME = Bytes.toBytes("mytable");
    private static final byte[] CF = Bytes.toBytes("vent");

    public static void main(String[] args) throws Throwable {
        Configuration customConf = new Configuration();
        customConf.setStrings("hbase.zookeeper.quorum", "node0,node1,node2");
        // Increase the RPC timeout (milliseconds)
        customConf.setLong("hbase.rpc.timeout", 600000);
        // Set the scanner cache size. The value is missing in the source text;
        // 1000 is an assumed placeholder, not the author's setting.
        customConf.setLong("hbase.client.scanner.caching", 1000);
        Configuration configuration = HBaseConfiguration.create(customConf);
        AggregationClient aggregationClient = new AggregationClient(configuration);
        Scan scan = new Scan();
        // Specify the column family to scan; exactly one family must be given
        scan.addFamily(CF);
        long rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
        System.out.println("Row count is " + rowCount);
    }
}
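Conceptually, the gain from counting rows on the server side comes from what crosses the network: each region counts its own rows and returns a single number, and the client only adds the partial counts. The following toy model makes that comparison explicit. `RowCountSketch` and its methods are hypothetical and do not reflect the internals of AggregationClient; regions are modeled as plain lists of row keys.

```java
import java.util.List;

class RowCountSketch {
    // Client-side counting: every row is shipped to the client, which counts them.
    // Returns {rowCount, itemsTransferred}.
    static long[] countOnClient(List<List<String>> regions) {
        long count = 0;
        long transferred = 0;
        for (List<String> region : regions) {
            for (String row : region) {
                transferred++;  // one row crosses the network
                count++;
            }
        }
        return new long[] {count, transferred};
    }

    // Server-side counting: each region counts locally and ships a single number.
    // Returns {rowCount, itemsTransferred}.
    static long[] countOnServer(List<List<String>> regions) {
        long count = 0;
        long transferred = 0;
        for (List<String> region : regions) {
            long partial = region.size();  // computed where the data lives
            transferred++;                 // only the partial count crosses the network
            count += partial;
        }
        return new long[] {count, transferred};
    }
}
```

Both methods return the same row count. The number of transferred items grows with the number of rows in the first case and with the number of regions in the second.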
The following observer example supplements the code above:
4.3 RegionObserverExample (this example is from HBase: The Definitive Guide)
// The newly implemented class must extend BaseRegionObserver
package hbasecoprocessor;

import java.io.IOException;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.util.Bytes;
public class RegionObserverExample extends BaseRegionObserver {
    public static final byte[] FIXED_ROW =
        Bytes.toBytes("@@ zzfcthotfixz @@@");
    public static String tablename = "table";
    public static String rowKey = "Rowkey";

    @Override
    public void preGet(
            final ObserverContext<RegionCoprocessorEnvironment> e,
            final Get get, final List<KeyValue> results) throws IOException {
        // In the book's original code, the KeyValue below is added only when the
        // queried row equals FIXED_ROW, so that such a Get returns the system time.
        // That check is commented out here, so every Get receives the extra KeyValue.
        // if (Bytes.equals(get.getRow(), FIXED_ROW)) {
        KeyValue kv = new KeyValue(get.getRow(), FIXED_ROW,
            FIXED_ROW,
            Bytes.toBytes(System.currentTimeMillis()));
        results.add(kv);
        // }
    }

    // Reads one row from the table and prints each cell as "row family:qualifier value"
    public static void selectRow(String tablename, String rowKey)
            throws IOException {
        Configuration config = HBaseConfiguration.create();
        HTable table = new HTable(config, tablename);
        Get g = new Get(rowKey.getBytes());
        Result rs = table.get(g);
        for (KeyValue kv : rs.raw()) {
            System.out.print(new String(kv.getRow()) + " ");
            System.out.print(new String(kv.getFamily()) + ":");
            System.out.print(new String(kv.getQualifier()) + " ");
            System.out.println(new String(kv.getValue()));
        }