Overview
HBase is a key-value database built on Hadoop. It provides efficient random read and write access to data stored on HDFS, filling a major gap left by Hadoop MapReduce, which supports only batch processing, and it is being adopted by more and more users. The coprocessor, an important HBase feature, was introduced in HBase 0.92 and has become widely popular.
With coprocessors, users can write code that runs on the HBase server side. HBase supports two types of coprocessor: Endpoint and Observer. An Endpoint coprocessor is similar to a stored procedure in a traditional database: a client invokes it to execute a section of server-side code, and the results are returned to the client for further processing. The most common use is aggregation. Without a coprocessor, finding the maximum value in a table (a Max aggregation) requires a full table scan: the client traverses the scan results in its own code and computes the maximum itself. This approach cannot exploit the concurrency of the underlying cluster, and centralizing all computation on the client side is inefficient. With a coprocessor, the user deploys the max-finding code to the HBase servers, and HBase runs it concurrently across the nodes of the cluster: the code executes within each Region, each Region Server computes the maximum for its Regions, and only those per-Region maxima are returned to the client. The client then takes the maximum of the per-Region results to obtain the global maximum. This greatly improves overall efficiency.
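The divide-and-conquer pattern described above (per-Region max on the server side, final merge on the client) can be sketched in plain Java. This is a conceptual illustration only, with in-memory lists standing in for Regions; it does not use the real HBase Endpoint API:

```java
import java.util.Arrays;
import java.util.List;

public class RegionMaxDemo {

    // "Server side" (simulated): each Region computes the max of its own rows.
    static long regionMax(List<Long> regionRows) {
        return regionRows.stream().mapToLong(Long::longValue).max().orElse(Long.MIN_VALUE);
    }

    // "Client side": merge the per-Region maxima into the global maximum.
    static long globalMax(List<List<Long>> regions) {
        return regions.parallelStream()              // Regions are processed independently
                      .mapToLong(RegionMaxDemo::regionMax)
                      .max()
                      .orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        List<List<Long>> regions = Arrays.asList(
                Arrays.asList(3L, 17L, 5L),   // hypothetical Region 1
                Arrays.asList(42L, 8L),       // hypothetical Region 2
                Arrays.asList(11L, 29L));     // hypothetical Region 3
        System.out.println(globalMax(regions)); // prints 42
    }
}
```

Only one number per Region crosses the network, instead of every row, which is the whole point of pushing the aggregation to the server side.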
The other type is the Observer coprocessor, which is similar to a trigger in a traditional database: it is invoked by the server side when certain events occur. An Observer coprocessor is a set of hooks scattered throughout the HBase server-side code that are called when specific events happen. For example, the prePut hook function is called by the Region Server before a put operation is executed, and the postPut hook function is called after it.
Development environment
maven-3.3.9
JDK 1.7
cdh-hbase-1.2.0
MyEclipse
HBase coprocessor loading
Enter the HBase shell:
# hbase shell
hbase(main):> disable 'test'
hbase(main):> alter 'test', CONFIGURATION => {'hbase.table.sanity.checks' => 'false'}    # table configuration; only needs to be executed once
hbase(main):> alter 'test', 'coprocessor' => 'hdfs:///code/jars/regionobserver-put5.jar|com.hbase.observer.App|1001'    # load the jar package
hbase(main):> alter 'test', METHOD => 'table_att_unset', NAME => 'coprocessor$1'    # unload the jar package
hbase(main):> desc 'test'    # view the table's description
hbase(main):> enable 'test'
Complete project code
package com.hbase.observer;

/**
 * HBase secondary index
 * @author wing
 * @createTime 2017-4-7
 */
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

public class App extends BaseRegionObserver {

    private HTablePool pool = null;
    private final static String SOURCE_TABLE = "test";

    @Override
    public void start(CoprocessorEnvironment env) throws IOException {
        pool = new HTablePool(env.getConfiguration(), 10);
    }

    @Override
    public void postGetOp(ObserverContext<RegionCoprocessorEnvironment> c,
            Get get, List<Cell> results) throws IOException {
        HTableInterface table = pool.getTable(Bytes.toBytes(SOURCE_TABLE));
        String newRowKey = Bytes.toString(get.getRow());
        String pre = newRowKey.substring(0, 1);
        if (pre.equals("t")) {
            // Index rowkey: "t" + two-letter prefix + timestamp + "_" + uid + "_" + mid
            String[] splits = newRowKey.split("_");
            String prePre = splits[0].substring(1, 3);
            String timestamp = splits[0].substring(3);
            String uid = splits[1];
            String mid = "";
            for (int i = 2; i < splits.length; i++) {
                mid += splits[i];
                mid += "_";
            }
            mid = mid.substring(0, mid.length() - 1);
            // Rebuild the real rowkey and fetch the actual row
            String rowKey = prePre + uid + "_" + timestamp + "_" + mid;
            System.out.println(rowKey);
            Get realGet = new Get(rowKey.getBytes());
            Result result = table.get(realGet);
            List<Cell> cells = result.listCells();
            results.clear();
            for (Cell cell : cells) {
                results.add(cell);
            }
        }
        table.close();
    }

    @Override
    public void postPut(ObserverContext<RegionCoprocessorEnvironment> e,
            Put put, WALEdit edit, Durability durability) throws IOException {
        try {
            String rowKey = Bytes.toString(put.getRow());
            HTableInterface table = pool.getTable(Bytes.toBytes(SOURCE_TABLE));
            String pre = rowKey.substring(0, 2);
            if (pre.equals("aa") || pre.equals("ab") || pre.equals("ac")
                    || pre.equals("ba") || pre.equals("bb") || pre.equals("bc")
                    || pre.equals("ca") || pre.equals("cb") || pre.equals("cc")) {
                String[] splits = rowKey.split("_");
                String uid = splits[0].substring(2);
                String timestamp = splits[1];
                String mid = "";
                for (int i = 2; i < splits.length; i++) {
                    mid += splits[i];
                    mid += "_";
                }
                mid = mid.substring(0, mid.length() - 1);
                // Write the index row under the transformed rowkey
                String newRowKey = "t" + pre + timestamp + "_" + uid + "_" + mid;
                System.out.println(newRowKey);
                Put indexPut2 = new Put(newRowKey.getBytes());
                // The index row only needs a marker value; the payload lives in the real row
                indexPut2.addColumn("relation".getBytes(), "column10".getBytes(), "1".getBytes());
                table.put(indexPut2);
            }
            table.close();
        } catch (Exception ex) {
            // Swallow exceptions so a failed index write does not fail the original put
        }
    }

    @Override
    public boolean postScannerNext(ObserverContext<RegionCoprocessorEnvironment> e,
            InternalScanner s, List<Result> results, int limit, boolean hasMore)
            throws IOException {
        HTableInterface table = pool.getTable(Bytes.toBytes(SOURCE_TABLE));
        List<Result> newResults = new ArrayList<Result>();
        for (Result result : results) {
            String newRowKey = Bytes.toString(result.getRow());
            String pre = newRowKey.substring(0, 1);
            if (pre.equals("t")) {
                String[] splits = newRowKey.split("_");
                String prePre = splits[0].substring(1, 3);
                String timestamp = splits[0].substring(3);
                String uid = splits[1];
                String mid = "";
                for (int i = 2; i < splits.length; i++) {
                    mid += splits[i];
                    mid += "_";
                }
                mid = mid.substring(0, mid.length() - 1);
                // Map each scanned index row back to its real row
                String rowKey = prePre + uid + "_" + timestamp + "_" + mid;
                Get realGet = new Get(rowKey.getBytes());
                Result newResult = table.get(realGet);
                newResults.add(newResult);
            }
        }
        results.clear();
        for (Result result : newResults) {
            results.add(result);
        }
        table.close();
        return hasMore;
    }

    @Override
    public void stop(CoprocessorEnvironment env) throws IOException {
        pool.close();
    }
}
After packaging the Maven project, upload the jar to the HDFS directory and load it with the shell commands shown above.
This completes the secondary index. On a put, the original rowkey is transformed into a new rowkey and an index row is saved. On a get, the index rowkey is mapped back to the actual rowkey, and the actual result is fetched with it. On a scan, each scanner result is likewise mapped to the result for the actual rowkey before being returned to the user.
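The rowkey mapping used in the coprocessor can be isolated into plain helper methods. The sketch below shows the forward transform and its inverse as standalone JDK 1.7-compatible code; the method names are mine, but the key layout follows the coprocessor code above:

```java
public class RowKeyMapping {

    // Joins splits[2..] back together with '_' (the "mid" portion of the key).
    private static String joinTail(String[] splits) {
        StringBuilder mid = new StringBuilder();
        for (int i = 2; i < splits.length; i++) {
            if (mid.length() > 0) {
                mid.append("_");
            }
            mid.append(splits[i]);
        }
        return mid.toString();
    }

    // Forward transform (postPut):
    // "aa<uid>_<timestamp>_<mid...>"  ->  "taa<timestamp>_<uid>_<mid...>"
    static String toIndexKey(String rowKey) {
        String pre = rowKey.substring(0, 2);       // two-letter prefix, e.g. "aa"
        String[] splits = rowKey.split("_");
        String uid = splits[0].substring(2);
        String timestamp = splits[1];
        return "t" + pre + timestamp + "_" + uid + "_" + joinTail(splits);
    }

    // Inverse transform (postGetOp / postScannerNext):
    // "taa<timestamp>_<uid>_<mid...>"  ->  "aa<uid>_<timestamp>_<mid...>"
    static String toRealKey(String indexKey) {
        String[] splits = indexKey.split("_");
        String pre = splits[0].substring(1, 3);
        String timestamp = splits[0].substring(3);
        String uid = splits[1];
        return pre + uid + "_" + timestamp + "_" + joinTail(splits);
    }

    public static void main(String[] args) {
        String real = "aa123_1491523200_m1_m2";    // hypothetical source rowkey
        String index = toIndexKey(real);
        System.out.println(index);                         // taa1491523200_123_m1_m2
        System.out.println(toRealKey(index).equals(real)); // true
    }
}
```

Because the two transforms are exact inverses, a get or scan against an index key always resolves to exactly one real row.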
With the HBase BaseRegionObserver coprocessor, many HBase operations can be wrapped in this way.
BaseRegionObserver Java API (note your HBase version):
https://hbase.apache.org/1.2/apidocs/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.html