Preface
As a newcomer who has just started working with HBase, I ran into several problems that beginners commonly hit. I won't go into the problems themselves here. The focus is mainly on coding optimizations, and the points below are covered in particular. I hope they help other beginners.
Tips
Rowkey design
Whatever operation HBase performs, rowkeys are sorted in lexicographic (dictionary) order. To optimize reads, design rowkeys so that rows that are read together sort next to each other, which reduces how much has to be scanned. For writes, the main concern is to avoid sending every rowkey to the same region: otherwise the other machines sit idle, and the throughput of that single region becomes the throughput of your whole application. Monotonically increasing keys, such as timestamps used as the rowkey, cause exactly this problem. There are two main solutions:
- Hashing: pre-split the table into regions when creating it, then build rowkeys with a hash function so rows are spread evenly across them. This only optimizes writes; for reads it can be a disaster (a full-table scan).
- Salting: prefix the rowkey with a random value taken from a small, known set of prefixes. A read then needs only one scan per prefix instead of a full-table scan.
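The two approaches can be made concrete with a small Java sketch. It is an illustration under stated assumptions, not the HBase API: it uses only the JDK, simulates HBase's lexicographic rowkey order with a TreeSet (which matches byte order for ASCII keys), and derives the salt deterministically from a hash of the key rather than from a truly random value, so that a point read can recompute the prefix. The bucket count BUCKETS and the helpers salt and scanRanges are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class SaltedRowkey {
    // Hypothetical bucket count; often chosen to match the number of pre-split regions.
    static final int BUCKETS = 4;

    // Deterministic salt: hash the key into a bucket and use the bucket as a prefix.
    static String salt(String key) {
        int bucket = Math.floorMod(key.hashCode(), BUCKETS);
        return bucket + "_" + key;
    }

    // A range read now needs one [start, stop) pair per bucket (stop exclusive, as in a Scan).
    static List<String[]> scanRanges(String start, String stop) {
        List<String[]> ranges = new ArrayList<>();
        for (int b = 0; b < BUCKETS; b++) {
            ranges.add(new String[] {b + "_" + start, b + "_" + stop});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Sequential timestamps would otherwise all sort into one contiguous key range.
        TreeSet<String> sorted = new TreeSet<>();
        for (long ts = 1000; ts < 1008; ts++) {
            sorted.add(salt(Long.toString(ts)));
        }
        // TreeSet iterates in lexicographic order, mimicking how HBase stores rows.
        sorted.forEach(System.out::println);
        for (String[] r : scanRanges("1000", "1008")) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

The sequential timestamps end up interleaved across the bucket prefixes instead of piling up in one key range, at the cost of issuing one scan per bucket for a range read. With 10 or more buckets the prefix should be zero-padded to a fixed width, since "10_" sorts before "1_" lexicographically.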
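Merge operations on the same rowkey
In HBase every command (put, delete, get, increment) is handled as a separate request, so operations on the same rowkey should be merged into one command. The commands provide methods such as addFamily() and addColumn(), which can be called several times on a single command to cover multiple columns of the same row.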
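A JDK-only sketch of the merging idea follows (it uses a record, so it assumes Java 16 or later). The Cell record and the mergeByRow helper are hypothetical: they group pending column updates by rowkey so that each row becomes one command instead of one command per column. With the real HBase client (1.0 and later), each group would correspond to one Put on which Put.addColumn(family, qualifier, value) is called once per cell; older clients name that method Put.add.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeByRowkey {
    // Hypothetical pending update: one column value destined for one row.
    record Cell(String row, String family, String qualifier, String value) {}

    // Group cells by rowkey, keeping first-seen row order.
    static Map<String, List<Cell>> mergeByRow(List<Cell> cells) {
        Map<String, List<Cell>> byRow = new LinkedHashMap<>();
        for (Cell c : cells) {
            byRow.computeIfAbsent(c.row(), k -> new ArrayList<>()).add(c);
        }
        return byRow;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
                new Cell("user1", "info", "name", "alice"),
                new Cell("user1", "info", "age", "30"),
                new Cell("user2", "info", "name", "bob"),
                new Cell("user1", "stats", "logins", "7"));
        Map<String, List<Cell>> merged = mergeByRow(cells);
        // Four column updates collapse into two row-level commands.
        System.out.println(cells.size() + " cells -> " + merged.size() + " commands");
        // With the real client, each entry would become one Put with one addColumn per cell.
        merged.forEach((row, list) -> System.out.println(row + " -> " + list.size() + " cells"));
    }
}
```

Running it prints "4 cells -> 2 commands", then "user1 -> 3 cells" and "user2 -> 1 cells". Grouping before sending is what turns per-column round trips into per-row ones.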
Do you need to manage an HTable connection cache yourself?
No. HBase already has a caching mechanism, managed mainly in HConnectionManager, which caches connections. Each close() checks whether the cached instance is still referenced: if it is, the connection is not actually closed yet, and its reference count is only decremented by one. For details, see the HConnectionManager source:
public static HConnection getConnection(final Configuration conf) throws IOException {
    HConnectionKey connectionKey = new HConnectionKey(conf);
    synchronized (CONNECTION_INSTANCES) {
        HConnectionImplementation connection = CONNECTION_INSTANCES.get(connectionKey);
        if (connection == null) {
            // No cached connection for this configuration yet: create and cache one.
            connection = (HConnectionImplementation) createConnection(conf, true);
            CONNECTION_INSTANCES.put(connectionKey, connection);
        } else if (connection.isClosed()) {
            // The cached connection was closed: drop it and cache a fresh one.
            HConnectionManager.deleteConnection(connectionKey, true);
            connection = (HConnectionImplementation) createConnection(conf, true);
            CONNECTION_INSTANCES.put(connectionKey, connection);
        }
        // Each caller adds one reference; close() gives it back.
        connection.incCount();
        return connection;
    }
}
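The reference counting described above can be modelled without HBase at all. RefCountedCache, Conn, get and release are hypothetical names in a simplified model of what getConnection and close do, not HBase's own classes: connections are cached per key (a String standing in for HConnectionKey), each get adds a reference, and release closes and evicts the instance only once the last reference is gone.

```java
import java.util.HashMap;
import java.util.Map;

public class RefCountedCache {
    // Hypothetical stand-in for a cached connection; it only tracks its own state.
    static final class Conn {
        final String key;
        int refCount;
        boolean closed;

        Conn(String key) {
            this.key = key;
        }
    }

    // Cached instances; the String key plays the role of HConnectionKey.
    private final Map<String, Conn> instances = new HashMap<>();

    // Like getConnection: reuse the cached instance for this key, replacing it if closed.
    synchronized Conn get(String key) {
        Conn c = instances.get(key);
        if (c == null || c.closed) {
            c = new Conn(key);
            instances.put(key, c);
        }
        c.refCount++;
        return c;
    }

    // Like close(): decrement the count and only really close when nobody holds it.
    synchronized void release(Conn c) {
        if (c.closed) {
            return;
        }
        c.refCount--;
        if (c.refCount <= 0) {
            c.closed = true;
            // Remove only this exact instance, never a newer one cached under the same key.
            instances.remove(c.key, c);
        }
    }

    public static void main(String[] args) {
        RefCountedCache cache = new RefCountedCache();
        Conn a = cache.get("cluster-1");
        Conn b = cache.get("cluster-1");
        System.out.println(a == b);   // true: same key, same cached instance
        cache.release(a);
        System.out.println(b.closed); // false: b still holds a reference
        cache.release(b);
        System.out.println(b.closed); // true: last reference released
    }
}
```

This is why application code does not need its own connection cache: repeated get/close pairs for the same configuration share one underlying connection.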
Code
I implemented a simple HBase client (not thread-safe) and put it on GitHub; feel free to clone it for reference. It is for reference only: it was written for testing, so it may contain errors.
simple-hbase-client: git clone https://github.com/zhgwen/simple-hbase-client.git
Simple HBase Client Side implementation