1. Use connection pooling
Creating a new connection every time you interact with HBase is obviously inefficient, so the HBase API also provides connection-pool-related classes.
1.1. HTablePool
It was used in the early API, but unfortunately it is now deprecated, so it is not described further here.
1.2. HConnection
HConnection replaces HTablePool; from it you can obtain almost all of the operation objects related to HBase.
    private static HConnection connection = null;
    private static Configuration conf = null;

    static {
        try {
            conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.property.clientPort", "2181");
            conf.set("hbase.zookeeper.quorum", "hadoop-master01,hadoop-slave01,hadoop-slave02");
            connection = HConnectionManager.createConnection(conf);
        } catch (ZooKeeperConnectionException e) {
            e.printStackTrace();
        }
    }
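Once the shared connection is initialized, per-use table handles can be borrowed from it instead of being constructed anew each time. A minimal sketch (not runnable without a cluster; "my_table" is a placeholder table name, and the static `connection` field is assumed to come from the block above):

```java
// Sketch: borrow a table handle from the shared HConnection.
// "my_table" is a placeholder table name.
HTableInterface table = connection.getTable("my_table");
try {
    // ... perform reads and writes against `table` ...
} finally {
    table.close();  // releases the handle; the shared connection stays open
}
```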
2. Read Optimization

2.1. Read by rowkey
If the operation involves only a single rowkey, you can read it as follows (single read):
    byte[] row = ...;  // the rowkey to read
    Get get = new Get(row);
    Result result = destTable.get(get);
If there is more than one rowkey, you can use the following method (bulk read):
    List<byte[]> rowList = new ArrayList<byte[]>();
    List<Get> gets = new ArrayList<Get>();
    for (byte[] row : rowList) {
        gets.add(new Get(row));
    }
    Result[] results = destTable.get(gets);
2.2. Use Scan
    Scan scan = new Scan();
    ResultScanner scanner = srcTable.getScanner(scan);
You can set the number of rows that the ResultScanner fetches from the server at a time via the hbase.client.scanner.caching parameter. The default is one row at a time; raising it can greatly improve the efficiency of moving the result-set cursor (ResultScanner.next()).
There are three ways to set this parameter:
- In HBase's configuration file, hbase-site.xml
- On the table object: htable.setScannerCaching(10000);
- On the Scan object: scan.setCaching(10000);
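To see why caching matters, the number of client-server round trips for a scan can be modeled as roughly ceil(totalRows / caching). A back-of-envelope sketch (plain Java, not the HBase API; the row counts are illustrative):

```java
public class ScannerCachingModel {
    // Approximate round trips needed to scan totalRows rows when the
    // scanner fetches `caching` rows per trip: ceil(totalRows / caching).
    static long roundTrips(long totalRows, long caching) {
        return (totalRows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(1000000, 1));     // default caching of 1
        System.out.println(roundTrips(1000000, 10000)); // caching raised to 10000
    }
}
```

With the default caching of 1, a million-row scan costs a million round trips; raising caching to 10000 cuts this to 100, which is why the parameter has such a large effect on scan throughput.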
In addition, you can use:

    scan.addColumn(Bytes.toBytes("SM"), Bytes.toBytes("IP"));

to restrict the scan to the columns you actually need, reducing unnecessary network traffic and improving table-read efficiency.
3. Write optimization
Each write operation commits a Put, which contains the rowkey and the values of one or more columns.
3.1. Write a single row
    byte[] row = ...;  // the rowkey to write
    Put put = new Put(row);
    put.add(Bytes.toBytes(...), Bytes.toBytes(...), Bytes.toBytes(...));  // family, qualifier, value
    table.put(put);
    table.flushCommits();
Here table.put(put) places the data in the client-side write buffer, and only when table.flushCommits() executes is the data actually submitted to HBase.
3.2. Write multiple rows
Writing multiple rows involves data submission and client-side caching, as follows:
- Automatically maintained cache: auto-flush (htable.setAutoFlush(true)) is on by default, so each put is submitted as it is written. If you instead let the client maintain its own write cache with htable.setAutoFlush(false), the cache is flushed when it reaches its limit, and you should also set its size with htable.setWriteBufferSize(writeBufferSize).
In practical development, however, per-put submission is not advocated: each table.put(put) call contacts the cluster, which is too frequent and unsuitable for large-throughput bulk writes.
- Manually maintaining the cache
You can collect the data to be written in local memory and then submit it with table.put(List<Put>). This reduces the number of interactions between the client and the cluster and increases write throughput.
    List<Put> puts = new ArrayList<Put>();
    for (int i = 0; i < 100000; i++) {
        byte[] rowkey = Bytes.toBytes(RandomStringUtils.random(8, "abcdesssss"));
        byte[] value = Bytes.toBytes(RandomStringUtils.random(10, "Iojkjhhjnnbghikklm<nh"));
        Put put = new Put(rowkey);
        put.add(Bytes.toBytes(FAMILY_CF), Bytes.toBytes("value"), value);
        puts.add(put);
        if ((i + 1) % 10000 == 0) {  // submit a full batch of 10000 puts
            table.put(puts);
            table.flushCommits();
            puts.clear();
        }
    }
    if (!puts.isEmpty()) {           // submit any remaining partial batch
        table.put(puts);
        table.flushCommits();
    }
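The effect of the batch size can be checked without a cluster. A small model (plain Java, no HBase classes; the loop mirrors the batching pattern above) tallies how many commits the pattern performs, including the final commit for a trailing partial batch:

```java
public class BatchWriteModel {
    // Count commits (cluster round trips) when flushing every `batch` puts,
    // plus one final commit for a trailing partial batch.
    static int countCommits(int totalPuts, int batch) {
        int commits = 0, buffered = 0;
        for (int i = 0; i < totalPuts; i++) {
            buffered++;                  // stands in for puts.add(put)
            if ((i + 1) % batch == 0) {  // stands in for table.put + flushCommits
                commits++;
                buffered = 0;
            }
        }
        if (buffered > 0) commits++;     // commit the remaining partial batch
        return commits;
    }

    public static void main(String[] args) {
        // 100000 puts in batches of 10000: 10 commits instead of 100000
        System.out.println(countCommits(100000, 10000));
    }
}
```

The model also makes the tail-flush bug easy to see: without the final commit, any puts added after the last full batch would never reach the server.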
3.3. Incrementing a column
    destTable.incrementColumnValue(rowkey, Bytes.toBytes(FAMILY_CF), Bytes.toBytes("testIncrement"), 1L, true);
This increments the testIncrement column by 1. In a batch-processing system this method should be used with caution: it submits data on every call and does not support batched submission for the column.