HBase Performance Optimization Java API

Last Update:2014-08-26 Source: Internet

Author: User


1. Use Connection Pooling

Creating a new connection every time you interact with HBase is clearly inefficient, so HBase provides connection-pool-related APIs.

1.1. HTablePool

HTablePool was used in early versions of the API, but it is now deprecated, so it is not described further here.

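1.2. HConnection

HTablePool has been replaced by HConnection, which can be used to obtain almost every object needed to operate on HBase. Create it once and share it across the application:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;

public class HBaseHelper { // hypothetical class name
    // One shared HConnection for the whole application
    private static HConnection connection = null;
    private static Configuration conf = null;

    static {
        try {
            conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.property.clientPort", "2181");
            conf.set("hbase.zookeeper.quorum", "HADOOP-MASTER01,HADOOP-SLAVE01,HADOOP-SLAVE02");
            connection = HConnectionManager.createConnection(conf);
        } catch (IOException e) { // ZooKeeperConnectionException is an IOException
            e.printStackTrace();
        }
    }
}
```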
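The reason to share one connection is that its setup (locating the cluster through ZooKeeper, caching region locations) should happen only once. The following is a minimal plain-Java sketch of the same create-once pattern, made lazy and thread-safe with the initialization-on-demand holder idiom. ExpensiveConnection and SharedConnection are hypothetical names; ExpensiveConnection stands in for HConnection so the example runs without an HBase cluster.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive client connection such as HConnection.
class ExpensiveConnection {
    static final AtomicInteger CREATED = new AtomicInteger();

    ExpensiveConnection() {
        // A real client would contact ZooKeeper and the cluster here.
        CREATED.incrementAndGet();
    }
}

// Create-once holder: the JVM initializes Holder lazily and thread-safely
// the first time get() touches Holder.INSTANCE.
final class SharedConnection {
    private SharedConnection() {}

    private static final class Holder {
        static final ExpensiveConnection INSTANCE = new ExpensiveConnection();
    }

    static ExpensiveConnection get() {
        return Holder.INSTANCE;
    }
}
```

Unlike a static initializer block, the holder creates the connection on first use rather than when the enclosing class is loaded. In a real application the shared connection would also be closed once, at shutdown.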

2. Read Optimization

2.1. Reading by rowkey

If the operation involves only one rowkey, read it with a single Get (single read):

```java
// rowkey: the key of the row to read, as a byte[]
Get get = new Get(rowkey);
Result result = destTable.get(get);
```

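If there are several rowkeys, wrap each one in a Get and submit the whole list in one call (batch read):

```java
List<byte[]> rowList = new ArrayList<byte[]>();
// ... add the rowkeys to read to rowList
List<Get> gets = new ArrayList<Get>();
for (byte[] row : rowList) {
    gets.add(new Get(row));
}
Result[] results = destTable.get(gets);
```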
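Why the batch form helps: the HBase client groups the Gets in a batch by the RegionServer that hosts each row, so a batch costs roughly one request per server rather than one per row. The sketch below models only that counting; GetRoundTrips and the row-to-server map are hypothetical, and the model ignores retries, region moves and request size limits.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model: how many requests do single gets vs. a batch get need?
final class GetRoundTrips {
    private GetRoundTrips() {}

    // One Get per call: one request per rowkey.
    static int singleGets(List<String> rowkeys) {
        return rowkeys.size();
    }

    // Batch get: Gets are grouped by the server hosting each row,
    // so the model counts one request per distinct server.
    static int batchGet(List<String> rowkeys, Map<String, String> serverOfRow) {
        Set<String> servers = new HashSet<>();
        for (String row : rowkeys) {
            servers.add(serverOfRow.get(row));
        }
        return servers.size();
    }
}
```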

2.2. Using Scan

```java
Scan scan = new Scan();
ResultScanner rs = srcTable.getScanner(scan);
```

The hbase.client.scanner.caching parameter sets how many rows a ResultScanner fetches from the server in each call. In older HBase releases the default is 1, meaning every ResultScanner.next() costs a round trip to the server. Raising the value lets most next() calls be served from rows already cached on the client, which can greatly speed up iteration over the result set.

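There are three ways to set this parameter. A value set on the Scan takes precedence over the table-level value, which in turn defaults to the value from the configuration file.

In the HBase configuration file, hbase-site.xml (property hbase.client.scanner.caching);
On the table object: htable.setScannerCaching(10000);
On the scanner object: scan.setCaching(10000);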
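To see what caching buys, count the scanner calls: with caching c, scanning N rows takes about ceil(N / c) fetches instead of N. The helper below is a simplified model (ScannerCachingModel is a hypothetical name) that ignores the calls that open and close the scanner.

```java
// Simplified model: fetch calls needed to scan totalRows rows when each
// scanner call returns at most `caching` rows.
final class ScannerCachingModel {
    private ScannerCachingModel() {}

    static long fetchCalls(long totalRows, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        // Ceiling division: a final partial batch still costs one call.
        return (totalRows + caching - 1) / caching;
    }
}
```

Larger is not always better: each call returns more rows, so the client needs more memory, and if processing one cached batch takes longer than the scanner lease period on the server, the scanner can expire.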

In addition, you can limit the scan to the columns you actually need:

```java
scan.addColumn(Bytes.toBytes("SM"), Bytes.toBytes("IP"));
```

Restricting the scanned columns reduces unnecessary network traffic and improves read efficiency.
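A simplified model of what column selection saves: only the requested family:qualifier cells come back, so less data crosses the network. ColumnProjection and the sample cells are hypothetical, and the byte count is a rough stand-in for the real wire format (which also carries rowkeys, timestamps and other metadata).

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Simplified model of column projection on "family:qualifier" -> value cells.
final class ColumnProjection {
    private ColumnProjection() {}

    // Keeps only the requested columns, preserving their order.
    static Map<String, String> project(Map<String, String> row, Set<String> wanted) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> cell : row.entrySet()) {
            if (wanted.contains(cell.getKey())) {
                out.put(cell.getKey(), cell.getValue());
            }
        }
        return out;
    }

    // Rough payload size: column names plus values, in UTF-8 bytes.
    static int payloadBytes(Map<String, String> cells) {
        int total = 0;
        for (Map.Entry<String, String> cell : cells.entrySet()) {
            total += cell.getKey().getBytes(StandardCharsets.UTF_8).length;
            total += cell.getValue().getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }
}
```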

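3. Write Optimization

Each write operation submits a Put, which contains the rowkey and the values for one or more columns.

3.1. Writing a single row

```java
// row: the rowkey, as a byte[]
Put put = new Put(row);
put.add(Bytes.toBytes(...), Bytes.toBytes(...), Bytes.toBytes(...)); // family, qualifier, value
table.put(put);
table.flushCommits();
```

With autoflush enabled (the default), table.put(put) sends the Put to the RegionServer immediately and table.flushCommits() has nothing left to send. With autoflush disabled, table.put(put) only adds the Put to the client-side write buffer, and table.flushCommits() sends the buffered data to the cluster. Either way, the client writes to HBase RegionServers, not directly to HDFS; the RegionServer records the edit in its write-ahead log on HDFS and in its memstore.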

3.2. Writing multiple rows

Writing many rows raises the question of when data is submitted to the cluster and where it is buffered until then. There are two approaches:

Client-maintained write buffer

Calling htable.setAutoFlush(false) turns on a client-side write buffer: puts are collected on the client and sent automatically once the buffer reaches its size limit. This buffering is off by default, because autoflush is true by default and every put is then sent immediately. When the client maintains the buffer, you should also choose its size with htable.setWriteBufferSize(writeBufferSize); the default, set by hbase.client.write.buffer, is 2 MB.
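The autoflush and write-buffer behavior described above can be modeled in a few lines. WriteBufferModel is a hypothetical class, not HBase's implementation: with autoFlush on, every put is sent at once; with it off, puts accumulate until the buffered size exceeds writeBufferSize, and flushCommits() sends whatever is left.

```java
// Simplified model of a client-side write buffer (not the real HBase code).
final class WriteBufferModel {
    private final boolean autoFlush;
    private final long writeBufferSize;
    private int bufferedPuts = 0;   // puts waiting on the client
    private long bufferedBytes = 0; // their total size
    private int sends = 0;          // simulated round trips to the cluster

    WriteBufferModel(boolean autoFlush, long writeBufferSize) {
        this.autoFlush = autoFlush;
        this.writeBufferSize = writeBufferSize;
    }

    void put(int putSizeBytes) {
        bufferedPuts++;
        bufferedBytes += putSizeBytes;
        // autoFlush: send at once; otherwise send when the buffer overflows
        if (autoFlush || bufferedBytes > writeBufferSize) {
            flushCommits();
        }
    }

    // Sends everything still buffered; does nothing if the buffer is empty.
    void flushCommits() {
        if (bufferedPuts == 0) return;
        sends++;
        bufferedPuts = 0;
        bufferedBytes = 0;
    }

    int sends() { return sends; }
}
```

For 1,000 puts of 100 bytes each, the model counts 1,000 sends with autoflush on, but a single send with a 2 MB buffer once flushCommits() is called at the end.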

In practice, however, this approach is not recommended here: calling table.put(put) one row at a time still leads to frequent interaction with the cluster, which does not suit high-throughput bulk writes.

Manually maintained write buffer

You can collect the rows to be written in local memory and then submit them with table.put(List<Put>). This reduces the number of interactions between the client and the cluster and increases write throughput.

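```java
// table: an HTableInterface; FAMILY_CF: the column family name (defined elsewhere)
// RandomStringUtils comes from Apache Commons Lang
List<Put> puts = new ArrayList<Put>();
for (int i = 0; i < 100000; i++) {
    byte[] rowkey = Bytes.toBytes(RandomStringUtils.random(8, "abcdesssss"));
    byte[] value = Bytes.toBytes(RandomStringUtils.random(10, "Iojkjhhjnnbghikklm<nh"));
    Put put = new Put(rowkey);
    put.add(Bytes.toBytes(FAMILY_CF), Bytes.toBytes("value"), value);
    puts.add(put);
    // submit a full batch of 10,000 puts
    if (puts.size() >= 10000) {
        table.put(puts);
        table.flushCommits();
        puts.clear();
    }
}
// submit the last, partial batch (if any)
if (!puts.isEmpty()) {
    table.put(puts);
    table.flushCommits();
}
```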
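The batching loop above generalizes to a small helper. Batcher is a hypothetical class; its submit callback stands in for table.put(puts) followed by table.flushCommits(). The important detail is the final flush(): whatever is still pending when the input runs out must be submitted explicitly, or the last partial batch is silently lost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Collects items and hands them to `submit` in batches of at most batchSize.
final class Batcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> submit;
    private final List<T> pending = new ArrayList<>();

    Batcher(int batchSize, Consumer<List<T>> submit) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.submit = submit;
    }

    void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Call once more after the last add(), or the final partial batch is lost.
    void flush() {
        if (pending.isEmpty()) return;
        submit.accept(new ArrayList<>(pending)); // hand over a copy, then reuse the buffer
        pending.clear();
    }
}
```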

3.3. Auto-increment columns

```java
// The last argument (true) writes the increment to the write-ahead log
destTable.incrementColumnValue(rowkey, Bytes.toBytes(FAMILY_CF), Bytes.toBytes("testincrement"), 1L, true);
```

This increments the testincrement column by 1. Use this method with caution in batch systems: every call commits its data immediately, and increments to this column are not batched.
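If a batch job increments the same counters many times, one option (an assumption about your workload, not something HBase does for you) is to pre-aggregate increments in memory and submit one combined increment per counter. IncrementAggregator is a hypothetical sketch; note that pending counts are lost if the process dies before they are submitted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: sum increments locally, then submit one combined
// amount per counter instead of one server call per event.
final class IncrementAggregator {
    private final Map<String, Long> pending = new HashMap<>();

    void increment(String counterKey, long amount) {
        pending.merge(counterKey, amount, Long::sum);
    }

    // Returns the combined amounts to submit (e.g. one incrementColumnValue
    // call per entry) and clears the local state.
    Map<String, Long> drain() {
        Map<String, Long> out = new HashMap<>(pending);
        pending.clear();
        return out;
    }
}
```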

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
