HBase Performance Optimization Java API

Source: Internet
Author: User

1. Use "Connection pooling"

Creating a new connection every time you interact with HBase is obviously inefficient, so HBase provides an API for connection pooling.

1.1. HTablePool

HTablePool was used in the early API but is now deprecated, so it is not described further here.

1.2. HConnection

HTablePool has been replaced by HConnection, from which you can obtain almost all the objects needed to operate on HBase.

private static HConnection connection = null;
private static Configuration conf = null;

static {
    try {
        conf = HBaseConfiguration.create();
        // Configuration keys are case-sensitive and must be lowercase.
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        conf.set("hbase.zookeeper.quorum", "HADOOP-MASTER01,HADOOP-SLAVE01,HADOOP-SLAVE02");
        // Create the shared connection once, from the configuration built above.
        connection = HConnectionManager.createConnection(conf);
    } catch (IOException e) {
        // ZooKeeperConnectionException is a subclass of IOException.
        e.printStackTrace();
    }
}
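
The idea of creating a connection once and reusing it can be sketched without HBase at all. In the sketch below, ExpensiveConnection and ConnectionHolder are hypothetical stand-ins, not HBase classes; the counter exists only to show that construction happens once, however many callers ask for the connection.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a costly client connection (not an HBase class).
final class ExpensiveConnection {
    static final AtomicInteger CREATED = new AtomicInteger();

    ExpensiveConnection() {
        CREATED.incrementAndGet(); // count how often the setup cost is paid
    }
}

// Holds one shared connection, created on first use and reused afterwards.
final class ConnectionHolder {
    private static ExpensiveConnection shared;

    private ConnectionHolder() {}

    static synchronized ExpensiveConnection get() {
        if (shared == null) {
            shared = new ExpensiveConnection();
        }
        return shared;
    }
}

class ConnectionReuseDemo {
    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            ConnectionHolder.get(); // every caller receives the same instance
        }
        System.out.println("connections created: " + ExpensiveConnection.CREATED.get());
    }
}
```

The static initializer in the article's own code achieves the same effect; the lazy holder is just one of several ways to guarantee a single shared instance.
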

2. Read optimization

2.1. Read by rowkey

If the operation involves only one rowkey, you can use a single Get (single read):

byte[] rowkey = new byte[] { /* rowkey bytes */ };
Get get = new Get(rowkey);
Result result = destTable.get(get);

If you have more than one rowkey, you can submit them together (bulk read):

List<byte[]> rowList = new ArrayList<byte[]>();
List<Get> gets = new ArrayList<Get>();
for (byte[] row : rowList) {
    gets.add(new Get(row));
}
Result[] results = destTable.get(gets);

2.2. Use Scan

Scan scan = new Scan();
ResultScanner scanner = srcTable.getScanner(scan);

The hbase.client.scanner.caching parameter sets how many rows the ResultScanner fetches from the server per request. In the early API described here the default is one row at a time; raising it can greatly improve the efficiency of iterating over the result set (ResultScanner.next()).

There are three ways to set this parameter:

    • In the HBase configuration file, hbase-site.xml
    • On the table object: hTable.setScannerCaching(10000);
    • On the Scan object: scan.setCaching(10000);
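
To see why the caching value matters, a rough model is enough: if each fetch request returns up to `caching` rows, the number of requests is the row count divided by the caching value, rounded up. The sketch below is only that arithmetic; the class name and the table size are made up for illustration, and the model ignores other limits such as caps on result size.

```java
// Back-of-the-envelope model: one fetch request returns up to `caching` rows.
// This deliberately ignores other limits, such as result-size caps.
class ScannerCachingEstimate {
    static long fetchRequests(long totalRows, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        return (totalRows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 1_000_000L; // hypothetical table size
        for (int caching : new int[] {1, 100, 10_000}) {
            System.out.println("caching=" + caching + " -> "
                    + fetchRequests(rows, caching) + " requests");
        }
    }
}
```

A larger value also means each response holds more rows in memory on both client and server, so the setting is a trade-off rather than a case of bigger always being better.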

In addition, you can call:

scan.addColumn(Bytes.toBytes("SM"), Bytes.toBytes("IP"));

to restrict the scan to the specified columns, which reduces unnecessary network traffic and speeds up table reads.

3. Write optimization

Each write operation submits a Put, which contains the rowkey and the values for one or more columns.

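3.1. Write a single row

byte[] row = Bytes.toBytes(...);
Put put = new Put(row);
// Arguments: column family, column qualifier, value.
put.add(Bytes.toBytes(...), Bytes.toBytes(...), Bytes.toBytes(...));
table.put(put);
table.flushCommits();

Here table.put(put) submits the Put through the client; if autoflush is disabled, the Put first sits in the client-side write buffer, and table.flushCommits() sends whatever is buffered to HBase.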

3.2. Write more than one piece of data

Writing multiple rows raises the question of how data is submitted and buffered. There are two approaches:

    • Client-maintained write buffer

With HTable.setAutoFlush(true), which is the default, the client submits each Put as soon as table.put(put) is called. With HTable.setAutoFlush(false), the client holds Puts in its write buffer and submits them automatically once the buffer reaches its size limit; in that case you usually also set the buffer size with HTable.setWriteBufferSize(writeBufferSize).

In practice, relying on the default autoflush behaviour is not recommended for bulk loads. Every table.put(put) call then makes a round trip to the cluster, which is too frequent for high-throughput batch writes.

    • Manually maintained buffer

You can collect the data you want to write in local memory and then submit it with table.put(List<Put>). This reduces the number of interactions between the client and the cluster and increases write throughput.

List<Put> puts = new ArrayList<Put>();
for (int i = 0; i < 100000; i++) {
    byte[] rowkey = Bytes.toBytes(RandomStringUtils.random(8, "abcdesssss"));
    byte[] value = Bytes.toBytes(RandomStringUtils.random(10, "Iojkjhhjnnbghikklm<nh"));
    Put put = new Put(rowkey);
    put.add(Bytes.toBytes(FAMILY_CF), Bytes.toBytes("value"), value);
    puts.add(put);
    // Submit once 10000 Puts have accumulated.
    if (puts.size() >= 10000) {
        table.put(puts);
        table.flushCommits();
        puts.clear();
    }
}
// Submit any Puts left over after the loop.
if (!puts.isEmpty()) {
    table.put(puts);
    table.flushCommits();
}
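
The effect of batching on the number of submissions can be modelled without a cluster. In this sketch, CountingSink is a hypothetical stand-in for the table client that only counts calls, and the row strings are placeholders. It flushes whenever the local buffer is full and once more at the end, so that a final partial batch is not lost.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a table client; it only counts submissions.
final class CountingSink {
    int calls = 0;
    int items = 0;

    void submit(List<String> batch) {
        calls++;
        items += batch.size();
    }
}

class BatchWriteDemo {
    // Sends `total` items to the sink in batches of at most `batchSize`.
    static void writeAll(CountingSink sink, int total, int batchSize) {
        List<String> buffer = new ArrayList<>(batchSize);
        for (int i = 0; i < total; i++) {
            buffer.add("row-" + i);
            if (buffer.size() >= batchSize) {
                sink.submit(buffer);
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {
            sink.submit(buffer); // do not lose the final partial batch
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        CountingSink sink = new CountingSink();
        writeAll(sink, 100_000, 10_000);
        System.out.println(sink.calls + " submissions for " + sink.items + " items");
    }
}
```

Calling writeAll with a batch size of 1 models one submission per row, which makes the difference in round trips easy to compare.
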

3.3. Auto-increment columns

destTable.incrementColumnValue(rowkey, Bytes.toBytes(FAMILY_CF), Bytes.toBytes("testincrement"), Long.parseLong("1"), true);

This increments the testincrement column by 1. In a batch system, use this method with caution: it commits data on every call and does not batch submissions for this column.
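
One possible way to reduce the number of increment calls, which the source does not describe, is to sum the increments on the client and submit a single total per counter. The sketch below shows only the local bookkeeping; IncrementCoalescer and the key format are hypothetical, and the caller would still issue one increment call per drained entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Accumulates increments locally so each counter needs only one submission.
class IncrementCoalescer {
    private final Map<String, Long> pending = new LinkedHashMap<>();

    // Record an increment without contacting the server.
    void add(String counterKey, long delta) {
        pending.merge(counterKey, delta, Long::sum);
    }

    // Return the summed deltas and reset the local state.
    Map<String, Long> drain() {
        Map<String, Long> snapshot = new LinkedHashMap<>(pending);
        pending.clear();
        return snapshot;
    }

    public static void main(String[] args) {
        IncrementCoalescer coalescer = new IncrementCoalescer();
        for (int i = 0; i < 1000; i++) {
            coalescer.add("row1/testincrement", 1L); // hypothetical key format
        }
        System.out.println(coalescer.drain());
    }
}
```

The trade-off is that increments held locally are lost if the client fails before draining, and other readers do not see them until they are submitted.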
