HBase-related API Drills (ii): Java API

Source: Internet
Author: User
Tags: zookeeper

I. HBase Java programming

(1) HBase is written in Java and supports Java programming;

(2) HBase supports CRUD operations: create, read, update, and delete;

(3) The Java API covers everything the HBase shell supports, and more;

(4) The Java API is the fastest way to access HBase.

II. HBase Java programming: programming steps

Step one: Create a Configuration object

    Configuration conf = HBaseConfiguration.create();

1) The Configuration object holds the client settings, for example the ZooKeeper quorum and client port.

Step two: Build an HTable handle

    HTable table = new HTable(conf, tableName);

1) Provide the Configuration object;

2) Provide the name of the table you want to access.

Step three: Perform the action

    table.getTableName();

1) Perform actions such as put, get, delete, scan, etc.

Step four: Close the HTable handle

    table.close();

1) Flush any writes still buffered on the client to the server

2) Release resources

III. HBase interaction

There are many ways to interact with an HBase cluster, such as the Java API, REST, and Thrift. This section uses the Java API as the example.

Java API interaction

HBase, like Hadoop, is written in Java, so it supports Java natively. The core classes of the Java API are described below.

1. HBaseConfiguration class

HBaseConfiguration is used by every HBase client and represents the HBase configuration. It has two constructors.

    public HBaseConfiguration()
    public HBaseConfiguration(final Configuration c)

The default constructor tries to read the configuration from hbase-default.xml and hbase-site.xml. If these two files are not on the classpath, you must set the configuration yourself.

    Configuration HBASE_CONFIG = new Configuration();
    HBASE_CONFIG.set("hbase.zookeeper.quorum", "zkServer");          // HBase service address (ZooKeeper quorum)
    HBASE_CONFIG.set("hbase.zookeeper.property.clientPort", "2181"); // port number
    HBaseConfiguration cfg = new HBaseConfiguration(HBASE_CONFIG);   // build the HBase configuration from these settings
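
The fallback behavior described here (defaults first, explicit settings on top) can be pictured with the JDK alone. The following is only a sketch: java.util.Properties stands in for Hadoop's Configuration, the class name LayeredConfigSketch is made up for this illustration, and the default values are illustrative rather than read from a real hbase-default.xml.

```java
import java.util.Properties;

// Sketch only: java.util.Properties stands in for Hadoop's Configuration.
class LayeredConfigSketch {

    // Plays the role of hbase-default.xml (values are illustrative).
    static Properties defaults() {
        Properties p = new Properties();
        p.setProperty("hbase.zookeeper.quorum", "localhost");
        p.setProperty("hbase.zookeeper.property.clientPort", "2181");
        return p;
    }

    // Plays the role of hbase-site.xml or an explicit set(...) call:
    // a key set here wins, every other key falls back to the defaults.
    static Properties withOverride(Properties defaults, String key, String value) {
        Properties p = new Properties(defaults);
        p.setProperty(key, value);
        return p;
    }

    public static void main(String[] args) {
        Properties cfg = withOverride(defaults(), "hbase.zookeeper.quorum", "zkServer");
        System.out.println(cfg.getProperty("hbase.zookeeper.quorum"));
        System.out.println(cfg.getProperty("hbase.zookeeper.property.clientPort"));
    }
}
```

Only the quorum is overridden in this sketch, so the client port is still answered from the default layer.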

2. Create a table

Tables are created through an HBaseAdmin object, which is responsible for managing table metadata. HBaseAdmin provides the createTable method.

    public void createTable(HTableDescriptor desc)

HTableDescriptor represents the table schema and provides the following two commonly used methods.

1) setMaxFileSize: specifies the maximum size of a region.

2) setMemStoreFlushSize: specifies the MemStore size at which its contents are flushed to HDFS.

3. Add a Family

Use the addFamily method to add a Family.

    public void addFamily(final HColumnDescriptor family)

HColumnDescriptor represents the schema of a column family and provides the following commonly used methods.

1) setTimeToLive: specifies the maximum TTL (in seconds); expired data is deleted automatically.

2) setInMemory: specifies whether to keep the data in memory. This is useful for small tables and can improve efficiency. Off by default.

3) setBloomFilter: specifies whether to use a Bloom filter, which can improve the efficiency of random queries. Off by default.

4) setCompressionType: sets the type of data compression. The default is no compression.

5) setMaxVersions: specifies the maximum number of versions of the data to keep. The default is 3.

As a simple example, the following code creates a table with 4 Families.

    HBaseAdmin hAdmin = new HBaseAdmin(hbaseConfig);
    HTableDescriptor table = new HTableDescriptor(tableName);
    table.addFamily(new HColumnDescriptor("f1"));
    table.addFamily(new HColumnDescriptor("f2"));
    table.addFamily(new HColumnDescriptor("f3"));
    table.addFamily(new HColumnDescriptor("f4"));
    hAdmin.createTable(table);

4. Delete a table

Tables are also deleted through HBaseAdmin, and a table must be disabled before it can be deleted. This is a very time-consuming operation, so deleting tables frequently is not recommended.

disableTable and deleteTable perform the disable and delete operations, respectively. They are used as follows.

    HBaseAdmin hAdmin = new HBaseAdmin(hbaseConfig);
    if (hAdmin.tableExists(tableName)) {
        hAdmin.disableTable(tableName); // a table must be disabled first
        hAdmin.deleteTable(tableName);
    }

5. Query data

Queries are divided into single random queries and batch queries. A single query fetches one row of a table by its Row Key; HTable provides the get method for this. A batch query fetches the rows within a specified Row Key range; HTable provides the getScanner method for this.

    public Result get(final Get get)
    public ResultScanner getScanner(final Scan scan)

The Get object holds the information required for a get query. It has two constructors.

    public Get(byte[] row)
    public Get(byte[] row, RowLock rowLock)

The row lock ensures that reads and writes are atomic. You can pass in an existing row lock; otherwise HBase generates a new one automatically.

The Scan object provides a default constructor, which is the one typically used.

1) The methods common to Get and Scan are as follows.

addFamily/addColumn: specifies the Family or Column to return. If neither is called, all Columns are returned.

setMaxVersions: specifies the maximum number of versions to return. Calling setMaxVersions with no arguments returns all versions. If setMaxVersions is not called, only the latest version is returned.

setTimeRange: specifies the minimum and maximum timestamps; only Cells within this range are returned.

setTimeStamp: specifies a single timestamp; only Cells with that timestamp are returned.

setFilter: specifies a filter that screens out data that is not needed.
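
How the version limit and the time range combine can be sketched without an HBase cluster. The class below is a hypothetical model, not HBase code: it treats one Cell as a list of timestamps and assumes the time range is half-open, with the minimum included and the maximum excluded. That is how I understand HBase's TimeRange, but check the Javadoc for your version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch only: models which versions of one cell a Get or Scan would return.
class VersionSelectionSketch {

    // Keeps timestamps in [minStamp, maxStamp), newest first, up to maxVersions.
    // The half-open range is an assumption of this sketch.
    static List<Long> select(List<Long> timestamps, long minStamp, long maxStamp, int maxVersions) {
        List<Long> newestFirst = new ArrayList<Long>(timestamps);
        Collections.sort(newestFirst, Collections.reverseOrder());
        List<Long> selected = new ArrayList<Long>();
        for (long ts : newestFirst) {
            if (ts >= minStamp && ts < maxStamp && selected.size() < maxVersions) {
                selected.add(ts);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Long> stored = Arrays.asList(10L, 20L, 30L, 40L);
        // setMaxVersions not called: only the latest version.
        System.out.println(select(stored, 0L, Long.MAX_VALUE, 1));                 // [40]
        // setMaxVersions() with no argument: all versions.
        System.out.println(select(stored, 0L, Long.MAX_VALUE, Integer.MAX_VALUE)); // [40, 30, 20, 10]
        // Time range 20 to 40 with up to 3 versions: 40 itself is excluded.
        System.out.println(select(stored, 20L, 40L, 3));                           // [30, 20]
    }
}
```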

2) The Scan-specific methods are as follows.

setStartRow: specifies the start row. If not called, the scan starts from the beginning of the table.

setStopRow: specifies the stop row (the stop row itself is not included).

setBatch: specifies the maximum number of Cells returned per call to next. This prevents an OOM error when a single row holds too much data.
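
The start and stop row semantics can be tried out on a plain sorted map. This is a minimal sketch under stated assumptions: a java.util.TreeMap stands in for a table ordered by Row Key, and the class ScanRangeSketch, its sample rows, and the use of null for "method not called" are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch only: a sorted map stands in for a table whose rows are ordered by Row Key.
// String order matches byte order here because the sample keys are plain ASCII.
class ScanRangeSketch {

    // Hypothetical sample table with five rows, row1 to row5.
    static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> table = new TreeMap<String, String>();
        for (int i = 1; i <= 5; i++) {
            table.put("row" + i, "value" + i);
        }
        return table;
    }

    // Row keys visited from startRow (inclusive) to stopRow (exclusive).
    // A null bound models "method not called".
    static List<String> scan(TreeMap<String, String> table, String startRow, String stopRow) {
        if (startRow != null && stopRow != null && startRow.compareTo(stopRow) >= 0) {
            return Collections.emptyList(); // empty range
        }
        SortedMap<String, String> range = table;
        if (startRow != null) {
            range = range.tailMap(startRow); // keys >= startRow
        }
        if (stopRow != null) {
            range = range.headMap(stopRow);  // keys < stopRow
        }
        return new ArrayList<String>(range.keySet());
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = sampleTable();
        System.out.println(scan(table, "row2", "row4")); // [row2, row3]
        System.out.println(scan(table, null, "row3"));   // [row1, row2]
        System.out.println(scan(table, "row4", null));   // [row4, row5]
    }
}
```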

3) Result represents one row of data. Its commonly used methods are as follows.

getRow: returns the Row Key.

raw: returns all the KeyValues as an array.

getValue: returns the value of a Cell given its Column.

ResultScanner is a container for Results; each call to its next method returns a Result.

    public Result next() throws IOException;
    public Result[] next(int nbRows) throws IOException;

The sample code is as follows.

    Scan scan = new Scan();
    ResultScanner ss = table.getScanner(scan);
    for (Result r : ss) {
        System.out.println(new String(r.getRow()));          // Row Key
        for (KeyValue kv : r.raw()) {
            System.out.println(new String(kv.getColumn()));  // Column of each Cell
        }
    }
    ss.close(); // release the scanner when done

6. Insert data

HTable inserts data through the put method. You can pass a single Put object or a List of Put objects for single and batch inserts, respectively.

    public void put(final Put put) throws IOException
    public void put(final List<Put> puts) throws IOException

Put provides 3 constructors.

    public Put(byte[] row)
    public Put(byte[] row, RowLock rowLock)
    public Put(Put putToCopy)

The commonly used methods of Put are as follows.

1) add: adds a Cell.

2) setTimeStamp: sets the default timestamp for all Cells; it is used for any Cell that does not specify its own timestamp. If not called, HBase uses the current time as the timestamp of such Cells.

3) setWriteToWAL: WAL stands for Write Ahead Log; this setting controls whether HBase writes the log before inserting. The default is on. Turning it off improves performance, but if the system fails (the Region Server handling the insert goes down), data may be lost.

HTable also has two methods that affect insert performance.

1) setAutoFlush: AutoFlush controls whether each call to put is committed to the HBase server immediately. The default is true, meaning every put is committed as it is made. With many single inserts this causes more I/O and reduces performance.

2) setWriteBufferSize: the write buffer size takes effect when AutoFlush is false. The default is 2MB, meaning that once more than 2MB of data has been buffered, it is submitted to the server automatically.
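
The effect of these two settings on the number of round trips can be estimated with a small model. The class below is a hypothetical sketch, not HBase code: it counts only value bytes (the real client measures the size of whole Put objects), and flushCount stands in for requests sent to the server.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: models client-side write buffering; it does not talk to HBase.
class WriteBufferSketch implements AutoCloseable {
    private final boolean autoFlush;
    private final long writeBufferSize; // in bytes
    private final List<byte[]> buffer = new ArrayList<byte[]>();
    private long bufferedBytes = 0;
    int flushCount = 0;                 // stands in for round trips to the server

    WriteBufferSketch(boolean autoFlush, long writeBufferSize) {
        this.autoFlush = autoFlush;
        this.writeBufferSize = writeBufferSize;
    }

    void put(byte[] value) {
        buffer.add(value);
        bufferedBytes += value.length;
        // Commit at once when autoFlush is on, otherwise only when the buffer overflows.
        if (autoFlush || bufferedBytes > writeBufferSize) {
            flush();
        }
    }

    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flushCount++;
        buffer.clear();
        bufferedBytes = 0;
    }

    // Closing the handle commits whatever is still buffered.
    @Override
    public void close() {
        flush();
    }

    // Number of flushes needed for `puts` values of `valueSize` bytes each, including close.
    static int flushesFor(boolean autoFlush, long writeBufferSize, int puts, int valueSize) {
        WriteBufferSketch table = new WriteBufferSketch(autoFlush, writeBufferSize);
        for (int i = 0; i < puts; i++) {
            table.put(new byte[valueSize]);
        }
        table.close();
        return table.flushCount;
    }

    public static void main(String[] args) {
        long twoMb = 2L * 1024 * 1024;
        System.out.println(flushesFor(true, twoMb, 10000, 1024));  // 10000
        System.out.println(flushesFor(false, twoMb, 10000, 1024)); // 5
    }
}
```

In this model, 10000 values of 1KB each cost 10000 flushes with AutoFlush on and 5 with a 2MB buffer, which is the kind of gap the two settings are meant to close.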

The sample code is as follows.

    HTable table = new HTable(hbaseConfig, tableName);
    table.setAutoFlush(autoFlush);
    List<Put> lp = new ArrayList<Put>();
    int count = 10000;
    byte[] buffer = new byte[1024];
    Random r = new Random();
    for (int i = 1; i <= count; ++i) {
        Put p = new Put(String.format("row%09d", i).getBytes());
        r.nextBytes(buffer); // random payload for this row
        p.add("f1".getBytes(), null, buffer);
        p.add("f2".getBytes(), null, buffer);
        p.add("f3".getBytes(), null, buffer);
        p.add("f4".getBytes(), null, buffer);
        p.setWriteToWAL(wal);
        lp.add(p);
        if (i % 1000 == 0) { // batch size is garbled in the source; 1000 is assumed
            table.put(lp);
            lp.clear();
        }
    }
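
The sample builds its Row Keys with a zero-padded format ("row%09d"). HBase orders rows by comparing Row Key bytes, so padding keeps the numeric order and the stored order the same. The sketch below illustrates this with the JDK only; the class and method names are made up, and the comparison is a plain unsigned byte-by-byte compare written for this illustration.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: shows why zero-padded row keys sort in numeric order.
class RowKeyOrderSketch {

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Same format as the insert sample.
    static byte[] padded(int i) {
        return String.format("row%09d", i).getBytes(StandardCharsets.UTF_8);
    }

    // Naive alternative without padding.
    static byte[] unpadded(int i) {
        return ("row" + i).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Padded: row 2 sorts before row 10, as expected.
        System.out.println(compareBytes(padded(2), padded(10)) < 0);     // true
        // Unpadded: "row2" sorts after "row10", because '2' > '1'.
        System.out.println(compareBytes(unpadded(2), unpadded(10)) < 0); // false
    }
}
```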

7. Delete data

HTable deletes data through the delete method.

    public void delete(final Delete delete)

The Delete constructors are as follows.

    public Delete(byte[] row)
    public Delete(byte[] row, long timestamp, RowLock rowLock)
    public Delete(final Delete d)

The commonly used Delete methods are deleteFamily/deleteColumn, which specify the Family or Column whose data should be deleted. If neither is called, the entire row is deleted.

Note: if a Cell's timestamp is later than the current time, the Cell is not deleted and can still be read.
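
One way to picture this note: a delete carries a timestamp (the current time unless one is given) and hides only Cells whose timestamps are not later than it. The sketch below models that rule with plain Java; the class name and the sample timestamps are hypothetical, and it models the visible outcome only, not how HBase stores delete markers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: models which cell timestamps stay visible after a delete.
class DeleteMarkerSketch {

    // A delete stamped deleteTs hides every cell with timestamp <= deleteTs.
    static List<Long> visibleAfterDelete(List<Long> cellTimestamps, long deleteTs) {
        List<Long> visible = new ArrayList<Long>();
        for (long ts : cellTimestamps) {
            if (ts > deleteTs) {
                visible.add(ts);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // Two cells written in the past and one written with a future timestamp.
        List<Long> cells = Arrays.asList(100L, 200L, 5000L);
        long now = 1000L; // hypothetical "current time" of the delete
        System.out.println(visibleAfterDelete(cells, now)); // [5000]
    }
}
```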

The sample code is as follows.

    HTable table = new HTable(hbaseConfig, "mytest");
    Delete d = new Delete("row1".getBytes()); // no Family or Column given, so the whole row is deleted
    table.delete(d);

8. Split a table

HBaseAdmin provides the split method to split a table.

    public void split(final String tableNameOrRegionName)

If a table name is given, every region of the table is split; if a region name is given, only that region is split. Because split is an asynchronous operation, the number of regions cannot be controlled exactly.

The sample code is as follows.

    public void split(String tableName, int number, int timeout) throws Exception {
        Configuration HBASE_CONFIG = new Configuration();
        HBASE_CONFIG.set("hbase.zookeeper.quorum", GlobalConf.ZOOKEEPER_QUORUM);
        HBaseConfiguration cfg = new HBaseConfiguration(HBASE_CONFIG);
        HBaseAdmin hAdmin = new HBaseAdmin(cfg);
        HTable hTable = new HTable(cfg, tableName);
        int oldsize = 0;
        long time = System.currentTimeMillis();
        while (true) {
            int size = hTable.getRegionsInfo().size();
            logger.info("the region number=" + size);
            if (size >= number) break;          // target region count reached
            if (size != oldsize) {
                // The previous split has taken effect; request another one.
                hAdmin.split(hTable.getTableName());
                oldsize = size;
            } else if (System.currentTimeMillis() - time > timeout) {
                break;                          // give up after the timeout
            }
            Thread.sleep(1000 * 10);            // interval is garbled in the source; 10 seconds assumed
        }
    }

That is the main content of this section. These are notes from my own learning process, and I hope they offer you some guidance. If you found them useful, please show your support; if not, please bear with me, and do point out any mistakes. Follow the blog to get updates as soon as they are posted. Thank you!
