HBase notes: A simple understanding of the principles of HBase

Source: Internet
Author: User

When I first studied Hadoop, two of its companion technologies puzzled me: ZooKeeper and HBase. Now that I have the chance to work full-time on big-data projects, I have finally met HBase in a real-time project, and with it the opportunity to understand how HBase works.

First of all, suppose HBase is already deployed on a server. From the client's point of view, how does a program written in the local development environment interact with the server-side HBase?

I'll walk through it below. First, look at the structure diagram of the project, as shown in the figure.

Next, import all the jar packages from the lib folder of the HBase distribution into the project's lib directory. Also copy hbase-site.xml from the server's conf directory into the project's conf directory. I additionally put the HBase project's log4j.properties file in the root directory of the project, so that the console prints a more detailed log when the program runs. The following is the source code of HBaseStudy.java:

```java
package cn.com.hbasetest;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HBaseStudy {

    public final static Logger logger = LoggerFactory.getLogger(HBaseStudy.class);

    /* Build the configuration -- the parsed view of hbase-site.xml.
       Here the file is read locally from the project's conf directory. */
    static Configuration hbaseConf = HBaseConfiguration.create();
    static {
        hbaseConf.addResource("conf/hbase-site.xml");
    }

    /**
     * Insert one row of data.
     */
    public void putTableData() throws IOException {
        HTable tbl = new HTable(hbaseConf, "xsharptable001");
        Put put = new Put(Bytes.toBytes("xrow01"));
        put.add(Bytes.toBytes("xcolfam01"), Bytes.toBytes("xcol01"), Bytes.toBytes("xvalue01"));
        put.addColumn(Bytes.toBytes("xcolfam01"), Bytes.toBytes("xcol02"), Bytes.toBytes("xvalue02"));
        put.addImmutable(Bytes.toBytes("xcolfam01"), Bytes.toBytes("xcol03"), Bytes.toBytes("xvalue03"));
        tbl.put(put);
    }

    /**
     * Insert multiple rows of data.
     */
    public void putTableDataRow() throws IOException {
        HTable tbl = new HTable(hbaseConf, "xsharptable001");

        Put put = new Put(Bytes.toBytes("xrow02"));
        put.add(Bytes.toBytes("xcolfam01"), Bytes.toBytes("xcol01"), Bytes.toBytes("xvalue012"));
        put.addColumn(Bytes.toBytes("xcolfam01"), Bytes.toBytes("xcol02"), Bytes.toBytes("xvalue022"));
        put.addImmutable(Bytes.toBytes("xcolfam01"), Bytes.toBytes("xcol03"), Bytes.toBytes("xvalue032"));
        tbl.put(put);

        put = new Put(Bytes.toBytes("xrow03"));
        put.add(Bytes.toBytes("xcolfam01"), Bytes.toBytes("xcol01"), Bytes.toBytes("xvalue0213"));
        put.addColumn(Bytes.toBytes("xcolfam01"), Bytes.toBytes("xcol02"), Bytes.toBytes("xvalue0123"));
        put.addImmutable(Bytes.toBytes("xcolfam01"), Bytes.toBytes("xcol03"), Bytes.toBytes("xvalue0223"));
        tbl.put(put);

        put = new Put(Bytes.toBytes("xrow04"));
        put.add(Bytes.toBytes("xcolfam01"), Bytes.toBytes("xcol01"), Bytes.toBytes("xvalue0334"));
        put.addColumn(Bytes.toBytes("xcolfam01"), Bytes.toBytes("xcol02"), Bytes.toBytes("xvalue0224"));
        put.addImmutable(Bytes.toBytes("xcolfam01"), Bytes.toBytes("xcol03"), Bytes.toBytes("xvalue0334"));
        put.addImmutable(Bytes.toBytes("xcolfam01"), Bytes.toBytes("xcol04"), Bytes.toBytes("xvalue0334"));
        tbl.put(put);
    }

    /**
     * Query data in the HBase table.
     */
    public void getTableData() throws IOException {
        HTable table = new HTable(hbaseConf, "xsharptable001");
        Get get = new Get(Bytes.toBytes("xrow01"));
        get.addFamily(Bytes.toBytes("xcolfam01"));
        Result result = table.get(get);
        byte[] bs = result.getValue(Bytes.toBytes("xcolfam01"), Bytes.toBytes("xcol02"));
        // logs: ============ query result: xvalue02
        logger.info("============ query result: " + Bytes.toString(bs));
    }

    /**
     * Create the HBase table.
     */
    public void createTable() throws MasterNotRunningException,
            ZooKeeperConnectionException, IOException {
        HBaseAdmin admin = new HBaseAdmin(hbaseConf);
        if (admin.tableExists(Bytes.toBytes("xsharptable001"))) {
            logger.info("===============: table already exists! failure!");
        } else {
            TableName tableName = TableName.valueOf(Bytes.toBytes("xsharptable001"));
            HTableDescriptor tableDesc = new HTableDescriptor(tableName);
            HColumnDescriptor hCol = new HColumnDescriptor(Bytes.toBytes("xcolfam01"));
            tableDesc.addFamily(hCol);
            admin.createTable(tableDesc);
            logger.info("==============: table created successfully! success!");
        }
    }

    /**
     * Scan data with Scan; the scanner is roughly the cursor of a relational database.
     */
    public void scanTableData() throws IOException {
        HTable tbl = new HTable(hbaseConf, "xsharptable001");

        // 1. Full-table scan.
        Scan scanAll = new Scan();
        ResultScanner scannerAll = tbl.getScanner(scanAll);
        for (Result resAll : scannerAll) {
            // Logs one line per row, e.g.:
            // ======scanAll: keyvalues={xrow01/xcolfam01:xcol01/1465885252556/Put/vlen=8/seqid=0,
            //   xrow01/xcolfam01:xcol02/1465885252556/Put/vlen=8/seqid=0,
            //   xrow01/xcolfam01:xcol03/1465885252556/Put/vlen=8/seqid=0}
            // ... and similarly for xrow02, xrow03, and xrow04.
            logger.info("======scanAll: " + resAll);
        }
        scannerAll.close();

        // 2. Scan restricted to one column family.
        Scan scanColFam = new Scan();
        scanColFam.addFamily(Bytes.toBytes("xcolfam01"));
        ResultScanner scannerColFam = tbl.getScanner(scanColFam);
        for (Result resColFam : scannerColFam) {
            // Logs the same four rows as above, since the table has only this family.
            logger.info("======scannerColFam: " + resColFam);
        }
        scannerColFam.close();

        // 3. Scan restricted to two columns and the rowkey range [xrow03, xrow05).
        Scan scanRow = new Scan();
        scanRow.addColumn(Bytes.toBytes("xcolfam01"), Bytes.toBytes("xcol02"))
               .addColumn(Bytes.toBytes("xcolfam01"), Bytes.toBytes("xcol04"))
               .setStartRow(Bytes.toBytes("xrow03"))
               .setStopRow(Bytes.toBytes("xrow05"));
        ResultScanner scannerRow = tbl.getScanner(scanRow);
        for (Result resRow : scannerRow) {
            // Logs:
            // ======scannerRow: keyvalues={xrow03/xcolfam01:xcol02/1465887392428/Put/vlen=10/seqid=0}
            // ======scannerRow: keyvalues={xrow04/xcolfam01:xcol02/1465887392432/Put/vlen=10/seqid=0,
            //   xrow04/xcolfam01:xcol04/1465887392432/Put/vlen=10/seqid=0}
            logger.info("======scannerRow: " + resRow);
        }
        scannerRow.close();
    }

    public static void main(String[] args) {
        HBaseStudy hb = new HBaseStudy();
        /*
        try {
            hb.createTable();
            hb.putTableData();
        } catch (Exception e) {
            e.printStackTrace();
        }
        */
        try {
            // hb.getTableData();
            // hb.putTableDataRow();
            hb.scanTableData();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```
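The hbase-site.xml copied into the conf directory usually needs little more than the client-facing ZooKeeper settings. A minimal sketch of what that file might contain (the host names and port here are placeholders, not values from the original project):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- ZooKeeper quorum the client connects to (placeholder host names) -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node1,node2,node3</value>
  </property>
  <!-- ZooKeeper client port; 2181 is the usual default -->
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
```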

This code was written in a hurry and the example is not well designed, but I tested it and it runs normally. The screenshots below show the results I verified through the HBase shell:

Figure one:

Here I view the basic information of the table with the describe command.

Figure two:

Here I use the scan command to perform a full-table scan.

Figure three:

Here I use the scan command to perform another full-table scan after inserting more data.

This article does not aim to cover the HBase Java API in depth; in fact the example uses only a handful of API calls. But with the APIs I chose, I wanted to show the relationship between an HBase table, a row (rowkey), a column family, and a column.

To create a table, we need to define the table name and its column families. To insert data, we first create the row, then add columns to it according to the column-family definitions. All of this follows the HBase design specification. Here is the key question: how do we query the data?

For a get query, we construct a Get object from the rowkey. A scan can cover the full table, can be restricted to a column family, and can use a rowkey range (start row and stop row) to bound the data it reads. From whatever angle we query, we can conclude that HBase query time is tied to the rowkey and the column family. The HBase Java API provides few other query mechanisms, so among the factors that can raise HBase query efficiency, the rowkey and the column family must play a very important role.
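The two read paths just described can be modeled in plain Java without an HBase cluster. This is a toy sketch, not HBase itself: a sorted map stands in for the table, a map lookup for Get, and a sub-map for Scan with its inclusive start row and exclusive stop row (the same [start, stop) semantics as setStartRow/setStopRow in the example code). Row and value names echo the example table.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class ReadPaths {
    // Toy table: rowkey -> value, kept in sorted (lexicographic) order.
    static NavigableMap<String, String> sampleTable() {
        NavigableMap<String, String> t = new TreeMap<>();
        t.put("xrow01", "xvalue01");
        t.put("xrow02", "xvalue012");
        t.put("xrow03", "xvalue0213");
        t.put("xrow04", "xvalue0334");
        return t;
    }

    // Like new Get(rowkey): a point lookup on the rowkey.
    static String get(NavigableMap<String, String> t, String rowKey) {
        return t.get(rowKey);
    }

    // Like Scan with setStartRow/setStopRow: reads the range [startRow, stopRow).
    static NavigableMap<String, String> scan(NavigableMap<String, String> t,
                                             String startRow, String stopRow) {
        return t.subMap(startRow, true, stopRow, false);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> t = sampleTable();
        System.out.println(get(t, "xrow01"));                     // xvalue01
        System.out.println(scan(t, "xrow03", "xrow05").keySet()); // [xrow03, xrow04]
    }
}
```

This also makes it visible why the example's range scan returned only xrow03 and xrow04: xrow05 is the exclusive stop row.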

In the big-data era, data volumes are enormous, and traditional relational databases struggle to store and manage them. To store huge amounts of data we have HDFS, which aggregates the disks of thousands of servers into one super-large disk. To extract value from that data we have MapReduce, which computes over this oversized disk. And when we urgently need to retrieve specific data quickly from such a large volume, that job falls to HBase.

So what is the principle behind fast retrieval over such large amounts of data? I think the principle is very simple: indexing. HBase uses the rowkey to distinguish different kinds of data, and puts data that is often queried together into the same column family. For example, suppose we design a trading table for an e-commerce platform. A merchant on the platform only needs to look up his own trading information and does not care about other merchants' trades, so we can use the merchant number as the rowkey and put each merchant's trading information in one column family. The merchant number then acts like a street address for the data on disk: given this value, HBase can quickly find where the data is stored. That is how HBase retrieves data quickly.
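The merchant example can be sketched with the same toy sorted-map model. The point is the key design: composing the rowkey as merchant number plus order id keeps one merchant's rows adjacent in rowkey order, so a prefix scan touches only that merchant's data. All identifiers and the "#" delimiter here are made up for illustration.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class MerchantKeys {
    // Composite rowkey: merchant number first, so a merchant's rows sort together.
    static String rowKey(String merchantId, String orderId) {
        return merchantId + "#" + orderId;
    }

    // Toy table with two merchants' trades (illustrative data).
    static NavigableMap<String, String> sample() {
        NavigableMap<String, String> t = new TreeMap<>();
        t.put(rowKey("m001", "order01"), "trade-a");
        t.put(rowKey("m001", "order02"), "trade-b");
        t.put(rowKey("m002", "order01"), "trade-c");
        return t;
    }

    // Every key starting with prefix lies in [prefix, prefix + '\uffff').
    static NavigableMap<String, String> prefixScan(NavigableMap<String, String> t,
                                                   String prefix) {
        return t.subMap(prefix, true, prefix + '\uffff', false);
    }

    public static void main(String[] args) {
        // Only merchant m001's rows are read; m002's rows are never touched.
        System.out.println(prefixScan(sample(), "m001").keySet());
    }
}
```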

The principle above is only a business-level abstraction, but HBase's internals are designed along the same lines. HBase has the concept of a region: a region is a set of data, and HBase places data of the same kind, grouped by rowkey, into the same region. Under the rowkey come the column families, and each column family is stored on disk in HFiles, which belong to the region that the rowkey maps to. So at query time, we derive the rowkey we designed from the business rules, the rowkey locates the region, and then the column-family data stored in that region's HFiles can all be read out.

The rowkey is effectively HBase's index; you could say it is the only index HBase officially provides, so this mountain of data has a single, one-level index: the rowkey. Designing the rowkey is therefore the real brain-teaser. Often a single row cannot satisfy a complex query, and we need to scan across many rows; within a region, rows are arranged in a fixed order, which is lexicographic (dictionary) order, as I mentioned in a previous article. In that situation we typically build the rowkey from a hash, for example an MD5 of the business key, so that rows that belong together share a prefix and are arranged adjacently; the underlying storage then puts them in the same place (the same region) or close to each other (adjacent regions), which also improves query efficiency.
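One way to read the MD5 idea above, sketched here under my own assumptions (the prefix length, delimiter, and key names are all illustrative): prefix each rowkey with a few hex characters of the MD5 of the business key. Rows for the same business key still share a prefix, and therefore still sort together, while different business keys scatter across the keyspace instead of piling up in one region.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SaltedKeys {
    // First hexChars hex digits of the MD5 of the business key.
    static String md5Prefix(String businessKey, int hexChars) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(businessKey.getBytes(StandardCharsets.UTF_8));
            String hex = String.format("%032x", new BigInteger(1, d));
            return hex.substring(0, hexChars);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always available in the JDK
        }
    }

    // Hypothetical layout: <md5 prefix>_<business key>_<suffix>.
    static String saltedRowKey(String businessKey, String suffix) {
        return md5Prefix(businessKey, 4) + "_" + businessKey + "_" + suffix;
    }

    public static void main(String[] args) {
        // Same business key -> same prefix -> adjacent rows.
        System.out.println(saltedRowKey("merchant001", "order01"));
        System.out.println(saltedRowKey("merchant001", "order02"));
        // Different business key -> (almost certainly) different prefix.
        System.out.println(saltedRowKey("merchant002", "order01"));
    }
}
```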

HBase has two catalog tables, the -ROOT- table and the .META. table. A client program, such as the example above, first contacts ZooKeeper; from ZooKeeper it obtains the name of the region server holding -ROOT-, from the -ROOT- region server it can query the .META. table, and the .META. table maps a rowkey to the location of its region. The client caches both -ROOT- and .META. after the first access.
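The last step of that lookup, mapping a rowkey to its region, can be sketched as a sorted-map search: each region is identified by its start rowkey, so locating a row means finding the greatest region start key that is less than or equal to the rowkey, which TreeMap.floorEntry expresses directly. This is a toy model; the region boundaries and server names below are made up.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class MetaLookup {
    // Toy .META.: region start rowkey -> server holding that region.
    static NavigableMap<String, String> meta() {
        NavigableMap<String, String> m = new TreeMap<>();
        m.put("", "regionserver-a");       // the first region starts at the empty key
        m.put("xrow03", "regionserver-b"); // illustrative split point
        return m;
    }

    // A row lives in the region with the greatest start key <= its rowkey.
    static String locate(NavigableMap<String, String> meta, String rowKey) {
        return meta.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        System.out.println(locate(meta(), "xrow01")); // regionserver-a
        System.out.println(locate(meta(), "xrow04")); // regionserver-b
    }
}
```

Caching, as the article notes, means the client only pays for this lookup once per region, not once per request.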

In fact, HBase table design itself is very simple, and the external interface is nowhere near as rich as a relational database's. Having just learned HBase, my impression is that HBase has essentially none of the computational features of a relational database; it simply provides a computational model for retrieving large amounts of data quickly.

This is the end of the article. A good memory is no match for a worn pen: writing an article is a summary of my own learning, and also a memo left behind to fight future forgetfulness.

