The path to learning the Hadoop Ecosystem (V): simple use of HBase

Source: Internet
Author: User
Tags: bulk insert, hadoop ecosystem

Recently, our company has been working on a big data interface platform, and the workflow is roughly this: we are responsible for storing the data, i.e. an ETL process that loads data into Hive with MapReduce and then syncs it to Impala. The interface platform then exposes query interfaces: the front end sends a parameterized SQL statement, and the platform queries the data through Impala's Java API and returns the result to the user. In addition, when the result set is very large, the front end passes a taskId; the first call only runs the query and writes the result into a temporary Impala table, and later calls fetch the data from that table. So how do we record the state changes of such a task? We use HBase, with the taskId as the row key and a column family to record the state information.
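As a rough illustration of that idea, here is a minimal sketch of recording and reading a task's state, keyed by taskId. The table name task_status, the column family info, the qualifier state and the ZooKeeper hosts are placeholders for illustration only, not the names used on our actual platform; the client API is the same HBase 0.96-era API used later in this article.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

/** Minimal sketch: track a query task's state in HBase, keyed by taskId. */
public class TaskStateDemo {
    public static void main(String[] args) throws IOException {
        Configuration config = HBaseConfiguration.create();
        // ZooKeeper quorum of your cluster (placeholder host names)
        config.set("hbase.zookeeper.quorum", "zk1,zk2,zk3");

        // Hypothetical table 'task_status' with column family 'info'
        HTable table = new HTable(config, "task_status");
        String taskId = "task-0001";   // row key = taskId

        // Record that the query for this task is currently running
        Put put = new Put(Bytes.toBytes(taskId));
        put.add(Bytes.toBytes("info"), Bytes.toBytes("state"), Bytes.toBytes("RUNNING"));
        table.put(put);

        // Later, read the current state back by row key
        Result result = table.get(new Get(Bytes.toBytes(taskId)));
        String state = Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("state")));
        System.out.println("task " + taskId + " state: " + state);

        table.close();
    }
}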
Below, I will introduce HBase in the following steps.

I. Basic principles of HBase

HBase is a distributed, column-oriented storage system built on HDFS, primarily used for storing massive amounts of structured data.
Features of HBase:
1. Large: a table can have billions of rows and millions of columns;
2. Schema-free: each row has a sortable row key and an arbitrary number of columns; columns can be added dynamically as needed, and different rows in the same table can have different columns;
3. Column-oriented: storage and access control are organized by column (family), and column families can be retrieved independently;
4. Sparse: empty (null) columns take up no storage space, so tables can be designed to be very sparse;
5. Multi-versioned: each cell can hold multiple versions of its data; by default the version number is assigned automatically and is the timestamp at which the cell was written;
6. Single data type: data in HBase is stored as uninterpreted byte strings, with no types.

Next, let's take a look at HBase's main components:

Master: assigns regions to region servers, balances the load across region servers, detects failed region servers and reassigns their regions, and handles users' table creation and deletion requests.
RegionServer: a RegionServer maintains regions and processes I/O requests to those regions; it is also responsible for splitting regions that grow too large.
ZooKeeper: through leader election, ZooKeeper guarantees that only one Master is active in the cluster at any time; the Master and RegionServers register with ZooKeeper on startup. ZooKeeper stores the addressing entry point for all regions, monitors RegionServers coming online and going offline in real time and notifies the Master, and stores the HBase schema and table metadata. By default, HBase manages the ZooKeeper instance itself, for example starting and stopping it. The introduction of ZooKeeper means the Master is no longer a single point of failure.
As for the HBase table structure, I will cover it below.

II. Common commands for HBase

First, we can execute hbase shell to enter the HBase command line, as follows:

Then, execute list to see all the tables, as follows:

Next, we can view a table's structure with describe 'table name', as follows:

As you can see, this table has one column family, info.
We can then use scan 'table name' to view all of the table's data.
Below, we use get 'result_info', 'test02' to fetch all column values for one row key in the table, as follows:

These are just a few of the commands; there are many more, which you can look up and get familiar with through practice.
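For reference, a session using the commands mentioned above might look like the following (result_info and test02 are the table and row key from this article; your tables and output will differ):

hbase shell
hbase(main):001:0> list                         # list all tables
hbase(main):002:0> describe 'result_info'       # show the table's column families
hbase(main):003:0> scan 'result_info'           # view all rows of the table
hbase(main):004:0> get 'result_info', 'test02'  # get all columns for one row key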

III. Basic operations with the HBase Java API

The HBase Maven dependencies are as follows:

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <hadoop.version>2.3.0-cdh5.0.0</hadoop.version>
        <hbase.version>0.96.1.1-cdh5.0.0</hbase.version>
        <hive.version>0.12.0-cdh5.0.0</hive.version>
    </properties>

    <!-- HBase related jars -->
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>${hbase.version}</version>
        <exclusions>
            <exclusion>
                <artifactId>jdk.tools</artifactId>
                <groupId>jdk.tools</groupId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-common</artifactId>
        <version>${hbase.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-server</artifactId>
        <version>${hbase.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-thrift</artifactId>
        <version>${hbase.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-testing-util</artifactId>
        <version>${hbase.version}</version>
        <scope>test</scope>
    </dependency>

First, I post the code directly, as follows:

package org.hbase.demo;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * Key point 1: turn off auto flush. If it stays on, every single Put is committed
 * immediately, which is the main reason bulk imports are slow.
 * Key point 2: set the write buffer size; HBase commits automatically once the buffer
 * exceeds this value. Try different sizes yourself; for large data volumes 5M is common,
 * this article uses 3M.
 * Key point 3: call flushCommits() after the last batch. If you do not, any data still
 * sitting in the buffer (less than the configured size) is never committed and is lost.
 *
 * @author Qiyongkang
 */
public class Example {
    /**
     * insertBatch: bulk insert.
     *
     * @throws IOException
     * @since JDK 1.6
     */
    public static void insertBatch() throws IOException {
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "172.31.25.8,172.31.25.2,172.31.25.3");
        HTable htable = new HTable(config, "qyk_info");
        htable.setAutoFlush(false, false);           // key point 1
        htable.setWriteBufferSize(3 * 1024 * 1024);  // key point 2: 3M write buffer

        int num = 1;
        while (num <= 10) {
            Put put = new Put(Bytes.toBytes(num + ""));
            put.add(Bytes.toBytes("info"), Bytes.toBytes("age"), Bytes.toBytes("18"));  // example age value
            put.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("qyk" + num));
            put.add(Bytes.toBytes("info"), Bytes.toBytes("id"), Bytes.toBytes(num + ""));
            htable.put(put);
            num++;
            if (num % 100 == 0) {
                System.out.println("..." + num);     // print progress periodically
            }
        }
        htable.flushCommits();                       // key point 3
        htable.close();
    }

    /**
     * insertSingle: single insert.
     *
     * @throws IOException
     * @since JDK 1.6
     */
    public static void insertSingle() throws IOException {
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "172.31.25.8,172.31.25.2,172.31.25.3");
        HTable htable = new HTable(config, "qyk_info");

        Put put = new Put(Bytes.toBytes("0"));
        put.add(Bytes.toBytes("info"), Bytes.toBytes("age"), Bytes.toBytes("18"));  // example age value
        put.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("qyk" + 0));
        put.add(Bytes.toBytes("info"), Bytes.toBytes("id"), Bytes.toBytes("0"));
        htable.put(put);
        htable.close();
    }

    /**
     * getData: gets column values by row key.
     *
     * @throws IOException
     * @since JDK 1.6
     */
    public static void getData() throws IOException {
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "172.31.25.8,172.31.25.2,172.31.25.3");
        HTable htable = new HTable(config, "qyk_info");

        Get get = new Get(Bytes.toBytes("1"));
        Result result = htable.get(get);
        String age = Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("age")));
        String name = Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name")));
        String id = Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("id")));
        System.out.println("age: " + age + ", name: " + name + ", id: " + id);
        htable.close();
    }

    public static void main(String[] args) throws IOException {
        insertSingle();   // single insert
        insertBatch();    // bulk insert
        getData();        // get data by row key
    }
}

Before running the three operations, we first execute create 'qyk_info', 'info' in the hbase shell to create the table and its column family, and then run the program. The console output looks as follows:

Then we execute scan 'qyk_info' to have a look, as follows:

Next, we do another single insert with the same row key 0, changing the id to 11 and the age to 19, and run the single insert again.
Then, execute get 'qyk_info', '0' on the command line to see the result:

This is effectively an update operation: each cell value carries a timestamp, and by default a read returns the most recent value of the column.
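If you also want to see the older values, you can request multiple versions through the Java API. Below is a minimal sketch using the same 0.96-era client as above; it is a hypothetical extra method for the Example class, it assumes the additional imports org.apache.hadoop.hbase.Cell and org.apache.hadoop.hbase.CellUtil, and it only returns history if the column family is configured to keep more than one version (its VERSIONS attribute).

    /**
     * Sketch: read up to 3 versions of row key "0" (3 is just an example count).
     */
    public static void getVersions() throws IOException {
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "172.31.25.8,172.31.25.2,172.31.25.3");
        HTable htable = new HTable(config, "qyk_info");

        Get get = new Get(Bytes.toBytes("0"));
        get.setMaxVersions(3);   // ask for several versions, newest first
        Result result = htable.get(get);
        for (Cell cell : result.rawCells()) {
            // print each stored value together with its timestamp
            System.out.println(Bytes.toString(CellUtil.cloneValue(cell)) + " @ " + cell.getTimestamp());
        }
        htable.close();
    }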
OK, that's it for the basic use of HBase. It is fairly introductory, but I hope it helps!
