Hive and HBase integration process


Goals

    1. Hive can query the data in HBase in real time.
    2. Data inserted into a Hive table is written synchronously to the corresponding HBase table.
    3. Columns from different HBase tables can be combined into a single Hive view by using a left or inner join.

Mapping Hive tables to HBase

1. Start Hive and HBase

With the Hive and HBase services running, start the Hive CLI with the HBase handler jars on the auxiliary path (this step may not be required):

    $HIVE_HOME/bin/hive --auxpath $HIVE_HOME/lib/hive-hbase-handler-1.1.0-cdh5.7.1.jar,$HIVE_HOME/lib/hbase-common-1.2.0-cdh5.7.1.jar,$HIVE_HOME/lib/zookeeper-3.4.5-cdh5.7.1.jar,$HIVE_HOME/lib/guava-14.0.1.jar --hiveconf hbase.master=dwrj5123:60000
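As an alternative to passing --auxpath at startup, the same jars can usually be registered from inside a running Hive session with ADD JAR. The sketch below assumes the jars live under /opt/hive/lib, which is a placeholder path rather than one taken from this setup; substitute the real location of $HIVE_HOME/lib.

```sql
-- Register the HBase handler and its dependencies for the current session.
-- /opt/hive/lib is a placeholder; replace it with the actual $HIVE_HOME/lib.
ADD JAR /opt/hive/lib/hive-hbase-handler-1.1.0-cdh5.7.1.jar;
ADD JAR /opt/hive/lib/hbase-common-1.2.0-cdh5.7.1.jar;
ADD JAR /opt/hive/lib/zookeeper-3.4.5-cdh5.7.1.jar;
ADD JAR /opt/hive/lib/guava-14.0.1.jar;

-- Same setting that the startup command passes with --hiveconf.
SET hbase.master=dwrj5123:60000;

-- Confirm which jars the session has picked up.
LIST JARS;
```

Jars added this way apply only to the current session, so a script that relies on them needs the ADD JAR lines at its top.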

2. Inspect the structure of the HBase tables

(1) Describe jinan:si3u_ac06_temp

describe 'jinan:si3u_ac06_temp'
Table jinan:si3u_ac06_temp is ENABLED
jinan:si3u_ac06_temp
COLUMN FAMILIES DESCRIPTION
{NAME => 'ac06_temp', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.1960 seconds

Here jinan:si3u_ac06_temp is the table name and ac06_temp is the column family. Querying the data in the table shows that it contains columns including, but not limited to, the following:

ac06_temp:aac001,
ac06_temp:aae140,
ac06_temp:aae149,
ac06_temp:baa044,
ac06_temp:baa035,
ac06_temp:baa036,
ac06_temp:aae034

(2) Describe jinan:si3u_ac01

hbase(main):003:0> describe 'jinan:si3u_ac01'
Table jinan:si3u_ac01 is ENABLED
jinan:si3u_ac01
COLUMN FAMILIES DESCRIPTION
{NAME => 'ac01', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE' ... COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => ...
1 row(s) in 0.1850 seconds

Querying the data in jinan:si3u_ac01 shows that it contains columns including, but not limited to, the following:

ac01:aac001,
ac01:aac003,
ac01:aaa109

3. Create Hive tables mapped to HBase

(1) Create a mapping from the Hive table jinan_si3u_ac01 to the HBase table jinan:si3u_ac01:
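A statement of the following shape would create this mapping. It is a sketch modeled on the jinan_si3u_ac06_temp statement in (2) and on the column mapping string explained below; declaring every column as string is an assumption, since the actual column types are not given here.

```sql
-- Sketch: map Hive table jinan_si3u_ac01 onto HBase table jinan:si3u_ac01.
-- The string column types are assumed; adjust them to the real data.
-- Hive columns pair with the mapping entries by position:
-- key -> :key, aac001 -> ac01:aac001, aac003 -> ac01:aac003, aaa109 -> ac01:aaa109
CREATE EXTERNAL TABLE jinan_si3u_ac01 (
  key    string,
  aac001 string,
  aac003 string,
  aaa109 string
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,ac01:aac001,ac01:aac003,ac01:aaa109"
)
TBLPROPERTIES ("hbase.table.name" = "jinan:si3u_ac01");
```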

Explanation: jinan_si3u_ac01 is the table name in Hive, and jinan:si3u_ac01 is the name of the HBase table being mapped.

":key,ac01:aac001,ac01:aac003,ac01:aaa109" lists the columns to map: :key is the HBase row key, ac01 is the column family, and multiple columns are separated by commas with no spaces.

(2) Create a mapping from the Hive table jinan_si3u_ac06_temp to the HBase table jinan:si3u_ac06_temp:

CREATE EXTERNAL TABLE jinan_si3u_ac06_temp (key string, aac001 string, aae140 string, aae149 string, baa044 string, baa035 decimal(19,4), baa036 decimal(19,4), aae034 timestamp)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,ac06_temp:aac001,ac06_temp:aae140,ac06_temp:aae149,ac06_temp:baa044,ac06_temp:baa035,ac06_temp:baa036,ac06_temp:aae034")
TBLPROPERTIES ("hbase.table.name" = "jinan:si3u_ac06_temp");
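With both mappings in place, the first two goals can be checked directly from Hive. The sketch below reads a few rows through the mapping and then writes rows back to HBase; staging_ac01 is a hypothetical Hive table with matching columns, introduced only for illustration.

```sql
-- Goal 1: read current HBase data through the Hive mapping.
SELECT key, aac001, aae140, baa035
FROM jinan_si3u_ac06_temp
LIMIT 10;

-- Goal 2: rows written through the Hive table land in jinan:si3u_ac01.
-- staging_ac01 is a hypothetical source table, not part of the original setup.
INSERT INTO TABLE jinan_si3u_ac01
SELECT key, aac001, aac003, aaa109
FROM staging_ac01;
```

Because HBase identifies rows by row key, inserting a row whose key already exists overwrites the mapped cells instead of adding a duplicate row.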

Creating a Hive view

In Hive, creating a view lets you combine data from multiple tables into a single object that is easy to query and use. The example below uses the two tables mapped above, jinan_si3u_ac01 and jinan_si3u_ac06_temp.

CREATE VIEW fact_view (aac001, aac003, aaa109, aae140, aae149, baa044, baa035, baa036, aae034) AS
SELECT a.aac001, a.aac003, a.aaa109, b.aae140, b.aae149, b.baa044, b.baa035, b.baa036, b.aae034
FROM jinan_si3u_ac01 a RIGHT JOIN jinan_si3u_ac06_temp b ON a.aac001 = b.aac001;

Table name              Column names                                      View name
jinan_si3u_ac01         aac001, aac003, aaa109                            fact_view
jinan_si3u_ac06_temp    aae140, aae149, baa044, baa035, baa036, aae034    fact_view

Running select * from fact_view; returns the combined data.
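Because the view is built with a RIGHT JOIN, every row of jinan_si3u_ac06_temp appears in it, and the columns taken from jinan_si3u_ac01 are NULL where no row with the same aac001 exists. A quick way to see how many rows are affected, as a sketch:

```sql
-- Count view rows that found no match in jinan_si3u_ac01.
-- The view's aac001 column comes from jinan_si3u_ac01, so it is NULL
-- exactly when the right join found no matching row.
SELECT COUNT(*) AS unmatched_rows
FROM fact_view
WHERE aac001 IS NULL;
```

If only matched rows are wanted, the same view can be defined with an INNER JOIN instead.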

Using the Hive view in Kylin

Kylin supports building a cube from a Hive view; the process is the same as with Hive tables. After the cube is built, run a query such as:

SELECT SUM(baa035) FROM fact_view INNER JOIN date_view ON fact_view.aae034 = date_view.start_date WHERE (date_view.start_date > '2014-05-01' AND date_view.start_date < '2015-01-01');
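The definition of date_view is not shown here. To sanity-check the Kylin result, a comparable aggregate can be run directly in Hive against fact_view. This is only a sketch: it matches the Kylin query only under the assumption that date_view holds exactly one row for every aae034 value in the range.

```sql
-- Cross-check in Hive: sum baa035 over the same open date range.
-- Assumes date_view has one row per aae034 value, so the inner join
-- in the Kylin query neither drops nor duplicates fact rows.
SELECT SUM(baa035) AS total_baa035
FROM fact_view
WHERE aae034 > CAST('2014-05-01 00:00:00' AS TIMESTAMP)
  AND aae034 < CAST('2015-01-01 00:00:00' AS TIMESTAMP);
```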

Summary

With a one-time table mapping, Hive can query the data in HBase in real time, and data inserted into the Hive table is written to HBase. Building a view consolidates data from multiple Hive tables in one place, which makes the data easier to use, and the data in HBase is used in place without taking up additional storage. The drawback is that the process above includes no data-cleaning step, so data conflicts may occur.

Reference

https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
