Goals
- Hive can query the data in HBase in real time.
- Data inserted into a Hive table is written synchronously to the corresponding table in HBase.
- Columns from different HBase tables can be combined into a single Hive view by using a left or inner join.
Mapping Hive to HBase
1. Start Hive with HBase support
With the Hive and HBase services running, start the Hive CLI with the HBase handler jars on the auxiliary path (this step may not be required):
$HIVE_HOME/bin/hive --auxpath $HIVE_HOME/lib/hive-hbase-handler-1.1.0-cdh5.7.1.jar,$HIVE_HOME/lib/hbase-common-1.2.0-cdh5.7.1.jar,$HIVE_HOME/lib/zookeeper-3.4.5-cdh5.7.1.jar,$HIVE_HOME/lib/guava-14.0.1.jar --hiveconf hbase.master=dwrj5123:60000
2. Inspect the structure of the HBase tables
(1) Query jinan:si3u_ac06_temp
describe 'jinan:si3u_ac06_temp'
Table jinan:si3u_ac06_temp is ENABLED
jinan:si3u_ac06_temp
COLUMN FAMILIES DESCRIPTION
{NAME => 'ac06_temp', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.1960 seconds
Here jinan:si3u_ac06_temp is the table name and ac06_temp is the column family. Querying the data in the table shows that its columns include, but are not limited to, the following:
ac06_temp:aac001,
ac06_temp:aae140,
ac06_temp:aae149,
ac06_temp:baa044,
ac06_temp:baa035,
ac06_temp:baa036,
ac06_temp:aae034
(2) Query jinan:si3u_ac01
hbase(main):003:0> describe 'jinan:si3u_ac01'
Table jinan:si3u_ac01 is ENABLED
jinan:si3u_ac01
COLUMN FAMILIES DESCRIPTION
{NAME => 'ac01', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE'
MPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE =>
1 row(s) in 0.1850 seconds
Querying the data in table jinan:si3u_ac01 shows that its columns include, but are not limited to, the following:
ac01:aac001,
ac01:aac003,
ac01:aaa109
3. Create Hive tables mapped to HBase
(1) Create a mapping of Hive table jinan_si3u_ac01 to HBase table jinan:si3u_ac01:
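The statement for this first mapping is not reproduced here. A sketch consistent with the explanation that follows and with the statement in (2) might look like the one below. The column types are not given in this article, so declaring every column as string is an assumption.

```sql
-- Sketch only: column types are assumed to be string; adjust them to the real data.
CREATE EXTERNAL TABLE jinan_si3u_ac01 (
  key    string,  -- bound to the HBase row key through :key
  aac001 string,
  aac003 string,
  aaa109 string
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,ac01:aac001,ac01:aac003,ac01:aaa109")
TBLPROPERTIES ("hbase.table.name" = "jinan:si3u_ac01");
```

The entries in hbase.columns.mapping are matched to the Hive columns by position, so the two lists need the same number of entries in the same order.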
Explanation: jinan_si3u_ac01 is the table name in Hive, and jinan:si3u_ac01 is the name of the HBase table to be mapped.
":key,ac01:aac001,ac01:aac003,ac01:aaa109" lists the columns to be mapped: ac01 is the column family, and multiple columns are separated by commas.
(2) Create a mapping of Hive table jinan_si3u_ac06_temp to HBase table jinan:si3u_ac06_temp:
CREATE EXTERNAL TABLE jinan_si3u_ac06_temp (key string, aac001 string, aae140 string, aae149 string, baa044 string, baa035 decimal(19,4), baa036 decimal(19,4), aae034 timestamp)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,ac06_temp:aac001,ac06_temp:aae140,ac06_temp:aae149,ac06_temp:baa044,ac06_temp:baa035,ac06_temp:baa036,ac06_temp:aae034")
TBLPROPERTIES ("hbase.table.name" = "jinan:si3u_ac06_temp");
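Once the mapping exists, the first two goals can be checked from the Hive prompt. The sketch below reads a few rows through the mapped table and then writes rows back through it. Note that staging_ac06 is a hypothetical ordinary Hive table with matching columns, introduced only for illustration; it is not defined in this article.

```sql
-- Read through the mapping: the rows are fetched from HBase at query time.
SELECT key, aac001, aae140, baa035
FROM jinan_si3u_ac06_temp
LIMIT 10;

-- Write through the mapping: each selected row is stored in the HBase table.
-- staging_ac06 is a hypothetical Hive table used only for this example.
INSERT INTO TABLE jinan_si3u_ac06_temp
SELECT key, aac001, aae140, aae149, baa044, baa035, baa036, aae034
FROM staging_ac06;
```

Because the key column is the HBase row key, a row whose key already exists should update the stored cells rather than add a second row.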
Creating a Hive view
In Hive, creating a view lets you combine data from multiple tables in one place for easy querying and use. The following example uses the two tables jinan_si3u_ac01 and jinan_si3u_ac06_temp mapped above.
CREATE VIEW fact_view (aac001, aac003, aaa109, aae140, aae149, baa044, baa035, baa036, aae034) AS SELECT a.aac001, a.aac003, a.aaa109, b.aae140, b.aae149, b.baa044, b.baa035, b.baa036, b.aae034 FROM jinan_si3u_ac01 a RIGHT JOIN jinan_si3u_ac06_temp b ON a.aac001 = b.aac001;
Table name | Column names | View name
jinan_si3u_ac01 | aac001, aac003, aaa109 | fact_view
jinan_si3u_ac06_temp | aae140, aae149, baa044, baa035, baa036, aae034 | fact_view
Running select * from fact_view; returns the combined data.
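Because the view is an ordinary Hive object, it can also be inspected, filtered, and aggregated like a table. A minimal sketch, assuming baa035 holds an amount worth summing per aae140 value (the meaning of these columns is not stated in this article):

```sql
-- Show the column names and types the view exposes.
DESCRIBE fact_view;

-- Sum baa035 per aae140 for rows dated on or after 2014-05-01.
SELECT aae140, SUM(baa035) AS total_baa035
FROM fact_view
WHERE aae034 >= CAST('2014-05-01 00:00:00' AS timestamp)
GROUP BY aae140;
```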
Using the Hive view in Kylin
Kylin supports building a cube from a Hive view; the process is the same as with Hive tables. After the cube is built, execute a query such as:
SELECT SUM(baa035) FROM fact_view INNER JOIN date_view ON fact_view.aae034 = date_view.start_date WHERE (date_view.start_date > '2014-05-01' AND date_view.start_date < '2015-01-01');
Summary
Through a single mapping, Hive can query the data in HBase in real time, and data inserted into the Hive table is written to HBase. By creating a view, you can consolidate data from multiple Hive tables in one place, which makes the data in HBase easy to use without taking up additional storage. The drawback is that the process above includes no data cleaning, so data conflicts may occur.
Reference
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
Hive and HBase integration process