Hive and HBase are integrated by having each side communicate with the other through its external API; the communication relies mainly on the hive-hbase-handler.jar tool. Since Hive 0.6, hive-hbase-handler.jar ships in Hive's lib directory, not in HBase's lib.
When such a Hive table is created, the corresponding HBase table is created as well; when the Hive table is dropped, the corresponding HBase table is also dropped. See the official documentation: https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration. Prerequisites: install matching CDH versions of Hive and HBase; this case uses hive-0.12.0-cdh5.0.0 and hbase-0.96.1.1-cdh5.0.0. Add the following configuration property to $HIVE_HOME/conf/hive-site.xml:
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///home/spark/app/hive-0.12.0-cdh5.0.0/lib/hive-hbase-handler-0.12.0-cdh5.0.0.jar,file:///home/spark/app/hive-0.12.0-cdh5.0.0/lib/zookeeper-3.4.5-cdh5.0.0.jar,file:///home/spark/app/hive-0.12.0-cdh5.0.0/lib/hbase-common-0.96.1.1-cdh5.0.0.jar,file:///home/spark/app/hive-0.12.0-cdh5.0.0/lib/hbase-client-0.96.1.1-cdh5.0.0.jar,file:///home/spark/app/hive-0.12.0-cdh5.0.0/lib/hbase-server-0.96.1.1-cdh5.0.0.jar,file:///home/spark/app/hive-0.12.0-cdh5.0.0/lib/hbase-protocol-0.96.1.1-cdh5.0.0.jar,file:///home/spark/app/hive-0.12.0-cdh5.0.0/lib/htrace-core-2.01.jar</value>
</property>
Copy all the jar packages except hive-hbase-handler from $HBASE_HOME/lib to $HIVE_HOME/lib.
Start Hive: it is recommended to start it with hive -hiveconf hive.root.logger=DEBUG,console so that you can see more detailed log information.
Case 1: simple table, single cf
Create a hive-hbase table:
CREATE TABLE hive_hbase_table_kv(key string, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hbase_hive_table_kv");
The Hive column key maps to the HBase row key (:key), and value maps to cf1:val. hbase_hive_table_kv is the HBase table name; hive_hbase_table_kv is the Hive table name.
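If you want to double-check the mapping from the Hive side, DESCRIBE FORMATTED (standard HiveQL; the exact output layout varies by Hive version) prints the storage handler and serde properties of the table created above:
DESCRIBE FORMATTED hive_hbase_table_kv;  -- look for storage_handler and hbase.columns.mapping in the output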
Create a plain Hive table, load data into it, and then insert that data into the HBase-backed table:
CREATE TABLE kv (key STRING, value STRING);
LOAD DATA LOCAL INPATH '/home/spark/app/spark-1.0.0-bin-2.3.0-cdh5.0.0/examples/src/main/resources/kv1.txt' OVERWRITE INTO TABLE kv;
INSERT OVERWRITE TABLE hive_hbase_table_kv SELECT key, value FROM kv;
View the Hive and HBase tables and query the data on both sides to confirm it matches.
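For example, a minimal check from the Hive side could look like the lines below (table names come from Case 1); on the HBase side, running scan 'hbase_hive_table_kv' in the hbase shell should show the same rows.
SELECT * FROM kv LIMIT 5;                   -- the plain Hive table loaded from kv1.txt
SELECT * FROM hive_hbase_table_kv LIMIT 5;  -- the HBase-backed table populated by the INSERT above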
Case 2: simple table multi-cf
CREATE TABLE hbase_table_2(key string, value1 string, value2 string, value3 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,a:b,a:c,d:e");
Qualifiers b and c belong to column family a, and qualifier e belongs to column family d; that is, value1 and value2 map to a:b and a:c, while value3 maps to d:e.
Since hbase.table.name is not specified here, the HBase table name defaults to the Hive table name.
Import data from the Hive table:
INSERT OVERWRITE TABLE hbase_table_2 SELECT empno, ename, job, deptno FROM emp;
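The import above (and Case 3 below) reads from an emp table that this walkthrough assumes already exists; the definition below is only an illustrative sketch, and its column types and load path are assumptions rather than part of the original setup.
-- Hypothetical emp source table for Case 2 and Case 3 (schema assumed)
CREATE TABLE emp (empno string, ename string, job string, sal double, deptno string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
-- LOAD DATA LOCAL INPATH '/path/to/emp.txt' OVERWRITE INTO TABLE emp;  -- placeholder path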
Case 3: Partitioned Tables
CREATE TABLE hbase_table_3(key string, ename string, job string, sal double)
PARTITIONED BY (pt string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,a:b,a:c,d:e")
TBLPROPERTIES ("hbase.table.name" = "hbase_table_3");
Import hive data:
INSERT OVERWRITE TABLE hbase_table_3 PARTITION (pt = '2017-08-01') SELECT empno, ename, job, sal FROM emp;
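To confirm the load, a quick check from the Hive side (a sketch using the names from this case; partition metadata is kept in the metastore even for HBase-backed tables) is:
SHOW PARTITIONS hbase_table_3;  -- should list pt=2017-08-01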
Note:
A partitioned table integrated with HBase has a problem when queried from Hive: SELECT * FROM the table returns no data, while SELECT <column> FROM the table does return data.
Why does SELECT * FROM xxx not display data?
For a normal table, SELECT * reads the table's files directly from HDFS. When data is imported through hive-hbase-handler, however, the data is stored in HBase (under HBase's own directories on HDFS), not under the Hive table's location.
In this example, querying the data directly from HBase succeeds, but the SELECT * query in Hive does not return it.
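As noted above, listing the columns explicitly does return data; a minimal sketch of that workaround, using the table and partition from Case 3, is:
-- Explicit column list instead of SELECT *
SELECT key, ename, job, sal FROM hbase_table_3 WHERE pt = '2017-08-01';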