Integration of Hadoop Hive and Hbase
I. Introduction
Hive is a Hadoop-based data warehouse tool that maps structured data files to database tables and provides a complete SQL-like query capability; the SQL statements are converted into MapReduce jobs for execution. Its main advantage is a low learning cost: simple MapReduce statistics can be implemented quickly with SQL-like statements, without developing dedicated MapReduce applications, which makes Hive well suited to the statistical analysis of data warehouses.
The Hive/HBase integration is implemented through the external API interfaces that the two systems themselves expose; the communication relies mainly on the hive-hbase-handler.jar tool class, which translates between Hive table operations and HBase reads and writes.
II. Installation steps
1. Hadoop and HBase have been installed successfully.
Hadoop cluster configuration: http://blog.csdn.net/hguisu/article/details/723739
Hbase installation configuration: http://blog.csdn.net/hguisu/article/details/7244413
2. Copy the hbase-0.90.4.jar and zookeeper-3.3.2.jar to hive/lib.
NOTE: If another version of either file already exists under hive/lib (for example a different zookeeper jar), we recommend deleting it and using the version shipped with HBase.
3. Modify the hive-site.xml file in hive/conf and add the following content at the bottom:
<!--
<property>
  <name>hive.exec.scratchdir</name>
  <value>/usr/local/hive/tmp</value>
</property>
-->
<property>
  <name>hive.querylog.location</name>
  <value>/usr/local/hive/logs</value>
</property>
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/local/hive/lib/hive-hbase-handler-0.8.0.jar,file:///usr/local/hive/lib/hbase-0.90.4.jar,file:///usr/local/hive/lib/zookeeper-3.3.2.jar</value>
</property>
Note: If hive-site.xml does not exist, create it yourself, or rename hive-default.xml.template to hive-site.xml and edit that.
4. Copy the hbase-0.90.4.jar to hadoop/lib on all Hadoop nodes (including the master).
5. Copy the hbase-site.xml file under hbase/conf to hadoop/conf on all Hadoop nodes (including the master).
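Assuming Hadoop and HBase are installed under /usr/local (adjust the paths to your own layout), steps 4 and 5 amount to running something like the following on every node, master included:

cp /usr/local/hbase/hbase-0.90.4.jar /usr/local/hadoop/lib/
cp /usr/local/hbase/conf/hbase-site.xml /usr/local/hadoop/conf/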
Note: If you skip either of the two copy steps above, the following error may occur while Hive is running:
org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a sign that the server has too many connections (30 is the default). Consider inspecting your ZK server logs for that error and then make sure you are reusing HBaseConfiguration as often as you can. See HTable's javadoc for more information.
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.
III. Start Hive
1. Start a single node:
#bin/hive -hiveconf hbase.master=master:60000
(60000 is the default HBase master port; the port must match the one configured in your hbase-site.xml.)
2. Start the cluster:
#bin/hive -hiveconf hbase.zookeeper.quorum=node1,node2,node3
If hive.aux.jars.path is not configured in the hive-site.xml file, you can start Hive as follows instead:
bin/hive --auxpath /usr/local/hive/lib/hive-hbase-handler-0.8.0.jar,/usr/local/hive/lib/hbase-0.90.4.jar,/usr/local/hive/lib/zookeeper-3.3.2.jar -hiveconf hbase.zookeeper.quorum=node1,node2,node3
IV. Test
1. Create an HBase-backed table in Hive:
CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");
hbase.table.name defines the table name in HBase.
hbase.columns.mapping defines how the Hive columns map to HBase column families and qualifiers: here :key is the row key and cf1:val is the qualifier val in the column family cf1.
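For illustration only (the table and column names here are hypothetical), a table spanning two column families lists one mapping entry per Hive column, in order:

CREATE TABLE hbase_table_multi(key int, a string, b string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:a,cf2:b")
TBLPROPERTIES ("hbase.table.name" = "multi");

The first Hive column maps to the HBase row key via :key; each remaining column names its family:qualifier pair.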
2. Use SQL to import data
1) Create a Hive data table:
CREATE TABLE pokes (foo INT, bar STRING);
2) Batch insert data:
hive> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
3) use SQL to import hbase_table_1:
hive> INSERT OVERWRITE TABLE hbase_table_1 SELECT * FROM pokes WHERE foo = 86;
3. View data
hive> SELECT * FROM hbase_table_1;
Now you can log on to Hbase to view the data.
# bin/hbase shell
hbase(main):001:0> describe 'xyz'
hbase(main):002:0> scan 'xyz'
hbase(main):003:0> put 'xyz', '2013', 'cf1:val', 'www.360buy.com'
Back in Hive, we can now see the data that was inserted through HBase.
4. Access an existing HBase table through Hive
Use CREATE EXTERNAL TABLE:
CREATE EXTERNAL TABLE hbase_table_2(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf1:val")
TBLPROPERTIES ("hbase.table.name" = "some_existing_table");
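An external table can be queried like any other Hive table, and because it is EXTERNAL, dropping it removes only the Hive metadata; the underlying HBase table is left untouched:

hive> SELECT * FROM hbase_table_2;
hive> DROP TABLE hbase_table_2;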
Content reference: http://wiki.apache.org/hadoop/Hive/HBaseIntegration