Integration of Hadoop Hive and Hbase

Source: Internet
Author: User
Document directory
  • 1. Hadoop and HBase have been installed successfully.
  • 2. Copy hbase-0.90.4.jar and zookeeper-3.3.2.jar to hive/lib.
  • 3. Modify the hive-site.xml file in hive/conf and add the required properties at the bottom.
  • 4. Copy hbase-0.90.4.jar to hadoop/lib on all Hadoop nodes (including the master).
  • 1. Start a single node.
  • 2. Start the cluster.
  • 1. Create a table stored by HBase.
  • 2. Use SQL to import data.
I. Introduction

Hive is a Hadoop-based data warehouse tool that maps structured data files to database tables and provides full SQL query functionality; the SQL statements are converted into MapReduce jobs for execution. Its advantage is a low learning cost: simple MapReduce statistics can be implemented quickly with SQL-like statements, without developing dedicated MapReduce applications. This makes it well suited to statistical analysis over data warehouses.
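For example, an aggregation that would otherwise require a hand-written MapReduce job can be expressed in a single SQL-like statement (the table and column names here are illustrative, not part of the setup below):

```sql
-- Hypothetical table: one page view per row.
-- Hive compiles this GROUP BY into a MapReduce job automatically.
SELECT page, COUNT(*) AS views
FROM page_views
GROUP BY page;
```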

The Hive/HBase integration is implemented through the external API interfaces that the two systems themselves expose; the communication relies mainly on the hive-hbase-handler.jar utility class, which maps operations on a Hive table onto reads and writes of the underlying HBase table.

II. Installation Steps

1. Hadoop and HBase have been installed successfully.

Hadoop cluster configuration: http://blog.csdn.net/hguisu/article/details/723739

Hbase installation configuration: http://blog.csdn.net/hguisu/article/details/7244413

2. Copy the hbase-0.90.4.jar and zookeeper-3.3.2.jar to hive/lib.

Note: If another version of either file already exists under hive/lib, we recommend deleting it and using the version shipped with HBase.

3. Modify the hive-site.xml file in hive/conf and add the following content at the bottom:
<!--
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/usr/local/hive/tmp</value>
  </property>
-->
<property>
  <name>hive.querylog.location</name>
  <value>/usr/local/hive/logs</value>
</property>
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/local/hive/lib/hive-hbase-handler-0.8.0.jar,file:///usr/local/hive/lib/hbase-0.90.4.jar,file:///usr/local/hive/lib/zookeeper-3.3.2.jar</value>
</property>

Note: If hive-site.xml does not exist, create it yourself, or copy the hive-default.xml.template file to hive-site.xml and edit that.

4. Copy hbase-0.90.4.jar to hadoop/lib on all Hadoop nodes (including the master).

5. Copy the hbase-site.xml file under hbase/conf to hadoop/conf on all Hadoop nodes (including the master).

Note: If the jar-copy or hbase-site.xml-copy steps above are skipped, the following error may occur while Hive is running:

org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a sign that the server has too many connections (30 is the default). Consider inspecting your ZK server logs for that error and then make sure you are reusing HBaseConfiguration as often as you can. See HTable's javadoc for more information.
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.
III. Start Hive

1. Start a single node:

#bin/hive -hiveconf hbase.master=master:60000

2. Start the cluster:

#bin/hive -hiveconf hbase.zookeeper.quorum=node1,node2,node3

If hive.aux.jars.path is not configured in the hive-site.xml file, you can start Hive as follows:

bin/hive --auxpath /usr/local/hive/lib/hive-hbase-handler-0.8.0.jar,/usr/local/hive/lib/hbase-0.90.5.jar,/usr/local/hive/lib/zookeeper-3.3.2.jar -hiveconf hbase.zookeeper.quorum=node1,node2,node3

IV. Test

1. Create a table stored by HBase:

CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");

hbase.table.name defines the table name in HBase.

hbase.columns.mapping defines the mapping to HBase column families and columns.
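As a sketch of a richer mapping (the table and column names here are hypothetical, not from the setup above): the first `:key` entry binds a Hive column to the HBase row key, and each remaining entry pairs a Hive column with a `family:qualifier`, so one Hive table can span several column families:

```sql
-- Hypothetical example: map Hive columns across two column families, cf1 and cf2.
CREATE TABLE hbase_table_multi(key int, name string, age int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:name,cf2:age")
TBLPROPERTIES ("hbase.table.name" = "people");
```

The number and order of entries in hbase.columns.mapping must match the Hive column list.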

2. Use SQL to import data

1) Create a Hive data table:

hive> CREATE TABLE pokes (foo INT, bar STRING);

2) Batch-load data:

hive> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;

3) Use SQL to import into hbase_table_1:

hive> INSERT OVERWRITE TABLE hbase_table_1 SELECT * FROM pokes WHERE foo = 86;

3. View the data

hive> SELECT * FROM hbase_table_1;

Now you can log on to HBase to view the data.

# bin/hbase shell
hbase(main):001:0> describe 'xyz'
hbase(main):002:0> scan 'xyz'
hbase(main):003:0> put 'xyz', '2013', 'cf1:val', 'www.360buy.com'

Back in Hive, we can see the row just inserted through HBase.

4. Access an existing HBase table through Hive

Use CREATE EXTERNAL TABLE:

CREATE EXTERNAL TABLE hbase_table_2(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf1:val")
TBLPROPERTIES ("hbase.table.name" = "some_existing_table");
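Once declared, the existing HBase table can be queried like any other Hive table. A minimal usage sketch (reusing the example names above; the query itself is illustrative):

```sql
-- Query the existing HBase data through Hive.
SELECT key, value FROM hbase_table_2 LIMIT 10;

-- DROP on an EXTERNAL table removes only the Hive metadata;
-- the underlying HBase table and its data are left intact.
DROP TABLE hbase_table_2;
```

This is the main design difference from the managed hbase_table_1 above: EXTERNAL tells Hive it does not own the storage.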

Content reference: http://wiki.apache.org/hadoop/Hive/HBaseIntegration
