Connecting Hive to HBase and Operating on Data

How Hive integrates with HBase

Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables and provides full SQL query capability by translating SQL statements into MapReduce jobs. Its advantage is a low learning curve: simple MapReduce-style statistics can be produced quickly with SQL statements, which makes it well suited to statistical analysis over a data warehouse without developing dedicated MapReduce applications.

Hive and HBase integrate by communicating with each other through their respective API interfaces. The integration relies mainly on the tool class in lib/hive-hbase-handler.jar shipped with the Hive installation, which is responsible for the communication between Hive and HBase.


Prerequisites:

Hadoop, HBase, and Hive are already installed normally as a cluster.

Hadoop, HBase, and Hive have all been started and are running normally.


Connecting to HBase from the command line

[root@node1 bin]# ./hbase shell

Run list to look at the tables:

hbase(main):006:0> list
TABLE
test
user
2 row(s) in 0.4750 seconds
Look at the structure of the table:

hbase(main):007:0> describe 'user'
Table user is ENABLED
user
COLUMN FAMILIES DESCRIPTION
{NAME => 'account', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'address', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'userid', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
4 row(s) in 0.7020 seconds


Then scan the user table data for a look:

hbase(main):004:0> scan 'user'
ROW        COLUMN+CELL
 lisi      column=account:name, timestamp=1495708477345, value=lisi
 lisi      column=account:passport, timestamp=1495708477353, value=96857123123231
 lisi      column=account:password, timestamp=1495708477349, value=654321
 lisi      column=address:city, timestamp=1495708477381, value=\xE6\xB7\xB1\xE5\x9C\xB3
 lisi      column=address:province, timestamp=1495708477377, value=\xE5\xB9\xBF\xE4\xB8\x9C
 lisi      column=info:age, timestamp=1495708477358, value=38
 lisi      column=info:sex, timestamp=1495708477363, value=\xE5\xA5\xB3
 lisi      column=userid:id, timestamp=1495708477330, value=002
 zhangsan  column=account:name, timestamp=1495708405658, value=zhangsan
 zhangsan  column=account:passport, timestamp=1495708405699, value=968574321
 zhangsan  column=account:password, timestamp=1495708405669, value=123456
 zhangsan  column=address:city, timestamp=1495708405773, value=\xE6\xB7\xB1\xE5\x9C\xB3
 zhangsan  column=address:province, timestamp=1495708405764, value=\xE5\xB9\xBF\xE4\xB8\x9C
 zhangsan  column=info:age, timestamp=1495708405712, value=26
 zhangsan  column=info:sex, timestamp=1495708405755, value=\xE7\x94\xB7
 zhangsan  column=userid:id, timestamp=1495708405444, value=001
2 row(s) in 0.2020 seconds
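The escaped values above are simply how the HBase shell prints non-ASCII bytes: they are the UTF-8 encodings of the Chinese words for the city, province, and sex. As a quick illustration (plain Python, nothing to do with the cluster itself), the values decode like this:

```python
# The HBase shell prints non-ASCII cell values as byte escapes.
# These are the UTF-8 encodings of Chinese strings from the scan above.
city = b"\xe6\xb7\xb1\xe5\x9c\xb3".decode("utf-8")       # 深圳 (Shenzhen)
province = b"\xe5\xb9\xbf\xe4\xb8\x9c".decode("utf-8")   # 广东 (Guangdong)
sex = b"\xe5\xa5\xb3".decode("utf-8")                    # 女 (female)
print(city, province, sex)
```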

Run Hive in command-line mode from hive/bin:

[root@master bin]# ./hive

Execute the statement that creates the hbase_user table associated with the HBase user table.

You will find an error: return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations)

hive> CREATE EXTERNAL TABLE hbase_user(key string, idcard string, passport string, country string, name string, password string,
    > province string, city string, age string, sex string, id string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,account:idcard,account:passport,account:country,account:name,account:password,
    > address:province,address:city,info:age,info:sex,userid:id")
    > TBLPROPERTIES ("hbase.table.name" = "user");
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:312)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
    at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295)
    at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
    at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155)
    at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:811)
    at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:602)
    at org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:366)
    at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:303)
    at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:313)
    at org.apache.hadoop.hive.hbase.HBaseStorageHandler.preCreateTable(HBaseStorageHandler.java:205)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:742)
"Can't get the locations" and the log below suggest Hive cannot connect to HBase. Check that HBase is running with jps:

[root@master conf]# jps
3945 HMaster
18681 RunJar
2699 NameNode
3330 NodeManager
2951 SecondaryNameNode
3226 ResourceManager
3874 HQuorumPeer
18901 Jps

Everything looks fine.

Next, check the Hive log. It shows that every ZooKeeper connection attempt failed:

2017-05-25T03:06:12,259 INFO [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-05-25T03:06:12,260 WARN [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)

The same INFO/WARN pair then repeats about once a second as the client keeps retrying localhost:2181.
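A quick way to confirm the symptom is to check whether anything is listening on the ZooKeeper client port at all. The following is an illustrative sketch (plain Python, with the host names assumed from this cluster), not part of Hive or HBase:

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hive was retrying localhost:2181, where nothing is listening in this setup;
# the actual quorum runs on master, node1, node2.
for host in ("localhost", "master", "node1", "node2"):
    print(host, port_open(host, 2181))
```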
Looking at the hive-site.xml configuration file, the ZooKeeper settings are:

<property>
    <name>hive.zookeeper.quorum</name>
    <value>master,node1,node2</value>
    <description>
      List of ZooKeeper servers to talk to. This is needed for:
      1. Read/write locks - when hive.lock.manager is set
      to org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager,
      2. When HiveServer2 supports service discovery via Zookeeper.
      3. For delegation token storage if zookeeper store is used, if
      hive.cluster.delegation.token.store.zookeeper.connectString is not set
      4. LLAP daemon registry service
    </description>
  </property>
<property>
    <name>hive.zookeeper.client.port</name>
    <value>2181</value>
    <description>
      The port of ZooKeeper servers to talk to.
      If the list of ZooKeeper servers specified in hive.zookeeper.quorum
      does not contain port numbers, this value is used.
    </description>
  </property>

In this HBase cluster the ZooKeeper client port was set to 2222, so for port consistency, change it to 2181 in hbase-site.xml, and remember to restart the services.

Alternatively, copy hbase-site.xml directly into Hive's conf directory; Hive will then read HBase's ZooKeeper quorum and client port settings from it.

Either of these methods solves the problem.
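For reference, the relevant hbase-site.xml properties would look roughly like this (a sketch; the quorum host names are the ones used earlier in this article):

```xml
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>master,node1,node2</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>
```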


Then execute the CREATE TABLE statement again in Hive

hive> CREATE EXTERNAL TABLE hbase_user(key string, idcard string, passport string, country string, name string, password string,
    > province string, city string, age string, sex string, id string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,account:idcard,account:passport,account:country,account:name,account:password,
    > address:province,address:city,info:age,info:sex,userid:id")
    > TBLPROPERTIES ("hbase.table.name" = "user");
OK
Time taken: 20.323 seconds
hive> show tables;
OK
apachelog
hbase_user
Time taken: 2.75 seconds, Fetched: 2 row(s)

It executed successfully.
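Note how the mapping works: the Hive columns pair up positionally with the comma-separated entries in hbase.columns.mapping, where :key denotes the HBase row key and the rest are family:qualifier pairs. The pairing rule can be sketched like this (plain Python for illustration, not Hive's actual parser):

```python
# Illustrates how Hive pairs its column list positionally with the
# entries of hbase.columns.mapping; a sketch of the rule, not Hive code.
hive_columns = ["key", "idcard", "passport", "country", "name", "password",
                "province", "city", "age", "sex", "id"]
mapping = (":key,account:idcard,account:passport,account:country,account:name,"
           "account:password,address:province,address:city,info:age,info:sex,"
           "userid:id")

pairs = dict(zip(hive_columns, mapping.split(",")))
print(pairs["key"])     # :key -> the HBase row key
print(pairs["idcard"])  # account:idcard -> family 'account', qualifier 'idcard'
```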

Then use HiveQL to query the data:

hive> select * from hbase_user;
OK
lisi      NULL  96857123123231  NULL  lisi      654321  Guangdong  Shenzhen  38  female  002
zhangsan  NULL  968574321       NULL  zhangsan  123456  Guangdong  Shenzhen  26  male    001
Time taken: 5.798 seconds, Fetched: 2 row(s)

hive> describe hbase_user;
OK
key        string
idcard     string
passport   string
country    string
name       string
password   string
province   string
city       string
age        string
sex        string
id         string
Time taken: 3.785 seconds, Fetched: 11 row(s)

hive> select key, idcard, password, country, name, passport, province, city, age, sex, id from hbase_user;
OK
lisi      NULL  654321  NULL  lisi      96857123123231  Guangdong  Shenzhen  38  female  002
zhangsan  NULL  123456  NULL  zhangsan  968574321       Guangdong  Shenzhen  26  male    001
Time taken: 2.341 seconds, Fetched: 2 row(s)

The NULLs appear because the HBase rows have no value set for the idcard column, and the country value is also unset.

Set idcard and country on the HBase user table:

./hbase shell

hbase(main):003:0> put 'user', 'zhangsan', 'account:idcard', '420923156366998855'
hbase(main):004:0> put 'user', 'lisi', 'account:idcard', '520369856366998855'
hbase(main):005:0> put 'user', 'lisi', 'account:country', 'she'

hive> select key, idcard, password, country, name, passport, province, city, age, sex, id from hbase_user;
OK
lisi      520369856366998855  654321  she   lisi      96857123123231  Guangdong  Shenzhen  38  female  002
zhangsan  420923156366998855  123456  NULL  zhangsan  968574321       Guangdong  Shenzhen  26  male    001
Time taken: 2.388 seconds, Fetched: 2 row(s)

hive> select * from hbase_user where name='zhangsan';
OK
zhangsan  420923156366998855  968574321  NULL  zhangsan  123456  Guangdong  Shenzhen  26  male  001
Time taken: 2.651 seconds, Fetched: 1 row(s)

hive> select count(key) from hbase_user;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = root_20170525040249_f808c765-79f6-43c0-aa94-ebfed7751091
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1495621107567_0001, Tracking URL = http://master:8088/proxy/application_1495621107567_0001/
Kill Command = /usr/tools/hadoop/bin/hadoop job -kill job_1495621107567_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. PermGen space



When executing the count, the query runs as a MapReduce job and fails with "PermGen space". Frustrating.

This setup runs three Linux virtual machines (the Hadoop, HBase, Hive cluster) on a Windows 7 machine with 8 GB of RAM, and memory usage is already maxed out.

In fact, the count sometimes succeeds; the failures are presumably due to the many other processes running on Windows.
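"PermGen space" means the JVM's permanent generation filled up (an issue on pre-Java-8 JVMs). A common workaround, sketched here with illustrative sizes that should be tuned to the memory actually available, is to raise MaxPermSize for the Hive client JVM in hive-env.sh:

```shell
# hive-env.sh: raise the permanent-generation limit for the Hive client JVM.
# 256m is an illustrative value, not a recommendation for every cluster.
export HADOOP_CLIENT_OPTS="-XX:MaxPermSize=256m $HADOOP_CLIENT_OPTS"
```

If the error comes from the MapReduce child tasks rather than the client, the equivalent knobs are the task JVM options (e.g. mapreduce.map.java.opts and mapreduce.reduce.java.opts) set in the Hive session or in mapred-site.xml.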




