How Hive Integrates with HBase
Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables and provides full SQL query functionality by translating SQL statements into MapReduce jobs. Its advantage is a low learning curve: simple MapReduce-style statistics can be produced quickly with SQL statements, which makes it well suited to statistical analysis in a data warehouse without developing dedicated MapReduce applications.
Hive and HBase integration is implemented by having the two systems communicate through their APIs. It relies mainly on the lib/hive-hbase-handler.jar tool class shipped in the Hive installation package, which is responsible for the communication between Hive and HBase.
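If Hive cannot find the handler class at runtime, the jar (together with the HBase client jars) can be placed on Hive's auxiliary classpath. A minimal sketch, assuming $HIVE_HOME and $HBASE_HOME point at the installations; the paths and jar names below are assumptions to adjust to your build:

```shell
# Assumed install locations; adjust to your environment.
export HIVE_HOME=/usr/tools/hive
export HBASE_HOME=/usr/tools/hbase

# Option 1: pass the jars on the command line when starting the Hive CLI.
$HIVE_HOME/bin/hive --auxpath \
  $HIVE_HOME/lib/hive-hbase-handler.jar,$HBASE_HOME/lib/hbase-client.jar

# Option 2: export them once (e.g. in hive-env.sh) so every session picks them up.
export HIVE_AUX_JARS_PATH=$HIVE_HOME/lib/hive-hbase-handler.jar,$HBASE_HOME/lib/hbase-client.jar
```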
Prerequisites: Hadoop, HBase, and Hive are installed as clusters and have all been started normally.
Connecting to HBase from the command line
[root@node1 bin]# ./hbase shell
List the tables:
hbase(main):006:0> list
TABLE
test
user
2 row(s) in 0.4750 seconds
Look at the table structure:
hbase(main):007:0> describe 'user'
Table user is ENABLED
user
COLUMN FAMILIES DESCRIPTION
{NAME => 'account', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'address', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'userid', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
4 row(s) in 0.7020 seconds
Then scan the user table:
hbase(main):004:0> scan 'user'
ROW        COLUMN+CELL
 lisi      column=account:name, timestamp=1495708477345, value=lisi
 lisi      column=account:passport, timestamp=1495708477353, value=96857123123231
 lisi      column=account:password, timestamp=1495708477349, value=654321
 lisi      column=address:city, timestamp=1495708477381, value=\xE6\xB7\xB1\xE5\x9C\xB3
 lisi      column=address:province, timestamp=1495708477377, value=\xE5\xB9\xBF\xE4\xB8\x9C
 lisi      column=info:age, timestamp=1495708477358, value=38
 lisi      column=info:sex, timestamp=1495708477363, value=\xE5\xA5\xB3
 lisi      column=userid:id, timestamp=1495708477330, value=002
 zhangsan  column=account:name, timestamp=1495708405658, value=zhangsan
 zhangsan  column=account:passport, timestamp=1495708405699, value=968574321
 zhangsan  column=account:password, timestamp=1495708405669, value=123456
 zhangsan  column=address:city, timestamp=1495708405773, value=\xE6\xB7\xB1\xE5\x9C\xB3
 zhangsan  column=address:province, timestamp=1495708405764, value=\xE5\xB9\xBF\xE4\xB8\x9C
 zhangsan  column=info:age, timestamp=1495708405712, value=26
 zhangsan  column=info:sex, timestamp=1495708405755, value=\xE7\x94\xB7
 zhangsan  column=userid:id, timestamp=1495708405444, value=001
2 row(s) in 0.2020 seconds
Run the Hive command line from hive/bin:
[root@master bin]# ./hive
Execute the statement that creates the hbase_user table mapped to the HBase table. You will hit an error: return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask, MetaException(message: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations):
hive> CREATE EXTERNAL TABLE hbase_user(key string, idcard string, passport string, country string, name string, password string,
    > province string, city string, age string, sex string, id string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,account:idcard,account:passport,account:country,account:name,account:password,
    > address:province,address:city,info:age,info:sex,userid:id")
    > TBLPROPERTIES ("hbase.table.name" = "user");
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:312)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
    at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295)
    at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
    at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155)
    at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:811)
    at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:602)
    at org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:366)
    at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:303)
    at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:313)
    at org.apache.hadoop.hive.hbase.HBaseStorageHandler.preCreateTable(HBaseStorageHandler.java:205)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:742)
"Can't get the locations" and the log below it suggest that Hive cannot connect to HBase. Use jps to check whether HBase is running:
[root@master conf]# jps
3945 HMaster
18681 RunJar
2699 NameNode
3330 NodeManager
2951 SecondaryNameNode
3226 ResourceManager
3874 HQuorumPeer
18901
Everything looked fine.
Next, check the Hive log. It shows that every ZooKeeper connection attempt failed:
2017-05-25T03:06:12,259 INFO  [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-05-25T03:06:12,260 WARN  [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
The same "Opening socket connection to server localhost/127.0.0.1:2181 ... Connection refused" cycle repeats about once a second. Note that the client keeps trying localhost:2181.
Check the hive-site.xml configuration file, which contains:
<property>
<name>hive.zookeeper.quorum</name>
<value>master,node1,node2</value>
<description>
List of ZooKeeper servers to talk to. This is needed for:
1. Read/write locks - when hive.lock.manager is set to
org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager,
2. When HiveServer2 supports service discovery via ZooKeeper.
3. For delegation token storage if the ZooKeeper store is used, if
hive.cluster.delegation.token.store.zookeeperConnectString is not set
4. LLAP daemon registry service
</description>
</description>
</property>
<property>
<name>hive.zookeeper.client.port</name>
<value>2181</value>
<description>
The port of ZooKeeper servers to talk to.
If the list of ZooKeeper servers specified in hive.zookeeper.quorum
does not contain port numbers, this value is used.
</description>
</property>
The ZooKeeper client port configured in this HBase cluster was 2222, so to unify the ports, change it to 2181 in hbase-site.xml and remember to restart the services.
Alternatively, copy hbase-site.xml directly into Hive's conf directory; Hive will then read HBase's hbase.zookeeper.quorum and client port from it.
Either method solves the problem.
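The two fixes can be sketched as follows; $HIVE_HOME and $HBASE_HOME are assumed install locations, adjust them to your cluster:

```shell
# Fix 1: align the ZooKeeper client port in hbase-site.xml with Hive's (2181),
# then restart HBase. The property to change is hbase.zookeeper.property.clientPort:
#   <property>
#     <name>hbase.zookeeper.property.clientPort</name>
#     <value>2181</value>
#   </property>

# Fix 2: let Hive read HBase's ZooKeeper quorum and port directly by copying
# hbase-site.xml into Hive's conf directory:
cp $HBASE_HOME/conf/hbase-site.xml $HIVE_HOME/conf/
```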
Then run the CREATE TABLE statement in Hive again:
hive> CREATE EXTERNAL TABLE hbase_user(key string, idcard string, passport string, country string, name string, password string,
    > province string, city string, age string, sex string, id string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,account:idcard,account:passport,account:country,account:name,account:password,
    > address:province,address:city,info:age,info:sex,userid:id")
    > TBLPROPERTIES ("hbase.table.name" = "user");
OK
Time taken: 20.323 seconds
hive> show tables;
OK
apachelog
hbase_user
Time taken: 2.75 seconds, Fetched: 2 row(s)
The statement executed successfully.
Then query the data with HiveQL:
hive> select * from hbase_user;
OK
lisi      NULL    96857123123231  NULL    lisi      654321  Guangdong  Shenzhen  38  female  002
zhangsan  NULL    968574321       NULL    zhangsan  123456  Guangdong  Shenzhen  26  male    001
Time taken: 5.798 seconds, Fetched: 2 row(s)
hive> describe hbase_user;
OK
key        string
idcard     string
passport   string
country    string
name       string
password   string
province   string
city       string
age        string
sex        string
id         string
Time taken: 3.785 seconds, Fetched: 11 row(s)
hive> select key, idcard, password, country, name, passport, province, city, age, sex, id from hbase_user;
OK
lisi      NULL    654321  NULL    lisi      96857123123231  Guangdong  Shenzhen  38  female  002
zhangsan  NULL    123456  NULL    zhangsan  968574321       Guangdong  Shenzhen  26  male    001
Time taken: 2.341 seconds, Fetched: 2 row(s)
The NULLs appear because the HBase rows have no value in the account:idcard column, and account:country is likewise unset.
Set country and idcard on the HBase table user:
./hbase shell
hbase(main):003:0> put 'user', 'zhangsan', 'account:idcard', '420923156366998855'
hbase(main):004:0> put 'user', 'lisi', 'account:idcard', '520369856366998855'
hbase(main):005:0> put 'user', 'lisi', 'account:country', 'she'
hive> select key, idcard, password, country, name, passport, province, city, age, sex, id from hbase_user;
OK
lisi      520369856366998855  654321  she   lisi      96857123123231  Guangdong  Shenzhen  38  female  002
zhangsan  420923156366998855  123456  NULL  zhangsan  968574321       Guangdong  Shenzhen  26  male    001
Time taken: 2.388 seconds, Fetched: 2 row(s)
hive> select * from hbase_user where name='zhangsan';
OK
zhangsan  420923156366998855  968574321  NULL  zhangsan  123456  Guangdong  Shenzhen  26  male  001
Time taken: 2.651 seconds, Fetched: 1 row(s)
hive> select count(key) from hbase_user;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = root_20170525040249_f808c765-79f6-43c0-aa94-ebfed7751091
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1495621107567_0001, Tracking URL = http://master:8088/proxy/application_1495621107567_0001/
Kill Command = /usr/tools/hadoop/bin/hadoop job -kill job_1495621107567_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. PermGen space
When executing count, Hive has to run a MapReduce job, and it fails with "PermGen space". Frustrating.
The setup is a Windows 7 machine running three Linux virtual machines that host the Hadoop, HBase, and Hive clusters; with only 8 GB of RAM, memory usage is pegged at the limit.
In fact, the count does sometimes succeed; presumably it depends on how many other processes Windows happens to be running.
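A common mitigation for the "PermGen space" error on JDK 7 is to enlarge the permanent generation for the Hive client JVM and for the MapReduce task JVMs. A hedged sketch; the 256m/1024m sizes are assumptions to tune, and on JDK 8+ PermGen no longer exists (the equivalent knob is -XX:MaxMetaspaceSize):

```shell
# Enlarge PermGen for the Hive CLI / client JVM (e.g. add to hive-env.sh):
export HADOOP_CLIENT_OPTS="-XX:MaxPermSize=256m $HADOOP_CLIENT_OPTS"

# Enlarge PermGen for the map/reduce task JVMs, set from within the Hive session:
#   set mapreduce.map.java.opts=-Xmx1024m -XX:MaxPermSize=256m;
#   set mapreduce.reduce.java.opts=-Xmx1024m -XX:MaxPermSize=256m;
```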