Hive (v): Hive and HBase integration

Source: Internet
Author: User

Configuring Hive and HBase integration lets you use HQL syntax to add, delete, and query data in HBase tables. The basic principle is that the two systems communicate with each other through their APIs, with the bridge provided by the hive-hbase-handler.jar tool class. Note, however, that using Hive to manipulate tables in HBase is mainly a convenience: as described in the previous section, the HiveQL engine runs on MapReduce, which performs poorly, so choose the approach that fits each scenario in practice.

Note: the content described in this article applies to the versions shown in my previous section, HDP 2.4.2 (HBase 1.1.2, Hive 1.2.1, Hadoop 2.7.1). The paths in the commands may vary depending on the installation path you chose when configuring the cluster; please adjust them to match your actual installation directory.

Directory:

    • Hive Configuration
    • DFS permissions
    • Test

Hive Configuration:

  • The following files are required. Some of them (e.g. guava and htrace-core-3.1.0-incubating.jar) are already included in the Hive installation; the others must be copied from hbase/lib. (Note: if a file version bundled with the Hive installation is inconsistent with the one in hbase/lib, delete the copy under hive/lib and copy the one from hbase/lib.)
     guava-14.0.1.jar
     zookeeper-3.4.6.2.4.2.0-258.jar
     htrace-core-3.1.0-incubating.jar
     hbase-common-1.1.2.2.4.2.0-258.jar
     hbase-common-1.1.2.2.4.2.0-258-tests.jar
     hbase-client-1.1.2.2.4.2.0-258.jar
     hbase-server-1.1.2.2.4.2.0-258.jar
     hbase-protocol-1.1.2.2.4.2.0-258.jar
     hive-hbase-handler-1.2.1000.2.4.2.0-258.jar
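The copy-and-replace step above can be sketched as a small shell script. This is a minimal sketch, not part of the original article: in real use, point HBASE_LIB and HIVE_LIB at /usr/hdp/2.4.2.0-258/hbase/lib and /usr/hdp/2.4.2.0-258/hive/lib; stand-in temporary directories are used here so the copy logic can be exercised anywhere.

```shell
# Stand-ins for /usr/hdp/2.4.2.0-258/hbase/lib and .../hive/lib
HBASE_LIB=$(mktemp -d)
HIVE_LIB=$(mktemp -d)
# Simulate two of the HBase jars listed above
touch "$HBASE_LIB/hbase-common-1.1.2.2.4.2.0-258.jar" \
      "$HBASE_LIB/hbase-client-1.1.2.2.4.2.0-258.jar"

# Copy every hbase-*.jar into hive/lib, overwriting any copy already
# there (per the note above, a mismatched version under hive/lib should
# be replaced by the hbase/lib version).
for src in "$HBASE_LIB"/hbase-*.jar; do
  [ -e "$src" ] || continue
  cp -f "$src" "$HIVE_LIB/$(basename "$src")"
done
ls "$HIVE_LIB"
```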
  • On the HBase master host HDP4, execute the following commands to copy the files to the Hive hosts:
  • HDP4 command: cd /usr/hdp/2.4.2.0-258/hbase/lib
  • HDP4 command: scp hbase-common-1.1.2.2.4.2.0-258.jar hbase-common-1.1.2.2.4.2.0-258-tests.jar hbase-client-1.1.2.2.4.2.0-258.jar hbase-server-1.1.2.2.4.2.0-258.jar hbase-protocol-1.1.2.2.4.2.0-258.jar HDP1:/usr/hdp/2.4.2.0-258/hive/lib (execute the command again for each remaining host, changing the highlighted machine name HDP1 to HDP2 and then HDP3)
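The per-host copy above can also be written as one loop over the three Hive hosts (host names HDP1–HDP3 as in the previous section); this is a convenience sketch, not a command from the original article:

```
cd /usr/hdp/2.4.2.0-258/hbase/lib
for host in HDP1 HDP2 HDP3; do
  scp hbase-common-1.1.2.2.4.2.0-258.jar \
      hbase-common-1.1.2.2.4.2.0-258-tests.jar \
      hbase-client-1.1.2.2.4.2.0-258.jar \
      hbase-server-1.1.2.2.4.2.0-258.jar \
      hbase-protocol-1.1.2.2.4.2.0-258.jar \
      "$host":/usr/hdp/2.4.2.0-258/hive/lib
done
```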
  • Modify the hive-site.xml configuration in the Ambari management interface: Hive --> Advanced --> Custom hive-site, select "Add Property ...". In the pop-up box enter Key: hive.aux.jars.path, with the Value:
    /usr/hdp/2.4.2.0-258/hive/lib/guava-14.0.1.jar,/usr/hdp/2.4.2.0-258/hive/lib/zookeeper-3.4.6.2.4.2.0-258.jar,/usr/hdp/2.4.2.0-258/hive/lib/hive-hbase-handler-1.2.1000.2.4.2.0-258.jar,/usr/hdp/2.4.2.0-258/hive/lib/hbase-common-1.1.2.2.4.2.0-258.jar,/usr/hdp/2.4.2.0-258/hive/lib/hbase-server-1.1.2.2.4.2.0-258.jar,/usr/hdp/2.4.2.0-258/hive/lib/hbase-client-1.1.2.2.4.2.0-258.jar,/usr/hdp/2.4.2.0-258/hive/lib/hbase-common-1.1.2.2.4.2.0-258-tests.jar,/usr/hdp/2.4.2.0-258/hive/lib/htrace-core-3.1.0-incubating.jar,/usr/hdp/2.4.2.0-258/hive/lib/hbase-protocol-1.1.2.2.4.2.0-258.jar
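Once saved, Ambari renders this property into hive-site.xml roughly as follows (the value is abbreviated here; it is the full comma-separated jar list above):

```
<property>
  <name>hive.aux.jars.path</name>
  <value>/usr/hdp/2.4.2.0-258/hive/lib/guava-14.0.1.jar,/usr/hdp/2.4.2.0-258/hive/lib/hive-hbase-handler-1.2.1000.2.4.2.0-258.jar,...</value>
</property>
```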
  • This last step in effect adds a parameter to the hive-site.xml configuration file (/etc/hive/2.4.2.0-258/0). Do not modify the file by hand, because a manually modified configuration will be overwritten when the Hive service restarts.
  • When the parameter is modified in Ambari and saved, a new configuration version is generated. Ambari automatically detects the effect of the change on other hosts and components and prompts you to restart the affected services; follow the instructions.
  • Copy the hbase-site.xml file under hbase/conf on the HDP4 host to hadoop/conf on all Hadoop nodes.

DFS permissions:

  • Go to the Ambari management interface and select HDFS --> Advanced --> Advanced hdfs-site, and set the HDFS permissions property to false.
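The source does not name the property; on Hadoop 2.7 the permissions switch in hdfs-site.xml is typically dfs.permissions.enabled, so the resulting entry would look like:

```
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
```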

Test:

  • Use Xshell to connect to the hive host HDP1
  • Command: cd /usr/hdp/2.4.2.0-258/hive/bin (switch to the hive/bin directory)
  • Single-node connection to HBase command: beeline -hiveconf hbase.master=hdp4:60000 (HDP4 is the master node of HBase; see part five of the HD 2.4 installation section)
  • Cluster connection to HBase command: beeline -hiveconf hbase.zookeeper.quorum=zknode1,zknode2,zknode3 (meaning the HBase master host is resolved through ZooKeeper, since HBase uses a ZooKeeper cluster; the default port is 2181 and can be left unspecified. If the ZooKeeper port has been modified, specify it per host: zknode1:2222,zknode2:2222,zknode3:2222)
  • Beeline: !connect jdbc:hive2://hdp1:10000/default (connect to Hive)
  • Test: show databases; (view all databases in Hive to verify that the Hive connection was successful)
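Putting the connection steps above together, a typical session looks like the following; the ZooKeeper host names are placeholders for your own quorum:

```
cd /usr/hdp/2.4.2.0-258/hive/bin
./beeline -hiveconf hbase.zookeeper.quorum=zknode1,zknode2,zknode3
beeline> !connect jdbc:hive2://hdp1:10000/default
0: jdbc:hive2://hdp1:10000/default> show databases;
```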
  • Execute the following SQL, which creates a Hive external table associated with the HBase Stocksinfo table. (See HBase (b): C# HBase Stock Quotes Demo.)
    CREATE EXTERNAL TABLE IF NOT EXISTS Stocksinfo (
        rowkey STRING,
        code   STRING,
        name   STRING)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,d:code,d:name')
    TBLPROPERTIES ('hbase.table.name' = 'Stocksinfo');
  • hbase.columns.mapping defines the field mappings between the Hive table and the HBase table: the first Hive field maps to :key (the rowkey), and d:code maps to the code column (d is the column family of the HBase Stocksinfo table).
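For reference, a row in the underlying HBase table matching this mapping could be written from the hbase shell like so (the row key and values here are illustrative only, not from the original article):

```
put 'Stocksinfo', '600000', 'd:code', '600000'
put 'Stocksinfo', '600000', 'd:name', 'SPDB'
```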
  • After the table is created successfully, execute select * from Stocksinfo; to check it. OK.
  • A Hive external table can be understood like a view in a relational database: when the table is dropped, the underlying data is not deleted. Most of my application scenarios associate with HBase data this way.
  • If instead you create a Hive-managed (non-external) table associated with HBase, then creating the table, adding data, and other operations through HiveQL will affect the data in the HBase database, as described in a later section.
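A minimal sketch of such a managed table, assuming the same column layout as above (the table name Stocksinfo_rw is hypothetical); because the table is managed, dropping it in Hive also drops the underlying HBase table:

```
CREATE TABLE IF NOT EXISTS Stocksinfo_rw (
    rowkey STRING,
    code   STRING,
    name   STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,d:code,d:name')
TBLPROPERTIES ('hbase.table.name' = 'Stocksinfo_rw');
```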
