Import Hive data into HBase

Source: Internet
Author: User

Versions: hive-0.13.1, hbase-0.96.0 / hbase-0.98.2

Step 1: On the Hive client, execute the statement that creates the Hive table mapped to HBase.

The Hive table hive_user_info maps to the user_info table in HBase.

CREATE TABLE hive_user_info (
  a string, b string, c string,
  d string, e string,
  f string, g string)
PARTITIONED BY (dt string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:b,info:c,info:d,info:e,info:f,info:g")
TBLPROPERTIES ("hbase.table.name" = "user_info");

It seems the default number of versions kept per column family is 1 in this release.

So set the number of versions in the HBase shell:

alter 'user_info', {NAME => 'info', VERSIONS => 3}
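To confirm the change took effect, you can describe the table afterwards; a minimal sketch in the HBase shell, using the same table as above:

```
# In the HBase shell: the 'info' family should now report VERSIONS => '3'
describe 'user_info'
```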

Step 2: Query data from an existing Hive table and insert it into hive_user_info.

INSERT INTO TABLE hive_user_info PARTITION (dt = 1)
SELECT udid, if(jailbreak = 0, 1, 0), concat(dt, ' ', hour, ':', time_minute), 0, device_id, '2', null
FROM click_log;

Here comes the pitfall: the statement above writes nothing. Debugging found no problem, the Hive execution plan looked fine, and checking the -ext-10000 output log turned up nothing either.


However, the same statement works once a LIMIT is added:

INSERT INTO TABLE hive_user_info PARTITION (dt = 1)
SELECT udid, if(jailbreak = 0, 1, 0), concat(dt, ' ', hour, ':', time_minute), 0, device_id, '2', null
FROM click_log LIMIT 10000;

Looking at the EXPLAIN output of the LIMIT version shows that the data is written to the Hive/HBase-mapped table through a reduce stage. Funnelling the write through a reduce runs against Hadoop's distributed design, so I kept digging for the cause:
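Comparing the two plans makes the difference concrete. A minimal sketch, assuming the same hive_user_info and click_log tables as above:

```sql
-- Without LIMIT: the EXPLAIN output shows a map-only plan
EXPLAIN
INSERT INTO TABLE hive_user_info PARTITION (dt = 1)
SELECT udid, if(jailbreak = 0, 1, 0), concat(dt, ' ', hour, ':', time_minute), 0, device_id, '2', null
FROM click_log;

-- With LIMIT: the EXPLAIN output shows an extra reduce stage
EXPLAIN
INSERT INTO TABLE hive_user_info PARTITION (dt = 1)
SELECT udid, if(jailbreak = 0, 1, 0), concat(dt, ' ', hour, ':', time_minute), 0, device_id, '2', null
FROM click_log LIMIT 10000;
```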

hive --hiveconf hive.root.logger=DEBUG,console

The debug log reported a class that could not be found. I recompiled that class and tested it, only to find the class itself was not the problem. The real cause is this:

The problem lies in how the SQL statement is parsed.
A statement with a reduce stage is parsed into the following plan descriptors:
org.apache.hadoop.hive.ql.plan.TableScanDesc
org.apache.hadoop.hive.ql.plan.ReduceSinkDesc
org.apache.hadoop.hive.ql.plan.ExtractDesc
org.apache.hadoop.hive.ql.plan.PTFDesc
org.apache.hadoop.hive.ql.plan.SelectDesc
org.apache.hadoop.hive.ql.plan.FileSinkDesc
A statement without a reduce stage is parsed into:
org.apache.hadoop.hive.ql.plan.TableScanDesc
org.apache.hadoop.hive.ql.plan.SelectDesc
org.apache.hadoop.hive.ql.plan.FileSinkDesc

The table's storage information is carried in org.apache.hadoop.hive.ql.plan.FileSinkDesc, but after the statement without a reduce stage is parsed, the HBase handler information is missing from it.
When no third-party storage handler is involved, nothing needs to be carried in FileSinkDesc, so plain Hive tables are unaffected.
So Hive itself was fine; I kept looking on the HBase side.
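One way to watch the parsing yourself is to rerun the statement under debug logging and filter for the plan descriptors. A sketch, assuming the INSERT statement above is saved in a hypothetical file insert_user_info.sql:

```
# Run with debug logging and grep for plan descriptors; in the no-reduce
# plan, FileSinkDesc appears without the HBase handler information
hive --hiveconf hive.root.logger=DEBUG,console -f insert_user_info.sql 2>&1 \
  | grep -E 'TableScanDesc|ReduceSinkDesc|SelectDesc|FileSinkDesc'
```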

After two days and two nights of fighting, I finally found the problem!


The error finally turned up in the INFO-level log!

When a map-only job is triggered, Hive adds a so-called conditional task that merges small output files: the job is split into N small tasks, and Hive checks whether the merge can be performed.
While checking the merge task's partitions, the output table is swapped in as an input table, and that input table does not carry the table's custom storage-handler information.
So the output table's handler class gets clobbered.
Solution: turn the merge off.

set hive.merge.mapfiles=false;

set hive.merge.mapfiles=false;
INSERT INTO TABLE hive_user_info PARTITION (dt = '${date}')
SELECT udid, if(jailbreak = 0, 1, 0), concat(dt, ' ', hour, ':', time_minute), 0, device_id, '2', null
FROM show_log
WHERE dt = '${date}' AND udid != 'null' AND udid != '';


This problem was found when integration-testing hive-0.13.0 with hbase-0.96.0/hbase-0.98.2.

There is no such problem with hive-0.11.0 and hbase-0.94.0.

Lessons learned when adopting a newer version of a Hadoop-stack component:

First, read the official release notes to learn the new features and understand what changed from earlier versions.

Second, learn to debug by reading the logs. If the ERROR and WARN messages do not reveal the problem, keep reading at INFO level.

Finally, learn to compile the source package and rule out errors one by one ...... (to be continued)

Command to compile the Hive jar package: mvn clean compile -Phadoop-2

Key configuration directory: hive/conf/

Method 1 (point Hive at the required jars when it starts):

1. First remove the hive.aux.jars.path property from hive-site.xml.
2. In hive-env.sh, add export HIVE_AUX_JARS_PATH=/home/yudaer/hbase-0.98.6.1-hadoop2/lib/, where the path points at the HBase lib directory.

Method 2 (list the jars in hive-site.xml so Hive reads them from hive/lib at startup):

<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/local/hive-0.13.0/lib/hive-hbase-handler-0.13.1.jar,file:///usr/local/hive-0.13.0/lib/protobuf-java-2.5.0.jar,file:///usr/local/hive-0.13.0/lib/hbase-client-0.96.2-hadoop2.jar,file:///usr/local/hive-0.13.0/lib/hbase-common-0.96.2-hadoop2.jar,file:///usr/local/hive-0.13.0/lib/hbase-common-0.96.2-hadoop2-tests.jar,file:///usr/local/hive-0.13.0/lib/hbase-protocol-0.96.2-hadoop2.jar,file:///usr/local/hive-0.13.0/lib/hbase-server-0.96.2-hadoop2.jar,file:///usr/local/hive-0.13.0/lib/htrace-core-2.04.jar,file:///usr/local/hive-0.13.0/lib/zookeeper-3.4.5.jar,file:///usr/local/hive-0.13.0/lib/guava-12.0.1.jar</value>
</property>

Note: everything inside <value> must be written on a single line.

I feel Method 1 is more professional and more convenient (less error-prone).

