Step 1: Execute the Hive-HBase mapped table creation statement on the Hive client.
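The post does not show the creation statement itself. As a minimal sketch, a Hive table mapped to HBase is created with the HBaseStorageHandler; the column names, types, and column-family mapping below are assumptions inferred from the INSERT in Step 2 (the post's INSERT also targets a dt partition, which is omitted here):

CREATE TABLE hive_user_info (
  udid string,          -- assumed HBase row key
  jailbroken int,       -- assumed flag derived from click_log.jailbreak
  event_time string,    -- assumed "dt hour:minute" string
  status int,           -- hypothetical column for the constant 0
  device_id string,
  channel string,       -- hypothetical column for the constant '2'
  extra string          -- hypothetical column for the trailing null
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" =
  ":key,info:jailbroken,info:event_time,info:status,info:device_id,info:channel,info:extra")
TBLPROPERTIES ("hbase.table.name" = "hive_user_info");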
Step 2: Query data from an existing Hive table and insert it into the hive_user_info table.
insert into table hive_user_info partition (dt = 1)
select udid, if(jailbreak = 0, 1, 0), concat(dt, ' ', hour, ':', time_minute), 0, device_id, '2', null
from click_log;
The pitfall: debugging turned up nothing, the Hive execution plan looked fine, and inspecting the -ext-10000 output log showed nothing either.
However, the same statement works once a LIMIT is added:
insert into table hive_user_info partition (dt = 1)
select udid, if(jailbreak = 0, 1, 0), concat(dt, ' ', hour, ':', time_minute), 0, device_id, '2', null
from click_log limit 10000;
Running EXPLAIN on the LIMIT version shows that the LIMIT introduces a reduce stage. In other words, Hive only writes data into the Hive-HBase mapped table successfully when a reducer is involved. Requiring a reducer for this makes no sense under Hadoop's distributed model, so the search for the cause continued:
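To reproduce the comparison, EXPLAIN can be run on both variants of the post's statement and the stage plans diffed:

explain
insert into table hive_user_info partition (dt = 1)
select udid, if(jailbreak = 0, 1, 0), concat(dt, ' ', hour, ':', time_minute), 0, device_id, '2', null
from click_log;
-- compiles to a map-only plan

explain
insert into table hive_user_info partition (dt = 1)
select udid, if(jailbreak = 0, 1, 0), concat(dt, ' ', hour, ':', time_minute), 0, device_id, '2', null
from click_log limit 10000;
-- the LIMIT forces a reduce stage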
hive -hiveconf hive.root.logger=DEBUG,console
The debug log showed a class-not-found error. After recompiling that class and testing, it turned out the class itself was fine; the real cause was an error while parsing the SQL statement.
A SQL statement with a reduce stage is parsed into:
org.apache.hadoop.hive.ql.plan.TableScanDesc
org.apache.hadoop.hive.ql.plan.ReduceSinkDesc
org.apache.hadoop.hive.ql.plan.ExtractDesc
org.apache.hadoop.hive.ql.plan.PTFDesc
org.apache.hadoop.hive.ql.plan.SelectDesc
org.apache.hadoop.hive.ql.plan.FileSinkDesc
A SQL statement without a reduce stage is parsed into:
org.apache.hadoop.hive.ql.plan.TableScanDesc
org.apache.hadoop.hive.ql.plan.SelectDesc
org.apache.hadoop.hive.ql.plan.FileSinkDesc
The output ("landing") information is stored in org.apache.hadoop.hive.ql.plan.FileSinkDesc, but after parsing the statement without a reduce stage, the HBase package information is missing from it. In other words, the third-party (HBase storage handler) information is not carried into org.apache.hadoop.hive.ql.plan.FileSinkDesc.
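One way to check what actually lands in the file sink is EXPLAIN EXTENDED, which also prints the output table's serde and table properties in the plan (the exact output varies by version):

explain extended
insert into table hive_user_info partition (dt = 1)
select udid, if(jailbreak = 0, 1, 0), concat(dt, ' ', hour, ':', time_minute), 0, device_id, '2', null
from click_log;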
So Hive itself was fine; the search moved on to the HBase side.
After two days and two nights of fighting, I finally found the problem!
The error finally turned up at the INFO log level!!!
When a map-only job is triggered, Hive appends what is called a conditional task to merge small files; the work is split into N subtasks whose outcome decides whether the merge happens.
While checking the small-file merge task, Hive inspects the partitions, and at that point the output table is swapped in as an input table. But an input table does not carry the table's custom (storage handler) information,
so the output table's handler class gets clobbered... clobbered!!!!
Solution: turn off small-file merging...
set hive.merge.mapfiles=false;
set hive.merge.mapfiles=false;
insert into table hive_user_info partition (dt = '${date}')
select udid, if(jailbreak = 0, 1, 0), concat(dt, ' ', hour, ':', time_minute), 0, device_id, '2', null
from show_log where dt = '${date}' and udid != 'null' and udid != '';
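For reference, hive.merge.mapfiles governs merging of output from map-only jobs (the case that breaks here); its counterpart hive.merge.mapredfiles governs merging after map-reduce jobs and can stay enabled, since statements with a reduce stage worked fine:

set hive.merge.mapfiles=false;
-- set hive.merge.mapredfiles=false;  -- map-reduce counterpart; not needed for this bug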
This problem showed up in integration testing with Hive 0.13.0 against HBase 0.96.0/0.98.2,
but does not occur with Hive 0.11.0 and HBase 0.94.0.
Lessons learned when adopting a newer version of a Hadoop component:
First, read the official release notes to learn the new features and what changed from earlier versions.
Second, learn to debug by reading the logs: if ERROR and WARN don't reveal the problem, keep reading at INFO,
and finally, learn to compile the source package and troubleshoot the errors one by one ...... (to be continued)
Command to compile the Hive jar package: mvn clean compile -Phadoop-2
The key configuration files live under hive/conf/.
Method 1 (the required jars are picked up when Hive starts):
1. First remove the hive.aux.jars.path property from hive-site.xml.
2. In hive-env.sh, add export HIVE_AUX_JARS_PATH=/home/yudaer/hbase-0.98.6.1-hadoop2/lib/ (i.e. the HBase lib directory).
Method 2: have Hive read the jars from hive/lib via hive-site.xml when it starts:
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/local/hive-0.13.0/lib/hive-hbase-handler-0.13.1.jar,file:///usr/local/hive-0.13.0/lib/protobuf-java-2.5.0.jar,file:///usr/local/hive-0.13.0/lib/hbase-client-0.96.2-hadoop2.jar,file:///usr/local/hive-0.13.0/lib/hbase-common-0.96.2-hadoop2.jar,file:///usr/local/hive-0.13.0/lib/hbase-common-0.96.2-hadoop2-tests.jar,file:///usr/local/hive-0.13.0/lib/hbase-protocol-0.96.2-hadoop2.jar,file:///usr/local/hive-0.13.0/lib/hbase-server-0.96.2-hadoop2.jar,file:///usr/local/hive-0.13.0/lib/htrace-core-2.04.jar,file:///usr/local/hive-0.13.0/lib/zookeeper-3.4.5.jar,file:///usr/local/hive-0.13.0/lib/guava-12.0.1.jar</value>
</property>
Note: the entire value between <value> and </value> must be written on a single line.
I feel Method 1 is more professional and more convenient (less error-prone).