"Gandalf" Hive 0.13.1 on Hadoop2.2.0 + oracle10g deployment Detailed explanation


Environment: Hadoop 2.2.0, Hive 0.13.1, Ubuntu 14.04 LTS, Java 1.7.0_60, Oracle 10g
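Before starting, it may be worth confirming your environment matches; a minimal sketch (the Hadoop path follows the install directory used throughout this post):

java -version                                          # expect "1.7.0_60"
/home/fulong/hadoop/hadoop-2.2.0/bin/hadoop version    # expect 2.2.0
lsb_release -d                                         # expect Ubuntu 14.04 LTS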
*** Reprinting is welcome; please credit the source: http://blog.csdn.net/u010967382/article/details/38709751 ***

Download the installation package from:
http://mirrors.cnnic.cn/apache/hive/stable/apache-hive-0.13.1-bin.tar.gz
Extract the installation package on the server to /home/fulong/hive/apache-hive-0.13.1-bin.
Edit your environment variables and add the following:

export HIVE_HOME=/home/fulong/hive/apache-hive-0.13.1-bin
export PATH=$HIVE_HOME/bin:$PATH

Go to the conf folder and copy the template configuration files under their working names:

fulong@FBI003:~/hive/apache-hive-0.13.1-bin/conf$ ls
hive-default.xml.template    hive-exec-log4j.properties.template
hive-env.sh.template         hive-log4j.properties.template
fulong@FBI003:~/hive/apache-hive-0.13.1-bin/conf$ cp hive-env.sh.template hive-env.sh
fulong@FBI003:~/hive/apache-hive-0.13.1-bin/conf$ cp hive-default.xml.template hive-site.xml
fulong@FBI003:~/hive/apache-hive-0.13.1-bin/conf$ ls
hive-default.xml.template    hive-env.sh.template                 hive-log4j.properties.template
hive-env.sh                  hive-exec-log4j.properties.template  hive-site.xml

In hive-env.sh, set the Hadoop root directory and Hive's conf and lib folders:

# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/home/fulong/hadoop/hadoop-2.2.0
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/home/fulong/hive/apache-hive-0.13.1-bin/conf
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/home/fulong/hive/apache-hive-0.13.1-bin/lib

In the configuration file hive-site.xml, change the following connection properties to the Oracle-specific values:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:oracle:thin:@192.168.0.138:1521:ORCL</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>oracle.jdbc.driver.OracleDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>HIVEFBI</value>
  <description>password to use against metastore database</description>
</property>
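The properties above assume a metastore user already exists in Oracle. If it does not, a DBA could create it roughly as follows; this is only a sketch: the EZConnect address reuses the host and SID from the ConnectionURL, the SYS password is a placeholder, and the CONNECT/RESOURCE grants are one common minimal choice, not something the original post specifies.

# run against the Oracle 10g server as a DBA (password is hypothetical)
sqlplus sys/your_sys_password@//192.168.0.138:1521/ORCL as sysdba <<'EOF'
CREATE USER hive IDENTIFIED BY HIVEFBI;
GRANT CONNECT, RESOURCE TO hive;
EOF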
Configure log4j. Create a log4j folder under $HIVE_HOME to store the log files, then copy and rename the template:

fulong@FBI003:~/hive/apache-hive-0.13.1-bin/conf$ cp hive-log4j.properties.template hive-log4j.properties
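The post does not show the command for creating that folder; presumably something like:

# create the directory that hive.log.dir (below) will point at
mkdir -p /home/fulong/hive/apache-hive-0.13.1-bin/log4j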
In hive-log4j.properties, change the folder where the logs are stored:

hive.log.dir=/home/fulong/hive/apache-hive-0.13.1-bin/log4j

Copy the Oracle JDBC jar package: copy the appropriate Oracle JDBC driver to $HIVE_HOME/lib.

Start Hive:

fulong@FBI003:~/hive/apache-hive-0.13.1-bin$ hive
14/08/20 17:14:05 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/08/20 17:14:05 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/08/20 17:14:05 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/08/20 17:14:05 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/08/20 17:14:05 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/08/20 17:14:05 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/08/20 17:14:05 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/08/20 17:14:05 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
14/08/20 17:14:05 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
Logging initialized using configuration in file:/home/fulong/hive/apache-hive-0.13.1-bin/conf/hive-log4j.properties
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /home/fulong/hadoop/hadoop-2.2.0/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
hive>

Validation: we plan to create a table to store the user search behavior logs downloadable from the Sogou lab.

Data: http://www.sogou.com/labs/dl/q.html
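After downloading, it may be worth a quick look at the raw file before loading it. A sketch only: the path and file name match the LOAD DATA command used later in this post.

# peek at the first lines and check the tab-separated field count,
# which should match the column count in the CREATE TABLE below
head -3 /home/fulong/downloads/SogouQ.reduced
awk -F'\t' '{print NF}' /home/fulong/downloads/SogouQ.reduced | sort -u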

First create the table:

hive> CREATE TABLE searchlog (time STRING, id STRING, sword STRING, rank INT, clickrank INT, url STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
This will cause an error:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:javax.jdo.JDODataStoreException: An exception was thrown while adding/validating class(es) : ORA-01754: a table may contain only one column of type LONG
Workaround: open hive-metastore-0.13.1.jar under ${HIVE_HOME}/lib with an unzip tool and find a file named package.jdo. Open that file and locate the following content:

<field name="viewOriginalText" default-fetch-group="false">
  <column name="VIEW_ORIGINAL_TEXT" jdbc-type="LONGVARCHAR"/>
</field>
<field name="viewExpandedText" default-fetch-group="false">
  <column name="VIEW_EXPANDED_TEXT" jdbc-type="LONGVARCHAR"/>
</field>

You can see that the columns VIEW_ORIGINAL_TEXT and VIEW_EXPANDED_TEXT are both typed LONGVARCHAR, which maps to LONG in Oracle. That collides with Oracle's restriction that a table may contain only one column of type LONG, hence the error.
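The extraction can also be done from the command line; a sketch, assuming the JDK's jar tool is on the PATH:

cd ${HIVE_HOME}/lib
# pull package.jdo out of the jar into the current directory for editing
jar xf hive-metastore-0.13.1.jar package.jdo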


Following the recommendation on the Hive website, change the jdbc-type of those two columns to CLOB. After the change, the content looks like this:

<field name="viewOriginalText" default-fetch-group="false">
  <column name="VIEW_ORIGINAL_TEXT" jdbc-type="CLOB"/>
</field>
<field name="viewExpandedText" default-fetch-group="false">
  <column name="VIEW_EXPANDED_TEXT" jdbc-type="CLOB"/>
</field>
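The edited package.jdo then has to go back into the jar. One way is the jar tool's update flag (a sketch, continuing from the extraction step above):

cd ${HIVE_HOME}/lib
# replace the package.jdo entry inside the metastore jar with the edited copy
jar uf hive-metastore-0.13.1.jar package.jdo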
After the change, restart Hive.


Run the CREATE TABLE command again. This time it succeeds:
hive> CREATE TABLE searchlog (time STRING, id STRING, sword STRING, rank INT, clickrank INT, url STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
OK
Time taken: 0.986 seconds
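Since the whole point of this setup is the Oracle-backed metastore, it is easy to double-check that the table's metadata actually landed in Oracle. A sketch only: TBLS is one of the standard metastore tables DataNucleus creates in the hive schema, and the connection reuses the credentials from hive-site.xml.

sqlplus hive/HIVEFBI@//192.168.0.138:1521/ORCL <<'EOF'
SELECT TBL_NAME FROM TBLS;
EOF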
Load the local data into the table:

hive> LOAD DATA LOCAL INPATH '/home/fulong/downloads/SogouQ.reduced' OVERWRITE INTO TABLE searchlog;
Copying data from file:/home/fulong/downloads/SogouQ.reduced
Copying file: file:/home/fulong/downloads/SogouQ.reduced
Loading data to table default.searchlog
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted hdfs://fulonghadoop/user/hive/warehouse/searchlog
Table default.searchlog stats: [numFiles=1, numRows=0, totalSize=152006060, rawDataSize=0]
OK
Time taken: 25.705 seconds
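A quick way to confirm the file is now in the warehouse directory on HDFS (the path comes from the "Deleted hdfs://..." line in the log above):

/home/fulong/hadoop/hadoop-2.2.0/bin/hadoop fs -ls /user/hive/warehouse/searchlog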
View all tables:

hive> show tables;
OK
searchlog
Time taken: 0.139 seconds, Fetched: 1 row(s)
Count the rows:

hive> select count(*) from searchlog;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1407233914535_0001, Tracking URL = http://FBI003:8088/proxy/application_1407233914535_0001/
Kill Command = /home/fulong/hadoop/hadoop-2.2.0/bin/hadoop job -kill job_1407233914535_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-08-20 18:03:17,667 Stage-1 map = 0%, reduce = 0%
2014-08-20 18:04:05,426 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.46 sec
2014-08-20 18:04:27,317 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.74 sec
MapReduce Total cumulative CPU time: 4 seconds 740 msec
Ended Job = job_1407233914535_0001
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1  Cumulative CPU: 4.74 sec  HDFS Read: 152010455  HDFS Write: 8  SUCCESS
Total MapReduce CPU time spent: 4 seconds 740 msec
OK
1724264
Time taken: 103.154 seconds, Fetched: 1 row(s)
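With the table populated, ordinary analytical queries work the same way. As a hypothetical follow-up (not from the original post), this one-liner ranks the ten most frequent search keywords, using the column names from the CREATE TABLE above:

hive -e "SELECT sword, COUNT(*) AS cnt FROM searchlog GROUP BY sword ORDER BY cnt DESC LIMIT 10;"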





Copyright notice: this is an original blog article and may not be reproduced without the blogger's consent.

"Gandalf" Hive 0.13.1 on Hadoop2.2.0 + oracle10g deployment Detailed explanation

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.