Installing and configuring Tez for Hive

To run dependent jobs (such as the MapReduce jobs generated by Pig and Hive) more efficiently and to reduce disk and network I/O, Hortonworks developed the DAG computing framework Tez.

Tez is a general-purpose DAG computing framework that evolved from the MapReduce computing framework. It can serve as the underlying data processing engine for systems such as MapReduce, Pig, and Hive. It is natively integrated with YARN, the resource management platform in Hadoop 2.0, and was created by core Hadoop 2.0 developers, so it is widely seen as a rising star among computing frameworks.

The build requires the following libraries and tools: gcc, make, gcc-c++, and OpenSSL. Two further dependencies, phantomjs-2.1.1-linux-x86_64 and Node.js, take a while to install.
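Before starting the build, it can help to confirm that the tools listed above are on the PATH. The following is a minimal sketch and not part of the original procedure: the function name missing_tools is hypothetical, g++ is assumed to be the command installed by the gcc-c++ package, and mvn is included because the build command uses it.

```shell
#!/bin/sh
# Print every tool from the argument list that is not found on the PATH.
# missing_tools is a hypothetical helper name used only for illustration.
missing_tools() {
    for tool in "$@"; do
        # command -v is the POSIX way to test whether a command exists
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Tools named in this article; g++ is assumed to come from gcc-c++.
missing=$(missing_tools gcc make g++ openssl mvn)
if [ -n "$missing" ]; then
    echo "Missing build tools:" $missing
else
    echo "All build tools found"
fi
```

Any tool reported as missing can then be installed with the system package manager before running the build.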

Download the Tez source code from the official website, extract it, and compile it.

Note: change the Hadoop version in the POM, or set your own Hadoop version on the mvn command line:

mvn package -Dhadoop.version=2.7.2 -DskipTests -Dmaven.javadoc.skip=true
1. Take tez-0.8.4-minimal.tar.gz from tez-dist/target/ and extract it locally to /opt/single/tez.

Create a conf directory under $TEZ_HOME, and create tez-site.xml in it:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<property>
		<name>tez.lib.uris</name>
		<value>hdfs://hadoop:9000/apps/tez-0.8.4/tez-0.8.4-minimal.tar.gz</value>
	</property>
	<property>
		<name>tez.use.cluster.hadoop-libs</name>
		<value>true</value>
	</property>
</configuration>
2. Set the Linux environment variables:
export TEZ_HOME=/opt/single/tez
export TEZ_CONF_DIR=$TEZ_HOME/conf
export TEZ_JARS=$TEZ_HOME
3. Add the following to hadoop-env.sh:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$TEZ_CONF_DIR:$TEZ_JARS/*:$TEZ_JARS/lib/*
In mapred-site.xml, set:
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn-tez</value>
	</property>
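The hadoop-env.sh line in step 3 yields a classpath that starts with an empty entry (a leading ':') when HADOOP_CLASSPATH was not set beforehand. The following minimal sketch shows one way to compose the same value without that artifact. The function name tez_classpath is hypothetical, and the paths are the ones used in this article.

```shell
#!/bin/sh
# Compose the classpath entries Tez needs, following the hadoop-env.sh line in step 3.
# tez_classpath is a hypothetical helper; $1 is the Tez install directory.
tez_classpath() {
    tez_home=$1
    tez_conf_dir=$tez_home/conf
    tez_jars=$tez_home
    # ${VAR:+...} expands only when HADOOP_CLASSPATH is set and non-empty,
    # which avoids a leading ':' (an empty classpath entry).
    echo "${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$tez_conf_dir:$tez_jars/*:$tez_jars/lib/*"
}

# Print the value for the install path used in this article.
tez_classpath /opt/single/tez
```

The wildcards are kept literal on purpose: the JVM launched by Hadoop expands classpath wildcards itself, so the shell must not glob them.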
4. Start Hadoop and upload the compiled tez-0.8.4-minimal.tar.gz to the hdfs://hadoop:9000/apps/tez-0.8.4/ directory.
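The upload in step 4 can be sketched as follows. This is an illustration under stated assumptions rather than the author's exact commands: it assumes fs.defaultFS points at hdfs://hadoop:9000, the function name upload_tez_tarball and the HDFS_CMD variable are hypothetical, and HDFS_CMD exists only so the commands can be printed in a dry run.

```shell
#!/bin/sh
# HDFS_CMD defaults to the real hdfs client; set HDFS_CMD=echo for a dry run.
HDFS_CMD=${HDFS_CMD:-hdfs}

# upload_tez_tarball is a hypothetical helper name.
# $1: local tarball, $2: target HDFS directory.
upload_tez_tarball() {
    tarball=$1
    target_dir=$2
    # Create the target directory (and parents) if it does not exist yet
    $HDFS_CMD dfs -mkdir -p "$target_dir" || return 1
    # -f overwrites a copy left over from a previous build
    $HDFS_CMD dfs -put -f "$tarball" "$target_dir/"
}

# Example call (commented out because it needs a running cluster):
# upload_tez_tarball tez-dist/target/tez-0.8.4-minimal.tar.gz /apps/tez-0.8.4
```

The target directory must match the path configured in tez.lib.uris, otherwise the Tez client cannot locate the tarball at job submission.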

5. The settings for the Tez UI are as follows:

Add the following to yarn-site.xml:

	<property>
		<name>yarn.timeline-service.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.timeline-service.hostname</name>
		<value>hadoop</value>
	</property>
	<property>
		<name>yarn.timeline-service.http-cross-origin.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.timeline-service.webapp.address</name>
		<value>${yarn.timeline-service.hostname}:8188</value>
	</property>
	<property>
		<name>yarn.timeline-service.webapp.https.address</name>
		<value>${yarn.timeline-service.hostname}:2191</value>
	</property>
Add the following to tez-site.xml:

	<property>
		<description>Enable Tez to use the Timeline Server for history logging</description>
		<name>tez.history.logging.service.class</name>
		<value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
	</property>

	<property>
		<!-- The port number of the Tomcat you installed -->
		<name>tez.tez-ui.history-url.base</name>
		<value>http://hadoop:8008/tez-ui/</value>
	</property>

	<property>
		<name>tez.runtime.convert.user-payload.to.history-text</name>
		<value>true</value>
	</property>

	<property>
		<name>tez.task.generate.counters.per.io</name>
		<value>true</value>
	</property>
6. Tomcat configuration:

Installing Tomcat is not covered here; many guides are available online.

Then extract tez-ui-0.8.4.war and tez-ui2-0.8.4.war into Tomcat's webapps/ directory:

mkdir -pv /opt/modules/tomcat-7.0.69/webapps/tez-ui /opt/modules/tomcat-7.0.69/webapps/tez-ui2
cp /opt/single/tez/tez-ui-0.8.4.war /opt/modules/tomcat-7.0.69/webapps/tez-ui
cp /opt/single/tez/tez-ui2-0.8.4.war /opt/modules/tomcat-7.0.69/webapps/tez-ui2
# Extract each WAR inside its own directory
cd /opt/modules/tomcat-7.0.69/webapps/tez-ui && jar xvf tez-ui-0.8.4.war
cd /opt/modules/tomcat-7.0.69/webapps/tez-ui2 && jar xvf tez-ui2-0.8.4.war
Configure the webapps/tez-ui/scripts/config.js file:
timelineBaseUrl: 'http://hadoop:8188',
RMWebUrl: 'http://hadoop:8088',
Set the Tomcat port to 8008 in:

/opt/modules/tomcat-7.0.69/conf/server.xml
     <Connector port="8008" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" />
7. Test:

Start the services and run the example job:

start-dfs.sh
start-yarn.sh
yarn-daemon.sh start timelineserver
startup.sh
hadoop jar /opt/single/tez/tez-tests-0.8.4.jar testorderedwordcount /data/data1 /output2
16/08/27 00:33:27 INFO shim.HadoopShimsLoader: Trying to locate HadoopShimProvider for hadoopVersion=2.7.2, majorVersion=2, minorVersion=7
16/08/27 00:33:27 INFO shim.HadoopShimsLoader: Picked HadoopShim org.apache.tez.hadoop.shim.HadoopShim26, providerName=org.apache.tez.hadoop.shim.HadoopShim25_26_27Provider, overrideProviderViaConfig=null, hadoopVersion=2.7.2, majorVersion=2, minorVersion=7
16/08/27 00:33:28 INFO client.TezClientUtils: Permissions on staging directory hdfs://hadoop:9000/tmp/hadoop/tez/staging/1472229207999 is incorrect: rwxr-xr-x. Fixing permissions to correct value rwx------
16/08/27 00:33:28 INFO examples.TestOrderedWordCount: Creating Tez Session
16/08/27 00:33:28 INFO client.TezClient: Tez Client Version: [component=tez-api, version=0.8.4, revision=${buildNumber}, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=2016-08-25T08:17:01Z]
16/08/27 00:33:28 INFO impl.TimelineClientImpl: Timeline service address: http://localhost:8188/ws/v1/timeline/
16/08/27 00:33:28 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.0.3:8032
16/08/27 00:33:28 INFO client.TezClient: Using org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager to manage Timeline ACLs
16/08/27 00:33:28 INFO impl.TimelineClientImpl: Timeline service address: http://localhost:8188/ws/v1/timeline/
16/08/27 00:33:28 INFO client.TezClient: Session mode. Starting session.
16/08/27 00:33:28 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: hdfs://hadoop:9000/apps/tez-0.8.4/tez-0.8.4-minimal.tar.gz
16/08/27 00:33:28 INFO client.TezClientUtils: Using tez.lib.uris.classpath value from configuration: null
16/08/27 00:33:29 INFO client.TezClient: Tez system stage directory hdfs://hadoop:9000/tmp/hadoop/tez/staging/1472229207999/.tez/application_1472222203999_0005 doesn't exist and is created
16/08/27 00:33:29 INFO acls.ATSHistoryACLPolicyManager: Created Timeline Domain for the history ACLs, domainId=Tez_ATS_application_1472222203999_0005
16/08/27 00:33:29 INFO impl.YarnClientImpl: Submitted application application_1472222203999_0005
16/08/27 00:33:29 INFO client.TezClient: The URL to track the Tez session: http://hadoop:8088/proxy/application_1472222203999_0005/
16/08/27 00:33:29 INFO examples.TestOrderedWordCount: Running OrderedWordCount DAG, dagIndex=1, inputPath=/data/data1, outputPath=/output2
16/08/27 00:33:29 INFO examples.TestOrderedWordCount: Checking DAG specific ACLS
16/08/27 00:33:29 INFO examples.TestOrderedWordCount: Waiting for TezSession to get into ready state
16/08/27 00:33:32 INFO examples.TestOrderedWordCount: Submitting DAG to Tez Session, dagIndex=1
16/08/27 00:33:32 INFO client.TezClient: Submitting dag to TezSession, sessionName=OrderedWordCountSession, applicationId=application_1472222203999_0005, dagName=OrderedWordCount1, callerContext={context=TEZ, callerType=TestOrderedWordCount, callerId=application_1472222203999_0005_1}
16/08/27 00:33:33 INFO client.TezClient: Submitted dag to TezSession, sessionName=OrderedWordCountSession, applicationId=application_1472222203999_0005, dagName=OrderedWordCount1
16/08/27 00:33:33 INFO impl.TimelineClientImpl: Timeline service address: http://localhost:8188/ws/v1/timeline/
16/08/27 00:33:33 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.0.3:8032
16/08/27 00:33:33 INFO examples.TestOrderedWordCount: Submitted DAG to Tez Session, dagIndex=1
.... (hundreds of lines omitted) ....
16/08/27 00:33:37 INFO examples.TestOrderedWordCount: DAG 1 completed. FinalState=SUCCEEDED
16/08/27 00:33:37 INFO examples.TestOrderedWordCount: Shutting down session
16/08/27 00:33:37 INFO client.TezClient: Shutting down Tez Session, sessionName=OrderedWordCountSession, applicationId=application_1472222203999_0005
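Because the client log runs to hundreds of lines, it can be easier to capture it to a file and look for the final DAG state than to read it by eye. The following is a minimal sketch: the function name dag_succeeded and the log file path are hypothetical, and the match is case-insensitive so that it does not depend on the exact casing of the log message.

```shell
#!/bin/sh
# dag_succeeded is a hypothetical helper: it returns 0 when the captured
# client log ($1) reports a successful final DAG state.
dag_succeeded() {
    # -q: no output, only an exit status; -i: ignore case
    grep -qi 'finalstate=succeeded' "$1"
}

# Example use (commented out because it needs a running cluster):
# hadoop jar /opt/single/tez/tez-tests-0.8.4.jar testorderedwordcount \
#     /data/data1 /output2 2>&1 | tee /tmp/tez-test.log
# dag_succeeded /tmp/tez-test.log && echo "Tez test job succeeded"
```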
Check that the Tez job runs, and then observe its status in the YARN UI:

http://hadoop:8088/cluster

Once you have confirmed that Tez works, test Hive.

The following configuration is optional. Add the following to hive-site.xml:

	<property>
		<name>hive.execution.engine</name>
		<value>tez</value>
	</property>
Or add the following to ~/.hiverc:

set hive.execution.engine=tez;

Or start Hive and run the set command above at the Hive prompt.

Then run a Hive query, for example:

hive (default)> set hive.execution.engine;
hive.execution.engine=tez
hive (default)> select data1,data2 from test1 order by data1;
Query ID = hadoop_20160827004201_cb9e3165-4fd9-4b91-a68e-0ca4155be511
Total jobs = 1
Launching Job 1 out of 1

Status: Running (Executing on YARN cluster with App id application_1472222203999_0006)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1              SUCCEEDED      0          0        0        0       0       0
Reducer 2 ...      SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 3.66 s
--------------------------------------------------------------------------------
OK
data1	data2
Time taken: 6.346 seconds
hive (default)>
 
The output above indicates that the configuration succeeded. Detailed DAG information is available in the UI.
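The same check can also be run non-interactively, which is convenient for scripts. The following minimal sketch selects Tez for a single invocation by using the Hive CLI's --hiveconf and -e options. The function name run_on_tez and the HIVE_CMD variable are hypothetical, and the query in the comment is the one used earlier in this article.

```shell
#!/bin/sh
# HIVE_CMD defaults to the hive CLI; set HIVE_CMD=echo to print the command instead.
HIVE_CMD=${HIVE_CMD:-hive}

# run_on_tez is a hypothetical helper: it runs one query ($1) with Tez
# selected as the execution engine for that invocation only.
run_on_tez() {
    $HIVE_CMD --hiveconf hive.execution.engine=tez -e "$1"
}

# Example call (commented out because it needs a working Hive installation):
# run_on_tez "select data1,data2 from test1 order by data1;"
```

Setting the engine per invocation leaves hive-site.xml untouched, so other users of the same installation keep their default engine.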


Click the ApplicationMaster link to open the Tez UI, as shown below:

Select the corresponding DAG name link to see the details below:

You can also view it at http://hadoop:8008/tez-ui2/.
