To run dependent jobs more efficiently (such as the MapReduce jobs generated by Pig and Hive) and to reduce disk and network I/O, Hortonworks developed the DAG computing framework Tez.
Tez is a general-purpose DAG computing framework that evolved from the MapReduce computing framework. It can serve as the underlying data processing engine for systems such as MapReduce, Pig, and Hive. It is natively integrated with YARN, the resource management platform in Hadoop 2.0, and it was created by core Hadoop 2.0 developers, which makes it a rising star among computing frameworks.
Required libraries and tools: gcc, make, gcc-c++, and OpenSSL. The phantomjs-2.1.1-linux-x86_64 and Node.js installations during the build take a little extra time.
Download the Tez source code from the official website, extract it, and compile it.
Note: change the Hadoop version in the POM, or set your own Hadoop version on the mvn command line:
mvn package -Dhadoop.version=2.7.2 -DskipTests -Dmaven.javadoc.skip=true
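Before starting the fairly long Maven build, it can help to confirm that the tools listed above are actually on the PATH. The following is a minimal sketch, not part of the Tez build itself: the function name missing_tools is made up for illustration, g++ is assumed to be the command provided by the gcc-c++ package, and java and mvn are assumed because the build uses Maven.

```shell
# Print the name of every given command that is not found on PATH,
# one per line. Prints nothing when all of them are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools named in this article, plus java and mvn (assumed, see above).
missing=$(missing_tools gcc g++ make openssl java mvn)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing build tools:" $missing
else
    echo "All build tools found"
fi
```

If anything is reported as missing, install it with your distribution's package manager before running the mvn command.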
1. Take tez-0.8.4-minimal.tar.gz from tez-dist/target/ and extract it locally to /opt/single/tez.
Create a conf directory under $TEZ_HOME and create tez-site.xml in it:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>tez.lib.uris</name>
<value>hdfs://hadoop:9000/apps/tez-0.8.4/tez-0.8.4-minimal.tar.gz</value>
</property>
<property>
<name>tez.use.cluster.hadoop-libs</name>
<value>true</value>
</property>
</configuration>
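A stray space or typo in this file is easy to miss, so it can be useful to read a value back out of it. The sketch below is only an illustration: get_property is a hypothetical helper, and it assumes each <name> and <value> element sits on its own line, as in the file above. It is not a general XML parser.

```shell
# Print the <value> that follows <name>KEY</name> in a Hadoop-style
# configuration file. $1 = file, $2 = property name.
# Assumes <name> and <value> are each on their own line.
get_property() {
    awk -v key="$2" '
        index($0, "<name>" key "</name>") { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>[ \t]*/, "")      # drop everything up to the value
            sub(/[ \t]*<\/value>.*/, "")    # drop the closing tag and the rest
            print
            exit
        }
    ' "$1"
}

# Example call (path from this article):
#   get_property /opt/single/tez/conf/tez-site.xml tez.lib.uris
```

The helper prints nothing when the property is absent, which makes it easy to use in an `if [ -z ... ]` check.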
2. Set the Linux environment variables:
export TEZ_HOME=/opt/single/tez
export TEZ_CONF_DIR=$TEZ_HOME/conf
export TEZ_JARS=$TEZ_HOME
3. Add the following in hadoop-env.sh:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$TEZ_CONF_DIR:$TEZ_JARS/*:$TEZ_JARS/lib/*
mapred-site.xml settings:
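The classpath entries only help if the directories they point at contain what Hadoop expects. As a rough sanity check, assuming the layout described in this article (a conf/tez-site.xml file plus tez-*.jar files directly under the Tez home), something like the following could be used. The function name check_tez_home is hypothetical.

```shell
# Return 0 if directory $1 looks like a usable Tez install:
# it must contain conf/tez-site.xml and at least one tez-*.jar.
check_tez_home() {
    [ -f "$1/conf/tez-site.xml" ] || { echo "missing $1/conf/tez-site.xml"; return 1; }
    ls "$1"/tez-*.jar >/dev/null 2>&1 || { echo "no tez-*.jar under $1"; return 1; }
    echo "ok: $1"
}

# Example call once the variables from step 2 are exported:
#   check_tez_home "$TEZ_HOME"
```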
<property>
<name>mapreduce.framework.name</name>
<value>yarn-tez</value>
</property>
4. Start Hadoop and upload the compiled tez-0.8.4-minimal.tar.gz to the hdfs://hadoop:9000/apps/tez-0.8.4/ directory.
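The upload itself can be done with the standard hadoop fs commands. The sketch below wraps them in a hypothetical upload_tez function with a dry-run mode, so the commands can be reviewed before they touch HDFS. The DRY_RUN variable is an assumption of this sketch, and the local path assumes you run it from the root of the Tez source tree.

```shell
# Print (DRY_RUN=1, the default here) or run the HDFS commands for step 4.
# $1 = local tarball, $2 = target HDFS directory. Paths must not contain spaces.
upload_tez() {
    for cmd in "hadoop fs -mkdir -p $2" "hadoop fs -put $1 $2" "hadoop fs -ls $2"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            $cmd || return 1
        fi
    done
}

# Dry run: only prints the three commands.
upload_tez tez-dist/target/tez-0.8.4-minimal.tar.gz /apps/tez-0.8.4
```

Run it again with DRY_RUN=0 to execute the commands. The final -ls should then list the tarball at the location that tez.lib.uris points to.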
5. The settings for the Tez UI are as follows:
Add in yarn-site.xml:
<property>
<name>yarn.timeline-service.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.hostname</name>
<value>hadoop</value>
</property>
<property>
<name>yarn.timeline-service.http-cross-origin.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.webapp.address</name>
<value>${yarn.timeline-service.hostname}:8188</value>
</property>
<property>
<name>yarn.timeline-service.webapp.https.address</name>
<value>${yarn.timeline-service.hostname}:2191</value>
</property>
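Once the timeline server has been started with these settings, its REST endpoint (the /ws/v1/timeline/ path that appears in the job log in step 7) should answer on port 8188. A minimal sketch for probing it follows. The function names timeline_url and probe_timeline are made up for illustration, and the probe assumes curl is installed.

```shell
# Build the timeline REST URL. $1 = host, $2 = port (defaults to 8188).
timeline_url() {
    printf 'http://%s:%s/ws/v1/timeline/\n' "$1" "${2:-8188}"
}

# Report whether the timeline server answers within 5 seconds.
probe_timeline() {
    url=$(timeline_url "$@")
    if curl -s -f -o /dev/null --max-time 5 "$url"; then
        echo "up: $url"
    else
        echo "down: $url"
    fi
}

# Example call for the host used in this article:
#   probe_timeline hadoop
```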
Add in tez-site.xml:
<property>
<description>Enable Tez to use the Timeline Server for history logging</description>
<name>tez.history.logging.service.class</name>
<value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>
<property>
<!-- The port number of the Tomcat you installed -->
<name>tez.tez-ui.history-url.base</name>
<value>http://hadoop:8008/tez-ui/</value>
</property>
<property>
<name>tez.runtime.convert.user-payload.to.history-text</name>
<value>true</value>
</property>
<property>
<name>tez.task.generate.counters.per.io</name>
<value>true</value>
</property>
6. Tomcat configuration:
Installing Tomcat is omitted here; there are many guides online.
Then extract tez-ui-0.8.4.war and tez-ui2-0.8.4.war into Tomcat's webapps/ directory:
mkdir -pv /opt/modules/tomcat-7.0.69/webapps/tez-ui /opt/modules/tomcat-7.0.69/webapps/tez-ui2
cp /opt/single/tez/tez-ui-0.8.4.war /opt/modules/tomcat-7.0.69/webapps/tez-ui
cp /opt/single/tez/tez-ui2-0.8.4.war /opt/modules/tomcat-7.0.69/webapps/tez-ui2
# jar extracts into the current directory, so change into each target directory first
cd /opt/modules/tomcat-7.0.69/webapps/tez-ui
jar xvf tez-ui-0.8.4.war
cd /opt/modules/tomcat-7.0.69/webapps/tez-ui2
jar xvf tez-ui2-0.8.4.war
Configure the webapps/tez-ui/scripts/config.js file:
timelineBaseUrl: 'http://hadoop:8188',
RMWebUrl: 'http://hadoop:8088',
Set the Tomcat port to 8008 in /opt/modules/tomcat-7.0.69/conf/server.xml:
<Connector port="8008" protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="8443" />
7. Test:
Start the services and run the test job:
start-dfs.sh
start-yarn.sh
yarn-daemon.sh start timelineserver
startup.sh
hadoop jar /opt/single/tez/tez-tests-0.8.4.jar testorderedwordcount /data/data1 /output2
16/08/27 00:33:27 INFO shim.HadoopShimsLoader: Trying to locate HadoopShimProvider for hadoopVersion=2.7.2, majorVersion=2, minorVersion=7
16/08/27 00:33:27 INFO shim.HadoopShimsLoader: Picked HadoopShim org.apache.tez.hadoop.shim.HadoopShim26, providerName=org.apache.tez.hadoop.shim.HadoopShim25_26_27Provider, overrideProviderViaConfig=null, hadoopVersion=2.7.2, majorVersion=2, minorVersion=7
16/08/27 00:33:28 INFO client.TezClientUtils: Permissions on staging directory hdfs://hadoop:9000/tmp/hadoop/tez/staging/1472229207999 is incorrect: rwxr-xr-x. Fixing permissions to correct value rwx------
16/08/27 00:33:28 INFO examples.TestOrderedWordCount: Creating Tez Session
16/08/27 00:33:28 INFO client.TezClient: Tez Client Version: [component=tez-api, version=0.8.4, revision=${buildNumber}, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=2016-08-25T08:17:01Z]
16/08/27 00:33:28 INFO impl.TimelineClientImpl: Timeline service address: http://localhost:8188/ws/v1/timeline/
16/08/27 00:33:28 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.0.3:8032
16/08/27 00:33:28 INFO client.TezClient: Using org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager to manage Timeline ACLs
16/08/27 00:33:28 INFO impl.TimelineClientImpl: Timeline service address: http://localhost:8188/ws/v1/timeline/
16/08/27 00:33:28 INFO client.TezClient: Session mode. Starting session.
16/08/27 00:33:28 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: hdfs://hadoop:9000/apps/tez-0.8.4/tez-0.8.4-minimal.tar.gz
16/08/27 00:33:28 INFO client.TezClientUtils: Using tez.lib.uris.classpath value from configuration: null
16/08/27 00:33:29 INFO client.TezClient: Tez system stage directory hdfs://hadoop:9000/tmp/hadoop/tez/staging/1472229207999/.tez/application_1472222203999_0005 doesn't exist and is created
16/08/27 00:33:29 INFO acls.ATSHistoryACLPolicyManager: Created Timeline Domain for History ACLs, domainId=Tez_ATS_application_1472222203999_0005
16/08/27 00:33:29 INFO impl.YarnClientImpl: Submitted application application_1472222203999_0005
16/08/27 00:33:29 INFO client.TezClient: The url to track the Tez Session: http://hadoop:8088/proxy/application_1472222203999_0005/
16/08/27 00:33:29 INFO examples.TestOrderedWordCount: Running OrderedWordCount DAG, dagIndex=1, inputPath=/data/data1, outputPath=/output2
16/08/27 00:33:29 INFO examples.TestOrderedWordCount: Checking DAG specific ACLS
16/08/27 00:33:29 INFO examples.TestOrderedWordCount: Waiting for TezSession to get into ready state
16/08/27 00:33:32 INFO examples.TestOrderedWordCount: Submitting DAG to Tez Session, dagIndex=1
16/08/27 00:33:32 INFO client.TezClient: Submitting dag to TezSession, sessionName=OrderedWordCountSession, applicationId=application_1472222203999_0005, dagName=OrderedWordCount1, callerContext={ context=TEZ, callerType=TestOrderedWordCount, callerId=application_1472222203999_0005_1 }
16/08/27 00:33:33 INFO client.TezClient: Submitted dag to TezSession, sessionName=OrderedWordCountSession, applicationId=application_1472222203999_0005, dagName=OrderedWordCount1
16/08/27 00:33:33 INFO impl.TimelineClientImpl: Timeline service address: http://localhost:8188/ws/v1/timeline/
16/08/27 00:33:33 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.0.3:8032
16/08/27 00:33:33 INFO examples.TestOrderedWordCount: Submitted DAG to Tez Session, dagIndex=1
.... (hundreds of lines omitted) ....
16/08/27 00:33:37 INFO examples.TestOrderedWordCount: DAG 1 completed. FinalState=SUCCEEDED
16/08/27 00:33:37 INFO examples.TestOrderedWordCount: Shutting down session
16/08/27 00:33:37 INFO client.TezClient: Shutting down Tez Session, sessionName=OrderedWordCountSession, applicationId=application_1472222203999_0005
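With hundreds of log lines, the one that matters is the final DAG state. A small filter like the one below can pick it out. This is only a sketch: dag_succeeded is a hypothetical helper, and it assumes the job reports its result with the FinalState=SUCCEEDED text seen in the log above.

```shell
# Read job output on stdin and succeed only if some line reports
# a final DAG state of SUCCEEDED (matched case-insensitively).
dag_succeeded() {
    grep -qi 'finalstate=succeeded'
}

# Example call (the job log is written to stderr, hence 2>&1):
#   hadoop jar /opt/single/tez/tez-tests-0.8.4.jar testorderedwordcount /data/data1 /output2 2>&1 \
#       | tee run.log | dag_succeeded && echo "Tez test job succeeded"
```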
This tests whether Tez can run. You can then check the status of the Tez application in the YARN UI:
http://hadoop:8088/cluster
Once you are sure Tez works, test Hive.
The following configuration is optional. Add the following in hive-site.xml:
<property>
<name>hive.execution.engine</name>
<value>tez</value>
</property>
Or add the following to ~/.hiverc:
set hive.execution.engine=tez;
Or start Hive directly and run the set command above on the command line.
Then execute a Hive query.
For example:
hive (default)> set hive.execution.engine;
hive.execution.engine=tez
hive (default)> select data1,data2 from test1 order by data1;
Query ID = hadoop_20160827004201_cb9e3165-4fd9-4b91-a68e-0ca4155be511
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1472222203999_0006)
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1              SUCCEEDED      0          0        0        0       0       0
Reducer 2 ...      SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 3.66 s
--------------------------------------------------------------------------------
OK
data1 data2
Time taken: 6.346 seconds
hive (default)>
The output above indicates that the configuration succeeded, and you can see detailed DAG information in the UI.
Click the ApplicationMaster link to open the Tez UI, as shown below:
Select the corresponding DAG name link to see the details, as shown below:
You can also view it at hadoop:8008/tez-ui2/.