The Master said: "A gentleman does not seek to eat his fill or to live in comfort. He is quick in action and careful in speech, and he seeks out those who follow the Way in order to correct himself. Such a person can be called fond of learning."
In other words: a gentleman does not eat too much or live too comfortably; he works diligently, speaks cautiously, and goes to people of high moral standing and learning to correct his own shortcomings. That is what it means to be studious.
Recently I upgraded our CDH to 5.3.0, which took Hive from 0.12 to 0.13. After the upgrade was complete, a simple test showed a large performance difference between the two versions. Starting with 0.13, Hive supports Tez as an execution engine to improve execution speed.
Comparison of Tez and MR:
The diagram shows that the original MR program is a DAG of multiple jobs, and each job writes to and reads from disk, wasting disk I/O and network I/O. Tez turns the multi-job DAG into a single-job DAG, reducing the reads and writes of intermediate results.
Installing and deploying Tez:
Hadoop version: 2.5.0
Hive version: 0.13
Tez version: 0.4.1
1.) Download the Tez source code from: http://archive.apache.org/dist/incubator/tez/tez-0.4.1-incubating/
2.) Compile:
1: Build dependencies
A, JDK 1.7+
B, Maven 3.0+
C, Protocol Buffers 2.5.0
2: In pom.xml, change the Hadoop version to the matching version number, 2.5.0
3: Build command: mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true
Then wait quietly...
3.) Upload the compiled Tez tarball to every machine in the cluster and extract it into the directory where you want it installed.
4.) Upload the extracted Tez files to HDFS
。 Create the directory: hadoop fs -mkdir /apps
。 Upload the files: hadoop fs -put {tez_home} /apps/
5.) Create a new tez-site.xml configuration file in the Hadoop configuration directory
。 Add the tez.lib.uris setting
<property>
  <name>tez.lib.uris</name>
  <value>${fs.defaultFS}/apps/tez,${fs.defaultFS}/apps/tez/lib</value>
</property>
6.) Modify mapred-site.xml
7.) Add the Tez jars to HADOOP_CLASSPATH in hadoop-env.sh
export TEZ_HOME={tez_home}
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${TEZ_HOME}/*:${TEZ_HOME}/lib/*
8.) With Tez deployed, test it by running a Tez program: hadoop jar tez-tests.jar testorderedwordcount <input> <output>
If it runs successfully, the deployment worked.
9.) Set Hive's execution engine to Tez
<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>
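Before changing the engine for everyone, it can be useful to set the same property for a single session and compare the two engines on one query. The following is only a minimal sketch: it assumes the hive command-line client is on the PATH, and the helper name with_engine and the table name some_table are hypothetical, made up for illustration.

```shell
#!/bin/sh
# Prefix a HiveQL statement with a session-level engine setting.
# $1: engine name ("tez" or "mr"); $2: the query to run.
with_engine() {
    printf 'set hive.execution.engine=%s; %s\n' "$1" "$2"
}

# Hypothetical usage (needs a working Hive installation, so left commented):
#   time hive -e "$(with_engine mr  'select count(*) from some_table')"
#   time hive -e "$(with_engine tez 'select count(*) from some_table')"
```

Because the setting only applies to that session, the hive-site.xml default stays untouched while you compare.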
With this, the integration of Tez with CDH 5.3.0 is complete. Next come stability testing and performance tuning.
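The build prerequisites from step 2 and the tez-site.xml file from step 5 could be scripted roughly as follows. This is a sketch under assumptions, not part of the original procedure: the function names missing_tools and write_tez_site are hypothetical, and the generated file contains only the tez.lib.uris property shown earlier.

```shell
#!/bin/sh
# Print every tool from the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Write a tez-site.xml containing only tez.lib.uris into directory $1.
# The heredoc delimiter is quoted so the shell leaves ${fs.defaultFS}
# untouched for Hadoop to expand.
write_tez_site() {
    cat > "$1/tez-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/apps/tez,${fs.defaultFS}/apps/tez/lib</value>
  </property>
</configuration>
EOF
}
```

For example, missing_tools java mvn protoc lists whichever build tools are absent, and write_tez_site followed by the path of your Hadoop configuration directory creates the file there.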
Tip: Tez must be deployed on every machine.
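A quick way to check a node is to confirm that the install directory actually contains the jars that the classpath entries point at. The sketch below assumes the extracted directory has tez-*.jar files at its top level and dependency jars under lib/; the function name tez_jars_present is hypothetical.

```shell
#!/bin/sh
# Succeed only if directory $1 holds Tez jars at its top level
# and at least one jar in its lib/ subdirectory.
tez_jars_present() {
    for f in "$1"/tez-*.jar "$1"/lib/*.jar; do
        # An unmatched glob stays literal, so the -e test fails on it.
        [ -e "$f" ] || return 1
    done
    return 0
}

# Hypothetical usage on each node:
#   tez_jars_present "$TEZ_HOME" || echo "Tez jars missing on $(hostname)"
```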
Tez official website: http://tez.apache.org/index.html
Installing and deploying Tez on Hadoop CDH 5.3.0