Officially, the Tez UI requires Tez 0.6+ and Hadoop 2.6.0.
In practice, the Tez UI also works with Tez 0.5.3 and Hadoop 2.4.0+, as long as Hadoop has the Timeline Server.
However, the Timeline Server in Hadoop 2.4.0 and 2.5.0 does not support cross-domain requests, so in that case the Tez view in Ambari 2.2 can be used to set it up instead, which is convenient and quick.
tez-site.xml
<configuration>
<property>
<name>tez.lib.uris</name>
<value>hdfs:///apps/tez-0.5.3/tez-0.5.3.tar.gz</value>
</property>
<property>
<name>tez.task.generate.counters.per.io</name>
<value>true</value>
</property>
<property>
<description>Log history using the Timeline Server.</description>
<name>tez.history.logging.service.class</name>
<value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>
<property>
<description>Publish configuration information to the Timeline Server.</description>
<name>tez.runtime.convert.user-payload.to.history-text</name>
<value>true</value>
</property>
</configuration>
yarn-site.xml
Add the following properties:
<property>
<description>Indicate to clients whether the Timeline service is enabled or not.
If enabled, the TimelineClient library used by end users will post entities
and events to the Timeline server.</description>
<name>yarn.timeline-service.enabled</name>
<value>true</value>
</property>
<property>
<description>The hostname of the Timeline service web application.</description>
<name>yarn.timeline-service.hostname</name>
<value>192.168.117.117</value>
</property>
<property>
<name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
<value>true</value>
</property>
<property>
<description>Enables cross-origin support (CORS) for web services where
cross-origin web response headers are needed. For example, JavaScript making
a web services request to the Timeline server.</description>
<name>yarn.timeline-service.http-cross-origin.enabled</name>
<value>true</value><!-- Supported from Hadoop 2.6.0. For more information see
http://search-hadoop.com/m/tQlTsMD%26subj=Tez+nbsp+taskcount+log+visualization -->
</property>
<property>
<description>Address for the Timeline server to start the RPC server.</description>
<name>yarn.timeline-service.address</name>
<value>${yarn.timeline-service.hostname}:10201</value>
</property>
<property>
<description>The HTTP address of the Timeline service web application.</description>
<name>yarn.timeline-service.webapp.address</name>
<value>${yarn.timeline-service.hostname}:8188</value>
</property>
<property>
<description>The HTTPS address of the Timeline service web application.</description>
<name>yarn.timeline-service.webapp.https.address</name>
<value>${yarn.timeline-service.hostname}:2191</value>
</property>
Running yarn timelineserver start starts the Timeline Server.
Tez Config
https://issues.apache.org/jira/browse/TEZ-2294
Tez optimization
Listing some details at a very high level:
-Set tez.task.generate.counters.per.io=true to get more details on the task counters. Basically, this starts printing the counters per edge, which is a lot more useful for debugging.
-If you want to avoid container launches etc. when running a query for the first time, try hive.prewarm.enabled=true and hive.prewarm.numcontainers=<number of containers you want prewarmed in your session>.
-Container reuse is enabled by default in Tez. (tez.am.container.idle.release-timeout-min.millis and tez.am.container.idle.release-timeout-max.millis control the amount of time a container is held by the AM before it is released.)
-Set tez.runtime.io.sort.mb appropriately to avoid spills (you can check the task counters in the logs to find the spills and adjust it accordingly).
-Set tez.runtime.sort.threads=2 to enable PipelinedSorter, which is a lot more performant than DefaultSorter (this is the default in the master branch, but if you are using an earlier release, you can turn it on by setting tez.runtime.sort.threads=2).
-Set tez.runtime.compress=true and set tez.runtime.compress.codec (SnappyCodec is preferred, but it is up to you to choose).
-Set tez.runtime.shuffle.keep-alive.enabled=true if you have a shuffle-heavy workload. This reduces the number of connections in shuffle.
-Adjust the memory allocated to different inputs/outputs via tez.task.scale.memory.ratios (this is more of an expert-level setting, which you might want to touch after nailing down any memory pressure).
-Adjusting shuffle buffers is also possible, but I would advise it only if you have nailed down an issue related to the shuffle/merge code path.
-Set tez.runtime.optimize.local.fetch=true to bypass HTTP fetches (if the data is locally present).
Feel free to refer to https://github.com/t3rmin4t0r/tez-autobuild/blob/master/tez-site.xml for commonly used settings for benchmarks.
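As a rough illustration, the runtime settings from the list above could be collected into tez-site.xml along these lines. This is only a sketch: the sort buffer size is an assumed placeholder, not a recommendation from the thread, and the codec is written out as the standard Hadoop Snappy class on the assumption that native Snappy is installed on the cluster.

```xml
<!-- Sketch of tez-site.xml additions. Values marked "assumed" are illustrative only. -->
<property>
  <name>tez.runtime.io.sort.mb</name>
  <value>512</value><!-- assumed size; tune it using the spill counters in the task logs -->
</property>
<property>
  <name>tez.runtime.sort.threads</name>
  <value>2</value><!-- enables PipelinedSorter on earlier releases -->
</property>
<property>
  <name>tez.runtime.compress</name>
  <value>true</value>
</property>
<property>
  <name>tez.runtime.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value><!-- assumes native Snappy is available -->
</property>
<property>
  <name>tez.runtime.shuffle.keep-alive.enabled</name>
  <value>true</value><!-- for shuffle-heavy workloads -->
</property>
<property>
  <name>tez.runtime.optimize.local.fetch</name>
  <value>true</value><!-- skip HTTP fetches when data is local -->
</property>
```

The Hive prewarm settings are Hive properties, so they would go into hive-site.xml or the Hive session rather than into this file.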
Rajesh,
What are the problems with having tez.runtime.shuffle.keep-alive.enabled and tez.runtime.optimize.local.fetch set to true by default?
@r7raul1984, would you mind filing a documentation jira for your question? The list that Rajesh provided might be good to formalize into a doc and/or wiki.
Also, please take a look at https://issues.apache.org/jira/browse/TEZ-2294 to see the full list of parameters. If you see something off or not clear enough, please add your comments to the jira.
@Rohini,
We recently changed tez.runtime.optimize.local.fetch to true as the default value in master. The feature was introduced and probably kept as false initially because it had not been fully battle-tested.
The latter, I am assuming, depends on how many open connections a cluster's setup can sustain and needs to be tuned in combination with tez.runtime.shuffle.keep-alive.max.connections. Good point on whether we should make this true by default. I would wait for @Rajesh/@Gopal/@Sid to chime in, and they can open a new jira if this is generally beneficial in most setups.
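To make the pairing concrete, the two keep-alive properties mentioned here might be set together in tez-site.xml as in the sketch below. The connection limit shown is an assumed example value, not a figure given in the thread, and should be sized to what the cluster can actually sustain.

```xml
<!-- Sketch only: the max.connections value is an assumption for illustration. -->
<property>
  <name>tez.runtime.shuffle.keep-alive.enabled</name>
  <value>true</value>
</property>
<property>
  <name>tez.runtime.shuffle.keep-alive.max.connections</name>
  <value>20</value><!-- assumed; tune to the number of open connections the cluster can sustain -->
</property>
```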
> What are the problems with having
> tez.runtime.shuffle.keep-alive.enabled and
> tez.runtime.optimize.local.fetch set to true by default?
Nothing has failed due to these so far - we
Tez UI Setup