Spark 1.2 adds support for applications running in Spark-on-YARN mode to automatically adjust the number of executors based on the task load (dynamic allocation). To enable this feature, do the following:
One:
On every NodeManager, edit yarn-site.xml: add spark_shuffle to the yarn.nodemanager.aux-services list, and set yarn.nodemanager.aux-services.spark_shuffle.class to org.apache.spark.network.yarn.YarnShuffleService, as follows:
- <property>
- <name>yarn.nodemanager.aux-services</name>
- <value>mapreduce_shuffle,spark_shuffle</value>
- </property>
- <property>
- <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
- <value>org.apache.spark.network.yarn.YarnShuffleService</value>
- </property>
Two:
Copy the $SPARK_HOME/lib/spark-1.2.0-yarn-shuffle.jar file into the hadoop-yarn/lib directory (that is, YARN's library directory) on each NodeManager, then restart the NodeManagers so the auxiliary service is loaded.
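The copy step can be sketched as a small shell function. The jar name comes from the step above; the source and destination directory layout passed in are assumptions about your installation and should be adjusted to match it:

```shell
# Hedged sketch: install the Spark YARN shuffle-service jar into YARN's lib dir.
# The caller supplies SPARK_HOME and YARN's lib directory, since layouts vary.
install_shuffle_jar() {
  local spark_home="$1" yarn_lib="$2"
  # Jar name as shipped with Spark 1.2.0
  cp "$spark_home/lib/spark-1.2.0-yarn-shuffle.jar" "$yarn_lib/"
}
```

Run this (or the equivalent cp) on every NodeManager host, since each NodeManager loads the shuffle service locally.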
Three:
Configure $SPARK_HOME/conf/spark-defaults.conf, adding the following two entries:
- spark.dynamicAllocation.minExecutors 1   # minimum number of executors
- spark.dynamicAllocation.maxExecutors     # maximum number of executors (set a value suited to your cluster)
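Putting the pieces together, a complete spark-defaults.conf fragment for this feature might look like the following. The property names are Spark's standard dynamic-allocation keys; the maxExecutors value of 20 is an illustrative assumption, not a value from this post:

```properties
# Enable the external shuffle service and dynamic executor allocation
spark.shuffle.service.enabled    true
spark.dynamicAllocation.enabled  true
# Bounds on the executor count (the maxExecutors value is illustrative)
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 20
```

With the two enabled flags set here, they no longer need to be passed via --conf on every submission as in step four.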
Four:
When launching the application, turn on the dynamic-allocation switches. Taking spark-sql in yarn-client mode as an example:
- spark-sql \
- --master yarn \
- --deploy-mode client \
- --conf spark.shuffle.service.enabled=true \
- --conf spark.dynamicAllocation.enabled=true \
- -e "select count(*) from xx"
The same applies when using spark-submit:
- spark-submit \
- --class syspark.SqlOnSpark \
- --master yarn-client \
- --conf spark.shuffle.service.enabled=true \
- --conf spark.dynamicAllocation.enabled=true \
- /data/jars/SqlOnSpark.jar \
- "select count(*) from xx"
Spark SQL on YARN: configuring automatic executor-count adjustment