Running Spark on YARN has been supported since Spark 0.6.0.
Preparation: you need a Spark binary release built with YARN support (spark-on-yarn). Refer to the build configuration for details.

Environment variables:
SPARK_YARN_USER_ENV: optional; sets environment variables for the Spark processes running on YARN. Example: SPARK_YARN_USER_ENV="JAVA_HOME=/jdk64,FOO=bar".
SPARK_JAR: sets the location of the Spark assembly jar in HDFS, for example: export SPARK_JAR=hdfs://some/path.

Make sure the directory pointed to by HADOOP_CONF_DIR or YARN_CONF_DIR contains the configuration files of the Hadoop cluster. These files are used to connect to the YARN ResourceManager and to write data to HDFS. They are needed by the Spark installation from which tasks are submitted, i.e. wherever the spark-submit tool is used, so it is enough to configure them on that one machine.

There are two deploy modes:
yarn-cluster: the Spark driver runs inside an ApplicationMaster process started by the YARN cluster, and the client can go away after the application is initialized. Intended for production use.
yarn-client: the Spark driver runs in the client process, and the ApplicationMaster is only used to request resources from YARN. Intended for interactive and test use.

Unlike Spark standalone and Mesos modes, where the master address is passed explicitly via the master parameter, in YARN mode the ResourceManager address is read from the Hadoop configuration files. Therefore, in YARN mode the master parameter is simply "yarn-client" or "yarn-cluster".

Launching an application in yarn-cluster mode:
./bin/spark-submit --class path.to.your.Class --master yarn-cluster [options] <app jar> [app options]

Example:
SPARK_JAR=hdfs://hansight/libs/spark-assembly-1.0.2-hadoop2.4.0.2.1.4.0-632.jar \
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--num-executors 3 \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
lib/spark-examples*.jar \
10
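The preparation steps above can be pulled together into the submitting machine's environment. The following is a minimal sketch; the HADOOP_CONF_DIR and SPARK_JAR paths are illustrative assumptions, not values taken from this cluster:

```shell
# Point Spark at the Hadoop cluster configuration files
# (path is an assumption for illustration).
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Optional: host the Spark assembly jar in HDFS so it is not
# re-uploaded on every submission (path is an assumption).
export SPARK_JAR=hdfs://namenode:8020/user/spark/share/lib/spark-assembly.jar

# Optional: pass environment variables through to the Spark
# processes that run on YARN.
export SPARK_YARN_USER_ENV="JAVA_HOME=/jdk64,FOO=bar"

echo "$SPARK_YARN_USER_ENV"   # prints JAVA_HOME=/jdk64,FOO=bar
```

With these variables in place, the spark-submit examples in this article can be run without repeating the SPARK_JAR prefix on each command.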
Note: the command above starts a YARN client, which launches the default ApplicationMaster; SparkPi then runs as a child thread of the ApplicationMaster. The client periodically polls the ApplicationMaster for status updates and displays them on the console, and exits once the application has finished.

Launching an application in yarn-client mode:
./bin/spark-submit --master yarn-client [options] <app jar> [app options]
Only the value of the --master parameter changes to yarn-client; everything else is the same as in yarn-cluster mode.

Adding other jar dependencies in yarn-cluster mode: the driver and the client run on different machines, so SparkContext.addJar does not work out of the box with files that are local to the client, the way it does in local mode. To make SparkContext.addJar work with such files, list them with the --jars option on the launch command. For example:
$ ./bin/spark-submit --class my.main.Class \
--master yarn-cluster \
--jars my-other-jar.jar,my-other-other-jar.jar \
my-main-jar.jar \
app_arg1 app_arg2
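To make the "only --master changes" point concrete, here is the earlier SparkPi example rewritten for yarn-client mode. This is a sketch of a cluster submission (it needs a running Spark-on-YARN installation, so it is shown as a CLI fragment only); the jar path and resource sizes are reused from the yarn-cluster example above:

```shell
# Same SparkPi submission as before, but the driver now runs in
# this client process instead of inside the ApplicationMaster.
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn-client \
  --num-executors 3 \
  --driver-memory 4g \
  --executor-memory 2g \
  --executor-cores 1 \
  lib/spark-examples*.jar \
  10
```

Because the driver stays in the client process, its console output (including the computed value of Pi) appears directly in the submitting terminal, which is why this mode suits interactive testing.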
O&M series: 05, Spark on YARN