O&M series: 05, Spark on YARN


Running Spark on YARN is supported since Spark 0.6.0.

Preparation: you need a Spark build that includes YARN support (a spark-on-yarn binary release); refer to the build documentation for the compilation configuration.

Environment variables:

SPARK_YARN_USER_ENV (optional): environment variables to pass to the Spark processes launched on YARN. Example: SPARK_YARN_USER_ENV="JAVA_HOME=/jdk64,FOO=bar".

SPARK_JAR: the location of the Spark assembly jar in HDFS. Example: export SPARK_JAR=hdfs://some/path.

Make sure the directory pointed to by HADOOP_CONF_DIR or YARN_CONF_DIR contains the configuration files of the Hadoop cluster; they are used to connect to the YARN ResourceManager and to write data to HDFS. Spark itself only needs to be installed on the machine that submits applications with spark-submit, so it can be configured on that machine alone.

There are two deploy modes:

yarn-cluster: the Spark driver runs inside an ApplicationMaster process started by the YARN cluster, and the client can go away after the application is initialized. Suitable for production.

yarn-client: the Spark driver runs in the client process, and the ApplicationMaster is only used to request resources from YARN. Suitable for interactive and test use.

Unlike the standalone and Mesos modes, where the --master parameter takes the address of the specified master, in YARN mode the ResourceManager address is read from the Hadoop configuration files. Therefore, in YARN mode the --master parameter is simply "yarn-client" or "yarn-cluster".

Launching an application in yarn-cluster mode:

./bin/spark-submit --class path.to.your.Class --master yarn-cluster [options] <app jar> [app options]

Example:

SPARK_JAR=hdfs://hansight/libs/spark-assembly-1.0.2-hadoop2.4.0.2.1.4.0-632.jar \
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-cluster \
    --num-executors 3 \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    lib/spark-examples*.jar \
    10

Note: the client starts the default ApplicationMaster, and SparkPi runs as a child thread of the ApplicationMaster. The client periodically polls the ApplicationMaster for status updates and displays them in the console, and exits once the application finishes.

Launching an application in yarn-client mode:

./bin/spark-submit --master yarn-client [options] <app jar> [app options]

Only the value of the --master parameter changes to yarn-client; everything else is the same as in yarn-cluster mode.

Adding other jar dependencies in yarn-cluster mode: the driver and the client run on different machines, so SparkContext.addJar does not work out of the box with files that are local to the client, as it does in yarn-client mode. To make SparkContext.addJar work with such files, list them with the --jars option in the launch command. For example:

./bin/spark-submit --class my.main.Class \
    --master yarn-cluster \
    --jars my-other-jar.jar,my-other-other-jar.jar \
    my-main-jar.jar app_arg1 app_arg2
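Pulled together, the client-side environment setup described above could be collected into a small sketch of conf/spark-env.sh on the submitting machine. Note that every path and host name below is a hypothetical placeholder, not a value from this article:

```shell
# conf/spark-env.sh on the machine that runs spark-submit (sketch; paths are hypothetical)

# Directory holding the Hadoop/YARN client configuration files; spark-submit
# reads the ResourceManager address and HDFS settings from here.
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Optional: Spark assembly jar pre-staged in HDFS, so it is not re-uploaded
# for every application.
export SPARK_JAR=hdfs://namenode:8020/libs/spark-assembly.jar

# Optional: extra environment variables for the Spark processes launched on YARN.
export SPARK_YARN_USER_ENV="JAVA_HOME=/jdk64,FOO=bar"
```

With such a file in place, ./bin/spark-submit --master yarn-cluster needs no extra flags to locate the cluster: the ResourceManager address comes from the files under HADOOP_CONF_DIR.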
