Spark-submit: Usage and Description


One: Commands

1. Submit a job to Spark Standalone in client mode.

./spark-submit --master spark://hadoop3:7077 --deploy-mode client --class org.apache.spark.examples.SparkPi ./lib/spark-examples-1.3.0-hadoop2.3.0.jar

With --deploy-mode client, the driver program runs in a process on the submitting node. With --deploy-mode cluster, the driver program runs directly on one of the workers.
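
For comparison, a cluster-mode submission to the same standalone master is sketched below (the host name and jar path are the placeholders from the example above; the jar would need to be reachable from the worker that ends up running the driver):

./spark-submit --master spark://hadoop3:7077 --deploy-mode cluster --class org.apache.spark.examples.SparkPi ./lib/spark-examples-1.3.0-hadoop2.3.0.jar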

2. Submit a job to Spark on YARN in client mode.

./spark-submit --master yarn --deploy-mode client --class org.apache.spark.examples.SparkPi ./lib/spark-examples-1.3.0-hadoop2.3.0.jar
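
Similarly, a cluster-mode submission to YARN can be sketched as follows (in Spark 1.x this was typically written with the yarn-cluster master; HADOOP_CONF_DIR must point at the cluster configuration):

./spark-submit --master yarn-cluster --class org.apache.spark.examples.SparkPi ./lib/spark-examples-1.3.0-hadoop2.3.0.jar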

Two: Spark 1.0.0 Application Deployment Tool spark-submit

As Spark came to be used more and more widely, the need for an application deployment tool supporting multiple resource managers became increasingly urgent. Spark 1.0.0 addresses this need: starting with Spark 1.0.0, Spark provides an easy-to-use application deployment tool, bin/spark-submit, for quickly deploying Spark applications on local, Standalone, YARN, and Mesos.

1: Usage instructions
Go to the $SPARK_HOME directory and run bin/spark-submit --help to get help for the command.
hadoop@wyy:/app/hadoop/spark100$ bin/spark-submit --help
Usage: spark-submit [options] <app jar | python file> [app options]
Options:
--master MASTER_URL          spark://host:port, mesos://host:port, yarn, or local
--deploy-mode DEPLOY_MODE    where the driver runs: client (on the submitting machine) or cluster (inside the cluster)
--class CLASS_NAME           the main class of the application
--name NAME                  the name of the application
--jars JARS                  comma-separated list of local jars to include on the driver and executor classpaths
--py-files PY_FILES          comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python applications
--files FILES                comma-separated list of files to be placed in the working directory of each executor
--properties-file FILE       file from which to load application properties; defaults to conf/spark-defaults.conf
--driver-memory MEM          memory for the driver; default 512M
--driver-java-options        extra Java options to pass to the driver
--driver-library-path        extra library path entries to pass to the driver
--driver-class-path          extra classpath entries for the driver; jars added with --jars are automatically included
--executor-memory MEM        memory per executor; default 1G

Spark Standalone with cluster deploy mode only:
--driver-cores NUM           number of cores used by the driver; default 1
--supervise                  if set, restart the driver on failure

Spark Standalone and Mesos only:
--total-executor-cores NUM   total number of cores for all executors

YARN only:
--executor-cores NUM         number of cores per executor; default 1
--queue QUEUE_NAME           the YARN queue to submit the application to; defaults to the default queue
--num-executors NUM          number of executors to launch; default 2
--archives ARCHIVES          comma-separated list of archives to be extracted into the working directory of each executor
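
As an illustration of how these options combine, here is a hypothetical YARN submission that sets executor count, cores, memory, and queue (the queue name and all resource values are made up for the example):

./spark-submit --master yarn-client --num-executors 4 --executor-cores 2 --executor-memory 2G --driver-memory 1G --queue default --class org.apache.spark.examples.SparkPi ./lib/spark-examples-1.3.0-hadoop2.3.0.jar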


A few points about the help information above are worth emphasizing:

    • Regarding --master and --deploy-mode: under normal circumstances you do not need to configure --deploy-mode; configuring --master with one of the values in the table below is enough. Using a combination such as --master spark://host:port --deploy-mode cluster submits the driver to the cluster, and can then lead to the worker being killed.

Master URL          Meaning
local               Run the Spark application locally with one worker thread.
local[K]            Run the Spark application locally with K worker threads.
local[*]            Run the Spark application locally with all available worker threads.
spark://HOST:PORT   Connect to a Spark Standalone cluster and run the application on it.
mesos://HOST:PORT   Connect to a Mesos cluster and run the application on it.
yarn-client         Connect to a YARN cluster in client mode; the cluster location is taken from the HADOOP_CONF_DIR environment variable, and the driver runs on the client.
yarn-cluster        Connect to a YARN cluster in cluster mode; the cluster location is taken from the HADOOP_CONF_DIR environment variable, and the driver also runs in the cluster.
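
For instance, to run the same example entirely locally on two worker threads (a sketch reusing the jar path from Part One):

./spark-submit --master local[2] --class org.apache.spark.examples.SparkPi ./lib/spark-examples-1.3.0-hadoop2.3.0.jar
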
    • If you use --properties-file, the properties defined in that file do not need to be passed to spark-submit again. For example, if spark.master is defined in conf/spark-defaults.conf, you can omit --master. The priority of Spark properties is: SparkConf > command-line parameters > configuration file. See the Spark 1.0.0 property configuration documentation for details.
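
As a sketch of this, putting the following illustrative lines in conf/spark-defaults.conf:

spark.master            spark://hadoop3:7077
spark.executor.memory   1g

would let the earlier example be submitted without --master:

./spark-submit --class org.apache.spark.examples.SparkPi ./lib/spark-examples-1.3.0-hadoop2.3.0.jar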

    • Unlike earlier versions, Spark 1.0.0 automatically ships the application's own jar and the jars listed in --jars to the cluster.

    • Spark handles file propagation with the following URI schemes:

      • file: — absolute paths and file:// URIs are served by the driver's HTTP file server, and each executor pulls the files back from the driver.

      • hdfs:, http:, https:, ftp: — executors pull files directly from the given URL.

      • local: — the file already exists locally on every executor (for example, shared via NFS), so nothing needs to be pulled back.
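
A sketch of how these schemes can appear on the command line (all paths and host names are made up):

./spark-submit --master spark://hadoop3:7077 --class org.apache.spark.examples.SparkPi --jars hdfs://namenode:9000/libs/extra.jar --files file:///app/conf/app.conf ./lib/spark-examples-1.3.0-hadoop2.3.0.jar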

    • If you need to see where configuration options come from, enable the --verbose option to generate more detailed run information for reference.
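
For example:

./spark-submit --verbose --master local[2] --class org.apache.spark.examples.SparkPi ./lib/spark-examples-1.3.0-hadoop2.3.0.jar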

Reprinted from: Spark 1.0.0 Application Deployment Tool spark-submit

