Spark-submit: Usage and Description


One: Commands

1. Submit a job to Spark Standalone in client mode.

./spark-submit --master spark://hadoop3:7077 --deploy-mode client --class org.apache.spark.examples.SparkPi ./lib/spark-examples-1.3.0-hadoop2.3.0.jar

With --deploy-mode client, the driver program runs in a process on the submitting node. With --deploy-mode cluster, the driver program runs directly on one of the workers.
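
For comparison, a cluster-mode submission to the same standalone master is sketched below (the host name and jar path are the placeholders from the example above; the jar would need to be reachable from the worker that ends up running the driver):

./spark-submit --master spark://hadoop3:7077 --deploy-mode cluster --class org.apache.spark.examples.SparkPi ./lib/spark-examples-1.3.0-hadoop2.3.0.jar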

2. Submit a job to Spark on YARN in client mode.

./spark-submit --master yarn --deploy-mode client --class org.apache.spark.examples.SparkPi ./lib/spark-examples-1.3.0-hadoop2.3.0.jar
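
Similarly, a cluster-mode submission to YARN can be sketched as follows (in Spark 1.x this was typically written with the yarn-cluster master; HADOOP_CONF_DIR must point at the cluster configuration):

./spark-submit --master yarn-cluster --class org.apache.spark.examples.SparkPi ./lib/spark-examples-1.3.0-hadoop2.3.0.jar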

Two: Spark 1.0.0 Application Deployment Tool spark-submit

As Spark came to be used more and more widely, the need for an application deployment tool supporting multiple resource managers became increasingly urgent. Spark 1.0.0 addresses this need: starting with Spark 1.0.0, Spark provides an easy-to-use application deployment tool, bin/spark-submit, for quickly deploying Spark applications on local, Standalone, YARN, and Mesos.

1: Usage instructions
Go to the $SPARK_HOME directory and run bin/spark-submit --help to get help for the command.
hadoop@wyy:/app/hadoop/spark100$ bin/spark-submit --help
Usage: spark-submit [options] <app jar | python file> [app options]
Options:
--master MASTER_URL          spark://host:port, mesos://host:port, yarn, or local
--deploy-mode DEPLOY_MODE    where the driver runs: client (on the submitting machine) or cluster (inside the cluster)
--class CLASS_NAME           the main class of the application
--name NAME                  the name of the application
--jars JARS                  comma-separated list of local jars to include on the driver and executor classpaths
--py-files PY_FILES          comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python applications
--files FILES                comma-separated list of files to be placed in the working directory of each executor
--properties-file FILE       file from which to load application properties; defaults to conf/spark-defaults.conf
--driver-memory MEM          memory for the driver; default 512M
--driver-java-options        extra Java options to pass to the driver
--driver-library-path        extra library path entries to pass to the driver
--driver-class-path          extra classpath entries for the driver; jars added with --jars are automatically included
--executor-memory MEM        memory per executor; default 1G

Spark Standalone with cluster deploy mode only:
--driver-cores NUM           number of cores used by the driver; default 1
--supervise                  if set, restart the driver on failure

Spark Standalone and Mesos only:
--total-executor-cores NUM   total number of cores for all executors

YARN only:
--executor-cores NUM         number of cores per executor; default 1
--queue QUEUE_NAME           the YARN queue to submit the application to; defaults to the default queue
--num-executors NUM          number of executors to launch; default 2
--archives ARCHIVES          comma-separated list of archives to be extracted into the working directory of each executor
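
As an illustration of how these options combine, here is a hypothetical YARN submission that sets executor count, cores, memory, and queue (the queue name and all resource values are made up for the example):

./spark-submit --master yarn-client --num-executors 4 --executor-cores 2 --executor-memory 2G --driver-memory 1G --queue default --class org.apache.spark.examples.SparkPi ./lib/spark-examples-1.3.0-hadoop2.3.0.jar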


A few points about the help information above are worth emphasizing:

    • Regarding --master and --deploy-mode: under normal circumstances you do not need to configure --deploy-mode; configuring --master with one of the values in the table below is enough. Using a combination such as --master spark://host:port --deploy-mode cluster submits the driver to the cluster, and can then lead to the worker being killed.

Master URL          Meaning
local               Run the Spark application locally with one worker thread.
local[K]            Run the Spark application locally with K worker threads.
local[*]            Run the Spark application locally with all available worker threads.
spark://HOST:PORT   Connect to a Spark Standalone cluster and run the application on it.
mesos://HOST:PORT   Connect to a Mesos cluster and run the application on it.
yarn-client         Connect to a YARN cluster in client mode; the cluster location is taken from the HADOOP_CONF_DIR environment variable, and the driver runs on the client.
yarn-cluster        Connect to a YARN cluster in cluster mode; the cluster location is taken from the HADOOP_CONF_DIR environment variable, and the driver also runs in the cluster.
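
For instance, to run the same example entirely locally on two worker threads (a sketch reusing the jar path from Part One):

./spark-submit --master local[2] --class org.apache.spark.examples.SparkPi ./lib/spark-examples-1.3.0-hadoop2.3.0.jar
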
    • If you use --properties-file, the properties defined in that file do not need to be passed to spark-submit again. For example, if spark.master is defined in conf/spark-defaults.conf, you can omit --master. The priority of Spark properties is: SparkConf > command-line parameters > configuration file. See the Spark 1.0.0 property configuration documentation for details.
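
As a sketch of this, putting the following illustrative lines in conf/spark-defaults.conf:

spark.master            spark://hadoop3:7077
spark.executor.memory   1g

would let the earlier example be submitted without --master:

./spark-submit --class org.apache.spark.examples.SparkPi ./lib/spark-examples-1.3.0-hadoop2.3.0.jar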

    • Unlike earlier versions, Spark 1.0.0 automatically ships the application's own jar and the jars listed in --jars to the cluster.

    • Spark handles file propagation with the following URI schemes:

      • file: — absolute paths and file:// URIs are served by the driver's HTTP file server, and each executor pulls the files back from the driver.

      • hdfs:, http:, https:, ftp: — executors pull files directly from the given URL.

      • local: — the file already exists locally on every executor (for example, shared via NFS), so nothing needs to be pulled back.
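
A sketch of how these schemes can appear on the command line (all paths and host names are made up):

./spark-submit --master spark://hadoop3:7077 --class org.apache.spark.examples.SparkPi --jars hdfs://namenode:9000/libs/extra.jar --files file:///app/conf/app.conf ./lib/spark-examples-1.3.0-hadoop2.3.0.jar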

    • If you need to see where configuration options come from, enable the --verbose option to generate more detailed run information for reference.
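
For example:

./spark-submit --verbose --master local[2] --class org.apache.spark.examples.SparkPi ./lib/spark-examples-1.3.0-hadoop2.3.0.jar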

Reprinted from: Spark 1.0.0 Application Deployment Tool spark-submit

