Part One: Example submit commands
1. Submit a job to a Spark standalone cluster in client mode:
./spark-submit --master spark://hadoop3:7077 --deploy-mode client --class org.apache.spark.examples.SparkPi ./lib/spark-examples-1.3.0-hadoop2.3.0.jar
With --deploy-mode client, the driver program runs in a process on the node that submitted the job. With --deploy-mode cluster, the driver program runs directly on a worker inside the cluster.
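As a sketch, the same SparkPi job can be submitted in cluster deploy mode so the driver runs on a worker instead of the submitting machine (host name and jar path are taken from the example above and may differ on your cluster):

```shell
# Cluster deploy mode: the driver is handed to the standalone master and
# launched on one of the workers; the submitting shell can then exit.
SUBMIT_CMD="./spark-submit \
  --master spark://hadoop3:7077 \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  ./lib/spark-examples-1.3.0-hadoop2.3.0.jar"
echo "$SUBMIT_CMD"
```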
2. Submit a job to Spark on YARN in client mode:
./spark-submit --master yarn --deploy-mode client --class org.apache.spark.examples.SparkPi ./lib/spark-examples-1.3.0-hadoop2.3.0.jar
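The YARN variant in cluster mode can be sketched the same way; note that no host:port is needed in --master, because YARN's location comes from the HADOOP_CONF_DIR environment variable (jar path as in the example above):

```shell
# YARN cluster mode: both the driver and the executors run inside the
# YARN cluster located via HADOOP_CONF_DIR.
SUBMIT_CMD="./spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  ./lib/spark-examples-1.3.0-hadoop2.3.0.jar"
echo "$SUBMIT_CMD"
```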
Part Two: The Spark 1.0.0 application deployment tool, spark-submit
As Spark is used more and more widely, the need for an application deployment tool that supports multiple resource managers has become increasingly urgent. Spark 1.0.0 addresses this: starting with that release, Spark ships an easy-to-use deployment tool, bin/spark-submit, for quickly deploying Spark applications on local, standalone, YARN, and Mesos clusters.
1. Usage
Go to the $SPARK_HOME directory and run bin/spark-submit --help to see the command's help text:
hadoop@wyy:/app/hadoop/spark100$ bin/spark-submit --help
Usage: spark-submit [options] <app jar | python file> [app options]
Options:
--master MASTER_URL          spark://host:port, mesos://host:port, yarn, or local
--deploy-mode DEPLOY_MODE    Where the driver runs: "client" runs it on the submitting machine, "cluster" runs it inside the cluster
--class CLASS_NAME           The main class of the application (for Java/Scala apps)
--name NAME                  The name of the application
--jars JARS                  Comma-separated list of local jars to include on the driver and executor classpaths
--py-files PY_FILES          Comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python applications
--files FILES                Comma-separated list of files to be placed in the working directory of each executor
--properties-file FILE       Path to a file to load application properties from; default is conf/spark-defaults.conf
--driver-memory MEM          Memory for the driver; default is 512M
--driver-java-options        Extra Java options to pass to the driver
--driver-library-path        Extra library path entries to pass to the driver
--driver-class-path          Extra classpath entries for the driver; jars added with --jars are automatically included
--executor-memory MEM        Memory per executor; default is 1G

Spark standalone with cluster deploy mode only:
--driver-cores NUM           Number of cores used by the driver; default is 1
--supervise                  If given, restart the driver on failure

Spark standalone and Mesos only:
--total-executor-cores NUM   Total number of cores for all executors

YARN only:
--executor-cores NUM         Number of cores per executor; default is 1
--queue QUEUE_NAME           The YARN queue to submit to; default is "default"
--num-executors NUM          Number of executors to launch; default is 2
--archives ARCHIVES          Comma-separated list of archives to be extracted into the working directory of each executor
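Putting several of these options together, a typical YARN submission might look like the following sketch (the application name, memory sizes, queue, and jar path are illustrative and should be adapted to your cluster):

```shell
# A fuller spark-submit invocation combining the common options above.
SUBMIT_CMD="./bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --name SparkPi-demo \
  --driver-memory 512M \
  --executor-memory 1G \
  --executor-cores 1 \
  --num-executors 2 \
  --queue default \
  ./lib/spark-examples-1.3.0-hadoop2.3.0.jar"
echo "$SUBMIT_CMD"
```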
A few points about this help text are worth emphasizing:
Regarding --master and --deploy-mode: in normal circumstances you do not need to set --deploy-mode; configuring --master with one of the values below is enough. Be aware that submitting with a combination like --master spark://host:port --deploy-mode cluster hands the driver over to the cluster, and the worker running it may then be killed.
Master URL         | Meaning
local              | Run the Spark application locally with one worker thread
local[K]           | Run the Spark application locally with K worker threads
local[*]           | Run the Spark application locally with as many worker threads as logical cores
spark://HOST:PORT  | Connect to a Spark standalone cluster and run the application on that cluster
mesos://HOST:PORT  | Connect to a Mesos cluster and run the application on that cluster
yarn-client        | Connect to a YARN cluster in client mode; the cluster location is taken from the HADOOP_CONF_DIR environment variable, and the driver runs on the client
yarn-cluster       | Connect to a YARN cluster in cluster mode; the cluster location is taken from the HADOOP_CONF_DIR environment variable, and the driver also runs inside the cluster
If you use --properties-file, the properties defined in that file do not have to be repeated on the spark-submit command line; for example, if spark.master is defined in conf/spark-defaults.conf, you can omit --master. The priority of Spark properties is: SparkConf > command-line options > configuration file. See the Spark 1.0.0 property configuration article for details.
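As a minimal sketch, a conf/spark-defaults.conf carrying the properties mentioned above might look like this (the values are illustrative; the keys are standard Spark property names). With spark.master defined here, spark-submit can be invoked without --master:

```shell
# Write an illustrative spark-defaults.conf (to /tmp for demonstration;
# the real file lives at $SPARK_HOME/conf/spark-defaults.conf).
cat > /tmp/spark-defaults.conf <<'EOF'
spark.master            spark://hadoop3:7077
spark.driver.memory     512m
spark.executor.memory   1g
EOF
cat /tmp/spark-defaults.conf
```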
Unlike previous versions, Spark 1.0.0 automatically distributes the application's own jar and the jars listed with --jars to the cluster.
Spark uses the following URI schemes when propagating files:
file:// — absolute paths with the file:// scheme are served by the driver's HTTP file server, and each executor pulls the file back from the driver.
hdfs:, http:, https:, ftp: — executors pull the file directly from the given URL.
local: — the file already exists locally on every executor node (or is shared via NFS), so nothing needs to be pulled back.
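The three scheme families can be compared side by side in a sketch like this (the paths and the NameNode address are illustrative; each value would be passed to --files or --jars):

```shell
# Each scheme changes how executors obtain the file:
#   file://  - served by the driver's HTTP server, pulled by each executor
#   hdfs://  - pulled directly from HDFS by each executor
#   local:   - assumed to already exist on every executor node
for uri in \
  "file:///opt/conf/app.conf" \
  "hdfs://namenode:9000/shared/app.conf" \
  "local:/opt/conf/app.conf"; do
  echo "--files $uri"
done
```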
If you need to see where a configuration option comes from, turn on the --verbose option to print more detailed run information for reference.
Reprinted from: Spark 1.0.0 Application Deployment Tool spark-submit