Introduction

Because my job requires working with Spark, I have been studying the relevant material. I now plan to read through part of the official documentation for the latest release, Spark 1.3, for three reasons: first, as a review; second, to learn about the latest developments; and third, as preparation for training my company's team. Reprints are welcome; please cite the source: http://blog.csdn.net/u010967382/article/details/45062381
Original URL: http://spark.apache.org/docs/latest/submitting-applications.html

This document focuses on how to submit a packaged Spark application to a Spark cluster to run.
You first need to package your application together with all of its dependencies into an assembly jar. Note that Spark and Hadoop themselves do not need to be included in the assembly jar as dependencies, because the cluster manager provides these system-level dependencies at run time.
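As an illustration, with an sbt-based project the assembly jar is typically produced with the sbt-assembly plugin; the sketch below assumes such a setup, and the project and jar names are placeholders, not from the original document:

```bash
# Build an assembly jar containing the application and its dependencies.
# Spark and Hadoop should be declared as "provided" in the build file so
# that they are excluded from the assembly (illustrative project layout).
cd my-spark-app
sbt assembly
# The resulting jar, e.g. target/scala-2.10/my-spark-app-assembly-1.0.jar,
# is what you later pass to spark-submit.
ls target/scala-2.10/*-assembly-*.jar
```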
Once the program is packaged, you can submit the application to the Spark cluster through the bin/spark-submit script.
The bin/spark-submit script format is as follows:
```bash
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]
```
Some of the commonly used options include (a concrete example follows the list):
- --class: the entry point of your application, i.e. the class containing the main method;
- --master: the master URL of the cluster (e.g. spark://23.195.26.187:7077); this parameter takes several forms, described in detail below;
- --deploy-mode: determines where your driver program runs. In cluster mode it runs on a worker node inside the cluster; in client mode it runs outside the cluster, on the submitting machine. The default is client mode;
- --conf: an arbitrary Spark configuration property, in key=value format;
- application-jar: the path to the jar containing your application. The path must be globally visible inside the cluster, for instance an hdfs:// path or a file:// path that exists on every node;
- application-arguments: the arguments passed to the main method of your application.
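Putting these options together, here is a sketch of a typical submission. It runs the SparkPi example that ships with Spark; the master URL and the examples jar path are placeholders that you should adjust for your environment:

```bash
# Submit the bundled SparkPi example to a standalone cluster in client mode.
# 23.195.26.187 is the sample master address used above; change it for your cluster.
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://23.195.26.187:7077 \
  --deploy-mode client \
  --conf spark.executor.memory=2g \
  lib/spark-examples-1.3.0-hadoop2.4.0.jar \
  1000   # application argument: number of tasks SparkPi uses to estimate pi
```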
About the --deploy-mode parameter:
- client mode: a common deployment strategy is to submit your application from a gateway machine that is physically co-located with your worker machines (for example, the master node of a standalone cluster). In this scenario, client mode is a good fit. The application's input and output go through the console, so client mode is especially suitable for applications that involve a REPL (read-eval-print loop), such as the Spark shell.
- cluster mode: conversely, if your application is submitted from a machine far away from the worker machines (for example, from your laptop), cluster mode is commonly used to minimize network latency between the driver and the executors, as the sketch below shows.
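For contrast with the client-mode example above, a cluster-mode submission might look like the following sketch; the driver then runs on one of the cluster's worker nodes rather than on your laptop (the class name, jar path, and host names are placeholders):

```bash
# Cluster mode on a standalone cluster: the driver is launched inside the
# cluster, minimizing network latency between driver and executors.
# --supervise additionally restarts the driver if it exits with a non-zero code.
./bin/spark-submit \
  --class com.example.MyApp \
  --master spark://23.195.26.187:7077 \
  --deploy-mode cluster \
  --supervise \
  hdfs://namenode:9000/jars/my-app-assembly-1.0.jar \
  arg1 arg2
```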
About the --master parameter (example invocations follow the table):
| Master URL | Meaning |
| --- | --- |
| local | Run Spark locally with one worker thread (i.e. no parallelism at all). |
| local[K] | Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine). |
| local[*] | Run Spark locally with as many worker threads as logical cores on your machine. |
| spark://HOST:PORT | Connect to the given Spark standalone cluster master. The port must be whichever one your master is configured to use, which is 7077 by default. |
| mesos://HOST:PORT | Connect to the given Mesos cluster. The port must be whichever one you have configured to use, which is 5050 by default. Or, for a Mesos cluster using ZooKeeper, use mesos://zk://... . |
| yarn-client | Connect to a YARN cluster in client mode. The cluster location will be found based on the HADOOP_CONF_DIR variable. |
| yarn-cluster | Connect to a YARN cluster in cluster mode. The cluster location is found based on HADOOP_CONF_DIR. |
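The following sketch illustrates how several of these master URL forms appear on the command line; the class name, jar path, and host names are placeholders:

```bash
# Local mode with 4 worker threads (quoted so the shell does not glob [4]):
./bin/spark-submit --class com.example.MyApp --master "local[4]" my-app.jar

# Standalone cluster master on the default port:
./bin/spark-submit --class com.example.MyApp --master spark://host:7077 my-app.jar

# YARN in cluster mode; HADOOP_CONF_DIR must point at the Hadoop config directory:
export HADOOP_CONF_DIR=/etc/hadoop/conf
./bin/spark-submit --class com.example.MyApp --master yarn-cluster my-app.jar
```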
The spark-submit script reads configuration properties from conf/spark-defaults.conf by default, so properties set in that file do not need to be specified again on the spark-submit command line. For example, if the spark.master property is already set in conf/spark-defaults.conf, you can omit the --master parameter when calling spark-submit.
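As an illustration, a minimal conf/spark-defaults.conf might look like the sketch below; the values are examples, not recommendations. With spark.master set here, the --master option can be dropped from the spark-submit command line:

```
# conf/spark-defaults.conf: whitespace-separated key/value pairs
spark.master            spark://23.195.26.187:7077
spark.executor.memory   2g
spark.eventLog.enabled  true
```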
"Gandalf" Spark1.3.0 submitting applications Official document highlights