"Gandalf" Spark1.3.0 submitting applications Official document highlights


Introduction: Because my job calls for it, I will be embracing Spark, and I have been studying the relevant material. I now plan to read through parts of the official documentation for the newest release, Spark 1.3: first as a review, second to catch up on the latest developments, and third as preparation for training my company's team. Reprinting is welcome; please cite the source: http://blog.csdn.net/u010967382/article/details/45062381
Original URL: http://spark.apache.org/docs/latest/submitting-applications.html
This document focuses on how a packaged Spark application is submitted to a Spark cluster to run.
You first need to package your application together with all of its dependencies into an application jar. Note that Spark and Hadoop themselves do not need to be included in the assembly jar as dependencies, because the cluster manager provides these system-level dependencies at run time.
Once the program is packaged, you can submit the application to the Spark cluster through the bin/spark-submit script.
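To make the "do not include Spark in the jar" point concrete, here is a minimal sketch of the relevant build.sbt lines, assuming an sbt build with the sbt-assembly plugin (the project name and versions are illustrative, not from the original post):

    // build.sbt (illustrative sketch): mark Spark as "provided" so it is
    // excluded from the assembly jar; the cluster supplies it at run time.
    name := "my-spark-app"                       // hypothetical project name
    scalaVersion := "2.10.4"                     // Scala line used by Spark 1.3
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"

Running the sbt-assembly plugin's assembly task then produces a single jar containing your code plus all non-provided dependencies.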

The bin/spark-submit script format is as follows:

    ./bin/spark-submit \
      --class <main-class> \
      --master <master-url> \
      --deploy-mode <deploy-mode> \
      --conf <key>=<value> \
      ... # other options
      <application-jar> \
      [application-arguments]
Some of the commonly used options include (a concrete example follows this list):
  • --class: the application's entry point class, which contains the main method;
  • --master: the master URL for the cluster (e.g. spark://23.195.26.187:7077); this parameter takes several forms, described in detail below;
  • --deploy-mode: determines where your driver program runs; in cluster mode it runs on a worker node inside the cluster, and in client mode it runs outside the cluster on the submitting machine. The default is client mode;
  • --conf: an arbitrary Spark configuration property, in key=value format;
  • application-jar: the path to the application jar package. The path must be globally visible inside the cluster, for instance an HDFS path or a local file path that exists on all nodes;
  • application-arguments: the arguments passed to your application's main method.
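As an illustrative sketch (the class name, master address, jar path, and arguments below are placeholders, not taken from the original post), a concrete submission might look like this:

    # Submit a hypothetical application to a standalone cluster in cluster mode.
    ./bin/spark-submit \
      --class org.example.MyApp \
      --master spark://207.184.161.138:7077 \
      --deploy-mode cluster \
      --conf spark.executor.memory=2g \
      /path/to/my-spark-app-assembly.jar \
      input.txt output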

About the --deploy-mode parameter
    • client mode: a common deployment strategy is to submit your application from a gateway machine that is physically co-located with your worker machines (for example, the master node of the cluster). In this scenario, client mode is a good fit: the driver is launched directly inside the spark-submit process, and the application's input and output are attached to the console. Client mode is therefore especially suitable for applications that involve a REPL (read-eval-print loop), such as the Spark shell.
    • cluster mode: in the other classic scenario, your application is submitted from a machine far from the worker machines (for example, from your laptop); cluster mode is then commonly used to minimize network latency between the driver and the executors. The two modes are contrasted in the sketch below.
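As a brief sketch of the difference (the master address, class, and jar path are placeholders):

    # client mode (the default): the driver runs in the spark-submit
    # process on the submitting machine, with I/O attached to the console.
    ./bin/spark-submit --master spark://207.184.161.138:7077 \
      --deploy-mode client --class org.example.MyApp /path/to/app.jar

    # cluster mode: the driver is shipped into the cluster and runs on
    # a worker node, reducing driver/executor network latency.
    ./bin/spark-submit --master spark://207.184.161.138:7077 \
      --deploy-mode cluster --class org.example.MyApp /path/to/app.jar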

About the --master parameter
The master URL can take the following forms:
  • local: run Spark locally with one worker thread (i.e. no parallelism at all).
  • local[K]: run Spark locally with K worker threads (ideally, set this to the number of cores on your machine).
  • local[*]: run Spark locally with as many worker threads as there are logical cores on your machine.
  • spark://HOST:PORT: connect to the given Spark standalone cluster master. The port must be whichever one your master is configured to use; it is 7077 by default.
  • mesos://HOST:PORT: connect to the given Mesos cluster. The port must be whichever one you have configured to use; it is 5050 by default. For a Mesos cluster using ZooKeeper, use mesos://zk://....
  • yarn-client: connect to a YARN cluster in client mode. The cluster location is found based on the HADOOP_CONF_DIR variable.
  • yarn-cluster: connect to a YARN cluster in cluster mode. The cluster location is found based on the HADOOP_CONF_DIR variable.
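For example, submitting to YARN in cluster mode might look like the following sketch (the configuration directory, class name, and jar path are illustrative assumptions; yarn-cluster is the Spark 1.3-era master value):

    # Point Spark at the Hadoop/YARN configuration (path is an assumption).
    export HADOOP_CONF_DIR=/etc/hadoop/conf
    # Submit in cluster mode; YARN hosts the driver inside the cluster.
    ./bin/spark-submit \
      --class org.example.MyApp \
      --master yarn-cluster \
      /path/to/my-spark-app-assembly.jar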

The spark-submit script reads configuration properties from conf/spark-defaults.conf by default, so properties set in that file do not need to be specified again on the spark-submit command line. For example, if the spark.master property is already set there, you can omit the --master parameter when invoking spark-submit.
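As an illustrative sketch of that file (the values are placeholders; the property names are standard Spark settings):

    # conf/spark-defaults.conf: whitespace-separated key/value pairs
    spark.master            spark://207.184.161.138:7077
    spark.executor.memory   2g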

"Gandalf" Spark1.3.0 submitting applications Official document highlights

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.