Introduction

Because my job requires working with Spark, I have been studying the relevant material. I now plan to read through part of the official documentation for the latest release, Spark 1.3, for three reasons: first, as a review; second, to learn about the latest developments; and third, as preparation for training my company's team. Reprints are welcome; please cite the source: http://blog.csdn.net/u010967382/article/details/45062381
Original URL: http://spark.apache.org/docs/latest/submitting-applications.html

This document focuses on how to submit a packaged Spark application to a Spark cluster to run.
You first need to package your application together with all of its dependencies into an assembly jar. Note that Spark and Hadoop themselves do not need to be included in the assembly jar as dependencies, because the cluster manager provides these system-level dependencies at run time.
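As an illustration, with an sbt-based project the assembly jar is typically produced with the sbt-assembly plugin; the sketch below assumes such a setup, and the project and jar names are placeholders, not from the original document:

```bash
# Build an assembly jar containing the application and its dependencies.
# Spark and Hadoop should be declared as "provided" in the build file so
# that they are excluded from the assembly (illustrative project layout).
cd my-spark-app
sbt assembly
# The resulting jar, e.g. target/scala-2.10/my-spark-app-assembly-1.0.jar,
# is what you later pass to spark-submit.
ls target/scala-2.10/*-assembly-*.jar
```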
Once the program is packaged, you can submit the application to the Spark cluster through the bin/spark-submit script.
The bin/spark-submit script format is as follows:
```bash
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]
```
Some of the commonly used options include (a concrete example follows the list):
- --class: the entry point of your application, i.e. the class containing the main method;
- --master: the master URL of the cluster (e.g. spark://23.195.26.187:7077); this parameter takes several forms, described in detail below;
- --deploy-mode: determines where your driver program runs. In cluster mode it runs on a worker node inside the cluster; in client mode it runs outside the cluster, on the submitting machine. The default is client mode;
- --conf: an arbitrary Spark configuration property, in key=value format;
- application-jar: the path to the jar containing your application. The path must be globally visible inside the cluster, for instance an hdfs:// path or a file:// path that exists on every node;
- application-arguments: the arguments passed to the main method of your application.
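Putting these options together, here is a sketch of a typical submission. It runs the SparkPi example that ships with Spark; the master URL and the examples jar path are placeholders that you should adjust for your environment:

```bash
# Submit the bundled SparkPi example to a standalone cluster in client mode.
# 23.195.26.187 is the sample master address used above; change it for your cluster.
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://23.195.26.187:7077 \
  --deploy-mode client \
  --conf spark.executor.memory=2g \
  lib/spark-examples-1.3.0-hadoop2.4.0.jar \
  1000   # application argument: number of tasks SparkPi uses to estimate pi
```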
About the --deploy-mode parameter:
- client mode: a common deployment strategy is to submit your application from a gateway machine that is physically co-located with your worker machines (for example, the master node of a standalone cluster). In this scenario, client mode is a good fit. The application's input and output go through the console, so client mode is especially suitable for applications that involve a REPL (read-eval-print loop), such as the Spark shell.
- cluster mode: conversely, if your application is submitted from a machine far away from the worker machines (for example, from your laptop), cluster mode is commonly used to minimize network latency between the driver and the executors, as the sketch below shows.
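For contrast with the client-mode example above, a cluster-mode submission might look like the following sketch; the driver then runs on one of the cluster's worker nodes rather than on your laptop (the class name, jar path, and host names are placeholders):

```bash
# Cluster mode on a standalone cluster: the driver is launched inside the
# cluster, minimizing network latency between driver and executors.
# --supervise additionally restarts the driver if it exits with a non-zero code.
./bin/spark-submit \
  --class com.example.MyApp \
  --master spark://23.195.26.187:7077 \
  --deploy-mode cluster \
  --supervise \
  hdfs://namenode:9000/jars/my-app-assembly-1.0.jar \
  arg1 arg2
```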
About the --master parameter (example invocations follow the table):
| Master URL | Meaning |
| --- | --- |
| local | Run Spark locally with one worker thread (i.e. no parallelism at all). |
| local[K] | Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine). |
| local[*] | Run Spark locally with as many worker threads as logical cores on your machine. |
| spark://HOST:PORT | Connect to the given Spark standalone cluster master. The port must be whichever one your master is configured to use, which is 7077 by default. |
| mesos://HOST:PORT | Connect to the given Mesos cluster. The port must be whichever one you have configured to use, which is 5050 by default. Or, for a Mesos cluster using ZooKeeper, use mesos://zk://... . |
| yarn-client | Connect to a YARN cluster in client mode. The cluster location will be found based on the HADOOP_CONF_DIR variable. |
| yarn-cluster | Connect to a YARN cluster in cluster mode. The cluster location is found based on HADOOP_CONF_DIR. |
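The following sketch illustrates how several of these master URL forms appear on the command line; the class name, jar path, and host names are placeholders:

```bash
# Local mode with 4 worker threads (quoted so the shell does not glob [4]):
./bin/spark-submit --class com.example.MyApp --master "local[4]" my-app.jar

# Standalone cluster master on the default port:
./bin/spark-submit --class com.example.MyApp --master spark://host:7077 my-app.jar

# YARN in cluster mode; HADOOP_CONF_DIR must point at the Hadoop config directory:
export HADOOP_CONF_DIR=/etc/hadoop/conf
./bin/spark-submit --class com.example.MyApp --master yarn-cluster my-app.jar
```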
The spark-submit script reads configuration properties from conf/spark-defaults.conf by default, so properties set in that file do not need to be specified again on the spark-submit command line. For example, if the spark.master property is already set in conf/spark-defaults.conf, you can omit the --master parameter when calling spark-submit.
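As an illustration, a minimal conf/spark-defaults.conf might look like the sketch below; the values are examples, not recommendations. With spark.master set here, the --master option can be dropped from the spark-submit command line:

```
# conf/spark-defaults.conf: whitespace-separated key/value pairs
spark.master            spark://23.195.26.187:7077
spark.executor.memory   2g
spark.eventLog.enabled  true
```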
"Gandalf" Spark1.3.0 submitting applications Official document highlights