spark-submit Tool Parameter Description

Source: Internet
Author: User

Spark on YARN: Important Job-Submission Parameters

spark-submit

--master yarn-cluster  # Use cluster deploy mode (the commonly used setting)

--queue XXXX  # Queue the job is submitted to

--num-executors N  # Number of executors

--executor-cores 4  # Number of tasks a single executor can run concurrently; set according to the job, recommended value 2-16 (this is not the number of physical CPUs, and the cluster does not limit CPU usage)

--driver-memory 2g  # Driver memory size; recommended value 2-6g, should not be too large

--executor-memory 5g  # Memory size of a single executor; based on job requirements and concurrency, it should not exceed 30G

--conf spark.storage.memoryFraction=0.1  # Fraction of memory used for caching; 0.1 means 10% (default is 0.6). Set to 0 if the in-memory cache is not used

--conf spark.hadoop.fs.hdfs.impl.disable.cache=true  # Disable caching of HDFS FileSystem instances

--conf spark.default.parallelism=400  # Default number of tasks used by Spark's distributed shuffle operations; the default is the total number of cores on all executor nodes or 2, whichever is larger. Recommended: 2-3 tasks per CPU core

--conf spark.cores.max=100  # (This parameter has no effect in Spark on YARN)

--conf spark.hadoop.mapreduce.input.fileinputformat.split.minsize=134217728  # Adjust the input split size (134217728 bytes = 128 MB)
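Putting the flags above together, a full submission might look like the following dry-run sketch. The jar name, main class, queue name, and executor count are hypothetical placeholders, not values from the source; the command is assembled into a variable and echoed so it can be inspected before running.

```shell
# Dry-run sketch of a Spark on YARN submission using the parameters above.
# my-job.jar, com.example.MyJob, the queue "prod", and "--num-executors 50"
# are hypothetical placeholders.
CMD="spark-submit \
  --master yarn-cluster \
  --queue prod \
  --num-executors 50 \
  --executor-cores 4 \
  --driver-memory 2g \
  --executor-memory 5g \
  --conf spark.storage.memoryFraction=0.1 \
  --conf spark.hadoop.fs.hdfs.impl.disable.cache=true \
  --conf spark.default.parallelism=400 \
  --conf spark.hadoop.mapreduce.input.fileinputformat.split.minsize=134217728 \
  --class com.example.MyJob \
  my-job.jar"

# Print the assembled command; drop the echo to actually submit.
echo "$CMD"
```

Echoing first is a cheap way to verify that all `--conf` keys are spelled correctly (Spark configuration keys are case-sensitive) before occupying cluster resources.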


Parameters accepted when executing spark-submit:

usage: spark-submit [options] <app jar | python file> [app options]

Parameter name              Meaning
--master MASTER_URL         spark://host:port, mesos://host:port, yarn, yarn-cluster, yarn-client, or local
--deploy-mode DEPLOY_MODE   Where the driver program runs: client or cluster
--class CLASS_NAME          Main class of the application, including the package name
--name NAME                 Application name
--jars JARS                 Comma-separated list of third-party jars the driver depends on
--py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python applications
--files FILES               Comma-separated list of files to place in the working directory of each executor
--properties-file FILE      Path of the file to load application properties from; defaults to conf/spark-defaults.conf
--driver-memory MEM         Amount of memory used by the driver program
--driver-java-options       Extra Java options for the driver program
--driver-library-path       Library path for the driver program
--driver-class-path         Class path for the driver program
--executor-memory MEM       Memory per executor; default 1G
--driver-cores NUM          Number of CPUs used by the driver program; Spark standalone mode only
--supervise                 Restart the driver after a failure; Spark standalone mode only
--total-executor-cores NUM  Total number of cores used by all executors; Spark standalone and Spark on Mesos modes only
--executor-cores NUM        Number of cores used per executor; default 1; Spark on YARN mode only
--queue QUEUE_NAME          YARN queue the application is submitted to; default is the "default" queue; Spark on YARN mode only
--num-executors NUM         Number of executors to launch; default 2; Spark on YARN mode only
--archives ARCHIVES         Comma-separated list of archives to extract into each executor's working directory; Spark on YARN mode only
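As a usage sketch for the table above, submitting a Python application with its dependencies in yarn-client mode might look like this. The file names (app.py, deps.zip, settings.conf) and the application name are hypothetical placeholders; as before, the command is echoed rather than executed.

```shell
# Sketch: submit a Python application in yarn-client mode.
# app.py, deps.zip, and settings.conf are hypothetical placeholders.
CMD="spark-submit \
  --master yarn-client \
  --name my-python-app \
  --py-files deps.zip \
  --files settings.conf \
  --num-executors 10 \
  --executor-memory 2g \
  app.py"

# Print the assembled command; drop the echo to actually submit.
echo "$CMD"
```

Note the division of labor: --py-files puts code dependencies on the PYTHONPATH, while --files ships plain data or config files into each executor's working directory.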
