Spark on YARN: Important Job Submission Parameters
spark-submit
--master yarn-cluster # use cluster deploy mode (the setting generally used)
--queue XXXX # queue the job is submitted to
--num-executors <N> # number of executors
--executor-cores 4 # number of tasks a single executor can run concurrently; set per job, recommended value 2-16 (this is not a physical CPU count; the cluster does not limit CPU usage)
--driver-memory 2g # driver memory size; recommended value 2-6g, should not be too large
--executor-memory 5g # memory of a single executor; set according to job requirements and concurrency, should not exceed 30G
--conf spark.storage.memoryFraction=0.1 # fraction of memory used for caching; 0.1 means 10% (default is 0.6); set to 0 if the memory cache is not used
--conf spark.hadoop.fs.hdfs.impl.disable.cache=true # disable caching of HDFS FileSystem instances
--conf spark.default.parallelism=400 # default number of tasks used in Spark's distributed shuffle; the default is the larger of 2 and the total number of cores across all executor nodes; a recommended value is 2-3 tasks per CPU core
--conf spark.cores.max=100 # has no effect in Spark on YARN
--conf spark.hadoop.mapreduce.input.fileinputformat.split.minsize=134217728 # minimum input split size (here 128 MB)
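Putting the flags above together, a full submission might look like the sketch below. The jar path, main class, and queue name are placeholders, and the parallelism value follows the 2-3 tasks-per-core guideline mentioned above:

```shell
# Hypothetical submission; com.example.MyJob, my-job.jar, and prod_queue
# are placeholders, not names from any real deployment.
spark-submit \
  --master yarn-cluster \
  --queue prod_queue \
  --num-executors 20 \
  --executor-cores 4 \
  --driver-memory 2g \
  --executor-memory 5g \
  --conf spark.storage.memoryFraction=0.1 \
  --conf spark.default.parallelism=240 \
  --class com.example.MyJob \
  my-job.jar
```

Here 240 = 20 executors x 4 cores x 3 tasks per core, i.e. the upper end of the recommended 2-3 tasks per core.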
Parameters accepted by spark-submit:
usage: spark-submit [options] <app jar | python file> [app options]
Parameter name | Meaning
--master MASTER_URL | can be spark://host:port, mesos://host:port, yarn, yarn-cluster, yarn-client, or local
--deploy-mode DEPLOY_MODE | where the driver program runs: client or cluster
--class CLASS_NAME | main class name, including the package name
--name NAME | application name
--jars JARS | comma-separated third-party jars the driver depends on
--py-files PY_FILES | comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python applications
--files FILES | comma-separated list of files to place in the working directory of each executor
--properties-file FILE | path to a file of application properties; defaults to conf/spark-defaults.conf
--driver-memory MEM | memory used by the driver program
--driver-java-options | extra Java options passed to the driver
--driver-library-path | library path for the driver program
--driver-class-path | class path for the driver program
--executor-memory MEM | executor memory size, default 1G
--driver-cores NUM | number of cores used by the driver program; standalone mode only
--supervise | restart the driver after a failure; standalone mode only
--total-executor-cores NUM | total number of cores used by all executors; standalone and Mesos modes only
--executor-cores NUM | number of cores used per executor, default 1; YARN mode only
--queue QUEUE_NAME | YARN queue to which the application is submitted, default is the "default" queue; YARN mode only
--num-executors NUM | number of executors to launch, default 2; YARN mode only
--archives ARCHIVES | YARN mode only
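As a usage example for the YARN-specific and file-distribution options in the table, a Python application might be submitted as sketched below. All file names (deps.zip, app.conf, env.tar.gz, main.py) are placeholders:

```shell
# Hypothetical submission of a Python app in yarn client mode;
# every file name here is a placeholder.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --name my_py_app \
  --py-files deps.zip \
  --files app.conf \
  --archives env.tar.gz \
  main.py arg1 arg2
```

--py-files ships Python dependencies onto the executors' PYTHONPATH, --files places app.conf in each executor's working directory, and --archives distributes and extracts the archive there as well.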