This article looks at how memory is allocated in the Spark on YARN deployment mode. Since I have not studied the Spark source code in depth, I work backwards from the logs to the relevant source in order to understand "why this, why that."

Description
Depending on where the driver runs, Spark on YARN has two modes: yarn-client mode and yarn-cluster mode.
When a Spark job runs on YARN, each Spark executor runs as a YARN container. Spark allows multiple tasks to run in the same container.
The following figure shows the job execution flow in yarn-cluster mode (taken from the web):
For configuration parameters related to Spark on YARN, refer to the Spark configuration documentation. This article focuses on memory allocation, so only the following memory-related parameters matter:
spark.driver.memory: default value 512m
spark.executor.memory: default value 512m
spark.yarn.am.memory: default value 512m
spark.yarn.executor.memoryOverhead: default value executorMemory * 0.07, with a minimum of 384
spark.yarn.driver.memoryOverhead: default value driverMemory * 0.07, with a minimum of 384
spark.yarn.am.memoryOverhead: default value AM memory * 0.07, with a minimum of 384
Note: --executor-memory/spark.executor.memory controls the executor heap size, but the JVM also uses memory outside the heap, for example for interned strings and direct byte buffers. The spark.yarn.XXX.memoryOverhead properties determine how much extra memory is requested from YARN for each executor, driver, or AM on top of the heap; the default is max(384, 0.07 * spark.executor.memory). Very large executor heaps often lead to long GC pauses, and 64G is a commonly recommended upper limit for executor memory. The HDFS client also has performance problems with many concurrent threads; a rough estimate is that about 5 concurrent tasks per executor are enough to saturate the write bandwidth.
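To make the overhead rule concrete, here is a minimal sketch of the default overhead calculation described above, assuming the 0.07 factor and the 384 MB floor. This is illustration only, not Spark source; the object and method names are made up.

object OverheadEstimate {
  val MEMORY_OVERHEAD_FACTOR = 0.07
  val MEMORY_OVERHEAD_MIN = 384

  /** Default memoryOverhead in MB for a given heap size in MB. */
  def defaultOverhead(memoryMb: Int): Int =
    math.max((MEMORY_OVERHEAD_FACTOR * memoryMb).toInt, MEMORY_OVERHEAD_MIN)

  def main(args: Array[String]): Unit = {
    println(defaultOverhead(3072))   // 384: 0.07 * 3072 = 215, below the 384 MB floor
    println(defaultOverhead(92160))  // 6451: roughly 6.3 GB for a 90 GB executor
  }
}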
In addition, because the job is submitted to run on YARN, several YARN parameters matter as well (see the YARN memory and CPU configuration documentation):
yarn.app.mapreduce.am.resource.mb: the maximum memory the AM can request, default 1536MB
yarn.nodemanager.resource.memory-mb: the maximum memory a NodeManager can allocate, default 8192MB
yarn.scheduler.minimum-allocation-mb: the minimum resource a container can request at scheduling time, default 1024MB
yarn.scheduler.maximum-allocation-mb: the maximum resource a container can request at scheduling time, default 8192MB

Test
The Spark cluster test environment is:
master: 64G memory, 16-core CPU
worker: 128G memory, 32-core CPU
worker: 128G memory, 32-core CPU
worker: 128G memory, 32-core CPU
worker: 128G memory, 32-core CPU
Note: the YARN cluster is deployed on top of the Spark cluster, with one NodeManager on each worker node. The YARN cluster configuration is as follows:
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>106496</value> <!-- 104G -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>106496</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>2048</value>
</property>
The Spark log level is set to DEBUG, and log4j.logger.org.apache.hadoop is set to WARN to suppress unnecessary output. Modify /etc/spark/conf/log4j.properties:
# Set everything to be logged to the console
log4j.rootCategory=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.apache.hadoop=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
Next, run a test program. The official SparkPi example is used, and the tests below mainly cover client mode; cluster mode can be tested following the same procedure. Run the following command:
spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-client \
--num-executors 4 \
--driver-memory 2g \
--executor-memory 3g \
--executor-cores 4 \
/usr/lib/spark/lib/spark-examples-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar \
100000
Observe the output log (extraneous log omitted):
15/06/08 13:57:01 INFO SparkContext: Running Spark version 1.3.0
15/06/08 13:57:02 INFO SecurityManager: Changing view acls to: root
15/06/08 13:57:02 INFO SecurityManager: Changing modify acls to: root
15/06/08 13:57:03 INFO MemoryStore: MemoryStore started with capacity 1060.3 MB
15/06/08 13:57:04 DEBUG YarnClientSchedulerBackend: ClientArguments called with: --arg bj03-bi-pro-hdpnamenn:51568 --num-executors 4 --num-executors 4 --executor-memory 3g --executor-memory 3g --executor-cores 4 --executor-cores 4 --name Spark Pi
15/06/08 13:57:04 DEBUG YarnClientSchedulerBackend: [actor] handled message (24.52531 ms) ReviveOffers from Actor[akka://sparkDriver/user/CoarseGrainedScheduler#864850679]
15/06/08 13:57:05 INFO Client: Requesting a new application from cluster with 4 NodeManagers
15/06/08 13:57:05 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (106496 MB per container)
15/06/08 13:57:05 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/06/08 13:57:05 INFO Client: Setting up container launch context for our AM
15/06/08 13:57:07 DEBUG Client: ===============================================================================
15/06/08 13:57:07 DEBUG Client: Yarn AM launch context:
15/06/08 13:57:07 DEBUG Client:     user class: N/A
15/06/08 13:57:07 DEBUG Client:     env:
15/06/08 13:57:07 DEBUG Client:         CLASSPATH -> <CPS>/__spark__.jar<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/*<CPS>$HADOOP_COMMON_HOME/lib/*<CPS>$HADOOP_HDFS_HOME/*<CPS>$HADOOP_HDFS_HOME/lib/*<CPS>$HADOOP_MAPRED_HOME/*<CPS>$HADOOP_MAPRED_HOME/lib/*<CPS>$HADOOP_YARN_HOME/*<CPS>$HADOOP_YARN_HOME/lib/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*<CPS>:/usr/lib/spark/lib/spark-assembly.jar::/usr/lib/hadoop/lib/*:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hive/lib/*:/usr/lib/flume-ng/lib/*:/usr/lib/paquet/lib/*:/usr/lib/avro/lib/*
15/06/08 13:57:07 DEBUG Client:         SPARK_DIST_CLASSPATH -> :/usr/lib/spark/lib/spark-assembly.jar::/usr/lib/hadoop/lib/*:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hive/lib/*:/usr/lib/flume-ng/lib/*:/usr/lib/paquet/lib/*:/usr/lib/avro/lib/*
15/06/08 13:57:07 DEBUG Client:         SPARK_YARN_CACHE_FILES_FILE_SIZES -> 97237208
15/06/08 13:57:07 DEBUG Client:         SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1433742899916_0001
15/06/08 13:57:07 DEBUG Client:         SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE
15/06/08 13:57:07 DEBUG Client:         SPARK_USER -> root
15/06/08 13:57:07 DEBUG Client:         SPARK_YARN_MODE -> true
15/06/08 13:57:07 DEBUG Client:         SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1433743027399
15/06/08 13:57:07 DEBUG Client:         SPARK_YARN_CACHE_FILES -> hdfs://mycluster:8020/user/root/.sparkStaging/application_1433742899916_0001/spark-assembly-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar#__spark__.jar
15/06/08 13:57:07 DEBUG Client:     resources:
15/06/08 13:57:07 DEBUG Client:         __spark__.jar -> resource { scheme: "hdfs" host: "mycluster" port: 8020 file: "/user/root/.sparkStaging/application_1433742899916_0001/spark-assembly-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar" } size: 97237208 timestamp: 1433743027399 type: FILE visibility: PRIVATE
15/06/08 13:57:07 DEBUG Client:     command:
15/06/08 13:57:07 DEBUG Client:         /bin/java -server -Xmx512m -Djava.io.tmpdir=/tmp '-Dspark.eventLog.enabled=true' '-Dspark.executor.instances=4' '-Dspark.executor.memory=3g' '-Dspark.executor.cores=4' '-Dspark.driver.port=51568' '-Dspark.serializer=org.apache.spark.serializer.KryoSerializer' '-Dspark.driver.appUIAddress=http://bj03-bi-pro-hdpnamenn:4040' '-Dspark.executor.id=<driver>' '-Dspark.kryo.classesToRegister=scala.collection.mutable.BitSet,scala.Tuple2,scala.Tuple1,org.apache.spark.mllib.recommendation.Rating' '-Dspark.driver.maxResultSize=8g' '-Dspark.jars=file:/usr/lib/spark/lib/spark-examples-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar' '-Dspark.driver.memory=2g' '-Dspark.eventLog.dir=hdfs://mycluster:8020/user/spark/applicationHistory' '-Dspark.app.name=Spark Pi' '-Dspark.fileserver.uri=http://x.x.x.x:49172' '-Dspark.tachyonStore.folderName=spark-81ae0186-8325-40f2-867b-65ee7c922357' -Dspark.yarn.app.container.log.dir=<LOG_DIR> org.apache.spark.deploy.yarn.ExecutorLauncher --arg 'bj03-bi-pro-hdpnamenn:51568' --executor-memory 3072m --executor-cores 4 --num-executors 4 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
15/06/08 13:57:07 DEBUG Client: ===============================================================================
From the log line Will allocate AM container, with 896 MB memory including 384 MB overhead you can see that the AM occupies 896 MB of memory; subtracting the 384 MB of overhead leaves only 512 MB, which is the default value of spark.yarn.am.memory. You can also see that the YARN cluster has 4 NodeManagers and that each container can use at most 106496 MB of memory.
The Yarn AM launch context starts a Java process whose JVM heap is set to 512m, as seen in /bin/java -server -Xmx512m.
Why is the default value used here? Look at the code that prints the log line above, in org.apache.spark.deploy.yarn.Client:
private def verifyClusterResources(newAppResponse: GetNewApplicationResponse): Unit = {
  val maxMem = newAppResponse.getMaximumResourceCapability().getMemory()
  logInfo("Verifying our application has not requested more than the maximum " +
    s"memory capability of the cluster ($maxMem MB per container)")
  val executorMem = args.executorMemory + executorMemoryOverhead
  if (executorMem > maxMem) {
    throw new IllegalArgumentException(s"Required executor memory (${args.executorMemory}" +
      s"+$executorMemoryOverhead MB) is above the max threshold ($maxMem MB) of this cluster!")
  }
  val amMem = args.amMemory + amMemoryOverhead
  if (amMem > maxMem) {
    throw new IllegalArgumentException(s"Required AM memory (${args.amMemory}" +
      s"+$amMemoryOverhead MB) is above the max threshold ($maxMem MB) of this cluster!")
  }
  logInfo("Will allocate AM container, with %d MB memory including %d MB overhead".format(
    amMem, amMemoryOverhead))
}
The args.amMemory value comes from the ClientArguments class, which validates the command-line arguments:
private def validateArgs(): Unit = {
  if (numExecutors <= 0) {
    throw new IllegalArgumentException(
      "You must specify at least 1 executor!\n" + getUsageMessage())
  }
  if (executorCores < sparkConf.getInt("spark.task.cpus", 1)) {
    throw new SparkException("Executor cores must not be less than " +
      "spark.task.cpus.")
  }
  if (isClusterMode) {
    for (key <- Seq(amMemKey, amMemOverheadKey, amCoresKey)) {
      if (sparkConf.contains(key)) {
        println(s"$key is set but does not apply in cluster mode.")
      }
    }
    amMemory = driverMemory
    amCores = driverCores
  } else {
    for (key <- Seq(driverMemOverheadKey, driverCoresKey)) {
      if (sparkConf.contains(key)) {
        println(s"$key is set but does not apply in client mode.")
      }
    }
    sparkConf.getOption(amMemKey).map(Utils.memoryStringToMb).foreach { mem =>
      amMemory = mem
    }
    sparkConf.getOption(amCoresKey).map(_.toInt).foreach { cores =>
      amCores = cores
    }
  }
}
From the code above, when isClusterMode is true, args.amMemory takes the value of driverMemory; otherwise it is read from spark.yarn.am.memory, falling back to the default of 512m when that property is not set. isClusterMode is true when userClass is not null (def isClusterMode: Boolean = userClass != null), that is, when a --class argument is passed through. As the following log shows, the ClientArguments invocation contains no such argument.
15/06/08 13:57:04 DEBUG YarnClientSchedulerBackend: ClientArguments called with: --arg bj03-bi-pro-hdpnamenn:51568 --num-executors 4 --num-executors 4 --executor-memory 3g --executor-memory 3g --executor-cores 4 --executor-cores 4 --name Spark Pi
Therefore, to set the amount of memory requested by the AM, either use cluster mode or, in client mode, set the spark.yarn.am.memory property manually with --conf, for example:
spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-client \
--num-executors 4 \
--driver-memory 2g \
--executor-memory 3g \
--executor-cores 4 \
--conf spark.yarn.am.memory=1024m \
/usr/lib/spark/lib/spark-examples-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar \
100000
Open the Yarn management interface and you can see:
A. The Spark Pi application launched 5 containers, using 18G of memory and 5 CPU cores.
B. YARN started one container for the AM, occupying 2048M of memory.
C. YARN launched 4 containers to run tasks, each occupying 4096M of memory.
Why does 2G + 4G * 4 = 18G? The first container requests only 2G because our program asked for only 512m for the AM, and yarn.scheduler.minimum-allocation-mb forces a minimum request of 2G. As for the other containers: we set executor-memory to 3G, so why does each container occupy 4096M of memory?
To find the pattern, I tested several values and recorded the container memory requested for executor-memory of 3G, 4G, 5G, and 6G:
executor-memory=3G: 2G + 4G * 4 = 18G
executor-memory=4G: 2G + 6G * 4 = 26G
executor-memory=5G: 2G + 6G * 4 = 26G
executor-memory=6G: 2G + 8G * 4 = 34G
To explain this, I went back to the source code. Following the class path org.apache.spark.deploy.yarn.ApplicationMaster -> YarnRMClient -> YarnAllocator, I found this code in YarnAllocator:
// Executor memory in MB.
protected val executorMemory = args.executorMemory
// Additional memory overhead.
protected val memoryOverhead: Int = sparkConf.getInt("spark.yarn.executor.memoryOverhead",
  math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toInt, MEMORY_OVERHEAD_MIN))
// Number of cores per executor.
protected val executorCores = args.executorCores
// Resource capability requested for each executor
private val resource = Resource.newInstance(executorMemory + memoryOverhead, executorCores)
Since I have not looked into the YARN source in detail, my guess is that the container size is computed from executorMemory + memoryOverhead and then rounded up so that each container is an integer multiple of yarn.scheduler.minimum-allocation-mb. When executor-memory=3g, executorMemory + memoryOverhead is 3g + 384m = 3456m, so the requested container size is yarn.scheduler.minimum-allocation-mb * 2 = 4096m = 4g, and so on.
Note: YARN always rounds the memory requirement up to a multiple of yarn.scheduler.minimum-allocation-mb, which by default is 1024 MB (1GB). Spark adds an overhead to SPARK_EXECUTOR_MEMORY/SPARK_DRIVER_MEMORY before asking YARN for the amount.
Also note how memoryOverhead is calculated: when executorMemory is very large, memoryOverhead grows well beyond 384m, and the memory requested for the corresponding container grows with it. For example, with executorMemory set to 90G, memoryOverhead is math.max(0.07 * 90G, 384m) = 6.3G, and the corresponding container requests about 98G of memory.
Coming back to why the AM container was allocated 2G of memory: 512 + 384 = 896, which is less than 2G, so 2G is allocated. You can change the value of spark.yarn.am.memory and observe the effect.
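As a rough check of this rule, here is a minimal sketch (hypothetical helper names, not Spark or YARN source) that applies the overhead formula and rounds the result up to a multiple of yarn.scheduler.minimum-allocation-mb, reproducing the container sizes observed above:

object ContainerSizeEstimate {
  // yarn.scheduler.minimum-allocation-mb in the test cluster described above.
  val minAllocationMb = 2048

  def overhead(memoryMb: Int): Int = math.max((0.07 * memoryMb).toInt, 384)

  // Assumed rule: YARN rounds (memory + overhead) up to the next multiple of the minimum allocation.
  def containerMb(memoryMb: Int): Int = {
    val requested = memoryMb + overhead(memoryMb)
    ((requested + minAllocationMb - 1) / minAllocationMb) * minAllocationMb
  }

  def main(args: Array[String]): Unit = {
    Seq(3072, 4096, 5120, 6144).foreach { m =>
      println(s"executor-memory=${m}m -> container of ${containerMb(m)}m")  // 4096, 6144, 6144, 8192
    }
    println(s"AM with 512m -> container of ${containerMb(512)}m")  // 896m rounds up to 2048m
  }
}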
Open the Spark web UI at http://ip:4040 to see the memory used by the driver and executors:
The figure shows that each executor occupies 1566.7 MB of memory. How is this calculated? Following the article Spark on Yarn: Where Have the Memory Gone?, totalExecutorMemory is computed as follows:
// yarn/common/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala
val MEMORY_OVERHEAD_FACTOR = 0.07
val MEMORY_OVERHEAD_MIN = 384

// yarn/common/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
protected val memoryOverhead: Int = sparkConf.getInt("spark.yarn.executor.memoryOverhead",
  math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toInt, MEMORY_OVERHEAD_MIN))
......
val totalExecutorMemory = executorMemory + memoryOverhead
numPendingAllocate.addAndGet(missing)
logInfo(s"Will allocate $missing executor containers, each with $totalExecutorMemory MB " +
  s"memory including $memoryOverhead MB overhead")
Here executor-memory is set to 3G, so memoryOverhead is math.max(0.07 * 3072, 384) = 384, and the maximum available storage memory is computed by the following code:
// core/src/main/scala/org/apache/spark/storage/BlockManager.scala

/** Return the total amount of storage memory available. */
private def getMaxMemory(conf: SparkConf): Long = {
  val memoryFraction = conf.getDouble("spark.storage.memoryFraction", 0.6)
  val safetyFraction = conf.getDouble("spark.storage.safetyFraction", 0.9)
  (Runtime.getRuntime.maxMemory * memoryFraction * safetyFraction).toLong
}
That is, with executor-memory set to 3G, the storage memory per executor is approximately 3072m * 0.6 * 0.9 = 1658.88m. Note: the calculation actually multiplies by Runtime.getRuntime.maxMemory, which is less than 3072m, hence the smaller value shown in the UI.
The driver occupies 1060.3 MB in the figure above. With driver-memory set to 2G, the storage memory in the driver is roughly 2048m * 0.6 * 0.9 = 1105.92m. Again, the actual calculation uses Runtime.getRuntime.maxMemory, which is less than 2048m.
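For reference, here is a minimal sketch of the storage-memory arithmetic above (illustration only, not Spark source). It uses the configured heap size directly, whereas Spark uses Runtime.getRuntime.maxMemory, which is a bit smaller, so the real numbers in the UI come out lower:

object StorageMemoryEstimate {
  def storageMb(heapMb: Long,
                memoryFraction: Double = 0.6,  // spark.storage.memoryFraction
                safetyFraction: Double = 0.9   // spark.storage.safetyFraction
               ): Double =
    heapMb * memoryFraction * safetyFraction

  def main(args: Array[String]): Unit = {
    println(storageMb(3072))  // 1658.88 MB for a 3 GB executor heap (UI shows about 1566.7 MB)
    println(storageMb(2048))  // 1105.92 MB for a 2 GB driver heap (UI shows about 1060.3 MB)
  }
}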
Now look at the startup command of the CoarseGrainedExecutorBackend process on a worker node:
$ jps
46841 Worker
21894 CoarseGrainedExecutorBackend
9345
21816 ExecutorLauncher
43369
24300 NodeManager
38012 JournalNode
36929 QuorumPeerMain
22909 Jps

$ ps -ef | grep 21894
nobody 21894 21892 99 17:28 ? 00:04:49 /usr/java/jdk1.7.0_71/bin/java -server -XX:OnOutOfMemoryError=kill %p -Xms3072m -Xmx3072m -Djava.io.tmpdir=/data/yarn/local/usercache/root/appcache/application_1433742899916_0069/container_1433742899916_0069_01_000003/tmp -Dspark.driver.port=60235 -Dspark.yarn.app.container.log.dir=/data/yarn/logs/application_1433742899916_0069/container_1433742899916_0069_01_000003 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@bj03-bi-pro-hdpnamenn:60235/user/CoarseGrainedScheduler --executor-id 2 --hostname X.X.X.X --cores 4 --app-id application_1433742899916_0069 --user-class-path file:/data/yarn/local/usercache/root/appcache/application_1433742899916_0069/container_1433742899916_0069_01_000003/__app__.jar
You can see that each CoarseGrainedExecutorBackend process is started with a 3072m heap. If we want to watch the JVM of each executor at runtime, we can enable JMX by adding the following line to /etc/spark/conf/spark-defaults.conf:
spark.executor.extraJavaOptions -Dcom.sun.management.jmxremote.port=1099 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
You can then monitor the JVM heap with jconsole, which makes it easier to debug memory sizes.

Summary
To summarize: in client mode, the memory requested for the AM container is determined by spark.yarn.am.memory plus spark.yarn.am.memoryOverhead, and the memory requested for each executor container is determined by spark.executor.memory plus spark.yarn.executor.memoryOverhead. The storage memory shown for the driver and the executors is roughly their heap size (driver-memory or executor-memory) multiplied by 0.6 * 0.9 = 0.54. In YARN, the memory actually allocated to a container is always an integer multiple of yarn.scheduler.minimum-allocation-mb.
The following diagram shows the Spark on YARN memory structure, taken from How-to: Tune Your Apache Spark Jobs (Part 2):