Content:
1. The traditional Spark memory management problem;
2. Spark unified memory management;
3. Outlook.
========== the traditional Spark memory management problem ============
Spark memory is divided into three parts:
Execution: shuffles, joins, sorts, aggregations, etc.; by default spark.shuffle.memoryFraction is 0.2.
Storage: persist (cache), large task results, torrent-type broadcast variables, etc.; by default spark.storage.memoryFraction is 0.6.
Other: program objects, metadata, code, etc.; the remaining 0.2 by default.
There is also a memory safety margin (safetyFraction) of 0.8 on each pool: execution and storage can actually use only 80% of their configured fractions, i.e. 0.16 and 0.48 of the heap, respectively.
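The split described above is simple arithmetic. A minimal sketch, using the default fractions quoted in this section (the function name and structure are illustrative, not Spark's API):

```python
# Sketch of the legacy (pre-1.6) static memory split, using the defaults
# quoted above: shuffle 0.2, storage 0.6, and a 0.8 safety margin.
# Parameter names mirror the Spark config keys; the function is illustrative.

def legacy_memory_split(heap_bytes,
                        shuffle_fraction=0.2,   # spark.shuffle.memoryFraction
                        storage_fraction=0.6,   # spark.storage.memoryFraction
                        safety_fraction=0.8):   # safety margin on each pool
    execution = heap_bytes * shuffle_fraction * safety_fraction
    storage = heap_bytes * storage_fraction * safety_fraction
    other = heap_bytes - heap_bytes * (shuffle_fraction + storage_fraction)
    return execution, storage, other

gb = 1024 ** 3
execution, storage, other = legacy_memory_split(10 * gb)
print(execution / gb)  # 1.6 GB usable for shuffle (0.2 * 0.8 of the heap)
print(storage / gb)    # 4.8 GB usable for caching (0.6 * 0.8 of the heap)
```

On a 10GB heap, only 1.6GB is usable for shuffle by default, which is why the tuning below matters.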
If a single machine is weak, the data in execution will constantly spill to disk and shuffle performance will be very slow; the stronger the single machine, the better the result. So when building a cluster, pursue the maximum memory per machine rather than the sheer number of machines.
If there are not many large task results, you can increase the execution fraction appropriately.
Similarly, if storage does no caching, its fraction is a very large waste of memory.
The traditional model of memory allocation therefore places high demands on Spark expertise.
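For a shuffle-heavy job that caches little, the legacy fractions can be rebalanced along the lines sketched above. The values below are illustrative, not recommendations:

```properties
# spark-defaults.conf (legacy, pre-1.6 static model) -- illustrative values
spark.shuffle.memoryFraction   0.5   # grow execution for shuffle-heavy jobs
spark.storage.memoryFraction   0.3   # shrink storage when little is cached
```

The two fractions (plus the implicit "other" remainder) should still sum to at most 1.0.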
From the execution angle, memory allocation involves three managers: ShuffleMemoryManager, TaskMemoryManager, and ExecutorMemoryManager. When a specific task arrives, it could otherwise fill the executor's entire execution memory, so the ShuffleMemoryManager arbitrates memory fairly among the tasks running concurrently.
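To keep one early task from monopolizing execution memory, the legacy ShuffleMemoryManager caps each of N active tasks at 1/N of the pool (and the real implementation blocks a request until the task can get at least 1/(2N)). A simplified, single-threaded sketch of that cap, assuming this policy (the class below is illustrative, not Spark's API):

```python
# Simplified sketch of the legacy shuffle-memory fairness policy:
# with N active tasks, each task may hold at most pool/N bytes.
# The real ShuffleMemoryManager blocks and retries; here we just
# grant whatever is allowed right now and return the granted amount.

class ShuffleMemoryPool:
    def __init__(self, pool_bytes):
        self.pool = pool_bytes
        self.held = {}  # task_id -> bytes currently held

    def try_acquire(self, task_id, num_bytes):
        self.held.setdefault(task_id, 0)          # register the task
        n = len(self.held)                        # active tasks
        max_per_task = self.pool // n             # fairness cap: pool / N
        free = self.pool - sum(self.held.values())
        grant = max(0, min(num_bytes, max_per_task - self.held[task_id], free))
        self.held[task_id] += grant
        return grant

pool = ShuffleMemoryPool(1000)
print(pool.try_acquire("task-0", 1000))  # 1000: alone, may take the whole pool
print(pool.try_acquire("task-1", 1000))  # 0: task-0 holds everything; the real
                                         # manager would block until memory frees
```

The cap shrinks as more tasks register, which is exactly why a lone first task "may fill the executor's memory" until others arrive.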
========== Spark unified memory management ============
UnifiedMemoryManager, introduced in Spark 1.6, always reserves at least 300MB of heap for the system.
/**
 * A [[MemoryManager]] that enforces a soft boundary between execution and storage such that
 * either side can borrow memory from the other.
 *
 * The region shared between execution and storage is a fraction of (the total heap space - 300MB)
 * configurable through `spark.memory.fraction` (default 0.75). The position of the boundary
 * within this space is further determined by `spark.memory.storageFraction` (default 0.5).
 * This means the size of the storage region is 0.75 * 0.5 = 0.375 of the heap space by default.
 *
 * Storage can borrow as much execution memory as is free until execution reclaims its space.
 * When this happens, cached blocks will be evicted from memory until sufficient borrowed
 * memory is released to satisfy the execution memory request.
 *
 * Similarly, execution can borrow as much storage memory as is free. However, execution
 * memory is *never* evicted by storage due to the complexities involved in implementing this.
 * The implication is that attempts to cache blocks may fail if execution has already eaten
 * up most of the storage space, in which case the new blocks will be evicted immediately
 * according to their respective storage levels.
 *
 * @param storageRegionSize Size of the storage region, in bytes.
 *                          This region is not statically reserved; execution can borrow from
 *                          it if necessary. Cached blocks can be evicted only if actual
 *                          storage memory usage exceeds this region.
 */
object UnifiedMemoryManager {

  // Set aside a fixed amount of memory for non-storage, non-execution purposes.
  // This serves a function similar to `spark.memory.fraction`, but guarantees that we reserve
  // sufficient memory for the system even for small heaps. E.g. if we have a 1GB JVM, then
  // the memory used for execution and storage will be (1024 - 300) * 0.75 = 543MB by default.
  private val RESERVED_SYSTEM_MEMORY_BYTES = 300 * 1024 * 1024
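The arithmetic in that source comment can be reproduced in a few lines, assuming the Spark 1.6 defaults it states (300MB reserved, spark.memory.fraction = 0.75, spark.memory.storageFraction = 0.5):

```python
# Reproduce the arithmetic from the Spark 1.6 source comment above.
RESERVED_SYSTEM_MEMORY_BYTES = 300 * 1024 * 1024

def unified_memory_sizes(heap_bytes,
                         memory_fraction=0.75,   # spark.memory.fraction
                         storage_fraction=0.5):  # spark.memory.storageFraction
    usable = heap_bytes - RESERVED_SYSTEM_MEMORY_BYTES
    unified = usable * memory_fraction            # shared execution+storage pool
    storage_region = unified * storage_fraction   # soft boundary, not a hard cap
    return unified, storage_region

mb = 1024 ** 2
unified, storage_region = unified_memory_sizes(1024 * mb)
print(int(unified / mb))  # 543: (1024 - 300) * 0.75, matching the comment
```

Note that `storage_region` is only the position of the soft boundary; either side can grow past it by borrowing, as described next.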
Under unified memory management, when execution memory is insufficient, it borrows from storage, evicting cached blocks if storage has grown beyond its region.
When storage memory is insufficient, however, it cannot force execution to release anything.
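This asymmetry can be sketched as follows: execution may evict cached blocks down to the storage region size, while storage may only take memory that is currently free. A simplified model under those assumptions, not Spark's actual implementation:

```python
# Simplified model of the unified manager's soft boundary.
# Execution can reclaim memory that storage borrowed beyond storage_region
# (by evicting cached blocks); storage can never evict execution.

class UnifiedPool:
    def __init__(self, total, storage_region):
        self.total = total
        self.storage_region = storage_region
        self.execution_used = 0
        self.storage_used = 0

    def free(self):
        return self.total - self.execution_used - self.storage_used

    def acquire_execution(self, num_bytes):
        if num_bytes > self.free():
            # Evict only the portion of storage beyond its region.
            evictable = max(self.storage_used - self.storage_region, 0)
            reclaimed = min(evictable, num_bytes - self.free())
            self.storage_used -= reclaimed  # cached blocks evicted
        granted = min(num_bytes, self.free())
        self.execution_used += granted
        return granted

    def acquire_storage(self, num_bytes):
        # Storage can only use free memory; it never evicts execution.
        granted = min(num_bytes, self.free())
        self.storage_used += granted
        return granted

pool = UnifiedPool(total=1000, storage_region=500)
print(pool.acquire_storage(800))    # 800: storage borrows beyond its 500 region
print(pool.acquire_execution(400))  # 400: 200 free + 200 evicted from storage
print(pool.acquire_storage(100))    # 0: storage cannot evict execution
```

The last call shows the caveat from the source comment: once execution holds the space, new cache attempts simply fail (their blocks are dropped according to their storage levels).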
Teacher Liaoliang's card:
China's first person in Spark
Sina Weibo: http://weibo.com/ilovepains
Public account: DT_Spark
Blog: http://blog.sina.com.cn/ilovepains
Mobile: 18610086859
QQ: 1740415547
Email: [Email protected]
This article is from the "A Flower Proud of the Cold" blog; reprinting is declined.
Liaoliang on Spark performance optimization, part 10: a world-exclusive look at Spark unified memory management!