Liaoliang on Spark Performance Optimization, Part 10: Spark Unified Memory Management

Source: Internet
Author: User
Tags: shuffle

Content:

1. Problems with traditional Spark memory management

2. Spark unified memory management

3. Outlook

========== Problems with traditional Spark memory management ==========

Spark memory is divided into three parts:

Execution: shuffles, joins, sorts, aggregations, etc. Its share is set by spark.shuffle.memoryFraction, default 0.2.

Storage: persist (cache), large task results, torrent-type broadcasts, etc. Its share is set by spark.storage.memoryFraction, default 0.6.

Other: program objects, metadata, code. Default share 0.2.

On top of this there is a safety fraction, so execution and storage cannot use their whole configured share. spark.shuffle.safetyFraction defaults to 0.8, so execution effectively gets 0.2 × 0.8 = 0.16 of the heap. spark.storage.safetyFraction defaults to 0.9, so storage effectively gets 0.6 × 0.9 = 0.54.
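The arithmetic behind an effective share can be sketched as follows. This is only an illustrative calculation, not Spark's code: the object and method names are hypothetical, and the safety fraction is passed in explicitly so that different values can be tried.

```scala
// Illustrative sketch only: these names are hypothetical, not Spark APIs.
object LegacyMemoryShares {

  /** Fraction of the heap a region can really use:
    * its configured share multiplied by its safety fraction. */
  def usable(memoryFraction: Double, safetyFraction: Double): Double =
    memoryFraction * safetyFraction

  /** Usable bytes for a region on a heap of the given size. */
  def usableBytes(heapBytes: Long, memoryFraction: Double, safetyFraction: Double): Long =
    (heapBytes * usable(memoryFraction, safetyFraction)).toLong
}
```

For example, `LegacyMemoryShares.usable(0.2, 0.8)` gives roughly 0.16, the execution figure quoted above. On a 10 GB executor heap that is only about 1.6 GB for shuffles, joins, sorts, and aggregations.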

If a single machine is not powerful enough, execution data will constantly spill to disk and shuffle performance will be very slow. The stronger each machine, the better the result, so when building a cluster, aim to maximize each machine's capacity (memory) rather than simply adding more machines.

If there are few large task results, you can increase the execution share accordingly.

Similarly, if storage is not being used for caching, its share is largely wasted memory.
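A minimal sketch of such a rebalancing, assuming the two legacy keys spark.shuffle.memoryFraction and spark.storage.memoryFraction. The helper name is hypothetical, and the sanity check (the two shares must leave some room for other objects) is an assumption of this sketch, not something Spark enforces.

```scala
// Illustrative sketch: the helper is hypothetical; only the two configuration
// keys come from the legacy (pre-1.6) memory model discussed in the text.
object LegacyMemoryTuning {

  /** Build legacy memory settings, rejecting splits that leave nothing
    * for program objects, metadata, and code. */
  def settings(shuffleFraction: Double, storageFraction: Double): Map[String, String] = {
    require(shuffleFraction >= 0 && storageFraction >= 0,
      "fractions must be non-negative")
    require(shuffleFraction + storageFraction < 1.0,
      "execution and storage together must leave room for other objects")
    Map(
      "spark.shuffle.memoryFraction" -> shuffleFraction.toString,
      "spark.storage.memoryFraction" -> storageFraction.toString
    )
  }
}
```

For a shuffle-heavy job that caches little, `LegacyMemoryTuning.settings(0.5, 0.3)` shifts memory from storage to execution. Each resulting pair can be supplied as a `--conf key=value` argument to spark-submit.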

The traditional memory allocation model therefore demands a lot of expertise from Spark users.

Very good proof: [two images in the original post, not preserved]

On the execution side, memory allocation is handled by ShuffleMemoryManager, TaskMemoryManager, and ExecutorMemoryManager.

When a particular task arrives, it may fill up the executor's memory.

========== Spark unified memory management ==========

Spark 1.6 introduces UnifiedMemoryManager, which sets aside 300 MB of the heap as reserved system memory.

/**
 * A [[MemoryManager]] that enforces a soft boundary between execution and storage such that
 * either side can borrow memory from the other.
 *
 * The region shared between execution and storage is a fraction of (the total heap space - 300MB)
 * configurable through `spark.memory.fraction` (default 0.75). The position of the boundary
 * within this space is further determined by `spark.memory.storageFraction` (default 0.5).
 * This means the size of the storage region is 0.75 * 0.5 = 0.375 of the heap space by default.
 *
 * Storage can borrow as much execution memory as is free until execution reclaims its space.
 * When this happens, cached blocks will be evicted from memory until sufficient borrowed
 * memory is released to satisfy the execution memory request.
 *
 * Similarly, execution can borrow as much storage memory as is free. However, execution
 * memory is *never* evicted by storage due to the complexities involved in implementing this.
 * The implication is that attempts to cache blocks may fail if execution has already eaten
 * up most of the storage space, in which case the new blocks will be evicted immediately
 * according to their respective storage levels.
 *
 * @param storageRegionSize Size of the storage region, in bytes.
 *                          This region is not statically reserved; execution can borrow from
 *                          it if necessary. Cached blocks can be evicted only if actual
 *                          storage memory usage exceeds this region.
 */

object UnifiedMemoryManager {

  // Set aside a fixed amount of memory for non-storage, non-execution purposes.
  // This serves a function similar to `spark.memory.fraction`, but guarantees that we reserve
  // sufficient memory for the system even for small heaps. E.g. if we have a 1GB JVM, then
  // the memory used for execution and storage will be (1024 - 300) * 0.75 = 543MB by default.
  private val RESERVED_SYSTEM_MEMORY_BYTES = 300 * 1024 * 1024

  // ...
}
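The sizing arithmetic from the source comments above can be checked with a small sketch. It is illustrative only and is not Spark's implementation: the object and method names are hypothetical, and the default values are the ones quoted in the comments (300 MB reserved, spark.memory.fraction 0.75, spark.memory.storageFraction 0.5).

```scala
// Illustrative sketch of the sizing arithmetic; names are hypothetical.
object UnifiedMemorySizing {

  /** Memory set aside for the system, as in the excerpt above: 300 MB. */
  val ReservedBytes: Long = 300L * 1024 * 1024

  /** Memory shared by execution and storage:
    * (heap - reserved) * spark.memory.fraction. */
  def unifiedRegionBytes(heapBytes: Long, memoryFraction: Double = 0.75): Long =
    ((heapBytes - ReservedBytes) * memoryFraction).toLong

  /** Storage's soft share of the unified region:
    * unified region * spark.memory.storageFraction. */
  def storageRegionBytes(heapBytes: Long,
                         memoryFraction: Double = 0.75,
                         storageFraction: Double = 0.5): Long =
    (unifiedRegionBytes(heapBytes, memoryFraction) * storageFraction).toLong
}
```

For a 1 GB heap, `UnifiedMemorySizing.unifiedRegionBytes(1024L * 1024 * 1024)` yields 543 MB, the figure in the source comment, and half of that forms the storage region. The sketch does not guard against heaps smaller than the reserved amount.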

Under unified memory management, when execution memory is insufficient it borrows from storage, taking as much as is free.

When storage memory is insufficient, it cannot force execution to release memory.
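This asymmetry (execution can take memory back from storage, but storage can never take it back from execution) can be illustrated with a toy model. It is a deliberately simplified sketch and not Spark's implementation: the class and method names are hypothetical, "eviction" is just a counter decrement, and storage never evicts its own blocks here.

```scala
// Toy model of the borrowing rules; all names are hypothetical.
final class UnifiedPoolModel(val totalBytes: Long, val storageRegionBytes: Long) {
  require(storageRegionBytes >= 0 && storageRegionBytes <= totalBytes)

  private var executionUsed = 0L
  private var storageUsed = 0L

  def executionUsedBytes: Long = executionUsed
  def storageUsedBytes: Long = storageUsed
  private def free: Long = totalBytes - executionUsed - storageUsed

  /** Storage may use any free memory, including execution's unused part,
    * but it can never evict execution. Returns true if the block fit. */
  def acquireStorage(bytes: Long): Boolean = {
    require(bytes >= 0)
    if (bytes <= free) { storageUsed += bytes; true } else false
  }

  /** Execution uses free memory first. If that is not enough, it evicts cached
    * data, but only the part of storage that exceeds the storage region.
    * Returns the number of bytes actually granted. */
  def acquireExecution(bytes: Long): Long = {
    require(bytes >= 0)
    val evictable = math.max(0L, storageUsed - storageRegionBytes)
    val shortfall = math.max(0L, bytes - free)
    storageUsed -= math.min(shortfall, evictable) // evict borrowed storage
    val granted = math.min(bytes, free)
    executionUsed += granted
    granted
  }
}
```

With a pool of 1000 bytes and a storage region of 500, caching 800 bytes succeeds because execution is idle. A later execution request for 400 bytes is granted in full by evicting 200 bytes of cached data, while a cache request made once execution holds its memory simply fails.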

About the author, Liaoliang:

Known as China's foremost Spark expert

Sina Weibo: http://weibo.com/ilovepains

WeChat public account: DT_Spark

Blog: http://blog.sina.com.cn/ilovepains

Mobile: 18610086859

QQ: 1740415547



This article comes from the "a Flower proud Cold" blog; reprinting is not permitted.
