Spark on Mesos: Pitfalls and Solutions


This article is based on Spark 1.6.2.
Because fine-grained mode hurts the performance of short tasks too much, coarse-grained mode is used for scheduling.

Some of the main issues:

Dynamic allocation does not work in 1.6

For example, a spark-shell session holds on to its resources for long periods even while idle and cannot release them, which results in low resource utilization.
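The settings involved can be sketched as follows. The property names are standard Spark configuration keys; the values and the small `render_spark_defaults` helper are illustrative assumptions, not the author's actual setup. Dynamic allocation also requires `spark.shuffle.service.enabled` together with an external shuffle service running on every slave, so that shuffle files remain available after idle executors are released.

```python
# Hypothetical helper that renders Spark properties as spark-defaults.conf lines.
# The keys are standard Spark settings; the values are illustrative only.
DYNAMIC_ALLOCATION_CONF = {
    "spark.mesos.coarse": "true",                          # coarse-grained scheduling
    "spark.shuffle.service.enabled": "true",               # needs a shuffle service on each slave
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "0",
    "spark.dynamicAllocation.initialExecutors": "2",
    "spark.dynamicAllocation.maxExecutors": "20",
    "spark.dynamicAllocation.executorIdleTimeout": "60s",  # release executors idle this long
}


def render_spark_defaults(conf: dict) -> str:
    """Return the properties in spark-defaults.conf format: key, whitespace, value."""
    width = max(len(key) for key in conf)
    return "\n".join(f"{key.ljust(width)}  {value}" for key, value in conf.items())


if __name__ == "__main__":
    print(render_spark_defaults(DYNAMIC_ALLOCATION_CONF))
```

With settings along these lines an idle spark-shell would be expected to hand executors back after `executorIdleTimeout`; on 1.6.2, per this article, that still depends on backporting the fixes it lists.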

Multiple executors cannot be started on a single slave

Previously, an application could start only one executor on each Mesos slave. The problem is that if a slave has <20 cores, 100g RAM>, an executor of <20 cores, 10g RAM> uses up all of its CPUs, leaving 90G of RAM wasted.
For details, see http://blog.csdn.net/lsshlsw/article/details/51820420
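To make the arithmetic concrete, here is a minimal Python model of packing executors onto one slave. It is a simplification that assumes memory overhead and `spark.cores.max` can be ignored, and the function names are hypothetical. `executor_cores=None` stands for the 1.6 behavior described above (one executor per slave that takes every offered core), while an integer stands for fixed-size executors configured through `spark.executor.cores`, several per slave, which is what SPARK-5095 enables in coarse-grained mode.

```python
from typing import List, Optional, Tuple

Executor = Tuple[int, int]  # (cores, memory in GB)


def executors_per_slave(slave_cores: int, slave_mem_gb: int, executor_mem_gb: int,
                        executor_cores: Optional[int] = None) -> List[Executor]:
    """Return the executors launched on one slave (simplified model)."""
    if executor_cores is None:
        # 1.6 behavior: a single executor grabs all cores; memory is allocated once.
        return [(slave_cores, executor_mem_gb)] if executor_mem_gb <= slave_mem_gb else []
    # Fixed-size executors: as many as both cores and memory allow.
    count = min(slave_cores // executor_cores, slave_mem_gb // executor_mem_gb)
    return [(executor_cores, executor_mem_gb)] * count


def idle_memory_gb(slave_mem_gb: int, executors: List[Executor]) -> int:
    """Memory left unallocated on the slave after the executors are placed."""
    return slave_mem_gb - sum(mem for _, mem in executors)


if __name__ == "__main__":
    old = executors_per_slave(20, 100, 10)
    new = executors_per_slave(20, 100, 10, executor_cores=2)
    print(old, idle_memory_gb(100, old))       # [(20, 10)] 90
    print(len(new), idle_memory_gb(100, new))  # 10 0
```

In the one-executor case the remaining 90 GB is stranded because no CPUs are left to pair it with; with 2-core executors the same slave hosts ten executors and its memory is fully used.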

The number of CPUs used by each executor cannot be controlled

For example, if an application requests <5 cores, 10g RAM> and each slave has only 4 cores, two executors are started: one with <4 cores, 10g RAM> and the other with <1 core, 10g RAM>.
When a single executor runs too many tasks with relatively little memory, it can easily cause OOM errors, long GC pauses, and similar problems.
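A rough model of why this happens, under simplifying assumptions: offers arrive one slave at a time, 1.6 coarse-grained mode starts one executor per slave that takes as many cores as are still needed up to `spark.cores.max`, and every executor gets the same `spark.executor.memory`. The helper names are hypothetical, and the per-task figure is only a crude indicator, since Spark's memory manager decides how execution memory is actually shared among running tasks.

```python
from typing import List, Tuple

Executor = Tuple[int, int]  # (cores, memory in GB)


def coarse_1_6_allocation(cores_max: int, slave_cores: List[int],
                          executor_mem_gb: int) -> List[Executor]:
    """Simplified 1.6 coarse-grained placement: one executor per offered slave,
    taking as many of the still-needed cores as the slave offers."""
    executors = []
    remaining = cores_max
    for offered in slave_cores:
        if remaining == 0:
            break
        taken = min(offered, remaining)
        executors.append((taken, executor_mem_gb))
        remaining -= taken
    return executors


def mem_per_task_gb(executor: Executor, task_cpus: int = 1) -> float:
    """Executor memory divided by the number of tasks it can run at once."""
    cores, mem_gb = executor
    slots = cores // task_cpus  # concurrent tasks = cores / spark.task.cpus
    if slots == 0:
        raise ValueError("executor has fewer cores than spark.task.cpus")
    return mem_gb / slots


if __name__ == "__main__":
    for executor in coarse_1_6_allocation(5, [4, 4, 4], 10):
        print(executor, mem_per_task_gb(executor))
    # (4, 10) 2.5
    # (1, 10) 10.0
```

For the example above, the 4-core executor leaves roughly 2.5 GB per concurrent task against 10 GB for the 1-core one, which is why the larger executor is the one prone to OOM and long GC pauses.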
For details, see http://blog.csdn.net/lsshlsw/article/details/51820420

Blockmgr directories are not deleted automatically

This leads to high disk space consumption.
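To gauge how much space leftover directories take, a minimal Python sketch can scan a slave's local directory. It assumes block manager directories are named `blockmgr-<uuid>` under the configured local directory (`spark.local.dir`, which defaults to `/tmp`); the function names and the age threshold are hypothetical. It only reports candidates and deletes nothing, because an old-looking directory may still belong to a long-running application.

```python
import os
import tempfile
import time
from typing import List, Optional, Tuple


def scan_tree(path: str) -> Tuple[int, float]:
    """Return (total bytes of regular files, newest mtime) for everything under path."""
    total, newest = 0, 0.0
    for dirpath, _dirnames, filenames in os.walk(path):
        newest = max(newest, os.path.getmtime(dirpath))
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # do not follow symlinks
            st = os.stat(full)
            total += st.st_size
            newest = max(newest, st.st_mtime)
    return total, newest


def find_stale_blockmgr_dirs(local_dir: str, max_age_seconds: float,
                             now: Optional[float] = None) -> List[Tuple[str, int]]:
    """List (path, size) of blockmgr-* dirs whose newest content is older than max_age_seconds."""
    now = time.time() if now is None else now
    stale = []
    for entry in sorted(os.listdir(local_dir)):
        path = os.path.join(local_dir, entry)
        if entry.startswith("blockmgr-") and os.path.isdir(path):
            size, newest = scan_tree(path)
            if now - newest > max_age_seconds:
                stale.append((path, size))
    return stale


if __name__ == "__main__":
    # Demo on a throwaway directory that mimics a slave's local dir.
    with tempfile.TemporaryDirectory() as root:
        leftover = os.path.join(root, "blockmgr-demo", "0c")
        os.makedirs(leftover)
        with open(os.path.join(leftover, "shuffle_0_0_0.data"), "wb") as f:
            f.write(b"x" * 100)
        # Pretend an hour has passed so the directory counts as stale.
        for path, size in find_stale_blockmgr_dirs(root, 60, now=time.time() + 3600):
            print(path, size)
```

Checking the newest mtime across the whole tree, rather than only the top-level directory, matters because block files are written into hashed subdirectories, which does not update the parent directory's timestamp.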

These issues are fixed in 2.0, but 2.0 is a large upgrade that would require many changes to our programs. Instead, the following improvements and bug fixes can be merged into 1.6.2 and a new version rebuilt to solve the problems.

Existing fixes:

[SPARK-12330][MESOS] Fix mesos coarse mode cleanup
[SPARK-13002][MESOS] Send initial request of executors for dyn allocation
[SPARK-5095][MESOS] Support launching multiple mesos executors in coarse grained mesos mode
[SPARK-12583][MESOS] Mesos shuffle service: don't delete shuffle files before application has stopped
[SPARK-13001][CORE][MESOS] Prevent getting offers when reached max cores

Cluster resource utilization after the fixes

[Figure: cluster load after the changes (Ganglia)]

