Spark Performance Optimization: Shuffle Performance Tuning

Source: Internet
Author: User
Tags: shuffle

Shuffle Tuning Parameters

new SparkConf().set("spark.shuffle.consolidateFiles", "true")

spark.shuffle.consolidateFiles: whether to merge shuffle block files; default false. // When enabled, the shuffle block files that the map side (the MapPartitionsRDD stage) writes for the ResultTask of the next stage are consolidated. For the details, compare how the shuffle works with and without this setting.
spark.reducer.maxSizeInFlight: the reduce task's pull buffer; default 48m. // A larger buffer lets each fetch pull more data, so fewer fetches are needed.
spark.shuffle.file.buffer: the map task's disk-write buffer; default 32k. // A larger buffer reduces the number of spills to disk, which improves write performance.
spark.shuffle.io.maxRetries: the maximum number of retries after a failed pull; default 3. // Raising this guards against pull failures when the task threads are stalled by a minor GC or full GC. With 3 retries at 5 seconds each, the total is only 15 seconds; if a full GC takes 1 minute, that is far from enough. Spark then treats the pull as failed, and the error can even crash the application.
spark.shuffle.io.retryWait: the interval between retries after a failed pull; default 5s. // Same reasoning as above.
spark.shuffle.memoryFraction: the fraction of memory used for reduce-side aggregation; default 0.2; data beyond this limit spills to disk. // Increasing it gives the reduce side more memory and reduces the number of spills to disk.
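
The parameters above can be set together on one SparkConf. The following is a minimal sketch, not a recommended configuration: the application name and every tuned value are assumptions chosen for illustration, and it needs spark-core on the classpath. Note that spark.shuffle.consolidateFiles and spark.shuffle.memoryFraction belong to older Spark releases (the hash-based shuffle and the legacy memory manager), so check whether your version still honors them. The last lines work out the retry budget discussed above: 60 retries at 5 seconds each cover 300 seconds, comfortably longer than a 1-minute full GC.

```scala
import org.apache.spark.SparkConf

object ShuffleTuningSketch {
  def main(args: Array[String]): Unit = {
    // Illustrative values only; tune them against your own workload.
    val conf = new SparkConf()
      .setAppName("shuffle-tuning-sketch")            // hypothetical application name
      .set("spark.shuffle.consolidateFiles", "true")  // default false; hash-based shuffle only
      .set("spark.reducer.maxSizeInFlight", "96m")    // default 48m
      .set("spark.shuffle.file.buffer", "64k")        // default 32k
      .set("spark.shuffle.io.maxRetries", "60")       // default 3
      .set("spark.shuffle.io.retryWait", "5s")        // default 5s
      .set("spark.shuffle.memoryFraction", "0.3")     // default 0.2; legacy memory manager only

    // Retry budget = maxRetries * retryWait; compare it with your worst GC pause.
    val retries = conf.getInt("spark.shuffle.io.maxRetries", 3)
    val waitSeconds = conf.getTimeAsSeconds("spark.shuffle.io.retryWait", "5s")
    println(s"Retry budget: ${retries * waitSeconds} seconds")
  }
}
```

Raising the retry count rather than the wait keeps each individual retry short, so a brief GC pause is still recovered from quickly while a long one is tolerated.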
