A Spark streaming job throws GC overhead limit exceeded after running for a while

Source: Internet
Author: User
Tags: gc overhead limit exceeded

Recently, while upgrading a framework, I found that a streaming computation program threw a GC overhead limit exceeded error after running for some time.


The error clearly points to insufficient memory, yet the memory initially configured should have been enough. I tried various memory optimizations, such as moving variable definitions outside loop bodies, but they only delayed the point at which the error appeared.


I still had not found the root of the problem.


On further analysis, I suspected that some variables were holding memory that was not being released in time.


The code caches several DataFrames, but I had assumed that Spark has a mechanism to clean up and release this cache automatically.


To test this, I added explicit unpersist calls to release the cached data and deployed the change. The problem was gone.
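The original code is not shown, so the following is only a minimal sketch of the idea: cache a DataFrame for as long as a batch needs it, then release it explicitly instead of waiting for Spark to evict it. The helper `cached`, the handler `process_batch`, and the stand-in class `FakeFrame` are hypothetical names introduced for illustration. The helper relies only on `persist()` and `unpersist()`, which PySpark DataFrames provide. The stand-in lets the sketch run without a Spark cluster.

```python
from contextlib import contextmanager


@contextmanager
def cached(df):
    """Cache `df` for the duration of the block, then release it explicitly."""
    df.persist()
    try:
        yield df
    finally:
        # Runs even if the block raises, so the cache is never left behind.
        df.unpersist()


class FakeFrame:
    """Hypothetical stand-in for a DataFrame; it only tracks cache state."""

    def __init__(self):
        self.is_cached = False

    def persist(self):
        self.is_cached = True
        return self

    def unpersist(self):
        self.is_cached = False
        return self


def process_batch(df):
    """Hypothetical per-batch handler: reuse the cached data, then free it."""
    with cached(df) as batch:
        # Real code would run several actions on `batch` here.
        return batch.is_cached


frame = FakeFrame()
print(process_batch(frame))  # cache state while the batch was processed
print(frame.is_cached)       # cache state after the batch finished
```

In a real streaming job, the object passed to `cached` would be the DataFrame that is reused within each batch. Putting `unpersist()` in a `finally` clause means a failing batch does not leave cached data behind.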


So it really was a memory problem.


A closer look at the official documentation shows this note:

Spark automatically monitors cache usage on each node and drops out old data partitions in a least-recently-used (LRU) fashion. If you would like to manually remove an RDD instead of waiting for it to fall out of the cache, use the RDD.unpersist() method.


Perhaps this automatic mechanism kicks in too late for streaming computations, which leads to the error. This pitfall is a deep one.
