Java Performance Optimization: Correct Parsing of JSON Files

Source: Internet
Author: User
Tags: GC overhead limit exceeded

The data collection service was hitting an OOM (java.lang.OutOfMemoryError: GC overhead limit exceeded) roughly once an hour on average, and the errors coincided with downloading the JSON Atom feed for processing. The suspicion was that processing the feed caused memory spikes large enough to trigger frequent full GCs. For example:

[Memory analysis chart before the fix]

Analysis process

The service downloads 36 data files from the feed server every 15 minutes: 12 files of about 17 MB, 12 of about 18 MB, and 12 of 100 MB or more, all in JSON format. The service loads each JSON file in its entirety and then converts it into Java objects, so the memory consumption of this step can be substantial. A rough sketch of the full-load path follows, and the set of tests below quantifies the cost:
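
For context, the full-load path can be sketched roughly as below. This is a minimal sketch under assumptions: the DeviceFeedEntry class and its fields are placeholders rather than the service's real data model, and the binding is done with Jackson's ObjectMapper. The key point is that the whole file is read and turned into Java objects in one call, so heap usage grows with the file size.

import java.io.File;
import java.io.IOException;
import java.util.List;

import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical POJO standing in for the real feed entry model.
class DeviceFeedEntry {
    public String id;
    public String status;
    // remaining fields omitted
}

public class FullLoadParser {

    // Reads the whole JSON file into memory and binds it to Java objects
    // in a single call; heap usage is proportional to the file size.
    static List<DeviceFeedEntry> load(File feedFile) throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        // The real feed has more fields than this placeholder POJO declares.
        mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
        return mapper.readValue(
                feedFile,
                mapper.getTypeFactory().constructCollectionType(List.class, DeviceFeedEntry.class));
    }
}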

Test Preparation:
    • One 16 MB JSON data file and one 100 MB JSON data file
    • Jackson 2.3.4.Final (JSON parsing library)
    • JDK 1.6.0_30
Test method:
    • Parse the JSON files with the document model API, recording the processing time and the amount of new memory allocated
    • Parse the JSON files with the streaming API, recording the processing time and the amount of new memory allocated (a sketch of both approaches follows this list)
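
The two approaches under test can be sketched roughly as follows. This is a minimal sketch, not the original test harness; counting object tokens is only a stand-in workload so that the parser touches the whole file, and the class and method names are assumptions.

import java.io.File;
import java.io.IOException;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ParseModeComparison {

    // Document model: the whole file is materialized as a JsonNode tree,
    // so the heap holds a complete in-memory copy of the document.
    static int sizeWithDocumentModel(File jsonFile) throws IOException {
        JsonNode root = new ObjectMapper().readTree(jsonFile);
        return root.size();
    }

    // Streaming API: tokens are read one at a time; only the current token
    // (plus whatever the caller chooses to keep) stays in memory.
    static int countObjectsWithStreaming(File jsonFile) throws IOException {
        JsonParser parser = new JsonFactory().createParser(jsonFile);
        int objects = 0;
        try {
            while (parser.nextToken() != null) {
                if (parser.getCurrentToken() == JsonToken.START_OBJECT) {
                    objects++;
                }
            }
        } finally {
            parser.close();  // try/finally rather than try-with-resources, matching JDK 1.6
        }
        return objects;
    }
}

The measured difference below reflects this: the document model allocates the entire document on the heap before any processing starts, while the streaming approach allocates far less at any one time.
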
Test results:
    • Document Model: memory usage chart for the 17 MB JSON file; the minor GCs took 3.024 s and the full GCs took 5.244 s
    • Streaming API: memory usage chart for the 17 MB JSON file; 394 minor GCs took 78 ms and the longest GC took 557 ms
Conclusion:
    • Downloads run on 5 concurrent threads. Assuming the files are around 100 MB each, peak memory use at any one time can reach roughly 330 MB × 5 = 1,650 MB, about 1.6 GB, which is roughly half of the total memory allocated to the process.
    • Each OOM occurs at around 14:00-20:00, the window in which the data collection service is processing data; a single process has to handle about 180,000 devices, so if a feed download starts during this window, an OOM is almost guaranteed.
Fixes and improvements

A careful review of the feed download and parsing process showed that each file was loaded in full. From the test results above, this approach holds a large amount of memory for a long time, and roughly half of the fields in the source file are never needed later. The new scheme is therefore as follows:

    • Reduce the number of concurrent feed download threads from 5 to 2. Feed download and preprocessing are not the bottleneck, so there is no need for many threads, which only drives memory usage up sharply.
    • Parse the JSON with the streaming API, discarding data that is not needed later as it is read, and save only the remaining data to the cache for post-processing (a sketch follows this list).
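
A minimal sketch of the streaming-and-discard step is below, under assumptions: the field names id and status are placeholders for whichever subset of the feed the service actually keeps, and the returned map stands in for the cache structure used downstream. For brevity the sketch flattens everything into one map; the real service would group the kept fields per feed entry.

import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

public class FeedStreamingFilter {

    // Streams the feed file and keeps only the fields needed later;
    // everything else is skipped without ever being materialized on the heap.
    static Map<String, String> extractNeededFields(File feedFile) throws IOException {
        Map<String, String> kept = new HashMap<String, String>();
        JsonParser parser = new JsonFactory().createParser(feedFile);
        try {
            while (parser.nextToken() != null) {
                if (parser.getCurrentToken() == JsonToken.FIELD_NAME) {
                    String field = parser.getCurrentName();
                    parser.nextToken();                      // advance to the field's value
                    if ("id".equals(field) || "status".equals(field)) {
                        kept.put(field, parser.getText());   // keep only what is needed
                    } else {
                        parser.skipChildren();               // discard unwanted objects/arrays outright
                    }
                }
            }
        } finally {
            parser.close();
        }
        return kept;
    }
}

Because unwanted fields are skipped as tokens instead of being built into objects, memory use stays low and short-lived, which matches the quick recovery visible in the chart after the change.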

The memory consumption after the change is as follows:

[Memory analysis chart after the fix]

Compared with the chart before the fix (the first chart in this article), the total memory consumption has clearly decreased, and memory is reclaimed quickly.
