The data collection service hit java.lang.OutOfMemoryError: GC overhead limit exceeded roughly once an hour, and each OOM coincided with the JSON Atom feed being downloaded for processing. The suspicion was that memory spikes while processing the feed consume too much heap and trigger frequent full GCs. The analysis is as follows.
Analysis process
The service downloads 36 data files from the feed server every 15 minutes: 12 files of about 17 MB, 12 of about 18 MB, and 12 of 100+ MB, all in JSON format. The service loads each JSON file into memory in its entirety and then converts it into Java objects, so memory consumption at this point is likely to be large. The following set of tests confirms this:
Test Preparation:
- One 17 MB JSON data file and one 100 MB JSON data file
- Jackson 2.3.4.final (JSON parsing library)
- JDK 1.6.0_30
Test method:
- Parse the JSON files with the document model API, recording processing time and newly allocated memory
- Parse the JSON files with the streaming API, recording processing time and newly allocated memory (a sketch of both parse paths follows this list)
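Below is a minimal sketch of the two parse paths under test, written against the Jackson 2.x API. The class name and the timing harness are illustrative; the memory figures in the results would be taken from GC logs or a profiler rather than measured in code.

```java
import java.io.File;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonParseTest {

    // Document model: materializes the whole file as a JsonNode tree,
    // so the heap cost is a multiple of the file size.
    static void parseWithDocumentModel(File file) throws Exception {
        long start = System.currentTimeMillis();
        ObjectMapper mapper = new ObjectMapper();
        JsonNode root = mapper.readTree(file);
        System.out.println("document model: " + (System.currentTimeMillis() - start)
                + " ms, " + root.size() + " top-level nodes");
    }

    // Streaming API: pulls one token at a time, so only the current token
    // (plus whatever the caller chooses to keep) stays on the heap.
    static void parseWithStreamingApi(File file) throws Exception {
        long start = System.currentTimeMillis();
        JsonParser parser = new JsonFactory().createParser(file);
        long tokens = 0;
        try {
            while (parser.nextToken() != null) {
                tokens++;
            }
        } finally {
            parser.close();
        }
        System.out.println("streaming API: " + (System.currentTimeMillis() - start)
                + " ms, " + tokens + " tokens");
    }

    public static void main(String[] args) throws Exception {
        File file = new File(args[0]); // e.g. the 17 MB or 100 MB test file
        parseWithDocumentModel(file);
        parseWithStreamingApi(file);
    }
}
```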
Test results:
- Document Model (17 MB JSON file, memory usage chart): minor GC took 3.024 s, full GC took 5.244 s
- Streaming API (17 MB JSON file, memory usage chart): 394 minor GCs took 78 ms, with the longest GC at 557 ms
Conclusion:
- Downloads run on 5 concurrent threads. Assuming files of around 100 MB, each consuming roughly 330 MB when fully loaded (per the tests above), peak memory can reach 330 MB × 5, about 1.6 GB, which is roughly half of the total memory allocated to the process.
- Every OOM occurred between about 14:00 and 20:00, the window in which the data collection service is processing its data; a single process has to handle about 180,000 devices, so starting a feed download in this window is almost certain to trigger an OOM.
Fixes and improvements
A careful review of the feed download and parsing process showed that each file is loaded in full. From the data in the table above, this approach holds on to memory for a long time, and about half of the fields in the source files are never needed downstream. The new scheme is therefore as follows:
- Reduce the number of concurrent feed-download threads from 5 to 2. Feed download and preprocessing are not the bottleneck, so there is no need for many threads, which only causes a sharp spike in memory use.
- Parse the JSON with the streaming API, discarding fields that are not needed downstream as early as possible, and save the remaining data to the cache for later processing (see the sketch after this list).
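A minimal sketch of the streaming filter, assuming each feed file is a top-level JSON array of flat entry objects; the field names deviceId, status, and timestamp are hypothetical stand-ins for the fields that are actually kept.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

public class FeedStreamFilter {

    // Hypothetical field names standing in for the fields actually kept.
    private static final Set<String> KEEP =
            new HashSet<String>(Arrays.asList("deviceId", "status", "timestamp"));

    // Streams a top-level JSON array of flat entry objects, dropping any
    // field not in KEEP before it ever becomes a long-lived object.
    static List<Map<String, String>> parse(File file) throws Exception {
        List<Map<String, String>> entries = new ArrayList<Map<String, String>>();
        JsonParser parser = new JsonFactory().createParser(file);
        try {
            parser.nextToken(); // position on START_ARRAY (assumed feed layout)
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                Map<String, String> entry = new HashMap<String, String>();
                while (parser.nextToken() != JsonToken.END_OBJECT) {
                    String field = parser.getCurrentName();
                    parser.nextToken(); // advance from field name to its value
                    if (KEEP.contains(field)) {
                        entry.put(field, parser.getValueAsString());
                    } else {
                        parser.skipChildren(); // no-op for scalars, skips nested values
                    }
                }
                entries.add(entry); // the real service would write this to its cache
            }
        } finally {
            parser.close();
        }
        return entries;
    }
}
```

Because only whitelisted scalar values are copied out of the parser, the rest of the file never survives beyond the current token, which is what keeps the footprint flat compared with the document model.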
The memory consumption after the change is as follows:
Compared with the earlier memory analysis chart (the first image in this article), the total memory footprint after the improvement has decreased, and memory is reclaimed quickly.