Memory leak in an open-source APNs package

Source: Internet
Author: User

APNs (full name: Apple Push Notification service) is mainly used to push message notifications to Apple devices.


Basic Flow:



Today's topic concerns step 4 of this flow: our own server pushing notifications to Apple's push service.


Current situation:

For historical reasons, push code is scattered across applications. As new message channels keep being added, development and maintenance costs are high, so a push center is being built.

It wraps everything in a Dubbo interface and exposes it as a service, hiding the differences between channels. All push traffic will gradually be migrated to the push center.

The migration is a long process. At first only the personal business was connected; the daily call volume was small and the servers behaved normally.

At the end of August, BI's push management console was integrated and released to production. BI sends bulk pushes for all kinds of marketing campaigns; a single task ranges from tens of thousands of messages to tens of millions.

At this point the servers began to expose some problems.


Here is how the optimization went:


1) First exposure in production. Symptoms: an SMS alert fired; the Dubbo registry showed no provider for the service; after logging in to a production machine, the load had soared to 40; ps showed the JVM process was still alive, but the Dubbo log reported that service registration had failed.

Checking GC showed that full GCs were being triggered but the old generation was not being freed.


Workaround: restart the cluster. The existing strategy wrapped each batch of incoming data in a thread task and let a ThreadPoolExecutor work through the tasks slowly. We suspected that too many tokens were being sent per call (400 at a time), so production outpaced consumption and objects piled up. We asked a colleague on the BI team (nicknamed Rare Earth) to reduce the batch to 100 tokens per call and to sleep 100 ms after each call to the interface.
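
The caller-side throttling described above can be sketched roughly as follows. This is only an illustration, not BI's actual code: the class name BatchThrottle, the method names, and the pushCall callback (a stand-in for the real Dubbo interface) are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class BatchThrottle {
    // Split the token list into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> tokens, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i += batchSize) {
            int end = Math.min(i + batchSize, tokens.size());
            batches.add(new ArrayList<>(tokens.subList(i, end)));
        }
        return batches;
    }

    // Hand each batch to pushCall (hypothetical stand-in for the remote push interface)
    // and pause after every call so the provider has time to drain its queue.
    // Returns the number of calls made.
    static <T> int sendThrottled(List<T> tokens, int batchSize, long pauseMillis,
                                 Consumer<List<T>> pushCall) throws InterruptedException {
        int calls = 0;
        for (List<T> batch : partition(tokens, batchSize)) {
            pushCall.accept(batch);
            calls++;
            Thread.sleep(pauseMillis);
        }
        return calls;
    }
}
```

With the values from the text, the call would be sendThrottled(tokens, 100, 100L, pushCall). Throttling on the caller side only slows the build-up; it does not remove a leak on the provider, as the later steps show.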


2) We also reviewed the JVM parameters and modified the startup script: the heap was raised from 1G to 2G and the young generation from 300M to 1G.

-Xms2g -Xmx2g -Xmn1g -XX:+UseParallelOldGC


We learned BI's machine configuration: a 24-core CPU and 64G of memory. The spec is very high, but there is only one machine. With Dubbo's default random load balancing, one consumer calls many providers, and we worried about unbalanced load.

Note: watching the production Dubbo registry showed that the machines did not all go down at the same time; it was a gradual process.

So we changed the load-balancing strategy to round robin:

For more information, refer to the Dubbo developer guide.

<dubbo:reference id="***" interface="******" loadbalance="roundrobin"/>
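
To illustrate why round robin spreads calls more evenly than random selection, here is a minimal sketch of the idea. It is not Dubbo's implementation (Dubbo's roundrobin strategy also takes provider weights into account); the class RoundRobinPicker is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class RoundRobinPicker<T> {
    private final List<T> providers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinPicker(List<T> providers) {
        if (providers.isEmpty()) {
            throw new IllegalArgumentException("no providers");
        }
        this.providers = new ArrayList<>(providers); // defensive copy
    }

    // Each call returns the next provider in order, wrapping around at the end.
    // floorMod keeps the index non-negative even after the counter overflows.
    T pick() {
        int i = Math.floorMod(next.getAndIncrement(), providers.size());
        return providers.get(i);
    }
}
```

With n providers, any n consecutive calls hit each provider exactly once, whereas random selection only evens out over a large number of calls.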


3) This held up a little longer, but after the task had been running for 4 hours, old-generation usage exceeded 70%, and we began to worry whether a full GC would reclaim it properly.

Because we use UseParallelOldGC (the parallel collector, suited to high-throughput applications), we cannot set an occupancy threshold to trigger collection proactively, as CMS allows (for example with -XX:CMSInitiatingOccupancyFraction).

But we can manually trigger a full GC by dumping a heap snapshot:

jmap -dump:live,format=b,file=heap.bin <pid>




Unfortunately, after that full GC the old generation was still more than 68% full. A memory leak was now certain.
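
The same "how full is each pool right after a collection" check can also be made from inside the JVM with the standard java.lang.management API. The sketch below is only an illustration (the class name PostGcUsage is an assumption); pool names such as "PS Old Gen" depend on the collector in use, so the code prints every heap pool rather than looking one up by name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

class PostGcUsage {
    // Percentage of max in use; returns -1 when the pool does not define a max.
    static double percent(long used, long max) {
        return max > 0 ? 100.0 * used / max : -1.0;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            // getCollectionUsage() reports usage measured after the most recent
            // collection of this pool; it is null when the JVM does not support it.
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (afterGc == null) {
                continue;
            }
            System.out.printf("%s: %.1f%% used after last GC%n",
                    pool.getName(), percent(afterGc.getUsed(), afterGc.getMax()));
        }
    }
}
```

An old-generation figure that stays high across successive full GCs, as it did here, points to objects that are still reachable rather than to garbage waiting to be collected.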

We installed the MAT (Eclipse Memory Analyzer) plug-in to analyze the heap snapshot; see a MAT tutorial for usage.

It showed that a large number of SSLSocketImpl instances could not be reclaimed, and that the whole reference chain occupied more than 50% of the heap.


4) This problem is tricky because we are using an external open-source framework.

All we could do at first was search online to see whether anyone else had run into something similar.

Unfortunately, there were no ready-made answers. Fortunately, the source code is on GitHub:

https://github.com/fernandospr/javapns-jdk16

There is only code and not much documentation, but that does not matter.

As the saying goes, do it yourself and you will want for nothing.


Analyzing the code turned up two suspicious points:

a) NotificationThreads creates n threads according to the configured thread count. Each thread is responsible for sending messages to a share of the devices, and the main thread has to collect the final send results of all n threads. NotificationThreads extends ThreadGroup and passes its own instance to each child thread's constructor. The main thread then calls wait; after all child threads have finished, notifyAll wakes up the waiting threads and the subsequent flow continues.

There is no obvious problem here, but MAT's analysis showed many other threads inside the ThreadGroup, and we feared interference. We decided to control this with a more reliable and safer mechanism, CountDownLatch.









b) After each subtask finishes (whether normally or with an exception), the finally block does the following: it closes the socket connection

and decrements the CountDownLatch count by 1.
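
The two changes in a) and b) can be sketched together as follows. This is a simplified illustration under stated assumptions, not the patched javapns code: LatchedSender and its Connection interface are hypothetical stand-ins for the library's worker threads and SSL socket connections.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class LatchedSender {
    // Hypothetical stand-in for one APNs connection; the real library holds an SSL socket.
    interface Connection extends AutoCloseable {
        void send() throws Exception;

        @Override
        void close();
    }

    // Starts one worker thread per connection and blocks until every worker is done.
    // Returns the number of workers whose send() completed without throwing.
    static int sendAll(List<Connection> connections) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(connections.size());
        AtomicInteger succeeded = new AtomicInteger();
        for (Connection c : connections) {
            new Thread(() -> {
                try {
                    c.send();
                    succeeded.incrementAndGet();
                } catch (Exception e) {
                    // A real sender would record the failure for the caller.
                } finally {
                    try {
                        c.close();        // always release the socket, success or failure
                    } finally {
                        done.countDown(); // always release the waiting main thread
                    }
                }
            }).start();
        }
        done.await(); // replaces wait/notifyAll on a shared ThreadGroup instance
        return succeeded.get();
    }
}
```

The nested finally ensures the latch is counted down even if close() itself throws; otherwise the main thread would block in await() indefinitely.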



We repackaged the library and uploaded it to our Maven repository (note: the POM file needs to be modified).

<dependency>
  <groupId>com.github.fernandospr</groupId>
  <artifactId>javapns-jdk16</artifactId>
  <version>2.3.1-SNAPSHOT</version>
</dependency>

Running the unit tests locally, messages were pushed successfully.


c) We deployed the fix to one machine in the production cluster as a beta test and ran a push task of 12 million messages.


After 258 young GCs (YGC), old-generation usage stayed low, at only a little over 2%.

Watching S0, S1, and Eden also showed that after a YGC the survivor spaces can basically hold the surviving objects, so large numbers of objects are not promoted to the old generation.


GC summary: by the time the task finished, there had been 602 YGCs and no full GC.



In addition, performance monitoring shows that more than 8 million messages were sent (note: the figure counts interface calls, each carrying 100 user tokens), and the response time was normal.


Summary:

a) When a production alert fires, do not panic, however high the load or even if CPU utilization is at 100%. First keep one faulty machine for diagnosis and restart all the others so that external users are not affected.

b) Analyze the problem across the entire call chain, and discuss it with the colleagues around you; the exchange may spark new ideas.


Finally, thanks to colleagues Wrong Knife and BRICS (nicknames), who helped a great deal during the troubleshooting.


Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission.
