Java Memory Leak: Example 1

Source: Internet
Author: User
Tags: ack

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.

How a Java memory leak was discovered during load testing (2017-08-14)

"Background"

① 2017-08-11: load-tested the session feature between system A and system B. Including the chat messages produced while preparing the test scripts, an estimated 300,000+ chat messages had accumulated;

② 2017-08-14: planned to load-test the session feature at a much higher volume. What happened is described below;

"Application Performance"

① Opening the system B front end returned a "504" error;

② The back-end application's CPU utilization was above 700% (8 cores);

③ The back-end memory showed continuous Full GCs, each taking about 9 s. From this we could roughly conclude that the high CPU usage was caused by garbage collection;
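The GC pressure described above can also be observed from inside the JVM. The following is a minimal sketch using the standard `java.lang.management` MXBeans; the class name `GcSnapshot` and its methods are hypothetical, and the original article does not say which tool was used to watch the Full GCs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper; class and method names are illustrative only.
class GcSnapshot {

    /** Total number of collections across all collectors since JVM start. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Used heap as a fraction of the maximum (or of the committed size if no maximum is defined). */
    static double heapUsedRatio() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long limit = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
        return (double) heap.getUsed() / limit;
    }

    public static void main(String[] args) {
        System.out.printf("GC count=%d, GC time=%d ms, heap used=%.1f%%%n",
                totalGcCount(), totalGcTimeMillis(), heapUsedRatio() * 100);
    }
}
```

Sampling these values periodically would show the same pattern as in the test: a heap ratio that stays near 100% while the GC count and GC time keep climbing.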

"View Application JVM Configuration"

① Consulted the system B development team; their lead said the problem should not be caused by the JVM configuration;

"Analysis attempts"

① Tried to take a heap dump with jvisualvm, but because the JVM had no free memory, jvisualvm hung (before the test it could connect and display the JVM state normally);

② Took a dump with the jmap command "jmap -dump:format=b,file=heap.hprof pid". The dump file was 16 GB, and unfortunately MAT could not open it;

③ Tried to shut the application down in order to restart it, but the shutdown did not complete. Finally used "kill -9 pid" to force-kill the process, then restarted the application;

"Post-restart conditions"

① jvisualvm showed heap memory usage rising continuously;

② One hour after the restart, a dump file was taken and analyzed. "java.util.concurrent.LinkedBlockingQueue$Node" occupied up to 1 GB of memory, which essentially confirmed a memory leak;

"com.best.oasis.b.common.entity.messagetransship.Messagetransship" objects took 151 MB, and there were 1.6 million Messagetransship objects;

③ The system B developers reviewed the code and found the cause: the thread pool's queue of tasks waiting to be executed was leaking memory;

Normal Condition:

After the A application server sends a message to the B server, the B server stores the message in the intermediate table b_messagetransship and forwards it to the B customer-service client. The B client receives the message and sends an ACK; once the ACK succeeds, the message is deleted from b_messagetransship. To prevent messages from being lost, B runs a scheduled resend job that pushes the messages in the b_messagetransship table again every 5 s;

Exception Condition:

1. A message sent by the A server to the B server is stored in the b_messagetransship table (at this point its status is "Pending_send"). If the B client does not receive the message from the B server, for example because of a network problem or because the B client exited, the status of the message is set to "Send_failed" and it stays in table b_messagetransship;

2. A message sent by the A server to the B server is received correctly by the B client, but the ACK request sent by the B client fails. The status of the message is then set to "Pending_ack" and it stays in table b_messagetransship;
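The normal and exceptional cases above amount to a small decision table for what happens to a row after one push attempt. A minimal sketch follows, assuming the three statuses named in the text; the enum constant spelling, the `DELETED` marker (standing in for removal of the row), and the class `MessageLifecycle` are hypothetical and not taken from system B's code.

```java
// Hypothetical model of the row status after one push attempt.
class MessageLifecycle {

    enum Status {
        PENDING_SEND, // stored in b_messagetransship, not yet delivered
        SEND_FAILED,  // the B client never received the message
        PENDING_ACK,  // delivered, but the ACK did not come back
        DELETED       // illustrative marker: the row is removed after a successful ACK
    }

    /**
     * @param delivered whether the B client received the message
     * @param acked     whether the B client's ACK reached the B server
     */
    static Status afterPush(boolean delivered, boolean acked) {
        if (!delivered) {
            return Status.SEND_FAILED;
        }
        if (!acked) {
            return Status.PENDING_ACK;
        }
        return Status.DELETED;
    }

    public static void main(String[] args) {
        System.out.println(afterPush(true, true));
        System.out.println(afterPush(false, false));
        System.out.println(afterPush(true, false));
    }
}
```

In this model, only the `DELETED` outcome shrinks the table; both failure statuses leave the row in place for the resend job to pick up again.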

Implementation logic of the scheduled resend of failed messages:

Every 5 s, the failed message records are fetched from b_messagetransship and appended to the linked queue of tasks waiting to be executed. If a message is processed by a thread within 5 s and the push succeeds, its record is deleted from the database table. If it is processed within 5 s and the push fails, its record in the database table stays unchanged. If it is not processed by a thread within 5 s, the next scheduled resend adds a second copy of the message to the queue of pending tasks, and so on;
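The growth pattern described above can be reproduced in isolation. The sketch below is not system B's code: it assumes a `ThreadPoolExecutor` backed by an unbounded `LinkedBlockingQueue` (consistent with the `LinkedBlockingQueue$Node` objects seen in the dump), keeps the only worker busy, and simulates several timer firings. `ResendLeakDemo`, `pendingAfterRounds`, and `push` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical reproduction of the duplicate-enqueue pattern; not the original code.
class ResendLeakDemo {

    /** Simulates `rounds` timer firings while the worker is stuck; returns the queue length. */
    static int pendingAfterRounds(List<String> failedMessageIds, int rounds) {
        LinkedBlockingQueue<Runnable> queue = new LinkedBlockingQueue<>(); // unbounded
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, queue);
        CountDownLatch workerStarted = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Occupy the only worker so that nothing is taken from the queue.
        pool.execute(() -> {
            workerStarted.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            workerStarted.await();
            for (int round = 0; round < rounds; round++) {
                // Each firing re-reads the same failed rows and enqueues them again,
                // without checking whether an earlier copy is still waiting.
                for (String id : failedMessageIds) {
                    pool.execute(() -> push(id));
                }
            }
            return queue.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the worker", e);
        } finally {
            release.countDown();
            pool.shutdownNow(); // discard the queued tasks and stop the worker
        }
    }

    /** Placeholder for the real push to the B client. */
    static void push(String messageId) {
        // intentionally empty
    }

    public static void main(String[] args) {
        int pending = pendingAfterRounds(Arrays.asList("m1", "m2", "m3"), 4);
        System.out.println("queued tasks: " + pending);
    }
}
```

With 3 failed rows and 4 firings the queue holds 12 tasks, i.e. rows multiplied by rounds. At the scale reported in this article (over 100,000 failed rows, one firing every 5 s), a backlog of this kind would plausibly explain the 1.6 million queued objects found one hour after the restart.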

What exposed the bug:

The b_messagetransship table held a large volume of failed pushes, 110,000+ rows in total. The reasons for the large number of failed messages were:

① 11,907 "Good-bye" messages, status: Send_failed

Cause: after sending the {"type": "close", "sid": "${sid}"} request at the end of a conversation, the B client never received the "Good-bye" message. This can also happen in real use;

② 17,120 "Glad to serve you" messages, status: Pending_ack

Cause: the test script did not send an ACK for the "Glad to serve you" message;

③ The remaining 80,000+ were messages sent from A to B, presumably data generated during script preparation;

The development team's optimization ideas for the next phase:

① Add a time-to-live for messages in the b_messagetransship table and delete them outright once it is exceeded;

② Limit the number of Messagetransship objects in the queue of tasks waiting to be executed; once the limit is reached, stop fetching more from b_messagetransship;
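The two ideas above could look roughly like the following. This is only a sketch under stated assumptions: `ResendPlanner`, `FailedMessage`, `expiredIds`, and `fetchLimit` are hypothetical names, and the actual time-to-live and queue limit values are not given in the article.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planning helpers for one resend round; not the original code.
class ResendPlanner {

    /** Minimal stand-in for a row of b_messagetransship. */
    static final class FailedMessage {
        final String id;
        final long createdAtMillis;

        FailedMessage(String id, long createdAtMillis) {
            this.id = id;
            this.createdAtMillis = createdAtMillis;
        }
    }

    /** Idea 1: ids of rows older than the time-to-live, which can be deleted instead of resent. */
    static List<String> expiredIds(List<FailedMessage> rows, long nowMillis, long ttlMillis) {
        List<String> expired = new ArrayList<>();
        for (FailedMessage row : rows) {
            if (nowMillis - row.createdAtMillis > ttlMillis) {
                expired.add(row.id);
            }
        }
        return expired;
    }

    /** Idea 2: how many rows may still be fetched this round without exceeding the queue limit. */
    static int fetchLimit(int queuedTasks, int maxQueuedTasks) {
        return Math.max(0, maxQueuedTasks - queuedTasks);
    }

    public static void main(String[] args) {
        List<FailedMessage> rows = new ArrayList<>();
        rows.add(new FailedMessage("old", 0L));
        rows.add(new FailedMessage("fresh", 9_000L));
        System.out.println("expired: " + expiredIds(rows, 10_000L, 5_000L));
        System.out.println("fetch limit: " + fetchLimit(900, 1_000));
    }
}
```

A resend job would call `fetchLimit` with the executor queue's current size before querying the table and skip the round when it returns 0. Note that a cap bounds memory use but does not by itself prevent the same row from being queued twice.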

Test Script Modification:

① Add ACKs for the opening and closing messages;
