Our online services are written in Go, and they normally come with monitoring of the Go runtime metrics: goroutine num, GC num, GC pause, and so on. After the latest release the GC-related metrics became abnormal: both GC num and GC pause rose sharply. Because Go's GC involves stop-the-world phases, too much GC preempts the program's normal execution time and can even affect the service we provide externally, so we paused the rollout and set out to fix the GC problem first.
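For reference, a monitor like this can be sketched on top of runtime.ReadMemStats; the 30s interval and the report callback below are assumptions for illustration, not our actual monitoring code:

```go
package main

import (
	"runtime"
	"time"
)

// reportRuntimeMetrics periodically reads the Go runtime's memory statistics
// and reports the per-interval deltas. The report callback and the interval
// are stand-ins for whatever metrics client is actually used.
func reportRuntimeMetrics(interval time.Duration, report func(goroutines int, gcNum uint32, gcPauseMs float64)) {
	var last runtime.MemStats
	runtime.ReadMemStats(&last)
	for range time.Tick(interval) {
		var cur runtime.MemStats
		runtime.ReadMemStats(&cur)
		gcNum := cur.NumGC - last.NumGC                              // GC cycles completed in this interval
		pauseMs := float64(cur.PauseTotalNs-last.PauseTotalNs) / 1e6 // cumulative STW pause in this interval, in ms
		report(runtime.NumGoroutine(), gcNum, pauseMs)
		last = cur
	}
}

func main() {
	reportRuntimeMetrics(30*time.Second, func(goroutines int, gcNum uint32, gcPauseMs float64) {
		// push the values to the metrics system here
	})
}
```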
The GC monitoring at the time the problem occurred looked like this:
[Figure: GC monitoring when the problem occurred]
The blue curve is GC num and the yellow curve is GC pause (in ms); both metrics are accumulated over 30s windows. You can see that the pause per 30s reaches the order of seconds.
Logging on to the online machine and viewing the allocation samples with go tool pprof --alloc_objects http://localhost:xxxx/debug/pprof/heap, we found that the biggest contributors were reflect.Value, encoding/json, and fmt.Sprint.
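As background, the /debug/pprof/heap endpoint used above comes from the standard net/http/pprof package; a minimal sketch of exposing it looks like this (6060 is just an example port standing in for the elided xxxx):

```go
package main

import (
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	// 6060 is only an example; the real port is elided as xxxx in the command above.
	go func() {
		_ = http.ListenAndServe("localhost:6060", nil)
	}()

	select {} // stand-in for the actual service loop
}
```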
For better abstraction, the program uses reflection, but reflect.Value copies the object and the copy is allocated on the heap. The objects here are the message bodies, some of which are very large, so a lot of heap memory gets allocated. We made an optimized version that removes this reflection logic and uses a switch case instead.
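The change was roughly of the following shape; the message types below are made up for illustration:

```go
package main

import "reflect"

// OrderMsg and TradeMsg stand in for the real (much larger) message body types.
type OrderMsg struct{ Payload string }
type TradeMsg struct{ Payload string }

// Before: generic handling via reflection. reflect.ValueOf copies the message
// into a reflect.Value, and that copy tends to escape to the heap, which is
// expensive when message bodies are large.
func handleGeneric(msg interface{}) {
	v := reflect.ValueOf(msg)
	_ = v.FieldByName("Payload")
}

// After: an explicit type switch. No reflect.Value is created just to
// dispatch on the concrete type.
func handleTyped(msg interface{}) {
	switch m := msg.(type) {
	case OrderMsg:
		_ = m.Payload
	case TradeMsg:
		_ = m.Payload
	}
}

func main() {
	handleGeneric(OrderMsg{Payload: "hello"})
	handleTyped(TradeMsg{Payload: "world"})
}
```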
After re-launching, the GC metrics dropped slightly, but not enough.
[Figure: GC metrics after removing the reflection logic]
Profiling again, reflect.Value no longer shows up, so we could only start from the other two.
This program is a refactored version of an old one, and to run diff tests against it we added a large number of logs, including debug logs; some of the logs used for the diff even marshal the message into JSON. The logging library we use does no special handling: every log call first goes through fmt.Sprint, and that function allocates its arguments on the heap.
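The pattern we had, and the guard we would want instead, look roughly like this; the flag, type, and function names are illustrative:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// debugEnabled is an illustrative flag; the real service reads its log level
// from configuration.
const debugEnabled = false

type Message struct{ Body []byte }

// Before: the wrapper always formats, so fmt.Sprint's ...interface{}
// arguments escape to the heap even for debug logs that are never emitted.
func debugLogOld(args ...interface{}) {
	s := fmt.Sprint(args...)
	if debugEnabled {
		log.Print(s)
	}
}

// After: check the level before formatting, and only marshal the message to
// JSON when the diff/debug log line will actually be written.
func debugLogNew(msg *Message) {
	if !debugEnabled {
		return
	}
	b, _ := json.Marshal(msg)
	log.Print(string(b))
}

func main() {
	m := &Message{Body: []byte("payload")}
	debugLogOld("diff message:", m)
	debugLogNew(m)
}
```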
In response, we deleted a large number of these logs; GC dropped slightly, but still not enough.
[Figure: GC metrics after removing most of the logs]
Continuing the performance analysis, the biggest GC contributor was still the JSON-related operations. The main job of this service is to process incoming messages, which arrive in JSON, so short of improving the JSON library itself this did not look solvable. Incidentally, among the many message types there is one whose body is more than ten times larger than the others. As an experiment we unsubscribed from that message type, and GC improved immediately, back to normal levels. But dropping those messages is not a viable path.
Looking at the characteristics of the program: it is message-driven, every arriving message gets processed, and processing generates a pile of heap objects. Go decides when to trigger the next GC from the ratio of newly allocated heap to the live heap left by the previous GC; this ratio defaults to 100 (percent), i.e. the GC fires roughly every time the heap doubles. We tried raising this ratio with export GOGC=400 to lower the GC trigger frequency.
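Setting GOGC in the environment before launch is what we did; for reference, the equivalent from code is runtime/debug.SetGCPercent:

```go
package main

import "runtime/debug"

func main() {
	// Equivalent to launching the process with GOGC=400: the next GC is only
	// triggered once the newly allocated heap reaches 400% of the live heap
	// left by the previous GC, so GC runs roughly 4x less often at the cost
	// of a larger heap.
	debug.SetGCPercent(400)

	// ... the rest of the service ...
}
```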
The effect was good: both metrics dropped significantly, and the other metrics showed no obvious anomalies. We are settling for this for now and will do program-level optimization when there is spare time.
[Figure: GC metrics after setting GOGC=400]