I wrote a daemon in Go that checks whether Redis instances are alive and writes the results to ZooKeeper. It is deployed on each Redis machine, and for every Redis instance a dedicated goroutine probes its state at a fixed interval. The main goroutine handles signals and kills the other goroutines when a termination signal arrives. After the program had been running for a while, I found that the ZooKeeper entries for some Redis instances were no longer being updated; the logs showed that the corresponding checker goroutines had died. Reading the source suggested that the third-party ZooKeeper library was raising an unexpected panic that killed them.
To solve this, I refactored the logic: at every fixed interval, the main goroutine starts a fresh goroutine for each Redis instance to run one check. That way an unexpected panic can only kill a single round's goroutine, and an instance's status cannot go permanently stale. The approach also seemed reasonable because goroutine creation is cheap, and Go officially encourages using large numbers of goroutines for concurrency. After the refactor, however, testing in production revealed a memory leak.
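The refactored per-round design can be sketched roughly as follows. This is a minimal sketch, not the original code: checkInstance, runRound, and the addresses are hypothetical stand-ins for the real Redis/ZooKeeper logic.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// checkInstance is a stand-in for the real health check that probes Redis
// and writes the result to ZooKeeper.
func checkInstance(addr string, results chan<- string) {
	// Recover so an unexpected panic in one check cannot kill the round;
	// the next round simply starts fresh goroutines.
	defer func() {
		if r := recover(); r != nil {
			results <- fmt.Sprintf("%s: check panicked: %v", addr, r)
		}
	}()
	results <- addr + ": alive"
}

// runRound starts one short-lived goroutine per instance and collects
// the results, so a crashed check only affects a single round.
func runRound(instances []string) []string {
	results := make(chan string, len(instances)) // buffered: senders never block
	var wg sync.WaitGroup
	for _, addr := range instances {
		wg.Add(1)
		go func(a string) {
			defer wg.Done()
			checkInstance(a, results)
		}(addr)
	}
	wg.Wait()
	close(results)
	var out []string
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	instances := []string{"127.0.0.1:6379", "127.0.0.1:6380"}
	for i := 0; i < 2; i++ { // stand-in for the 10-second ticker loop
		fmt.Println(runRound(instances))
		time.Sleep(10 * time.Millisecond)
	}
}
```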
(1) Observing the GC
First I reviewed the code. It was written six months ago, and I had not used Go at all recently, so no bug turned up. The next step was to look at GC-related information, which might reveal something. As described in the runtime package documentation, setting the environment variable GODEBUG='gctrace=1' makes the Go runtime print GC information to stderr:
GODEBUG='gctrace=1' ./sentinel-agent 2>gc.log &
The output in gc.log looks like this:
gc781(1): 1+2385+17891+0 us, 0 -> 0 MB, 21971 (3503906-3481935) objects, 13818/14/7369 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc782(1): 1+1794+18570+1 us, 0 -> 0 MB, 21929 (3503906-3481977) objects, 13854/1/7315 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc783(1): 1+1295+20499+0 us, 0 -> 0 MB, 21772 (3503906-3482134) objects, 13854/1/7326 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
The fields mean:
- gc781: the 781st GC since the program started
- (1): the number of threads participating in the GC
- 1+2385+17891+0 us: the four phases of the GC pause: 1) stop-the-world, i.e. pausing all goroutines; 2) sweeping marked objects; 3) marking garbage objects; 4) waiting for threads to finish. Units are microseconds, and the sum of the four is the total GC pause time
- MB: the memory occupied by live objects on the heap after the GC, followed by the total heap size (including garbage objects)
- 21971 (3503906-3481935) objects: the number of objects on the heap after the GC, together with the cumulative number of objects allocated and the cumulative number freed
- 13818/14/7369 sweeps: describes the sweep phase: 13,818 memory spans in total, of which 14 were swept in the background and 7,369 during stop-the-world
- 0(0) handoff, 0(0) steal: describes load balancing in the parallel mark phase: the handoff operand count and total handoff operations between threads, and likewise for steal
- 0/0/0 yields: describes the efficiency of the parallel mark phase: the number of yields while waiting for other threads
Watching the GC output over time showed that the total number of objects on the heap kept increasing with no tendency to fall back, which indicated that objects were leaking and causing the memory leak.
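Rather than eyeballing the log, the live-object count can be pulled out of each gctrace line with a small script to confirm the growth trend. This is a sketch assuming the old-style trace format shown above; objectCount is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// objRe matches the "N (allocated-freed) objects" field of an
// old-style gctrace line, capturing the live-object count N.
var objRe = regexp.MustCompile(`(\d+) \(\d+-\d+\) objects`)

// objectCount extracts the live-object count from one gctrace line.
func objectCount(line string) (int, bool) {
	m := objRe.FindStringSubmatch(line)
	if m == nil {
		return 0, false
	}
	n, err := strconv.Atoi(m[1])
	return n, err == nil
}

func main() {
	lines := []string{
		"gc781(1): 1+2385+17891+0 us, 0 -> 0 MB, 21971 (3503906-3481935) objects, 13818/14/7369 sweeps",
		"gc783(1): 1+1295+20499+0 us, 0 -> 0 MB, 21772 (3503906-3482134) objects, 13854/1/7326 sweeps",
	}
	for _, l := range lines {
		if n, ok := objectCount(l); ok {
			fmt.Println(n) // print successive counts to watch the trend
		}
	}
}
```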
(2) Memory profile
Following the profiling guide on the Golang website, I added the following to the code:
import _ "net/http/pprof"

func main() {
    go func() {
        http.ListenAndServe("localhost:6060", nil)
    }()
}
so that the program can be profiled at run time, accessible over HTTP:
go tool pprof http://localhost:6060/debug/pprof/heap
For the memory profile, the default mode is --inuse_space, which shows the space occupied by currently live objects (excluding garbage). Use --alloc_space to show all allocations, including garbage objects. Neither view revealed anything abnormal, though.
(3) Monitoring the number of goroutines
runtime.NumGoroutine() returns the current number of goroutines. Embedding an HTTP server in a program to expose statistics about its running state is an approach Jeff Dean advocates. The goroutine count can be watched in real time by adding the following code:
// goroutine stats and pprof
go func() {
    http.HandleFunc("/goroutines", func(w http.ResponseWriter, r *http.Request) {
        num := strconv.FormatInt(int64(runtime.NumGoroutine()), 10)
        w.Write([]byte(num))
    })
    glog.Info("goroutine stats and pprof listen on 6060")
    http.ListenAndServe("localhost:6060", nil)
}()
Then the command:
curl localhost:6060/goroutines
queries the current goroutine count. Querying repeatedly while the program was running showed that the number of goroutines kept increasing, with no sign of any being destroyed.
(4) Goroutine leakage
The observations above showed a goroutine leak: goroutines were not exiting normally. Since multiple goroutines are created every round (one round every 10 seconds), goroutines pile up quickly if they fail to exit. Go's GC is mark-and-sweep, scanning all live objects reachable from global variables and goroutine stacks, so goroutines that never exit pin a lot of memory.
Having established that goroutines were not exiting properly, I reviewed the code again and found the root cause of the leak. Before the refactor, to terminate the program gracefully, each goroutine had a channel on which the main goroutine waited for all goroutines to finish before exiting. In the main goroutine, the signal handler's code for waiting on all goroutines was:
waiters = make([]chan int, num)
for _, w := range waiters {
    <-w
}
When a goroutine running the check logic finished, it called ag.w <- 1 to notify the main goroutine.
After the refactor, checker goroutines are created every round, but the channel between the main goroutine and the checkers has a buffer size of 1, so nearly all newly created checker goroutines block forever on ag.w <- 1 and can never exit. Removing the channel logic eliminated the goroutine leak.
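An alternative to removing the channels entirely would have been to make the completion send non-blocking, so a checker exits even when nobody is waiting. notifyDone below is a hypothetical sketch of that idea, not the original fix:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// notifyDone reports completion without ever blocking: if nobody is
// draining the channel (as happened after the refactor), the sender
// drops the notification and returns instead of leaking.
func notifyDone(done chan int) {
	select {
	case done <- 1:
	default: // channel full or unread: give up rather than block forever
	}
}

func main() {
	done := make(chan int, 1)
	for i := 0; i < 100; i++ {
		go func() { notifyDone(done) }() // only one send fits the buffer
	}
	time.Sleep(100 * time.Millisecond) // let every goroutine run and exit
	fmt.Println("remaining goroutines:", runtime.NumGoroutine())
}
```

With the blocking send, 99 of these goroutines would be stuck forever; with the select/default, they all exit.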
(5) Summary
- Goroutine management matters: if goroutines leak, memory leaks with them
- Build an HTTP server into the program to observe its running state
- For now, Go's GC is still fragile: minimize object creation and reuse objects via caching where possible, because scan time grows with the number of objects
Copyright notice: this is an original article by the author; do not reproduce without permission.
Go memory leak case