Analyzing and avoiding memory leaks in Go programs
2016-10-18
Put the system under load and memory usage climbs; stop the load and the memory still does not come back down. That may be a leak.
For a stateless service, memory rises as requests arrive over connections. When the requests stop, the memory stays high; even after the connections are dropped, the sessions may still hold unreasonable references that keep the GC from reclaiming them.
When you look at memory usage, don't just stare at the number in top, because memory that Go has obtained from the system is not returned immediately when it is no longer in use. It is normal for memory consumption not to drop right after the load stops; if you only watch this number you simply have to observe it for a while longer. It is better to focus on a few figures: the memory the program has obtained from the system, Go's heap memory, and the memory actually in use. Memory requested from the system is managed in Go's memory pool as whole pages, and only pages that have not been touched for a long time and meet certain conditions are returned to the operating system. And because there is a GC, the heap size alone does not represent real consumption either; what remains after a collection is the memory actually in use. Call runtime.ReadMemStats
to see Go's memory statistics, or enable net/http/pprof and visit http://127.0.0.1:6060/debug/pprof/heap; there, HeapInuse is the memory actually in use and is the number worth watching. The parameter debug/pprof/heap?debug=2 gives you even finer-grained information.
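As a rough, self-contained sketch of both approaches (the monitoring loop and the ten-second interval are my own assumptions, not something the article prescribes; Sys, HeapAlloc, HeapInuse and NumGC are standard runtime.MemStats fields):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
	"runtime"
	"time"
)

func main() {
	// Expose the pprof endpoints, e.g. http://127.0.0.1:6060/debug/pprof/heap
	go func() {
		log.Println(http.ListenAndServe("127.0.0.1:6060", nil))
	}()

	// Periodically print the numbers worth watching:
	//   Sys       - memory obtained from the operating system
	//   HeapAlloc - heap memory currently allocated
	//   HeapInuse - heap memory actually in use
	for {
		var m runtime.MemStats
		runtime.ReadMemStats(&m)
		log.Printf("Sys=%d HeapAlloc=%d HeapInuse=%d NumGC=%d",
			m.Sys, m.HeapAlloc, m.HeapInuse, m.NumGC)
		time.Sleep(10 * time.Second)
	}
}
```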
Once you see the problem, don't suspect the Go GC first; assume the fault is in your own code. The memory is not freed not because it has somehow escaped the runtime, but because some code is still holding a reference to it, so the GC cannot release it. How do you track that down? Go has pprof, which is exactly the tool for this.
After the previous step confirms that HeapInuse is not coming down, you can run

go tool pprof -inuse_space http://127.0.0.1:6060/debug/pprof/heap

to see where the memory is being held. This shows roughly which functions the memory was allocated in, and from the code you can infer what is still referencing it. Very often the cause is a goroutine leak: the goroutine never exits, so the memory it references can never be released. To give an example:
```go
func produce(ch chan<- T) { // producer writes data into ch
	for {
		ch <- T{}
	}
}

func consume(ch <-chan T) { // consumer reads data from ch
	for v := range ch {
		if err := doSomeThing(v); err != nil {
			return // on error the consumer exits and stops reading ch
		}
	}
}

ch := make(chan T)
go produce(ch)
go consume(ch)
```
When doSomeThing returns an error, the consumer exits and no longer reads from ch, so the producer blocks forever on ch <- T{}. The producer's goroutine is leaked, and the memory it references can never be released. This is a very common scenario: you start several workers, the workers push their processed data into a channel, and the main goroutine reads from that channel; if the main goroutine exits on an error and the workers are not handled properly, the workers leak.
With net/http/pprof enabled, http://127.0.0.1:6060/debug/pprof/goroutine?debug=1 shows the stacks of all current goroutines: how many there are and where each one is currently blocked. From that you can find where the goroutines are leaking.
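If you would rather dump the same information from inside the process (this helper and its name are my own convenience, not from the article), runtime/pprof exposes the same profile:

```go
import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// dumpGoroutines writes the current goroutine count and every goroutine's
// stack to stderr. Comparing a dump taken under load with one taken after
// the load has stopped makes leaked goroutines easy to spot.
func dumpGoroutines() {
	fmt.Fprintf(os.Stderr, "goroutines: %d\n", runtime.NumGoroutine())
	_ = pprof.Lookup("goroutine").WriteTo(os.Stderr, 1)
}
```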
That covers how to analyze and locate a memory leak in a Go program; now let's talk about how to avoid writing leaky code in the first place.
The first principle: a channel must never be closed by the consumer, because writing to a closed channel panics. The correct posture is for the producer to close the channel after it has written all of its data, and for the consumer to be responsible for draining everything in the channel:
```go
func produce(ch chan<- T) {
	defer close(ch) // the producer closes the channel after writing all its data
	ch <- T{}
}

func consume(ch <-chan T) {
	for range ch { // the consumer drains the channel with for-range
	}
}

ch := make(chan T)
go produce(ch)
consume(ch)
```
Why must consume drain everything in the channel? Because go produce()
may be launched more than once. Written this way, once ch has been drained you can be sure that every produce goroutine has exited, so nothing leaks; a sketch of how to coordinate several producers follows.
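With several producers, no single producer can safely close the channel itself. One common arrangement (a sketch of my own, using sync.WaitGroup from the standard library, not code from the article) is to close the channel in a separate goroutine once every producer has reported that it is done:

```go
func produce(ch chan<- T, wg *sync.WaitGroup) {
	defer wg.Done()
	ch <- T{} // each producer writes its data into ch
}

func consume(ch <-chan T) {
	for range ch { // drain everything until the channel is closed
	}
}

var wg sync.WaitGroup
ch := make(chan T)
for i := 0; i < 10; i++ {
	wg.Add(1)
	go produce(ch, &wg)
}
go func() {
	wg.Wait() // close the channel only after every producer has exited
	close(ch)
}()
consume(ch)
```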
The second principle: use a closed channel to broadcast cancellation. Reading from a closed channel never blocks, and that is the trick here. Suppose the consumer hits an error after receiving data and the whole operation has failed; there has to be a mechanism for telling the producers to stop and exit:
```go
func produce(ch chan<- T, cancel chan struct{}) {
	select {
	case ch <- T{}:
	case <-cancel: // use select to watch for the cancel signal at the same time
	}
}

func consume(ch <-chan T, cancel chan struct{}) {
	v := <-ch
	err := doSomeThing(v)
	if err != nil {
		close(cancel) // notifies every produce goroutine to exit
		return
	}
}

cancel := make(chan struct{})
ch := make(chan T)
for i := 0; i < 10; i++ {
	go produce(ch, cancel)
}
consume(ch, cancel)
```
You can combine this with sync.WaitGroup and the like, in whatever style you prefer. That is basically enough to release resources correctly on error paths. Oh, and the zeroth principle: stay in awe of concurrent code, even in Go, even with something as handy as channels!
The cancellation in the context package is worth studying and borrowing from. There is really no trick to it: read more code; the standard library is excellent learning material.
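A minimal sketch of the same idea with context (assuming Go 1.7 or newer, where context is in the standard library; the shape of produce here mirrors the snippet above rather than anything from the article): context.WithCancel hands you a Done() channel that is closed when cancel() is called, which plays exactly the role of close(cancel) above.

```go
func produce(ctx context.Context, ch chan<- T) {
	select {
	case ch <- T{}:
	case <-ctx.Done(): // Done() is closed once cancel() is called
	}
}

ctx, cancel := context.WithCancel(context.Background())
defer cancel()

ch := make(chan T)
for i := 0; i < 10; i++ {
	go produce(ctx, ch)
}

if err := doSomeThing(<-ch); err != nil {
	cancel() // broadcasts cancellation to every producer
}
```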