Golang GC Optimization: Thinking and a Case Analysis


A high-frequency service written in Go was about to go online, and load testing showed that GC overhead was unusually high, around 10%-15% of CPU time. This article records the process and ideas behind optimizing the GC. The production environment runs Go 1.10.

First, to check whether the GC is misbehaving, we can use gctrace to trace GC activity in real time. Run the following command to see live GC information:

GODEBUG=gctrace=1 go run cmd/agent_bin.go

The output is as follows:

gc 45 @37.801s 11%: 0.19+627+0.29 ms clock, 0.38+424/621/0+0.59 ms cpu, 356->415->225 MB, 453 MB goal, 4 P
gc 46 @39.126s 11%: 2.9+927+0.16 ms clock, 5.8+342/925/0+0.33 ms cpu, 361->460->275 MB, MB goal, 4 P
gc 47 @40.847s 12%: 0.24+1096+0.12 ms clock, 0.49+291/1007/0+0.24 ms cpu, 427->559->319 MB, 551 MB goal, 4 P
gc 48 @42.771s 12%: 0.26+841+0. ms clock, 0.52+377/830/0+0.24 ms cpu, 486->561->271 MB, 638 MB goal, 4 P
gc 49 @44.429s 12%: 3.1+890+0.40 ms clock, 6.2+492/833/0+0.81 ms cpu, 440->528->294 MB, 543 MB goal, 4 P
gc 50 @46.188s 12%: 0.23+1165+0.13 ms clock, 0.47+624/1158/0+0.27 ms cpu, 471->579->323 MB, 589 MB goal, 4 P
gc 51 @48.252s 13%: 0.26+1410+0.14 ms clock, 0.52+358/1336/9.9+0.28 ms cpu, 506->620->343 MB, 646 MB goal, 4 P
gc 52 @50.942s 13%: 0.27+806+0.51 ms clock, 0.55+403/805/200+1.0 ms cpu, 549->657->340 MB, 687 MB goal, 4 P
gc 53 @53.014s 13%: 0.10+857+0.36 ms clock, 0.21+467/851/94+0.73 ms cpu, 546->666->351 MB, 681 MB goal, 4 P

gc 45: this is the 45th GC since the program started; the "4 P" at the end of the line means 4 processors (P's) participated in the GC.

11%: the fraction of available CPU time spent in GC since the program started.

0.19+627+0.29 ms clock: wall-clock time of the GC phases: 0.19 ms of stop-the-world (STW) sweep termination, 627 ms of concurrent marking and scanning, and 0.29 ms of STW mark termination.

0.38+424/621/0+0.59 ms cpu: CPU time consumed by the same phases; the middle term is split into assist / background / idle marking time.

356->415->225 MB: heap size when the GC started, heap size when the GC finished, and live heap size after the GC.

453 MB goal: the target heap size for this GC cycle (453 MB here).
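In addition to gctrace, pause times and heap sizes can be cross-checked from inside the process. Below is a minimal, hypothetical monitoring sketch (not part of the original service) that polls runtime.ReadMemStats every few seconds and prints the most recent GC pause together with the current and target heap sizes:

package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	var m runtime.MemStats
	for {
		runtime.ReadMemStats(&m)
		// PauseNs is a circular buffer of recent STW pause durations.
		lastPause := m.PauseNs[(m.NumGC+255)%256]
		fmt.Printf("numGC=%d lastPause=%v heapAlloc=%dMB nextGC=%dMB\n",
			m.NumGC, time.Duration(lastPause), m.HeapAlloc>>20, m.NextGC>>20)
		time.Sleep(5 * time.Second)
	}
}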

According to the official description, the Go 1.10 GC should take well under 100 ms, but here a single GC cycle takes more than 1 s, which is clearly unacceptable and indicates that the GC behavior is very abnormal.

The troubleshooting idea: first use pprof to dump the cumulative heap-allocation graph of the whole call tree and find out exactly which module allocates abnormally. We embed pprof in the code to expose an HTTP port, and then generate an SVG from the terminal.

import _ "net/http/pprof"

go func() {
	http.ListenAndServe("0.0.0.0:8080", nil)
}()

Then run in the terminal:

go tool pprof -alloc_space -cum -svg http://127.0.0.1:8080/debug/pprof/heap > heap.svg

Open the SVG in a browser and look for the most expensive call stack:

-cum sorts by the cumulative heap allocation of each call stack, and the graph highlights the heaviest call paths; it is obvious that io.copyBuffer, reached from our Fetch function, accounts for the bulk of the allocation. This gives us a rough idea of where to look in the source: which of our functions has the problem?

Next we run escape analysis on this file, directly with go build -gcflags=-m on the file:
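Before looking at the real file, here is a toy illustration of how to read the -gcflags=-m output (the names user, newUserPointer, and newUserValue are made up for this example): returning a pointer forces the value onto the heap, while returning by value lets it stay on the stack.

// A toy file, inspected with: go build -gcflags=-m <file>
package demo

type user struct{ name string }

// Returning a pointer forces the literal onto the heap; the compiler
// prints something like "&user literal escapes to heap" for this line.
func newUserPointer() *user {
	return &user{name: "a"}
}

// Returning by value lets the local stay on the stack; no escape
// message is printed for u.
func newUserValue() user {
	u := user{name: "a"}
	return u
}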

Our source code is this:

package httpclient

import (
	"io"
	"io/ioutil"
	"net/http"
	"net/url"
	"os"
	"sync"
	"time"

	"refresh_agent/utils"
)

var client *http.Client
var buffPool sync.Pool

func init() {
	client = &http.Client{}
	http.DefaultTransport.(*http.Transport).MaxIdleConnsPerHost = 2000
	http.DefaultTransport.(*http.Transport).MaxIdleConns = 20000
}

type HttpClient struct{}

func NewHttpClient() *HttpClient {
	httpClient := HttpClient{}
	return &httpClient
}

func (this *HttpClient) replaceUrl(srcUrl string, ip string) string {
	httpPrefix := "http://"
	parsedUrl, err := url.Parse(srcUrl)
	if err != nil {
		return ""
	}
	return httpPrefix + ip + parsedUrl.Path
}

func (this *HttpClient) downLoadFile(resp *http.Response) error {
	out, err := os.OpenFile("/dev/null", os.O_RDWR|os.O_CREATE|os.O_APPEND, 0666)
	defer out.Close()
	_, err = io.Copy(out, resp.Body)
	return err
}

func (this *HttpClient) Fetch(dstUrl string, method string, proxyHost string, header map[string]string, preload bool, timeout int64) (*http.Response, error) {
	// proxyHost replaces the host part of the URL in the request
	newUrl := this.replaceUrl(dstUrl, proxyHost)
	req, _ := http.NewRequest(method, newUrl, nil)
	for k, v := range header {
		req.Header.Add(k, v)
	}
	req.Host = utils.GetUrlHost(dstUrl)
	client.Timeout = time.Duration(timeout) * time.Second
	resp, err := client.Do(req)
	if resp == nil || err != nil {
		return resp, err
	}
	if preload {
		err := this.downLoadFile(resp)
		if err != nil {
			return nil, err
		}
	}
	io.Copy(ioutil.Discard, resp.Body)
	resp.Body.Close()
	return resp, nil
}

The results of the escape analysis are as follows:

go build -gcflags=-m libs/httpclient/httpclient.go
# refresh_agent/vendor/github.com/garyburd/redigo/redis
gc 1 @0.078s 5%: 0.075+24+0.040 ms clock, 0.15+0.15/11/17+0.081 ms cpu, 16->16->11 MB, MB goal, 2 P
# refresh_agent/vendor/github.com/imroc/req
gc 1 @0.071s 3%: 0.017+26+0.050 ms clock, 0.034+0.22/6.0/22+0.10 ms cpu, 16->17->11 MB, MB goal, 2 P
# refresh_agent/vendor/github.com/BurntSushi/toml
gc 1 @0.054s 6%: 0.086+43+0.062 ms clock, 0.17+0.14/12/14+0.12 ms cpu, 16->17->12 MB, MB goal, 2 P
# refresh_agent/vendor/github.com/modern-go/reflect2
gc 1 @0.054s 7%: 0.070+28+0.051 ms clock, 0.14+0.15/12/14+0.10 ms cpu, 16->17->10 MB, MB goal, 2 P
gc 2 @0.334s 3%: 0.016+33+0.055 ms clock, 0.032+0.54/12/20+0.11 ms cpu, 37->38->14 MB, MB goal, 2 P
# refresh_agent/vendor/github.com/json-iterator/go
gc 1 @0.045s 5%: 0.068+27+0.040 ms clock, 0.13+2.5/4.7/16+0.081 ms cpu, 16->16->11 MB, 0 MB goal, 2 P
gc 2 @0.211s 9%: 0.029+107+0.12 ms clock, 0.058+0.96/52/0+0.24 ms cpu, 39->44->26 MB, MB goal, 2 P
# command-line-arguments
libs/httpclient/httpclient.go:18:6: can inline init.0
libs/httpclient/httpclient.go:26:6: can inline NewHttpClient
libs/httpclient/httpclient.go:19:24: &http.Client literal escapes to heap
libs/httpclient/httpclient.go:28:9: &httpClient escapes to heap
libs/httpclient/httpclient.go:27:2: moved to heap: httpClient
libs/httpclient/httpclient.go:31:62: leaking param: srcUrl
libs/httpclient/httpclient.go:37:25: httpPrefix + ip + parsedUrl.Path escapes to heap
libs/httpclient/httpclient.go:31:62: (*HttpClient).replaceUrl this does not escape
libs/httpclient/httpclient.go:31:62: (*HttpClient).replaceUrl ip does not escape
libs/httpclient/httpclient.go:44:18: out escapes to heap
libs/httpclient/httpclient.go:44:28: resp.Body escapes to heap
libs/httpclient/httpclient.go:41:59: leaking param content: resp
libs/httpclient/httpclient.go:41:59: (*HttpClient).downLoadFile this does not escape
libs/httpclient/httpclient.go:48:151: leaking param: dstUrl
libs/httpclient/httpclient.go:48:151: leaking param: method
libs/httpclient/httpclient.go:48:151: leaking param content: header
libs/httpclient/httpclient.go:69:30: resp.Body escapes to heap
libs/httpclient/httpclient.go:48:151: (*HttpClient).Fetch this does not escape
libs/httpclient/httpclient.go:48:151: (*HttpClient).Fetch proxyHost does not escape
<autogenerated>:1:0: leaking param: .anon0
<autogenerated>:1:0: leaking param: .this
<autogenerated>:1:0: leaking param: io.p
<autogenerated>:1:0: leaking param: .this
<autogenerated>:1:0: os.(*File).Close .this does not escape
<autogenerated>:1:0: leaking param: .this
<autogenerated>:1:0: leaking param: .this
<autogenerated>:1:0: leaking param: .this
<autogenerated>:1:0: leaking param: context.key
<autogenerated>:1:0: leaking param: .this

Here we find that resp.Body escapes to the heap, which is not what we hoped for. Our aim was only to redirect resp.Body to a null device, so ideally the intermediate object would live on the stack; there is no reason for it to be created on the heap over and over again.

Continuing to chase io.copyBuffer, let's look at the source of io.Copy:

func Copy(dst Writer, src Reader) (written int64, err error) {
	return copyBuffer(dst, src, nil)
}

// CopyBuffer is identical to Copy except that it stages through the
// provided buffer (if one is required) rather than allocating a
// temporary one. If buf is nil, one is allocated; otherwise if it has
// zero length, CopyBuffer panics.
func CopyBuffer(dst Writer, src Reader, buf []byte) (written int64, err error) {
	if buf != nil && len(buf) == 0 {
		panic("empty buffer in io.CopyBuffer")
	}
	return copyBuffer(dst, src, buf)
}

func copyBuffer(dst Writer, src Reader, buf []byte) (written int64, err error) {
	// If the reader has a WriteTo method, use it to do the copy.
	// Avoids an allocation and a copy.
	if wt, ok := src.(WriterTo); ok {
		return wt.WriteTo(dst)
	}
	// Similarly, if the writer has a ReadFrom method, use it to do the copy.
	if rt, ok := dst.(ReaderFrom); ok {
		return rt.ReadFrom(src)
	}
	size := 32 * 1024
	if l, ok := src.(*LimitedReader); ok && int64(size) > l.N {
		if l.N < 1 {
			size = 1
		} else {
			size = int(l.N)
		}
	}
	if buf == nil {
		buf = make([]byte, size) // a fresh 32 KB slice is created here on every call
	}
	// ... (copy loop omitted)
}

From the call stack hint and the source above, we can see that when we call io.Copy without passing in a buffer, the function allocates a fresh buffer internally on every call. At this point the problem is basically pinned down.
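To double-check this conclusion, a small benchmark can compare io.Copy with io.CopyBuffer using a reused buffer. The following is a hypothetical sketch (nullWriter and zeroReader are stand-in types invented for the test; they deliberately avoid the WriterTo/ReadFrom fast paths so the internal buffer is actually used):

package copybench

import (
	"io"
	"testing"
)

type nullWriter struct{}

func (nullWriter) Write(p []byte) (int, error) { return len(p), nil }

type zeroReader struct{ n int }

func (r *zeroReader) Read(p []byte) (int, error) {
	if r.n <= 0 {
		return 0, io.EOF
	}
	if len(p) > r.n {
		p = p[:r.n]
	}
	r.n -= len(p)
	return len(p), nil
}

func BenchmarkCopy(b *testing.B) {
	r := &zeroReader{}
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		r.n = 64 * 1024
		io.Copy(nullWriter{}, r) // allocates a fresh 32 KB buffer every iteration
	}
}

func BenchmarkCopyBuffer(b *testing.B) {
	r := &zeroReader{}
	buf := make([]byte, 32*1024)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		r.n = 64 * 1024
		io.CopyBuffer(nullWriter{}, r, buf) // reuses buf; no per-iteration buffer allocation
	}
}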

So we have an object that keeps being allocated and then collected by the GC. How do we deal with it? The usual practice is object pooling.

After the change, we pass a buffer into the copy, and that buffer comes from a sync.Pool:

var buffPool sync.Pool

func (this *HttpClient) downLoadFile(resp *http.Response) error {
	out, err := os.OpenFile("/dev/null", os.O_RDWR|os.O_CREATE|os.O_APPEND, 0666)
	if err != nil {
		return err
	}
	defer out.Close()

	buf := buffPool.Get()
	if buf == nil {
		buf = make([]byte, 32*1024)
	}
	_, err = io.CopyBuffer(out, resp.Body, buf.([]byte))
	buffPool.Put(buf)
	return err
}
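As a usage note, the same pattern can be written with sync.Pool's New field so that Get never returns nil. This is only a sketch of the idea (bufPool and copyToNull are hypothetical names, and it assumes the io and sync imports), not the project's actual code:

var bufPool = sync.Pool{
	New: func() interface{} {
		// Allocated only when the pool is empty; buffers are reused afterwards.
		return make([]byte, 32*1024)
	},
}

// copyToNull shows the same pattern as downLoadFile with a pooled buffer.
func copyToNull(dst io.Writer, src io.Reader) (int64, error) {
	buf := bufPool.Get().([]byte)
	defer bufPool.Put(buf)
	return io.CopyBuffer(dst, src, buf)
}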

With this change, the heap allocation at this call site drops immediately:

Applying the same way of thinking to the other frequently hit points in the cumulative heap-allocation profile, the GC dropped below 100 ms.

To record several GC optimization points that must be kept in mind on high-frequency request paths:

1. Functions should avoid returning newly allocated map or slice objects; in frequently called functions this puts pressure on the GC (see the sketch after this list).

2. Merge small objects where possible instead of allocating many of them separately.

3. When a function frequently creates small, simple objects, returning the object by value works better than returning a pointer, since the value can stay on the stack.

4. When the allocation cannot be avoided, use sync.Pool. Although some people say sync.Pool is no longer recommended after Go 1.10, in our load tests it is still effective: cumulative heap allocation can be cut by more than half.

5. Be careful with type conversions: the standard approach is particularly expensive, and the recommendation is the method from Yu Hen (雨痕)'s Go notes (presumably the zero-copy string/[]byte conversion).

6. Avoid creating slices repeatedly; reuse them.
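As an illustration of points 1 and 6, here is a hypothetical sketch (fields and appendFields are made-up names, not from the project) contrasting a function that returns a fresh slice on every call with one that appends into a caller-supplied, reusable slice:

package main

import (
	"fmt"
	"strings"
)

// Allocates a fresh slice on every call; in a high-frequency path this
// creates constant garbage for the GC to collect.
func fields(s string) []string {
	return strings.Fields(s)
}

// The caller passes in a slice that is reused between calls: dst[:0]
// keeps the backing array, so after warm-up this allocates nothing.
// (For simplicity it only splits on single spaces.)
func appendFields(dst []string, s string) []string {
	dst = dst[:0]
	for s != "" {
		i := strings.IndexByte(s, ' ')
		if i < 0 {
			dst = append(dst, s)
			break
		}
		if i > 0 {
			dst = append(dst, s[:i])
		}
		s = s[i+1:]
	}
	return dst
}

func main() {
	buf := make([]string, 0, 8)
	for i := 0; i < 3; i++ {
		buf = appendFields(buf, "a b c") // reuses buf's backing array
		fmt.Println(buf, fields("a b c"))
	}
}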
