Optimizing Go Code

Recently, while optimizing a Go project, I learned a fair amount about Go performance tuning. I found a very good article on the topic and translated it to share with everyone.

We previously wrote an article, "Why We Developed the SignalFx Metric Proxy in Go"; this article uses our ingest service as an example of how we optimize Go code.

SignalFx is built on streaming analytics and alerting for time series data, such as application metrics ("How many requests did my application receive?") and system-level metrics ("How much network traffic is my Linux server using?"). Our user traffic is very high and very granular, and every user's traffic must first pass through our ingest service before it reaches any other SignalFx service.

Step One: Enable pprof

What is pprof?

pprof is the standard way, built into Go, to profile the performance of Go programs. The pprof package can be invoked over HTTP and exposes the application's CPU and memory profiles, along with information about what code is running, down to the line level.

How to enable pprof?

You can enable pprof by adding the single line import _ "net/http/pprof" to your application and then starting your application's HTTP server. Alternatively, as we do at SignalFx, you can attach the handlers yourself in order to control how pprof is exposed through your own routing, as in the following code:

Import "Github.com/gorilla/mux"
import "Net/http/pprof"
var handler *mux. Router
//...
Handler. Pathprefix ("/debug/pprof/profile"). Handlerfunc (pprof. Profile)
handler. Pathprefix ("/debug/pprof/heap"). Handlerfunc (pprof. HEAP)
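
For completeness, here is a minimal sketch of the simpler blank-import approach mentioned above (the address and package layout are illustrative, not from the original service):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/ handlers on http.DefaultServeMux
)

func main() {
	// Any server that uses the default mux now serves /debug/pprof/.
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}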
Step Two: Find code that can be optimized

What to do:
curl http://ingest58:6060/debug/pprof/profile > /tmp/ingest.profile
go tool pprof ingest /tmp/ingest.profile
(pprof) top7
What does this do?

Go ships with a local pprof tool that visualizes pprof output. The route /debug/pprof/profile that we configured collects 30 seconds of profiling data. In the commands above, I first save that output to a local file and then run pprof against the saved file. It is worth noting that the last argument can be a URL instead of a file path (translator's note: go tool pprof ingest http://ingest58:6060/debug/pprof/profile). The command top7 shows the 7 functions that consume the most CPU.

Results:

12910ms of 24020ms total (53.75%)
Dropped 481 nodes (cum <= 120.10ms)
Showing top 7 nodes out of 275 (cum >= 160ms)
     flat  flat%   sum%        cum   cum%
   1110ms  4.62%  4.62%     2360ms  9.83%  runtime.mallocgc
    940ms  3.91%  8.53%     1450ms  6.04%  runtime.scanobject
    830ms  3.46% 11.99%      830ms  3.46%  runtime.futex
    800ms  3.33% 15.32%      800ms  3.33%  runtime.mSpan_Sweep.func1
    750ms  3.12% 18.44%      750ms  3.12%  runtime.cmpbody
    720ms  3.00% 21.44%      720ms  3.00%  runtime.xchg
    580ms  2.41% 23.86%      580ms  2.41%  runtime._ExternalCode
What does this mean?

We can see that none of these functions are called by our code directly. However, mallocgc, scanobject, and mSpan_Sweep all point to high CPU consumption during garbage collection. Rather than digging into these runtime functions to optimize Go's garbage collector itself, the better approach is to optimize how our own code uses the garbage collector; in this case, that means reducing the number of objects we create on the heap. A short illustration of the kind of change this implies follows below.
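
As an aside (this example is mine, not from the original article), here are two ways to build an 8-byte key: the first returns a freshly allocated slice that escapes to the heap on every call, while the second keeps the bytes in a fixed-size array that can stay on the stack:

package example

import "encoding/binary"

// keyAlloc allocates a new slice per call; the returned slice escapes to the heap.
func keyAlloc(v uint64) []byte {
	b := make([]byte, 8)
	binary.LittleEndian.PutUint64(b, v)
	return b
}

// keyNoAlloc returns a fixed-size array by value; no heap allocation is needed.
func keyNoAlloc(v uint64) [8]byte {
	var b [8]byte
	binary.LittleEndian.PutUint64(b[:], v)
	return b
}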

Step Three: Investigate the cause of the garbage collection

What to do:

curl http://ingest58:6060/debug/pprof/heap > /tmp/heap.profile
go tool pprof -alloc_objects /tmp/ingest /tmp/heap.profile
(pprof) top3
What does this do?

Note that the URL we download this time is similar to the previous one, but ends in /heap. This gives us a summary of heap usage on the machine. I once again save it to a file for later comparison. The -alloc_objects flag visualizes the number of objects the application has allocated over its execution.
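
As a side note (these flags belong to go tool pprof generally and are not covered by the original article), heap profiles can be viewed by several sample indexes; besides -alloc_objects there are -alloc_space, -inuse_objects, and -inuse_space, for example:

go tool pprof -inuse_space /tmp/ingest /tmp/heap.profile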

Results:

4964437929 of 7534904879 total (65.89%)
Dropped 541 nodes (cum <= 37674524)
Showing top 3 nodes out of 133 (cum >= 321426216)
     flat  flat%   sum%        cum   cum%
853721355 11.33% 11.33%  859078341 11.40%  github.com/signalfuse/sfxgo/ingest/tsidcache/tsiddiskcache.(*DiskKey).EncodeOld
702927011  9.33% 20.66%  702927011  9.33%  reflect.unsafe_New
624715067  8.29% 28.95%  624715067  8.29%  github.com/signalfuse/sfxgo/ingest/bus/rawbus.(*Partitioner).Partition
What does this mean?

As you can see, 11.33% of all object allocations happen in the EncodeOld function of the DiskKey object, which we expected. Unexpectedly, though, the Partition function accounts for 8.29% of all allocations; since this function only does some basic computation, I had to look at it more closely.

Step Four: Find out why the Partitioner allocates so much memory

What to do:

(pprof) list Partitioner.*Partition
What does this do?

This command prints the source lines I care about, annotated with which code inside the function caused heap allocations. It is one of many commands inside pprof; another very useful one shows a function's callers and callees. You can see the full list via help, and you should try them all.

Results:

Total: 11323262665
ROUTINE ======================== github.com/signalfuse/sfxgo/ingest/bus/rawbus.(*Partitioner).Partition in /opt/jenkins/workspace/ingest/gopath/src/github.com/signalfuse/sfxgo/ingest/bus/rawbus/partitioner.go
 927405893  927405893 (flat, cum) 8.19% of Total
         .          .     64:	if k.ringSize == 0 {
         .          .     65:		return 0, ErrUnsetRingSize
         .          .     66:	}
         .          .     67:	var b [8]byte
         .          .     68:	binary.LittleEndian.PutUint64(b[:], uint64(message.Key.(*partitionPickingKey).tsid))
 239971917  239971917     69:	logherd.Debug2(log, "key", message.Key, "numP", numPartitions, "Partitioning")
         .          .     70:	murmHash := murmur3.Sum32(b[:])
         .          .     71:
         .          .     72:	// 34026 = 66
         .          .     73:	setBits := uint(16)
         .          .     74:	setSize := uint32(1 << setBits)
         .          .     75:	shortHash := murmHash & (setSize - 1)
         .          .     76:	smallIndex := int32(shortHash) * int32(k.ringSize) / int32(setSize)
 687433976  687433976     77:	logherd.Debug3(log, "smallIndex", smallIndex, "murmHash", murmHash, "shortHash", shortHash, "Sending to partition")
         .          .     78:	return smallIndex, nil
         .          .     79:	}
         .          .     80:
What does this mean?

This shows that the debug logging is what causes these variables to escape from the stack to the heap. Since the debug logging isn't strictly necessary, I could simply delete those lines. But first, let's confirm the hypothesis. logherd.Debug2 is a wrapper that looks like the following, so that the WithField objects are not created when the debug log level isn't enabled:

// Debug2 logs two key/value pairs and a message to the logger.  Intended to
// save the memory allocations that WithField creates.
func Debug2(l *logrus.Logger, key string, val interface{}, key2 string, val2 interface{}, msg string) {
	if l.Level >= logrus.DebugLevel {
		l.WithField(key, val).WithField(key2, val2).Debug(msg)
	}
}
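
Why would merely passing integers into such a function allocate? Arguments passed as interface{} have to be boxed at the call site. A minimal sketch of the effect (my illustration, not code from the article):

package example

// sink stands in for any function taking interface{} parameters, like Debug2.
func sink(v interface{}) { _ = v }

// box converts an int to interface{}; the conversion generally forces the
// value onto the heap, which is exactly what escape analysis reports.
func box(n int) {
	sink(n) // n escapes to the heap via the interface conversion
}

The boxing happens before Debug2 ever checks the log level, which is why the wrapper alone does not prevent the allocation.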

From pprof it appears that passing integers into Debug2 is what causes the allocations; let's confirm that.

Step Five: Find why the log statements cause allocations

What to do:

go build -gcflags='-m' . 2>&1 | grep partitioner.go
What does this do?

Building with the -m flag makes the compiler print its escape analysis to stderr, including whether it can keep each variable on the stack or must allocate it on the heap. Go places a variable on the heap whenever the compiler cannot prove that the variable stops being referenced once the function returns. A hypothetical example of the kind of code the compiler flags is shown below.
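
For instance (a hypothetical illustration, not from the article), returning the address of a local variable forces it onto the heap, and -m reports it:

package example

// leak returns the address of a local variable; the compiler reports
// "moved to heap: x" because x must outlive the function call.
func leak() *int {
	x := 42
	return &x
}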

Results:

./partitioner.go:63: &k.ringSize escapes to heap
./partitioner.go:62: leaking param: k
./partitioner.go:70: message.Key escapes to heap
./partitioner.go:62: leaking param content: message
./partitioner.go:70: numPartitions escapes to heap
./partitioner.go:77: smallIndex escapes to heap
./partitioner.go:77: murmHash escapes to heap
./partitioner.go:77: shortHash escapes to heap
./partitioner.go:68: (*Partitioner).Partition b does not escape
./partitioner.go:71: (*Partitioner).Partition b does not escape

Notice line 77: smallIndex, murmHash, and shortHash all escape to the heap. The compiler allocates heap space for these short-lived variables, creating many objects on the heap that we don't need.

Step Six: Benchmark the Partition function

What to write:

func BenchmarkPartition(b *testing.B) {
	r := rand.New(rand.NewSource(0))
	k := partitionPickingKey{}
	msg := sarama.ProducerMessage{
		Key: &k,
	}
	p := Partitioner{
		ringSize: 1024,
		ringName: "quantizer.ring",
	}
	numPartitions := int32(1024)
	for i := 0; i < b.N; i++ {
		k.tsid = r.Int63()
		part, err := p.Partition(&msg, numPartitions)
		if err != nil {
			panic("Error benchmarking")
		}
		if part < 0 || part >= numPartitions {
			panic("Bench failure")
		}
	}
}

The benchmark simply calls Partition b.N times, doing a basic check on the returned value to make sure the call isn't simply optimized away. I recommend that programmers write such a benchmark before optimizing their code, to make sure the changes move you in the right direction. An aside on comparing benchmark runs follows below.
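
As an aside (this tool is not part of the original article), a common way to compare before/after benchmark runs is benchstat from golang.org/x/perf, which reports whether the difference between two sets of results is statistically significant:

go test -bench . -run=_NONE_ -benchmem -count 10 > old.txt
# ... apply the optimization ...
go test -bench . -run=_NONE_ -benchmem -count 10 > new.txt
benchstat old.txt new.txt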

Step Seven: Measure the Partition function's memory allocations

What to do:

go test -v -bench . -run=_NONE_ -benchmem BenchmarkPartition
What does this do?

This runs any benchmark matching the regular expression ".", and -benchmem additionally tracks the average heap usage per iteration. Passing -run=_NONE_ saves some time, because it tells go test to run only unit tests whose names contain the string _NONE_; in other words, it runs no unit tests at all, only the benchmarks.
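
Incidentally (an aside, not from the article), the same per-iteration allocation reporting can be switched on for a single benchmark in code, without the -benchmem flag, via testing.B's ReportAllocs:

func BenchmarkPartition(b *testing.B) {
	b.ReportAllocs() // report B/op and allocs/op for this benchmark only
	// ... loop as above ...
}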

Results:

PASS
BenchmarkPartition-8	10000000	       202 ns/op	       B/op	       4 allocs/op
What does this mean?

Each iteration takes an average of 202 ns and, most importantly, performs 4 allocations per operation.

Step Eight: Remove the log statements

The change:

@@ -66,7 +65,6 @@ func (k *Partitioner) Partition(message *sarama.ProducerMessage, numPartitions i
 	}
 	var b [8]byte
 	binary.LittleEndian.PutUint64(b[:], uint64(message.Key.(*partitionPickingKey).tsid))
-	logherd.Debug2(log, "key", message.Key, "numP", numPartitions, "Partitioning")
 	murmHash := murmur3.Sum32(b[:])
 
 	// 34026 = 66
@@ -74,7 +72,6 @@ func (k *Partitioner) Partition(message *sarama.ProducerMessage, numPartitions i
 	setSize := uint32(1 << setBits)
 	shortHash := murmHash & (setSize - 1)
 	smallIndex := int32(shortHash) * int32(k.ringSize) / int32(setSize)
-	logherd.Debug3(log, "smallIndex", smallIndex, "murmHash", murmHash, "shortHash", shortHash, "Sending to partition")
 	return smallIndex, nil
 }
What does this do?

My fix was to delete the log lines. They were debug statements I had added during development and never removed. In this case, simply removing the code was the easiest option, though an alternative is sketched below.
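
Had the log lines been worth keeping, one alternative (my sketch, not from the article) would be to hoist the level check to the call site, so that the interface{} boxing only happens when debug logging is actually enabled:

if log.Level >= logrus.DebugLevel {
	// The conversions to interface{} now only execute on the debug path.
	logherd.Debug2(log, "key", message.Key, "numP", numPartitions, "Partitioning")
}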

Step Nine: Recompile and check whether the variables still escape to the heap

How to execute:

go build -gcflags='-m' . 2>&1 | grep partitioner.go
Results:

./partitioner.go:62: &k.ringSize escapes to heap
./partitioner.go:61: leaking param: k
./partitioner.go:61: (*Partitioner).Partition message does not escape
./partitioner.go:67: (*Partitioner).Partition b does not escape
./partitioner.go:68: (*Partitioner).Partition b does not escape
What does this mean?

You can see that the smallIndex, murmHash, and shortHash variables no longer escape to the heap.

Step Ten: Re-measure the allocations per operation

How to execute:

go test -v -bench . -run=_NONE_ -benchmem BenchmarkPartition
Results:
PASS
BenchmarkPartition-8	30000000	        40.5 ns/op	       0 B/op	       0 allocs/op
ok  	github.com/signalfuse/sfxgo/ingest/bus/rawbus	1.267s
What does this mean?

Note that each operation now takes only about 40 ns and, more importantly, performs no allocations at all. Since I had set out to reduce heap allocations, this is what mattered most to me.

Concluding remarks

pprof is a very useful tool for dissecting performance problems in Go code. By combining it with the tooling built into Go, you can get real numbers about the effects of your code changes. Unfortunately, performance regressions tend to creep in over time. The next step, which is left as an exercise for the reader, is to save benchmark results in a database so that you can view each commit's performance over time.

Original link: "[Translation] Optimizing Go Code". Please credit the source when reposting.
