"Translate" to optimize go mode

We previously wrote an article, "Why we develop the SignalFx metric proxy in Go". This article uses our ingest service as an example of how we optimize Go code.

SignalFx is built on streaming analytics and alerting over time series data, such as application metrics emitted by modern applications ("How many requests did my application receive?") as well as system-level metrics ("How much network traffic is my Linux server using?"). Our user traffic is very high and very granular, and every user's traffic must first pass through our ingest service before it reaches the rest of SignalFx.

Step one: Enable pprof

What is Pprof?

Pprof is the standard way, built into the Go toolchain, to debug the performance of Go programs. The pprof package can be exposed over HTTP and extracts CPU and memory profiles of the running application, including which lines of code are doing the work.

How do I enable pprof?

You can enable it by adding a single import line, import _ "net/http/pprof", to your application; as soon as your application's HTTP server starts, pprof is available. There is another way: at SignalFx, in order to control how pprof is exposed externally, we attach specific handlers ourselves and expose them through our router settings, as in the following code:

import "github.com/gorilla/mux"import "net/http/pprof"var handler *mux.Router// ...handler.PathPrefix("/debug/pprof/profile").HandlerFunc(pprof.Profile)handler.PathPrefix("/debug/pprof/heap").HandlerFunc(pprof.Heap)

Step two: Find the code that can be optimized

What to do?

curl http://ingest58:6060/debug/pprof/profile > /tmp/ingest.profile
go tool pprof ingest /tmp/ingest.profile
(pprof) top7

What did this do?

The Go toolchain includes a pprof tool for visualizing pprof output. The route we configured, /debug/pprof/profile, collects 30 seconds of data. My first command above saves that output to a local file, and the second runs pprof against the saved file. It is worth mentioning that the last argument can also be a URL instead of a file (translator's note: go tool pprof ingest http://ingest58:6060/debug/pprof/profile). The top7 command shows the 7 functions that consume the most CPU.

Results

12910ms of 24020ms total (53.75%)
Dropped 481 nodes (cum <= 120.10ms)
Showing top 30 nodes out of 275 (cum >= 160ms)
     flat  flat%   sum%        cum   cum%
   1110ms  4.62%  4.62%     2360ms  9.83%  runtime.mallocgc
    940ms  3.91%  8.53%     1450ms  6.04%  runtime.scanobject
    830ms  3.46% 11.99%      830ms  3.46%  runtime.futex
    800ms  3.33% 15.32%      800ms  3.33%  runtime.mSpan_Sweep.func1
    750ms  3.12% 18.44%      750ms  3.12%  runtime.cmpbody
    720ms  3.00% 21.44%      720ms  3.00%  runtime.xchg
    580ms  2.41% 23.86%      580ms  2.41%  runtime._ExternalCode

What does it mean?

We can see that none of these functions are called directly by our code. However, mallocgc, scanobject, and mSpan_Sweep are all functions that show up when garbage collection is consuming CPU. Rather than digging into these functions to optimize Go's garbage collector itself, the better approach is to optimize the way our code uses the garbage collector. In this case, that means reducing the number of objects we create on the heap.
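
As a generic illustration of that strategy (a toy sketch; encodeAllocating and encodeOnStack are made-up names, not functions from the ingest service): keeping short-lived scratch space in a fixed-size array lets the compiler leave it on the stack, while returning a freshly made slice typically forces a heap allocation on every call.

package example

import "encoding/binary"

// encodeAllocating builds and returns a new 8-byte slice, so the slice
// escapes and is typically heap-allocated on every call.
func encodeAllocating(v uint64) []byte {
    b := make([]byte, 8)
    binary.LittleEndian.PutUint64(b, v)
    return b
}

// encodeOnStack uses a fixed-size array whose address never leaves the
// function, so the compiler can keep it on the stack.
func encodeOnStack(v uint64) uint64 {
    var b [8]byte
    binary.LittleEndian.PutUint64(b[:], v)
    return binary.LittleEndian.Uint64(b[:])
}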

Step three: Explore the causes of GC

What to do?

curl http://ingest58:6060/debug/pprof/heap > /tmp/heap.profile
go tool pprof -alloc_objects /tmp/ingest /tmp/heap.profile
(pprof) top3

What did this do?

Note that the URL for this download is similar to the previous one, but ends in /heap. This gives us a summary of heap usage on the machine. I again save the output to a file so I can compare against it later. The -alloc_objects option shows the number of objects the application has allocated during its execution.

Results

4964437929 of 7534904879 total (65.89%)
Dropped 541 nodes (cum <= 37674524)
Showing top 10 nodes out of 133 (cum >= 321426216)
     flat  flat%   sum%        cum   cum%
853721355 11.33% 11.33%  859078341 11.40%  github.com/signalfuse/sfxgo/ingest/tsidcache/tsiddiskcache.(*DiskKey).EncodeOld
702927011  9.33% 20.66%  702927011  9.33%  reflect.unsafe_New
624715067  8.29% 28.95%  624715067  8.29%  github.com/signalfuse/sfxgo/ingest/bus/rawbus.(*Partitioner).Partition

What does it mean?

As you can see, 11.33% of object allocations happen in the EncodeOld method of DiskKey, which we expected. Unexpectedly, however, the Partition function accounts for 8.29% of total allocations; since that function only does some basic computation, I decided to focus on it.

Step four: Find out why Partitioner allocates so much memory

What to do?

(pprof) list Partitioner.*Partition

What did this do?

This command prints the source lines I care about and shows which lines inside the function cause heap allocations. It is one of many commands pprof offers. Another very useful one shows the callers and callees of a function. You can see the full list via help and try them out.

Results

Total: 11323262665
ROUTINE ======================== github.com/signalfuse/sfxgo/ingest/bus/rawbus.(*Partitioner).Partition in /opt/jenkins/workspace/ingest/gopath/src/github.com/signalfuse/sfxgo/ingest/bus/rawbus/partitioner.go
 927405893  927405893 (flat, cum)  8.19% of Total
         .          .     64:    if ringSize == 0 {
         .          .     65:        return 0, errUnsetRingSize
         .          .     66:    }
         .          .     67:    var b [8]byte
         .          .     68:    binary.LittleEndian.PutUint64(b[:], uint64(message.Key.(*partitionPickingKey).tsid))
 239971917  239971917     69:    logherd.Debug2(log, "key", message.Key, "numP", numPartitions, "Partitioning")
         .          .     70:    murmHash := murmur3.Sum32(b[:])
         .          .     71:
         .          .     72:    // 34026 => 66
         .          .     73:    setBits := uint(16)
         .          .     74:    setSize := uint32(1 << setBits)
         .          .     75:    shortHash := murmHash & (setSize - 1)
         .          .     76:    smallIndex := int32(shortHash) * int32(k.ringSize) / int32(setSize)
 687433976  687433976     77:    logherd.Debug3(log, "smallIndex", smallIndex, "murmHash", murmHash, "shortHash", shortHash, "Sending to partition")
         .          .     78:    return smallIndex, nil
         .          .     79:}
         .          .     80:

What does it mean?

This shows that the debug logging is what causes the variables to escape from the stack to the heap. Since the debug logging is not strictly necessary, I could simply delete those lines. But first, let's confirm this hypothesis. The logherd.Debug2 function looks like the following: it is a wrapper that skips the WithField calls when the debug log level is not enabled.

// Debug2 to logger 2 key/value pairs and message.  Intended to save the mem alloc that WithField creates
func Debug2(l *logrus.Logger, key string, val interface{}, key2 string, val2 interface{}, msg string) {
    if l.Level >= logrus.DebugLevel {
        l.WithField(key, val).WithField(key2, val2).Debug(msg)
    }
}

From the pprof output it appears that merely passing the integers into Debug2 causes the allocations, because each argument must be converted to an interface{} at the call site before Debug2 ever checks the log level. Let's confirm this further.
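
A small, self-contained benchmark can confirm that kind of cost in isolation. The sketch below uses made-up names: debugPair stands in for logherd.Debug2, and enabled/captured mimic the logger's level check and field storage. Because the compiler cannot prove the values never leak, each int is boxed into an interface{} and heap-allocated in the caller even though debug logging is off:

package boxing

import "testing"

// enabled mimics a debug level check that is false at run time but not
// provably false at compile time.
var enabled = false

// captured mimics the logger retaining the values when enabled.
var captured []interface{}

// debugPair is a stand-in for a Debug2-style helper that accepts its
// values as interface{} and only uses them when logging is enabled.
func debugPair(key string, val interface{}, key2 string, val2 interface{}) {
    if enabled {
        captured = append(captured, key, val, key2, val2)
    }
}

func BenchmarkInterfaceBoxing(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        // The two int arguments are converted to interface{} here, in the
        // caller, before debugPair ever looks at the log level.
        debugPair("index", i+1000, "hash", i+2000)
    }
}

I would expect this to report roughly 2 allocs/op; the exact numbers depend on the Go version, but the point is that the allocations happen in the caller, before the level check.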

Step five: Find out why the log statements cause memory allocations

What to do?

go build -gcflags='-m' . 2>&1 | grep partitioner.go

What does this do?

Compiling with the -m flag makes the compiler print its escape analysis decisions to stderr. This includes whether a variable can be kept on the stack or must be allocated on the heap. Go moves a variable to the heap whenever the compiler cannot prove that no reference to it outlives the function.

Results

./partitioner.go:63: &k.ringSize escapes to heap
./partitioner.go:62: leaking param: k
./partitioner.go:70: message.Key escapes to heap
./partitioner.go:62: leaking param content: message
./partitioner.go:70: numPartitions escapes to heap
./partitioner.go:77: smallIndex escapes to heap
./partitioner.go:77: murmHash escapes to heap
./partitioner.go:77: shortHash escapes to heap
./partitioner.go:68: (*Partitioner).Partition b does not escape
./partitioner.go:71: (*Partitioner).Partition b does not escape

Note line 77: smallIndex, murmHash, and shortHash all escape to the heap. The compiler allocates heap space for these short-lived variables, causing us to create many objects on the heap that we don't actually need.
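
To make "escapes to the heap" concrete, here is a toy example outside the ingest code (keep and leak are made-up names) showing the two compiler verdicts you can see from go build -gcflags='-m':

package escape

// keep returns its result by value; x never outlives the call, so the
// compiler keeps it on the stack and reports nothing special.
func keep() int {
    x := 42
    return x
}

// leak returns the address of a local variable; x must outlive the call,
// so the compiler reports something like "moved to heap: x".
func leak() *int {
    x := 42
    return &x
}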

Step six: Benchmark the Partition function

What to write?

func BenchmarkPartition(b *testing.B) {
    r := rand.New(rand.NewSource(0))
    k := partitionPickingKey{}
    msg := sarama.ProducerMessage{
        Key: &k,
    }
    p := Partitioner{
        ringSize: 1024,
        ringName: "quantizer.ring",
    }
    num_partitions := int32(1024)
    for i := 0; i < b.N; i++ {
        k.tsid = r.Int63()
        part, err := p.Partition(&msg, num_partitions)
        if err != nil {
            panic("Error benchmarking")
        }
        if part < 0 || part >= num_partitions {
            panic("Bench failure")
        }
    }
}

The benchmark simply calls Partition b.N times and does a basic sanity check on the returned value so that the call isn't optimized away. We recommend that programmers write a benchmark like this before optimizing, to make sure the changes are actually moving in the right direction.

Step seven: Measure the memory allocations of the Partition function

What to do?

go test -v -bench . -run=_NONE_ -benchmem BenchmarkPartition

What did this do?

This runs the benchmarks whose names match the regular expression ".". The -benchmem flag tracks the average heap usage per iteration. By passing -run=_NONE_, I save some time: only unit tests whose names contain the string _NONE_ would run, which in practice means no unit tests run at all, only the benchmark.

Results

PASS
BenchmarkPartition-8    10000000           202 ns/op          64 B/op           4 allocs/op

What does that mean?

Each iteration takes an average of 202 ns and, most importantly, performs 4 allocations per operation.

Step eight: Delete the log statements

What to write?

@@ -66,7 +65,6 @@ func (k *Partitioner) Partition(message *sarama.ProducerMessage, numPartitions i
        }
        var b [8]byte
        binary.LittleEndian.PutUint64(b[:], uint64(message.Key.(*partitionPickingKey).tsid))
-       logherd.Debug2(log, "key", message.Key, "numP", numPartitions, "Partitioning")
        murmHash := murmur3.Sum32(b[:])

        // 34026 => 66
@@ -74,7 +72,6 @@ func (k *Partitioner) Partition(message *sarama.ProducerMessage, numPartitions i
        setSize := uint32(1 << setBits)
        shortHash := murmHash & (setSize - 1)
        smallIndex := int32(shortHash) * int32(k.ringSize) / int32(setSize)
-       logherd.Debug3(log, "smallIndex", smallIndex, "murmHash", murmHash, "shortHash", shortHash, "Sending to partition")
        return smallIndex, nil
 }

What did this do?

My fix is to delete the log statements. I added them while debugging and never removed them. In this case, simply deleting the code is the easiest fix.
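
If we had wanted to keep the debug output, another option (a sketch only; it assumes the call site can see the same *logrus.Logger that logherd wraps, and it is not what we shipped) would be to move the level check to the call site, so the arguments are only converted to interface{} when debug logging is actually enabled:

// Guard the call so the WithField arguments are only boxed into
// interface{} (and therefore only allocated) when debug logging is on.
if log.Level >= logrus.DebugLevel {
    log.WithField("smallIndex", smallIndex).
        WithField("murmHash", murmHash).
        WithField("shortHash", shortHash).
        Debug("Sending to partition")
}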

Step nine: Recompile and check whether the variables still escape to the heap

How do I do it?

go build -gcflags='-m' . 2>&1 | grep partitioner.go

Results

./partitioner.go:62: &k.ringSize escapes to heap
./partitioner.go:61: leaking param: k
./partitioner.go:61: (*Partitioner).Partition message does not escape
./partitioner.go:67: (*Partitioner).Partition b does not escape
./partitioner.go:68: (*Partitioner).Partition b does not escape

What does it mean?

You can see that smallIndex, murmHash, and shortHash no longer appear in the output, and that message no longer escapes to the heap.

Step ten: Re-measure the memory allocations per operation

How do I do it?

go test -v -bench . -run=_NONE_ -benchmem BenchmarkPartition

Results

PASS
BenchmarkPartition-8    30000000            40.5 ns/op           0 B/op           0 allocs/op
ok      github.com/signalfuse/sfxgo/ingest/bus/rawbus   1.267s

What does it mean?

Note that each operation now takes only about 40 ns and, more importantly, performs no memory allocations at all. Since reducing heap allocations was my goal, this is what matters most to me.

Conclusion

Pprof is a very useful tool for analyzing performance problems in Go code. Combined with the benchmarking tools built into the Go toolchain, it gives you real numbers about the effect of a code change. Unfortunately, performance regressions tend to creep in over time. As an exercise, readers can take this further by saving benchmark results to a database, so that the code's performance can be tracked after every commit.
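
As a starting point for that exercise, benchmarks can also be run programmatically. The sketch below (recordBenchmark is a made-up helper, and printing stands in for whatever storage you choose) uses the standard testing.Benchmark function to capture the numbers you would want to save per commit:

package main

import (
    "fmt"
    "testing"
)

// recordBenchmark runs one benchmark function and prints the metrics that
// a CI job could instead write to a database, keyed by commit.
func recordBenchmark(name string, fn func(b *testing.B)) {
    res := testing.Benchmark(fn)
    fmt.Printf("%s: %d ns/op, %d B/op, %d allocs/op\n",
        name, res.NsPerOp(), res.AllocedBytesPerOp(), res.AllocsPerOp())
}

Called as recordBenchmark("Partition", BenchmarkPartition), this would emit one summary line per run.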

Original link: "[Translation] Optimizing Go code". Please credit the source when reposting!
