We previously wrote an article about why the SignalFx metric proxy is developed in Go. In this article, we use our ingest service as an example of how we optimize Go code.
SignalFx is built for streaming analytics and alerting on time series data from modern applications, including application metrics ("How many requests did my application receive?") as well as system-level metrics ("How much network traffic is my Linux server using?"). Our user traffic is very high and very granular, and every user's traffic must first pass through our ingest service before it reaches any other SignalFx service.
Step one: Enable pprof
What is pprof?
pprof is the standard way, built into the Go runtime and toolchain, to debug the performance of Go programs. The pprof package can be queried over HTTP to extract CPU and memory data from a running application, along with the running command line and contention information.
How do I enable pprof?
You can enable it by adding a single line to your application:

import _ "net/http/pprof"
Then, as long as your application starts an HTTP server, pprof starts along with it. There is another way: at SignalFx, in order to have more control over where pprof is exposed, we attach its handlers to our own router, as follows:
import "github.com/gorilla/mux"import "net/http/pprof"var handler *mux.Router// ...handler.PathPrefix("/debug/pprof/profile").HandlerFunc(pprof.Profile)handler.PathPrefix("/debug/pprof/heap").HandlerFunc(pprof.Heap)
Step two: Find the code that can be optimized
What to do?
curl http://ingest58:6060/debug/pprof/profile > /tmp/ingest.profile
go tool pprof ingest /tmp/ingest.profile
(pprof) top7
What did this do?
The Go toolchain ships with a local pprof tool for visualizing pprof output. The /debug/pprof/profile route we configured collects 30 seconds of data. In the commands above, I first save that output to a local file and then run pprof on the saved file. It is worth mentioning that the last argument can be a URL instead of a file (translator's note: go tool pprof ingest http://ingest58:6060/debug/pprof/profile). The top7 command shows the 7 functions that consume the most CPU.
Results
12910ms of 24020ms total (53.75%)
Dropped 481 nodes (cum <= 120.10ms)
Showing top 30 nodes out of 275 (cum >= 160ms)
      flat  flat%   sum%        cum   cum%
    1110ms  4.62%  4.62%     2360ms  9.83%  runtime.mallocgc
     940ms  3.91%  8.53%     1450ms  6.04%  runtime.scanobject
     830ms  3.46% 11.99%      830ms  3.46%  runtime.futex
     800ms  3.33% 15.32%      800ms  3.33%  runtime.mSpan_Sweep.func1
     750ms  3.12% 18.44%      750ms  3.12%  runtime.cmpbody
     720ms  3.00% 21.44%      720ms  3.00%  runtime.xchg
     580ms  2.41% 23.86%      580ms  2.41%  runtime._ExternalCode
What does this mean?
We can see that none of these functions are ones we call directly. However, mallocgc, scanobject, and mSpan_Sweep are all functions that run as part of garbage collection, and together they account for a large share of the CPU time. Rather than trying to optimize the Go garbage collector itself, the smarter move is to optimize how our code uses it. In this case, what we can optimize is to reduce the number of objects we create on the heap.
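To make "reduce the creation of objects on the heap" concrete, here is a small illustrative sketch (our own example, not code from the ingest service) of the general idea: replacing a per-call allocation with a buffer the caller reuses.

package main

import (
	"fmt"
	"strconv"
)

// formatID allocates on every call: fmt.Sprintf builds a new string each
// time, so a hot loop using it keeps the garbage collector busy.
func formatID(id int64) string {
	return fmt.Sprintf("id-%d", id)
}

// appendID writes into a buffer the caller can reuse, so a steady-state
// loop performs no per-call heap allocations.
func appendID(buf []byte, id int64) []byte {
	buf = append(buf, "id-"...)
	return strconv.AppendInt(buf, id, 10)
}

func main() {
	buf := make([]byte, 0, 32) // allocated once, reused every iteration
	for id := int64(0); id < 3; id++ {
		buf = appendID(buf[:0], id)
		fmt.Println(string(buf), "==", formatID(id))
	}
}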
Step three: Find out what is causing the garbage collection
What to do?
curl http://ingest58:6060/debug/pprof/heap > /tmp/heap.profile
go tool pprof -alloc_objects /tmp/ingest /tmp/heap.profile
(pprof) top3
What did this do?
Note that the URL downloaded this time is similar to the previous one, but ends in /heap. It gives us a summary of heap usage on the machine. As before, I save it to a file for later comparison. The -alloc_objects flag makes pprof show the number of objects the application allocated over the course of its execution.
Results
4964437929 of 7534904879 total (65.89%)
Dropped 541 nodes (cum <= 37674524)
Showing top 10 nodes out of 133 (cum >= 321426216)
      flat  flat%   sum%        cum   cum%
 853721355 11.33% 11.33%  859078341 11.40%  github.com/signalfuse/sfxgo/ingest/tsidcache/tsiddiskcache.(*DiskKey).EncodeOld
 702927011  9.33% 20.66%  702927011  9.33%  reflect.unsafe_New
 624715067  8.29% 28.95%  624715067  8.29%  github.com/signalfuse/sfxgo/ingest/bus/rawbus.(*Partitioner).Partition
What does this mean?
As you can see, 11.33% of all object allocations happen in the EncodeOld function of the DiskKey object, which is what we expected. What we did not expect is that Partition accounts for 8.29% of total allocations, because that function does nothing but basic math. It deserved a closer look.
Step four: Find out why Partitioner uses so much memory
What to do?
(pprof) list Partitioner.*Partition
What did this do?
This command prints the source of the function I am interested in, annotated with which lines inside it are responsible for heap allocations. It is one of many commands inside pprof; others, such as listing a function's callers and callees, are also very useful. Run help to see them all and try them out.
Results
Total: 11323262665
ROUTINE ======================== github.com/signalfuse/sfxgo/ingest/bus/rawbus.(*Partitioner).Partition in /opt/jenkins/workspace/ingest/gopath/src/github.com/signalfuse/sfxgo/ingest/bus/rawbus/partitioner.go
 927405893  927405893 (flat, cum)  8.19% of Total
         .          .     64:	if ringSize == 0 {
         .          .     65:		return 0, errUnsetRingSize
         .          .     66:	}
         .          .     67:	var b [8]byte
         .          .     68:	binary.LittleEndian.PutUint64(b[:], uint64(message.Key.(*partitionPickingKey).tsid))
 239971917  239971917     69:	logherd.Debug2(log, "key", message.Key, "numP", numPartitions, "Partitioning")
         .          .     70:	murmHash := murmur3.Sum32(b[:])
         .          .     71:
         .          .     72:	// 34026 => 66
         .          .     73:	setBits := uint(16)
         .          .     74:	setSize := uint32(1 << setBits)
         .          .     75:	shortHash := murmHash & (setSize - 1)
         .          .     76:	smallIndex := int32(shortHash) * int32(k.ringSize) / int32(setSize)
 687433976  687433976     77:	logherd.Debug3(log, "smallIndex", smallIndex, "murmHash", murmHash, "shortHash", shortHash, "Sending to partition")
         .          .     78:	return smallIndex, nil
         .          .     79:}
         .          .     80:
What does this mean?
This shows that the debug log statements are what cause variables to escape from the stack to the heap. Since the debug logging is not strictly necessary, I could simply delete those lines. But first, let's confirm this hypothesis. The logherd.Debug2 function, shown below, is a wrapper that avoids calling WithField at all when the debug log level is not enabled:
// Debug2 to logger 2 key/value pairs and message.  Intended to save the mem alloc that WithField creates
func Debug2(l *logrus.Logger, key string, val interface{}, key2 string, val2 interface{}, msg string) {
	if l.Level >= logrus.DebugLevel {
		l.WithField(key, val).WithField(key2, val2).Debug(msg)
	}
}
From the pprof output, it appears the allocations are caused by passing integers into the interface{} parameters of Debug2. Let's confirm that.
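As a sanity check of that theory, here is a standalone benchmark sketch (the names are ours, not from the ingest code) showing that converting an int to an interface{} argument forces a heap allocation:

package boxing

import "testing"

// sink keeps the compiler from optimizing the conversion away.
var sink interface{}

// takeInterface mimics the parameter list of a log helper: the caller must
// box its int into an interface{} at the call site.
func takeInterface(v interface{}) {
	sink = v
}

func BenchmarkBoxInt(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		takeInterface(i) // expect roughly 1 alloc/op from the int-to-interface conversion
	}
}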
Step five: Find out why the log statements allocate memory
What to do?
go build -gcflags='-m' . 2>&1 | grep partitioner.go
What did this do?
Building with the -m flag makes the compiler print its optimization decisions to stderr. That includes whether it can keep each variable on the stack or has to move it to the heap. Go moves a variable to the heap whenever the compiler cannot prove that no reference to it outlives the function.
Results
./partitioner.go:63: &k.ringSize escapes to heap
./partitioner.go:62: leaking param: k
./partitioner.go:70: message.Key escapes to heap
./partitioner.go:62: leaking param content: message
./partitioner.go:70: numPartitions escapes to heap
./partitioner.go:77: smallIndex escapes to heap
./partitioner.go:77: murmHash escapes to heap
./partitioner.go:77: shortHash escapes to heap
./partitioner.go:68: (*Partitioner).Partition b does not escape
./partitioner.go:71: (*Partitioner).Partition b does not escape
Take note of line 77: smallIndex, murmHash, and shortHash all escape to the heap. The compiler is allocating heap space for these short-lived variables, which means we are creating a lot of heap objects we do not need.
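To see the mechanism in isolation, here is a tiny illustrative file (ours, not partitioner.go):

package escape

// stays keeps x on the stack: the value is copied out when the function returns.
func stays() int {
	x := 42
	return x
}

// leaks returns a pointer to a local variable, so x must outlive the call;
// `go build -gcflags='-m'` reports something like "moved to heap: x" here.
func leaks() *int {
	x := 42
	return &x
}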
Step six: Benchmark the Partition function
What to write?
func BenchmarkPartition(b *testing.B) {
	r := rand.New(rand.NewSource(0))
	k := partitionPickingKey{}
	msg := sarama.ProducerMessage{
		Key: &k,
	}
	p := Partitioner{
		ringSize: 1024,
		ringName: "quantizer.ring",
	}
	num_partitions := int32(1024)
	for i := 0; i < b.N; i++ {
		k.tsid = r.Int63()
		part, err := p.Partition(&msg, num_partitions)
		if err != nil {
			panic("Error benchmarking")
		}
		if part < 0 || part >= num_partitions {
			panic("Bench failure")
		}
	}
}
The benchmark simply calls Partition b.N times, doing a basic sanity check on the return value so the call cannot be optimized away entirely. We recommend writing a benchmark like this before you start optimizing, so you can be sure your changes are moving you in the right direction.
Step seven: Measure the memory allocations of the Partition function
What to do?
go test -v -bench . -run=_NONE_ -benchmem BenchmarkPartition
What did this do?
The -bench . flag runs every benchmark matching the regular expression ".", and -benchmem tracks the average heap usage per iteration. Passing -run=_NONE_ saves time by restricting go test to unit tests whose names contain the string _NONE_. Since none do, no unit tests run at all, and only the benchmark executes.
Results
PASS
BenchmarkPartition-8	10000000	       202 ns/op	      64 B/op	       4 allocs/op
What does this mean?
Each iteration takes 202ns on average and, most importantly, performs 4 allocations per operation. Those four line up with the four integers (numPartitions, smallIndex, murmHash, and shortHash) that escape analysis flagged as escaping to the heap.
Step eight: Remove the log statements
What to write?
@@ -66,7 +65,6 @@ func (k *Partitioner) Partition(message *sarama.ProducerMessage, numPartitions i
 	}
 	var b [8]byte
 	binary.LittleEndian.PutUint64(b[:], uint64(message.Key.(*partitionPickingKey).tsid))
-	logherd.Debug2(log, "key", message.Key, "numP", numPartitions, "Partitioning")
 	murmHash := murmur3.Sum32(b[:])
 
 	// 34026 => 66
@@ -74,7 +72,6 @@ func (k *Partitioner) Partition(message *sarama.ProducerMessage, numPartitions i
 	setSize := uint32(1 << setBits)
 	shortHash := murmHash & (setSize - 1)
 	smallIndex := int32(shortHash) * int32(k.ringSize) / int32(setSize)
-	logherd.Debug3(log, "smallIndex", smallIndex, "murmHash", murmHash, "shortHash", shortHash, "Sending to partition")
 	return smallIndex, nil
 }
What did this do?
My fix is simply to delete the log statements. I had added them while testing and debugging, but never removed them. In a case like this, removing the code is the easiest answer.
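Had the logs been worth keeping, an alternative sketch (assuming log is the same *logrus.Logger the wrapper uses) is to hoist the level check to the call site, so the int-to-interface{} conversions only run when debug logging is actually enabled:

// Inside Partition, instead of deleting the line outright:
if log.Level >= logrus.DebugLevel {
	// The boxing of smallIndex, murmHash, and shortHash now happens only
	// when this branch is taken, so it costs nothing when debug is off.
	logherd.Debug3(log, "smallIndex", smallIndex, "murmHash", murmHash, "shortHash", shortHash, "Sending to partition")
}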
Step nine: Recompile and check whether the variables still escape to the heap
What to do?
go build -gcflags='-m' . 2>&1 | grep partitioner.go
Results
./partitioner.go:62: &k.ringSize escapes to heap
./partitioner.go:61: leaking param: k
./partitioner.go:61: (*Partitioner).Partition message does not escape
./partitioner.go:67: (*Partitioner).Partition b does not escape
./partitioner.go:68: (*Partitioner).Partition b does not escape
What does this mean?
You can see that smallIndex, murmHash, and shortHash no longer escape to the heap, and that message no longer escapes either.
Step ten: Re-measure the memory allocations per operation
What to do?
go test -v -bench . -run=_NONE_ -benchmem BenchmarkPartition
Results
PASS
BenchmarkPartition-8	30000000	        40.5 ns/op	       0 B/op	       0 allocs/op
ok  	github.com/signalfuse/sfxgo/ingest/bus/rawbus	1.267s
What does this mean?
Note that each operation now takes only 40.5ns and, more importantly, performs zero allocations. Since heap allocations were exactly what I set out to reduce, this is the result I wanted.
Conclusion
pprof is a very useful tool for analyzing performance problems in Go code. Combined with the benchmarking support built into the Go toolchain, it gives you real numbers about the effect of each code change. Unfortunately, performance regressions tend to creep back in over time. As an exercise for the reader, a good next step is to store benchmark results in a database so you can track the code's performance across every commit.
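For example, a low-tech starting point (our sketch, not from the original article) is to tag each benchmark run with the commit it measured and keep the files for comparison:

go test -bench . -benchmem ./... | tee bench-$(git rev-parse --short HEAD).txt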