Original article published at hashrocket.com on 2015/12/28.
While developing pgx (a PostgreSQL driver for Go), I several times needed to dispatch among more than 20 code branches. I would normally reach for a switch statement, but a map of functions is a more readable way to implement the same dispatch. My initial assumption was that branching with a switch would be faster than a map lookup plus a function call. Since performance matters for a database driver, it was worth measuring the impact carefully before making any changes.
Summary
Benchmarks show that their performance differs greatly, but that the difference is probably irrelevant to the program as a whole. If you want to see how that conclusion was reached, read on.
Preliminary investigation
I found little useful information online. Several posts suggested that a map would be faster once there are enough branches. A 2012 discussion of switch optimization included Ken Thompson's view that there was not much room for optimization there. I decided to write benchmarks to measure the two approaches in Go.
The most basic test
The results below were produced on an Intel i7-4790K running Ubuntu 14.04 with go1.5.1 linux/amd64. The benchmark source code and results are on GitHub.
The following is a basic test of switch:
```go
func BenchmarkSwitch(b *testing.B) {
	var n int
	for i := 0; i < b.N; i++ {
		switch i % 4 {
		case 0:
			n += f0(i)
		case 1:
			n += f1(i)
		case 2:
			n += f2(i)
		case 3:
			n += f3(i)
		}
	}
	// n will never be < 0, but checking n should ensure the entire
	// benchmark loop can't be optimized away.
	if n < 0 {
		b.Fatal("can't happen")
	}
}
```
Benchmarks like this are notoriously tricky to get right. For example, the optimizer can eliminate code that has no observable effect; the variable n is there to keep the entire loop from being optimized away. A few other subtleties are noted below.
Here is the test code for a function map:
```go
func BenchmarkMapFunc4(b *testing.B) {
	var n int
	for i := 0; i < b.N; i++ {
		n += Funcs[i%4](i)
	}
	// n will never be < 0, but checking n should ensure the entire
	// benchmark loop can't be optimized away.
	if n < 0 {
		b.Fatal("can't happen")
	}
}
```
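The benchmark above references a Funcs table defined elsewhere in the generated test code. Here is a minimal sketch of what such a table might look like; the fN bodies and the exact declaration are assumptions for illustration, not copied from the original repository:

```go
package main

import "fmt"

// f0 does trivial but non-removable work; the generated f1..f3 are
// assumed here to have the same shape.
func f0(n int) int {
	if n%2 == 0 {
		return n
	}
	return 0
}

var f1, f2, f3 = f0, f0, f0 // stand-ins for the other generated functions

// Funcs maps a branch index to its handler, so Funcs[i%4](i)
// dispatches the same way as the switch version's four cases.
var Funcs = map[int]func(int) int{
	0: f0,
	1: f1,
	2: f2,
	3: f3,
}

func main() {
	fmt.Println(Funcs[2](6)) // even input is returned unchanged: prints 6
}
```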
A Ruby ERB template generates tests with 4, 8, 16, 32, 64, 128, 256, and 512 branches. The results show that with 4 branches the map version is about 25% slower than the switch version; with 8 branches the two are comparable; and the map version pulls ahead as the branch count grows, running roughly 50% faster in the 512-branch test.
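Rather than generating a full test file, the shape of this comparison can be reproduced in a standalone program with `testing.Benchmark`. This is a simplified sketch with four branches and invented helper names; absolute numbers will vary by machine and Go version:

```go
package main

import (
	"fmt"
	"testing"
)

// work is a stand-in (invented name) for the fN helpers: trivial,
// but its result feeds a sink so the loop cannot be optimized away.
func work(n int) int {
	if n%2 == 0 {
		return n
	}
	return 0
}

var table = map[int]func(int) int{0: work, 1: work, 2: work, 3: work}

func main() {
	sw := testing.Benchmark(func(b *testing.B) {
		var n int
		for i := 0; i < b.N; i++ {
			switch i % 4 {
			case 0:
				n += work(i)
			case 1:
				n += work(i)
			case 2:
				n += work(i)
			case 3:
				n += work(i)
			}
		}
		if n < 0 {
			b.Fatal("can't happen")
		}
	})
	mp := testing.Benchmark(func(b *testing.B) {
		var n int
		for i := 0; i < b.N; i++ {
			n += table[i%4](i)
		}
		if n < 0 {
			b.Fatal("can't happen")
		}
	})
	fmt.Printf("switch: %d ns/op\nmap:    %d ns/op\n", sw.NsPerOp(), mp.NsPerOp())
}
```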
Inlineable functions
The previous tests gave some results, but they are not sufficient: several factors that affect the outcome were not taken into account. The first is function inlining. A function called from a switch case can be inlined, but a function called through a map cannot, so the effect of inlining on performance needs to be tested.
The following function does just enough meaningless work that its body cannot be optimized away, while still allowing the Go compiler to inline the whole function.
```go
func f0(n int) int {
	if n%2 == 0 {
		return n
	} else {
		return 0
	}
}
```
As of this writing, the Go compiler cannot inline a function that contains panic. The following function therefore includes a panic call that can never execute, which prevents the function from being inlined.
```go
func noInline0(n int) int {
	if n < 0 {
		panic("can't happen - but should ensure this function is not inlined")
	} else if n%2 == 0 {
		return n
	} else {
		return 0
	}
}
```
When the functions cannot be inlined, the performance picture changes substantially. The map version is about 30% faster than the switch version in the 4-branch test and about 300% faster in the 512-branch test.
Computing vs. looking up the jump target
The tests above select the branch directly from the loop counter:
```go
for i := 0; i < b.N; i++ {
	switch i % 4 {
	// ...
	}
}
```
That means they only measure the branch dispatch itself. In the real world, choosing a branch usually involves reading memory. To simulate this, a slice lookup determines the branch:
```go
var ascInputs []int

func TestMain(m *testing.M) {
	for i := 0; i < 4096; i++ {
		ascInputs = append(ascInputs, i)
	}
	os.Exit(m.Run())
}

func BenchmarkSwitch(b *testing.B) {
	// ...
	for i := 0; i < b.N; i++ {
		switch ascInputs[i%len(ascInputs)] % 4 {
		// ...
		}
	}
	// ...
}
```
This change significantly hurt both versions. The best switch time went from 1.99 ns/op to 8.18 ns/op, and the best map time from 2.39 ns/op to 10.6 ns/op. The exact numbers differ from test to test, but the lookup added roughly 7 ns/op across the board.
Unpredictable branch order
The attentive reader may have noticed that the branch selection in these tests is highly predictable, which is unrealistic: the benchmarks always take branch 0, then branch 1, then branch 2, and so on. To address this, the branch selection was randomized:
```go
var randInputs []int

func TestMain(m *testing.M) {
	for i := 0; i < 4096; i++ {
		randInputs = append(randInputs, rand.Int())
	}
	os.Exit(m.Run())
}

func BenchmarkSwitch(b *testing.B) {
	// ...
	for i := 0; i < b.N; i++ {
		switch randInputs[i%len(randInputs)] % 4 {
		// ...
		}
	}
	// ...
}
```
This change degraded performance further. In the 32-branch test, map lookup times rose from 11 ns to 22 ns. The exact numbers depend on the branch count and on whether the functions are inlined, but performance was roughly halved.
Digging deeper
The performance loss when moving from computing the branch target to looking it up in memory is expected, given the extra memory read. But the impact of moving from sequential to random branch selection is surprising. To understand why, we can turn to the Linux perf tool, which reports CPU-level statistics such as cache misses and branch-prediction misses.
To keep the compilation step out of the profile, precompile the test binary:
```
go test -c
```
Then have perf collect statistics for one of the sequential-lookup tests:
```
$ perf stat -e cache-references,cache-misses,branches,branch-misses ./go_map_vs_switch.test -test.bench=PredictableLookupMapNoInlineFunc512 -test.benchtime=5s
```
The interesting part of the output is the branch-prediction statistics:
```
9,919,244,177 branches
   10,675,162 branch-misses    # 0.11% of all branches
```
So branch prediction works extremely well when the branch order is predictable. The unpredictable-branch test, however, tells a very different story.
```
$ perf stat -e cache-references,cache-misses,branches,branch-misses ./go_map_vs_switch.test -test.bench=UnpredictableLookupMapNoInlineFunc512 -test.benchtime=5s
```
The relevant output:
```
3,618,549,427 branches
  451,154,480 branch-misses    # 12.47% of all branches
```
The branch-misprediction rate rose from 0.11% to 12.47%, more than a 100-fold increase.
Conclusion
The first question I wanted to answer was whether replacing a switch statement with a map of functions would hurt performance. I assumed the switch would be somewhat faster. I was wrong: the map is usually faster, sometimes several times faster.
Does that mean we should prefer a map over a switch? No! Although the differences are large in percentage terms, the absolute time difference is tiny. It would only matter if the hot path executed millions of branch dispatches per second with almost no real work in each branch. Even then, memory-access patterns and branch predictability affect performance far more than the choice between switch and map.
The choice between a switch statement and a map should come down to which produces the cleanest code, not which is faster.
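To make the "cleanest code" point concrete, here is a hypothetical dispatch example (all names invented) where a map keeps the dispatch table readable as data and easy to extend:

```go
package main

import "fmt"

// handler is a hypothetical message handler type; with a map, adding a
// message type means adding one entry rather than another case clause.
type handler func(payload string) string

var handlers = map[string]handler{
	"greet": func(p string) string { return "hello, " + p },
	"echo":  func(p string) string { return p },
}

// dispatch looks up the handler for a message kind, falling back to a
// default for unknown kinds.
func dispatch(kind, payload string) string {
	if h, ok := handlers[kind]; ok {
		return h(payload)
	}
	return "unknown message type"
}

func main() {
	fmt.Println(dispatch("greet", "world")) // prints "hello, world"
}
```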