Effect of Go GC and OS/architecture on benchmarks

Source: Internet
Author: User

Recently a colleague asked: when iterating over a slice, is it better to hoist len out of the loop condition by hand than to rely on the Go compiler's optimization?

That is, will

func g0(a []int) int {
    l := len(a)
    for i := 0; i < l; i++ {
    }
    return 1
}

give a better result than

func g1(a []int) int {
    for i := 0; i < len(a); i++ {
    }
    return 1
}

(Note that the Go compiler currently does not eliminate this empty loop.)

To settle the question, let's write a benchmark:

import "testing"

var a = make([]int, 1<< -) // the shift count is missing in the source

func BenchmarkG0(b *testing.B) {
    for i := 0; i < b.N; i++ {
        g0(a)
    }
}

func BenchmarkG1(b *testing.B) {
    for i := 0; i < b.N; i++ {
        g1(a)
    }
}

Then build and run the test binary:

go test -c
./len.test -test.bench=. -test.count=2

This gives the following output:

goos: darwin
goarch: amd64
BenchmarkG0-4            100      11784627 ns/op
BenchmarkG0-4            100      11841061 ns/op
BenchmarkG1-4            100      18623122 ns/op
BenchmarkG1-4            100      17790754 ns/op
PASS
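The same comparison can also be driven from an ordinary program through testing.Benchmark, which makes it easy to repeat each measurement. This is only a sketch: the slice length 1<<20 and the helper name measure are assumptions made for illustration, not values taken from the benchmark above.

```go
package main

import (
	"fmt"
	"testing"
)

// g0 hoists len(a) out of the loop condition.
func g0(a []int) int {
	l := len(a)
	for i := 0; i < l; i++ {
	}
	return 1
}

// g1 evaluates len(a) in the loop condition.
func g1(a []int) int {
	for i := 0; i < len(a); i++ {
	}
	return 1
}

// Assumption: 1<<20 elements is an arbitrary size chosen for this sketch.
var a = make([]int, 1<<20)

// measure (hypothetical helper) benchmarks f count times and returns
// the ns/op figure of every run.
func measure(f func([]int) int, count int) []int64 {
	// Init registers the testing flags. It is needed when Benchmark is
	// used outside "go test" and has no effect if it was already called.
	testing.Init()
	results := make([]int64, 0, count)
	for c := 0; c < count; c++ {
		r := testing.Benchmark(func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				f(a)
			}
		})
		results = append(results, r.NsPerOp())
	}
	return results
}

func main() {
	fmt.Println("g0 ns/op:", measure(g0, 2))
	fmt.Println("g1 ns/op:", measure(g1, 2))
}
```

Repeating each measurement, as -test.count=2 does above, helps tell a stable difference from run-to-run noise.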

Sure enough, g0 is much faster than g1. But this runs counter to common sense, so we should not jump to conclusions. Let's check whether the compiled code for g0 is really better optimized than the code for g1.

Let's dump the disassembly to a file:

> main.s

And look at the result:

TEXT _/test/go/len.g0(SB) /test/go/len/main.go
    main.go:4     0x10ef150    488b442410            MOVQ 0x10(SP), AX
    main.go:4     0x10ef155    31c9                  XORL CX, CX
    main.go:6     0x10ef157    eb03                  JMP 0x10ef15c
    main.go:6     0x10ef159    48ffc1                INCQ CX
    main.go:6     0x10ef15c    4839c1                CMPQ AX, CX
    main.go:6     0x10ef15f    7cf8                  JL 0x10ef159
    main.go:8     0x10ef161    48c744242001000000    MOVQ $0x1, 0x20(SP)
    main.go:8     0x10ef16a    c3                    RET
    :-1           0x10ef16b    cc                    INT $0x3
    :-1           0x10ef16c    cc                    INT $0x3
    :-1           0x10ef16d    cc                    INT $0x3
    :-1           0x10ef16e    cc                    INT $0x3
    :-1           0x10ef16f    cc                    INT $0x3

TEXT _/test/go/len.g1(SB) /test/go/len/main.go
    main.go:12    0x10ef170    488b442410            MOVQ 0x10(SP), AX
    main.go:12    0x10ef175    31c9                  XORL CX, CX
    main.go:13    0x10ef177    eb03                  JMP 0x10ef17c
    main.go:13    0x10ef179    48ffc1                INCQ CX
    main.go:13    0x10ef17c    4839c1                CMPQ AX, CX
    main.go:13    0x10ef17f    7cf8                  JL 0x10ef179
    main.go:15    0x10ef181    48c744242001000000    MOVQ $0x1, 0x20(SP)
    main.go:15    0x10ef18a    c3                    RET
    :-1           0x10ef18b    cc                    INT $0x3
    :-1           0x10ef18c    cc                    INT $0x3
    :-1           0x10ef18d    cc                    INT $0x3
    :-1           0x10ef18e    cc                    INT $0x3
    :-1           0x10ef18f    cc                    INT $0x3

The machine code the compiler generated for the two functions is exactly the same, so why do the results differ at run time? We have to ask what, besides the code itself, affects how it executes. Namely:

    • Operating Environment
    • Runtime

Then we will verify the two factors separately.

First, the operating environment. Let's switch to Linux and repeat the measurement there:

GOOS=linux GOARCH=amd64 go test -c
## copy len.test to the Linux machine
./len.test -test.bench=. -test.count=2

This gives the following output:

goos: linux
goarch: amd64
BenchmarkG0-32             100      10824437 ns/op
BenchmarkG0-32             100      10743979 ns/op
BenchmarkG1-32             100      10740347 ns/op
BenchmarkG1-32             100      10898047 ns/op
PASS

On Linux, g0 and g1 perform the same. So what is different between Linux and Darwin? There could be many differences, too many to compare one by one. But they will largely show up in the runtime.

So next we look at the runtime. What in the runtime can affect how a program runs? Possibly the following (not an exhaustive list):

    • Growth of function stack space
    • Goroutine scheduling
    • IO/syscall/cgo
    • GC

From the objdump output above, we can see that the generated code should have nothing to do with the first three factors. So let's disable the GC and compare again:

import (
    "runtime/debug"
    "testing"
)

func init() {
    // A negative percentage disables garbage collection.
    debug.SetGCPercent(-1)
}

var a = make([]int, 1<< -) // the shift count is missing in the source

func BenchmarkG0(b *testing.B) {
    for i := 0; i < b.N; i++ {
        g0(a)
    }
}

func BenchmarkG1(b *testing.B) {
    for i := 0; i < b.N; i++ {
        g1(a)
    }
}

Then build and run again:

go test -c
./len.test -test.bench=. -test.count=2

This gives:

goos: darwin
goarch: amd64
BenchmarkG0-4            100      11521770 ns/op
BenchmarkG0-4            100      11310217 ns/op
BenchmarkG1-4            100      11562763 ns/op
BenchmarkG1-4            100      11590019 ns/op
PASS

With the GC disabled, g0 and g1 perform the same. Is that because g1 allocates memory dynamically while it runs? The objdump output above shows that it obviously does not. The only remaining explanation is that the GC is running during the benchmark, but why does it happen to kick in while g1 is running? This is rather subtle. Runtime behavior depends on many properties of the system, and the same code behaves slightly differently on different operating systems. Perhaps there is some pseudo-random factor in the runtime? There is no conclusion yet.

In addition, this GC effect has nothing to do with the order of the benchmark functions.
