Optimizing abs() for int64 in Go

Go's standard library has no abs() function for computing the absolute value of an integer. The absolute value of a number is its non-negative magnitude: a negative number maps to its positive counterpart, and a non-negative number maps to itself.

I recently implemented an abs() function myself to solve the Day 20 puzzle of Advent of Code 2017. If you want to learn something new or test your skills, check it out.

Go does ship an absolute-value function, math.Abs in the math package, but it does not fit my problem: its input and output types are float64, and I need int64. Converting the argument is possible, but converting between int64 and float64 adds overhead, and large values lose precision in the conversion. Both points are demonstrated later in the article.

The post "Pure Go math.Abs outperforms assembly version" discusses how to optimize math.Abs for floating-point numbers, but those optimizations do not apply directly to integers because the underlying encoding is different.

The source code and test cases for this article are in cavaliercoder/go-abs.

Type conversion vs. branching
To me, the simplest implementation of absolute value is: if the input n is greater than or equal to zero, return n; if it is less than zero, return -n (the negation of a negative number is positive). Because this function relies on a branch to compute the absolute value, I named it abs.WithBranch:

package abs

// WithBranch returns the absolute value of n using a conditional branch.
func WithBranch(n int64) int64 {
    if n < 0 {
        return -n
    }
    return n
}
This correctly returns the absolute value of n, and it is the same approach Go v1.9.x math.Abs takes for float64. But does v1.9.x do any better when we convert the type (int64 to float64) and then take the absolute value? We can check:

package abs

import "math"

// WithStdLib converts n to float64, calls math.Abs, and converts back.
func WithStdLib(n int64) int64 {
    return int64(math.Abs(float64(n)))
}
In the code above, n is first converted from int64 to float64, math.Abs takes the absolute value, and the result is converted back to int64. The extra conversions obviously cost performance. A benchmark confirms it:

$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/cavaliercoder/abs
BenchmarkWithBranch-8           2000000000               0.30 ns/op
BenchmarkWithStdLib-8           2000000000               0.79 ns/op
PASS
ok      github.com/cavaliercoder/abs    2.320s
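Benchmark functions that produce output of this shape could look like the following sketch. In a real package they would normally live in a file such as abs_test.go and be run with go test -bench=.; here a main function drives them through testing.Benchmark so the example runs on its own. The sink variable and the choice of int64(-i) as input are assumptions made for illustration, not taken from the original repository.

```go
package main

import (
	"fmt"
	"math"
	"testing"
)

// WithBranch and WithStdLib repeat the two functions under test.
func WithBranch(n int64) int64 {
	if n < 0 {
		return -n
	}
	return n
}

func WithStdLib(n int64) int64 {
	return int64(math.Abs(float64(n)))
}

// sink receives every result so the compiler cannot discard the calls.
var sink int64

func BenchmarkWithBranch(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = WithBranch(int64(-i))
	}
}

func BenchmarkWithStdLib(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = WithStdLib(int64(-i))
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	fmt.Println("WithBranch:", testing.Benchmark(BenchmarkWithBranch))
	fmt.Println("WithStdLib:", testing.Benchmark(BenchmarkWithStdLib))
}
```

The absolute numbers depend on the Go version and the hardware, so they will not necessarily match the figures quoted in this article.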
The result: at 0.30 ns/op, WithBranch is more than twice as fast. It has another advantage: large int64 values are not truncated, as they are when converted to an IEEE 754 float64, which cannot represent every int64 exactly.

For example, abs.WithBranch(-9223372036854775807) correctly returns 9223372036854775807, but WithStdLib(-9223372036854775807) overflows during the type conversion and returns -9223372036854775808. A large positive input fails the same way: WithStdLib(9223372036854775807) also returns an incorrect negative result. These values were observed on amd64; the Go specification leaves the result of an out-of-range float-to-integer conversion implementation-dependent.
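The precision loss can also be reproduced with values that stay inside the int64 range, so the result does not depend on the platform. The sketch below uses 2^53 + 1, the smallest positive integer that a float64 cannot hold exactly. The helper names absViaFloat and survivesFloat64 are illustrative and not part of the article's package.

```go
package main

import (
	"fmt"
	"math"
)

// absViaFloat mirrors WithStdLib: int64 -> float64 -> math.Abs -> int64.
func absViaFloat(n int64) int64 {
	return int64(math.Abs(float64(n)))
}

// survivesFloat64 reports whether n is unchanged by a round trip through float64.
func survivesFloat64(n int64) bool {
	return int64(float64(n)) == n
}

func main() {
	exact := int64(1) << 53 // 2^53 fits in the 53-bit significand
	inexact := exact + 1    // 2^53 + 1 needs 54 significant bits

	fmt.Println("2^53 survives:    ", survivesFloat64(exact))
	fmt.Println("2^53 + 1 survives:", survivesFloat64(inexact))

	// The float64 detour loses the low bit, so the result is not |n|.
	fmt.Println("abs is exact:     ", absViaFloat(-inexact) == inexact)
}
```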

The branching method is obviously faster and more accurate for signed integers, but is there a better way?

We all know that branching code can break the sequential flow of a program, i.e. a pipelining processor cannot always predict which instruction comes next.

A branch-free alternative
Chapter 2 of Hacker's Delight introduces a branch-free method that calculates the absolute value of a signed integer using two's complement arithmetic.

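To calculate the absolute value of x, first compute y = x >> 63, that is, x shifted right by 63 bits. If you are familiar with signed integers, you will see that this arithmetic shift copies the sign bit into every position, so y is -1 (all bits set) if x is negative and 0 otherwise.

Then compute (x ⨁ y) - y: XOR x with y, then subtract y. When y is 0, both steps leave x unchanged. When y is -1, the XOR flips every bit of x and subtracting -1 adds 1, which is exactly two's complement negation. Either way, the result is the absolute value of x.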
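The two steps can be traced in plain Go. This is a small sketch for following the arithmetic, with the helper names signMask and absSteps chosen here for illustration.

```go
package main

import "fmt"

// signMask returns x >> 63. Go's >> on a signed integer is an arithmetic
// shift, so the sign bit is copied into every position: the result is
// -1 (all ones) for negative x and 0 otherwise.
func signMask(x int64) int64 {
	return x >> 63
}

// absSteps returns the intermediate XOR value and the final result.
func absSteps(x int64) (xored, result int64) {
	y := signMask(x)
	xored = x ^ y      // y == -1 flips every bit; y == 0 leaves x alone
	result = xored - y // subtracting -1 adds 1, completing the negation
	return xored, result
}

func main() {
	for _, x := range []int64{-5, 5, 0} {
		xored, result := absSteps(x)
		fmt.Printf("x=%d y=%d x^y=%d result=%d\n", x, signMask(x), xored, result)
	}
}
```

For x = -5 the mask is -1, the XOR yields 4 (the bitwise complement of -5), and subtracting -1 gives 5.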

We can implement this directly in efficient assembly:

// abs.go (declaration only; the body is in the assembly file)
func WithASM(n int64) int64

// abs_amd64.s
TEXT ·WithASM(SB), $0
    MOVQ n+0(FP), AX // copy input to AX
    MOVQ AX, CX // y ← x
    SARQ $63, CX // y ← y >> 63
    XORQ CX, AX // x ← x ⨁ y
    SUBQ CX, AX // x ← x - y
    MOVQ AX, ret+8(FP) // copy result to return value
    RET
We first declare the function as WithASM in Go, separating the declaration from the implementation; the function body is written in Go assembly. The code above only works on the AMD64 architecture, so I suggest giving the file name the _amd64.s suffix.

WithASM benchmark results:

$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/cavaliercoder/abs
BenchmarkWithBranch-8           2000000000               0.29 ns/op
BenchmarkWithStdLib-8           2000000000               0.78 ns/op
BenchmarkWithASM-8              2000000000               1.78 ns/op
PASS
ok      github.com/cavaliercoder/abs    6.059s
This is awkward. This simple benchmark shows that the compact, branch-free assembly is actually the slowest: 1.78 ns/op. How can that be?

Compiler options
We need to see how the Go compiler optimizes each of these functions. The compiler accepts the -m flag to print its optimization decisions. With go build or go test, pass it as -gcflags=-m.

Output:

$ go tool compile -m abs.go
# github.com/cavaliercoder/abs
./abs.go:11:6: can inline WithBranch
./abs.go:21:6: can inline WithStdLib
./abs.go:22:23: inlining call to math.Abs
For simple functions like ours, the Go compiler supports function inlining. Inlining means the compiler replaces a call to the function with the body of the function itself. For example:

package main

import (
    "fmt"

    "github.com/cavaliercoder/abs"
)

func main() {
    n := abs.WithBranch(-1)
    fmt.Println(n)
}
is effectively compiled as if it had been written:

package main

import "fmt"

func main() {
    n := int64(-1) // the body of WithBranch is pasted in place of the call
    if n < 0 {
        n = -n
    }
    fmt.Println(n)
}
The compiler output shows that WithBranch and WithStdLib are inlined at compile time, but WithASM is not. For WithStdLib, even the underlying call to math.Abs is inlined.

Because WithASM cannot be inlined, every caller pays the overhead of a real function call: allocating stack space, copying parameters and pointers, and so on.

What if we disable inlining for the other functions as well? The //go:noinline directive does exactly that:

package abs

//go:noinline
func WithBranch(n int64) int64 {
    if n < 0 {
        return -n
    }
    return n
}
After recompiling, the compiler reports fewer optimizations:

$ go tool compile -m abs.go
abs.go:22:23: inlining call to math.Abs
Benchmark results:

$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/cavaliercoder/abs
BenchmarkWithBranch-8           1000000000               1.87 ns/op
BenchmarkWithStdLib-8           1000000000               1.94 ns/op
BenchmarkWithASM-8              2000000000               1.84 ns/op
PASS
ok      github.com/cavaliercoder/abs    8.122s
All three functions now average about 1.9 ns/op.

You might conclude that the overhead of a function call is around 1.5 ns, and that this overhead wipes out the speed advantage of our WithBranch function.
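One way to estimate that call overhead in isolation is to benchmark two copies of the same function, one of which carries the //go:noinline directive. The sketch below does this with testing.Benchmark; the names inlinable, notInlined, sink, and nsPerOp are hypothetical, and the measured difference will vary with the Go version and the machine.

```go
package main

import (
	"fmt"
	"testing"
)

// inlinable is small enough for the compiler to inline at the call site.
func inlinable(n int64) int64 {
	if n < 0 {
		return -n
	}
	return n
}

// notInlined has the same body, but the directive forces a real call.
//
//go:noinline
func notInlined(n int64) int64 {
	if n < 0 {
		return -n
	}
	return n
}

// sink keeps the results alive so the loops are not optimized away.
var sink int64

// nsPerOp converts a benchmark result to fractional nanoseconds per call.
func nsPerOp(r testing.BenchmarkResult) float64 {
	return float64(r.T.Nanoseconds()) / float64(r.N)
}

func main() {
	inl := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = inlinable(int64(-i))
		}
	})
	call := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = notInlined(int64(-i))
		}
	})
	fmt.Printf("inlined:     %.2f ns/op\n", nsPerOp(inl))
	fmt.Printf("not inlined: %.2f ns/op\n", nsPerOp(call))
	fmt.Printf("difference:  %.2f ns\n", nsPerOp(call)-nsPerOp(inl))
}
```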

The lesson I take from this is that assuming hand-written assembly such as WithASM will outperform the compiler is wrong in most cases, because it gives up what the compiler provides: type safety, garbage collection, and function inlining. Of course there are exceptions, such as using SIMD to speed up cryptography, streaming media encoding, and so on.

A single inlinable function
The Go compiler cannot inline functions implemented in assembly, but it can easily inline the same logic rewritten as an ordinary Go function:

package abs

func WithTwosComplement(n int64) int64 {
    y := n >> 63       // y ← x >> 63
    return (n ^ y) - y // (x ⨁ y) - y
}
The compiler output shows that our function can be inlined:

$ go tool compile -m abs.go
...
abs.go:26:6: can inline WithTwosComplement
But what about performance? With inlining enabled, the results are very close to WithBranch:

$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/cavaliercoder/abs
BenchmarkWithBranch-8               2000000000           0.29 ns/op
BenchmarkWithStdLib-8               2000000000           0.79 ns/op
BenchmarkWithTwosComplement-8       2000000000           0.29 ns/op
BenchmarkWithASM-8                  2000000000           1.83 ns/op
PASS
ok      github.com/cavaliercoder/abs    6.777s
With the function call overhead gone, WithTwosComplement performs much better than WithASM. Let's look at what the compiler did when compiling WithTwosComplement.

Pass -S to make the compiler print the assembly it generates:

$ go tool compile -S abs.go
...
"".WithTwosComplement STEXT nosplit size=24 args=0x10 locals=0x0
                0x0000 00000 (abs.go:26) TEXT "".WithTwosComplement(SB), NOSPLIT, $0-16
                0x0000 00000 (abs.go:26) FUNCDATA $0, gclocals·f207267fbf96a0178e8758c6e3e0ce28(SB)
                0x0000 00000 (abs.go:26) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
                0x0000 00000 (abs.go:26) MOVQ "".n+8(SP), AX
                0x0005 00005 (abs.go:26) MOVQ AX, CX
                0x0008 00008 (abs.go:27) SARQ $63, AX
                0x000c 00012 (abs.go:28) XORQ AX, CX
                0x000f 00015 (abs.go:28) SUBQ AX, CX
                0x0012 00018 (abs.go:28) MOVQ CX, "".~r1+16(SP)
                0x0017 00023 (abs.go:28) RET
...
When it compiles WithTwosComplement, the compiler ends up emitting the same instructions we wrote by hand for WithASM. The difference is that the compiler also brings the right configuration and cross-platform portability. For example, the same source can be compiled with GOARCH=386 to produce a program for 32-bit systems.

Finally, regarding memory allocation, all of the functions above behave ideally. I ran go test -bench=. -benchmem to observe each function, and no memory allocations occur.
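The allocation count can also be checked programmatically. The sketch below uses testing.AllocsPerRun from the standard library, which is a different tool from the -benchmem flag used above; the input and sink variables and the allocsPerCall helper are illustrative names.

```go
package main

import (
	"fmt"
	"testing"
)

func WithTwosComplement(n int64) int64 {
	y := n >> 63
	return (n ^ y) - y
}

var (
	input int64 = -42 // package-level so the call is not constant-folded
	sink  int64       // receives the result so the call is not discarded
)

// allocsPerCall reports the average number of heap allocations per call.
func allocsPerCall() float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = WithTwosComplement(input)
	})
}

func main() {
	fmt.Println("allocations per call:", allocsPerCall())
}
```

Since the function only does integer arithmetic on its argument, the reported average should be zero.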

Summary
The WithTwosComplement implementation is portable Go, can be inlined, contains no branches, allocates no memory, and avoids the value truncation caused by type conversion. The benchmarks do not show an advantage for branch-free code over branching code, but in theory branch-free code should perform better in a wider variety of situations.

Finally, my implementation of abs for int64 is as follows:

func abs(n int64) int64 {
    y := n >> 63
    return (n ^ y) - y
}
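As a quick sanity check, the final function can be compared against the branching version over a few boundary values. This is only a sketch: absBranch is a local copy of WithBranch, and the input list is an arbitrary selection. One case deserves attention: math.MinInt64 has no positive counterpart in int64, so both versions wrap around and return math.MinInt64 unchanged.

```go
package main

import (
	"fmt"
	"math"
)

// abs is the branch-free implementation from the article.
func abs(n int64) int64 {
	y := n >> 63
	return (n ^ y) - y
}

// absBranch is the branching implementation, used here as a reference.
func absBranch(n int64) int64 {
	if n < 0 {
		return -n
	}
	return n
}

func main() {
	inputs := []int64{0, 1, -1, 42, -42, math.MaxInt64, -math.MaxInt64, math.MinInt64}
	for _, n := range inputs {
		// The last column reports whether the two implementations agree.
		fmt.Println(n, abs(n), abs(n) == absBranch(n))
	}
}
```

Callers that may receive math.MinInt64 need to handle it separately, whichever implementation they use.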
