Original: Optimized abs() for int64 in Go; translation: Optimizing abs() for the int64 type in Go. Reprints are welcome.
Objective
The Go language has no built-in abs() standard function for computing the absolute value of an integer, that is, the non-negative representation of a negative or positive number.
I recently implemented an abs() function to solve the Day 20 puzzle of Advent of Code 2017. If you want to learn something new or test your skills, go give it a try.
Go does in fact implement abs() in the math package: math.Abs. But it does not fit my problem, because its input and output types are float64, while I need int64. It is possible to bridge the gap with type conversions, but converting between int64 and float64 incurs some overhead, and large values are truncated during the conversion; both points are demonstrated below.
The post Pure Go math.Abs outperforms assembly version discusses how math.Abs was optimized for floating-point numbers, but those optimizations depend on the underlying floating-point representation and cannot be applied directly to integer types.
The source code and test cases from this article are available at cavaliercoder/go-abs.
Type conversion vs. branch control
To me, the simplest implementation of absolute value is: if the input n is greater than or equal to 0, return n directly; if it is less than 0, return -n (negating a negative number yields a positive one). Because this implementation relies on a branch to compute the absolute value, let's name it abs.WithBranch:
package abs

func WithBranch(n int64) int64 {
	if n < 0 {
		return -n
	}
	return n
}
This correctly returns the absolute value of n, and it is essentially how Go 1.9.x's math.Abs computes the absolute value of a float64. But would it be better to reuse math.Abs through type conversions (int64 to float64 and back)? We can check:
package abs

import "math"

func WithStdLib(n int64) int64 {
	return int64(math.Abs(float64(n)))
}
In the code above, n is first converted from int64 to float64, passed to math.Abs, and the result is converted back to int64. These extra conversions obviously carry a performance cost. We can write a benchmark to verify this:
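A minimal benchmark sketch along these lines (the real test cases live in the cavaliercoder/go-abs repository; the input value here is illustrative):

// abs_test.go
package abs

import "testing"

func BenchmarkWithBranch(b *testing.B) {
	for i := 0; i < b.N; i++ {
		WithBranch(-1)
	}
}

func BenchmarkWithStdLib(b *testing.B) {
	for i := 0; i < b.N; i++ {
		WithStdLib(-1)
	}
}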
$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/cavaliercoder/abs
BenchmarkWithBranch-8 2000000000 0.30 ns/op
BenchmarkWithStdLib-8 2000000000 0.79 ns/op
PASS
ok github.com/cavaliercoder/abs 2.320s
The result: at 0.30 ns/op, WithBranch is more than twice as fast. It has another advantage: it does not lose precision the way converting large int64 values to the IEEE-754 float64 format does (values beyond float64's precision are truncated).
For example, abs.WithBranch(-9223372036854775807) correctly returns 9223372036854775807. WithStdLib(-9223372036854775807), however, overflows during the type conversions and returns -9223372036854775808; the same incorrect negative result comes back even for the large positive input WithStdLib(9223372036854775807).
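For illustration (this snippet is not from the original article), the conversion problem can be reproduced like this:

package main

import (
	"fmt"
	"math"
)

func main() {
	n := int64(-9223372036854775807)

	// float64 cannot represent this magnitude exactly; converting the
	// rounded result back to int64 overflows and, on amd64, yields an
	// incorrect negative number.
	fmt.Println(int64(math.Abs(float64(n))))

	// The branch-based version returns the correct value.
	if n < 0 {
		n = -n
	}
	fmt.Println(n) // 9223372036854775807
}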
The branch-based approach is clearly faster and more accurate for signed integers, but can we do better?
We all know that branching code can interrupt the flow of a program's execution: a pipelined processor cannot always predict which branch comes next, and a mispredicted branch stalls the pipeline.
A branch-free approach
Chapter 2 of Hacker's Delight introduces a branch-free method that computes the absolute value of a signed integer using two's complement arithmetic.
To compute the absolute value of x:
- First, compute y = x >> 63, shifting x right by 63 bits (an arithmetic shift that copies the sign bit). If you are familiar with two's complement signed integers, you know that y is -1 (all bits set) when x is negative and 0 otherwise.
- Then compute (x ⨁ y) - y: XOR x with y and subtract y; the result is the absolute value of x (see the short worked example below).
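A minimal illustration of those two steps (this snippet is not from the original article; the input value is arbitrary):

package main

import "fmt"

func main() {
	x := int64(-5)
	y := x >> 63 // arithmetic shift: y = -1 (all bits set) because x is negative; 0 for non-negative x

	fmt.Println(x ^ y)       // XOR with all ones flips every bit: ^(-5) = 4
	fmt.Println((x ^ y) - y) // 4 - (-1) = 5, the absolute value of x
}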
This can be implemented directly in efficient assembly, with the following code:
func WithASM(n int64) int64
// abs_amd64.s
TEXT ·WithASM(SB),$0
	MOVQ n+0(FP), AX   // copy input to AX
	MOVQ AX, CX        // y ← x
	SARQ $63, CX       // y ← y >> 63
	XORQ CX, AX        // x ← x ⨁ y
	SUBQ CX, AX        // x ← x - y
	MOVQ AX, ret+8(FP) // copy result to return value
	RET
We first declare the WithASM function in Go, keeping the declaration separate from its implementation; the function body is written in Go assembly. Because the code above only works on the AMD64 architecture, the assembly file name should carry the _amd64.s suffix.
The benchmark results for WithASM:
$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/cavaliercoder/abs
BenchmarkWithBranch-8 2000000000 0.29 ns/op
BenchmarkWithStdLib-8 2000000000 0.78 ns/op
BenchmarkWithASM-8 2000000000 1.78 ns/op
PASS
ok github.com/cavaliercoder/abs 6.059s
This is awkward: this simple benchmark shows that the branch-free code is actually the slowest, at 1.78 ns/op. How can that be?
Compiler optimizations
We need to understand how the Go compiler optimizes the call to WithASM. The compiler accepts the -m flag to print the optimizations it applies; for go build or go test, pass it as -gcflags=-m. Running the compiler directly:
$ go tool compile -m abs.go
# github.com/cavaliercoder/abs
./abs.go:11:6: can inline WithBranch
./abs.go:21:6: can inline WithStdLib
./abs.go:22:23: inlining call to math.Abs
For simple functions like ours, the Go compiler supports function inlining: the call is replaced with the body of the called function. For example:
package main

import (
	"fmt"
	"github.com/cavaliercoder/abs"
)

func main() {
	n := abs.WithBranch(-1)
	fmt.Println(n)
}
This will effectively be compiled into:
package main

import "fmt"

func main() {
	n := -1
	if n < 0 {
		n = -n
	}
	fmt.Println(n)
}
From the compiler's output, we can see that WithBranch and WithStdLib are inlined at compile time, but WithASM is not. For WithStdLib, even the underlying call to math.Abs is inlined.
Because WithASM cannot be inlined, every caller pays function-call overhead: allocating stack space for WithASM, copying parameters and pointers, and so on.
What would happen if the other functions were not inlined either? We can check by adding the //go:noinline pragma:
package abs

//go:noinline
func WithBranch(n int64) int64 {
	if n < 0 {
		return -n
	}
	return n
}
Recompiling, we see fewer compiler optimizations:
$ go tool compile -m abs.go
abs.go:22:23: inlining call to math.Abs
Results of the benchmark test:
$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/cavaliercoder/abs
BenchmarkWithBranch-8 1000000000 1.87 ns/op
BenchmarkWithStdLib-8 1000000000 1.94 ns/op
BenchmarkWithASM-8 2000000000 1.84 ns/op
PASS
ok github.com/cavaliercoder/abs 8.122s
As we can see, all three functions now average roughly 1.9 ns/op.
You might conclude that the call overhead of each function is around 1.5 ns, and that this overhead wipes out the speed advantage of our WithBranch function.
My takeaway here is that I had assumed WithASM would outperform the compiler-generated implementations, with their overhead of type safety, garbage collection, and so on; in most cases that assumption turns out to be wrong. There are, of course, exceptions, such as hand-tuned SIMD for cryptography, media encoding, and the like.
A branch-free function that can be inlined
The Go compiler cannot inline functions implemented in assembly, but it has no trouble inlining the same logic rewritten as an ordinary Go function:
package abs

func WithTwosComplement(n int64) int64 {
	y := n >> 63       // y ← x >> 63
	return (n ^ y) - y // (x ⨁ y) - y
}
The compiler output confirms that the new function can be inlined:
$ go tool compile -m abs.go
...
abs.go:26:6: can inline WithTwosComplement
But what about performance? With inlining enabled, WithTwosComplement turns out to perform on par with WithBranch.
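For reference, a benchmark for the new function can follow the same pattern as the earlier ones (a sketch; the input value is illustrative):

func BenchmarkWithTwosComplement(b *testing.B) {
	for i := 0; i < b.N; i++ {
		WithTwosComplement(-1)
	}
}

Running the full benchmark suite again: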
$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/cavaliercoder/abs
BenchmarkWithBranch-8 2000000000 0.29 ns/op
BenchmarkWithStdLib-8 2000000000 0.79 ns/op
BenchmarkWithTwosComplement-8 2000000000 0.29 ns/op
BenchmarkWithASM-8 2000000000 1.83 ns/op
PASS
ok github.com/cavaliercoder/abs 6.777s
With the function-call overhead gone, the WithTwosComplement implementation performs much better than the WithASM implementation. Let's look at what the compiler did with WithTwosComplement at compile time.
Pass the -S flag to make the compiler print the generated assembly:
$ go tool compile -S abs.go
...
"".WithTwosComplement STEXT nosplit size=24 args=0x10 locals=0x0
0x0000 00000 (abs.go:26) TEXT "".WithTwosComplement(SB), NOSPLIT, $0-16
0x0000 00000 (abs.go:26) FUNCDATA $0, gclocals·f207267fbf96a0178e8758c6e3e0ce28(SB)
0x0000 00000 (abs.go:26) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (abs.go:26) MOVQ "".n+8(SP), AX
0x0005 00005 (abs.go:26) MOVQ AX, CX
0x0008 00008 (abs.go:27) SARQ $63, AX
0x000c 00012 (abs.go:28) XORQ AX, CX
0x000f 00015 (abs.go:28) SUBQ AX, CX
0x0012 00018 (abs.go:28) MOVQ CX, "".~r1+16(SP)
0x0017 00023 (abs.go:28) RET
...
The assembly the compiler generates for WithTwosComplement is remarkably similar to our hand-written WithASM, and the compiler has the added advantages of choosing the right function configuration and of cross-platform support: recompile with GOARCH=386, for example, and it will emit code compatible with 32-bit systems.
Finally, a note on memory: all of the implementations above are ideal in this respect. Running go test -bench=. -benchmem and inspecting the output shows that none of the functions allocate any memory.
Summary
The WithTwosComplement implementation offers better portability than the assembly version while remaining inlinable, branch-free, and allocation-free, and it avoids the value truncation that comes with type conversions. The benchmarks show no clear advantage over the branching approach, but in theory branch-free code should perform better in many circumstances.
Finally, my abs() implementation for int64 is as follows:
func abs(n int64) int64 {
	y := n >> 63
	return (n ^ y) - y
}
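As a quick sanity check (illustrative, not from the original article), it can be exercised like this:

package main

import "fmt"

func abs(n int64) int64 {
	y := n >> 63
	return (n ^ y) - y
}

func main() {
	fmt.Println(abs(-42))                  // 42
	fmt.Println(abs(7))                    // 7
	fmt.Println(abs(-9223372036854775807)) // 9223372036854775807
}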
Via: Optimized abs() for int64 in Go
Author: Ryan Armstrong
Translator: Wuyinbest
Proofreading: Rxcai