Optimizing abs() for int64 in Go



Original: "Optimized abs() for int64 in Go". This is a translation of that article; reprints are welcome.



Objective



The Go language does not have a built-in abs() standard function to calculate the absolute value of an integer, where the absolute value is the non-negative representation of a negative or positive number.



I recently implemented an abs() function to solve the Day 20 puzzle of Advent of Code 2017. If you want to learn something new or test your skills, go and take a look.



Go does actually implement abs() in the math package: math.Abs. It does not fit my problem, though, because its input and output types are float64 and what I need is int64. It can be used with type conversions, but converting between float64 and int64 adds some overhead, and large values are truncated by the conversion. Both points are covered in this article.



The post "Pure Go math.Abs outperforms assembly version" discusses how to optimize math.Abs for floating-point numbers, but those optimizations depend on the floating-point representation and cannot be applied directly to integer types.



The source code and test cases for this article are available at cavaliercoder/go-abs.



Type conversion vs. branching



To me, the simplest implementation of absolute value is this: if the input n is greater than or equal to 0, return n; if it is less than 0, return -n (negating a negative number gives a positive one). Because this function relies on a branching control structure to compute the absolute value, it is named abs.WithBranch:


package abs

func WithBranch(n int64) int64 {
    if n < 0 {
        return -n
    }
    return n
}


This successfully returns the absolute value of n, and it is the same approach Go v1.9.x uses in math.Abs for float64. But how does the 1.9.x math.Abs perform once the type conversions (int64 to float64 and back) are added? We can check:


package abs

import "math"

func WithStdLib(n int64) int64 {
    return int64(math.Abs(float64(n)))
}


In the code above, n is first converted from int64 to float64, math.Abs takes the absolute value, and the result is converted back to int64. These multiple conversions obviously add overhead. A benchmark can be written to verify this:


$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/cavaliercoder/abs
BenchmarkWithBranch-8           2000000000               0.30 ns/op
BenchmarkWithStdLib-8           2000000000               0.79 ns/op
PASS
ok      github.com/cavaliercoder/abs    2.320s
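The benchmark functions behind these numbers are not shown in the article. A minimal sketch of how they might be written follows, assuming the two functions above; the sink variable and the choice of input values are hypothetical, and testing.Benchmark is used only so the sketch can run as a standalone program rather than through go test.

```go
package main

import (
	"fmt"
	"math"
	"testing"
)

// WithBranch and WithStdLib mirror the two functions from the article.
func WithBranch(n int64) int64 {
	if n < 0 {
		return -n
	}
	return n
}

func WithStdLib(n int64) int64 {
	return int64(math.Abs(float64(n)))
}

// sink (hypothetical) keeps results alive so the calls are not optimized away.
var sink int64

func BenchmarkWithBranch(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = WithBranch(int64(-i))
	}
}

func BenchmarkWithStdLib(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = WithStdLib(int64(-i))
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	fmt.Println("WithBranch:", testing.Benchmark(BenchmarkWithBranch))
	fmt.Println("WithStdLib:", testing.Benchmark(BenchmarkWithStdLib))
}
```

Placed in a _test.go file without the main function, the same two Benchmark functions would be picked up by go test -bench=. as in the output above. Absolute timings will differ by machine and Go version.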


Result: at 0.30 ns/op, WithBranch is more than twice as fast. It also has another advantage: it avoids the truncation (loss of value beyond the available precision) that occurs when large int64 values are converted to IEEE-754 float64.



For example, abs.WithBranch(-9223372036854775807) correctly returns 9223372036854775807. WithStdLib(-9223372036854775807), however, overflows during the type conversions and returns -9223372036854775808, and it returns the same incorrect negative result for the large positive input WithStdLib(9223372036854775807).
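The precision loss can be seen without math.Abs at all. The sketch below is an illustration only: roundTrip is a hypothetical helper that performs the same int64 to float64 to int64 conversions as WithStdLib, and the test value 2^53 + 1 is chosen because float64 carries only 53 bits of significand.

```go
package main

import "fmt"

// roundTrip (hypothetical name) converts n to float64 and back,
// which is the conversion path WithStdLib relies on.
func roundTrip(n int64) int64 {
	return int64(float64(n))
}

func main() {
	small := int64(42)
	large := int64(1<<53 + 1) // 9007199254740993 needs 54 significant bits

	fmt.Println(roundTrip(small) == small) // true: small values survive the round trip
	fmt.Println(roundTrip(large) == large) // false: the lowest bit is rounded away
	fmt.Println(large - roundTrip(large))  // 1: the error introduced by rounding
}
```

Values near the limits of int64 are worse still, because the rounded float64 can fall outside the int64 range, and the result of converting it back is then implementation-dependent.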



The branching approach is clearly faster and more accurate for signed integers, but is there a better way?



We all know that branching code breaks the sequential flow of a program, i.e. a pipelined processor cannot always predict which instruction comes next.



A branch-free alternative



Chapter 2 of Hacker's Delight introduces a branch-free method that computes the absolute value of a signed integer using two's complement arithmetic.



To calculate the absolute value of x:


    1. First, compute y = x >> 63, i.e. shift x right by 63 bits so that only the sign bit remains. For a signed integer this is an arithmetic shift that replicates the sign bit, so y is -1 (all bits set) if x is negative, and 0 otherwise.
    2. Then compute (x ⨁ y) - y: x XOR y, minus y, which is the absolute value of x.
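The two steps can be traced with concrete values. The sketch below is only an illustration; absSteps is a hypothetical helper that returns each intermediate value. Note that Go's >> on a signed integer is an arithmetic shift, so for a negative x the value of y is -1 (all bits set); XOR with -1 flips every bit, and subtracting -1 adds one, which together negate x in two's complement.

```go
package main

import "fmt"

// absSteps (hypothetical name) returns the intermediate values of the
// two steps: the sign mask y, x XOR y, and the final result.
func absSteps(x int64) (y, xored, result int64) {
	y = x >> 63        // arithmetic shift: -1 for negative x, 0 otherwise
	xored = x ^ y      // all bits flipped when y is -1, unchanged when y is 0
	result = xored - y // adds 1 when y is -1, completing the negation
	return y, xored, result
}

func main() {
	for _, x := range []int64{-5, 5, 0} {
		y, xored, result := absSteps(x)
		fmt.Printf("x=%d y=%d x^y=%d (x^y)-y=%d\n", x, y, xored, result)
	}
}
```

For x = -5 this gives y = -1, x ⨁ y = 4 and a result of 5; for non-negative x the mask is 0 and the value passes through unchanged.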

This can be implemented directly in efficient assembly, with the following code:

// abs.go
func WithASM(n int64) int64

// abs_amd64.s
TEXT ·WithASM(SB),$0
  MOVQ    n+0(FP), AX     // copy input to AX
  MOVQ    AX, CX          // y ← x
  SARQ    $63, CX         // y ← y >> 63
  XORQ    CX, AX          // x ← x ⨁ y
  SUBQ    CX, AX          // x ← x - y
  MOVQ    AX, ret+8(FP)   // copy result to return value
  RET


We first declare the function WithASM in Go, keeping the declaration separate from the implementation; the function body is implemented in Go assembly. The code above only works on the amd64 architecture, so I recommend giving the file name the _amd64.s suffix.



Benchmark results for WithASM:


$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/cavaliercoder/abs
BenchmarkWithBranch-8           2000000000               0.29 ns/op
BenchmarkWithStdLib-8           2000000000               0.78 ns/op
BenchmarkWithASM-8              2000000000               1.78 ns/op
PASS
ok      github.com/cavaliercoder/abs    6.059s


This is awkward: this simple benchmark shows that the branch-free assembly code is very slow, at 1.78 ns/op. How can that be?



Compiler optimizations



We need to know what optimizations the Go compiler applies to our functions. The compiler accepts the -m flag to print its optimization decisions; with go build or go test, pass -gcflags=-m.



The output looks like this:


$ go tool compile -m abs.go
# github.com/cavaliercoder/abs
./abs.go:11:6: can inline WithBranch
./abs.go:21:6: can inline WithStdLib
./abs.go:22:23: inlining call to math.Abs


For simple functions like ours, the Go compiler supports function inlining, which means that a call to our function is replaced with the body of the function itself. As an example:


package main

import (
  "fmt"
  "github.com/cavaliercoder/abs"
)

func main() {
  n := abs.WithBranch(-1)
  fmt.Println(n)
}


is actually compiled as if it had been written like this:


package main

import "fmt"

func main() {
  n := -1
  if n < 0 {
      n = -n
  }
  fmt.Println(n)
}


According to the compiler's output, WithBranch and WithStdLib are inlined at compile time, but WithASM is not. For WithStdLib, even the underlying call to math.Abs is inlined.



Because WithASM cannot be inlined, every caller pays the overhead of a function call: allocating a new stack frame for WithASM, copying parameters and pointers, and so on.



What if we disable inlining for the other functions as well? We can write a simple example:


package abs

//go:noinline
func WithBranch(n int64) int64 {
    if n < 0 {
        return -n
    }
    return n
}


After recompiling, we see fewer compiler optimizations:


$ go tool compile -m abs.go
abs.go:22:23: inlining call to math.Abs


Benchmark results:


$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/cavaliercoder/abs
BenchmarkWithBranch-8           1000000000               1.87 ns/op
BenchmarkWithStdLib-8           1000000000               1.94 ns/op
BenchmarkWithASM-8              2000000000               1.84 ns/op
PASS
ok      github.com/cavaliercoder/abs    8.122s


As you can see, all three functions now average around 1.9 ns/op.



You might conclude that each function call costs around 1.5 ns, and that this overhead cancels out the speed advantage of our WithBranch function.



What I learned from this is that hand-written assembly such as WithASM gives up the benefits the compiler provides: type safety, garbage collection, function inlining, and so on. In most cases the trade is probably not worth it. Of course, there are exceptions, such as using SIMD instructions to speed up encryption or streaming media encoding.



A branch-free function that can be inlined



The Go compiler cannot inline functions implemented in assembly, but it is easy to rewrite ours as a normal Go function that can be inlined:


package abs

func WithTwosComplement(n int64) int64 {
    y := n >> 63          // y ← x >> 63
    return (n ^ y) - y    // (x ⨁ y) - y
}


The compiler output shows that our function can be inlined:


$ go tool compile -m abs.go
...
abs.go:26:6: can inline WithTwosComplement


But what about performance? The results show that with function inlining enabled, it performs about the same as WithBranch:


$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/cavaliercoder/abs
BenchmarkWithBranch-8               2000000000               0.29 ns/op
BenchmarkWithStdLib-8               2000000000               0.79 ns/op
BenchmarkWithTwosComplement-8       2000000000               0.29 ns/op
BenchmarkWithASM-8                  2000000000               1.83 ns/op
PASS
ok      github.com/cavaliercoder/abs    6.777s


The function call overhead is now gone, and the WithTwosComplement implementation performs much better than the WithASM implementation. Let's see what the compiler generates for WithTwosComplement.



Use the -S flag to tell the compiler to print the generated assembly:


$ go tool compile -S abs.go
...
"".WithTwosComplement STEXT nosplit size=24 args=0x10 locals=0x0
        0x0000 00000 (abs.go:26)        TEXT    "".WithTwosComplement(SB), NOSPLIT, $0-16
        0x0000 00000 (abs.go:26)        FUNCDATA        $0, gclocals·f207267fbf96a0178e8758c6e3e0ce28(SB)
        0x0000 00000 (abs.go:26)        FUNCDATA        $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
        0x0000 00000 (abs.go:26)        MOVQ    "".n+8(SP), AX
        0x0005 00005 (abs.go:26)        MOVQ    AX, CX
        0x0008 00008 (abs.go:27)        SARQ    $63, AX
        0x000c 00012 (abs.go:28)        XORQ    AX, CX
        0x000f 00015 (abs.go:28)        SUBQ    AX, CX
        0x0012 00018 (abs.go:28)        MOVQ    CX, "".~r1+16(SP)
        0x0017 00023 (abs.go:28)        RET
...


The code the compiler generates for WithTwosComplement is almost identical to our hand-written WithASM. The difference is that the compiler brings correct configuration and cross-platform support with it: for example, compiling again with the GOARCH=386 option produces a program that runs on 32-bit systems.



Finally, regarding memory allocation: all of the implementations above are ideal in this respect. I ran go test -bench=. -benchmem and checked the output for each function; none of them allocate memory.
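The same property can also be checked programmatically. The following is a minimal sketch, not taken from the article, that uses testing.AllocsPerRun from the standard library; the helper allocsPerCall and the sink variable are hypothetical names introduced for the illustration.

```go
package main

import (
	"fmt"
	"testing"
)

// WithTwosComplement is the branch-free implementation from the article.
func WithTwosComplement(n int64) int64 {
	y := n >> 63
	return (n ^ y) - y
}

// sink (hypothetical) keeps the result alive so the call is not discarded.
var sink int64

// allocsPerCall (hypothetical name) reports the average number of heap
// allocations made per call to WithTwosComplement.
func allocsPerCall() float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = WithTwosComplement(-42)
	})
}

func main() {
	fmt.Println("allocations per call:", allocsPerCall())
}
```

A result of 0 would agree with the 0 allocs/op that -benchmem reports.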



Summary



The WithTwosComplement implementation offers better portability in Go while being inlinable, branch-free, free of memory allocations, and free of the value truncation caused by type conversion. The benchmarks here do not show any advantage of branch-free code over branching code, but in theory branch-free code performs better in many cases.



Finally, my abs implementation for int64 is as follows:


func abs(n int64) int64 {
    y := n >> 63
    return (n ^ y) - y
}
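A short usage sketch for this function follows; the list of inputs is chosen for illustration and is not from the article. One boundary case is worth knowing: the smallest int64 has no positive counterpart, so for that single input the subtraction wraps around and the function returns the input unchanged. The branching version behaves the same way for that input, since -n wraps as well.

```go
package main

import (
	"fmt"
	"math"
)

// abs is the final implementation from the article.
func abs(n int64) int64 {
	y := n >> 63
	return (n ^ y) - y
}

func main() {
	inputs := []int64{0, 42, -42, math.MaxInt64, -math.MaxInt64, math.MinInt64}
	for _, n := range inputs {
		// math.MinInt64 is the one input whose result is still negative.
		fmt.Printf("abs(%d) = %d\n", n, abs(n))
	}
}
```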


Via: Optimized abs() for int64 in Go



Author: Ryan Armstrong
Translator: Wuyinbest
Proofreading: Rxcai

