
Optimizing abs() for int64 in Go

Last Update: 2018-02-07 Source: Internet

Author: User


Note: this article was written some time ago, and the information in it may have changed.

Original article: "Optimized abs() for int64 in Go". Translated title: "Optimizing abs() for int64 in Go". Reprints welcome.

Objective

Go has no built-in standard abs() function for computing the absolute value of an integer, that is, the non-negative magnitude of a negative or positive number.

I recently needed an abs() function to solve the Day 20 puzzle of Advent of Code 2017. If you want to learn something new or test your skills, it is worth a look.

Go does implement abs() in the math package as math.Abs, but it does not fit my problem: its input and output types are float64, whereas I need int64. Converting the argument is possible, but converting between int64 and float64 adds some overhead, and very large values are truncated (lose precision) in the conversion. Both points are demonstrated below.

The post "Pure Go math.Abs outperforms assembly version" discusses how math.Abs was optimized for floating-point numbers, but those techniques depend on the underlying float64 encoding and cannot be applied directly to integer types.

The source code and tests for this article are available at cavaliercoder/go-abs.

Type conversion vs. branching

To me, the simplest implementation of an absolute-value function is: if the input n is greater than or equal to 0, return n; if it is less than 0, return -n (negating a negative number makes it positive). Because this function relies on a branch to compute the absolute value, we name it abs.WithBranch:

package abs

func WithBranch(n int64) int64 {
    if n < 0 {
        return -n
    }
    return n
}

This correctly returns the absolute value of n, and it is how Go v1.9.x implements math.Abs for float64. But does converting the int64 to float64, calling math.Abs, and converting back do any better? Let's check:

package abs

import "math"

func WithStdLib(n int64) int64 {
    return int64(math.Abs(float64(n)))
}

In the code above, n is first converted from int64 to float64, its absolute value is taken with math.Abs, and the result is converted back to int64. These repeated conversions will clearly add some overhead. A benchmark can verify this:

$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/cavaliercoder/abs
BenchmarkWithBranch-8           2000000000               0.30 ns/op
BenchmarkWithStdLib-8           2000000000               0.79 ns/op
PASS
ok      github.com/cavaliercoder/abs    2.320s

Result: at 0.30 ns/op, WithBranch is more than twice as fast. It has another advantage: it does not suffer the truncation (loss of precision) that occurs when large int64 values are converted to IEEE-754 float64.

For example, abs.WithBranch(-9223372036854775807) correctly returns 9223372036854775807. WithStdLib(-9223372036854775807), however, overflows during the type conversion and returns -9223372036854775808, and the large positive input WithStdLib(9223372036854775807) likewise returns that incorrect negative result.
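The truncation comes from float64 having only a 53-bit significand, so not every int64 larger than 2^53 can be represented exactly. The sketch below is a standalone copy of WithBranch and WithStdLib with an input value chosen for illustration; it shows the error already appearing just past 2^53, well before the overflow case above:

```go
package main

import (
	"fmt"
	"math"
)

// WithBranch returns |n| using a conditional branch.
func WithBranch(n int64) int64 {
	if n < 0 {
		return -n
	}
	return n
}

// WithStdLib returns |n| by round-tripping through float64.
func WithStdLib(n int64) int64 {
	return int64(math.Abs(float64(n)))
}

func main() {
	n := int64(-(1<<53 + 1))   // -9007199254740993, not exactly representable as float64
	fmt.Println(WithBranch(n)) // 9007199254740993
	fmt.Println(WithStdLib(n)) // 9007199254740992: off by one after rounding
}
```

During the conversion, -9007199254740993 is rounded to the nearest representable float64, -9007199254740992, so the round trip silently drops the final 1.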

The branching approach is clearly faster and more accurate for signed integers, but can we do better?

We all know that branching code breaks the sequential flow of a program: a pipelined processor cannot always predict which instruction will run next.

A branch-free solution

Chapter 2 of Hacker's Delight introduces a branch-free method that computes the absolute value of a signed integer using two's-complement arithmetic.

To calculate the absolute value of x:

First, compute y = x >> 63, i.e. shift x right by 63 bits. Because this is an arithmetic shift, every bit of y becomes a copy of the sign bit: if you are familiar with two's-complement integers, you will know that y is -1 (all bits set) when x is negative and 0 otherwise.
Then compute (x ⨁ y) - y: XOR x with y, then subtract y. The result is the absolute value of x.
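To make the two steps concrete, here is a small tracer that prints the intermediate values; twosComplementTrace is a hypothetical helper, not part of the go-abs repository. In Go, >> on a signed integer is an arithmetic shift and binary ^ is XOR (the ⨁ above).

```go
package main

import "fmt"

// twosComplementTrace (hypothetical helper) returns the intermediate
// values of the branch-free absolute value: y, x ^ y, and (x ^ y) - y.
func twosComplementTrace(x int64) (y, xored, result int64) {
	y = x >> 63        // arithmetic shift: -1 if x < 0, else 0
	xored = x ^ y      // equals ^x (all bits flipped) when y == -1
	result = xored - y // subtracting -1 adds 1: ^x + 1 == -x
	return
}

func main() {
	for _, x := range []int64{-5, 5, 0} {
		y, xored, result := twosComplementTrace(x)
		fmt.Printf("x=%d y=%d x^y=%d result=%d\n", x, y, xored, result)
	}
	// x=-5 y=-1 x^y=4 result=5
	// x=5 y=0 x^y=5 result=5
	// x=0 y=0 x^y=0 result=0
}
```

For non-negative x, y is 0, so both steps leave x unchanged; for negative x, they compute ^x + 1, which is the two's-complement negation.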
This can be implemented directly in efficient assembly:

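// abs_amd64.go
package abs

// WithASM returns the absolute value of n. Its body is in abs_amd64.s.
func WithASM(n int64) int64

// abs_amd64.s
// func WithASM(n int64) int64
// Frame size 0; 16 bytes of arguments (8-byte n plus 8-byte result).
TEXT ·WithASM(SB),$0-16
  MOVQ    n+0(FP), AX     // copy input to AX
  MOVQ    AX, CX          // y ← x
  SARQ    $63, CX         // y ← y >> 63
  XORQ    CX, AX          // x ← x ⨁ y
  SUBQ    CX, AX          // x ← x - y
  MOVQ    AX, ret+8(FP)   // copy result to return value
  RET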

We name this function WithASM and declare it without a body; its implementation is written in Go assembly. The code above only works on AMD64 systems, so I recommend giving the file an _amd64.s suffix, which makes Go build it only for that architecture.

Benchmark results for WithASM:

$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/cavaliercoder/abs
BenchmarkWithBranch-8           2000000000               0.29 ns/op
BenchmarkWithStdLib-8           2000000000               0.78 ns/op
BenchmarkWithASM-8              2000000000               1.78 ns/op
PASS
ok      github.com/cavaliercoder/abs    6.059s

This is embarrassing: the benchmark shows that the branch-free assembly runs far more slowly, at 1.78 ns/op. How can that be?

Compilation options

To understand why, we need to know how the Go compiler optimizes calls to WithASM. The compiler accepts the -m flag to print its optimization decisions; with go build or go test, pass it as -gcflags=-m.

The output:

$ go tool compile -m abs.go
# github.com/cavaliercoder/abs
./abs.go:11:6: can inline WithBranch
./abs.go:21:6: can inline WithStdLib
./abs.go:22:23: inlining call to math.Abs

For simple functions like ours, the Go compiler supports function inlining: instead of emitting a call, it substitutes the body of the called function directly at the call site. For example:

package main

import (
  "fmt"
  "github.com/cavaliercoder/abs"
)

func main() {
  n := abs.WithBranch(-1)
  fmt.Println(n)
}

is effectively compiled as:

package main

import "fmt"

func main() {
  n := -1
  if n < 0 {
      n = -n
  }
  fmt.Println(n)
}

The compiler output shows that WithBranch and WithStdLib are inlined at compile time, but WithASM is not. For WithStdLib, even the underlying call to math.Abs is inlined.

Because WithASM cannot be inlined, every call to it incurs the overhead of a function call: allocating stack space for WithASM, copying parameters and pointers, and so on.

What if we disable inlining for the other functions? We can do this with the //go:noinline directive:

package abs

import "math"

//go:noinline
func WithBranch(n int64) int64 {
    if n < 0 {
        return -n
    }
    return n
}

//go:noinline
func WithStdLib(n int64) int64 {
    return int64(math.Abs(float64(n)))
}

Recompiling, we see fewer compiler optimizations:

$ go tool compile -m abs.go
abs.go:22:23: inlining call to math.Abs

Results of the benchmark test:

$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/cavaliercoder/abs
BenchmarkWithBranch-8           1000000000               1.87 ns/op
BenchmarkWithStdLib-8           1000000000               1.94 ns/op
BenchmarkWithASM-8              2000000000               1.84 ns/op
PASS
ok      github.com/cavaliercoder/abs    8.122s

All three functions now average around 1.9 ns/op.

You could conclude that each function call costs around 1.5 ns, and that this overhead wipes out the speed advantage of WithBranch.

What I take away from this is that, for an assembly implementation such as WithASM to be worthwhile, its gains must outweigh what the compiler provides (type safety, garbage collection, and function inlining), and in most cases they will not. There are exceptions, of course, such as SIMD-accelerated cryptography and streaming media encoding.

Using an inlinable function

The Go compiler cannot inline functions implemented in assembly, but it can easily inline the same algorithm rewritten in plain Go:

package abs

func WithTwosComplement(n int64) int64 {
    y := n >> 63          // y ← n >> 63: -1 if n < 0, otherwise 0
    return (n ^ y) - y    // (n ⨁ y) - y
}

The compiler output shows that this function can be inlined:

$ go tool compile -m abs.go
...
abs.go:26:6: can inline WithTwosComplement

But what about performance? With inlining enabled again, WithTwosComplement performs on par with WithBranch:

$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/cavaliercoder/abs
BenchmarkWithBranch-8               2000000000               0.29 ns/op
BenchmarkWithStdLib-8               2000000000               0.79 ns/op
BenchmarkWithTwosComplement-8       2000000000               0.29 ns/op
BenchmarkWithASM-8                  2000000000               1.83 ns/op
PASS
ok      github.com/cavaliercoder/abs    6.777s

The function-call overhead is gone, and WithTwosComplement is much faster than WithASM. Let's see what the compiler generates for WithTwosComplement.

The -S flag tells the compiler to print the assembly it generates:

$ go tool compile -S abs.go
...
"".WithTwosComplement STEXT nosplit size=24 args=0x10 locals=0x0
        0x0000 00000 (abs.go:26)        TEXT    "".WithTwosComplement(SB), NOSPLIT, $0-16
        0x0000 00000 (abs.go:26)        FUNCDATA        $0, gclocals·f207267fbf96a0178e8758c6e3e0ce28(SB)
        0x0000 00000 (abs.go:26)        FUNCDATA        $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
        0x0000 00000 (abs.go:26)        MOVQ    "".n+8(SP), AX
        0x0005 00005 (abs.go:26)        MOVQ    AX, CX
        0x0008 00008 (abs.go:27)        SARQ    $63, AX
        0x000c 00012 (abs.go:28)        XORQ    AX, CX
        0x000f 00015 (abs.go:28)        SUBQ    AX, CX
        0x0012 00018 (abs.go:28)        MOVQ    CX, "".~r1+16(SP)
        0x0017 00023 (abs.go:28)        RET
...

The compiler generates almost exactly the same instructions for WithTwosComplement as we wrote by hand in WithASM, but with the added benefits of inlining and portability: recompiling with GOARCH=386, for example, generates code for 32-bit systems.

Finally, regarding memory allocation, all of the functions above are ideal: running go test -bench=. -benchmem shows that none of them allocates memory.
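As a complement to -benchmem, the standard library's testing.AllocsPerRun reports the average number of heap allocations per call of a function, and it can be called from an ordinary program. A minimal sketch, with WithTwosComplement copied from above so the program is self-contained and a hypothetical package-level sink variable that keeps the result live:

```go
package main

import (
	"fmt"
	"testing"
)

// WithTwosComplement returns |n| without branching.
func WithTwosComplement(n int64) int64 {
	y := n >> 63
	return (n ^ y) - y
}

// sink (hypothetical) keeps the result live so the call is not optimized away.
var sink int64

func main() {
	input := int64(-42)
	allocs := testing.AllocsPerRun(1000, func() {
		sink = WithTwosComplement(input)
	})
	fmt.Println("allocations per call:", allocs)
}
```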

Summary

The WithTwosComplement implementation is portable in Go while also being inlinable, branch-free, and allocation-free, and it avoids the truncation caused by type conversions. The benchmarks do not show an advantage for branch-free code over branching code, but in theory branch-free code can perform better in many situations.

Finally, my ABS implementation of Int64 is as follows:

func abs(n int64) int64 {
    y := n >> 63
    return (n ^ y) - y
}

Via: Optimized abs() for int64 in Go

Author: Ryan Armstrong
Translator: Wuyinbest
Proofreading: Rxcai

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
