How can you make Go calls to C 10 times faster?

Source: Internet
Author: User

At present, when Go needs to integrate with C/C++ code, the first thing that comes to mind is cgo. After all, it is the official solution, and it is simple to use.

But cgo is very slow. cgo is essentially a bridge: by automatically generating code, it connects the C/C++ world and the Go world while preserving the C/C++ runtime. This gives good compatibility, but every call to a C function must first suspend the current goroutine and switch the execution stack to the main stack of the current thread M (2MB in size). Otherwise the C function would have to run on the goroutine stack, which is generally small, so a stack overflow could easily result.

When calling a C function, the current stack must be switched to the thread's main stack, which brings two more serious problems:

    1. Thread stacks are a scarce resource in the Go runtime: their number is bounded by the number of Ps and Ms, which can loosely be understood as being limited by GOMAXPROCS;
    2. Because the C/C++ runtime has to be preserved alongside the Go runtime, cgo needs to translate and coordinate between the two runtimes and the two ABIs (application binary interfaces). This brings a lot of overhead.

c2goasm, a by-product of the Minio project, offers an alternative; it has also been well received by the go-cv-simd project.

c2goasm is an assembly-language converter: its input is the AMD64 assembly emitted by clang, and its output is Go assembly. The input to clang is C/C++ source. The limitation is that RTTI and exceptions are not supported, which means you cannot use the advanced features provided by the C++ runtime.

The Go assembly that c2goasm outputs can be handed to the Go toolchain, which builds it directly into Go executable code.

Compared with cgo, the biggest improvements of c2goasm are:

    1. There is no longer any overhead from translating between the two runtimes (Go and C/C++);
    2. There is no need to switch to the thread's main stack to execute the function, because c2goasm generates a pure Go function, which does not require the thread's main stack to execute.

This greatly improves performance, at the cost of compatibility and portability.
