At present, when Go needs to integrate with C/C++ code, the first thing that comes to mind is cgo. After all, it is the official solution, and it is simple to use.
But cgo is very slow. cgo is essentially a bridge: through automatically generated code, it connects the C/C++ world and the Go world while preserving the C/C++ runtime. This gives good compatibility, but every call to a C function must first suspend the current goroutine and switch the execution stack to the main stack of the current thread M (2 MB in size). Without this switch, the C function would have to run on the goroutine stack, and because a goroutine stack is generally small, that could easily lead to a stack overflow.
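The stack issue can be illustrated from the Go side. Goroutine stacks start small and grow on demand, because the Go compiler inserts a stack check into function prologues; compiled C code contains no such check, so it cannot rely on that growth. The following is only a minimal sketch in pure Go (no cgo involved); the function name `stackDepth` and the recursion depth are illustrative choices, not taken from the original article.

```go
package main

import "fmt"

// stackDepth (hypothetical helper) recurses n times and returns n.
// Each frame keeps a small buffer so that every call uses some stack.
func stackDepth(n int) int {
	var pad [128]byte // per-frame stack usage
	pad[n%len(pad)] = 1
	if n == 0 {
		return 0
	}
	// Add 1 for this frame; pad is read after the recursive call,
	// so it stays live for the whole lifetime of the frame.
	return stackDepth(n-1) + int(pad[n%len(pad)])
}

func main() {
	done := make(chan int)
	// Run the deep recursion on a fresh goroutine, whose stack starts
	// small and is grown by the Go runtime as the recursion deepens.
	go func() { done <- stackDepth(200000) }()
	fmt.Println("frames reached:", <-done)
}
```

The recursion completes only because the runtime keeps enlarging the goroutine stack. A C function placed on that same small stack would have no mechanism to trigger such growth, which is why cgo moves execution to a larger thread stack first.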
Because calling a C function requires switching from the current stack to the thread's main stack, two more serious problems arise:
- Thread stacks are relatively few in the Go runtime: their number is bounded by the number of P/M, which can loosely be understood as being limited by GOMAXPROCS;
- Because the C/C++ runtime must be preserved at the same time, cgo has to translate and coordinate between the two runtimes and the two ABIs (application binary interfaces). This brings a lot of overhead.
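To see the limit the first point refers to on a given machine, the standard `runtime` package can be queried directly. This is a small sketch; the helper name `schedulerLimits` is made up for illustration. Note that, according to the `runtime` documentation, GOMAXPROCS bounds the threads executing Go code simultaneously, while threads blocked in system calls do not count toward it, so treating it as the thread-stack limit is only the rough approximation the text describes.

```go
package main

import (
	"fmt"
	"runtime"
)

// schedulerLimits (hypothetical helper) returns the current GOMAXPROCS
// setting and the number of logical CPUs visible to the process.
func schedulerLimits() (procs, cpus int) {
	// Passing 0 queries GOMAXPROCS without changing it.
	return runtime.GOMAXPROCS(0), runtime.NumCPU()
}

func main() {
	procs, cpus := schedulerLimits()
	fmt.Printf("GOMAXPROCS=%d NumCPU=%d\n", procs, cpus)
}
```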
A byproduct of the Minio project is the c2goasm project, which has also been well received by the go-cv-simd project.
c2goasm is an assembly language converter: its input is the AMD64 assembly emitted by clang, and its output is Go assembly. The input to clang, in turn, is C/C++ source code. The limitation is that RTTI and exceptions are not available, which means you cannot use the advanced features provided by the C++ runtime.
The Go assembly produced by c2goasm can be handed to the Go toolchain, which turns it directly into Go executable code.
Compared with cgo, the biggest improvements of c2goasm are:
- There is no longer any overhead from translating between the C/C++ and Go runtimes;
- There is no need to switch to the thread's main stack to execute the function, because c2goasm generates a pure Go function, which does not require the thread's main stack;
This greatly improves performance, at the cost of compatibility and portability.