Objective
On the Benchmarks Game (a popular cross-language performance comparison site), Go has a notorious weak spot: the binary-trees benchmark is extremely slow, taking about 40 seconds (16 seconds on my machine), whereas the Java version takes about 6 seconds. So here is the question: why is it so slow? Let's take a closer look at the causes and see whether we can improve on them.
In the binary-trees algorithm, the dominant costs are the massive number of node allocations and the recursive bottomUpTree() calls. In Go, these map onto two runtime mechanisms: the GC and goroutine stack growth.
GC
There is no perfect GC; every design has a price, and the choice is generally a tradeoff between low latency and high throughput.
Problem description
Java uses a generational GC. Its advantage is that allocating a node is very cheap; its disadvantages are that a generational GC needs more memory, and that promoting objects to the tenured generation causes a lot of memory copying.
Go's GC is not generational, so node allocation costs more. Go's GC trades some throughput for ultra-low latency. For the vast majority of applications this is the right choice, but for the binary-trees algorithm it is a poor fit.
Solution
Two common GC optimizations are pre-allocating the required memory up front and reusing objects. For the binary-trees algorithm, that means pre-allocating the nodes and reusing them.
Goroutine Stacks
Lightweight goroutines are the soul of the Go language.
Problem description
To keep goroutines as light as possible, Go allocates only a 2 KB initial stack for each goroutine, then grows the stack dynamically as needed. Again, for most scenarios this choice is exactly right, but for the binary-trees algorithm it causes problems.
Go checks the remaining stack space at the start of each function call; if the current stack is too small, the runtime allocates a larger stack and copies the old one into it. Each check is very cheap, but in binary-trees bottomUpTree() does almost no work per call and is called extremely frequently, so those tiny costs add up to something considerable. Worse, the function recurses deeply, so as the stack grows it may be copied several times, not just once!
Solution
Rewrite bottomUpTree() as a non-recursive function. It is not trivial to implement, but it can be done.
Comparison of the old and new binary tree implementations
No comparison, no harm!
Old version code link
Runtime:
```
$ time go run old.go 20
stretch tree of depth 21	 check: -1
2097152	 trees of depth 4	 check: -2097152
524288	 trees of depth 6	 check: -524288
131072	 trees of depth 8	 check: -131072
32768	 trees of depth 10	 check: -32768
8192	 trees of depth 12	 check: -8192
2048	 trees of depth 14	 check: -2048
512	 trees of depth 16	 check: -512
128	 trees of depth 18	 check: -128
32	 trees of depth 20	 check: -32
long lived tree of depth 20	 check: -1

real	0m16.279s
user	1m47.569s
sys	0m2.663s
```
New version code link
Runtime:
```
$ time go run new.go 20
stretch tree of depth 21	 check: -1
2097152	 trees of depth 4	 check: -2097152
524288	 trees of depth 6	 check: -524288
131072	 trees of depth 8	 check: -131072
32768	 trees of depth 10	 check: -32768
8192	 trees of depth 12	 check: -8192
2048	 trees of depth 14	 check: -2048
512	 trees of depth 16	 check: -512
128	 trees of depth 18	 check: -128
32	 trees of depth 20	 check: -32
long lived tree of depth 20	 check: -1
dur: 1.71074946s

real	0m1.914s
user	0m10.149s
sys	0m0.157s
```
Conclusion
Runtime dropped from 16.28 seconds to 1.91 seconds, great!
Although the solutions presented here target the binary-trees benchmark, they actually apply to any GC language and many usage scenarios.
Keep these two solutions in mind:
- Memory pre-allocation
- Object Reuse