2011 Goroutine Performance Test


Note: the forwarded article is from 2011, so the data is dated and should be treated as a reference only; it still illustrates Go's strengths.

The original is here: http://en.munknex.net/2011/12/golang-goroutines-performance.html

———————— Translation begins ————————

Overview

In this article, I will try to evaluate the performance of goroutines. A goroutine is something like a lightweight thread; together with channels, it is built into Go to provide native multitasking.

The documentation tells us:

It is practical to create hundreds of thousands of goroutines in the same address space.

So the focus of this article is to measure how much such massive concurrency actually costs, and where its limits lie.

Memory

The documentation does not record exactly how much space a new goroutine needs; it only says that a few kilobytes are required. A few measurements of my own suggest the value is about 4-4.5 KB per goroutine. That means roughly 5 GB is enough to run one million goroutines.
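To get a feel for how such a number can be measured, here is a minimal sketch in present-day Go (my own illustration, not the author's code; the goroutine count and the use of MemStats.Sys are assumptions, and the exact per-goroutine figure will differ from the 2011 runtime):

package main

import (
        "fmt"
        "runtime"
)

func main() {
        const n = 100000 // number of idle goroutines to start (illustrative)

        block := make(chan struct{}) // never written to, so the goroutines stay alive

        var before, after runtime.MemStats
        runtime.GC()
        runtime.ReadMemStats(&before)

        for i := 0; i < n; i++ {
                go func() { <-block }()
        }

        runtime.GC()
        runtime.ReadMemStats(&after)

        // Sys is the total memory obtained from the OS; dividing the growth by n
        // gives only a rough per-goroutine footprint.
        fmt.Printf("~%d bytes per goroutine\n", (after.Sys-before.Sys)/n)
}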

Performance

Let's figure out how much it costs to run a function in a goroutine. As you may already know, this is very simple: just add the go keyword before the function call:

go testFunc()

Goroutines are multiplexed onto OS threads. By default, if the GOMAXPROCS environment variable is not set, the program uses only one thread. To take advantage of all CPU cores, you must set its value, for example:

export GOMAXPROCS=2

This value is read at run time, so there is no need to recompile the program each time it changes.
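For completeness, the value can also be read or changed from inside the program with runtime.GOMAXPROCS; a tiny sketch (my addition, not part of the original article):

package main

import (
        "fmt"
        "runtime"
)

func main() {
        // GOMAXPROCS(0) only reports the current setting without changing it.
        fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))

        // Setting it from code overrides whatever the environment variable says.
        runtime.GOMAXPROCS(2)
        fmt.Println("GOMAXPROCS now:", runtime.GOMAXPROCS(0))
}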

My assumption is that most of the overhead goes into creating goroutines, switching between them, migrating goroutines from one thread to another, and communication between goroutines on different threads. To keep the discussion manageable, let's start with the single-thread case.

All the tests were done on my nettop, a low-cost Intel desktop machine:

    • Atom D525 Dual Core 1.8 GHz

    • 4 GB DDR3

    • Go r60.3

    • Arch Linux x86_64

Method

This is the test function generator:

func genTest(n int) func(res chan<- interface{}) {
        return func(res chan<- interface{}) {
                for i := 0; i < n; i++ {
                        math.Sqrt(13)
                }
                res <- true
        }
}

Then a set of test functions is built that computes sqrt(13) 1, 10, 100, 1000, and 5000 times, respectively:

testFuncs := []func(chan<- interface{}){genTest(1), genTest(10), genTest(100), genTest(1000), genTest(5000)}

For each function I run it X times in a plain loop and then X times in goroutines, and compare the results. Garbage collection also has to be taken into account: to reduce its influence I explicitly call runtime.GC() after the goroutines finish and only then record the end time. For accuracy, every test is executed many times; the whole run took about 16 hours.
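The original benchmark code is not reproduced in the article; the sketch below is my reconstruction of the described procedure (run X times in a loop, then X times in goroutines, wait for all of them, call runtime.GC(), and only then take the end time). The iteration counts are illustrative:

package main

import (
        "fmt"
        "math"
        "runtime"
        "time"
)

func genTest(n int) func(res chan<- interface{}) {
        return func(res chan<- interface{}) {
                for i := 0; i < n; i++ {
                        math.Sqrt(13)
                }
                res <- true
        }
}

func main() {
        const x = 100000 // how many times each test function is executed (illustrative)
        test := genTest(1000)

        // Plain loop: call the function x times sequentially.
        res := make(chan interface{}, x)
        start := time.Now()
        for i := 0; i < x; i++ {
                test(res)
        }
        plain := time.Since(start)

        // Goroutines: start x goroutines, wait for all results,
        // force a garbage collection, and only then record the end time.
        res = make(chan interface{}, x) // fresh channel for the concurrent run
        start = time.Now()
        for i := 0; i < x; i++ {
                go test(res)
        }
        for i := 0; i < x; i++ {
                <-res
        }
        runtime.GC()
        conc := time.Since(start)

        fmt.Printf("plain loop: %v, goroutines: %v\n", plain, conc)
}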

One thread

export GOMAXPROCS=1

The chart shows that a single sqrt() computation run in a goroutine is about four times slower than the same computation run as a plain function call.

Take a look at the remaining four functions:

Notice that even 700,000 goroutines running concurrently do not drop performance below 80% of the plain version. And here comes the most impressive part: starting from sqrt() x1000, the total overhead is less than 2%, and at 5,000 iterations it is only about 1%. This overhead seems to be independent of the number of goroutines, so the only limiting factor left is memory.

In short:

If a standalone piece of work takes at least as long as roughly 1000 sqrt() computations and you want it to run concurrently, don't hesitate to put it in a goroutine. Smaller pieces can easily be grouped in batches of 10 or 100, in which case the performance loss is only about 20% and 2%, respectively (a minimal sketch of the grouping idea follows).
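Grouping here just means letting one goroutine handle a batch of small work items instead of a single one, so the goroutine overhead is amortized over the whole batch. A minimal sketch of that idea (my illustration; the task count and batch size of 100 are arbitrary):

package main

import (
        "math"
        "sync"
)

// processBatch runs one contiguous batch of tiny tasks inside a single
// goroutine, so the goroutine overhead is shared by the whole batch.
func processBatch(lo, hi int) {
        for i := lo; i < hi; i++ {
                math.Sqrt(float64(i)) // stand-in for a very small unit of work
        }
}

func main() {
        const tasks = 1000000
        const batch = 100 // illustrative batch size

        var wg sync.WaitGroup
        for lo := 0; lo < tasks; lo += batch {
                hi := lo + batch
                if hi > tasks {
                        hi = tasks
                }
                wg.Add(1)
                go func(lo, hi int) {
                        defer wg.Done()
                        processBatch(lo, hi)
                }(lo, hi)
        }
        wg.Wait()
}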

Multithreading

Now let's look at what happens when we want to use several processor cores. In my case there are two:

export GOMAXPROCS=2

Execute our test program again:

Here you can see that even though the number of cores has doubled, the execution time of the first two functions actually went up! Moving such tiny tasks between threads apparently costs far more than executing them. :) The current scheduler does not handle this case yet, but the Go developers promise to address it in the future.

As you can see, the last two functions make full use of both cores. On my nettop, their execution times are ~45µs and ~230µs, respectively.

Summary

Even though Go is a young language with a provisional scheduler implementation, goroutine performance is exciting, especially combined with how easy goroutines are to use. That impressed me. Thanks to the Go development team!

My rule of thumb: if the work takes less than 1µs I think twice before running it in a goroutine, and if it takes more than 1ms I never hesitate to use one. :)

