Also published on my standalone blog.
I had always assumed that in Go, under high concurrency, using all available cores gives the best performance, but practice in a real project shows that this is not always true.
In the Sniper project (an HTTP load testing tool that combines the benefits of ab and siege), the number of CPUs in use had always been set to the number of CPUs in the system:
runtime.GOMAXPROCS(runtime.NumCPU())
Compared with ab, there had always been a sizable performance gap. The test is a GET request for a 10 KB file over the LAN.
The following is the performance of ab with a concurrency of 100 and 100k total requests; the execution time is 16.082 seconds:
Concurrency Level:      100
Time taken for tests:   16.082 seconds
Complete requests:      100000
Failed requests:        0
Write errors:           0
Total transferred:      1035500000 bytes
HTML transferred:       1024000000 bytes
Requests per second:    6218.04 [#/sec] (mean)
Time per request:       16.082 [ms] (mean)
Time per request:       0.161 [ms] (mean, across all concurrent requests)
Transfer rate:          62878.74 [Kbytes/sec] received
Next, the same test with sniper, with runtime.GOMAXPROCS(runtime.NumCPU()) set:
transactions:100000 Hits
availability:100.00%
Elapsed time:20.82 secs
totaltransfer:0.00 MB
htmltransfer:0.00 MB
Transaction rate:4802.45 trans/sec
throughput:0.00 mb/sec
successful:100000 Hits
failed:0 Hits
transactiontime:0.00021 secs (mean)
connectiontime:0.00010 secs (mean)
requesttime:0.00000 secs (mean)
responsetime:0.00011 secs (mean)
As you can see, against the same server, sniper takes 20.82 seconds when it uses all CPUs.
Finally, I set runtime.GOMAXPROCS(1) and ran the test again:
transactions:100000 Hits
availability:100.00%
Elapsed time:16.71 secs
totaltransfer:0.00 MB
htmltransfer:0.00 MB
Transaction rate:5985.03 trans/sec
throughput:0.00 mb/sec
successful:100000 Hits
failed:0 Hits
transactiontime:0.00017 secs (mean)
connectiontime:0.00003 secs (mean)
requesttime:0.00000 secs (mean)
responsetime:0.00014 secs (mean)
As you can see, sniper's execution time drops to 16.71 seconds, a reduction of about 20%.
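The figures can be checked with a few lines of arithmetic. The sketch below uses only the elapsed times reported above (20.82 s and 16.71 s for sniper, 16.082 s for ab); the function name percentDrop is made up for illustration.

```go
package main

import "fmt"

// percentDrop returns how much smaller after is than before, in percent.
// A negative result means after is larger than before.
func percentDrop(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	// sniper with all CPUs (20.82 s) versus sniper with one CPU (16.71 s).
	fmt.Printf("sniper, all CPUs to 1 CPU: %.1f%% less time\n",
		percentDrop(20.82, 16.71))
	// sniper with one CPU (16.71 s) versus ab (16.082 s).
	fmt.Printf("sniper (1 CPU) vs ab: %.1f%% more time\n",
		-percentDrop(16.082, 16.71))
}
```

With one CPU, sniper's elapsed time is therefore within a few percent of ab's.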
After spending so long on optimization, I did not expect performance to finally take such a big step forward in such an unexpected way!
What is the reason for this?
At the moment I cannot explain it clearly either. It may be related to CPU context switching, but the underlying mechanism needs further study. If you know the reason, please let me know.
Update
One possible cause: see here
As in all other concurrency frameworks, the so-called "lock-free" advantage of goroutines holds only on a single thread. If $GOMAXPROCS > 1 and goroutines communicate with each other, the Go runtime has to protect the shared data with locks. Sniper does a lot of communication between goroutines, so it is possible that this locking affects performance.