This post has been delayed for far too long. To better present this series of benchmark results, I had to find a tool to chart the data. On Windows I mostly used Excel to draw line charts, histograms, and so on, but on Linux/macOS I couldn't find a handy equivalent. I also tried gnuplot, but whether because it is too professional or just too old, it never felt comfortable or pleasant to use. So before starting this post I decided to use Go and the chart.js library to hack together a goplot tool for drawing charts.
Someone might say I'm reinventing the wheel again, and it really is a wheel. So what? A programmer's biggest advantage is that when a tool makes you unhappy, you can improve it yourself, or even write a new one. I think a geek programmer first learns to continuously build up his own toolbox. Enough digressing; back to the point.
Test dimensions
This benchmark mainly tests two dimensions:
- QPS on a single connection with a varying number of concurrent requests
- QPS with a fixed 100 concurrent requests and a varying number of connections
Readers who have benchmarked web servers may be puzzled by "concurrency" here. For a web server, concurrency typically means the number of connections; here it is not the number of connections, but the number of goroutines (think of them as threads) sending requests to the server at the same time. An RPC connection supports concurrent requests, that is, many requests can be in flight on a single connection simultaneously. This differs from HTTP's one-request-one-response model, but is very similar to the SPDY protocol.
Dimension 1 focuses on the limit of how many requests can run over a single connection, which can be used to roughly size the number of connections to the back end. After all, back-end systems (such as a DB) do not want to hold a large number of connections, which is a bit different from a web server. This dimension is more a measure of how fast the RPC client can send.
Dimension 2 focuses on the performance of the RPC server: with one goroutine per connection, how much concurrency can the server still handle.
Test environment
- The client and server are deployed on two machines in the same LAN, so network latency can be ignored.
- Both machines have 8-core CPUs, hyper-threaded to 16 logical cores; memory and NIC capacity are more than sufficient.
- Each request the client sends to the server carries only a single 100-byte string.
- GOMAXPROCS is set to 16.
Test results
QPS on a single connection with a varying number of concurrent requests
You can see that as the number of concurrent requests on this one connection increases, QPS stabilizes at around 55,000, which is a bit low. The performance cost here comes mainly from lock overhead, serialization/deserialization, and goroutine scheduling. An RPC client (that is, one connection to the server) has a single input goroutine responsible for reading the server's responses and dispatching each one to the goroutine that sent the request; if that input goroutine is not scheduled for a long time, the senders block waiting. Go's current scheduler clearly has this problem; I'll save that for a post on the scheduler.
QPS with 100 concurrent requests and a varying number of connections
As the number of connections (RPC clients) increases, QPS eventually reaches around 200,000, at which point the bottleneck is mainly the CPU. A long time ago I used the same batch of machines to test Avro with 100 threads and 100 connections, and QPS was 140,000+. On this dimension, Go's RPC performance is quite good.
Performance is an issue that deserves our continued attention and optimization.
Finally, the test code is attached: https://gist.github.com/skoo87/6510680