I recently spent two days partially implementing the memcached server protocol with muduo. The code is in examples/memcached/server and passes most of memcached's test cases (incr/decr are not yet implemented).
It is not a replacement for memcached: it implements neither LRU eviction nor expiration, does not support the binary protocol, and does not manage memory itself. It is a network programming example (only about 1000 lines of code, far smaller than memcached) that demonstrates muduo-style event-driven programming and will serve as a test bed for future performance optimization (in other words, no effort has gone into performance in this version). Readers who have read the memcached source can compare the two programming styles: in memcached, read/write operations are interleaved with the normal processing logic, whereas with muduo the library does all the network reading and writing and the application only deals with messages received and sent. At present the two perform about the same on basic get/set operations.
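To make the contrast concrete, here is a minimal sketch of the "library does the I/O, application sees messages" style. It does not use muduo: LineFramer, onBytes and collectCommands are hypothetical names made up for illustration, and framing on "\r\n" is a simplification (a real memcached set command is followed by a data block of a stated length, which this sketch ignores).

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a network library's connection object.
// It owns the input buffer; the application never reads from a socket.
class LineFramer {
 public:
  using MessageCallback = std::function<void(const std::string&)>;

  explicit LineFramer(MessageCallback cb) : onMessage_(std::move(cb)) {
    assert(onMessage_);  // a callback must be supplied
  }

  // Called by the "library" whenever bytes arrive, in arbitrary chunks.
  void onBytes(const std::string& chunk) {
    buffer_ += chunk;
    std::string::size_type pos;
    // Deliver every complete line; keep any partial line for later.
    while ((pos = buffer_.find("\r\n")) != std::string::npos) {
      onMessage_(buffer_.substr(0, pos));
      buffer_.erase(0, pos + 2);
    }
  }

 private:
  MessageCallback onMessage_;
  std::string buffer_;
};

// Application side: it only registers a callback and handles whole lines.
std::vector<std::string> collectCommands(
    const std::vector<std::string>& chunks) {
  std::vector<std::string> commands;
  LineFramer framer(
      [&commands](const std::string& line) { commands.push_back(line); });
  for (const auto& chunk : chunks) {
    framer.onBytes(chunk);
  }
  return commands;
}
```

Feeding the chunks "get fo", "o\r\nget b", "ar\r\n" through collectCommands yields the two commands "get foo" and "get bar", no matter how the bytes were split on the wire. The application code never has to track partial reads itself.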
muduo's inspector now has the gperftools remote profiling feature built in, and memcached-debug shows how to use it.
Why not optimize the performance of set operations (including set/add/replace/append/prepend/cas, etc.)?
1. Proportion. Since this is a cache, the get:set ratio is high, 10:1 or even higher, so optimization should focus on get rather than set.
Assume memcached can handle 100k QPS, that all of these operations are sets (in reality fewer than 10% are), and that all sets execute serially (no concurrency). Then the CPU time for each set must not exceed 10 us (1 s / 100,000; this includes the time spent running the server's local network code but excludes network latency). In fact, a set takes at most 2~3 us of CPU time (measured with the memcached-footprint program), so it is not worth optimizing at all.
2. Network bandwidth. Suppose the key + value of a set operation is 1k bytes long and TCP payload bandwidth is estimated at 110 MB/s. Then the transmission delay of 1 kB of data on a gigabit network is about 9 us (the propagation delay, tens of microseconds, is irrelevant here). In other words, the server's network card needs 9 us to receive this 1 kB of data, from the arrival of the first byte to the arrival of the last. With a set already costing only 2~3 us, optimizing it further would have little visible effect.
3. The cost of producing the data to be updated is much larger than the cost of a memcached set. When memcached needs an update, it is usually because new data has just been written to the database and is then put into memcached. Writing to the database costs far more than the memcached set, so optimizing set does nothing meaningful for overall system performance.
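The arithmetic behind points 1 and 2 can be checked with a few lines of code. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and the inputs (100k QPS, 1 kB taken as 1000 bytes, 110 MB/s payload bandwidth) are the estimates used above, not measurements.

```cpp
#include <cassert>
#include <cstdio>

// Upper bound on CPU time per request, in microseconds, if every request
// runs serially at the given rate (requests per second).
double cpuBudgetMicros(double requestsPerSecond) {
  assert(requestsPerSecond > 0);
  return 1e6 / requestsPerSecond;
}

// Time, in microseconds, for `bytes` of payload to arrive at the given
// payload bandwidth (bytes per second). Propagation delay is not included.
double transmissionMicros(double bytes, double bytesPerSecond) {
  assert(bytesPerSecond > 0);
  return bytes / bytesPerSecond * 1e6;
}

// Prints both estimates side by side.
void printSetBudget() {
  const double budget = cpuBudgetMicros(100e3);           // 100k QPS, all sets
  const double wire = transmissionMicros(1000.0, 110e6);  // 1 kB at 110 MB/s
  std::printf("CPU budget per set: %.1f us\n", budget);
  std::printf("wire time for 1 kB: %.1f us\n", wire);
}
```

Calling printSetBudget() from main gives a budget of 10 us per set and a wire time of roughly 9 us for the payload. Both are well above the 2~3 us a set actually costs, which is the point of the argument.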