Across many NoSQL products, benchmarks show that write performance is greatly improved, while read performance has not increased much over traditional RDBMSs and has sometimes even declined. Cassandra and MongoDB are two prominent representatives of this pattern. One might guess the reason is that the UGC model is becoming increasingly popular, and user-generated content pushes the read/write ratio close to, or even below, 1:1.
But I don't think this is the real reason.
1. Caching makes the storage layer's raw read efficiency far less important
The real reason is that we have already optimized reads enough. We put memcached, Tokyo Tyrant/Tokyo Cabinet, and other cache storage in front of the data store, and we use squid or nginx proxy_cache for page and file caching. Together these achieve a very good read-caching effect. If the data has no strict real-time requirements, or the cache design is sound (both reads and writes go through the cache), the hit rate will be high enough that we no longer need to squeeze raw read efficiency out of the underlying storage.
Imagine a cache layer with a hit rate above 99%: hundreds of millions of read requests shrink to millions of raw reads against the storage devices, and thousands of concurrent readers shrink to dozens. Of course, this requires the cache layer itself to be reliable. For example, with nginx proxy_cache, the downtime of one cache server must not allow all read requests to penetrate to the underlying storage. Correctly handling purge and similar operations is beyond the scope of this article. A minimal sketch of the read-through pattern follows.
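To make the hit-rate arithmetic concrete, here is a minimal read-through (plus write-through) cache sketch in Python. The `load_from_storage` callback and the TTL value are hypothetical stand-ins for whatever backend and policy you actually use; this is an illustration of the pattern, not the memcached or nginx proxy_cache implementation.

```python
import time

class ReadThroughCache:
    """Toy read-through cache: only misses touch the raw storage."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expires_at)

    def get(self, key, load_from_storage):
        entry = self.store.get(key)
        if entry and entry[1] > time.time():
            return entry[0]                    # cache hit: zero raw reads
        value = load_from_storage(key)         # cache miss: one raw read
        self.store[key] = (value, time.time() + self.ttl)
        return value

    def put(self, key, value):
        # Write-through: updating the cache on writes keeps the hit
        # rate high even for recently modified data.
        self.store[key] = (value, time.time() + self.ttl)
```

At a 99% hit rate, only one `get` in a hundred ever calls `load_from_storage`, which is exactly why the raw read path stops being the bottleneck.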
In summary, raw read efficiency no longer needs much improvement, because its role has largely been taken over by the cache layer.
2. The raw write workload cannot be replaced
We have seen that the cache reduces the raw read workload, so it is natural to ask whether anything can similarly reduce the raw write workload. The answer is no (if you disagree, leave a comment and we can discuss it). Since the raw write workload is irreplaceable, we can only improve write performance in two ways.
2.1 Sharding
By partitioning the data, we can store it across many nodes, so each node receives only a share of the raw write requests. This is like keeping each employee's efficiency unchanged while hiring more people. However, as the number of nodes grows, the probability that some node fails also grows greatly, so we have to add replication to provide an HA solution. A toy shard router illustrating both ideas is sketched below.
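Here is a minimal sketch of key-based sharding with replicas, under assumed node names (`node-a` and so on); it is an illustration of the routing idea, not any particular product's scheme.

```python
import hashlib

class ShardRouter:
    """Route each key to one primary node plus replicas for HA."""

    def __init__(self, nodes, replicas=2):
        self.nodes = nodes
        self.replicas = replicas

    def nodes_for(self, key):
        # Stable hash so the same key always maps to the same nodes.
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        start = h % len(self.nodes)
        # Primary plus `replicas` successors; writes go to all of them,
        # so one node's failure does not lose the shard.
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(1 + self.replicas)]

router = ShardRouter(["node-a", "node-b", "node-c", "node-d"])
print(router.nodes_for("user:42"))  # primary + 2 replicas for this key
```

Note that this naive modulo scheme remaps most keys whenever a node is added or removed; real systems typically use consistent hashing to limit that movement.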
2.2 Improving raw write efficiency
Beyond sharding, we can improve raw write efficiency itself to achieve better overall read and write performance (including the cache layer). The common technique is to serialize random write operations in memory and, once enough have accumulated, flush them to disk sequentially. This is what we mean by treating memory as the new hard disk and the hard disk as the new tape (see my earlier article: NoSQL theory: memory is the new hard disk, the hard disk is the new tape). This is why so many NoSQL products optimize write operations while read performance barely improves, and some even deliberately accept slower reads as the cost of faster writes. A sketch of this write-buffering idea follows.
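The following is a minimal sketch of the memory-buffer-plus-sequential-flush idea; the file name `segment.log` and the flush threshold are illustrative assumptions, not details of Cassandra, MongoDB, or any other product.

```python
class WriteBuffer:
    """Absorb random-order writes in memory; flush sequentially when full."""

    def __init__(self, flush_threshold=4):
        self.memtable = {}                 # in-memory table of pending writes
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memtable[key] = value         # fast: memory only, no disk seek
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # One sequential append replaces many random disk writes.
        with open("segment.log", "a") as f:
            for key in sorted(self.memtable):
                f.write(f"{key}\t{self.memtable[key]}\n")
        self.memtable.clear()

buf = WriteBuffer()
for i in range(8):
    buf.put(f"key{i}", f"value{i}")        # triggers two sequential flushes
```

The trade-off is visible even in this toy: a read now has to consult the memtable and every flushed segment, which is exactly how write-optimized designs end up paying with slower reads.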
3. Summary
Since read performance can be handled by a reasonable cache policy that cuts down raw read operations, write optimization is what matters: you need to optimize writes not only when the read/write ratio is small, but even when the ratio is large, it is still write performance, rather than read performance, that deserves the effort.
Source: http://news.cnblogs.com/n/77216/