A major system fault occurred on Weibo a few days ago, and many technical friends have been following it closely. Its cause comes down to nothing more than James Hamilton's On
James's first lesson, design for failure, is the key to the success of every Internet architecture. The engineering theory behind Internet systems is actually quite simple: James's paper contains almost no theory and instead shares a number of practical lessons. How well each company understands and executes these lessons determines whether its architecture succeeds or fails.
Recently I have been studying Redis. Last year I ran a performance benchmark of MemcacheDB, Tokyo Tyrant, and Redis, and its results still hold today. Over the past year we have watched many dazzling key-value storage products come and go, from the fading of Cassandra (Twitter suspended it for its main services) to the rise of HBase (Facebook's new messaging system uses HBase (2)). Looking back at Redis, we find that this program of just over 10,000 lines of source code is full of remarkable features that have not yet been fully explored. Its performance is astonishing: the sub-products of China's top 10 websites could probably meet their storage and cache needs with a single Redis instance. Beyond this impression of speed, the industry holds some misconceptions about Redis. This article offers a few ideas for discussion.
1. What is Redis?
This question shapes how we use Redis. If you think of Redis as a key-value store, you may use it to replace MySQL; if you think of it as a persistent cache, you may keep only frequently accessed temporary data in it. Redis is short for REmote DIctionary Server, and the subtitle on its official website is "a persistent key-value database with built-in net interface written in ANSI-C for POSIX systems", a definition that leans toward a key-value store. Some people regard Redis as an in-memory database, because its high performance comes from operating in memory. Others regard it as a data structure server, because Redis supports complex data types such as lists and sets. Different interpretations of Redis's role lead to different ways of using it.
Today, Internet data is mostly stored in one of two ways: in relational databases or in key-value stores. Yet much Internet business data fits neither model well. For example, the relationships between users on a social platform form a list. Storing such a list in a relational database requires converting it into multi-row records, which introduces a lot of redundancy because every row repeats some of the same information. Storing it in a key-value store makes modification and deletion cumbersome, because you must read the entire value before writing it back. Redis provides a variety of data types in memory, letting applications access these data structures at high speed without worrying about persistent storage, and so avoids the detours that the other two kinds of storage force on you.
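To make the contrast concrete, here is a minimal Python sketch that keeps one user's follower list in two toy stores: one that only understands opaque values, and one that understands sets. Both stores and all function names are hypothetical stand-ins, not Redis's API; the comment marks where Redis's SREM command would play the in-place role.

```python
import json
from typing import Dict, Set

# Model 1: a plain key-value store. The value is an opaque serialized blob,
# so the client must read it all, change it, and write it all back.
kv_store: Dict[str, str] = {}

def kv_remove_follower(user: str, follower: str) -> None:
    followers = json.loads(kv_store.get(user, "[]"))  # read the whole list
    if follower in followers:
        followers.remove(follower)                     # modify on the client
    kv_store[user] = json.dumps(followers)             # write the whole list back

# Model 2: a data structure server. The server holds a real set and removes
# one member in place (Redis's SREM command plays this role).
ds_store: Dict[str, Set[str]] = {}

def ds_remove_follower(user: str, follower: str) -> None:
    ds_store.setdefault(user, set()).discard(follower)

# Usage: the same follower list kept in both models.
kv_store["alice"] = json.dumps(["bob", "carol", "dave"])
ds_store["alice"] = {"bob", "carol", "dave"}
kv_remove_follower("alice", "carol")
ds_remove_follower("alice", "carol")
print(kv_store["alice"])          # ["bob", "dave"]
print(sorted(ds_store["alice"]))  # ['bob', 'dave']
```

The end result is identical; the difference is that the first model moves the whole list over the wire twice for a one-element change.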
2. Redis cannot be faster than memcached
Many developers believe that Redis cannot be faster than memcached: memcached is purely memory-based, while Redis has persistence, so even with asynchronous persistence Redis supposedly cannot match memcached. However, the benchmark results basically show Redis with an overwhelming advantage. I have been thinking about why, and I see the following reasons.
- Libevent. Unlike memcached, Redis does not use libevent. To stay general-purpose, libevent has grown into a large codebase (the whole of Redis is currently less than 1/3 the size of libevent) and sacrifices a lot of performance on specific platforms. Redis instead adapted two files from libevent into its own epoll event loop (4). Many developers in the industry also suggested switching to libev, a higher-performance alternative to libevent, but the author insisted that Redis stay small and free of dependencies. One impressive detail is that you do not need to run ./configure before compiling Redis.
- CAS. CAS (check-and-set) is a convenient mechanism in memcached for preventing conflicting concurrent modifications of a resource. Implementing CAS requires a hidden CAS token for each cache key; the token acts as a version number for the value and must be incremented on every set. This costs CPU and memory. The overhead is small, but once a single machine holds 10 GB+ of cache and serves tens of thousands of QPS, it produces a minor performance difference between the two (5).
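The per-key token cost can be sketched as follows. This is a toy Python model of check-and-set, not memcached's implementation: `CasCache` and its methods are hypothetical, named after memcached's `gets` and `cas` protocol commands. It shows the two costs described above: every entry carries an extra version integer, and every write, even a plain set, must generate a new one.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class Entry:
    value: bytes
    cas: int  # hidden version token stored alongside every key

class CasCache:
    """Hypothetical toy model of check-and-set (not memcached's code)."""

    def __init__(self) -> None:
        self._data: Dict[str, Entry] = {}
        self._next_cas = 1  # counter; every write consumes a new token

    def _token(self) -> int:
        tok = self._next_cas
        self._next_cas += 1
        return tok

    def set(self, key: str, value: bytes) -> None:
        # Even a plain set must bump the token: the per-write overhead.
        self._data[key] = Entry(value, self._token())

    def gets(self, key: str) -> Optional[Tuple[bytes, int]]:
        # Return the value together with its current token.
        e = self._data.get(key)
        return None if e is None else (e.value, e.cas)

    def cas(self, key: str, value: bytes, expected: int) -> bool:
        # Write only if nobody has modified the key since our gets().
        e = self._data.get(key)
        if e is None or e.cas != expected:
            return False
        self._data[key] = Entry(value, self._token())
        return True

c = CasCache()
c.set("counter", b"1")
value, token = c.gets("counter")      # client A reads value and token
c.set("counter", b"5")                # client B overwrites; token changes
print(c.cas("counter", b"2", token))  # False: A's token is stale
value, token = c.gets("counter")      # A re-reads and retries
print(c.cas("counter", b"6", token))  # True
```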
3. The data in a single Redis instance must be smaller than physical memory
Keeping all Redis data in memory delivers high performance, but it is also wasteful. For example, a medium-sized website may have 1 million registered users. If their data is stored in Redis, memory must be large enough to hold all 1 million users. In practice, however, only 50,000 of those users are active, and only 150,000 have visited the service even once in a week. Keeping all 1 million users' data in memory is therefore unreasonable: RAM ends up paying for cold data.
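A quick calculation shows the scale of the waste. Assuming every user's record takes roughly the same amount of memory (an assumption; no per-user size is given here), the share of RAM holding data that nobody touched during the week, and the share belonging to the active users, are:

$$\frac{1{,}000{,}000 - 150{,}000}{1{,}000{,}000} = 85\%, \qquad \frac{50{,}000}{1{,}000{,}000} = 5\%$$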
The situation closely resembles that of an operating system. Every application accesses its data in memory, but when physical memory cannot hold new data, the operating system intelligently swaps out data that has not been accessed for a long time to disk, making room for new applications. Modern operating systems give applications virtual memory rather than raw physical memory.
For the same reason, Redis 2.0 added a VM feature, which lets Redis's data capacity exceed physical memory and separates hot data from cold data.
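The hot/cold separation can be sketched with a least-recently-used policy. This Python model is an illustration under simplifying assumptions: `HotColdStore` is hypothetical, the "disk" is just a dictionary, and LRU is a stand-in chosen for brevity; Redis 2.0's VM used a real swap file and its own rules for choosing which values to swap.

```python
from collections import OrderedDict
from typing import Dict

class HotColdStore:
    """Hypothetical toy store: at most max_hot values live in RAM; the least
    recently used ones are moved to a stand-in 'disk' and swapped back on access."""

    def __init__(self, max_hot: int) -> None:
        self.max_hot = max_hot
        self.hot: "OrderedDict[str, bytes]" = OrderedDict()  # RAM, oldest first
        self.disk: Dict[str, bytes] = {}                     # stand-in swap area

    def _evict(self) -> None:
        # Swap out the coldest values until RAM is within budget.
        while len(self.hot) > self.max_hot:
            key, value = self.hot.popitem(last=False)
            self.disk[key] = value

    def set(self, key: str, value: bytes) -> None:
        self.disk.pop(key, None)
        self.hot[key] = value
        self.hot.move_to_end(key)  # mark as most recently used
        self._evict()

    def get(self, key: str) -> bytes:
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        value = self.disk.pop(key)  # swap in (KeyError if the key never existed)
        self.hot[key] = value
        self._evict()
        return value

s = HotColdStore(max_hot=2)
s.set("u1", b"alice")
s.set("u2", b"bob")
s.set("u3", b"carol")  # u1 is least recently used, so it is swapped out
print(sorted(s.disk))  # ['u1']
print(s.get("u1"))     # b'alice' (swapped back in; u2 goes out)
print(sorted(s.disk))  # ['u2']
```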
4. Redis's VM implementation reinvents the wheel
As with its epoll event loop, Redis implements its VM on its own. However, as noted in the discussion of operating systems above, the OS can also separate a program's hot and cold data automatically: Redis would only need to request a large block of memory from the OS, which would then keep hot data in physical memory and swap cold data out to disk. Varnish, another well-known project that "understands modern operating systems" (3), works exactly this way and has been very successful.
The author gave several reasons for implementing the VM himself (6). Mainstream operating systems swap memory in and out in units of pages. For example, if the OS page size is 4 KB, then as long as any element in a 4 KB page is still being accessed, even just one byte of it, the page will not be swapped out; conversely, reading a single swapped-out byte may pull 4 KB of otherwise useless data into memory. Redis's own implementation can control swap-in at a finer granularity. In addition, accessing memory that the OS has swapped out blocks the process, and because Redis serves all clients from a single thread, such a stall affects every client; this is another reason Redis implements its own VM.
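The page-granularity argument is easy to quantify. The sketch below assumes 4 KB pages, as in the example above, and compares how many bytes move when swapping whole OS pages versus swapping a single value at the finer granularity the author describes; the function names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed OS page size, matching the 4 KB example

def pages_touched(offset: int, size: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages spanned by a value of `size` bytes starting at byte `offset`."""
    first = offset // page_size
    last = (offset + size - 1) // page_size
    return last - first + 1

def swap_in_bytes_by_page(offset: int, size: int, page_size: int = PAGE_SIZE) -> int:
    # OS-style paging: whole pages move between disk and RAM.
    return pages_touched(offset, size, page_size) * page_size

def swap_in_bytes_by_value(size: int) -> int:
    # Value-level swapping: only the value itself moves.
    return size

for offset, size in [(0, 1), (4000, 200)]:
    print(size, swap_in_bytes_by_page(offset, size), swap_in_bytes_by_value(size))
# 1 4096 1
# 200 8192 200
```

The second case shows that a small value straddling a page boundary costs two full pages.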
5. Use Redis through get/set
Because Redis is a key-value store, many developers naturally use it through SET/GET. In fact, this is not the best approach. When VM is disabled in particular, all Redis data must fit in memory, so saving memory matters a great deal.
Suppose each key-value unit occupies at least 512 bytes: even if it stores only one byte, it still takes 512 bytes. In that case, a design pattern that reuses keys helps: put several key-value pairs under one key and store their values as a collection, so that the same 512 bytes can hold ten times as much data.
To save memory, we recommend using hashes instead of plain SET/GET. For details, see reference (7).
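A minimal sketch of the key-reuse pattern, assuming hypothetical key names of the form `user:<bucket>` and a bucket size of 100: each object ID is split into a hash key and a field, so 1,000 objects occupy 10 top-level keys instead of 1,000. The in-process dictionary stands in for a Redis server; against a real server the two helpers would correspond to the HSET and HGET commands.

```python
from typing import Dict, Optional, Tuple

BUCKET_SIZE = 100  # assumed number of objects grouped under one hash key

def bucket_for(object_id: int, bucket_size: int = BUCKET_SIZE) -> Tuple[str, str]:
    """Map an object ID to (hash key, field), e.g. 567 -> ('user:5', '67')."""
    bucket, field = divmod(object_id, bucket_size)
    return f"user:{bucket}", str(field)

# Stand-in for a Redis server holding hashes: hash key -> {field: value}.
hashes: Dict[str, Dict[str, str]] = {}

def hset(key: str, field: str, value: str) -> None:
    hashes.setdefault(key, {})[field] = value

def hget(key: str, field: str) -> Optional[str]:
    return hashes.get(key, {}).get(field)

for uid in range(1000):
    hset(*bucket_for(uid), f"profile-{uid}")

print(len(hashes))              # 10 top-level keys for 1,000 objects
print(hget(*bucket_for(567)))   # profile-567
```

The bucket size is a trade-off: larger buckets mean fewer keys, but each hash grows and every access touches a bigger structure.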
6. Use AOF instead of snapshots
Redis offers two persistence methods. The default is snapshotting, which periodically saves a snapshot of memory to disk; its drawback is that if a crash occurs between snapshots, the data written since the last snapshot is lost. Driven by perfectionism, the author therefore added AOF (append-only file): whenever data in memory is written, the command is also appended to a log file. In a system with tens of thousands of concurrent modifications, this command log becomes enormous, its management and maintenance costs are high, and restoring and rebuilding from it takes a very long time, which defeats AOF's purpose of high availability. Moreover, Redis is an in-memory data structure model whose advantages all rest on efficient atomic operations on complex in-memory data structures, so AOF is a poor fit for it.
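The growth problem can be seen in a toy model. The Python sketch below is hypothetical and is not how Redis writes its snapshot or AOF files: a snapshot records only the current state, while the AOF-style log records every write, so 10,000 increments of one counter yield a one-key snapshot but a 10,000-line log that recovery must replay in full.

```python
import json
from typing import Dict, List

class ToyStore:
    """Hypothetical in-memory store with two persistence styles."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.aof: List[str] = []  # one line per write command, like an append-only file

    def incr(self, key: str) -> int:
        self.data[key] = self.data.get(key, 0) + 1
        self.aof.append(f"INCR {key}")  # every write is logged
        return self.data[key]

    def snapshot(self) -> str:
        # Size depends on the current state, not on how many writes produced it.
        return json.dumps(self.data)

    @staticmethod
    def replay(aof: List[str]) -> Dict[str, int]:
        # Recovery re-executes every logged command in order.
        rebuilt: Dict[str, int] = {}
        for line in aof:
            cmd, key = line.split(" ", 1)
            if cmd == "INCR":
                rebuilt[key] = rebuilt.get(key, 0) + 1
        return rebuilt

store = ToyStore()
for _ in range(10_000):
    store.incr("page_views")

print(len(store.aof))    # 10000 log entries to replay
print(store.snapshot())  # {"page_views": 10000}
```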
In fact, AOF mainly aims at data reliability and high availability, and Redis has another way to reach that goal: replication. Thanks to Redis's high performance, replication lag is negligible in practice, so replicas remove the single point of failure (SPOF) and provide high availability.
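A toy model of master-replica replication and failover, under stated assumptions: `Node` is hypothetical, writes are forwarded synchronously for brevity (real Redis replication is asynchronous), and "failover" here simply means pointing clients at the replica.

```python
from typing import Dict, List

class Node:
    """Hypothetical toy node: applies each write locally, then forwards it
    to its replicas (synchronously here, unlike real Redis)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.replicas: List["Node"] = []

    def set(self, key: str, value: str) -> None:
        self.data[key] = value
        for replica in self.replicas:  # propagate the write command
            replica.set(key, value)

master = Node("master")
replica = Node("replica")
master.replicas.append(replica)

master.set("user:1", "alice")
master.set("user:2", "bob")

# The master fails: promote the replica, which already holds the same data.
current = replica
print(current.name, current.data)  # replica {'user:1': 'alice', 'user:2': 'bob'}
```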
Summary
To use a product successfully, we need a deep understanding of its features. Redis performs outstandingly, and if you master it, it can be of great help to many large applications in China. We hope more colleagues will join in using Redis and studying its code.