A few days ago a large system suffered a major failure, and many technical friends were concerned about it. The causes will not fall outside the scope James Hamilton outlined in "On Designing and Deploying Internet-Scale Services" (1). James's first lesson, "design for failure", is key to the success of every Internet architecture. The engineering theory behind Internet systems is actually very simple. James's paper contains almost no theory, only practical experience, and how well each company understands and executes on that experience determines whether its architecture succeeds or fails.
Back to the main topic: we have recently been studying Redis. Last year there was a performance test comparing MemcacheDB, Tokyo Tyrant, and Redis, and those benchmark results still hold. Over the past year we have seen many dazzling key-value storage products, from the fading appeal of Cassandra (Twitter paused its use in its main business) to the rise of HBase (Facebook chose HBase for its new messaging service (2)). Looking back at Redis, we find that this program of only a little over 10,000 lines of source code is full of magic and has many untapped features. Redis's performance is amazing: a sub-product of one of the top ten domestic websites could probably meet its storage and caching needs with a single Redis instance. Beyond the impressive performance, there are some common misunderstandings about Redis in the industry. This article puts forward a few viewpoints for discussion.
1. What is Redis
The answer to this question affects how we use Redis. If you think of Redis as a key-value store, you might use it in place of MySQL; if you think of it as a persistent cache, you might only keep frequently accessed temporary data in it. Redis is short for REmote DIctionary Server, and the subtitle on the official website reads "a persistent key-value database with built-in net interface written in ANSI-C for Posix systems", a definition that leans toward a key-value store. Some consider Redis an in-memory database, because its high performance comes from operating on RAM. Others argue that Redis is a data structure server, because it supports complex data types such as lists and sets. How you interpret the role of Redis determines how you use it.
Internet data is currently stored in two main ways: relational databases and key-value stores. But much of the data in Internet services fits neither model. Take user relationships on a social platform: they form a list. Storing them in a relational database means converting the list into multiple rows, a form with a lot of redundancy because every row repeats some of the same information. Storing them as a single key-value pair makes modification and deletion cumbersome, since you must read out all the data and write it all back. Redis provides a variety of in-memory data types that let the business access these structures at high speed without worrying about persistence, and so addresses, at the architectural level, some of the needs that the previous two kinds of storage serve poorly.
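To make the contrast concrete, here is a minimal sketch in plain Python, with no Redis server involved. `BlobStore` and `SetStore` are hypothetical stand-ins: the first models a plain key-value store where a follow list is one serialized value, the second models a server that understands a set type, in the spirit of the Redis SADD and SREM commands. The key name and user ids are made up for illustration.

```python
import json


class BlobStore:
    """Plain key-value store: each value is an opaque serialized string."""

    def __init__(self):
        self.data = {}
        self.bytes_moved = 0  # bytes crossing the client/server boundary

    def get(self, key):
        value = self.data.get(key, "[]")
        self.bytes_moved += len(value)
        return value

    def set(self, key, value):
        self.bytes_moved += len(value)
        self.data[key] = value


class SetStore:
    """Store with a server-side set type; only the member is transferred."""

    def __init__(self):
        self.data = {}
        self.bytes_moved = 0

    def sadd(self, key, member):
        self.bytes_moved += len(str(member))
        self.data.setdefault(key, set()).add(member)

    def srem(self, key, member):
        self.bytes_moved += len(str(member))
        self.data.get(key, set()).discard(member)


def unfollow_blob(store, key, user_id):
    # Read the whole list, change one element, write the whole list back.
    following = json.loads(store.get(key))
    following.remove(user_id)
    store.set(key, json.dumps(following))


following = list(range(1000, 2000))  # 1,000 hypothetical user ids

blob = BlobStore()
blob.data["following:42"] = json.dumps(following)
unfollow_blob(blob, "following:42", 1500)

sets = SetStore()
sets.data["following:42"] = set(following)
sets.srem("following:42", 1500)

print("blob store bytes moved:", blob.bytes_moved)
print("set store bytes moved:", sets.bytes_moved)
```

Both stores end up in the same state, but the blob version moves the entire list twice to remove one element, while the set version moves only the member being removed.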
2. Redis can't be faster than memcached
Many developers think Redis cannot be faster than memcached: memcached is entirely memory-based, while Redis persists its data, and even if persistence is asynchronous it must cost something. Yet in test results Redis basically holds a clear advantage. After thinking about this for some time, these are the reasons that come to mind.
- Libevent. Unlike memcached, Redis does not use libevent. To be general-purpose, libevent carries a huge code base (Redis's code is currently less than one third the size of libevent's) and sacrifices a good deal of performance on specific platforms. Redis implements its own epoll event loop (4) by modifying two files from libevent. Many developers have suggested that Redis adopt libev, a high-performance alternative to libevent, but the author insists that Redis stay small and free of dependencies. One impressive detail is that Redis does not need ./configure to be run before compiling.
- CAS. CAS is a convenient way in memcached to prevent concurrent modifications of the same resource. The implementation has to keep a hidden CAS token for each cache key. The token works like a version number for the value and must be incremented on every set, which costs both CPU and memory. These costs are small, but on a single machine with 10 GB+ of cache and tens of thousands of QPS they add up to a minor performance difference between the two (5).
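The per-key bookkeeping that CAS requires can be sketched as follows. This is a simplified in-memory model in Python, not memcached's actual implementation; `CasCache` and its method names are hypothetical, loosely modelled on memcached's `gets` and `cas` commands.

```python
class CasCache:
    """Toy cache where every key carries a hidden version token."""

    def __init__(self):
        self._items = {}      # key -> (value, token)
        self._next_token = 1  # global counter, bumped on every write

    def set(self, key, value):
        # Every write pays for a token update, even if CAS is never used.
        self._items[key] = (value, self._next_token)
        self._next_token += 1

    def gets(self, key):
        """Return (value, token); the token identifies this version."""
        return self._items[key]

    def cas(self, key, value, token):
        """Store value only if nobody has written the key since `token`."""
        _, current = self._items[key]
        if current != token:
            return False  # another client won the race
        self.set(key, value)
        return True


cache = CasCache()
cache.set("counter", 10)

value_a, token_a = cache.gets("counter")  # client A reads
value_b, token_b = cache.gets("counter")  # client B reads the same version

first = cache.cas("counter", value_a + 1, token_a)   # A writes first
second = cache.cas("counter", value_b + 5, token_b)  # B's token is now stale

print(first, second, cache.gets("counter")[0])
```

The extra tuple slot and the counter increment in `set` are the memory and CPU overhead the bullet above refers to: they are paid on every key and every write, whether or not any client ever calls `cas`.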
3. Single Redis storage data must be smaller than physical memory
Redis keeps all of its data in memory for speed, but this also brings some problems. For example, a medium-sized website has 1 million registered users; if their data is stored in Redis, memory must be large enough to hold all 1 million users. In practice, though, only 50,000 of those users are active, and even counting everyone who visits at least once a week there are only 150,000. Keeping the data of all 1 million users in memory is therefore unreasonable: RAM is paying for cold data.
This is much like an operating system. All data a program accesses is in memory, but when physical memory cannot hold new data, the operating system intelligently swaps data that has not been accessed for a long time out to disk, making room for new data. What a modern operating system offers applications is not physical memory but the abstraction of virtual memory.
For the same reason, Redis 2.0 adds a VM feature. It lets Redis's data capacity exceed the limit of physical memory, and it separates hot data from cold data.
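A quick back-of-the-envelope calculation shows how much of that memory goes to cold data. The user counts come from the example above; the per-user record size of 1 KB is purely a hypothetical figure chosen for illustration.

```python
total_users = 1_000_000
daily_active = 50_000
weekly_active = 150_000
bytes_per_user = 1024  # hypothetical average record size, not from the article


def fraction(part, whole):
    return part / whole


def mib(n_bytes):
    return n_bytes / (1024 * 1024)


all_in_ram = total_users * bytes_per_user
hot_daily = daily_active * bytes_per_user
hot_weekly = weekly_active * bytes_per_user

print(f"all users:     {mib(all_in_ram):8.1f} MiB")
print(f"daily active:  {mib(hot_daily):8.1f} MiB ({fraction(daily_active, total_users):.0%})")
print(f"weekly active: {mib(hot_weekly):8.1f} MiB ({fraction(weekly_active, total_users):.0%})")
print(f"cold data kept in RAM: {fraction(total_users - weekly_active, total_users):.0%}")
```

Whatever record size is assumed, the proportions stay the same: even with the generous weekly definition of "active", 85% of the memory holds data nobody has touched in a week.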
4. Redis's VM implementation reinvents the wheel
Redis's VM follows the same approach as its epoll implementation: it is built in-house. But as the earlier discussion of operating systems noted, the OS can already separate hot and cold data for a program automatically. Redis would only need to request a large block of memory from the OS, and the OS would keep hot data in physical memory and swap cold data out to disk. Varnish, well known for "understanding modern operating systems" (3), works exactly this way and has been very successful.
The author, antirez, gives several reasons for implementing the VM himself (6). The main one is that OS swapping works on pages. If an OS page is 4 KB, the page will not be swapped out as long as any element in it is accessed, even a single byte; likewise, reading one byte may swap in 4 KB of otherwise useless memory. With its own implementation, Redis can control the granularity of what is swapped in and out. Another reason is that a process blocks when it accesses memory that the OS has swapped out.
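The granularity argument can be illustrated with a small model. This is only a sketch under stated assumptions, not a description of how Redis's VM or any OS pager is actually implemented: objects are a fixed hypothetical size of 64 bytes, they are packed contiguously into 4 KB pages, and the hot set is chosen as the worst case for paging.

```python
PAGE_SIZE = 4096   # bytes per OS page, as in the 4 KB example above
OBJECT_SIZE = 64   # hypothetical size of one small value
OBJECTS_PER_PAGE = PAGE_SIZE // OBJECT_SIZE


def resident_bytes_page_granularity(hot_objects):
    """A page stays in RAM if it holds at least one hot object."""
    hot_pages = {obj_id // OBJECTS_PER_PAGE for obj_id in hot_objects}
    return len(hot_pages) * PAGE_SIZE


def resident_bytes_object_granularity(hot_objects):
    """Only the hot objects themselves stay in RAM."""
    return len(set(hot_objects)) * OBJECT_SIZE


total_objects = 100_000
# Worst case for paging: exactly one hot object in every page.
hot = range(0, total_objects, OBJECTS_PER_PAGE)

by_page = resident_bytes_page_granularity(hot)
by_object = resident_bytes_object_granularity(hot)
print("hot objects:", len(hot))
print("resident with page granularity:  ", by_page, "bytes")
print("resident with object granularity:", by_object, "bytes")
```

In this worst case, page-level swapping keeps every page resident, so nothing is saved, while object-level swapping keeps only the hot objects. Real workloads fall somewhere between the two extremes, depending on how hot and cold objects are interleaved in memory.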
5. Using Redis in get/set mode
Because Redis is a key-value store, many developers naturally use it through set/get, but this is not the most efficient way to use it. When the VM is not enabled in particular, all Redis data must fit in memory, so saving memory is especially important.
Suppose one key-value pair occupies a minimum of 512 bytes; then storing even a single byte takes 512 bytes. A useful design pattern here is to share keys: pack several key-value pairs under one key and store the value as a collection, so that the same 512 bytes hold 10 to 100 times as much data.
So to save memory, it is recommended to use Redis through hashes rather than plain set/get; see reference (7) for details.
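The key-sharing pattern can be sketched as below. `TinyHashStore` is a hypothetical in-memory stand-in whose `hset` and `hget` methods mirror the Redis HSET and HGET commands; the `obj:` key prefix and the bucket size of 100 are assumptions made for illustration. The sketch shows only the key mapping. The actual memory saving depends on how the server encodes small hashes, which is not modelled here.

```python
def bucket_for(object_id, bucket_size=100):
    """Map a numeric id to (hash key, field) so that ids share a bucket."""
    bucket, field = divmod(object_id, bucket_size)
    return f"obj:{bucket}", str(field)


class TinyHashStore:
    """In-memory stand-in for a server with hash-typed values."""

    def __init__(self):
        self.keys = {}

    def hset(self, key, field, value):
        self.keys.setdefault(key, {})[field] = value

    def hget(self, key, field):
        return self.keys.get(key, {}).get(field)


store = TinyHashStore()
for object_id in range(10_000):
    key, field = bucket_for(object_id)
    store.hset(key, field, "x")  # one-byte value, as in the example above

key, field = bucket_for(1234)
print(key, field, store.hget(key, field))
print("top-level keys:", len(store.keys))
```

With plain set/get the 10,000 values would need 10,000 top-level keys, each paying the per-key overhead. With buckets of 100 the same values sit under 100 keys.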
6. Using AOF instead of snapshots
Redis has two persistence methods. The default is the snapshot, which periodically persists a snapshot of memory to disk; its drawback is that a crash after a snapshot loses the data written since then. Pushed by perfectionists, the author added AOF. AOF is the append-only file mode: each command is saved to a log file as the in-memory data is written. In a system with tens of thousands of concurrent changes, the command log becomes very large, it is expensive to manage and maintain, and recovery takes a long time, which defeats AOF's original purpose of high availability. Moreover, Redis is an in-memory data structure model whose advantages all rest on efficient atomic operations on complex in-memory structures, so AOF looks like a rather ill-fitting part.
In fact, the main purpose of AOF is data reliability and high availability, and Redis has another way to achieve it: replication. Because of Redis's high performance, replication has almost no delay. This guards against single points of failure and provides high availability.
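The trade-off between the two methods can be seen in a toy model. `TinyStore` below is a hypothetical pure-Python store, not Redis's persistence code: it records every write in an in-memory list to mimic an append-only log, and serializes its whole state to mimic a snapshot.

```python
import json


class TinyStore:
    """Toy in-memory store that can persist by snapshot or by command log."""

    def __init__(self):
        self.data = {}
        self.log = []  # append-only list of commands

    def set(self, key, value):
        self.data[key] = value
        self.log.append(("set", key, value))  # AOF-style: log every write

    def snapshot(self):
        """Serialize the current state; size depends on the data only."""
        return json.dumps(self.data)

    @classmethod
    def from_snapshot(cls, blob):
        store = cls()
        store.data = json.loads(blob)
        return store

    @classmethod
    def from_log(cls, log):
        """Rebuild state by replaying every logged command in order."""
        store = cls()
        for command, key, value in log:
            if command == "set":
                store.data[key] = value
        return store


store = TinyStore()
for i in range(10_000):
    store.set(f"counter:{i % 10}", i)  # 10 keys, rewritten 1,000 times each

snap = store.snapshot()
restored_from_snapshot = TinyStore.from_snapshot(snap)
restored_from_log = TinyStore.from_log(store.log)

print("keys in memory:", len(store.data))
print("commands to replay:", len(store.log))
print("same state:", restored_from_snapshot.data == restored_from_log.data == store.data)
```

The snapshot here is taken right after the last write, so nothing is lost; with a periodic snapshot, any writes after the most recent one would be. The log loses nothing but grows with the number of writes rather than the size of the data, which is why replay time becomes a concern under heavy write load.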
Summary
To use a product successfully, we need to understand its features in depth. Redis's performance is outstanding, and if it is handled skillfully it can be of great help to many large domestic applications. I hope more peers will join in using Redis and studying its code.
Reference documents
- On Designing and Deploying Internet-Scale Services (PDF)
- Facebook's New Real-Time Messaging System: HBase to Store 135+ Billion Messages a Month
- What's wrong with 1975 programming
- Linux epoll is now supported (Google Groups)
- CAS and why I don't want to add it to Redis (Google Groups)
- Plans for Virtual Memory (Google Groups)
- Full of keys (Salvatore antirez Sanfilippo)
Some misunderstandings about Redis