Some misunderstandings about Redis


Reposted from Tim Yang: http://timyang.net/data/redis-misunderstanding/

A few days ago Weibo suffered a large-scale system failure, which drew the attention of many engineers. Its causes fall within the range outlined by James Hamilton in "On Designing and Deploying Internet-Scale Services" (1). The first lesson James lists, "design for failure", is key to the success of every Internet architecture. The engineering theory behind Internet systems is quite simple: James's paper contains almost no theory, only a collection of practical experience. How well each company understands and executes on these lessons determines whether its architecture succeeds.

Recently I have been studying Redis. Last year I ran a performance test of memcachedb, Tokyo Tyrant and Redis, and those benchmark results still hold. Over the past year we have seen a dazzling array of key-value storage products, from Cassandra (whose use in Twitter's main business was put on hold) to the rise of HBase (Facebook chose HBase for its new messaging system (2)). Looking at Redis again, I found that this program, with only about 10,000 lines of source code, is full of magic and of powerful features that remain largely untapped. Redis performance is remarkable: I estimate that the products of China's top ten websites could each meet their storage and caching needs with a single Redis server. Beyond its performance, however, the industry holds some common misunderstandings about Redis. This article puts forward a few viewpoints for discussion.

1. What is Redis

The answer to this question affects how we use Redis. If you think of Redis as a key-value store, you may use it in place of MySQL; if you think of it as a persistent cache, you may use it only to hold frequently accessed temporary data. Redis stands for REmote DIctionary Server. The subtitle on the official Redis website calls it "a persistent key-value database with built-in net interface written in ANSI-C for Posix systems", a definition that leans toward a key-value store. Some consider Redis an in-memory database, because its high performance comes from operating in memory. Others argue that Redis is a data structure server, because it supports complex data types such as lists and sets. How you interpret the role of Redis determines how you use it.
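The difference between treating a list as an opaque value and treating it as a server-side data structure can be shown with a toy, in-process Python model. No real Redis client is involved; the dictionaries and the helpers `kv_remove` and `ds_remove` are hypothetical. Each model removes one follower from a user's follower list: the relational rows repeat the owner on every row, the plain key-value version must deserialize and rewrite the whole value, and the data-structure version edits the list in place, roughly what a single Redis `LREM` command does on the server.

```python
import json

# Hypothetical toy models of three ways to store one user's follower list.

# 1) Relational style: one row per relationship; the owner repeats on every row.
rows = [("alice", "bob"), ("alice", "carol"), ("alice", "dave")]

# 2) Plain key-value style: the whole list is one serialized value.
kv = {"followers:alice": json.dumps(["bob", "carol", "dave"])}

# 3) Data-structure style: the server holds a real list.
ds = {"followers:alice": ["bob", "carol", "dave"]}


def kv_remove(store, key, member):
    """Key-value: read the whole value, modify it client-side, write it all back."""
    items = json.loads(store[key])
    items.remove(member)
    store[key] = json.dumps(items)


def ds_remove(store, key, member):
    """Data structure: the list is modified in place, as a server-side command would."""
    store[key].remove(member)


kv_remove(kv, "followers:alice", "carol")
ds_remove(ds, "followers:alice", "carol")
rows = [r for r in rows if r != ("alice", "carol")]
print(json.loads(kv["followers:alice"]), ds["followers:alice"], len(rows))
```

Over a network, the key-value approach transfers the entire list twice for every change, while a data structure server receives one short command such as `LREM followers:alice 0 carol`.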

Internet data is currently stored in two basic ways: relational databases or key-value stores. But much Internet business data fits neither model. Take user relationships on a social platform: each user's relationships form a list. Storing it in a relational database means converting it into multiple rows, which creates a lot of redundancy, because every row repeats some of the same information. Storing it in a key-value store makes modification and deletion cumbersome, because you have to read out the whole value, change it, and write it back. Redis provides a variety of in-memory data types that the application can access at high speed without worrying about persistent storage, which removes the detours the previous two approaches required.

2. Redis cannot be faster than memcached

Many developers believe Redis cannot be faster than memcached: memcached is purely memory-based, while Redis also persists data, so even with asynchronous persistence Redis should not be able to beat memcached. Yet test results generally show Redis with a clear advantage. I have been thinking about why, and so far I can see the following reasons.

Libevent. Unlike memcached, Redis does not use libevent. To be general-purpose, libevent has a huge code base (the current Redis code base is less than one third the size of libevent's) and sacrifices a lot of performance on any particular platform. Redis took two files from libevent and modified them to implement its own epoll event loop (4). Many developers in the community suggested that Redis use libev, a higher-performance alternative to libevent, but the author insisted that Redis should stay small and free of dependencies. One impressive detail: building Redis does not require running ./configure first.

CAS issue. CAS (check-and-set) is a convenient way in memcached to keep concurrent modifications of a resource from conflicting. Implementing CAS requires a hidden CAS token for every cache key, effectively a version number for the value, and the token must be incremented on every set, which costs both CPU and memory. These costs are small, but on a single machine with 10 GB+ of cache and tens of thousands of QPS they add up to a slight performance difference between the two (5).

3. The data stored in a single Redis instance must be smaller than physical memory

All Redis data lives in memory, which brings some inefficiency. Suppose a medium-sized site has 1 million registered users. If their data is stored in Redis, memory must be large enough to hold all 1 million users. In practice, however, only 50,000 of those users are active, and only 150,000 visit even once a week, so keeping all 1 million users' data in memory is unreasonable: you are paying for RAM to hold cold data.
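A quick back-of-the-envelope calculation shows how much of the RAM in this example goes to cold data. The article gives no per-user data size, so the 2 KB figure below is purely an assumption; only the ratio matters.

```python
# Back-of-the-envelope RAM sizing for the example above.
# ASSUMPTION: about 2 KB of data per user; the article gives no per-user size.
BYTES_PER_USER = 2 * 1024

registered = 1_000_000
weekly_visitors = 150_000


def gib(n_users):
    """RAM in GiB needed to keep n_users' data resident."""
    return n_users * BYTES_PER_USER / 2**30


all_in_ram = gib(registered)
hot_only = gib(weekly_visitors)
cold_fraction = 1 - weekly_visitors / registered   # users not seen within a week

print(f"all registered users in RAM: {all_in_ram:.2f} GiB")
print(f"weekly visitors only:        {hot_only:.2f} GiB")
print(f"share of RAM holding cold data: {cold_fraction:.0%}")
```

Whatever the real per-user size, with uniform records the cold share stays at 85% of the memory bill in this scenario.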

This is very similar to an operating system. All data an application accesses must be in memory, but when physical memory cannot hold new data, the operating system intelligently swaps out data that has not been accessed for a long time to disk, making room for new applications. Modern operating systems give applications not physical memory but virtual memory.
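The same idea, evicting whatever has not been used recently, can be sketched at the level of individual values instead of pages. The Python class below is a toy model, not Redis's actual VM code: `HotColdStore`, the limit expressed as a value count, and the dictionary standing in for a swap file are all assumptions. It keeps the most recently used values in memory, moves the least recently used ones to "disk", and pulls them back on access.

```python
from collections import OrderedDict


class HotColdStore:
    """Toy hot/cold separation with LRU eviction of values to a simulated disk.

    ASSUMPTION: memory is limited by number of values, not bytes, for simplicity.
    """

    def __init__(self, max_hot_values):
        self.max_hot = max_hot_values
        self.hot = OrderedDict()   # key -> value, least recently used first
        self.disk = {}             # stands in for a swap file: key -> value

    def set(self, key, value):
        self.disk.pop(key, None)
        self.hot[key] = value
        self.hot.move_to_end(key)  # mark as most recently used
        self._evict()

    def get(self, key):
        if key in self.disk:                   # cold hit: swap the value back in
            self.hot[key] = self.disk.pop(key)
        value = self.hot[key]
        self.hot.move_to_end(key)              # mark as most recently used
        self._evict()
        return value

    def _evict(self):
        while len(self.hot) > self.max_hot:
            key, value = self.hot.popitem(last=False)   # least recently used
            self.disk[key] = value


store = HotColdStore(max_hot_values=2)
for uid in ("u1", "u2", "u3"):
    store.set(uid, {"name": uid})   # u1 becomes least recently used and is swapped out
profile = store.get("u1")           # cold hit: u1 comes back, u2 is swapped out instead
```

Because eviction works per value rather than per 4 KB page, one busy key cannot keep its unrelated neighbours in memory.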

For the same reason, Redis 2.0 added a VM feature, which lets Redis store more data than fits in physical memory and separates hot data from cold data.

4. Redis's VM implementation reinvents the wheel

Following the same philosophy as its epoll event loop, Redis implements its VM itself. But as mentioned above, the OS can already separate hot and cold data for a program automatically: Redis would only need to request a large block of memory from the OS, and the OS would keep hot data in physical memory and swap cold data to disk. Varnish, another well-known program that "understands modern operating systems" (3), works exactly this way and has been very successful.
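For comparison, the approach the paragraph attributes to Varnish, asking the OS for one large region and letting it handle paging, looks roughly like this in Python with the standard `mmap` module. This is a minimal sketch under stated assumptions (arbitrary sizes, a temporary file as backing store), not how Varnish or Redis is actually written.

```python
import mmap
import tempfile

# Varnish-style approach (sketch): allocate one large file-backed mapping and let
# the kernel's virtual memory decide which pages stay in RAM and which stay on disk.
# ASSUMPTION: the 64 MiB size and the offset below are arbitrary illustration values.
ARENA_SIZE = 64 * 1024 * 1024

with tempfile.TemporaryFile() as f:
    f.truncate(ARENA_SIZE)               # extend the file; on most Linux filesystems it stays sparse
    arena = mmap.mmap(f.fileno(), ARENA_SIZE)
    offset = 10 * 1024 * 1024
    arena[offset:offset + 5] = b"hello"  # only the touched page(s) need to be brought into RAM
    stored = arena[offset:offset + 5]    # reading back goes through the same mapping
    arena.close()

print(stored)
```

The program itself never decides what to swap; the trade-off is that the kernel makes those decisions one page at a time.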

The author, antirez, explained several reasons why he chose to implement the VM himself (6). The main one is that OS VM swapping works in units of pages. An OS page is typically 4 KB; as long as even one element in a page is being accessed, even a single byte, the page will not be swapped out. Likewise, reading one byte may swap in 4 KB of otherwise useless memory. Redis's own implementation can control the granularity of swapping. In addition, the process blocks when it accesses the operating system's swap area, which is another reason Redis implements its own VM.

5. Use Redis in get/set mode

Because Redis is a key-value store, many developers naturally use it through SET/GET, but this is actually not the most efficient usage. Especially when VM is not enabled, all Redis data must fit in memory, so saving memory is especially important.

Suppose every key-value unit occupies at least 512 bytes: then even a one-byte value takes 512 bytes. A design pattern helps here, key reuse: merge several key-value pairs into one key and store their values together as a set, so that the same 512 bytes can hold 10 to 100 times as much data.
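A minimal sketch of key reuse, in the spirit of reference (7), using hashes as the merged structure: each numeric ID is split so that about 100 logical keys share one hash. The helper `bucket`, the `user:` prefix and the bucket size of 100 are assumptions, and a Python dictionary stands in for the Redis server, with `hset`/`hget` playing the roles of the Redis HSET/HGET commands.

```python
from collections import defaultdict


def bucket(prefix, object_id, bucket_size=100):
    """Map one logical key to (hash name, field); about bucket_size IDs share a hash.

    Hypothetical helper: bucket("user", 1234567) -> ("user:12345", "67").
    """
    return f"{prefix}:{object_id // bucket_size}", str(object_id % bucket_size)


# In-memory stand-in for the Redis server: hash name -> {field: value}.
store = defaultdict(dict)


def hset(name, field, value):
    store[name][field] = value               # plays the role of Redis HSET


def hget(name, field):
    return store.get(name, {}).get(field)    # plays the role of Redis HGET


# 10,000 logical keys end up in only 100 hashes.
for uid in range(10_000):
    hset(*bucket("user", uid), f"profile-{uid}")

print(len(store), hget(*bucket("user", 4321)))
```

Against a real server the same pairs would go to HSET and HGET (for instance `r.hset(name, field, value)` in the redis-py client). The saving depends on Redis storing small hashes in a compact encoding, which is controlled by size thresholds in the server configuration, so check those settings for your Redis version.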

To save memory, it is therefore recommended to use Redis hashes rather than plain SET/GET; see reference (7) for details.

6. Use AOF instead of snapshots

Redis offers two persistence modes. The default is snapshotting, which periodically writes a snapshot of memory to disk; its drawback is that a crash loses the data written since the last snapshot. Being a perfectionist, the author added AOF (append-only file) mode, which logs each write command to a file while applying it to memory. In a system with tens of thousands of concurrent changes, however, the command log becomes very large, management and maintenance costs are high, and recovery and rebuilding take a long time, which defeats AOF's goal of high availability. What's more, Redis is an in-memory data structure model whose strengths all rest on efficient atomic operations over complex in-memory data structures, and from this angle AOF is a poorly matched component.
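The growth problem is easy to see in a toy model. In the Python sketch below, the command format and helper names are illustrative only, not Redis's real AOF format. It increments one counter 50,000 times: the snapshot stays a single small record, while the log holds 50,000 commands, all of which must be replayed to rebuild the data after a restart.

```python
import json

state = {}   # the in-memory dataset
aof = []     # append-only log: one serialized command per write (toy format)


def incr(key):
    """Apply a write to memory and log the command itself, AOF-style."""
    state[key] = state.get(key, 0) + 1
    aof.append(json.dumps(["INCR", key]))


def snapshot():
    """Point-in-time copy of the data: its size depends on the data, not on history."""
    return json.dumps(state)


def replay(log):
    """Rebuild the dataset from the log: every logged command must be re-executed."""
    rebuilt = {}
    for line in log:
        op, key = json.loads(line)
        if op == "INCR":
            rebuilt[key] = rebuilt.get(key, 0) + 1
    return rebuilt


for _ in range(50_000):
    incr("page:views")

print(len(aof), "log entries vs snapshot", snapshot())
```

Both representations recover the same state, but the log's size and replay time grow with the number of writes rather than with the amount of data.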

In fact, the main goals of AOF are data reliability and high availability, and Redis has another way to achieve them: replication. Because Redis is so fast, replication lag is practically negligible. Replication guards against a single point of failure and provides high availability.

Summary

To use a product successfully, we need an in-depth understanding of its characteristics. Redis performance is outstanding; if we can master it, it can be of great help to many large-scale applications in China. I hope more of my peers will join in using Redis and studying its code.

References
(1) On Designing and Deploying Internet-Scale Services (PDF)
(2) Facebook's New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month
(3) What's wrong with 1975 programming
(4) Linux epoll is now supported (Google Groups)
(5) CAS and why I don't want to add it to Redis (Google Groups)
(6) Plans for Virtual Memory (Google Groups)
(7) Full of keys (Salvatore "antirez" Sanfilippo)

-eof-
