Redis is a high-performance, in-memory key-value database.
Redis is essentially an in-memory key-value database, much like memcached: the entire dataset is loaded into memory for operation, and the data is periodically flushed to the hard disk by asynchronous operations. Because it operates purely in memory, Redis performs very well and can handle more than 100,000 read and write operations per second, which makes it one of the fastest known key-value databases.
Performance is not the only attraction of Redis. Its greatest appeal is support for storing a variety of data structures, and the maximum size of a single value is 512 MB, whereas memcached can store only 1 MB per value by default. Redis can therefore implement many useful features: its list type, a doubly linked list, can serve as a FIFO queue for a lightweight, high-performance message queue service, and its set type can back a high-performance tag system. In addition, Redis can set an expire time on a key, so it can also be used as an enhanced version of memcached.
The main drawback of Redis is that database capacity is limited by physical memory, so it cannot provide high-performance reads and writes over massive datasets. Redis is therefore best suited to high-performance operations on relatively small data volumes.
Redis distinguishes between the different types a single value can have. The supported types include:
Sets, with operations such as intersection and union
Command reference: http://code.google.com/p/redis/wiki/CommandReference
Redis reads the data into memory for the fastest read-write speed and writes the data to disk asynchronously, so it is both fast and persistent. If the data were not held in memory, disk I/O speed would severely limit the performance of Redis. As memory keeps getting cheaper, Redis is likely to become more and more popular.
If you set a maximum amount of memory, new values can no longer be inserted once the data reaches that limit.
A note on how in-memory data is synchronized to disk:
Redis forks a child process when it dumps data. With the default configuration, a dump to the hard drive is triggered every 60 seconds if at least 10,000 records have changed. In practice, because our change rate exceeded that threshold, our Redis was dumping data to the hard disk almost constantly. When it dumps, presumably to make the operation atomic and avoid data loss, Redis first writes the data to a temporary file and then renames it to the data file name set in the configuration file. In our case, loading the data took 1 to 2 minutes, a dump took about 1 minute, and the dumped file was between 1 and 2 GB. The server was therefore writing a file of roughly 2 GB every minute, and the disk was almost never idle.
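The temporary-file-then-rename step described above can be sketched in Python. This is only an illustration of the idea, not how Redis itself is implemented: the fork step is left out, JSON stands in for the real dump format, and the names dump_snapshot and should_save are hypothetical.

```python
import json
import os
import tempfile


def dump_snapshot(data: dict, target_path: str) -> None:
    """Write `data` to `target_path` so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(target_path))
    # The temporary file lives in the same directory (same filesystem),
    # so the final rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="temp-", suffix=".rdb")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(data, tmp)  # JSON is a stand-in for the real dump format
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, target_path)  # atomically replaces the old snapshot
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def should_save(seconds_since_last_save: float, changes: int,
                min_seconds: int = 60, min_changes: int = 10000) -> bool:
    """Mirror the rule quoted above: dump after 60 s if 10,000 records changed."""
    return seconds_since_last_save >= min_seconds and changes >= min_changes
```

If the process dies while writing, only the temporary file is affected and the previous snapshot stays intact, which is the atomic effect the text guesses at.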
Redis uses a queue to turn concurrent access into serial access, which avoids the locking overhead that traditional databases pay for concurrency control.
When your keys are small and your values are large, using the VM works better, because it saves more memory.
When your keys are not small, consider techniques that turn a large key into a large value; for example, you might combine the key and value into a new value.
The vm-max-threads parameter sets the number of threads that access the swap file. It is best not to exceed the number of cores on the machine. If it is set to 0, all operations on the swap file are serial, which can cause long delays but gives good assurance of data integrity.
In tests, virtual memory also performed well. If you have a very large amount of data, consider a distributed setup or another database.
Redis supports master-slave mode. The principle: the master synchronizes data to the slave, and the slave never synchronizes data back to the master. A slave connects to the master at startup to synchronize data.
This is a typical distributed read-write separation model. We can use the master to insert data and the slaves to serve queries, which effectively reduces the number of concurrent accesses on any single machine.
By increasing the number of slave databases, read performance can grow roughly linearly. To avoid a single point of failure in the master database, a cluster typically uses two master databases as a hot standby pair, so the read and write availability of the whole cluster is very high.
The flaw in the read-write separation architecture is that every node, master or slave, must hold the full dataset, so with large data volumes the scale of the cluster is limited by the storage capacity of a single node. It is also a poor fit for write-intensive applications.
To address the defects of the read-write separation model, the data sharding model can be applied.
Each node can be viewed as an independent master, and the data is then sharded according to business logic.
Combining the two models above, each master can be designed as a group consisting of one master and multiple slaves.
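A minimal sketch of the combined model follows, assuming a simple CRC32-modulo sharding rule (the text does not say how the business shards its data). Plain dictionaries stand in for Redis nodes, replication is copied synchronously for brevity, and the class names Shard and ShardedCluster are hypothetical.

```python
import itertools
import zlib


class Shard:
    """One master plus its read-only slaves; dicts stand in for Redis nodes."""

    def __init__(self, slave_count: int) -> None:
        if slave_count < 1:
            raise ValueError("this sketch needs at least one slave per shard")
        self.master: dict = {}
        self.slaves: list = [{} for _ in range(slave_count)]
        self._next_slave = itertools.cycle(range(slave_count))

    def write(self, key: str, value: str) -> None:
        self.master[key] = value
        # Real replication is asynchronous; here the value is copied at once.
        for slave in self.slaves:
            slave[key] = value

    def read(self, key: str):
        # Spread reads over the slaves in round-robin order.
        return self.slaves[next(self._next_slave)].get(key)


class ShardedCluster:
    """Routes each key to one shard: writes hit its master, reads its slaves."""

    def __init__(self, shard_count: int, slaves_per_shard: int) -> None:
        self.shards = [Shard(slaves_per_shard) for _ in range(shard_count)]

    def shard_for(self, key: str) -> Shard:
        # Assumed sharding rule: CRC32 of the key modulo the shard count.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def set(self, key: str, value: str) -> None:
        self.shard_for(key).write(key, value)

    def get(self, key: str):
        return self.shard_for(key).read(key)
```

Each shard holds only part of the data, which removes the single-node capacity limit, while the slaves inside a shard keep the read scaling of the read-write separation model.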
The official figures: about 110,000 SET operations per second and about 81,000 GET operations per second.
In the experiment, 20 clients were simulated writing to Redis. Once the data in the database reaches the gigabyte level, write speed drops significantly.
Possible causes: 1. Redis needs to synchronize data to disk, which consumes a large amount of CPU and memory; 2. as the number of keys grows, the keys have to be rearranged; 3. a large number of requests are waiting in the message queue, causing requests to block.
Here is a small example of a Redis-based message queue.
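The original example is not included in this text. The following is a stand-in sketch of the usual pattern, in which producers push onto one end of a list and consumers pop from the other. The ListQueue class is a hypothetical in-memory substitute that only mimics the semantics of the Redis LPUSH and RPOP commands; it does not talk to a Redis server.

```python
from collections import deque
from typing import Optional


class ListQueue:
    """In-memory stand-in for one Redis list used as a FIFO queue."""

    def __init__(self) -> None:
        self._items: deque = deque()

    def lpush(self, message: str) -> int:
        """Push onto the head of the list and return the new length."""
        self._items.appendleft(message)
        return len(self._items)

    def rpop(self) -> Optional[str]:
        """Pop from the tail of the list, or return None when it is empty."""
        return self._items.pop() if self._items else None


# Producer side: enqueue three jobs.
queue = ListQueue()
for job in ("job-1", "job-2", "job-3"):
    queue.lpush(job)

# Consumer side: drain the queue; jobs come out in the order they went in.
consumed = []
while (message := queue.rpop()) is not None:
    consumed.append(message)
```

Pushing on the left and popping on the right gives first-in, first-out order, which is what makes a list usable as a lightweight message queue.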
II. Architecture design of a distributed cache
1. Architecture Design
Because a Redis instance is a single node, the project has to implement distribution itself. The basic architecture diagram looks like this:
1. What is Redis
The answer to this question affects how we use Redis. If you think of Redis as a key-value store, you might use it in place of MySQL; if you think of it as a persistent cache, you may only keep frequently accessed temporary data in it. Redis is short for REmote DIctionary Server. The subtitle of Redis on the official website is "A persistent key-value database with built-in net interface written in ANSI-C for Posix systems", a definition that leans towards a key-value store. Some think of Redis as an in-memory database because its high performance comes from operating in memory. Others think of it as a data structure server because it supports complex data types such as List and Set. How you interpret the role of Redis determines how you use it.
Internet data is currently stored in essentially two ways, in relational databases or in key-value stores, but many Internet services fit neither model well. For example, the relationships between users on a social platform form a list. To store them in a relational database, you have to convert them into multiple rows, which carries a lot of redundancy because every row repeats some of the same information. If you store them as a key-value pair, modification and deletion are awkward, because all of the data has to be read and written back. Redis provides a range of in-memory data types that let applications access these data structures atomically and at high speed without worrying about persistent storage, which removes the detours that the other two kinds of storage require.
2. Redis cannot be faster than Memcache
Many developers think that Redis cannot be faster than Memcached: Memcached is entirely memory-based, while Redis has persistence, and even if persistence is asynchronous it should still cost something. Yet test results generally show Redis with a clear advantage. I have been thinking about why, and so far I have come up with the following reasons.
Libevent. Unlike Memcached, Redis did not choose libevent. To be general-purpose, libevent has a very large code base (the entire Redis code base is currently less than one third the size of libevent) and sacrifices a lot of performance on specific platforms. Redis implemented its own epoll event loop (4) by modifying two files from libevent. Many developers in the industry have also suggested that Redis use libev, a high-performance alternative to libevent, but the author insists that Redis should stay small and free of dependencies. One impressive detail is that you do not need to run ./configure before compiling Redis.
CAS problem. CAS is a convenient way in Memcached to prevent concurrent modifications from competing for a resource. The CAS implementation has to keep a hidden cas token for each cached key; the token is effectively a version number for the value and must be incremented on every set, so it costs both CPU and memory. These overheads are small, but on a single machine with 10 GB+ of cache and tens of thousands of QPS they produce a slight performance difference between the two (5).
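The token mechanism described above can be sketched as follows. This is an in-memory illustration under simplifying assumptions, not the Memcached implementation: the CasCache class is hypothetical, and its gets and cas methods only mimic the behaviour of the Memcached commands of the same names.

```python
class CasCache:
    """In-memory stand-in: every key carries a hidden version token."""

    def __init__(self) -> None:
        self._store = {}  # key -> (value, token)

    def gets(self, key):
        """Return (value, token); a missing key reads as (None, 0)."""
        return self._store.get(key, (None, 0))

    def cas(self, key, new_value, token) -> bool:
        """Store new_value only if the token is still the one the caller read."""
        _, current_token = self._store.get(key, (None, 0))
        if current_token != token:
            return False  # someone else wrote in between; the caller must retry
        # Each successful set increments the token: the extra CPU and memory
        # cost per key that the text refers to.
        self._store[key] = (new_value, current_token + 1)
        return True


cache = CasCache()
cache.cas("counter", 1, 0)                    # first write, token becomes 1
_, token_a = cache.gets("counter")            # client A reads token 1
_, token_b = cache.gets("counter")            # client B reads token 1
a_ok = cache.cas("counter", 2, token_a)       # A wins, token becomes 2
b_ok = cache.cas("counter", 5, token_b)       # B holds a stale token and loses
```

Redis does not need a per-key token for this because commands are executed one at a time by a single thread.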
3. The data stored in a single Redis must be smaller than the physical memory
Keeping all Redis data in memory brings high performance, but it also has some unreasonable consequences. Suppose a medium-sized website has 1 million registered users. If their profiles are stored in Redis, memory must be large enough to hold all 1 million users. In practice, though, only 50,000 of those users are active and only 150,000 visit even once a week. Keeping the data of all 1 million users in memory is therefore unreasonable: memory ends up paying the bill for data that nobody accesses.
This is very similar to an operating system. All data accessed by applications is in memory, but when physical memory cannot hold new data, the operating system intelligently swaps data that has not been accessed for a long time out to disk, leaving room for new data. What a modern operating system offers applications is not physical memory but the abstraction of virtual memory.
Based on the same considerations, Redis 2.0 added the VM feature, which lets Redis data capacity exceed the limits of physical memory and separates hot data from cold data.
4. Redis' VM implementation is reinventing the wheel
Redis implements its VM itself, following the same do-it-yourself approach as its epoll event loop. However, as the earlier description of operating systems noted, the OS can also separate hot and cold data on behalf of a program: Redis would only need to request a large block of memory, and the OS would automatically keep hot data in physical memory and swap cold data out to disk. Varnish, another well-known project, works this way, as described in "Understanding Modern Operating System" (3), and has done so very successfully.
The author, antirez, gives several reasons for implementing the VM himself (6). OS swapping is based on pages. For example, if an OS page is 4K, then as long as one element in that 4K is accessed, even a single byte, the page will not be swapped out. The same applies to swapping in: reading one byte may bring in 4K of otherwise useless memory. Redis' own implementation can control the granularity of swapping. In addition, a process blocks while it accesses memory that the operating system has swapped out, which is another reason Redis implements the VM itself.
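The page-granularity argument can be made concrete with a little arithmetic. The numbers below (1,000 hot objects of 64 bytes each, every one on a different page) are made up for illustration; only the 4K page size comes from the text.

```python
PAGE_SIZE = 4096  # bytes, the 4K page size used in the text


def resident_bytes_page_granularity(hot_offsets, page_size=PAGE_SIZE):
    """Memory that must stay resident when the OS swaps whole pages."""
    hot_pages = {offset // page_size for offset in hot_offsets}
    return len(hot_pages) * page_size


def resident_bytes_object_granularity(hot_offsets, object_size):
    """Memory that must stay resident when only the hot objects are kept."""
    return len(hot_offsets) * object_size


# Hypothetical workload: 1,000 hot 64-byte objects, each on a different page.
hot_offsets = [i * PAGE_SIZE for i in range(1000)]
by_page = resident_bytes_page_granularity(hot_offsets)            # 4,096,000 bytes
by_object = resident_bytes_object_granularity(hot_offsets, 64)    # 64,000 bytes
```

In this worst case the page-based approach keeps 64 times more memory resident than an object-based one, which is the kind of waste the author's argument is about.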
5. Use Redis with get / set
Because Redis is a key-value store, many developers naturally use it through set / get. In fact, this is not the optimal approach. Saving memory is especially important when the VM is not enabled, because all Redis data then has to fit in memory.
Suppose a key-value unit needs a minimum of 512 bytes, so that storing even a single byte takes up 512 bytes. A useful design pattern in that case is to reuse keys: put several key-value pairs under one key and store the value as a collection, so that the same 512 bytes hold 10 to 100 times as much data.
So, to save memory, it is recommended to use hashes instead of plain set / get. For details, see Reference (7).
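One way to apply the key-reuse pattern is to split an object id into a bucket key and a field, so that many values share one top-level key. The sketch below assumes integer ids and a bucket size of 100; bucket_key and HashStore are hypothetical names, and HashStore is only a dict-of-dicts stand-in with HSET / HGET-like semantics, not a Redis client.

```python
def bucket_key(object_id: int, bucket_size: int = 100):
    """Split an id into (hash key, field) so bucket_size values share one key."""
    return f"object:{object_id // bucket_size}", str(object_id % bucket_size)


class HashStore:
    """Dict-of-dicts stand-in for Redis hashes."""

    def __init__(self) -> None:
        self._hashes = {}

    def hset(self, key: str, field: str, value: str) -> None:
        self._hashes.setdefault(key, {})[field] = value

    def hget(self, key: str, field: str):
        return self._hashes.get(key, {}).get(field)

    def key_count(self) -> int:
        return len(self._hashes)


store = HashStore()
for object_id in range(1000):
    key, field = bucket_key(object_id)
    store.hset(key, field, f"value-{object_id}")

# 1,000 values now live under 10 top-level keys instead of 1,000. With the
# text's hypothetical 512 bytes of per-key cost, that is 10 * 512 bytes of
# key overhead rather than 1000 * 512.
```

The actual saving depends on how Redis encodes small hashes, so measure memory usage on real data before committing to a bucket size.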
6. Use aof instead of snapshot
Redis has two persistence methods. The default is the snapshot method, which periodically persists a snapshot of memory to the hard disk. Its disadvantage is that if a crash occurs after a snapshot, the data written since that snapshot is lost. So, pushed by perfectionists, the author added the aof method. aof is append-only mode, which writes each operation command to a log file as it updates the data in memory. In a system handling tens of thousands of concurrent changes, the command log becomes very large, management and maintenance costs are high, and recovering from the log takes a very long time, which defeats the high-availability purpose of aof. More importantly, Redis is an in-memory data structure model whose advantages all rest on efficient atomic operations on complex in-memory data structures, and aof sits uneasily with that design.
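The append-only idea can be sketched as a store that logs every command before applying it and rebuilds its state by replaying the log on startup. This is a toy illustration, not the Redis AOF format: the AppendOnlyStore class and its tab-separated log lines are assumptions, and keys and values must not contain tabs or newlines.

```python
import os
import tempfile


class AppendOnlyStore:
    """Toy key-value store that logs each command and replays the log on start."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.data = {}
        if os.path.exists(log_path):
            self._replay()

    def _apply(self, command: str, key: str, value=None) -> None:
        if command == "SET":
            self.data[key] = value
        elif command == "DEL":
            self.data.pop(key, None)

    def _replay(self) -> None:
        # Recovery cost grows with the length of the log, not the size of the
        # data: this is the long rebuild time the text complains about.
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                self._apply(*line.rstrip("\n").split("\t"))

    def _log(self, *parts: str) -> None:
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write("\t".join(parts) + "\n")

    def set(self, key: str, value: str) -> None:
        self._log("SET", key, value)
        self._apply("SET", key, value)

    def delete(self, key: str) -> None:
        self._log("DEL", key)
        self._apply("DEL", key)


log_path = os.path.join(tempfile.mkdtemp(), "appendonly.log")
store = AppendOnlyStore(log_path)
store.set("a", "1")
store.set("b", "2")
store.delete("a")
store.set("b", "3")
recovered = AppendOnlyStore(log_path)  # simulates a restart after a crash
```

Here four logged commands leave a single live key, which shows how the log can grow much faster than the data it describes.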
In fact, the main purpose of aof is data reliability and high availability, and Redis offers another way to reach that goal: replication. Because Redis is so fast, replication has almost no delay, so it protects against single points of failure and achieves high availability.
To use a product successfully, we need to understand its characteristics. Redis has outstanding performance, and mastering it will be of great help to many large domestic applications.
Memcache official definition
Free & open source, high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.
Redis official definition
Redis is an open source, BSD licensed, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.
Both are released under the BSD license, so projects that use them can be used commercially, and you can modify the source code without having to publish your modifications.
Redis has rich data types and supports sets, lists, and other types
Memcache supports only simple data types and requires clients to handle complex objects themselves
Redis supports persisting data to disk
Memcache does not support persistent data storage
Redis supports master-slave replication mode
Memcache can be distributed using consistent hashing
Value size limits differ
Memcache is a memory cache: keys must be shorter than 250 characters and a single item must be smaller than 1 MB; it is not suitable for use on a virtual machine
Data consistency differs
Redis uses a single-threaded model, which ensures that data is committed sequentially.
Memcache needs CAS to ensure data consistency. CAS (Check and Set) is a mechanism for ensuring consistency under concurrency and falls into the "optimistic lock" category. The principle is simple: read the version number, perform the operation, then compare the version number; if it is unchanged the operation is applied, and if it differs the operation is discarded.
The Redis single-threaded model uses only one CPU core, but you can run multiple Redis processes to use more
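The consistent hashing mentioned in the comparison is usually implemented on the client side. Below is a minimal sketch of a hash ring with virtual nodes. The ConsistentHashRing class, the use of SHA-256, and the choice of 100 virtual nodes per server are illustrative assumptions, not the scheme of any particular Memcache client.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys to nodes so that adding a node moves only a fraction of keys."""

    def __init__(self, nodes, replicas: int = 100) -> None:
        # Each node is placed on the ring at `replicas` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(text: str) -> int:
        return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # The owner is the first ring point clockwise from the key's hash,
        # wrapping around to the start of the ring.
        index = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[index][1]


keys = [f"key:{i}" for i in range(1000)]
before = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
after = ConsistentHashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
```

Every key in moved now belongs to the new node cache-d; keys never shuffle between the existing servers, which is what keeps most of the cache warm when the cluster grows.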
GitHub version: https://github.com/cncounter/translation/blob/master/tiemao_2014/Redis_beats_Memcached/Redis_beats_Memcached.md
Memcached or Redis? This is a recurring debate in modern high-performance web applications. When a web application built on a relational database needs better performance, caching is the first choice for the vast majority of architects, and memcached and Redis are naturally the usual candidates.
Memcached was originally developed by Brad Fitzpatrick for the LiveJournal website in 2003. It was later rewritten in C (the first version was in Perl) and opened to the public, and it has become a cornerstone of modern web systems. Current memcached development focuses on stability and performance optimization rather than on adding new features.
Redis was created in 2009 by Salvatore Sanfilippo, who remains its lead developer and maintainer today. It is not surprising that Redis is sometimes called "Memcached on steroids", because part of Redis builds on lessons learned from Memcached. Redis has more features than memcached, which makes it more flexible and more powerful, but also more complex.
Memcached and Redis are used by many enterprises in a large number of production systems, and both have clients in many programming languages and rich SDKs. In fact, among languages used for web development, there is hardly one that does not support memcached or Redis.
Why are memcached and Redis so popular? Not only because of their extremely high performance, but also because they are relatively simple. It is easy for programmers to get started with either one; installing, configuring, and integrating it into a system may take only a few minutes. A little time and effort therefore yields an immediate performance boost, usually by an order of magnitude. A simple solution with a huge performance benefit: it is hard to ask for more.
Because Redis is the newer solution and offers more features, it is generally a better choice than memcached. Memcached may be the better choice in two specific scenarios.
The first is caching very small, static data, such as HTML code snippets. Memcached's internal memory management is not as sophisticated as that of Redis, but it is more efficient in this case because memcached's metadata is smaller and carries relatively little overhead. The only data type memcached supports is the string, which is ideal for caching read-only data because strings require no additional processing.
The second scenario is horizontal scaling, where memcached is easier than Redis. Because its design and feature set are simple, memcached is easier to scale out. Reportedly, Redis will have built-in, reliable cluster support in the upcoming version 3.0 [translator's note: this has been repeatedly postponed] (see the release candidate notes).
You should prefer Redis unless you are under environmental constraints (such as legacy systems) or your use case matches one of the two scenarios above. With Redis as a cache, system efficiency can be greatly improved by tuning what is cached.
The advantage of Redis shows clearly in cache management. A cache makes room for new data by removing old data from memory when necessary, through some data eviction mechanism. Memcached's eviction mechanism uses an LRU (Least Recently Used) algorithm and tends to evict old items that are about the same size as the new data. In contrast, Redis allows fine-grained control over eviction, with 6 different policies to choose from. Redis also employs more sophisticated approaches to memory management and to selecting eviction candidates.
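As a point of reference, plain LRU eviction can be sketched in a few lines. This is textbook LRU only, under the assumption of a fixed item count as the capacity: it does not model memcached's size-aware behaviour or any of the six Redis policies, and the LruCache class is hypothetical.

```python
from collections import OrderedDict


class LruCache:
    """Fixed-capacity cache that evicts the least recently used item."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # a read counts as a use
        return self._items[key]

    def set(self, key, value) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


cache = LruCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # "a" is now more recently used than "b"
cache.set("c", 3)  # over capacity, so "b" is evicted
```

Choosing among eviction policies amounts to changing which entry this last step removes, for example restricting candidates to keys that have an expire time set.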
Redis provides greater flexibility for cached objects. Memcached limits key names to 250 bytes, limits values to 1 MB, and works only with plain strings. Redis allows keys and values of up to 512 MB each, and they are binary safe (no data is lost, regardless of encoding). With 6 data types, Redis makes caching and cache management smarter and easier, opening up a world of possibilities for application developers.
Compared with storing an object as a serialized string, Redis can store an object's fields and values in a hash and manage them under a single key.
Consider what it takes to update an object with memcached:
And every update has to repeat all of these steps.
Using Redis hashes can significantly reduce resource consumption and improve performance. Other Redis data types, such as List or Set, can be used to implement more complex cache management patterns.
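The difference between the two update styles can be sketched with plain Python dictionaries standing in for the two caches. The object layout, the key name user:1, and both function names are made up for illustration; the hash-style update corresponds roughly to what a command such as HINCRBY does on the server.

```python
import json

# String style: the whole object is one serialized string under one key.
string_cache = {
    "user:1": json.dumps({"name": "Ann", "visits": 1, "bio": "x" * 1000})
}


def bump_visits_string(cache: dict, key: str) -> None:
    blob = cache[key]           # 1. fetch the entire string
    obj = json.loads(blob)      # 2. deserialize it
    obj["visits"] += 1          # 3. change a single field
    new_blob = json.dumps(obj)  # 4. serialize the whole object again
    cache[key] = new_blob       # 5. write the entire string back


# Hash style: each field is stored separately under the same key.
hash_cache = {"user:1": {"name": "Ann", "visits": 1, "bio": "x" * 1000}}


def bump_visits_hash(cache: dict, key: str) -> None:
    # Only the touched field is read and written; the 1,000-character
    # "bio" field never moves.
    cache[key]["visits"] += 1


bump_visits_string(string_cache, "user:1")
bump_visits_hash(hash_cache, "user:1")
```

With a real cache the string style also transfers the whole object over the network twice per update, so the cost grows with object size rather than with the size of the change.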
Another significant advantage of Redis is that the data it stores is not opaque, which means that the server can manipulate it directly. Most of its more than 160 commands operate on data, so processing data on the server through script calls is a practical option. These built-in commands and user scripts let you handle data processing tasks directly and flexibly, without shipping the data over the network to another system.
Redis provides optional, tunable data persistence so that the cache can be reloaded quickly after a crash or restart. Although we usually regard cached data as volatile and transient, persisting it to disk is valuable in a caching system. Having the cache available for loading immediately after a restart shortens warm-up time and reduces load on the primary database.
Finally, Redis provides master-slave replication. Replication can be used to build a highly available cache that keeps serving when some servers go down. If a cache server crashes, only a small number of users and programs are affected for a short time, and in most cases a proven solution exists to keep cached content and services available.
Open source software continues to provide some of the best practical technology solutions. When you need caching to improve application performance, Redis and memcached are the best production-grade options. But given its rich features and advanced design, Redis should be your first choice the vast majority of the time.
Author's profile: Itamar Haber (@itamarhaber) is the chief developer advocate at Redis Labs, which provides fully managed memcached and Redis cloud services for developers. He has years of experience in software product development and has held management and leadership positions at Xeround, Etagon, Amicada, and M.N.S Ltd. Itamar holds a Master of Business Administration from the joint Kellogg-Recanati program of Northwestern and Tel Aviv universities, and a Bachelor of Science in Computer Science.
Original link: Why Redis beats Memcached for caching
Original Date: 2014-10-15
Translation Date: 2014-10-23
About Redis & the difference from Memcache