Redis's author, Salvatore Sanfilippo, once compared these two memory-based data storage systems:
Redis supports server-side data operations: Redis has more data structures and supports richer data operations than Memcached. In Memcached, you usually need to fetch the data to the client, make the modification there, and set it back, which greatly increases the number of network I/O round trips and the volume of data transferred. In Redis, these complex operations are usually as efficient as a plain get/set. So if the cache needs to support more complex structures and operations, Redis is a good choice.
Memory efficiency comparison: For simple key-value storage, Memcached's memory utilization is higher. However, if Redis uses its hash structure for key-value storage, its memory utilization will be higher than Memcached's, thanks to its compact combined encoding.
Performance comparison: Since Redis uses only a single core while Memcached can use multiple cores, on average Redis outperforms Memcached per core when storing small data. For data larger than about 100 KB, Memcached outperforms Redis. Although Redis has recently optimized its performance for storing large values, it still lags slightly behind Memcached there.
The information gathered below explains the specific reasons behind these conclusions:
1. Different data type support
Unlike Memcached, which supports only a simple key-value structure, Redis supports a much richer range of data types. The five most commonly used types are String, Hash, List, Set, and Sorted Set. Internally, Redis uses a redisObject structure to represent every key and value. The main fields of redisObject are shown in the figure:
type indicates the concrete data type of a value object, and encoding is the way that data type is stored inside Redis. For example, type=string means the value is an ordinary string, and the corresponding encoding can be raw or int. If it is int, Redis actually stores and represents the string internally as a number, provided the string itself can be represented numerically, such as "123" or "456". The vm field only has memory actually allocated to it when Redis's virtual memory feature is turned on; this feature is off by default.
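As a rough illustration of the type/encoding distinction, here is a minimal Python sketch of the decision between the raw and int encodings for a string value (the function name and the simple integer check are assumptions for illustration, not Redis's actual C logic, which also checks the numeric range):

```python
def choose_string_encoding(value: str) -> str:
    """Mimic Redis's choice between the raw and int encodings:
    a string that is a valid integer can be stored internally
    as a number instead of as character data."""
    try:
        int(value)
        return "int"
    except ValueError:
        return "raw"

print(choose_string_encoding("123"))    # -> int
print(choose_string_encoding("hello"))  # -> raw
```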
1) String
Common commands: SET/GET/DECR/INCR/MGET, etc.;
Application scenario: String is the most commonly used data type; ordinary key/value storage can all be classified under it;
Implementation: Internally, Redis stores a String as a string by default, referenced by a redisObject. When it encounters operations such as INCR or DECR, the value is converted to a number for the calculation, and the redisObject's encoding field becomes int.
2) Hash
Common commands: HGET/HSET/HGETALL, etc.;
Application scenario: Suppose we want to store a user-information object, including user ID, user name, age, and birthday, and we want to retrieve the user's name, age, or birthday by user ID;
Implementation: Redis's Hash is, internally, a HashMap stored as the Value, and Redis provides an interface to access the members of this map directly. As the figure shows, the Key is the user ID and the Value is a map; the keys of this map are the members' attribute names and its values are the attribute values. The data can be modified and accessed directly through the keys of this internal map (Redis calls the internal map's keys "fields"); that is, through key (user ID) + field (attribute name) you can manipulate the corresponding attribute data. There are currently two implementations of this HashMap: when it has relatively few members, Redis saves memory by using a compact one-dimensional array rather than a true HashMap structure, and the encoding of the corresponding value's redisObject is zipmap; when the number of members grows, it is automatically converted into a real HashMap, at which point the encoding is ht.
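The two-encoding idea can be sketched in a few lines of Python. This is a simplified model, not Redis's actual zipmap code; the conversion threshold of 8 members is an arbitrary illustrative value (the real threshold is configurable):

```python
class MiniHash:
    """Model of Redis's two hash encodings: a compact flat list
    (like zipmap) while the hash is small, converted to a real
    dict (encoding "ht") once the member count passes a threshold."""
    THRESHOLD = 8   # illustrative; Redis's limit is configurable

    def __init__(self):
        self.encoding = "zipmap"
        self._flat = []          # [field1, value1, field2, value2, ...]
        self._map = None

    def hset(self, field, value):
        if self.encoding == "zipmap":
            # linear scan of the flat array: fine for small hashes
            for i in range(0, len(self._flat), 2):
                if self._flat[i] == field:
                    self._flat[i + 1] = value
                    return
            self._flat += [field, value]
            if len(self._flat) // 2 > self.THRESHOLD:
                # too many members: upgrade to a real hash table
                self._map = dict(zip(self._flat[::2], self._flat[1::2]))
                self._flat, self.encoding = [], "ht"
        else:
            self._map[field] = value

    def hget(self, field):
        if self.encoding == "zipmap":
            for i in range(0, len(self._flat), 2):
                if self._flat[i] == field:
                    return self._flat[i + 1]
            return None
        return self._map.get(field)

user = MiniHash()
user.hset("name", "alice")
user.hset("age", "30")
print(user.hget("name"), user.encoding)   # -> alice zipmap
```

The design point: for a handful of fields, a linear scan over a compact array is both faster (cache-friendly) and smaller than a hash table with per-entry overhead.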
3) List
Common commands: LPUSH/RPUSH/LPOP/RPOP/LRANGE, etc.;
Application scenario: Redis Lists have many applications and are one of Redis's most important data structures. For example, Twitter's following list and follower list can both be implemented with the Redis List structure;
Implementation: The Redis List is implemented as a doubly linked list, which supports reverse lookup and traversal and makes operations more convenient, at the cost of some extra memory overhead. Many internal parts of Redis, including the send buffer queue, also use this data structure.
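A doubly linked list with head and tail pointers is what makes LPUSH/RPUSH/LPOP/RPOP all O(1). A minimal Python sketch of the structure (not Redis's actual adlist.c implementation):

```python
class Node:
    __slots__ = ("value", "prev", "next")
    def __init__(self, value):
        self.value, self.prev, self.next = value, None, None

class DList:
    """Doubly linked list: O(1) push/pop at both ends, and the
    prev pointers allow reverse traversal."""
    def __init__(self):
        self.head = self.tail = None

    def lpush(self, value):
        node = Node(value)
        node.next = self.head
        if self.head:
            self.head.prev = node
        else:
            self.tail = node
        self.head = node

    def rpush(self, value):
        node = Node(value)
        node.prev = self.tail
        if self.tail:
            self.tail.next = node
        else:
            self.head = node
        self.tail = node

    def lpop(self):
        if not self.head:
            return None
        node, self.head = self.head, self.head.next
        if self.head:
            self.head.prev = None
        else:
            self.tail = None
        return node.value

    def rpop(self):
        if not self.tail:
            return None
        node, self.tail = self.tail, self.tail.prev
        if self.tail:
            self.tail.next = None
        else:
            self.head = None
        return node.value

timeline = DList()
timeline.lpush("tweet-1")
timeline.lpush("tweet-2")   # newest first
print(timeline.lpop())      # -> tweet-2
```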
4) Set
Common commands: SADD/SPOP/SMEMBERS/SUNION, etc.;
Application scenario: The functionality a Redis Set provides is similar to that of a List; the special thing is that a Set automatically de-duplicates. When you need to store a list of data without duplicates, Set is a good choice. Set also provides an important interface for testing whether a member is in the set, which List does not provide;
Implementation: The internal implementation of Set is a HashMap whose values are always null. It de-duplicates quickly by computing hashes, which is also why Set can test whether a member is in the set.
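That "hash map with null values" idea is easy to model. A small Python sketch (illustrative names, not Redis internals):

```python
class MiniSet:
    """Model of Redis's Set: a hash map whose values are all None,
    so hashing gives O(1) de-duplication and membership tests."""
    def __init__(self):
        self._map = {}

    def sadd(self, member):
        added = member not in self._map
        self._map[member] = None   # the value is always null
        return added

    def sismember(self, member):
        return member in self._map

    def smembers(self):
        return list(self._map)

s = MiniSet()
s.sadd("a"); s.sadd("b"); s.sadd("a")   # duplicate silently ignored
print(s.smembers())                     # -> ['a', 'b']
print(s.sismember("a"))                 # -> True
```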
5) Sorted Set
Common commands: ZADD/ZRANGE/ZREM/ZCARD, etc.;
Application scenario: The use cases of Redis Sorted Set are similar to those of Set, except that a Set is not automatically ordered, while a Sorted Set lets the user order members by supplying an extra priority parameter (the score), keeping them sorted automatically on insertion. When you need an ordered, non-duplicated collection, choose the Sorted Set structure. For example, Twitter's public timeline can be stored with the publication time as the score, so it is automatically sorted by time.
Implementation: Internally, a Redis Sorted Set uses both a HashMap and a skip list (skiplist) to keep the data stored and ordered. The HashMap holds the mapping from member to score, while the skip list holds all the members, sorted by the scores stored in the HashMap. The skip list structure achieves relatively high search efficiency and is comparatively simple to implement.
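The dual structure can be sketched as follows. Note the hedge: a sorted Python list kept ordered with bisect stands in for the skip list here (the real skiplist gives O(log n) inserts without array shifting); the class and method names are illustrative:

```python
import bisect

class MiniZSet:
    """Model of the Sorted Set's dual structure: a dict maps
    member -> score (the HashMap), and an ordered sequence keeps
    (score, member) pairs sorted (standing in for the skip list)."""
    def __init__(self):
        self.scores = {}    # member -> score
        self.sorted = []    # [(score, member), ...] kept in order

    def zadd(self, score, member):
        if member in self.scores:
            # updating a score: remove the old ordered entry first
            self.sorted.remove((self.scores[member], member))
        self.scores[member] = score
        bisect.insort(self.sorted, (score, member))

    def zrange(self, start, stop):
        # inclusive stop index, like the ZRANGE command
        return [m for _, m in self.sorted[start:stop + 1 if stop >= 0 else None]]

zset = MiniZSet()
zset.zadd(1630000000, "tweet-a")   # score = publication time
zset.zadd(1630000300, "tweet-b")
zset.zadd(1630000100, "tweet-c")
print(zset.zrange(0, -1))          # -> ['tweet-a', 'tweet-c', 'tweet-b']
```

The dict answers "what is this member's score?" in O(1), while the ordered structure answers range queries; Redis keeps both in sync exactly so that each query type hits the structure that is fast for it.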
2. Different memory management mechanisms
In Redis, not all data is stored in memory at all times. This is one of the biggest differences from Memcached. When physical memory runs out, Redis can swap values that have not been used for a long time out to disk. Redis always caches all key information in memory; if it finds that memory usage exceeds a certain threshold, it triggers a swap operation, using swappability = age * log(size_in_memory) to calculate which keys' values should be swapped to disk. The values corresponding to those keys are then persisted to disk and purged from memory. This feature lets Redis hold more data than the machine's own memory can. Of course, the machine's memory must still be able to hold all the keys, since keys themselves are never swapped.

Because Redis swaps in-memory data out to disk, the main service thread and the child thread performing the swap share this memory; if data that is being swapped is updated, Redis blocks the operation until the child thread completes the swap, and only then can the modification proceed. When reading data from Redis, if the value of the key being read is not in memory, Redis must load the corresponding data from the swap file and return it to the requester, which raises the question of an I/O thread pool. By default Redis blocks, i.e. it finishes loading all the relevant swap files before responding. This strategy is appropriate when the number of clients is small and operations are batch-like, but if Redis is used in a large website application, it clearly cannot satisfy highly concurrent scenarios. Redis therefore lets us configure the size of an I/O thread pool so that read requests which need to load data from the swap file are handled concurrently, reducing blocking time.
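Taking the swappability formula quoted above at face value, the ranking it produces can be sketched as follows (key names and numbers are hypothetical; whether Redis used the natural logarithm is an assumption):

```python
import math

def swappability(age_seconds: float, size_in_memory: int) -> float:
    """swappability = age * log(size_in_memory): values that are
    both old and large are the best candidates to swap to disk."""
    return age_seconds * math.log(size_in_memory)

# Hypothetical keys: (name, seconds since last access, bytes in memory)
keys = [("session:1", 30, 512), ("page:home", 3600, 65536), ("cnt", 5, 16)]
ranked = sorted(keys, key=lambda k: swappability(k[1], k[2]), reverse=True)
print([name for name, _, _ in ranked])  # best swap candidates first
```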
For memory-based database systems such as Redis and Memcached, the efficiency of memory management is a key factor affecting system performance. The malloc/free functions of traditional C are the most commonly used way to allocate and release memory, but this approach has serious flaws: first, mismatched malloc and free calls easily lead developers into memory leaks; second, frequent calls cause a large amount of memory fragmentation that cannot be reclaimed, reducing memory utilization; finally, as system calls, their overhead is much greater than that of ordinary function calls. Therefore, to improve memory management efficiency, efficient memory management schemes do not use malloc/free calls directly. Redis and Memcached both use memory management mechanisms of their own design, but their implementations differ greatly; each mechanism is introduced below.
By default, Memcached uses the Slab Allocation mechanism to manage memory. Its main idea is to split allocated memory into blocks of specific, predetermined lengths to store key-value data records of the corresponding lengths, which completely solves the memory fragmentation problem. The Slab Allocation mechanism is designed only for storing external data; that is, all key-value data is stored in the Slab Allocation system, while Memcached's other memory requests go through ordinary malloc/free, because the number and frequency of those requests mean they do not affect the performance of the whole system. The principle of Slab Allocation is fairly simple. As the figure shows, it first requests a large chunk of memory from the operating system, splits it into chunks of various sizes, and groups chunks of the same size into Slab Classes. A chunk is the smallest unit used to store key-value data. The size of each Slab Class can be controlled by setting the Growth Factor when Memcached is started. Assuming the Growth Factor in the figure is 1.25, if the first group's chunks are 88 bytes each, the second group's chunks are 112 bytes each, and so on.
When Memcached receives data sent by a client, it first selects the most suitable Slab Class based on the size of the data, then queries Memcached's list of free chunks in that Slab Class to find a chunk that can store the data. When a database record expires or is discarded, the chunk it occupied can be recycled and added back to the free list.
From the process above, we can see that Memcached's memory management system is efficient and does not cause memory fragmentation, but its biggest drawback is that it wastes space. Because each chunk is allocated a memory space of a specific length, variable-length data cannot make full use of that space. As the figure shows, caching 100 bytes of data in a 128-byte chunk wastes the remaining 28 bytes.
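The chunk-size progression and the waste it causes can be sketched in a few lines. One assumption is made explicit in the code: rounding each class size up to an 8-byte boundary, which is how 88 * 1.25 = 110 can surface as 112 in the example above (real Memcached aligns chunk sizes similarly):

```python
def slab_chunk_sizes(first=88, growth_factor=1.25, align=8, max_size=1024):
    """Generate each Slab Class's chunk size: each class is
    growth_factor times the previous one, rounded up to `align`
    bytes (assumed alignment; this is why 88 -> 112, not 110)."""
    sizes, size = [], first
    while size <= max_size:
        sizes.append(size)
        size = -(-int(size * growth_factor) // align) * align  # round up
    return sizes

def pick_slab_class(data_size, sizes):
    """Select the smallest chunk that fits; the leftover is wasted."""
    for chunk in sizes:
        if chunk >= data_size:
            return chunk, chunk - data_size
    raise ValueError("item too large for any slab class")

sizes = slab_chunk_sizes()
print(sizes[:3])                             # -> [88, 112, 144]
# The figure's example: 100 bytes in a 128-byte chunk wastes 28
print(pick_slab_class(100, [64, 128, 256]))  # -> (128, 28)
```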
Redis's memory management is mainly implemented in two source files, zmalloc.h and zmalloc.c. To make memory easier to manage, Redis, after allocating a block of memory, stores that block's size at the head of the block. As shown in the figure, real_ptr is the pointer returned when Redis calls malloc. Redis stores the block's size in the header; the size field occupies a known amount of memory (it is of type size_t), and then ret_ptr is returned. When the memory needs to be freed, ret_ptr is passed to the memory manager. From ret_ptr, the program can easily compute real_ptr, then pass real_ptr to free to release the memory.
Redis records all memory allocations by defining an array whose length is ZMALLOC_MAX_ALLOC_STAT. Each element of the array records the number of memory blocks currently allocated by the program, where the size of the block is that element's index; in the source code, this array is zmalloc_allocations. For example, zmalloc_allocations[16] is the number of allocated memory blocks of length 16 bytes. zmalloc.c also has a static variable, used_memory, that records the total amount of memory currently allocated. So, overall, Redis uses a wrapper around malloc/free, which is much simpler than Memcached's memory management approach.
3. Different support for data persistence
Redis is a memory-based storage system, but it supports persisting its in-memory data and provides two main persistence strategies: RDB snapshots and AOF logs. Memcached does not support any data persistence operations.
1) RDB snapshot
Redis supports a persistence mechanism that saves a snapshot of the current data into a data file, i.e. an RDB snapshot. But how does a database that is continuously being written to generate a snapshot? Redis uses fork's copy-on-write mechanism: when generating a snapshot, it forks the current process to produce a child process, then loops through all the data in the child process and writes it into an RDB file. We can configure when RDB snapshots are generated with Redis's save directive; for example, generate a snapshot every 10 minutes, or after every 1000 writes, or combine multiple rules. These rules are defined in the Redis configuration file and can also be set while Redis is running with the CONFIG SET command, without restarting Redis.
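In redis.conf, such save rules look like the following (these particular three are the defaults shipped in stock configuration files; any combination of `save <seconds> <changes>` rules may be used):

```conf
# Snapshot if at least <changes> writes occurred within <seconds> seconds.
# After 900 s (15 min) if at least 1 key changed:
save 900 1
# After 300 s if at least 10 keys changed:
save 300 10
# After 60 s if at least 10000 keys changed:
save 60 10000
```

The same rules can be changed at run time, e.g. `CONFIG SET save "900 1 300 10"`, with no restart needed.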
Redis's RDB files never become corrupted, because the write operations are performed in a new process: when a new RDB file is generated, the child process spawned by Redis first writes the data to a temporary file, then renames the temporary file to the RDB file through an atomic rename system call. This way, whenever a failure occurs, Redis's RDB files are always usable. At the same time, the RDB file is also one link in the internal implementation of Redis master-slave synchronization. RDB has its shortcoming, though: once the database has a problem, the data saved in our RDB file is not up to date; all the data from when the last RDB file was generated up to when Redis went down is lost. Some businesses can tolerate this.
2) AOF log
The full name of the AOF log is append-only file; it is an append-only log file. Unlike the binlog of a typical database, the AOF file is recognizable plain text, and its contents are standard Redis commands. Only commands that cause data to change are appended to the AOF file. Since every data-modifying command generates a log entry, the AOF file keeps growing, so Redis also provides a feature called AOF rewrite. Its function is to regenerate a copy of the AOF file in which each record appears only once, unlike the old file, which may record multiple operations on the same value. The generation process is similar to RDB: it also forks a process, traverses the data directly, and writes to a new temporary AOF file. While the new file is being written, all write logs are still written to the old AOF file and also recorded in an in-memory buffer. When the rewrite completes, the logs in the buffer are all written to the temporary file at once. The atomic rename system call is then invoked to replace the old AOF file with the new one.
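The append/replay/rewrite cycle can be modeled in miniature. This sketch keeps the "log" as an in-memory list of commands and omits the fork and the temporary-file rename; the class and method names are illustrative:

```python
class MiniAOF:
    """Model of the AOF idea: every data-changing command is
    appended to a log; replaying the log rebuilds the dataset;
    a rewrite compacts the log to one SET per surviving key."""
    def __init__(self):
        self.log = []
        self.data = {}

    def set(self, key, value):
        self.data[key] = value
        self.log.append(("SET", key, value))   # append-only

    def delete(self, key):
        if self.data.pop(key, None) is not None:
            self.log.append(("DEL", key))

    @staticmethod
    def replay(log):
        """Rebuild the dataset from the log, as on restart."""
        data = {}
        for cmd, key, *rest in log:
            if cmd == "SET":
                data[key] = rest[0]
            elif cmd == "DEL":
                data.pop(key, None)
        return data

    def rewrite(self):
        # regenerate the log from current state: each key exactly once
        self.log = [("SET", k, v) for k, v in self.data.items()]

aof = MiniAOF()
aof.set("x", "1"); aof.set("x", "2"); aof.set("y", "9"); aof.delete("y")
print(len(aof.log))             # -> 4 entries before the rewrite
aof.rewrite()
print(len(aof.log))             # -> 1 entry after the rewrite
print(MiniAOF.replay(aof.log))  # -> {'x': '2'}
```

Note that replaying the compacted log yields exactly the same dataset as replaying the full history, which is the correctness condition a rewrite must preserve.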
AOF is a file-write operation whose purpose is to write the operation log to disk, so it also runs into the disk-write flow described above. After Redis calls write() on the AOF, the appendfsync option controls when fsync is called to flush the data to disk; the security of the three appendfsync settings below increases step by step.
appendfsync no: When appendfsync is set to no, Redis does not actively call fsync to sync the AOF log contents to disk, so everything depends on the operating system's scheduling. For most Linux operating systems, an fsync is performed every 30 seconds, writing the data in the buffer to disk.
appendfsync everysec: When appendfsync is set to everysec, Redis makes an fsync call once per second, writing the data in the buffer to disk. But when that fsync call takes longer than one second, Redis adopts a delayed-fsync strategy and waits one more second; that is, the fsync is performed two seconds later, and this time it is carried out no matter how long it takes. Because the file descriptor is blocked during fsync, the current write operation blocks. So the conclusion is: in the vast majority of cases, Redis performs an fsync every second; in the worst case, an fsync is performed every two seconds. Most database systems call this operation a group commit: combining the data of multiple write operations and writing the log to disk in one go.
appendfsync always: When appendfsync is set to always, every write operation calls fsync once. The data is safest then, but of course performance suffers, since fsync is executed every time.
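In redis.conf, the three levels above look like this (exactly one policy applies at a time; everysec is the usual compromise):

```conf
# Enable the AOF log in the first place
appendonly yes

# Choose exactly one fsync policy (safest to fastest):
#   always   - fsync after every write
#   everysec - fsync once per second (the usual compromise)
#   no       - let the OS decide (~30 s on most Linux systems)
appendfsync everysec
```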
For ordinary business requirements, it is recommended to use RDB for persistence, because the overhead of RDB is much lower than that of AOF logs; for applications that cannot tolerate data loss, the AOF log is recommended.
4. Different cluster management
Memcached is a fully in-memory data-buffering system; Redis, although it supports data persistence, is still fully in-memory at heart, and that is the essence of its high performance. For a memory-based storage system, the size of the machine's physical memory is the maximum amount of data the system can hold. If the amount of data to be processed exceeds the physical memory of a single machine, a distributed cluster must be built to extend storage capacity.
Memcached itself does not support distribution, so distributed Memcached can only be implemented on the client side with a distributed algorithm such as consistent hashing. The figure below shows Memcached's distributed storage implementation architecture. Before a client sends data to the Memcached cluster, it first computes the data's target node with the built-in distributed algorithm, then sends the data directly to that node for storage. Likewise, when the client queries data, it computes the node where the data resides and sends the query request directly to that node to get the data.
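A client-side consistent hash can be sketched as a ring of hash values. This is a generic illustration, not any particular client library's algorithm (real clients such as libmemcached's ketama differ in hash function and ring layout); the node addresses are hypothetical:

```python
import bisect
import hashlib

class HashRing:
    """Client-side consistent hashing: nodes are placed at many
    points on a ring of hash values (virtual nodes smooth the
    distribution), and each key is served by the first node
    clockwise from the key's own hash."""
    def __init__(self, nodes, replicas=100):
        self.ring = []   # sorted list of (hash, node)
        for node in nodes:
            for i in range(replicas):
                h = self._hash(f"{node}#{i}")
                bisect.insort(self.ring, (h, node))

    @staticmethod
    def _hash(key):
        # MD5 used purely for illustration
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        h = self._hash(key)
        i = bisect.bisect(self.ring, (h, ""))
        return self.ring[i % len(self.ring)][1]   # wrap around the ring

ring = HashRing(["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"])
# The client hashes the key itself, then talks to that node directly
print(ring.node_for("user:42"))
```

The design point of consistent hashing is that adding or removing one node only remaps the keys that belonged to that node, instead of reshuffling almost every key as a naive `hash(key) % node_count` scheme would.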