The differences between Redis and memcached, in detail


Redis's author, Salvatore Sanfilippo, once compared these two memory-based data storage systems:

1. Redis supports server-side data operations: Redis has more data structures and supports richer data operations than memcached. With memcached, you usually have to fetch the data to the client, make the modification there, and set it back, which greatly increases the number of network round trips and the volume of data transferred. In Redis, these complex operations are generally as efficient as an ordinary get/set. So if the cache needs to support more complex structures and operations, Redis is a good choice.

2. Memory efficiency comparison: For simple key-value storage, memcached has higher memory utilization. If Redis uses its hash structures for key-value storage, however, their compact encoding gives it higher memory utilization than memcached.

3. Performance comparison: Because Redis uses only a single core while memcached can use multiple cores, Redis on average achieves higher per-core performance than memcached when storing small data. For data over about 100 KB, memcached performs better than Redis. Although Redis has recently optimized its performance for storing large values, it still lags slightly behind memcached in that case.

The information gathered below explains the specific reasons for these conclusions.

1. Different data type support

Unlike memcached, which supports only a simple key-value structure, Redis supports a much richer range of data types. The five most commonly used types are String, Hash, List, Set, and Sorted Set. Internally, Redis uses a redisObject structure to represent every key and value. The main fields of redisObject are shown in the figure:

The type field records the value object's concrete data type, and encoding records how that type is stored inside Redis. For example, type=string means the value is an ordinary string, and the corresponding encoding can be raw or int. An encoding of int means Redis actually stores and represents the string internally as a number, which is possible when the string itself is numeric, such as "123" or "456". The vm field only has real memory allocated for it when Redis's virtual memory feature is turned on; this feature is off by default.

1) String

Common commands: SET/GET/DECR/INCR/MGET, etc.
Application scenario: String is the most commonly used data type, and ordinary key/value storage falls into this category.
Implementation: Inside Redis a string is stored by default as a character string, referenced through redisObject. When it meets operations such as INCR and DECR, it is converted to a number for the calculation, and at that point the redisObject's encoding field becomes int.
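The raw-versus-int decision described above can be sketched as follows; this is an illustrative simulation of the rule, not Redis source code:

```python
# Illustrative sketch: decide between "int" and "raw" encoding for a
# string value, following the rule described in the text.
def choose_encoding(value: str) -> str:
    try:
        int(value)           # purely numeric strings can be held as integers
        return "int"
    except ValueError:
        return "raw"

print(choose_encoding("123"))    # "int"
print(choose_encoding("hello"))  # "raw"
```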

2) Hash

Common commands: HGET/HSET/HGETALL, etc.
Application scenario: we want to store a user-information object that includes the user ID, user name, age, and birthday, and to retrieve the user's name, age, or birthday by user ID.
Implementation: A Redis hash is internally a value that stores a HashMap, and it provides an interface for accessing the map's members directly. As the figure shows, the key is the user ID and the value is a map; the keys of this map are the attribute names of the object's members and its values are the attribute values. The data can be modified and read directly through the internal map's key (Redis calls the internal map's key a field): key (user ID) + field (attribute name) addresses the corresponding attribute data. The HashMap currently has two implementations: when a hash has relatively few members, Redis saves memory by packing it into a one-dimensional array instead of a true HashMap structure, and the value's redisObject encoding is zipmap; when the number of members grows larger, it is automatically converted to a real HashMap, and the encoding becomes ht.
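The zipmap-to-ht promotion can be sketched like this. It is a toy simulation: the threshold value and class name are invented for illustration, and the real zipmap is a packed byte array, not a Python list.

```python
# Illustrative sketch: keep a small hash as a flat field/value list
# ("zipmap"-like), promote it to a real dict ("ht"-like) once it grows.
ZIPMAP_MAX_ENTRIES = 4  # hypothetical threshold, for this sketch only

class TinyHash:
    def __init__(self):
        self.encoding = "zipmap"
        self.flat = []          # [field1, value1, field2, value2, ...]

    def hset(self, field, value):
        if self.encoding == "zipmap":
            for i in range(0, len(self.flat), 2):
                if self.flat[i] == field:   # linear scan: fine while small
                    self.flat[i + 1] = value
                    return
            self.flat += [field, value]
            if len(self.flat) // 2 > ZIPMAP_MAX_ENTRIES:
                # too many members: convert to a real hash table
                self.ht = dict(zip(self.flat[::2], self.flat[1::2]))
                self.encoding = "ht"
        else:
            self.ht[field] = value

    def hget(self, field):
        if self.encoding == "zipmap":
            for i in range(0, len(self.flat), 2):
                if self.flat[i] == field:
                    return self.flat[i + 1]
            return None
        return self.ht.get(field)

user = TinyHash()
user.hset("name", "alice")
user.hset("age", 30)
print(user.encoding, user.hget("name"))   # small hash stays compact
```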

3) List

Common commands: LPUSH/RPUSH/LPOP/RPOP/LRANGE, etc.
Application scenario: the Redis list has many applications and is one of the most important data structures in Redis; for example, Twitter-style following lists and fan lists can be implemented with the Redis list structure.
Implementation: The Redis list is implemented as a doubly linked list, which supports reverse lookup and traversal and is convenient to operate on, at the cost of some extra memory overhead. Many parts of Redis's own implementation, including the send buffer queue, also use this data structure.
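The push/pop semantics of such a doubly linked list map naturally onto Python's collections.deque, used here purely for illustration:

```python
# Illustrative sketch: LPUSH/RPUSH/RPOP semantics on a deque, which, like
# the Redis list, offers O(1) operations at both ends.
from collections import deque

timeline = deque()
timeline.appendleft("tweet1")   # LPUSH: newest items go on the left
timeline.appendleft("tweet2")
timeline.append("tweet0")       # RPUSH: append on the right
print(list(timeline))           # ['tweet2', 'tweet1', 'tweet0']
print(timeline.pop())           # RPOP -> 'tweet0'
```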

4) Set

Common commands: SADD/SPOP/SMEMBERS/SUNION, etc.
Application scenario: the Redis set provides functionality similar to a list, except that it deduplicates automatically. When you need to store a list of data without duplicates, a set is a good choice. A set also provides an important interface the list does not: testing whether a member belongs to the set.
Implementation: The internal implementation of a set is a HashMap whose values are always null; it deduplicates quickly by computing hashes, which is also why a set can test whether a member belongs to it.
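That "HashMap with null values" idea can be shown directly with a Python dict. The member names are made up for the example:

```python
# Illustrative sketch: a set as a hash map whose values are always None.
# Hashing gives O(1) deduplication and membership tests.
followers = {}
for user in ["alice", "bob", "alice"]:
    followers[user] = None          # the duplicate "alice" collapses into one key

print(sorted(followers))            # ['alice', 'bob']
print("bob" in followers)           # SISMEMBER-style membership test -> True
```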

5) Sorted Set

Common commands: ZADD/ZRANGE/ZREM/ZCARD, etc.
Application scenario: the usage scenario of the Redis sorted set is similar to that of the set, except that a set is not automatically ordered, while a sorted set lets the user supply an extra priority parameter (the score) for each member and keeps members sorted as they are inserted, i.e., automatically ordered. When you need an ordered collection without duplicates, choose the sorted set; for example, Twitter's public timeline can be stored with the publication time as the score, so that it is automatically sorted by time.
Implementation: Internally, the Redis sorted set uses a HashMap and a skip list (skiplist) to store the data and keep it ordered. The HashMap maps members to their scores, while the skip list stores all the members, sorted by the scores held in the HashMap. The skip-list structure provides relatively high search efficiency and is comparatively simple to implement.
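A simplified sketch of that two-structure design follows. A real skip list is replaced here by a bisect-maintained sorted list, which conveys the same O(log n) ordered-search idea; the class and data are invented for illustration.

```python
# Illustrative sketch: a sorted set as a member->score map plus an
# ordered (score, member) list standing in for Redis's skip list.
import bisect

class TinyZSet:
    def __init__(self):
        self.scores = {}     # member -> score (the HashMap side)
        self.ordered = []    # [(score, member)] kept sorted (skip-list stand-in)

    def zadd(self, score, member):
        if member in self.scores:                      # update: drop old entry
            self.ordered.remove((self.scores[member], member))
        self.scores[member] = score
        bisect.insort(self.ordered, (score, member))

    def zrange(self, start, stop):
        return [m for _, m in self.ordered[start:stop + 1]]

tl = TinyZSet()
tl.zadd(1609459300, "tweet-b")   # score = publication timestamp
tl.zadd(1609459200, "tweet-a")
print(tl.zrange(0, 1))           # oldest first: ['tweet-a', 'tweet-b']
```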

2. Different memory management mechanisms

In Redis, not all data is stored in memory; this is one of the biggest differences from memcached. When physical memory runs out, Redis can swap values that have not been used for a long time out to disk. Redis always caches all of the key information; when Redis finds that memory usage exceeds a certain threshold, a swap operation is triggered, and Redis calculates which keys' values should be swapped to disk according to "swappability = age * log(size_in_memory)". The values for those keys are then persisted to disk and purged from memory. This feature lets Redis hold more data than the machine's own memory size. Of course, the machine's memory must be able to hold all of the keys, since keys are never swapped out.

At the same time, because Redis swaps in-memory data to disk, the service's main thread and the child thread performing the swap share that part of memory; so if data that is being swapped is updated, Redis blocks the operation until the child thread completes the swap, and only then can the modification proceed. When reading data from Redis, if the value for the key being read is not in memory, Redis must load the corresponding data from the swap file before returning it to the requester. Here there is an I/O thread pool issue: by default, Redis blocks, i.e., it responds only after all the swap files have been loaded. This strategy is appropriate when the number of clients is small and operations are batched, but if Redis is used in a large website application, it clearly cannot handle high concurrency. So when running Redis, we set the size of the I/O thread pool to serve read requests that need to load data from swap files concurrently, reducing the blocking time.

For memory-based database systems such as Redis and memcached, memory-management efficiency is a key factor affecting system performance. The malloc/free functions of traditional C are the most common way to allocate and release memory, but this approach has major flaws: first, mismatched malloc and free calls easily lead developers into memory leaks; second, frequent calls create large amounts of memory fragmentation that cannot be reclaimed and reused, lowering memory utilization; finally, as system calls, their overhead is much larger than that of ordinary function calls. Therefore, to manage memory more efficiently, efficient memory-management schemes do not use malloc/free calls directly. Redis and memcached both use memory-management mechanisms of their own design, but their implementation methods differ greatly; each mechanism is introduced below.

Memcached by default uses the slab allocation mechanism to manage memory. Its main idea is to split the allocated memory into blocks of specific, predetermined lengths to store key-value records of the corresponding lengths, in order to completely solve the memory fragmentation problem. The slab allocation mechanism is designed only for storing external data, which means all key-value data is stored in the slab allocation system, while memcached's other memory requests go through ordinary malloc/free; the number and frequency of those requests mean they do not affect the performance of the whole system.

The principle of slab allocation is fairly simple. As the figure shows, it first requests a large block of memory from the operating system, splits it into chunks of various sizes, and groups chunks of the same size into slab classes. A chunk is the smallest unit used to store key-value data. The size of each slab class can be controlled by setting the growth factor when memcached is started. Assuming the growth factor in the figure is 1.25, if the first group's chunks are 88 bytes, the second group's chunks are 112 bytes, and so on.
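The progression of chunk sizes can be reproduced with a few lines. The 88-byte starting size and the 1.25 growth factor come from the example in the text; the 8-byte alignment (which turns 88 × 1.25 = 110 into 112) is an assumption of this sketch:

```python
# Illustrative sketch: memcached-style slab class sizing with a growth
# factor, rounding each class's chunk size up to an 8-byte boundary.
def slab_chunk_sizes(first=88, growth_factor=1.25, align=8, classes=4):
    sizes, size = [], float(first)
    for _ in range(classes):
        sizes.append(int(size))
        size = size * growth_factor
        size = -(-size // align) * align    # round up to the alignment
    return sizes

print(slab_chunk_sizes())   # [88, 112, 144, 184]
```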

When memcached receives data sent by a client, it first selects the most appropriate slab class based on the size of the received data, then queries memcached's list of idle chunks for that slab class to find a chunk that can store the data. When a record expires or is discarded, the chunk the record occupied can be reclaimed and added back to the free list. From this process we can see that memcached's memory-management system is efficient and does not cause memory fragmentation, but its biggest drawback is that it wastes space: since each chunk is allocated a specific length of memory, variable-length data cannot fully use that space. As the figure shows, caching 100 bytes of data in a 128-byte chunk wastes the remaining 28 bytes.

Redis's memory management is mainly implemented in the two source files zmalloc.h and zmalloc.c. To make memory easier to manage, after allocating a block of memory Redis stores the block's size at the head of the block. As the figure shows, real_ptr is the pointer returned by Redis's call to malloc. Redis stores the block's size at the head; size occupies a known amount of memory, namely the length of the size_t type, and then ret_ptr is returned. When the memory needs to be freed, ret_ptr is passed to the memory manager; from ret_ptr the program can easily compute the value of real_ptr and then pass real_ptr to free to release the memory.
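The size-prefix trick can be simulated with a byte string standing in for the allocated block; this is a model of the idea, not the actual C code:

```python
# Illustrative sketch of the zmalloc header: prefix each allocation with
# its length (a size_t), so the manager can recover it when freeing.
import struct

HEADER = struct.calcsize("N")        # bytes used by a native size_t

def z_alloc(payload: bytes) -> bytes:
    # the whole block is what real_ptr points at; the caller would be
    # handed ret_ptr = real_ptr + HEADER
    return struct.pack("N", len(payload)) + payload

def z_size(block: bytes) -> int:
    return struct.unpack_from("N", block)[0]

blk = z_alloc(b"hello")
print(z_size(blk))                   # 5
print(len(blk) == HEADER + 5)        # True
```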

3. Different data persistence support

Although Redis is a memory-based storage system, it supports persisting its in-memory data and provides two main persistence strategies: RDB snapshots and the AOF log. memcached does not support data persistence.

1) RDB Snapshot

Redis supports a persistence mechanism that saves a snapshot of the current data into a data file, i.e., an RDB snapshot. But how does a database that is continuously being written to generate a snapshot? Redis relies on the copy-on-write behavior of the fork system call: when generating a snapshot, it forks the current process into a child process, then loops over all the data in the child process, writing it to an RDB file. We can configure the timing of RDB snapshot generation with Redis's save directive; for example, configure a snapshot every 10 minutes, or a snapshot after every 1000 writes, or combine multiple rules. These rules are defined in the Redis configuration file, and they can also be set at run time through Redis's CONFIG SET command without restarting Redis.
Redis's RDB file will not be corrupted, because the write operation is performed in a new process: when a new RDB file is generated, the child process spawned by Redis first writes the data to a temporary file and then renames the temporary file to the RDB file through an atomic rename system call. This way, whenever a failure occurs, the Redis RDB file is always usable. At the same time, Redis's RDB file is also one link in the internal implementation of Redis master-slave synchronization. RDB has its shortcoming: once the database has a problem, the data saved in our RDB file is not up to date; all the data from the generation of the last RDB file until Redis went down is lost. Some businesses can tolerate this.
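The write-then-atomic-rename pattern described above is a general technique and can be sketched in a few lines; file names here are placeholders:

```python
# Sketch of the pattern the text describes for RDB: write a temporary
# file, then rename() it over the real file in one atomic step, so a
# crash never leaves a half-written snapshot visible under the real name.
import os
import tempfile

def save_snapshot(path: str, data: bytes) -> None:
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())        # make sure the bytes reach the disk first
    os.rename(tmp, path)            # atomic replacement on POSIX filesystems

target = os.path.join(tempfile.mkdtemp(), "dump.rdb")
save_snapshot(target, b"snapshot-bytes")
print(open(target, "rb").read())
```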

2) AOF Log

AOF stands for append-only file; it is an append-only log file. Unlike a typical database's binlog, the AOF file is recognizable plain text whose contents are standard Redis commands. Only commands that cause data changes are appended to the AOF file. Since every data-modifying command generates a log entry, the AOF file keeps growing, so Redis also provides a feature called AOF rewrite. Its function is to regenerate an AOF file in which each record appears only once, unlike the old file, which may record multiple operations on the same value. The generation process is similar to RDB: it forks a process, traverses the data directly, and writes it to a new temporary AOF file. While the new file is being written, all write logs are still written to the old AOF file and also recorded in an in-memory buffer. When the rewrite completes, the logs in all the buffers are written to the temporary file in one go, and the atomic rename system call is then invoked to replace the old AOF file with the new one.
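The compaction that AOF rewrite performs can be shown with a toy replay; the commands and keys are invented for the example, and a real rewrite works from the live dataset rather than by replaying the old file:

```python
# Illustrative sketch: the old AOF may log many commands per key; the
# rewritten AOF keeps only one command per key, reflecting the final state.
old_aof = [
    ("SET", "counter", "1"),
    ("SET", "counter", "2"),
    ("SET", "greeting", "hi"),
    ("SET", "counter", "3"),
]

final_state = {}
for _, key, value in old_aof:       # replay the log to get the current data
    final_state[key] = value

new_aof = [("SET", k, v) for k, v in final_state.items()]
print(len(old_aof), "->", len(new_aof))   # 4 -> 2
```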
AOF is a file-write operation whose purpose is to write the operation log to disk, so it also goes through the write workflow described above. After Redis calls write() on the AOF, the appendfsync option controls when fsync is called to flush the data to disk; the three appendfsync settings below provide progressively stronger safety.

1. appendfsync no: When appendfsync is set to no, Redis does not actively call fsync to sync the AOF log contents to disk, so everything depends on the operating system's flushing. For most Linux systems, an fsync happens every 30 seconds, writing the data in the buffer to disk.

2. appendfsync everysec: When appendfsync is set to everysec, Redis makes an fsync call every second, writing the data in the buffer to disk. But when an fsync call lasts longer than one second, Redis adopts a delayed-fsync strategy and waits one more second; that is, the next fsync is performed two seconds after the previous one, and this time it is carried out no matter how long it takes. Because the file descriptor is blocked during fsync, the current write operation blocks as well. The conclusion: in the vast majority of cases, Redis performs an fsync every second; in the worst case, an fsync is performed every two seconds. Most database systems call this operation group commit: combine the data of multiple writes and write the log to disk in one pass.

3. appendfsync always: When appendfsync is set to always, every write operation calls fsync once, so the data is safest; of course, because fsync runs every time, performance suffers.
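A minimal redis.conf fragment showing where these settings live (appendonly and appendfsync are real Redis directives; the value chosen here is just one of the three options discussed):

```
# Enable the AOF log and choose a sync policy: no | everysec | always
appendonly yes
appendfsync everysec
```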

For general business requirements, RDB persistence is recommended, because its cost is much lower than the AOF log's; for applications that cannot tolerate any data loss, the AOF log is recommended.

4. Different cluster management

Memcached is a full-memory data-buffering system; Redis, while supporting data persistence, is after all also a full-memory store, and that is the essence of its high performance. For a memory-based storage system, the size of the machine's physical memory caps the amount of data the system can hold. If the data to be processed exceeds a single machine's physical memory, a distributed cluster must be built to extend storage capacity.

Memcached itself does not support distribution, so distributed memcached can only be implemented on the client side through a distributed algorithm such as consistent hashing. The figure below shows memcached's distributed-storage implementation architecture. Before a client sends data to the memcached cluster, it first computes the data's target node using the built-in distributed algorithm, then sends the data directly to that node for storage. Likewise, when the client queries data, it computes the node where the data resides and sends a query request directly to that node to fetch the data.

Compared to memcached, which can only implement distributed storage on the client side, Redis prefers to build distributed storage on the server side. The latest version of Redis already supports distributed storage: Redis Cluster is the version of Redis that implements distribution; it has no central node, no single point of failure, and linear scalability. The figure below shows Redis Cluster's distributed-storage architecture, in which nodes communicate with each other over a binary protocol and with clients over an ASCII protocol. In its data-placement strategy, Redis Cluster divides the entire key space into 16,384 hash slots, and each node can store one or more hash slots, which means the maximum number of nodes Redis Cluster currently supports is 16,384. The distributed algorithm Redis Cluster uses is also simple: CRC16(key) % 16384.
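That slot-mapping formula can be implemented directly. Redis Cluster specifies the XModem variant of CRC-16 (polynomial 0x1021, initial value 0); the key below is a made-up example:

```python
# Sketch of Redis Cluster's slot mapping: CRC16(key) % 16384, using the
# bitwise XModem CRC-16 (poly 0x1021, init 0, no reflection).
def crc16(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: bytes) -> int:
    return crc16(key) % 16384

print(hex(crc16(b"123456789")))   # XModem check value: 0x31c3
print(key_slot(b"user:1000"))     # a slot index in [0, 16384)
```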

To ensure data availability under single points of failure, Redis Cluster introduces master nodes and slave nodes. In Redis Cluster, each master node has two corresponding slave nodes for redundancy, so that the failure of any two nodes in the whole cluster does not make data unavailable. When a master node goes offline, the cluster automatically selects one of its slave nodes to become the new master.

Resources:
http://www.redisdoc.com/en/latest/
http://memcached.org/
