Difference between redis and memcached

Source: Internet
Author: User

Both Redis and Memcached are memory-based data storage systems. Memcached is a high-performance distributed memory cache service; Redis is an open-source key-value storage system. Like Memcached, Redis keeps most of its data in memory. Its supported data types include strings, hash tables, and linked lists, together with operations on them. Let's take a look at the differences between Redis and Memcached.

Authoritative comparison

Salvatore Sanfilippo, author of Redis, once compared the two memory-based data storage systems:

  1. Redis supports server-side data operations: compared with Memcached, Redis has more data structures and supports richer operations on them. In Memcached, you usually have to fetch the data to the client, change it there, and then set it back, which greatly increases the number of network I/O operations and the volume of data transferred. In Redis, these complex operations are generally as efficient as ordinary GET/SET operations. Therefore, Redis is a good choice if the cache must support more complex structures and operations.
  2. Memory usage efficiency: for simple key-value storage, Memcached has higher memory utilization. If Redis uses the hash structure for key-value storage, its combined compression gives it higher memory utilization than Memcached.
  3. Performance: because Redis uses only a single core while Memcached can use multiple cores, Redis on average outperforms Memcached per core when storing small data. For larger values, measured in kilobytes and up, Memcached outperforms Redis. Although Redis has recently been optimized for storing large data, it is still inferior to Memcached there.

The specific reason for the above conclusion is as follows:

1. Different data types are supported

Unlike Memcached, which only supports simple key-value data records, Redis supports a much richer set of data types. The most common are String, Hash, List, Set, and Sorted Set. Redis internally uses a redisObject to represent all keys and values. The main fields of redisObject are as follows:

type indicates the specific data type of a value object, and encoding is how that data type is stored inside Redis. For example, type = string indicates that the value is an ordinary string, and the corresponding encoding can be raw or int. If it is int, Redis actually stores and represents the string as a number, provided that the string itself can be represented numerically, for example "123" or "456". The vm field is allocated memory only when the Redis virtual memory function is enabled; this function is disabled by default.
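
The relationship between type and encoding can be illustrated with a small sketch. This is not Redis source code; the RedisObject class and make_string_object helper are hypothetical names, and the rule used here (use int only when the text is a canonical integer) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RedisObject:
    type: str       # logical data type, e.g. "string"
    encoding: str   # internal storage method, e.g. "raw" or "int"
    value: object   # the stored payload


def make_string_object(text: str) -> RedisObject:
    """Choose the 'int' encoding when the string is a canonical integer, else 'raw'."""
    try:
        number = int(text)
    except ValueError:
        return RedisObject("string", "raw", text)
    # Only keep the numeric form if it converts back to exactly the same string,
    # so that values like "007" or " 12 " are not silently altered.
    if str(number) == text:
        return RedisObject("string", "int", number)
    return RedisObject("string", "raw", text)
```

With this sketch, "123" ends up with encoding int, while "hello" and "007" stay raw.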

1) String

Common commands: set/get/decr/incr/mget;

Application Scenario: String is the most common data type. Common key/value storage can be classified as this type;

Implementation Method: a String is stored in Redis as a string by default and is referenced by redisObject. When incr, decr, or a similar operation is performed, it is converted to a numeric type for calculation. In this case, the encoding field of redisObject is int.

2) Hash

Common commands: hget/hset/hgetall

Application Scenario: suppose we want to store a user information object, including the user ID, user name, age, and birthday, and then use the user ID to obtain the user's name, age, or birthday;

Implementation Method: a Redis Hash stores its Value internally as a HashMap and provides an interface for directly accessing the members of this Map. The Key is the user ID and the value is a Map. The key of the Map is the attribute name of the member, and its value is the attribute value. In this way, data can be modified and accessed directly through the internal Map key (called a field in Redis); that is, the key (user ID) plus the field (attribute name) identifies the attribute to operate on. There are currently two implementations of the HashMap. When the HashMap has few members, Redis saves memory by storing it compactly in a one-dimensional array instead of a real HashMap structure; in this case, the encoding of the value's redisObject is zipmap. When the number of members grows, it is automatically converted to a real HashMap, and the encoding becomes ht.
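
The switch between the two encodings can be sketched as follows. This is a toy model rather than the Redis implementation; the SmallHash class is a hypothetical name, and the threshold of 4 entries is an arbitrary value chosen for illustration.

```python
class SmallHash:
    """Toy hash that starts as a flat array and converts to a real hash table."""

    MAX_COMPACT_ENTRIES = 4  # hypothetical threshold, for illustration only

    def __init__(self):
        self.encoding = "zipmap"
        self._pairs = []    # flat layout: [field, value, field, value, ...]
        self._table = None  # real hash table, used once encoding == "ht"

    def hset(self, field, value):
        if self.encoding == "ht":
            self._table[field] = value
            return
        # Compact form: linear scan, cheap while there are few fields
        for i in range(0, len(self._pairs), 2):
            if self._pairs[i] == field:
                self._pairs[i + 1] = value
                return
        self._pairs += [field, value]
        if len(self._pairs) // 2 > self.MAX_COMPACT_ENTRIES:
            # Too many members: convert to a real hash table
            self._table = dict(zip(self._pairs[0::2], self._pairs[1::2]))
            self._pairs = []
            self.encoding = "ht"

    def hget(self, field):
        if self.encoding == "ht":
            return self._table.get(field)
        for i in range(0, len(self._pairs), 2):
            if self._pairs[i] == field:
                return self._pairs[i + 1]
        return None
```

The caller sees the same hset/hget interface in both states; only the encoding attribute reveals which layout is in use.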

3) List

Common commands: lpush/rpush/lpop/rpop/lrange;

Application Scenario: the Redis list has many application scenarios and is one of the most important data structures in Redis. For example, Twitter's following list and follower list can both be implemented with the Redis list structure;

Implementation Method: the Redis list is implemented as a doubly linked list, so it supports reverse lookup and traversal. This makes it more convenient to operate on, at the cost of some additional memory overhead. Many internal parts of Redis also use this data structure, including the send buffer queue.
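
The list commands above map naturally onto a double-ended queue. The sketch below uses Python's collections.deque as a stand-in for the doubly linked list; the lrange helper is a hypothetical function that imitates the inclusive start/stop range and negative indices of the LRANGE command.

```python
from collections import deque
from itertools import islice


def lrange(items, start, stop):
    """Return items[start..stop] inclusive; negative indices count from the end."""
    n = len(items)
    if start < 0:
        start = max(n + start, 0)
    if stop < 0:
        stop = n + stop
    if stop < start:
        return []
    return list(islice(items, start, stop + 1))


followers = deque()
followers.append("bob")         # like RPUSH: add at the tail
followers.append("carol")
followers.appendleft("alice")   # like LPUSH: add at the head
first = followers.popleft()     # like LPOP: remove from the head
last = followers.pop()          # like RPOP: remove from the tail
```

Pushing and popping at either end is cheap in such a structure, which is why it suits queues and timelines.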

4) Set

Common commands: sadd/spop/smembers/sunion;

Application Scenario: the functions provided by the Redis set are similar to those of the list. Its special feature is that a set deduplicates automatically. When you need to store a list of data and do not want duplicates, set is a good choice. Set also provides an important interface for determining whether a member is in the set, which list does not provide;

Implementation Method: a set is implemented internally as a HashMap whose values are always null. Duplicates are removed quickly by computing hashes, which is also why a set can determine whether a member is in the set.
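
The "HashMap whose values are always null" idea can be shown in a few lines. This is a minimal sketch, assuming a Python dict as the hash map; SimpleSet is a hypothetical class, with method names borrowed from the Redis commands sadd, sismember, and smembers.

```python
class SimpleSet:
    """Toy set built on a hash map whose values are always None."""

    def __init__(self):
        self._table = {}  # member -> None

    def sadd(self, member):
        """Add a member; return True only if it was not already present."""
        added = member not in self._table
        self._table[member] = None
        return added

    def sismember(self, member):
        """Membership test is a single hash lookup."""
        return member in self._table

    def smembers(self):
        return list(self._table)
```

Adding the same member twice leaves a single entry, because the second insert lands on the same hash map key.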

5) Sorted Set

Common commands: zadd/zrange/zrem/zcard;

Application Scenario: the sorted set is used in scenarios similar to those of set. The difference is that a set is not ordered, while a sorted set lets users supply an additional priority (score) for each member, and members are inserted in score order, that is, sorted automatically. When you need an ordered list without duplicates, you can choose the sorted set data structure. For example, Twitter's public timeline can be stored with the posting time as the score, so that queries are automatically sorted by time.

Implementation Method: the Redis sorted set uses a HashMap and a SkipList internally to store the data and keep it ordered. In the HashMap, members are mapped to scores; the skip list stores all the members, sorted by the scores saved in the HashMap. The skip list structure gives high search efficiency and is relatively simple to implement.
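
The two-structure design can be sketched as below. Note the substitution: a sorted Python list maintained with bisect stands in for the skip list, so this shows the division of labor between the two structures, not the skip list itself. SortedSet is a hypothetical class, and zrange here accepts only non-negative indices.

```python
import bisect


class SortedSet:
    """Toy sorted set: a dict for member -> score plus an ordered index."""

    def __init__(self):
        self._scores = {}   # member -> score (the HashMap role)
        self._ordered = []  # sorted (score, member) pairs (the skip list role)

    def zadd(self, member, score):
        old = self._scores.get(member)
        if old is not None:
            # Member exists: remove its old position before re-inserting
            idx = bisect.bisect_left(self._ordered, (old, member))
            del self._ordered[idx]
        self._scores[member] = score
        bisect.insort(self._ordered, (score, member))

    def zrange(self, start, stop):
        """Members from rank start to rank stop, inclusive, lowest score first."""
        return [member for _, member in self._ordered[start:stop + 1]]

    def zcard(self):
        return len(self._scores)
```

The dict makes "what is this member's score" a constant-time lookup, while the ordered index answers range queries by rank.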

2. Different memory management mechanisms

In Redis, not all data is necessarily stored in memory. This is the biggest difference from Memcached. When physical memory is used up, Redis can swap some values that have not been used for a long time to disk. Redis always keeps all keys in memory. If Redis finds that memory usage exceeds a threshold, it triggers the swap operation and uses "swappability = age * log(size_in_memory)" to calculate which keys' values should be swapped to disk. The values of these keys are then persisted to disk and cleared from memory. This feature allows Redis to hold more data than the memory of its machine. Of course, the machine's memory must still be able to hold all the keys, because keys are never swapped. In addition, while Redis swaps data to disk, the main thread that serves requests and the sub-thread that performs the swap share this part of memory, so if data being swapped is updated, Redis blocks the update until the sub-thread completes the swap.

When data is read from Redis and the value of the requested key is not in memory, Redis must load it from the swap file before returning it to the requester. This raises the question of the I/O thread pool. By default, Redis blocks: it responds only after all the required data has been loaded from the swap file. This policy suits batch operations with a small number of clients. For a large website application, however, it clearly cannot meet the needs of high concurrency. Therefore, when running Redis, we can set the size of the I/O thread pool so that read requests that need data from the swap file are handled concurrently, reducing the blocking time.
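
The swappability formula quoted above can be turned into a small ranking sketch. The function names and the sample data layout are hypothetical; the natural logarithm is an assumption, since the text does not state the base, and sizes must be greater than zero.

```python
import math


def swappability(age_seconds, size_in_memory):
    """swappability = age * log(size_in_memory); natural log assumed."""
    return age_seconds * math.log(size_in_memory)


def pick_swap_candidates(entries, count):
    """Return the `count` keys with the highest swappability.

    entries maps key -> (age_seconds, size_in_memory).
    """
    ranked = sorted(entries, key=lambda k: swappability(*entries[k]), reverse=True)
    return ranked[:count]
```

Because age is a plain multiplier while size only enters through a logarithm, an old value is favored for swapping far more strongly than a merely large one.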

For memory-based database systems such as Redis and Memcached, the efficiency of memory management is a key factor in system performance. The malloc/free functions of traditional C are the most common way to allocate and release memory, but this approach has several drawbacks. First, for developers, mismatched malloc and free calls may cause memory leaks. Second, frequent calls can leave a large number of memory fragments that cannot be recycled and reused, reducing memory utilization. Finally, because they can trigger system calls, their overhead is far greater than that of ordinary function calls. Therefore, to improve memory management efficiency, efficient memory management solutions do not use malloc/free directly. Redis and Memcached both use their own memory management mechanisms, but the implementations differ greatly. The two mechanisms are described below.

Memcached uses the Slab Allocation mechanism to manage memory by default. Its main idea is to split the allocated memory into blocks of specific lengths, based on predefined sizes, to store key-value data records of the corresponding length, and thereby completely solve the memory fragmentation problem. The Slab Allocation mechanism is designed only for storing external data. That is, all key-value data is stored in the Slab Allocation system, while Memcached's other memory requests go through ordinary malloc/free, because their number and frequency mean they will not affect the performance of the whole system. The Slab Allocation principle is quite simple. It first requests a large block of memory from the operating system, splits it into chunks of various sizes, and groups chunks of the same size into Slab Classes. The Chunk is the minimum unit used to store key-value data. The chunk size of each Slab Class can be controlled by setting the Growth Factor when Memcached is started. For example, with a Growth Factor of 1.25, if the chunk size of the first Slab Class is 88 bytes, the chunk size of the second is 112 bytes (88 × 1.25 = 110, rounded up to an 8-byte boundary).
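
The way chunk sizes grow from one Slab Class to the next can be sketched as follows. The function name is hypothetical, and rounding each size up to a multiple of 8 bytes is an assumption used here to reproduce the 88 to 112 step from the text.

```python
def slab_chunk_sizes(first_size, growth_factor, count, align=8):
    """Return the chunk sizes of the first `count` slab classes.

    Each size is the previous one times growth_factor, rounded up
    to a multiple of `align` bytes (assumed alignment).
    """
    sizes = [first_size]
    while len(sizes) < count:
        size = int(sizes[-1] * growth_factor)
        if size % align:
            size += align - size % align
        sizes.append(size)
    return sizes
```

A smaller growth factor produces more, finer-grained classes and less wasted space per item, at the cost of more classes to manage.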

When Memcached receives data sent from a client, it first selects the most appropriate Slab Class based on the size of the data, and then finds a Chunk that can store the data by querying the list of free chunks that Memcached keeps for that Slab Class. When a data record expires or is discarded, the chunk it occupied can be recycled and added back to the free list.

From the above process, we can see that Memcached's memory management is highly efficient and does not cause memory fragmentation, but its biggest drawback is that it wastes space. Because each Chunk is allocated a memory space of a specific length, variable-length data cannot make full use of the space. For example, when 100 bytes of data are cached in a 128-byte Chunk, the remaining 28 bytes are wasted.
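
The selection step and the resulting waste can be sketched in a few lines. The pick_chunk helper and the chunk sizes in the usage note are hypothetical; the sketch only models "choose the smallest chunk that fits" and ignores per-item header overhead.

```python
import bisect


def pick_chunk(chunk_sizes, item_size):
    """Choose the smallest chunk that fits the item.

    chunk_sizes must be sorted in ascending order.
    Returns (chunk_size, wasted_bytes).
    """
    idx = bisect.bisect_left(chunk_sizes, item_size)
    if idx == len(chunk_sizes):
        raise ValueError("item is larger than the largest chunk")
    chunk = chunk_sizes[idx]
    return chunk, chunk - item_size
```

With chunk sizes of 64, 128, and 256 bytes, a 100-byte item lands in the 128-byte chunk and wastes 28 bytes, matching the example in the text.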

Redis memory management is mainly implemented in the zmalloc.h and zmalloc.c files of the source code. To facilitate memory management, after allocating a block of memory Redis stores the size of the block in the block's header. real_ptr is the pointer returned when Redis calls malloc. Redis stores the size of the memory block in the header; the header occupies a known amount of memory, the length of a size_t, and Redis then returns ret_ptr, which points just past the header. When the memory needs to be released, ret_ptr is passed to the memory management code. From ret_ptr, the program can easily calculate real_ptr and then pass real_ptr to free to release the memory.

Redis records all memory allocations in an array whose length is ZMALLOC_MAX_ALLOC_STAT. Each element holds the number of currently allocated memory blocks whose size equals the element's index. In the source code, this array is zmalloc_allocations; zmalloc_allocations[16] is the number of allocated 16-byte memory blocks. The static variable used_memory in zmalloc.c records the total size of allocated memory. So, in general, Redis simply wraps malloc/free, which is much simpler than Memcached's memory management method.
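
The header scheme and the used_memory counter can be modeled with a toy heap. This is a simulation in Python rather than the C code in zmalloc.c: ToyHeap is a hypothetical class, pointers are offsets into a bytearray, and a simple bump pointer stands in for malloc.

```python
import struct

PREFIX_SIZE = struct.calcsize("N")  # bytes in a C size_t on this platform


class ToyHeap:
    """Toy model of a size-prefixed allocator; 'pointers' are buffer offsets."""

    def __init__(self, capacity):
        self._mem = bytearray(capacity)
        self._next = 0         # bump pointer standing in for malloc
        self.used_memory = 0   # total bytes currently allocated, headers included

    def zmalloc(self, size):
        real_ptr = self._next
        total = size + PREFIX_SIZE
        if real_ptr + total > len(self._mem):
            raise MemoryError("toy heap exhausted")
        # Store the block size in the header at real_ptr
        struct.pack_into("N", self._mem, real_ptr, size)
        self._next += total
        self.used_memory += total
        return real_ptr + PREFIX_SIZE  # ret_ptr: just past the header

    def zfree(self, ret_ptr):
        # Step back over the header to recover real_ptr and the block size
        real_ptr = ret_ptr - PREFIX_SIZE
        (size,) = struct.unpack_from("N", self._mem, real_ptr)
        self.used_memory -= size + PREFIX_SIZE
        # A real allocator would now pass real_ptr to free(); this toy never reuses space
```

Because the size travels with the block, zfree needs only the pointer to keep used_memory accurate.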

3. Data Persistence support

Although Redis is a memory-based storage system, it supports the persistence of memory data and provides two main persistence policies: RDB snapshot and AOF log. Memcached does not support data persistence.

1) RDB Snapshot

Redis supports a persistence mechanism that saves a snapshot of the current data into a data file, that is, the RDB snapshot. But how does one generate a snapshot of a database that is continuously being written? Redis uses the copy-on-write mechanism of the fork system call. When a snapshot is generated, the current process forks a child process, and the child iterates over all the data and writes it to an RDB file. We can use the Redis save directive to configure when RDB snapshots are generated. For example, you can configure a snapshot to be generated within 10 minutes, or after 1000 writes, and multiple rules can be combined. These rules are defined in the Redis configuration file. You can also use the Redis CONFIG SET command to set rules while Redis is running, without restarting it.

The Redis RDB file will not be corrupted, because the write is performed in a new process. When a new RDB file is generated, the child process first writes the data to a temporary file and then renames the temporary file to the RDB file through an atomic rename system call. As a result, no matter when a fault occurs, the Redis RDB file is always usable. The RDB file is also part of the internal implementation of Redis master-slave synchronization. RDB has its own shortcoming: once the database fails, the data in the RDB file is not fully up to date, and everything written between the last RDB file generation and the Redis downtime is lost. In some businesses, this is tolerable.
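
How combined snapshot rules might be evaluated can be sketched as follows. In redis.conf a rule is written as `save <seconds> <changes>` and pairs a time window with a change count; the function below and the sample rule values are hypothetical, and the exact comparison operators are an assumption.

```python
def should_snapshot(save_rules, seconds_since_last_save, changes_since_last_save):
    """Return True if any (seconds, changes) rule is satisfied.

    A rule fires when at least `seconds` have elapsed since the last
    snapshot and at least `changes` writes have happened in that time.
    """
    return any(
        seconds_since_last_save >= seconds and changes_since_last_save >= changes
        for seconds, changes in save_rules
    )
```

For example, with the hypothetical rules [(600, 1000), (60, 10000)], a snapshot is due after 10 minutes with 1000 writes, or after only 1 minute if 10000 writes have accumulated.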
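
The write-to-temp-then-rename pattern can be sketched in a few lines. This is a generic illustration of the technique rather than Redis code: the function name and the temporary file naming are hypothetical, and os.replace is used as the atomic rename.

```python
import os
import tempfile


def write_snapshot_atomically(path, data: bytes):
    """Write data to a temp file in the same directory, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="temp-", suffix=".rdb")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk before renaming
        os.replace(tmp_path, path)  # atomic: readers see the old file or the new one
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)     # do not leave a half-written temp file behind
        raise
```

The temporary file is created in the same directory as the target so that the rename stays within one file system, which is what makes it atomic.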

2) AOF log

The full name of the AOF log is append only file; it is a log file that is only appended to. Unlike the binlog of a typical database, the AOF file is readable plain text, and its content is a series of standard Redis commands. Only commands that may modify data are appended to the AOF file. Because every command that modifies data generates a log entry, the AOF file grows larger and larger. Therefore, Redis provides another function called AOF rewrite, which regenerates the AOF file. In the new AOF file, each record is written by a single operation, unlike the old file, where operations on the same value may be recorded multiple times. The generation process is similar to that of RDB: Redis forks a process that traverses the data directly and writes it to a new temporary AOF file. While the new file is being written, all write operation logs are still written to the old AOF file and are also recorded in a memory buffer. When the rewrite finishes, all logs in the buffer are written to the temporary file in one go. Then the atomic rename call replaces the old AOF file with the new one.

AOF is a file write operation whose purpose is to write operation logs to disk, so it is subject to the operating system's normal write path. After Redis calls write on the AOF file, the appendfsync option controls when fsync is called to flush the data to disk. The three appendfsync settings below provide progressively stronger data safety.
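
The effect of a rewrite can be illustrated with a toy command log. This sketch handles only three string commands (SET, INCR, DEL) and uses hypothetical helper names; it shows why the rewritten log is shorter, not how Redis serializes its data types.

```python
def replay(commands):
    """Apply a command log to an empty store and return the resulting data."""
    state = {}
    for cmd, key, *args in commands:
        if cmd == "SET":
            state[key] = args[0]
        elif cmd == "INCR":
            state[key] = str(int(state.get(key, "0")) + 1)
        elif cmd == "DEL":
            state.pop(key, None)
        else:
            raise ValueError(f"unsupported command: {cmd}")
    return state


def rewrite_from_data(state):
    """Traverse the current data and emit exactly one command per key."""
    return [("SET", key, value) for key, value in state.items()]
```

A log that sets a counter, increments it twice, and creates and deletes a temporary key collapses to a single SET, and replaying the new log rebuilds the same data.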

  • appendfsync no: when appendfsync is set to no, Redis does not call fsync itself to synchronize the AOF log to disk, so flushing depends entirely on operating system scheduling. On most Linux systems, an fsync is performed every 30 seconds to write the buffered data to disk.
  • appendfsync everysec: when appendfsync is set to everysec, Redis calls fsync once per second by default to write the buffered data to disk. However, when an fsync call takes longer than 1 second, Redis delays the next fsync and waits one more second, that is, it performs fsync two seconds later. That fsync is executed no matter how long it takes, and because the file descriptor is blocked during fsync, the current write operation is blocked as well. The conclusion is that in most cases Redis performs fsync every second, and in the worst case once every two seconds. Most database systems call this operation group commit: it combines the data of multiple write operations and writes the logs to disk in one go.
  • appendfsync always: when appendfsync is set to always, fsync is called for every write operation. This is the safest setting for data. Of course, because fsync is executed every time, performance is affected.
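
The three policies can be compared with a simplified append-only log. This is a single-threaded sketch under stated assumptions: AppendOnlyLog is a hypothetical class, the everysec policy is checked only when a command is appended, and the delayed-fsync behavior described above is not modeled.

```python
import os
import time


class AppendOnlyLog:
    """Toy append-only log with an appendfsync-style flush policy."""

    def __init__(self, path, appendfsync="everysec"):
        if appendfsync not in ("no", "everysec", "always"):
            raise ValueError("appendfsync must be 'no', 'everysec', or 'always'")
        self._file = open(path, "ab")
        self._policy = appendfsync
        self._last_fsync = time.monotonic()
        self.fsync_calls = 0  # counter kept only so the policies can be compared

    def append(self, command: bytes):
        self._file.write(command)
        self._file.flush()  # hand the bytes to the OS, like write()
        now = time.monotonic()
        due = self._policy == "always" or (
            self._policy == "everysec" and now - self._last_fsync >= 1.0
        )
        if due:
            os.fsync(self._file.fileno())  # force the OS buffer to disk
            self._last_fsync = now
            self.fsync_calls += 1

    def close(self):
        self._file.close()
```

With always, every append pays for an fsync; with no, the sketch never calls fsync and leaves flushing to the operating system.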

For general business requirements, we recommend that you use RDB for persistence because RDB overhead is much lower than AOF logs. For applications that cannot bear data loss, we recommend that you use AOF logs.

4. Differences in cluster management

Memcached is a fully in-memory data cache system. Although Redis supports data persistence, keeping everything in memory is the essence of its high performance. For a memory-based storage system, the physical memory size of the machine is the maximum amount of data the system can hold. If the amount of data to be processed exceeds the physical memory of a single machine, you need to build a distributed cluster to expand the storage capacity.

Memcached itself does not support distributed storage. Therefore, distributed storage for Memcached can only be implemented on the client, using distributed algorithms such as consistent hashing. Before the client sends data to the Memcached cluster, it uses its built-in distributed algorithm to calculate the target node for the data and then sends the data directly to that node for storage. Likewise, when the client queries data, it calculates the node where the data is located and then sends the query request directly to that node.
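
Client-side consistent hashing can be sketched as follows. This is a generic illustration rather than the algorithm of any particular Memcached client: the class name, the node addresses, the use of SHA-256, and the number of virtual points per node are all assumptions.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Toy consistent hash ring with virtual points for each node."""

    def __init__(self, nodes, replicas=100):
        self._ring = []  # (point, node) pairs
        for node in nodes:
            for i in range(replicas):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(text):
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key):
        """Walk clockwise from the key's hash to the next node point."""
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

The client uses node_for both when storing and when querying a key. When a node is removed, only the keys that lived on that node move elsewhere; keys on the remaining nodes keep their placement.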

Compared with Memcached, which can only implement distributed storage on the client, Redis prefers to build distributed storage on the server side. The latest version of Redis supports distributed storage. Redis Cluster is an advanced version of Redis that implements distribution and tolerates single points of failure (SPOF). It has no central node and scales linearly. In Redis Cluster, nodes communicate with each other through a binary protocol, and nodes communicate with clients through an ASCII protocol. For data placement, Redis Cluster divides the entire key space into 16384 hash slots (4096 in early design drafts), and each node can store one or more hash slots; that is, the maximum number of nodes Redis Cluster can support equals the number of hash slots. The distributed algorithm used by Redis Cluster is also simple: crc16(key) % HASH_SLOTS_NUMBER.
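
The slot formula crc16(key) % HASH_SLOTS_NUMBER can be sketched with Python's standard library. The assumption here is that the CRC16 variant is the CCITT/XMODEM one, which binascii.crc_hqx computes when started from 0; the function name is hypothetical, the slot count is passed in as a parameter (released Redis Cluster versions use 16384), and hash tags are not handled.

```python
import binascii


def key_hash_slot(key: bytes, hash_slots_number: int) -> int:
    """Map a key to a hash slot: crc16(key) % HASH_SLOTS_NUMBER."""
    crc = binascii.crc_hqx(key, 0)  # CRC16-CCITT (XMODEM), initial value 0
    return crc % hash_slots_number
```

Every client and node that applies the same function to a key arrives at the same slot, so a request can be routed to the node that owns that slot without a central directory.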

To keep data available when a single point fails, Redis Cluster introduces Master nodes and Slave nodes. In Redis Cluster, each Master node has two Slave nodes for redundancy, so the downtime of any two nodes in the cluster will not make data unavailable. When a Master node goes down, the cluster automatically elects one of its Slave nodes to become the new Master node.

From: http://h2ex.com/1223

Address: http://www.linuxprobe.com/redisVSmemcached.html

