Salvatore Sanfilippo, the author of Redis, has previously compared these two memory-based data storage systems:
- Redis supports server-side data operations: Redis has more data structures and supports richer data operations than Memcached. With Memcached, you usually have to fetch the data to the client, make the modification there, and set it back, which greatly increases the number of network I/Os and the volume of data transferred. In Redis, these complex operations are usually as efficient as a plain get/set. So if you need a cache that supports more complex structures and operations, Redis is a good choice.
- Memory usage efficiency: for simple key-value storage, Memcached has the higher memory utilization; but if Redis uses hash structures for key-value storage, its memory utilization will be higher than Memcached's, thanks to its compact combined encoding.
- Performance: since Redis uses only a single core while Memcached can use multiple cores, on average Redis performs better than Memcached per core when storing small data. For data over 100 KB, Memcached outperforms Redis; and although Redis has recently been optimized for storing large values, it still lags slightly behind Memcached.
Why do these conclusions hold? Here is the information that has been collected:
1. Different data type support
Unlike Memcached, which only supports records with a simple key-value structure, Redis supports a much richer set of data types. The five most commonly used are String, Hash, List, Set, and Sorted Set. Internally, Redis uses a redisObject structure to represent every key and value. The most important fields of redisObject are:
type indicates which data type a value object is, and encoding indicates how that data type is stored inside Redis. For example, type=string means the value is an ordinary string; the corresponding encoding can then be raw or int, where int means Redis actually stores and represents the value as a number, assuming the string itself can be represented numerically, such as "123" or "456". The vm field only has memory allocated for it once Redis's virtual memory feature is turned on; this feature is off by default.
1) String
- Common commands: SET/GET/DECR/INCR/MGET, etc.;
- Application scenario: String is the most commonly used data type, and ordinary key/value storage falls into this category;
- Implementation: inside Redis, a string is stored as a character string by default, referenced by a redisObject. When it encounters operations such as INCR or DECR, it is converted to a numeric type for the calculation, and the redisObject's encoding field becomes int.
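The raw-versus-int decision described above can be sketched roughly as follows; the helper name `encoding_for` is illustrative, not a Redis API:

```python
def encoding_for(value: str) -> str:
    """Roughly mimic how Redis picks the encoding of a string value:
    values that parse as integers can be stored as 'int', others as 'raw'."""
    try:
        int(value)           # integer-like strings can be stored numerically
        return "int"
    except ValueError:
        return "raw"

print(encoding_for("123"))    # int
print(encoding_for("hello"))  # raw
```

Note that a value like "12.5" is not an integer and stays raw; Redis's real check is similarly strict.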
2) Hash
- Common commands: HGET/HSET/HGETALL, etc.;
- Application scenario: we want to store a user-information object that includes the user ID, user name, age, and birthday, and to fetch the user's name, age, or birthday by user ID;
- Implementation: a Redis Hash is internally a HashMap, and Redis provides an interface for accessing that map's members directly. Here the key is the user ID and the value is a map whose keys are the attribute names and whose values are the attribute values. The data can be modified and read directly through the internal map's key (which Redis calls a field); that is, key (user ID) + field (attribute name) is enough to manipulate the corresponding attribute data. There are currently two ways this HashMap is implemented: when it has relatively few members, Redis packs them into something like a one-dimensional array to save memory rather than using a real HashMap structure, and the redisObject's encoding is then zipmap; when the number of members grows, it is automatically converted into a real HashMap, and the encoding becomes ht.
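The zipmap-to-ht conversion can be modeled with a toy class; the class, its threshold of 64, and the flat-list layout here are illustrative stand-ins, not Redis's actual code or default limits:

```python
# Toy model of Redis's two hash encodings: a compact flat list
# ("zipmap"-like) for small hashes, upgraded to a real dict ("ht")
# once the member count passes a threshold.
class SmallHash:
    THRESHOLD = 64   # illustrative; Redis's real limit is configurable

    def __init__(self):
        self.encoding = "zipmap"
        self._flat = []            # [field1, value1, field2, value2, ...]
        self._dict = None

    def hset(self, field, value):
        if self.encoding == "zipmap":
            for i in range(0, len(self._flat), 2):
                if self._flat[i] == field:
                    self._flat[i + 1] = value
                    return
            self._flat += [field, value]
            if len(self._flat) // 2 > self.THRESHOLD:
                # too many members: convert to a real hash table
                self._dict = dict(zip(self._flat[0::2], self._flat[1::2]))
                self._flat = None
                self.encoding = "ht"
        else:
            self._dict[field] = value

    def hget(self, field):
        if self.encoding == "zipmap":
            for i in range(0, len(self._flat), 2):
                if self._flat[i] == field:
                    return self._flat[i + 1]
            return None
        return self._dict.get(field)

h = SmallHash()
h.hset("name", "alice")
print(h.encoding)   # zipmap
```

The flat encoding trades O(n) lookups for much lower per-entry memory overhead, which pays off while n is small.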
3) List
- Common commands: LPUSH/RPUSH/LPOP/RPOP/LRANGE, etc.;
- Application scenario: the Redis List has a great many uses and is one of the most important Redis data structures; for example, Twitter's following list and follower list can both be implemented with Redis's List structure;
- Implementation: the Redis List is implemented as a doubly linked list, which supports reverse lookup and traversal and makes these operations convenient, at the cost of some extra memory overhead. Many internal parts of Redis, including the send buffer queue, also use this data structure.
4) Set
- Common commands: SADD/SPOP/SMEMBERS/SUNION, etc.;
- Scenario: the functionality a Redis Set provides is similar to a List's, except that a Set deduplicates automatically. When you need to store a list of data and don't want duplicates, a Set is a good choice. A Set also provides an important interface for testing whether a member is inside a given set, which a List does not offer;
- Implementation: a Set is internally a HashMap whose values are always null. It deduplicates quickly by hashing, which is also how a Set can test whether a member is in the set.
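This null-valued-HashMap idea is small enough to sketch directly; `ToySet` is a hypothetical illustration, not Redis code:

```python
# Toy model of a Redis Set: a hash map whose values are all None.
# Both deduplication and membership tests come from hashing the key.
class ToySet:
    def __init__(self):
        self._map = {}            # member -> None

    def sadd(self, member):
        added = member not in self._map
        self._map[member] = None  # value is irrelevant; only the key matters
        return added              # like SADD, reports whether member was new

    def sismember(self, member):
        return member in self._map

s = ToySet()
print(s.sadd("x"), s.sadd("x"))   # True False
```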
5) Sorted Set
- Common commands: ZADD/ZRANGE/ZREM/ZCARD, etc.;
- Scenario: the usage scenario of a Redis Sorted Set is similar to a Set's, except that a Set is not automatically ordered, while a Sorted Set lets the user rank members by supplying an extra priority parameter (the score) and keeps them in sorted order automatically on insertion. When you need an ordered, non-repeating list of members, choose the Sorted Set structure. For example, Twitter's public timeline can be stored with the publication time as the score, so it is automatically sorted by time.
- Implementation: internally, a Redis Sorted Set uses a HashMap and a skip list (skiplist) to keep the data stored and ordered: the HashMap maps members to their scores, while the skip list stores all the members, sorted by the scores held in the HashMap. This skip-list structure yields fairly high search efficiency, and the implementation is relatively simple.
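The two-structure design can be sketched with a toy class; here a bisect-sorted list stands in for the skip list Redis actually uses, and the class itself is illustrative:

```python
import bisect

# Toy Sorted Set: a dict maps member -> score (the "HashMap" role), and a
# list kept sorted by (score, member) plays the skip list's role.
class ToyZSet:
    def __init__(self):
        self.scores = {}          # member -> score
        self.sorted = []          # [(score, member), ...], always sorted

    def zadd(self, score, member):
        if member in self.scores:                     # re-score: remove old entry
            self.sorted.remove((self.scores[member], member))
        self.scores[member] = score
        bisect.insort(self.sorted, (score, member))   # keep order on insert

    def zrange(self, start, stop):
        # stop is inclusive and non-negative in this toy, as in ZRANGE 0 2
        return [m for _, m in self.sorted[start:stop + 1]]

z = ToyZSet()
for score, m in [(3, "c"), (1, "a"), (2, "b")]:
    z.zadd(score, m)
print(z.zrange(0, 2))   # ['a', 'b', 'c']
```

The real skip list gives O(log n) insertion and range queries without the O(n) `remove` this sketch pays on re-scoring.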
2. Different memory management mechanisms
In Redis, not all data is stored in memory all the time; this is one of the biggest differences compared to Memcached. When physical memory runs out, Redis can swap values that have not been used for a long time out to disk. Redis always caches all key information in memory; when it finds that memory usage exceeds a certain threshold, it triggers a swap operation and uses the formula swappability = age * log(size_in_memory) to calculate which keys' values should be swapped to disk. The values corresponding to those keys are then persisted to disk and purged from memory. This feature lets Redis hold more data than its machine's own memory size; the machine's memory must, of course, still be able to hold all the keys, since keys are never swapped out.

Also, because Redis swaps in-memory data to disk, the main thread that serves requests and the sub-thread performing the swap share this memory; so if data that is being swapped is updated, Redis blocks the operation until the sub-thread finishes the swap, and only then can the modification proceed.

When reading data from Redis, if the value of the key being read is not in memory, Redis has to load it from the swap file before returning it to the requester, and this raises the question of the I/O thread pool. By default Redis blocks, i.e. it finishes loading all the needed swap files before responding. This strategy is appropriate when clients are few and operations are batched, but if Redis is used in a large website application, it clearly cannot satisfy highly concurrent scenarios. So Redis lets us set the size of the I/O thread pool and serves read requests that need to load values from the swap file concurrently, reducing the blocking time.
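The swap-selection heuristic quoted above can be evaluated directly; older and larger values score higher and are swapped to disk first:

```python
import math

# swappability = age * log(size_in_memory), as quoted above.
def swappability(age_seconds: float, size_in_memory: int) -> float:
    return age_seconds * math.log(size_in_memory)

# A value untouched for an hour and 1 MB large beats a fresh small one.
old_big  = swappability(3600, 1_000_000)
new_tiny = swappability(5, 64)
print(old_big > new_tiny)   # True
```

Note how the score is linear in age but only logarithmic in size, so staleness dominates the decision.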
For memory-based database systems such as Redis and Memcached, the efficiency of memory management is a key factor affecting system performance. The malloc/free pair of traditional C is the most common way to allocate and release memory, but this approach has major flaws: first, mismatched malloc and free calls make memory leaks easy for developers to introduce; second, frequent calls create large amounts of memory fragmentation that cannot be reclaimed and reused, reducing memory utilization; and finally, since these calls can trigger system calls, their overhead is much larger than that of ordinary function calls. Therefore, to improve memory-management efficiency, efficient memory-management schemes do not use malloc/free calls directly. Redis and Memcached both use memory-management mechanisms of their own design, but their implementations differ greatly; each is described separately below.
Memcached uses the slab allocation mechanism to manage memory by default. Its main idea is to split the allocated memory into blocks of specific, predetermined lengths to store key-value data records of the corresponding lengths, which completely solves the memory-fragmentation problem. The slab allocation mechanism is designed only for storing external data, which means all key-value data is stored in the slab allocation system, while Memcached's other memory requests go through ordinary malloc/free, because the number and frequency of those requests mean they do not affect the performance of the system as a whole. The principle of slab allocation is quite simple: it first requests a large chunk of memory from the operating system, splits it into chunks of various sizes, and groups chunks of the same size into slab classes, where a chunk is the smallest unit used to store key-value data. The size of each slab class can be controlled by the growth factor specified when Memcached is started. Assuming the growth factor in the figure is 1.25, if the first group's chunk size is 88 bytes, the second group's chunk size is 112 bytes, and so on.
When Memcached receives data sent by a client, it first selects the most suitable slab class based on the size of the received data, and then finds a chunk that can store the data by looking up the list of free chunks that Memcached keeps for that slab class. When a record expires or is discarded, the chunk it occupied can be reclaimed and re-added to the free list. From this process we can see that Memcached's memory-management system is efficient and causes no memory fragmentation, but its biggest drawback is that it wastes space: because each chunk is allocated a memory area of a specific length, variable-length data cannot make full use of that space. For example, caching 100 bytes of data into a 128-byte chunk wastes the remaining 28 bytes.
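Both steps, growing the class sizes and picking the smallest class that fits, can be sketched as below. The 8-byte alignment and the base size of 88 match the example above, but the exact class list depends on Memcached's startup parameters, so with these particular classes 100 bytes lands in a 112-byte chunk rather than the 128-byte one in the text's example:

```python
# Sketch of Memcached slab-class sizing: chunk sizes start at a base size
# and grow by the growth factor, rounded up to 8-byte alignment.
def slab_classes(base=88, factor=1.25, n=8, align=8):
    sizes, size = [], base
    for _ in range(n):
        sizes.append(size)
        # next class: multiply by the factor, round up to a multiple of align
        size = -(-int(size * factor) // align) * align
    return sizes

def pick_class(sizes, item_size):
    """Select the smallest chunk size that fits; the leftover is wasted."""
    for s in sizes:
        if s >= item_size:
            return s, s - item_size
    raise ValueError("item too large for any slab class")

sizes = slab_classes()
print(sizes[:3])            # [88, 112, 144]
chunk, waste = pick_class(sizes, 100)
print(chunk, waste)         # 112 12
```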
Redis's memory management is implemented mainly in two source files, zmalloc.h and zmalloc.c. To make memory management easier, Redis stores the size of each memory block in the block's head. Here real_ptr is the pointer returned after Redis calls malloc; Redis stores the block's size into the head (size occupies a known amount of memory, being of type size_t) and then returns ret_ptr. When the memory needs to be freed, ret_ptr is passed to the memory manager; from ret_ptr, the program can easily compute the value of real_ptr and then pass real_ptr to free to release the memory.
Redis records all memory allocations by defining an array of length ZMALLOC_MAX_ALLOC_STAT. Each element of the array records the number of memory blocks of that size allocated by the current program, with the block's size being the element's index. In the source code this array is zmalloc_allocations; zmalloc_allocations[16], for example, is the number of 16-byte memory blocks that have been allocated. zmalloc.c also has a static variable, used_memory, that records the total amount of memory currently allocated. On the whole, then, Redis uses a simple wrapper around malloc/free, which is much simpler than Memcached's memory-management approach.
3. Data persistence support
Although Redis is a memory-based storage system, it natively supports persisting in-memory data and provides two main persistence strategies: RDB snapshots and AOF logs. Memcached, by contrast, does not support data persistence at all.
1) RDB snapshots
Redis supports persisting a snapshot of the current data into a data file: an RDB snapshot. But how does a continuously written database generate a snapshot? Redis relies on the copy-on-write mechanism of the fork system call: when a snapshot is generated, the current process forks a child process, the child then loops over all the data, and the data is written to an RDB file. We can configure when RDB snapshots are generated with Redis's save directive, for example one snapshot every 10 minutes, or a snapshot after every 1000 writes, or several rules combined. These rules are defined in the Redis configuration file, and they can also be set while Redis is running with the CONFIG SET command, without restarting Redis.
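For reference, the stock redis.conf expresses such rules as save lines; the three below are the well-known shipped defaults:

```conf
# after 900 sec (15 min) if at least 1 key changed
save 900 1
# after 300 sec (5 min) if at least 10 keys changed
save 300 10
# after 60 sec if at least 10000 keys changed
save 60 10000
```

At runtime the same rules can be changed with, for example, `CONFIG SET save "900 1 300 10"`.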
Redis's RDB file does not get corrupted, because its writes are performed in a new process: when a new RDB file is generated, the child process spawned by Redis first writes the data to a temporary file and then renames the temporary file to the RDB file through an atomic rename system call. This way, whenever a failure occurs, Redis's RDB file is always available. At the same time, the RDB file is also part of the implementation of Redis master-slave synchronization. RDB does have its shortcoming: once the database has a problem, the data saved in our RDB file is not fully up to date, and everything from the time the last RDB file was generated until Redis went down is lost. In some businesses this is tolerable.
2) AOF log
The full name of the AOF log is append-only file; it is an append-only log file. Unlike a typical database's binlog, the AOF file is plain, recognizable text, and its contents are standard Redis commands. Only commands that cause data changes are appended to the AOF file. Since every data-modifying command generates a log entry, the AOF file keeps growing, so Redis provides a feature called AOF rewrite. Its job is to regenerate an AOF file in which each record appears only once, unlike the old file, where multiple operations on the same value may be recorded. Its generation process is similar to an RDB's: Redis forks a process that traverses the data directly and writes a new temporary AOF file. While the new file is being written, all write logs are still appended to the old AOF file and are also recorded in an in-memory buffer. When the rewrite completes, the logs from the buffer are written to the temporary file in one go, and then an atomic rename call replaces the old AOF file with the new one.
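Since the AOF file is just Redis commands serialized in the text-based RESP protocol, its contents are easy to reproduce; the encoder below sketches how a single SET appears in the file:

```python
def resp_encode(*args: str) -> bytes:
    """Encode one command as it appears in an AOF file: a RESP array of
    bulk strings (*<argc>, then $<len> and the bytes of each argument)."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)

print(resp_encode("SET", "foo", "bar"))
# b'*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n'
```

Because each entry is length-prefixed like this, Redis can replay the file command by command on restart.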
AOF is a file-write operation whose purpose is to write the operation log to disk, so it also goes through the write-flow process described above. After Redis calls write() to append to the AOF, the appendfsync option controls when fsync is called to flush the data to disk; across the three appendfsync settings below, the safety guarantee grows progressively stronger.
- appendfsync no: when appendfsync is set to no, Redis does not actively call fsync to sync the AOF log contents to disk, so it all depends on the operating system's scheduling. For most Linux operating systems, an fsync happens every 30 seconds or so, writing the buffered data to disk.
- appendfsync everysec: when appendfsync is set to everysec, Redis by default makes one fsync call every second to write the data in the buffer to disk. But when an fsync call lasts longer than 1 second, Redis adopts a deferred-fsync policy and waits one more second; that is, the fsync is performed two seconds later, and this time it is carried out no matter how long it takes. Because the file descriptor is blocked during fsync, the current write operation blocks too. So the conclusion is that in the vast majority of cases Redis fsyncs every second, and in the worst case one fsync operation is performed every two seconds. This is what most database systems call group commit: several write operations are combined, and their logs are written to disk in a single pass.
- appendfsync always: when appendfsync is set to always, fsync is called once for every write operation; the data is then at its safest. Of course, since fsync is executed every time, performance is affected accordingly.
For ordinary business requirements, it is recommended to use RDB for persistence, because RDB's overhead is much lower than the AOF log's; for applications that cannot tolerate losing any data, the AOF log is recommended.
4. Different cluster management mechanisms
Memcached is an all-in-memory data buffering system, and although Redis supports data persistence, being all in memory is still the essence of its high performance. For a memory-based storage system, the size of the machine's physical memory is the maximum amount of data the system can hold. If the amount of data to be handled exceeds the physical memory of a single machine, a distributed cluster must be built to extend storage capacity.
Memcached itself does not support distribution, so Memcached's distributed storage can only be implemented on the client side with a distributed algorithm such as consistent hashing. The figure presents a distributed-storage implementation architecture for Memcached: before the client sends data to the Memcached cluster, it first computes the data's target node through the built-in distributed algorithm, then sends the data directly to that node for storage. Likewise, when the client queries data, it computes the node where the data resides and sends the query request directly to that node to fetch the data.
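The client-side consistent hashing described above can be sketched minimally: each node is hashed onto a ring (with virtual replicas to even out the load), and a key goes to the first node clockwise from its own hash. The node addresses and replica count are illustrative:

```python
import bisect
import hashlib

# Minimal consistent-hash ring for client-side Memcached sharding.
class HashRing:
    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self.ring = []                      # sorted [(hash, node), ...]
        for node in nodes:
            self.add(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        # place `replicas` virtual points for this node on the ring
        for i in range(self.replicas):
            bisect.insort(self.ring, (self._hash("%s#%d" % (node, i)), node))

    def node_for(self, key: str):
        # first ring point clockwise from the key's hash (wrapping around)
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"])
print(ring.node_for("user:42"))     # always the same node for this key
```

The point of the ring is that adding or removing one node only remaps the keys adjacent to its points, instead of reshuffling everything as plain `hash(key) % n` would.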
Compared with Memcached, which can only implement distributed storage on the client side, Redis leans toward building distributed storage on the server side. The latest version of Redis already supports distributed-storage capabilities: Redis Cluster is an advanced version of Redis that is distributed, tolerates single points of failure, has no central node, and scales linearly. The figure gives Redis Cluster's distributed-storage architecture, in which nodes communicate with one another over a binary protocol, while nodes and clients communicate over an ASCII protocol. For data placement, Redis Cluster divides the entire numeric key space into 16,384 hash slots, and each node can hold one or more hash slots, which means the maximum number of nodes Redis Cluster currently supports is 16,384. The distributed algorithm Redis Cluster uses is also simple: CRC16(key) % HASH_SLOTS_NUMBER.
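Redis Cluster's released implementation uses 16,384 slots and the XMODEM variant of CRC16 (polynomial 0x1021, initial value 0); the slot mapping can be sketched as:

```python
# Sketch of Redis Cluster's key-to-slot mapping: CRC16 of the key,
# modulo the number of hash slots.
HASH_SLOTS = 16384

def crc16(data: bytes) -> int:
    """Bitwise CRC-16/XMODEM: poly 0x1021, init 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def keyslot(key: str) -> int:
    return crc16(key.encode()) % HASH_SLOTS

print(crc16(b"123456789") == 0x31C3)  # True: the standard XMODEM check value
print(0 <= keyslot("foo") < HASH_SLOTS)
```

(This sketch omits hash tags: real Redis Cluster hashes only the `{...}` portion of a key when one is present, so related keys can be forced into the same slot.)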
To ensure data availability under single points of failure, Redis Cluster introduces master nodes and slave nodes: in Redis Cluster, each master node has two corresponding slave nodes for redundancy, so an outage of any two nodes in the whole cluster does not make data unavailable. When a master node quits, the cluster automatically chooses one of its slave nodes to become the new master.
Resources:
- http://www.redisdoc.com/en/latest/
- http://memcached.org/
A detailed explanation of the differences between Redis and Memcached