Mention Redis and memcached comes to mind, and vice versa. Developers who have used both usually share a general impression: Redis offers more than memcached. Beyond simple key-value pairs it provides storage for data structures such as list, set, zset, and hash; it supports data backup through master-slave replication; and it supports persistence, so in-memory data can be written to disk and reloaded after a restart. Does that mean Redis simply outclasses memcached? Not necessarily; each exists for a reason, so let's compare them on several points.
Network IO Model
Memcached uses a multi-threaded, non-blocking IO multiplexing network model, split into a main thread and worker sub-threads. The listening thread accepts network connections and then hands each connection's file descriptor to a worker thread over a pipe; the worker thread performs the read/write IO. The network layer uses the libevent event library. The multi-threaded model can exploit multiple cores, but it introduces cache coherency and locking problems. For example, stats, the most commonly used memcached command, makes memcached lock global variables to do its counting and related bookkeeping, which costs performance.
Redis uses a single-threaded IO multiplexing model, wrapped in a simple AeEvent event-processing framework that mainly implements epoll, kqueue, and select. For pure IO operations, a single thread can squeeze out the maximum speed. However, Redis also provides some simple computation features such as sorting and aggregation; for these operations the single-threaded model significantly hurts overall throughput, because the entire IO schedule is blocked while the CPU is busy computing.
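To make the single-threaded multiplexing model concrete, here is a minimal sketch in Python using the standard selectors module, which picks epoll/kqueue/select for the platform much as Redis's AeEvent layer does. It is only an illustration under those assumptions, not Redis's actual code:

```python
import selectors
import socket

sel = selectors.DefaultSelector()        # epoll, kqueue, or select, per platform

def accept(server_sock):
    conn, _ = server_sock.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, handle)

def handle(conn):
    data = conn.recv(4096)               # non-blocking read
    if data:
        conn.sendall(b"+PONG\r\n")       # stand-in for real command processing
    else:
        sel.unregister(conn)
        conn.close()

server = socket.socket()
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", 7777))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

while True:                              # the single IO thread: any long CPU
    for key, _ in sel.select():          # work here stalls every other client
        key.data(key.fileobj)
```

Memcached's model differs in that the accepting thread hands each new connection off to a worker thread instead of handling it in the same loop.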
Supported data types
Memcached stores and accesses data in key-value form, maintaining a huge hash table in memory, which reduces the time complexity of data lookups to O(1) and guarantees high-performance access.
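For illustration, accessing that flat key-value store from Python with the pymemcache client looks roughly like this (assuming a memcached instance on localhost:11211):

```python
from pymemcache.client.base import Client

mc = Client(("127.0.0.1", 11211))
mc.set("user:1001", b"alice", expire=60)   # one key, one opaque value
print(mc.get("user:1001"))                 # b'alice' -- a single hashtable lookup
```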
As noted in the opening, Redis, compared with memcached, not only supports simple key-value data but also provides storage for data structures such as list, set, zset, and hash. For details, see "Redis memory usage optimization and storage" in the resources below.
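By contrast, a short redis-py session (assuming redis-py 3.x and a local server) shows the richer data structures mentioned above:

```python
import redis

r = redis.Redis(host="127.0.0.1", port=6379)

r.rpush("recent:posts", "p1", "p2")                          # list
r.sadd("tags", "redis", "memcached")                         # set
r.zadd("leaderboard", {"alice": 42, "bob": 17})              # zset (sorted set)
r.hset("user:1001", mapping={"name": "alice", "age": "30"})  # hash

print(r.lrange("recent:posts", 0, -1))
print(r.zrange("leaderboard", 0, -1, withscores=True))
```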
Memory management mechanism
For memory-based database systems such as Redis and memcached, the efficiency of memory management is a key factor in system performance. The malloc/free functions of traditional C are the most common way to allocate and release memory, but this approach has serious flaws: first, mismatched malloc and free calls make memory leaks easy; second, frequent calls produce large amounts of memory fragments that cannot be recycled and reused, reducing memory utilization; and finally, their overhead is far larger than that of an ordinary function call. Therefore, to improve memory management efficiency, efficient memory management schemes avoid calling malloc/free directly. Redis and memcached both use memory management mechanisms of their own design, but the implementations are very different; each is described below.
Memcached uses the slab allocation mechanism to manage memory by default. Its main idea is to pre-divide allocated memory into blocks of specific lengths, according to predetermined sizes, to store key-value records of the corresponding length, which completely resolves the memory fragmentation problem. The slab allocation system is designed to store item data only: all key-value data goes into it, while memcached's other memory requests use ordinary malloc/free, because their number and frequency do not affect the performance of the system as a whole. The slab allocation principle is quite simple: memcached first requests a large block of memory from the operating system, splits it into chunks of various sizes, and groups chunks of the same size into slab classes, where a chunk is the smallest unit used to store key-value data. The size of each slab class can be controlled by setting the growth factor when memcached starts. Assuming a growth factor of 1.25, if the chunk size of the first class is 88 bytes, the chunk size of the second class is 112 bytes, and so on.
When memcached receives data from a client, it first selects the most appropriate slab class based on the size of the data, then finds a usable chunk by searching the list of free chunks memcached keeps for that slab class. When a record expires or is discarded, the chunk it occupied can be reclaimed and returned to the free list. From this process we can see that memcached's memory management is efficient and does not cause memory fragmentation, but its biggest drawback is wasted space: because each chunk has a fixed length, variable-length data cannot fully use that space. For example, caching 100 bytes of data in a 128-byte chunk wastes the remaining 28 bytes.
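The arithmetic in the last two paragraphs can be sketched in a few lines of Python. This is a hedged approximation using the 88-byte base and 1.25 growth factor from the text (real memcached also rounds chunk sizes up to 8-byte alignment, which is why 88 x 1.25 becomes 112 rather than 110); the exact waste for a given item depends on the class sizes actually configured:

```python
def slab_chunk_sizes(base=88, growth_factor=1.25, align=8, max_size=1024):
    """Approximate memcached slab class sizes: grow by the factor, round up
    to 8-byte alignment (so 88 -> 112 -> 144 -> 184 ... with factor 1.25)."""
    sizes, size = [], base
    while size <= max_size:
        sizes.append(size)
        size = (int(size * growth_factor) + align - 1) // align * align
    return sizes

def pick_chunk(item_size, sizes):
    """Pick the smallest chunk that fits, as memcached picks a slab class."""
    return next(s for s in sizes if s >= item_size)

sizes = slab_chunk_sizes()
print(sizes[:4])              # [88, 112, 144, 184]
chunk = pick_chunk(100, sizes)
print(chunk, chunk - 100)     # a 100-byte item lands in a 112-byte chunk, wasting 12 bytes
```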
Redis's memory management is implemented mainly in two source files, zmalloc.h and zmalloc.c. To simplify management, Redis stores the size of each memory block in a header at the front of the block. real_ptr is the pointer returned when Redis calls malloc; Redis writes the block's size into the header (size occupies a known amount of memory, the length of a size_t) and then returns ret_ptr, which points just past the header. When the memory needs to be freed, ret_ptr is passed to the memory manager; from ret_ptr the program can easily compute real_ptr, which is then passed to free to release the memory.
Redis records all memory allocations in an array of length ZMALLOC_MAX_ALLOC_STAT. Each element of the array holds the number of memory blocks of that size allocated by the current program, with the block size as the element's index; in the source code this array is zmalloc_allocations, so zmalloc_allocations[16] is the number of 16-byte blocks that have been allocated. zmalloc.c also has a static variable, used_memory, that records the total amount of memory currently allocated. Overall, then, Redis simply wraps malloc/free, which is much simpler than memcached's memory management approach.
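The bookkeeping described in the last two paragraphs can be modeled with a toy Python analogue (a size header in front of each block plus a used_memory counter). This is only a sketch of the idea, not the actual C code in zmalloc.c:

```python
import struct

HEADER = struct.Struct("N")   # mimics the size_t header Redis prepends
used_memory = 0               # mirrors zmalloc.c's static used_memory counter

def zmalloc(size: int) -> bytearray:
    """Allocate a block with its size stored in a header at the front."""
    global used_memory
    real = bytearray(HEADER.size + size)   # "real_ptr": header plus payload
    HEADER.pack_into(real, 0, size)        # write the size into the header
    used_memory += HEADER.size + size
    return real                            # caller uses real[HEADER.size:]

def zfree(real: bytearray) -> None:
    """Read the size back from the header and account for the release."""
    global used_memory
    size, = HEADER.unpack_from(real, 0)
    used_memory -= HEADER.size + size

buf = zmalloc(16)
print(used_memory)   # 16 plus the header size
zfree(buf)
print(used_memory)   # 0
```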
In Redis, not all data is kept in memory at all times; this is one of the biggest differences from memcached. When physical memory runs out, Redis can swap values that have not been used for a long time out to disk. Redis always keeps all keys cached in memory, and when it finds that memory usage exceeds a certain threshold it triggers a swap operation: it computes swappability = age * log(size_in_memory) to decide which keys' values should be swapped to disk, persists those values to disk, and purges them from memory. This feature lets Redis hold more data than the machine's own memory; of course, memory must still be large enough for all the keys, since keys are never swapped out. Because Redis swaps in-memory data to disk, the main thread serving requests and the sub-thread performing the swap share this memory, so if data that needs to be swapped is updated, Redis blocks the operation until the sub-thread finishes the swap, after which the data can be modified. When reading data, if the value of the requested key is not in memory, Redis must load it from the swap file before returning it to the requester, which raises the question of the IO thread pool. By default Redis blocks, that is, it responds only after all the requested swap files have been loaded. This strategy is appropriate for a small number of clients doing batch operations, but if Redis is used in a large website application, it clearly cannot cope with high concurrency. So Redis lets us set the size of the IO thread pool and load data from swap files concurrently for the read requests that need it, reducing blocking time.
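Read literally, the swappability formula ranks old, large values highest. The base of the logarithm is not stated in the article, so natural log is assumed in this small illustration:

```python
import math

def swappability(age_seconds: float, size_in_memory: int) -> float:
    """swappability = age * log(size_in_memory): old, large values swap first."""
    return age_seconds * math.log(size_in_memory)

# A value untouched for an hour that occupies 1 MB scores far higher than a
# value touched a minute ago that occupies 1 KB, so it is swapped to disk first.
print(swappability(3600, 1_000_000))   # ~49.7k
print(swappability(60, 1_000))         # ~414
```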
Memcached uses a pre-allocated memory pool, managing memory with slabs and chunks of various sizes; an item is stored in a suitably sized chunk chosen by its size. The memory pool approach saves the cost of requesting and freeing memory and reduces fragmentation, but it also leads to a certain amount of wasted space, and new data may be evicted even while plenty of memory remains; see Timyang's article: http://timyang.net/data/Memcached-lru-evictions/
Redis allocates memory on demand to store data and rarely uses free lists to optimize allocation, so there is some degree of memory fragmentation. Depending on the storage command's parameters, Redis keeps data that carries an expiration time separately and calls it temporary data. Non-temporary data is never removed, even when physical memory runs short, so swap does not evict any non-temporary data (though it will try to evict some temporary data). This makes Redis better suited to being used as storage rather than as a cache.
Data storage and persistence
Memcached does not support persistence of in-memory data; all data is stored only in memory.
Redis supports persistence. It provides two different ways to store data on disk: one is snapshotting, which writes all the data that exists at a given moment to disk; the other is the append-only file (AOF), which copies write commands to disk as they are executed.
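A minimal redis-py sketch of driving both persistence modes, assuming a local Redis that allows CONFIG SET (the equivalent redis.conf directives appear in the comments):

```python
import redis

r = redis.Redis()

# Snapshotting (RDB): ask Redis to fork and dump the current dataset to disk.
r.bgsave()

# Append-only file (AOF): turn on appendonly so every write command is also
# appended to the AOF; the "everysec" fsync policy flushes it once per second.
r.config_set("appendonly", "yes")
r.config_set("appendfsync", "everysec")

# The same settings can live in redis.conf, e.g.:
#   save 900 1          <- RDB snapshot if at least 1 key changed in 900 seconds
#   appendonly yes
#   appendfsync everysec
```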
Data consistency issues
Memcached provides a CAS (check-and-set) command that guarantees consistency when the same piece of data is accessed by multiple concurrent operations. Redis does not provide a CAS command and does not make this guarantee in the same way, but Redis does provide transactions, which guarantee that a sequence of commands is executed atomically without being interrupted by any other operation.
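In client code the two mechanisms look roughly like this (pymemcache for the CAS round trip, redis-py's pipeline for a MULTI/EXEC transaction; both assume local servers):

```python
from pymemcache.client.base import Client
import redis

# memcached check-and-set: the write succeeds only if nobody changed the
# value since we read it together with its CAS token.
mc = Client(("127.0.0.1", 11211))
mc.set("counter", b"1")
value, cas_token = mc.gets("counter")
ok = mc.cas("counter", b"2", cas_token)   # False if another client got there first

# Redis transaction: with transaction=True, execute() wraps the queued
# commands in MULTI/EXEC so they run as one uninterrupted sequence.
r = redis.Redis()
with r.pipeline(transaction=True) as pipe:
    pipe.incr("page:views")
    pipe.lpush("recent:viewers", "alice")
    pipe.execute()
```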
Cluster management
Memcached is a pure in-memory data caching system, and although Redis supports data persistence, keeping everything in memory is likewise the essence of its high performance. For a memory-based storage system, the size of the machine's physical memory caps the amount of data the system can hold. If the data to be handled exceeds the physical memory of a single machine, a distributed cluster must be built to extend the storage capacity.
Memcached itself does not support distribution, so distributed memcached storage can only be implemented on the client side with a distributed algorithm such as consistent hashing. In such an architecture, before the client sends data to the memcached cluster, it first computes the data's target node with the built-in distributed algorithm and then sends the data directly to that node for storage. Likewise, when the client queries data, it computes the node where the data resides and sends the query request directly to that node to get the data.
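A toy client-side consistent hash ring, of the kind a memcached client library might implement (illustrative only; real client libraries use their own hashing schemes and node naming):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy client-side consistent hash ring for picking a memcached node."""

    def __init__(self, nodes, replicas=100):
        self._keys, self._ring = [], {}
        for node in nodes:
            for i in range(replicas):              # virtual nodes smooth the load
                h = self._hash(f"{node}#{i}")
                self._ring[h] = node
                bisect.insort(self._keys, h)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's hash to the first virtual node."""
        h = self._hash(key)
        idx = bisect.bisect(self._keys, h) % len(self._keys)
        return self._ring[self._keys[idx]]

ring = ConsistentHashRing(["mc1:11211", "mc2:11211", "mc3:11211"])
print(ring.node_for("user:1001"))   # the client sends this key straight to that node
```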
Compared with memcached, which can only implement distributed storage on the client side, Redis prefers to build distributed storage on the server side. The latest version of Redis already supports distributed storage: Redis Cluster is a distributed version of Redis that tolerates single-node failures, has no central node, and scales linearly. In the Redis Cluster distributed storage architecture, nodes communicate with each other over a binary protocol, while nodes and clients communicate over an ASCII protocol. For data placement, Redis Cluster divides the entire key space into 4,096 hash slots, and each node can hold one or more hash slots, which means the maximum number of nodes Redis Cluster currently supports is 4,096. The distributed algorithm Redis Cluster uses is also simple: CRC16(key) % HASH_SLOTS_NUMBER.
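The placement rule reduces to a few lines. The CRC16 below is the XModem variant that Redis Cluster's checksum is based on, and the slot count parameter follows the article's figure; real deployments should use their cluster's actual slot count, and Redis Cluster's special handling of {hash tags} is ignored in this sketch:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16, XModem variant (polynomial 0x1021, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str, num_slots: int = 4096) -> int:
    """Slot placement as described above: CRC16(key) % HASH_SLOTS_NUMBER."""
    return crc16_xmodem(key.encode()) % num_slots

print(hash_slot("user:1001"))   # the slot, and hence the node, owning this key
```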
To keep data available when a single node fails, Redis Cluster introduces master and slave nodes. In Redis Cluster, each master node has two corresponding slave nodes for redundancy, so an outage of any two nodes in the whole cluster does not make data unavailable. When a master node goes down, the cluster automatically elects one of its slave nodes as the new master.
Resources
1. The difference between Redis and memcached
2. Why use Redis and its product positioning
3. Redis memory usage optimization and storage
4. "Redis in Action" Josiah L. Carlson.