Comparison of two large database cache system implementations

Source: Internet
Author: User
Tags: epoll, joins, memcached, rehash, redis, server

Memcached and Redis, as the most commonly used cache servers of recent years, are probably familiar to you. A couple of years ago, while still in school, I read through their main source code. Now I am writing up a note comparing their implementations from a personal point of view, partly as a review for myself. Where I have misunderstood something, corrections are welcome.
  
One. Summary. To read a piece of software's source code, first understand what the software is for. So what do memcached and Redis do? As everyone knows, data generally lives in a database, but querying a database is relatively slow, especially when many users query it frequently, which costs a lot of time. What to do? Where can data be queried quickly? In memory, of course. Memcached and Redis store data in memory and serve it by key-value lookup, which can greatly improve efficiency. So they are generally used as cache servers: they cache commonly used data, and lookups go to them first, reducing the number of database queries and improving query efficiency.
  
Two. Serving model. How do memcached and Redis serve clients? They are independent processes that can, if needed, be turned into daemons. For our user processes to use memcached's and Redis's services, interprocess communication is required. Considering that the user process and memcached or Redis are not necessarily on the same machine, inter-network communication must be supported as well. So memcached and Redis are themselves network servers, and user processes transfer data with them over the network; the simplest and most commonly used transport is a TCP connection. In addition, memcached also supports the UDP protocol. And when the user process and memcached or Redis are on the same machine, UNIX domain sockets can also be used for communication.
  
Three. Event model. Now for the details of how they are implemented. First, look at their event models.
  
Since epoll appeared, almost all network servers have abandoned select and poll in favor of epoll. Redis is no exception: it also still provides support for select, which can be configured, but epoll is what is generally used; on BSD systems it supports kqueue as well. Memcached is built on libevent, and on Linux libevent itself uses epoll underneath, so you can think of both as using epoll. I will not introduce epoll's features here; there are plenty of articles about it online.
  
Both use epoll for the event loop, but Redis is a single-threaded server (Redis is not strictly single-threaded: it has other threads besides the main one, but they have no event loop and only do background storage work), while memcached is multithreaded. Redis's event model is simple: there is only one event loop, a straightforward reactor implementation. But the Redis event model has one bright spot. We know that epoll works in terms of fds, and the ready events it returns carry only the fd; in Redis, the fd is the socket fd of a connection between the server and a client, and to handle the event we must find the specific client information from that fd. How to find it? The usual approach is a red-black tree that maps fd to client information, looked up by fd with O(log n) efficiency.
  
Redis, however, is special: the maximum number of Redis clients can be configured, so the upper bound on the fds Redis will have open at any one moment is known. And we know that a process's fds are unique at any given time (an fd can only be reused after it is closed), so Redis uses an array, with the fd as the array index and the client information as the element. That locates the client information directly from the fd with O(1) lookup, and also does away with a complicated red-black tree implementation. (I once wrote a network server and, needing to maintain the fd-to-connection mapping but not wanting to write my own red-black tree, used the STL set, which turned the project into C++ and meant it ultimately had to be compiled with g++; the less said about that, the better.) Obviously this approach only suits a network server whose connection limit is fixed and not too large; for something like the Nginx HTTP server it is not applicable, and Nginx indeed wrote its own red-black tree.
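The fd-as-index trick can be sketched as below. The `client` struct, `MAX_CLIENTS` limit, and function names are illustrative, not Redis's actual ones; the point is that lookup, registration, and removal are all single array operations.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of the fd-indexed client table: with a known upper bound on open
 * fds, an array replaces a red-black tree for O(1) client lookup. */
#define MAX_CLIENTS 1024

typedef struct client {
    int fd;
    char name[32];
} client;

static client *clients[MAX_CLIENTS];   /* fd -> client state */

static void register_client(int fd, const char *name) {
    client *c = malloc(sizeof(*c));
    c->fd = fd;
    strncpy(c->name, name, sizeof(c->name) - 1);
    c->name[sizeof(c->name) - 1] = '\0';
    clients[fd] = c;                   /* fds are unique while open: no clash */
}

static client *lookup_client(int fd) {
    return clients[fd];                /* O(1): no tree walk, no hashing */
}

static void unregister_client(int fd) {
    free(clients[fd]);
    clients[fd] = NULL;                /* the kernel may now reuse this fd */
}
```

The trade-off the paragraph above describes is visible here: the array costs `MAX_CLIENTS` pointers up front, which is fine when the limit is configured and modest, and unacceptable for a server like Nginx with no such bound.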
  
Memcached, by contrast, is multithreaded, using a master-worker arrangement: the main thread listens on the port and establishes connections, then assigns them in order to the worker threads. Each worker thread has an event loop and serves its own set of clients. The main thread communicates with the workers through pipes: each worker thread creates a pipe, keeps both the write and read ends, and adds the read end to its event loop to listen for readable events. Each worker thread also has a ready-connection queue. After the main thread accepts a connection, it puts the item on a worker's queue and writes a connect command to the write end of that worker's pipe, so the pipe's read end becomes readable in that worker's event loop. The worker thread reads and parses the command, discovers there is a new connection, takes the connection from its ready queue, and processes it. The advantage of multithreading is that it can fully exploit multiple cores, but it makes the program more troublesome to write; memcached is full of locks and condition variables for thread synchronization.
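The master-to-worker handoff can be sketched as below. This is an illustrative single-process model of the mechanism (struct fields, names, and the fixed-size queue are mine, not memcached's), and it omits the mutex that really protects the queue between threads.

```c
#include <assert.h>
#include <unistd.h>

/* Sketch of the memcached-style handoff: the master queues a freshly
 * accepted connection, then writes one command byte down the worker's
 * pipe; the worker's event loop sees the pipe become readable, reads the
 * byte, and pops the connection from its own queue. */
#define QUEUE_MAX 16

typedef struct worker {
    int notify_send_fd;            /* master writes here                 */
    int notify_recv_fd;            /* worker's event loop watches this   */
    int conn_queue[QUEUE_MAX];     /* ready connection queue             */
    int head, tail;
} worker;

static void worker_init(worker *w) {
    int fds[2];
    if (pipe(fds) != 0) return;
    w->notify_recv_fd = fds[0];
    w->notify_send_fd = fds[1];
    w->head = w->tail = 0;
}

/* Master side: hand a freshly accepted fd to the worker. */
static void dispatch_conn(worker *w, int conn_fd) {
    w->conn_queue[w->tail++ % QUEUE_MAX] = conn_fd;
    write(w->notify_send_fd, "c", 1);   /* 'c' = "new connection" command */
}

/* Worker side: called when the pipe's read end becomes readable. */
static int take_conn(worker *w) {
    char cmd;
    read(w->notify_recv_fd, &cmd, 1);
    if (cmd != 'c') return -1;
    return w->conn_queue[w->head++ % QUEUE_MAX];
}
```

The design is neat because the pipe does double duty: it is both the wake-up signal that the worker's epoll loop can wait on alongside client sockets, and a tiny command channel.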
  
Four. Memory allocation. The core task of memcached and Redis is to manipulate data in memory, so memory management is naturally core content.
  
First look at how their memory is allocated. Memcached has its own memory pool: it pre-allocates a large chunk of memory, and subsequent allocations come from that pool. This reduces the number of system allocations and improves efficiency, and it is how most network servers are implemented, though the management of the pool varies with the specific situation. Redis, on the other hand, has no memory pool of its own; it allocates directly when needed, leaving memory management to the kernel and handling only fetching and freeing itself. (Redis is single-threaded and has no memory pool; does the implementation feel too simple? That's because its focus is on the database module.) However, Redis supports using tcmalloc to replace glibc's malloc; the former is a Google product and faster than glibc's malloc.
  
Because Redis has no memory pool of its own, requesting and releasing memory is much simpler to manage: direct malloc and free, very convenient. Memcached, supporting a memory pool, takes memory requests from the pool and returns freed memory to the pool, which requires a lot of extra management and makes the implementation much more troublesome; the details will be explained later in the analysis of memcached's slab mechanism.
  
Five. Database implementation. Next, look at their core content: how each implements its database.
  
1. Memcached database implementation. Memcached supports only key-value, that is, one key maps to one value. Its data is likewise stored in memory as key-value pairs, using the slab mechanism.
  
First look at how memcached stores data, that is, how it stores the key-value pairs. Each key-value pair is stored in an item structure, containing the key, the value, and their related attributes.
  
Items hold the key-value pairs; when there are many items, how to find a specific item becomes a problem. So memcached maintains a hash table, used for fast item lookup. The hash table uses chaining (as does Redis) to resolve key collisions: each bucket of the hash table stores a linked list whose nodes are pointers to items, and the h_next field of the item is the next node in the bucket's list.
  
The hash table supports expansion (it expands when the number of items exceeds 1.5 times the number of buckets). There is a primary_hashtable and an old_hashtable; normally primary_hashtable is in use, but on expansion old_hashtable is set to primary_hashtable, primary_hashtable is pointed at a newly requested hash table (with the bucket count multiplied by 2), and then the data in old_hashtable is moved into the new table, with a variable expand_bucket recording the number of buckets moved so far. After the move is complete, the original old_hashtable is freed. (Redis also has two hash tables and also migrates between them, but not via a background thread; it moves one bucket at a time, as described later.)
  
The expansion operation is handled by a dedicated background expansion thread: when expansion is needed, a condition variable is used to notify it, and after the expansion completes it goes back to blocking on the condition variable. During an expansion, a lookup for an item may hit either primary_hashtable or old_hashtable; which table it is in is determined by comparing the item's bucket position against expand_bucket.
  
Where are items allocated from? From a slab. Memcached has many slabclasses; they manage slabs, and each slab is actually a collection of chunks. Items are allocated inside chunks, one item per chunk. Within one slab the chunks are all the same size; across different slabs the chunk size increases by a growth factor. When a new item needs to be allocated, a chunk is chosen according to the item's size; the rule is the smallest chunk that is larger than the item. In this way, items of different sizes are allocated in different slabs, managed by different slabclasses. The disadvantage is some memory waste, because a chunk may be larger than its item: for example, allocating a 100-byte item into a 112-byte chunk wastes 12 bytes; that part of the memory simply goes unused.
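The size-class rule can be sketched as below. The base chunk size and the 1.25 growth factor mirror typical memcached defaults but are illustrative here; real memcached also aligns chunk sizes, so the exact sequence differs (which is how 100 bytes ends up in a 112-byte chunk in the article's example).

```c
#include <assert.h>

/* Sketch of memcached's size classes: chunk sizes grow by a factor, and
 * an item goes into the smallest class whose chunk still fits it. */
#define MAX_CLASSES 64

static int chunk_size[MAX_CLASSES];
static int nclasses;

static void slabs_init(void) {
    double size = 96;                       /* smallest chunk, illustrative */
    for (nclasses = 0; nclasses < MAX_CLASSES && size < 1024 * 1024; nclasses++) {
        chunk_size[nclasses] = (int)size;
        size *= 1.25;                       /* growth factor */
    }
}

/* Return the chunk size chosen for an item of `bytes` bytes, or -1 if it
 * is too big for any class. */
static int pick_chunk(int bytes) {
    for (int i = 0; i < nclasses; i++)
        if (chunk_size[i] >= bytes)
            return chunk_size[i];           /* smallest chunk that fits */
    return -1;
}
```

With these illustrative classes (96, 120, 150, ...), a 100-byte item lands in a 120-byte chunk, wasting 20 bytes; that waste is the price paid for avoiding per-item malloc/free.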
  
The overall structure is like this: slabclasses manage slabs; a slabclass has a slab_list that can manage multiple slabs, and slabs in the same slabclass have chunks of the same size. A slabclass also has a pointer, slots, which holds items that have been freed or not yet assigned (freed here does not mean the memory was really released, just that the item is no longer used). When an item stops being used, it is placed at the head of the slots list, so that whenever an item needs to be allocated in the current slab, it can be taken straight from slots, regardless of whether it is an unassigned item or a released one.
  
Then, each slabclass corresponds to a linked list, with head and tail arrays holding the head and tail nodes of the per-class lists. The nodes in a list are the items allocated from that slabclass; a newly allocated item is placed at the head, so the further back in the list an item sits, the longer it has gone unused. When a slabclass runs out of memory and needs to delete some expired items, it can delete from the tail of the list: yes, this list is for LRU. The list alone is not enough, because searching a list is O(n), so locating an item uses the hash table, which we already have; every allocated item is already in the hash table. Thus, the hash table is used to find items, and the list is used to record the order of most-recent use; this is also the standard way to implement LRU.
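The hash-table-plus-linked-list combination can be sketched for a single slabclass as below. This is an illustrative model: a fixed pool stands in for the slab's chunks, and a linear scan stands in for the hash lookup described earlier, so the focus stays on the LRU list operations.

```c
#include <assert.h>
#include <string.h>

/* Sketch of per-slabclass LRU: a doubly linked list records recency; new
 * or touched items go to the head, eviction reclaims from the tail. */
#define MAX_ITEMS 8

typedef struct item {
    char key[16];
    struct item *prev, *next;   /* LRU list links */
    int used;
} item;

static item pool[MAX_ITEMS];    /* stands in for the slab's chunks */
static item *head, *tail;

static void lru_unlink(item *it) {
    if (it->prev) it->prev->next = it->next; else head = it->next;
    if (it->next) it->next->prev = it->prev; else tail = it->prev;
    it->prev = it->next = NULL;
}

static void lru_push_head(item *it) {
    it->next = head;
    if (head) head->prev = it; else tail = it;
    head = it;
}

/* In real memcached the hash table does this lookup in O(1). */
static item *find(const char *key) {
    for (int i = 0; i < MAX_ITEMS; i++)
        if (pool[i].used && strcmp(pool[i].key, key) == 0) return &pool[i];
    return NULL;
}

static void touch(item *it) {   /* item was used: move it to the head */
    lru_unlink(it);
    lru_push_head(it);
}

static item *evict(void) {      /* out of memory: reclaim the LRU tail */
    item *victim = tail;
    if (victim) { lru_unlink(victim); victim->used = 0; }
    return victim;
}

static item *insert(const char *key) {
    item *it = NULL;
    for (int i = 0; i < MAX_ITEMS; i++)
        if (!pool[i].used) { it = &pool[i]; break; }
    if (!it) it = evict();      /* reuse the least recently used chunk */
    strncpy(it->key, key, sizeof(it->key) - 1);
    it->key[sizeof(it->key) - 1] = '\0';
    it->used = 1;
    lru_push_head(it);
    return it;
}
```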
  
Each time a new item needs to be allocated, memcached looks at the slabclass's list, scanning forward from the tail to see whether some item has expired; if so, the expired item is used as the new item. If none has expired, a chunk must be allocated from the slab; and if the slab is exhausted, a new slab must be added to the slabclass.
  
Memcached supports setting an expiration time, the expire time, but internally it does not periodically check whether data has expired. Instead, when a client process uses the data, memcached checks the expire time and, if the data has expired, directly returns an error. The advantage is that no extra CPU is spent performing expire-time checks; the disadvantage is that expired data may go unused for a long time without being released, occupying memory.
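Lazy expiry amounts to a single check on the read path. A minimal sketch (the struct and names are illustrative):

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Sketch of memcached's lazy expiry: no background scan; the expire time
 * is only checked when a get touches the item. */
typedef struct item {
    char value[32];
    time_t exptime;               /* 0 = never expires */
} item;

/* Returns the value, or NULL if the item has lazily expired; on NULL the
 * caller deletes the item and reports the miss. */
static const char *item_get(item *it, time_t now) {
    if (it->exptime != 0 && it->exptime <= now)
        return NULL;
    return it->value;
}
```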
  
Memcached is multithreaded and maintains only one database, so multiple client processes may operate on the same data, which can cause problems. For example, A changes the data, then B changes it too, and A's operation is overwritten; A may not know this and may assume the data is still in the state A left it, which can lead to problems. To solve this, memcached uses the CAS protocol. Simply put, each item stores a 64-bit unsigned int marking the version of the data; on every update (every modification of the value) the version number is incremented. Then, for every data-change operation, the client process's version number must be compared with the version number of the server-side item; only if they are equal may the change proceed, otherwise the server reports dirty data.
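The version check itself is tiny. A minimal sketch (struct and result names are illustrative; real memcached returns its own store-result codes):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch of memcached's CAS check: each item carries a 64-bit version; a
 * cas update succeeds only if the client's version still matches, so a
 * concurrent writer cannot be silently overwritten. */
typedef struct item {
    char value[32];
    uint64_t cas_id;              /* bumped on every successful update */
} item;

enum { STORED, EXISTS /* dirty data: someone else updated first */ };

static int item_cas(item *it, const char *newval, uint64_t client_cas) {
    if (it->cas_id != client_cas)
        return EXISTS;            /* versions differ: reject the write */
    strncpy(it->value, newval, sizeof(it->value) - 1);
    it->value[sizeof(it->value) - 1] = '\0';
    it->cas_id++;                 /* new version for subsequent readers */
    return STORED;
}
```

In the A/B scenario above: both read version 7, A writes first and bumps it to 8, so B's write with version 7 is rejected instead of silently clobbering A's.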
  
The above is an introduction to how memcached implements a key-value database.
  
2. Redis database implementation. Redis's database is rather more powerful, because unlike memcached, which only supports saving strings, Redis supports five data structures: string, list, set, sorted set, and hash table. For example, to store a person's information you can use a hash table, with the person's name as the key and fields such as name "Super" and age 24; with the key and the field "name" you can fetch the name Super, or with the key and the field "age" you can get 24. In this way, when only the age is needed, there is no need to fetch the person's entire record and then pick the age out of it; you get the age directly, efficiently and conveniently.
  
To implement these data structures, Redis defines an abstract object, the Redis object. Each object has a type; there are five in all: string, linked list, set, sorted set, and hash table.
  
At the same time, to improve efficiency, Redis prepares multiple implementations for each type and chooses the appropriate one for the specific scenario; the encoding field represents the way the object is implemented. Then there is the object's lru field, which records the time the object was last accessed; a current time is also recorded in the Redis server (approximate, because the time is only updated at certain intervals, maintained automatically by the server), and the difference between the two gives how long the object has gone unaccessed.
  
Then there is the reference count in the Redis object, used to share objects and to determine when an object can be deleted. Finally, a void* pointer points to the object's real content. It is precisely because of this abstract Redis object that operating on the data in the database becomes much more convenient: everything is uniformly a Redis object, and where object types need to be distinguished, you judge by the type field. And precisely because of this object-oriented approach, Redis code looks rather like C++ code, when in fact it is all written in C.
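The object header just described can be sketched as follows; this follows the shape of the robj struct in the Redis source (field names and bitfield widths match the classic layout, constants abbreviated), but is a sketch, not a copy.

```c
#include <assert.h>

/* Sketch of the Redis object header: a type, an encoding selecting the
 * underlying implementation, an LRU clock, a refcount for sharing, and a
 * void* to the real payload. */
typedef struct redisObject {
    unsigned type : 4;        /* REDIS_STRING, REDIS_LIST, REDIS_SET, ...   */
    unsigned encoding : 4;    /* e.g. REDIS_ENCODING_RAW vs ..._INT         */
    unsigned lru : 24;        /* last-access clock, compared to server time */
    int refcount;             /* shared objects; freed when it drops to 0   */
    void *ptr;                /* the concrete SDS / dict / ziplist / ...    */
} robj;
```

The bitfields pack type, encoding, and the LRU clock into a single machine word, so the per-object overhead beyond the payload pointer and refcount is just four bytes.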
  
In the end, Redis is still a key-value database: no matter how many data structures it supports, everything is ultimately stored as key-value, only the value can be a list, set, sorted set, hash table, and so on. As in memcached, all keys are strings, and strings are also used inside the concrete storage of set, sorted set, hash table, and the rest. But C has no ready-made string, so the first task of Redis is to implement a string, called SDS (simple dynamic string). It is a very straightforward structure: len stores the used length of the string, free says how many allocated bytes are unused, and buf stores the actual data; len plus free (plus the terminator) is the total buffer size.
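The classic (pre-3.2) SDS layout can be sketched along with a minimal sdsnew/sdscat pair; the over-allocation policy here (slack equal to the length) is illustrative, and the real sdscat reallocates when the slack runs out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of the classic SDS layout from sds.h: len is the used length,
 * free the spare bytes after it, buf the data itself. */
struct sdshdr {
    unsigned int len;    /* bytes in use */
    unsigned int free;   /* bytes allocated but unused */
    char buf[];          /* data, kept NUL-terminated for C interop */
};

typedef char *sds;       /* user code sees buf; the header sits just before */

static sds sdsnew(const char *init) {
    size_t l = strlen(init);
    struct sdshdr *sh = malloc(sizeof(*sh) + l + l + 1); /* grab slack */
    sh->len = (unsigned)l;
    sh->free = (unsigned)l;
    memcpy(sh->buf, init, l + 1);
    return sh->buf;
}

static size_t sdslen(const sds s) {
    struct sdshdr *sh = (struct sdshdr *)(s - sizeof(struct sdshdr));
    return sh->len;      /* O(1), unlike strlen */
}

static sds sdscat(sds s, const char *t) {
    struct sdshdr *sh = (struct sdshdr *)(s - sizeof(struct sdshdr));
    size_t tl = strlen(t);
    if (sh->free >= tl) {                    /* fits in the slack: no realloc */
        memcpy(sh->buf + sh->len, t, tl + 1);
        sh->len += (unsigned)tl;
        sh->free -= (unsigned)tl;
    }                                        /* (real sdscat reallocs otherwise) */
    return s;
}
```

The trick of returning `buf` rather than the header means an SDS can be passed to any C function expecting a `char *`, while length queries stay O(1).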
  
With strings resolved, all keys are saved as SDSs; but how are keys associated with values? A key-value format is easy to handle in a scripting language: just use a dictionary. C has no dictionary, so what to do? Write one (Redis is very keen on building its own wheels). Looking at the dict code: privdata saves extra information and is rarely used, at least as far as we found. dictht is the concrete hash table, and one dict corresponds to two hash tables, which is for expansion (rehashidx is also for expansion). dictType stores the operations of a hash table. Redis also implements iterators for dict (again making it look like C++ code).
  
The concrete hash table implementation is similar to memcached's: it also uses chaining to resolve collisions, but with a few small tricks. For example, dictType stores function pointers, so how the elements in the buckets are manipulated can be configured dynamically. Another example: the sizemask kept in dictht is size (the number of buckets) minus 1, and hash & sizemask is used in place of the modulo operation, which is faster; and so on. Overall, a dict contains two hash tables; each hash table's buckets store dictEntry linked lists, and a dictEntry stores the concrete key and value.
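The sizemask trick is worth a two-line sketch: when the bucket count is a power of two, `size - 1` is all 1-bits, so a bitwise AND gives the same bucket as the modulo, without a division.

```c
#include <assert.h>

/* dictht's sizemask trick: for size = 2^n, hash & (size - 1) == hash % size,
 * but compiles to a single AND instead of a division. */
static unsigned bucket_index(unsigned hash, unsigned size /* power of 2 */) {
    unsigned sizemask = size - 1;
    return hash & sizemask;
}
```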
  
As mentioned earlier, a dict holds two dichts for expansion (in fact, also for shrinking). Normally the dict uses only ht[0]; when the number of entries in ht[0] reaches a certain proportion of the number of buckets, an expansion or shrink operation is triggered, which we collectively call rehash. At that point, memory of the post-rehash size is requested for ht[1], then the data in ht[0] is moved into ht[1], with rehashidx recording the number of buckets moved so far. When all the buckets have been moved, the rehash is complete: ht[1] is turned into ht[0], and the original ht[0] becomes ht[1] and is set to null. Unlike memcached, there is no background thread to do this; it is done inside the event loop, and the rehash is not done all at once but split into multiple steps: each time the user operates on the dict, Redis moves one bucket's worth of data, until the rehash completes. The big move is thus divided into many small moves, amortizing the cost of rehashing over the user's operations; this avoids the situation where one user request triggers a rehash and has to wait a long time for it to finish before returning. The cost is that during a rehash every operation is slightly slower, and the user has no idea that Redis has slipped data migration into the middle of their request; they just feel Redis was a bit sneaky :-D As for sizemask, it is size - 1; when a key arrives, its hash is computed and hash & sizemask determines the bucket it goes in. When size is 2^n, sizemask is 0b11...1, which gives the same result as hash % size, but using & is much quicker. With dict in place, the database is easy to implement: all data is stored in a dict, each key is stored as the key of a dictEntry (a string), and the value's void* points to a Redis object, which can be any of the five types. The structure is as in the figure, though the figure is outdated and differs from redis3.0 in some places.
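The incremental rehash can be sketched as below. This is an illustrative model with integer keys and fixed table sizes, not Redis's dict.c: the point is that every user operation first migrates one bucket from ht[0] to ht[1] while rehashidx is active, and new inserts go straight into ht[1] so ht[0] only ever shrinks.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of Redis's incremental rehash: while rehashidx >= 0, each dict
 * operation moves one bucket from ht[0] to ht[1]. */
#define OLD_SIZE 4
#define NEW_SIZE 8

typedef struct entry { int key; struct entry *next; } entry;

typedef struct dict {
    entry *ht[2][NEW_SIZE];   /* ht[0] uses OLD_SIZE buckets, ht[1] NEW_SIZE */
    int rehashidx;            /* -1 = not rehashing; else next bucket to move */
} dict;

static void rehash_step(dict *d) {
    if (d->rehashidx < 0) return;
    entry *e = d->ht[0][d->rehashidx];
    while (e) {                               /* move one whole bucket */
        entry *next = e->next;
        int b = e->key & (NEW_SIZE - 1);      /* the sizemask trick again */
        e->next = d->ht[1][b];
        d->ht[1][b] = e;
        e = next;
    }
    d->ht[0][d->rehashidx] = NULL;
    if (++d->rehashidx == OLD_SIZE) d->rehashidx = -1;  /* done (real code
                                                           swaps the tables) */
}

static void dict_add(dict *d, int key) {
    rehash_step(d);                           /* piggyback one migration step */
    int rehashing = d->rehashidx >= 0;
    int size = rehashing ? NEW_SIZE : OLD_SIZE;
    entry **table = rehashing ? d->ht[1] : d->ht[0];
    entry *e = malloc(sizeof(*e));
    e->key = key;
    int b = key & (size - 1);
    e->next = table[b];
    table[b] = e;
}

static int dict_find(dict *d, int key) {      /* must look in both tables */
    for (int t = 0; t < 2; t++)
        for (entry *e = d->ht[t][key & ((t ? NEW_SIZE : OLD_SIZE) - 1)]; e; e = e->next)
            if (e->key == key) return 1;
    return 0;
}
```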
  
Among the five types of object, each type has at least two underlying implementations. string has three: REDIS_ENCODING_RAW, REDIS_ENCODING_INT, and REDIS_ENCODING_EMBSTR. list has an ordinary doubly linked list and a ziplist. A ziplist, simply put, turns a linked list into an array: one continuous region of memory that simulates a linked list by storing the size information of each string. Relative to an ordinary linked list it saves space, but it has side effects: because it is continuous memory, changing its size requires reallocation, and because the byte size of each string is stored, a change may cause cascading updates (see the code for details). set has dict and intset (used when all elements are integers); sorted set has skiplist and ziplist; hash is implemented with ziplist and dict. A skiplist is a skip list: it has efficiency close to a red-black tree but is much simpler to implement, so it is used (strange, no wheel-building here; is this wheel a bit too hard?).
  
hash can be implemented with dict: in that dict, each dictEntry's key holds the field name (the key of the key-value pair inside the hash), and its value holds the field value; both are strings. In the dict used by set, each dictEntry's key holds the value of a concrete set element, and the value is null. The zset (sorted set) in the figure is wrong: zset is implemented with skiplist and ziplist. The skiplist is easy to understand: think of it as a red-black tree substitute, and like a red-black tree it keeps elements sorted.
  
How is a zset stored with a ziplist? First, in a zset, every element of the set has a score used for sorting. So in the ziplist, in score order, first an element is stored, then its score, then the next element, then its score, and so on. This is continuous storage, so inserting or deleting requires reallocating memory. Therefore, when the number of elements exceeds a threshold, or the character length of some element exceeds a threshold, Redis chooses to implement the zset with a skiplist instead (if a ziplist is currently in use, the data in the ziplist is taken out and stored into a new skiplist, and then the ziplist is deleted). This is underlying-implementation conversion, and the remaining Redis object types can convert similarly.
  
And how does ziplist implement a hash? Also very simply: store a key, store a value, store a key, store a value, in sequence, similar to the zset implementation; so likewise, when the number of elements exceeds a threshold, or the character length of some element exceeds a threshold, it is converted to a dict-based hash table. The various underlying implementations are interconvertible, and Redis can choose the most appropriate implementation for the situation; this too is a benefit of the object-oriented-like implementation approach.
  
It should be noted that when zset is implemented with a skiplist, a dict is actually used as well, and this dict stores the same key-value (member-score) pairs. Why? Because skiplist lookup is only O(log n) (possibly O(n) in the worst case), while a dict achieves O(1), so a dict is used to speed up lookups; and since the skiplist and the dict can point to the same Redis object, not much memory is wasted. And when zset uses a ziplist, why is there no dict to speed up lookups? Because a ziplist supports only a small number of elements (beyond that it converts to a skiplist), and sequential scan over a small range is also fast, so a dict is unnecessary.
  
Looking at it this way, the dict, dictType, dictht, dictEntry, and Redis object above are all very thoughtfully designed; together they implement a flexible, efficient database with an object-oriented flavor. It must be said, the design of the Redis database is quite impressive.
  
Unlike memcached, Redis has more than one database: 16 by default, numbered 0-15. Clients can choose which database to use; database 0 is used by default. Different databases do not share data; that is, the same key can exist in different databases, but within the same database, keys must be unique.
  
Redis also supports setting an expire time, but looking at the object above, there is no expire field; so how is a datum's expire time recorded? For each database, Redis adds another dict, called the expire dict. The keys of its entries are the data keys, and each value is a Redis object holding a 64-bit int: the expire time. To determine whether a key has expired, look it up in the expire dict, take out the expire time, and compare it with the current time. Why do it this way? Because not every key has an expire time set; for a key that does not, storing an expire time would waste space. Keeping expire times separately in the expire dict uses memory flexibly, as needed (when a key expires, it is removed from the expire dict).
  
What is the expire mechanism? Similar to memcached's, it deletes lazily: when data is about to be used, first check whether its key is past its expire time; if so, delete it and return an error. Purely lazy deletion leads to the memory waste mentioned above, so there is a supplementary scheme. Redis has a timed function called serverCron, whose job is server maintenance, and within it expired data is also deleted. Note that it does not delete everything: within a certain time budget, it randomly samples data from each database's expire dict and deletes the samples that have expired, continuing until the allotted time is up. In other words, it randomly samples expired data for deletion. The operation's time budget comes in two sizes, one longer and one shorter; normally the short deletion pass runs, and every so often the long pass runs. This effectively alleviates the memory waste of purely lazy deletion.
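The sampling pass can be sketched as below. This is an illustrative model, not serverCron's actual activeExpireCycle: arrays stand in for the expire dict, and an iteration count stands in for the time budget.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the active-expiry idea: rather than scanning the whole expire
 * dict, take random samples and delete the expired ones, stopping when
 * the budget runs out. */
#define NKEYS 100

static long expire_at[NKEYS];   /* 0 = no TTL set */
static int  alive[NKEYS];

/* Returns how many keys were expired within the budget. */
static int active_expire_cycle(long now, int budget, unsigned seed) {
    int expired = 0;
    srand(seed);
    while (budget-- > 0) {
        int k = rand() % NKEYS;                 /* random sample, like Redis */
        if (alive[k] && expire_at[k] != 0 && expire_at[k] <= now) {
            alive[k] = 0;                       /* delete the expired key */
            expired++;
        }
    }
    return expired;
}
```

Because the pass is randomized and budgeted, it never blocks the event loop for long, and together with lazy deletion it keeps the fraction of dead-but-resident keys bounded in expectation.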
  
The above is the implementation of Redis's data side. Unlike memcached, Redis also supports data persistence, which is described below.
  
Six. Redis database persistence. Redis's biggest difference from memcached is that it supports data persistence, which is also the biggest reason many people choose Redis over memcached. Redis persistence is divided into two strategies, RDB and AOF, which users can configure with different policies.
  
(1) RDB persistence. RDB persistence is triggered when the user executes save or bgsave. The core idea of the RDB persistence operation is to save the database wholesale into a file.
  
How is it stored? First a "REDIS" string is stored, which plays a validation role, marking this as an RDB file; then Redis's version information is saved; then the concrete databases; then the terminator EOF is stored; and finally a checksum. The key part is the databases: as the name says, it stores multiple databases, in numerical order: database 0 is stored first, then 1, then 2, up to the last database.
  
Each database is stored as follows: first a one-byte constant SELECTDB, indicating a switch of DB; then the number of the database, which has variable length; and then the concrete key-value data.
  
From the code it can also be seen that when a pair is stored, the expire time is checked first: if it has already expired, the pair is simply not saved; otherwise, the expire time is saved. Note that when saving the expire time, its type REDIS_RDB_OPCODE_EXPIRETIME_MS is stored first, followed by the concrete expiration time. Next, the real key-value pair is stored: first the type of the value, then the key (which is stored as a string), and then the value.
  
In rdbSaveObject, val is stored differently depending on its type, but ultimately everything is reduced to string storage. For example, if val is a list, the byte size of the whole list is stored first, then the list is traversed, each datum fetched and written to the file as a string. For a hash table, the byte count is computed first, then each dictEntry in the table is taken and its key and value stored as strings, then the next dictEntry. In summary, an RDB stores each key-value pair as: the expire time (if any), then the type of the value, then the key (a string), and then the value converted to string form according to its type and underlying implementation. To compress the data while remaining able to restore it from the file, Redis uses a lot of encoding tricks, some of which I did not dig into; the key is to understand the idea without fussing over those details.
  
With the RDB file saved, the database can be restored from it when Redis restarts. Since the RDB file records the numbers of the stored databases, the key-value pairs each contains, and the type, implementation, and data of the value in each pair, Redis simply reads the file sequentially and restores the objects one by one. Since expire times were saved, if the current time is already greater than an entry's expire time, the data has timed out, and that key-value pair is not restored.
  
Saving an RDB file is a huge undertaking, so Redis also provides a background-save mechanism. That is, when bgsave is executed, Redis forks a child process and lets the child do the saving work, while the parent process continues to provide normal Redis database service. Because the child process replicates the parent's address space, the child has the database as of the fork, and it performs the save operation on that inherited database, writing into a temp file. During the child's copy, Redis records the number of modifications to the database (dirty). When the child finishes, it sends the parent a SIGUSR1 signal; on catching it, the parent knows the child has completed the copy, and the parent renames the child's temp file to the real RDB file (that is, only on actual success does it become the target file; this is the safe practice). Then it records the end time of the save.
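The fork-then-rename pattern can be sketched as below. This is an illustrative skeleton, not Redis's rdb.c: a string stands in for the database, the parent simply waits instead of learning of completion asynchronously, and the file names are made up. The key properties survive the simplification: the child sees a copy-on-write snapshot as of fork(), and a half-written dump can never replace a good one because the temp file is only renamed on success.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static const char *database = "k1=v1\nk2=v2\n";   /* stands in for the DB */

/* Sketch of bgsave: fork a child to snapshot the inherited data into a
 * temp file; the parent promotes the temp file only on success. */
static int bgsave(const char *rdb_path) {
    char tmp[64];
    snprintf(tmp, sizeof(tmp), "%s.tmp", rdb_path);

    pid_t pid = fork();
    if (pid == 0) {                       /* child: sees the DB as of fork() */
        FILE *f = fopen(tmp, "w");
        if (!f) _exit(1);
        fputs(database, f);               /* "serialize" the snapshot */
        fclose(f);
        _exit(0);
    }
    /* parent: would keep serving; waiting here keeps the sketch simple */
    int status;
    waitpid(pid, &status, 0);
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return 0;
    return rename(tmp, rdb_path) == 0;    /* promote the temp file atomically */
}
```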
  
There is a problem here: while the child is saving, the parent's database may be modified, and the parent only records the number of modifications (dirty) without taking any remedial action. So what the RDB saves is not a real-time database, which looks a little less impressive up close. However, the AOF persistence introduced next solves this problem.
  
Besides the client executing the save or bgsave command, RDB save conditions can also be configured. That is, in the configuration file you specify that if within time t the database has been modified dirty times, a background save runs. In serverCron, Redis checks the dirty count and the time of the last save and, if the conditions are met, performs a bgsave. Note that at any moment there can be only one child process doing a background save, because saving is a very IO-costly operation; many processes all doing heavy IO would be inefficient and hard to manage.
  
(2) AOF persistence. First, think about a question: must saving a database mean saving all of its data, the way RDB does? Is there another way?
  
The RDB saves only the final database, which is a result. How did that result come about? It was built by the user's commands, so instead of saving the result we can save only the commands that created it. Redis's AOF follows exactly that idea; its difference from the RDB is that the RDB stores the database's data, while the AOF stores the commands that create the database.
  
First, look at the AOF file format. It stores commands one after another: each command is written with its length information first, then the command itself. The exact delimiters can be studied in depth, but they are not the focus here; it is enough to know that the AOF file stores, in order, the commands executed on behalf of Redis clients.
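Concretely, the AOF uses the same multibulk format as the client protocol: a `*<argc>` line, then for each argument a `$<len>` line followed by the argument's bytes. A small encoder, assuming nothing beyond the public protocol:

```python
def encode_command(*args):
    """Encode a command in the AOF/RESP multibulk format Redis uses:
    '*<argc>\r\n', then per argument '$<len>\r\n<arg>\r\n'."""
    out = b"*%d\r\n" % len(args)
    for arg in args:
        data = arg.encode() if isinstance(arg, str) else arg
        out += b"$%d\r\n%s\r\n" % (len(data), data)
    return out

print(encode_command("SET", "key", "hello"))
# -> b'*3\r\n$3\r\nSET\r\n$3\r\nkey\r\n$5\r\nhello\r\n'
```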
  
Redis keeps an SDS buffer, aof_buf. With AOF persistence enabled, every command that modifies the database is appended to aof_buf as a string in the AOF file format. Each time the event loop completes an iteration, flushAofBuf is called from serverCron to write the commands in aof_buf to the AOF file (with write, so the data really only reaches the kernel buffer), and aof_buf is then emptied for the next iteration. In this way every modification to the database can be replayed from the commands in the AOF file, achieving the effect of saving the database.
  
Note that the write called in flushAofBuf only copies the data into the kernel buffer; when the kernel actually writes it to disk is the kernel's business and may be delayed a while. However, Redis makes this configurable. You can sync after every write, in which case Redis calls sync to push the kernel's data to the file, a time-consuming system call. Or you can configure a policy of syncing once per second, in which case Redis starts a background thread (so Redis is not strictly single-threaded, just a single event loop) that calls sync every second. A question here: why was sync never a concern with the RDB? Because the RDB is stored in one shot, not in many small writes like the AOF, so a sync during an RDB save costs little; and under BGSAVE the child process exits when done, and the exit path flushes its buffered data to the file automatically.
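The three fsync policies can be sketched as follows. This is a toy model: real Redis performs the once-per-second fsync from a background thread rather than inline, and the class name here is invented.

```python
import os
import tempfile
import time

class AofWriter:
    """Sketch of the appendfsync policies: 'always' fsyncs after every
    write, 'everysec' fsyncs at most once per second, and 'no' leaves
    flushing entirely to the kernel."""
    def __init__(self, path, policy="everysec"):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND)
        self.policy = policy
        self.last_fsync = 0.0

    def append(self, data):
        os.write(self.fd, data)          # lands in the kernel page cache
        now = time.time()
        if self.policy == "always":
            os.fsync(self.fd)            # durable, but slow per command
        elif self.policy == "everysec" and now - self.last_fsync >= 1.0:
            os.fsync(self.fd)            # lose at most ~1s on a crash
            self.last_fsync = now

path = tempfile.NamedTemporaryFile(delete=False).name
w = AofWriter(path, policy="always")
w.append(b"*1\r\n$4\r\nPING\r\n")
print(open(path, "rb").read())   # -> b'*1\r\n$4\r\nPING\r\n'
```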
  
One more thing: because aof_buf records every single modification command, the AOF file keeps growing, so Redis also provides AOF rewrite (aof_rewrite), which generates commands from the existing database and writes those commands to a new AOF file. Peculiar, isn't it? Yes, that is exactly how it works. During aof_rewrite, Redis iterates over each database and, for each key-value pair, generates commands according to the value's concrete type. For a list, for example, it generates a command that saves the list, carrying the data the list needs; if the list is too long, it is split into multiple commands, first creating the list, then pushing elements onto it. In short, the commands are reverse-generated from the data so as to recreate it. Storing these commands in the AOF file achieves the same effect as appending, without the accumulated history.
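A sketch of the list case: regenerate the fewest commands that rebuild the current value, batching a fixed number of elements per command. The function name and chunk size here are illustrative (Redis batches a fixed number of items per rewritten command in a similar way).

```python
def rewrite_list(key, items, chunk=64):
    """Toy AOF rewrite for a list value: instead of replaying the list's
    whole modification history, emit just enough RPUSH commands to
    recreate its current contents, split into fixed-size batches."""
    return [["RPUSH", key] + items[i:i + chunk]
            for i in range(0, len(items), chunk)]

cmds = rewrite_list("mylist", [str(i) for i in range(150)], chunk=64)
# Number of elements carried by each generated command:
print([len(c) - 2 for c in cmds])   # -> [64, 64, 22]
```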
  
AOF rewrite also supports a background mode. Executing aof_bgrewrite likewise forks a child process; the child runs aof_rewrite over the database it copied, writing to a temporary file, and then exits to notify the parent. The parent checks that the child's exit status is correct and renames the temporary file to the final AOF file. Now, here is the question: while the child is persisting, the parent's database may be updated, so how do we notify the child of the updates? Interprocess communication? A bit of a hassle? Guess what Redis does: it does not notify the child at all.
  
What, no notification? Then what happens to the updates? While the child executes aof_bgrewrite, the parent saves every command that changes the database (adds, deletes, updates, and so on) into aof_rewrite_buf_blocks, a linked list whose blocks each hold commands; when a block fills up, a new one is allocated and appended to the tail of the list. When the child reports that saving is done, the parent appends the commands in aof_rewrite_buf_blocks to the new AOF file. What a beautiful design. I had started out thinking about interprocess communication too, while Redis solves the problem perfectly with the simplest method. The saying is true: the better the design, the simpler it tends to be, while complicated things are often unreliable.
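The block list can be modeled like this. The block size and class names are invented for illustration (Redis's real rewrite-buffer blocks are around 10 MB each).

```python
BLOCK_SIZE = 16  # tiny for illustration; Redis uses ~10MB blocks

class Block:
    def __init__(self):
        self.buf = bytearray()
        self.next = None

class RewriteBuffer:
    """Sketch of aof_rewrite_buf_blocks: while the child rewrites, the
    parent appends every mutating command here; when the child finishes,
    the parent flushes these blocks onto the end of the new AOF file."""
    def __init__(self):
        self.head = self.tail = Block()

    def append(self, data):
        while data:
            space = BLOCK_SIZE - len(self.tail.buf)
            if space == 0:                  # tail full: grow the list
                self.tail.next = Block()
                self.tail = self.tail.next
                space = BLOCK_SIZE
            self.tail.buf += data[:space]
            data = data[space:]

    def flush(self):
        """Concatenate all blocks, as the parent does after the child exits."""
        out, block = bytearray(), self.head
        while block:
            out += block.buf
            block = block.next
        return bytes(out)

rb = RewriteBuffer()
rb.append(b"x" * 40)        # spills across 3 blocks of 16 bytes
print(rb.flush() == b"x" * 40)   # -> True
```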
  
As for loading the AOF file, it is just executing the file's commands one by one. But since these commands are commands that clients send to Redis, Redis simply creates a fake client, one with no network connection, and executes the commands directly through it. The fake client here is not a real client but the client information Redis stores server-side, with its read and write buffers, living inside the Redis server. The commands read from the AOF file are placed into the client's read buffer and then executed as that client's commands. That completes the loading of the AOF file.
  
In outline, the loading code does: create the fake client; then, while the next command read from the file is not empty, obtain the command's parameter information (argc and the arguments) and execute it through the fake client. The whole AOF persistence design is, in my personal view, quite wonderful; there is much in it worth admiring.
  
Transactions. Another place where Redis is stronger than memcached: it supports simple transactions. A transaction simply means bundling several commands and executing them all in one go. For a relational database, a transaction also has a rollback mechanism: either every command in the transaction executes successfully, or one failure rolls everything back to the pre-transaction state. Redis does not support rollback; its transactions only guarantee that the commands execute sequentially, and even if a command in the middle fails, execution continues. That is why it supports only simple transactions.
  
First, look at the flow of a Redis transaction. The client executes the MULTI command to mark the start of the transaction, then enters the commands to execute, and finally enters EXEC to run the transaction. When the Redis server receives MULTI, it sets the corresponding client's state to REDIS_MULTI, indicating the client is in a transaction, and records the transaction's command details in the client's multiState structure: the command count and the concrete commands (it first checks whether each command is recognized; unrecognized commands are not saved). When the EXEC command arrives, Redis executes the commands saved in multiState in order and saves each command's return value. When a command errors, Redis does not stop the transaction; it records the error message and continues executing, and after all commands have run it returns all the return values to the client together.
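A toy model of that flow: queue while in MULTI, run everything on EXEC, and record errors without stopping. The class and method names are invented, and the command handling is heavily simplified.

```python
class MiniClient:
    """Sketch of Redis transaction state: MULTI switches the client into
    queueing mode, commands are stored in order, and EXEC runs them all
    in one go, returning every reply; errors do not abort the rest."""
    def __init__(self, db):
        self.db, self.in_multi, self.queue = db, False, []

    def command(self, name, *args):
        if name == "MULTI":
            self.in_multi, self.queue = True, []
            return "OK"
        if name == "EXEC":
            self.in_multi = False
            return [self._run(n, a) for n, a in self.queue]
        if self.in_multi:
            self.queue.append((name, args))
            return "QUEUED"
        return self._run(name, args)

    def _run(self, name, args):
        if name == "SET":
            self.db[args[0]] = args[1]
            return "OK"
        if name == "GET":
            return self.db.get(args[0])
        return "ERR unknown command"   # recorded, execution continues

c = MiniClient({})
c.command("MULTI")
c.command("SET", "k", "v")
c.command("BOGUS")
print(c.command("EXEC"))   # -> ['OK', 'ERR unknown command']
```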
  
Why doesn't it support rollback? The explanation circulating online is that failures stem from bugs in the client program, so there is no need for the server to roll back; and by not supporting rollback, the server also runs much more efficiently. In my view, a Redis transaction is not a traditional relational-database transaction that must be strictly ACID, or it is not a transaction at all; rather it is a mechanism letting the client execute several commands at once. Treat a transaction as an ordinary command and rollback becomes unnecessary.
  
We know that Redis runs a single event loop, and once a transaction actually executes (that is, after Redis receives EXEC), its execution is not interrupted: all its commands run within one event-loop iteration. However, while a user is still entering the transaction's commands one by one, other clients may modify the data the transaction will use.
  
So Redis also provides the WATCH command: before MULTI, the user can execute WATCH to name the data to observe, and if any other client modifies a watched key before EXEC, the command touching the modified data fails, reporting that the data is dirty. How is this implemented? Each redisDb holds a dict watched_keys; in watched_keys, each dictEntry's key is a watched database key and its value is a linked list storing the clients that watch it.
  
Meanwhile, each client also has a watched_keys holding the keys that client currently watches. When WATCH runs, Redis looks the key up in the corresponding database's watched_keys (creating a new dictEntry if absent), adds the client to that key's client list, and adds the key to the client's own watched_keys. When some client executes a command that modifies data, Redis first looks the key up in watched_keys; if it is found, that proves some client is watching it, so Redis iterates over all the watching clients and sets them to REDIS_DIRTY_CAS, marking that a watched key has become dirty.
  
When a client executes its transaction, Redis first checks whether REDIS_DIRTY_CAS is set; if it is, the data is dirty, the transaction cannot run, and an error is returned immediately. The transaction executes only if the client does not have REDIS_DIRTY_CAS set. Note that after EXEC, all of that client's watched keys are cleared, and the client is removed from each key's client list in the db; that is, after EXEC the client no longer watches any key, even if EXEC did not succeed. So a Redis transaction is a simple transaction, not a real one.
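The watched_keys bookkeeping can be sketched like so. Here `dirty_cas` stands in for the REDIS_DIRTY_CAS flag, and the classes and method names are invented for illustration.

```python
class Client:
    def __init__(self):
        self.dirty_cas = False
        self.watching = set()

class WatchDB:
    """Sketch of WATCH: the db maps each watched key to the set of
    clients watching it; any write to such a key marks those clients
    dirty, and a dirty client's EXEC aborts instead of running."""
    def __init__(self):
        self.data, self.watched = {}, {}

    def watch(self, client, key):
        self.watched.setdefault(key, set()).add(client)
        client.watching.add(key)

    def set(self, key, value):
        self.data[key] = value
        for c in self.watched.get(key, ()):   # touch: mark watchers dirty
            c.dirty_cas = True

    def execute(self, client, commands):
        """EXEC: run `commands` only if no watched key changed; either
        way, clear all of this client's watches afterwards."""
        dirty = client.dirty_cas
        for key in client.watching:
            self.watched[key].discard(client)
        client.watching.clear()
        client.dirty_cas = False
        if dirty:
            return None                       # transaction aborted
        return [cmd(self) for cmd in commands]

db = WatchDB()
alice, bob = Client(), Client()
db.watch(alice, "k")
db.set("k", "changed")                        # someone touches the key
print(db.execute(alice, [lambda d: d.data.get("k")]))   # -> None
print(db.execute(bob, [lambda d: d.data.get("k")]))     # -> ['changed']
```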
  
That is the Redis transaction: the implementation feels very simple, and its practical use is limited.
  
Publish and subscribe. Redis supports subscription channels: a client that joins a channel is like a member of a group, and a message any client sends to the channel is received by every client on that channel.
  
The implementation is simple too, similar to that of watched_keys. The Redis server keeps a dict pubsub_channels, whose keys are channel names (obviously unique) and whose values are linked lists of the clients that joined each channel. Meanwhile, each client has its own pubsub_channels recording the channels it cares about. When a client sends a message to a channel, the server first finds the channel in pubsub_channels, then iterates over its clients and sends each of them the message. Subscribing to and unsubscribing from a channel is just a matter of operating on pubsub_channels, which is easy to understand.
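A minimal model of pubsub_channels: a dict from channel name to subscriber list, with clients represented as plain inbox lists. Names are invented for illustration.

```python
class PubSub:
    """Sketch of pubsub_channels: a dict from channel name to the list
    of subscribed clients; PUBLISH just walks that list."""
    def __init__(self):
        self.channels = {}

    def subscribe(self, client, channel):
        self.channels.setdefault(channel, []).append(client)

    def publish(self, channel, message):
        receivers = self.channels.get(channel, [])
        for client in receivers:
            client.append((channel, message))   # deliver to client inbox
        return len(receivers)                   # like PUBLISH's reply

inbox_a, inbox_b = [], []
ps = PubSub()
ps.subscribe(inbox_a, "news")
ps.subscribe(inbox_b, "news")
print(ps.publish("news", "hello"))   # -> 2
```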
  
Also, Redis supports pattern channels, that is, channels matched by glob-style patterns. For example, with a pattern channel such as p* subscribed, a message published to the ordinary channel p1 is delivered not only to p1's subscribers but also, because the name matches, to the clients of the pattern channel p*. Note the direction of the matching: the concrete channel named in a PUBLISH command is matched against the stored pattern channels; you cannot use a pattern in PUBLISH and have it matched against the concrete channels stored in Redis.
  
The implementation is also very simple. The Redis server has a pubsub_patterns list (why not a dict here? Because the number of patterns is generally small, a simple list is enough). It stores pubsubPattern structures, each holding one pattern and one client, so if multiple clients listen to the same pattern there are multiple pubsubPattern entries on the list, recording the correspondence between clients and patterns. Meanwhile, each client also has a pubsub_patterns list, but it stores the patterns that client listens to (as SDS strings), not pubsubPattern structures.
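And pattern channels as a flat (pattern, client) list, as described above. Python's fnmatch stands in here for Redis's own glob matcher; the class name is invented.

```python
from fnmatch import fnmatchcase

class PatternPubSub:
    """Sketch of pubsub_patterns: a flat list of (pattern, client)
    pairs; PUBLISH to a concrete channel is checked against every
    stored pattern."""
    def __init__(self):
        self.patterns = []   # a list, not a dict: patterns are few

    def psubscribe(self, client, pattern):
        self.patterns.append((pattern, client))

    def publish(self, channel, message):
        hits = 0
        for pattern, client in self.patterns:
            if fnmatchcase(channel, pattern):   # glob match, like p* vs p1
                client.append((pattern, channel, message))
                hits += 1
        return hits

inbox = []
pps = PatternPubSub()
pps.psubscribe(inbox, "news.*")
print(pps.publish("news.sports", "goal"))   # -> 1
```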
  
When a client sends a message to a channel, the server first finds the channel in pubsub_channels and sends the message to its client list; then it scans pubsub_patterns in the server for matching patterns and sends the message to those clients as well. There is no de-duplication here: a client may already have received the message via pubsub_channels and then receive it again (even several times) via pubsub_patterns. Presumably Redis considers this the client program's own problem, so it does not handle it.
  
Six. In summary, Redis has many more features than memcached and a more complex implementation. memcached, though, focuses on saving key-value data (which is already sufficient for most usage scenarios), while Redis provides richer data structures and other functionality. One cannot say Redis is better than memcached, but from a source-reading point of view, Redis is perhaps the more valuable read. In addition, Redis 3.0 adds cluster support; I have not yet studied that part of the code and hope to follow up.
