Redis is an in-memory database with persistence, which means that Redis periodically synchronizes the data in memory to disk to ensure durability. Redis supports four persistence methods: the first is snapshotting (the default), the second is the append-only file (AOF), the third is virtual memory, and the fourth is diskstore. Each is described below.
(i) snapshotting
Snapshotting is the default persistence method. In this mode, the in-memory data is written to a binary file as a snapshot; the default file name is dump.rdb. You can enable automatic snapshot persistence through configuration: Redis can be configured to take a snapshot automatically if more than M keys are modified within N seconds. The following is the default snapshot configuration:
save 900 1 # take a snapshot if at least 1 key is modified within 900 seconds
save 300 10 # take a snapshot if at least 10 keys are modified within 300 seconds
save 60 10000 # take a snapshot if at least 10000 keys are modified within 60 seconds
Snapshot save process:
1. Redis calls fork, so there are now a child process and a parent process.
2. The parent process continues to handle client requests, while the child process is responsible for writing the memory contents to a temporary file. Because of the OS's copy-on-write mechanism, the parent and child processes share the same physical pages at first; when the parent process handles a write request, the OS creates a copy of the page being modified rather than modifying the shared page. The data in the child process's address space is therefore a snapshot of the entire database at the moment of the fork.
3. When the child process finishes writing the snapshot to the temporary file, it replaces the previous snapshot file with the temporary file and then exits. (Note that in the worst case copy-on-write duplicates every modified page, so memory usage can approach twice the original.)
The client can also use the SAVE or BGSAVE command to tell Redis to perform a snapshot. SAVE takes the snapshot in the main thread, and since Redis handles all client requests with a single main thread, this blocks every client, so it is not recommended. Another point to note is that each snapshot writes the complete memory dataset to disk; it is not an incremental synchronization of only the dirty data. If the dataset is large and writes are frequent, this inevitably causes a large amount of disk I/O and may seriously hurt performance.
In addition, because snapshots are taken at intervals, if Redis goes down unexpectedly, all changes made since the last snapshot are lost. If the application cannot afford to lose any changes, use AOF persistence, described below:
(ii) append-only file
AOF offers better durability than snapshotting: with AOF persistence, Redis appends every write command it receives to a file (appendonly.aof by default) via the write function. When Redis restarts, it rebuilds the entire database in memory by re-executing the write commands saved in that file. Of course, because the OS caches writes in the kernel, the data may not reach the disk immediately, so AOF persistence can also lose some changes. However, we can tell Redis through the configuration file when to force the OS to flush to disk via the fsync function. There are three options (the default is fsync once per second):
appendonly yes # enable AOF persistence
# appendfsync always # force a write to disk on every write command: slowest, but fully durable; not recommended
appendfsync everysec # force a write to disk once per second: a good compromise between performance and durability (recommended)
# appendfsync no # rely entirely on the OS: best performance, but no durability guarantee
The AOF approach brings up another problem: the persistence file keeps growing. For example, if we call INCR test 100 times, the file must store all 100 commands, yet 99 of them are redundant, because to restore the database state a single SET test 100 would suffice. To compact the AOF persistence file, Redis provides the BGREWRITEAOF command. On receiving it, Redis saves the in-memory data to a temporary file as commands, in a way similar to a snapshot, and finally replaces the original file. The process is as follows:
1. Redis calls fork, so there are now parent and child processes.
2. Based on the in-memory database snapshot, the child process writes the commands needed to rebuild the database state into a temporary file.
3. The parent process continues to handle client requests. Besides appending write commands to the original AOF file, it also caches the write commands it receives. This ensures that nothing is lost if the rewrite fails.
4. When the child process finishes writing the snapshot contents to the temporary file as commands, it signals the parent process. The parent then appends the cached write commands to the temporary file.
5. The parent process can now rename the temporary file over the old AOF file; write commands received afterwards are appended to the new AOF file.
Note that rewriting the AOF file does not read the old AOF file; instead, the entire in-memory database is written out as commands into a new AOF file, which is somewhat similar to taking a snapshot.
(iii) Virtual memory mode (deprecated)
First, note that the virtual memory feature was deprecated after Redis 2.4, for the following reasons:
1) Slow restart: restarting is too slow
2) Slow saving: saving data is too slow
3) Slow replication: the two problems above make replication too slow
4) Complex code: the code is too complex
Here's a brief introduction to Redis's virtual memory.
Redis's virtual memory is not the same thing as the OS's virtual memory, but the idea and purpose are the same: to temporarily swap infrequently accessed data from memory to disk, freeing up valuable memory for data that needs to be accessed. For an in-memory database such as Redis, memory is always in short supply. Besides partitioning the data across multiple Redis servers, another way to increase database capacity is to use the VM to swap infrequently accessed data to disk. If only a small portion of the stored data is accessed frequently and most of it is rarely touched (on a website, it is indeed true that only a small number of users are active at any time), then using the VM not only increases the capacity of a single Redis server but also has little impact on performance.
Redis did not use the virtual memory mechanism provided by the OS, but instead implemented its own virtual memory mechanism in user space; the author explains the reasons in his blog:
http://antirez.com/post/redis-virtual-memory-story.html
There are two main reasons:
1. The OS's virtual memory swaps in units of 4 KB pages, and most Redis objects are far smaller than 4 KB, so a single OS page may hold multiple Redis objects; moreover, Redis collection types such as lists and sets may span multiple OS pages. The end result is that even if only 10% of the keys are accessed frequently, the OS may still consider every page active, and the OS only swaps pages out when memory is truly exhausted.
2. Compared with OS swapping, Redis can compress the objects it swaps to disk, and objects saved to disk can have their pointers and object metadata stripped. A compressed object is generally about 10 times smaller than the same object in memory, so the Redis VM performs far fewer I/O operations than the OS VM would.
The following is the VM-related configuration:
slaveof 192.168.1.1 6379 # specify the master's IP and port
vm-enabled yes # enable the VM feature
vm-swap-file /tmp/redis.swap # path of the file that swapped-out values are saved to (/tmp/redis.swap)
vm-max-memory 1000000 # upper limit on the memory Redis uses; beyond this limit Redis starts swapping values to the disk file
vm-page-size 32 # size of each page: 32 bytes
vm-pages 134217728 # maximum number of pages used in the swap file; swap file size = vm-page-size * vm-pages
vm-max-threads 4 # number of worker threads used to swap value objects in and out; 0 means no worker threads are used (described later)
Redis's VM is designed to keep key lookups fast, so it only swaps values out to the swap file. If the memory pressure comes from having too many keys with very small values, the VM will not help. Like the OS, Redis swaps objects by page, but Redis stipulates that a page can hold only one object, while one object may span multiple pages.
No values are swapped out until the memory used by Redis exceeds vm-max-memory. When the limit is exceeded, Redis prefers to swap out older objects; if two objects are equally old, the larger one is swapped out first. The exact formula is swappability = age * log(size_in_memory). As for vm-page-size, you should set the page size to fit most of your objects, which depends on your application: too large wastes disk space, too small fragments the swap file. For each page in the swap file, Redis keeps a corresponding bit in memory to record whether the page is free, so the page count in the configuration above (vm-pages 134217728) consumes 16 MB of memory for this bookkeeping. vm-max-threads is the number of threads used for swap tasks; if it is greater than 0, setting it to the server's number of CPU cores is recommended. If it is 0, swapping is performed in the main thread.
Having covered the configuration parameters, let's briefly describe how the VM works:
When vm-max-threads is set to 0 (blocking VM)
Swap out:
The main thread periodically checks whether memory use exceeds the maximum limit; if so, it blocks and directly saves the selected objects to the swap file, freeing the memory those objects occupied. This repeats until one of the following conditions is met:
1. Memory usage drops below the maximum limit.
2. The swap file is full.
3. Almost all objects have been swapped to disk.
Swap in:
When a client requests a key whose value has been swapped out, the main thread loads the corresponding value object from the file in a blocking way, blocking all clients while it loads, and then processes the client's request.
When vm-max-threads is greater than 0 (threaded VM)
Swap out:
When the main thread detects that memory use exceeds the maximum limit, it places the information about the objects selected for swapping into a queue for the worker threads to process, and the main thread continues handling client requests.
Swap in:
If a key requested by a client has been swapped out, the main thread blocks the client that issued the command, puts the information about the object to load into a queue, and lets a worker thread load it. The worker thread notifies the main thread when loading completes, and the main thread then executes the client's command. This approach blocks only the clients whose requested values have been swapped out.
Overall, the blocking VM has better overall performance, since it avoids the overhead of thread synchronization, thread creation, and resuming blocked clients, but responsiveness is sacrificed accordingly. With the threaded VM, the main thread does not block on disk I/O, so responsiveness is better. If your application does not modify data too often and is not very sensitive to latency, the blocking VM is recommended.
For a more detailed description of the Redis VM, refer to the links below:
http://antirez.com/post/redis-virtual-memory-story.html
http://redis.io/topics/internals-vm
(iv) Diskstore mode
The diskstore mode is a new implementation that the author chose after abandoning the virtual memory mode, namely the traditional B-tree approach. The specifics:
1) Read operations use read-through with LRU: data not present in memory is pulled from disk into memory, and in-memory data is evicted by LRU.
2) Write operations are handled by a separately spawned thread. Writes are normally asynchronous, though cache-flush-delay can be set to 0 to make Redis write as close to real time as possible. In many cases, however, delayed writing gives better performance: for example, for counters stored in Redis, if a count is modified repeatedly within a short period, Redis only needs to write the final result to disk. This practice is called per-key persistence. Because writes are merged per key, diskstore differs from snapshotting: it does not guarantee point-in-time consistency.
Because write operations are single-threaded, even with cache-flush-delay set to 0, multiple clients writing at the same time must queue. If the queue grows beyond cache-max-memory, by design Redis enters a waiting state, causing callers to stall.
Enthusiastic users on the Google group quickly ran a stress test: once memory was used up, SET throughput dropped from 25k to 10k per second and then nearly stalled. Although increasing cache-flush-delay improves repeated writes to the same key, and increasing cache-max-memory absorbs temporary write peaks, diskstore's write bottleneck is ultimately I/O.
3) The relationship between RDB and the new diskstore format
RDB is the traditional Redis in-memory format and diskstore is another format; the relationship between the two:
· The diskstore format can be saved as the RDB format at any time via BGSAVE, and the RDB format is also used for Redis replication and as the intermediate format between the different storage modes.
· Tools allow you to convert the RDB format into the diskstore format.
Of course, the diskstore principle is sound, but it is still in alpha and only a simple demo: diskstore.c is about 300 lines including comments. The implementation saves each value as a separate file whose name is the hash of the key, so diskstore will need a more efficient and stable implementation before it is suitable for production. Thanks to the clean interface design, however, diskstore.c could easily be replaced by a B-tree implementation, and many developers are actively exploring the feasibility of replacing the default diskstore.c with BDB or InnoDB.
Here is an introduction to the Diskstore algorithm.
In fact, diskstore uses a scheme similar to hash-based file placement: it first converts the key into a 40-character hash with the SHA-1 algorithm, then uses the first two characters of the hash as the first-level directory and the third and fourth characters as the second-level directory, and finally uses the hash itself as the file name, giving paths like "0b/ee/0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33". The algorithm is as follows:
dskeytopath(key):
    char path[1024];
    char *hashkey = SHA1(key);
    path[0] = hashkey[0];
    path[1] = hashkey[1];
    path[2] = '/';
    path[3] = hashkey[2];
    path[4] = hashkey[3];
    path[5] = '/';
    memcpy(path + 6, hashkey, 40);
    return path;
Storage algorithm (e.g. key == "apple"):

dsset(key, value, expiretime):
    char *hashkey = SHA1(key);      /* e.g. d0be2dc421be4fcd0172e5afceea3970e2f3d940 */
    char *path = dskeytopath(key);  /* d0/be/d0be2dc421be4fcd0172e5afceea3970e2f3d940 */
    FILE *fp = fopen(path, "w");
    rdbSaveKeyValuePair(fp, key, value, expiretime);
    fclose(fp);
Get algorithm:

dsget(key):
    char *path = dskeytopath(key);
    FILE *fp = fopen(path, "r");
    robj *val = rdbLoadObject(fp);
    return val;
However, diskstore has the drawback that two different keys could produce the same SHA-1 hash, which could lead to data loss. The probability of such a collision is extremely low, though, so it is acceptable. According to the author, this implementation, which depends heavily on the file system, may be replaced with a B+tree in the future.
See also: http://www.hoterran.info/redis_persistence