Redis Learning Note 9 -- Redis Persistence


Redis is an in-memory database that supports persistence: it periodically synchronizes in-memory data to disk so that the data survives a restart. Redis supports four persistence methods: snapshotting (the default), append-only file (AOF), virtual memory, and diskstore. Each is described below.

(i) Snapshotting

Snapshotting is the default persistence mode. In this mode the in-memory data is written to a binary file as a snapshot; the default file name is dump.rdb. Snapshot persistence can be triggered automatically through configuration: Redis can be told to take a snapshot whenever at least M keys have been modified within N seconds. The following is the default snapshot configuration:

save 900 1     # take a snapshot if at least 1 key was modified within 900 seconds
save 300 10    # take a snapshot if at least 10 keys were modified within 300 seconds
save 60 10000  # take a snapshot if at least 10000 keys were modified within 60 seconds
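The save rules above can be sketched as a small check (an illustrative sketch, not Redis source code; `should_snapshot` and the rule list are invented names):

```python
# Illustrative sketch of the "N seconds / M changes" snapshot trigger.
# Each rule is (seconds, min_changes): a snapshot fires when at least
# min_changes keys were modified and at least `seconds` seconds have
# elapsed since the last save.
SAVE_RULES = [(900, 1), (300, 10), (60, 10000)]

def should_snapshot(seconds_since_save, changes_since_save, rules=SAVE_RULES):
    """Return True if any configured save rule is satisfied."""
    return any(seconds_since_save >= secs and changes_since_save >= changes
               for secs, changes in rules)
```

For example, one modified key after 900 seconds satisfies the first rule, while nine modified keys after 300 seconds satisfy none of them.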


Snapshot save process:

1. Redis calls fork, so there are now a parent process and a child process.
2. The parent process continues to handle client requests, while the child process is responsible for writing the in-memory contents to a temporary file. Because of the OS's copy-on-write mechanism, parent and child initially share the same physical pages; when the parent handles a write request, the OS makes a private copy of the page being modified for the parent instead of writing to the shared page. The data in the child's address space is therefore a snapshot of the entire database at the moment of the fork.
3. After the child finishes writing the snapshot to the temporary file, it replaces the old snapshot file with the temporary file and then exits. (Note that because of copy-on-write, in the worst case memory usage can approach twice the original.)
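Step 3, writing to a temporary file and then replacing the old snapshot, can be sketched as follows (a simplified illustration, not Redis code: JSON stands in for the real RDB serializer, and the function name is made up):

```python
import json
import os
import tempfile

def save_snapshot(db, path):
    """Write db to a temporary file in the target directory, then
    atomically replace the old snapshot file, so a crash mid-write
    never leaves a half-written dump behind."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    with os.fdopen(fd, "w") as f:
        json.dump(db, f)           # stand-in for the real RDB serializer
        f.flush()
        os.fsync(f.fileno())       # make sure the bytes reach the disk
    os.replace(tmp, path)          # atomic rename: readers see old or new, never a mix
```

The atomic rename is what lets the old dump.rdb stay valid right up until the new one is complete.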

A client can also issue the SAVE or BGSAVE command to tell Redis to perform snapshot persistence. SAVE takes the snapshot in the main thread; since Redis serves every client from that single main thread, this blocks all client requests, so SAVE is not recommended. Also note that each snapshot writes the full in-memory dataset to disk rather than incrementally syncing only dirty data. With a large dataset and many writes, this causes heavy disk I/O and can seriously hurt performance.
In addition, because snapshots are taken at intervals, a crash loses every change made since the last snapshot. If your application cannot afford to lose any modification, use AOF persistence, described next.

(ii) Append-only file (AOF)

AOF is more durable than snapshotting: with AOF enabled, Redis appends every write command it receives to a file (appendonly.aof by default) via the write function. When Redis restarts, it rebuilds the database in memory by re-executing the write commands saved in that file. Of course, the OS caches writes in the kernel, so they may not reach the disk immediately, which means AOF can still lose some modifications. We can, however, tell Redis through the configuration file when to force writes to disk with fsync. There are three options (the default is fsync once per second):

appendonly yes           # enable AOF persistence
# appendfsync always     # fsync after every write command: slowest, but fully durable; not recommended
appendfsync everysec     # fsync once per second: a good compromise between performance and durability; recommended
# appendfsync no         # rely entirely on the OS to flush: best performance, no durability guarantee
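The append-and-fsync behaviour can be illustrated with a toy log (not the real AOF format, which uses the Redis protocol; the class and function names here are invented):

```python
import os

class AppendOnlyLog:
    """Toy append-only file: every write command is appended, and the
    whole database can be rebuilt by replaying the file."""
    def __init__(self, path, fsync_always=True):
        self.f = open(path, "a")
        self.fsync_always = fsync_always

    def append(self, command):
        self.f.write(command + "\n")
        self.f.flush()                 # user-space buffer -> kernel
        if self.fsync_always:          # "appendfsync always" behaviour
            os.fsync(self.f.fileno())  # kernel -> disk

def replay(path):
    """Rebuild state by re-executing the SET commands saved in the log."""
    db = {}
    with open(path) as f:
        for line in f:
            op, key, value = line.split()
            if op == "SET":
                db[key] = value
    return db
```

Dropping the fsync call models "appendfsync no": faster, but a crash can lose whatever the kernel had not yet flushed.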

The AOF approach raises another problem: the persistence file keeps growing. For example, if we call INCR test 100 times, the file stores all 100 commands, 99 of which are redundant; to restore the database state, a single SET test 100 would suffice. To compact the AOF file, Redis provides the BGREWRITEAOF command. On receiving it, Redis saves the in-memory data to a temporary file as commands, in a snapshot-like fashion, and finally replaces the original file. The process is as follows:

1. Redis calls fork, so there are now a parent process and a child process.
2. The child process writes to a temporary file the commands needed to rebuild the database state, based on the in-memory database snapshot.
3. The parent process continues to handle client requests. Besides appending write commands to the original AOF file, it also caches the received write commands in memory, which guarantees that nothing is lost if the child's rewrite fails.
4. When the child finishes writing the snapshot content to the temporary file, it signals the parent. The parent then appends the cached write commands to the temporary file as well.
5. The parent renames the temporary file over the old AOF file, and subsequent write commands are appended to the new AOF file.

Note that rewriting the AOF file never reads the old AOF file; instead, the entire in-memory database is dumped into a new AOF file in command form, much like taking a snapshot.
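The effect of BGREWRITEAOF can be demonstrated by compacting a toy command history: replay it into memory, then emit one command per surviving key (an illustrative sketch, not Redis's actual rewrite code):

```python
def rewrite_aof(commands):
    """Collapse a command history into the minimal command list that
    reproduces the final state: 100 INCRs on one key become one SET."""
    db = {}
    for cmd in commands:
        parts = cmd.split()
        if parts[0] == "SET":
            db[parts[1]] = int(parts[2])
        elif parts[0] == "INCR":
            db[parts[1]] = db.get(parts[1], 0) + 1
    return ["SET %s %d" % (key, value) for key, value in sorted(db.items())]
```

Calling INCR test 100 times compacts to the single command SET test 100, exactly as in the example above.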

(iii) Virtual memory (deprecated)

First of all: the virtual memory feature has been deprecated since Redis 2.4, for the following reasons:

1) Slow restart: rebooting is too slow.

2) Slow saving: persisting the data is too slow.

3) Slow replication: a consequence of the two problems above.

4) Complex code: the implementation is too complicated.

The following is a brief introduction to Redis's virtual memory.

Redis's virtual memory has nothing to do with the OS's virtual memory, but the idea and purpose are the same: temporarily swap infrequently accessed data out of memory to disk, freeing precious memory for data that does need to be accessed. For an in-memory database like Redis, memory is never enough. Besides sharding the data across multiple Redis servers, another way to increase database capacity is to use the VM to swap rarely accessed data to disk. If only a small portion of the stored data is accessed frequently (as on most web sites, where only a small fraction of users are active at any time), the VM can increase the capacity of a single Redis server without hurting performance too much.

Redis does not use the virtual memory mechanism provided by the OS; it implements its own virtual memory mechanism in user space, as the author explains on his blog:

Http://antirez.com/post/redis-virtual-memory-story.html
The main reasons are two:
        1. The OS swaps in pages of 4 KB, its smallest unit, while most Redis objects are much smaller than 4 KB, so a single OS page may hold several Redis objects; conversely, collection types such as list and set may span multiple OS pages. As a result, even if only 10% of keys are accessed frequently, nearly every OS page looks active to the OS, and the OS only swaps pages when memory is truly exhausted.
        2. Compared with OS swapping, Redis can compress the objects it swaps to disk and strip pointers and object metadata before saving them. A compressed on-disk object is typically about 10 times smaller than its in-memory form, so the Redis VM performs far less I/O than the OS VM would.

       The following are VM-related configurations:

slaveof 192.168.1.1 6379      # master's IP and port (a replication setting)

vm-enabled yes                # enable the VM feature
vm-swap-file /tmp/redis.swap  # path of the file that holds swapped-out values
vm-max-memory 1000000         # upper limit on memory used by Redis; above it, Redis starts swapping values to the swap file
vm-page-size 32               # size of each page, in bytes
vm-pages 134217728            # maximum number of pages in the swap file; swap file size = vm-page-size * vm-pages
vm-max-threads 4              # number of worker threads for swapping values in and out; 0 means no worker threads (described later)

The Redis VM is designed so that keys always remain searchable in memory; only values are swapped out to the swap file. So if the memory problem is caused by too many keys with small values, the VM will not solve it. Like the OS, Redis swaps objects in units of pages, but Redis requires that a page hold at most one object, while a single object may span multiple pages.

No value is swapped out until the memory used by Redis exceeds vm-max-memory. Once the limit is exceeded, Redis prefers to swap older objects, and between two equally old objects it swaps the larger one first; the exact formula is swappability = age * log(size_in_memory). For vm-page-size, set the page size to fit most of your objects: too large wastes disk space, too small fragments the swap file. For each page in the swap file, Redis keeps one bit in memory recording whether the page is free, so the page count configured above (vm-pages 134217728) consumes 16 MB of memory just to track page state. vm-max-threads is the number of threads used for swap tasks; if it is greater than 0, setting it to the server's number of CPU cores is recommended. If it is 0, swapping happens in the main thread.
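The selection formula can be checked with a two-line sketch (the function name is invented; Redis computes this internally per object):

```python
import math

def swappability(age, size_in_memory):
    """swappability = age * log(size_in_memory): among candidates,
    older objects win, and between equally old objects the larger
    one is swapped out first."""
    return age * math.log(size_in_memory)
```

For instance, with equal ages a 1024-byte object scores higher than a 64-byte one, so it is swapped out first.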

With the parameters covered, here is a brief look at how the VM works.

When vm-max-threads is 0 (blocking VM):

Swap out:
The main thread periodically checks memory usage; when it exceeds the limit, the thread blocks, saves selected objects to the swap file, and frees the memory they used, repeating until one of the following conditions holds:
1. memory usage drops below the maximum limit;
2. the swap file is full;
3. almost all objects have been swapped to disk.
Swap in:
When a client requests a key whose value has been swapped out, the main thread loads the corresponding value object from the file in a blocking manner, blocking all clients during the load, and then processes the client request.

When vm-max-threads is greater than 0 (threaded VM):
Swap out:
When the main thread detects that memory usage exceeds the limit, it places the information about the objects selected for swapping into a queue for background worker threads to handle, and continues processing client requests.
Swap in:
If a client requests a key whose value has been swapped out, the main thread blocks only that client, puts the information needed to load the object into a queue for a worker thread, and keeps serving other clients. When loading completes, the worker thread notifies the main thread, which then executes the blocked client's command. This approach blocks only the client whose value was swapped out.
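The threaded swap-in can be modelled as a main thread handing load jobs to a worker over a queue (a simplified sketch; the queue layout and loader function are invented):

```python
import queue
import threading

def threaded_swap_in(keys, load_from_disk):
    """Main thread enqueues keys to load; a worker thread performs the
    slow disk reads and records the results, so the main thread itself
    never blocks on disk I/O."""
    jobs = queue.Queue()
    results = {}

    def worker():
        while True:
            key = jobs.get()
            if key is None:                      # sentinel: stop the worker
                return
            results[key] = load_from_disk(key)   # slow I/O off the main thread

    t = threading.Thread(target=worker)
    t.start()
    for key in keys:
        jobs.put(key)     # the main thread only enqueues and moves on
    jobs.put(None)
    t.join()              # we wait here; real Redis keeps serving other clients
    return results
```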

Overall, the blocking VM has better throughput, since it avoids thread synchronization, thread creation, and the overhead of resuming blocked clients, but responsiveness is sacrificed accordingly. With the threaded VM, the main thread never blocks on disk I/O, so responsiveness is better. If your application does not swap often and can tolerate a bit of latency, the blocking VM is recommended.

For a more detailed introduction to the Redis VM, refer to the following links:
http://antirez.com/post/redis-virtual-memory-story.html
http://redis.io/topics/internals-vm

(iv) Diskstore

Diskstore is the new implementation the author turned to after abandoning the virtual memory approach; it is essentially the traditional B-tree style of storage. The specifics are:

1) Reads use a read-through, LRU approach: data not present in memory is pulled from disk into memory, and in-memory data is evicted with LRU.

2) Writes are handled by a separately spawned thread and are normally asynchronous. You can also set cache-flush-delay to 0 to make Redis write as promptly as possible, but in many cases delayed writing performs better. For example, with counters stored in Redis, if a count is modified repeatedly within a short time, Redis only needs to write the final result to disk. The author calls this approach per-key persistence. Because writes are merged per key, diskstore still differs from snapshotting in that it does not guarantee point-in-time consistency.
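Per-key write merging can be sketched with a small write-back cache (illustrative names; real diskstore also honours cache-flush-delay, which is omitted here):

```python
class WriteBackCache:
    """Coalesce writes per key: repeated updates to one key within a
    flush window cause a single disk write of the final value."""
    def __init__(self, write_to_disk):
        self.dirty = {}                  # key -> latest pending value
        self.write_to_disk = write_to_disk
        self.disk_writes = 0

    def set(self, key, value):
        self.dirty[key] = value          # overwrite in memory; no disk I/O yet

    def flush(self):
        for key, value in self.dirty.items():
            self.write_to_disk(key, value)
            self.disk_writes += 1
        self.dirty.clear()
```

A counter updated 100 times between flushes costs one disk write, not 100.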

Because the write path is single-threaded, even with cache-flush-delay set to 0, simultaneous writes from multiple clients must be queued; by design, if the queued data exceeds cache-max-memory, Redis puts callers into a wait state.

Enthusiastic users on the Google Group quickly ran stress tests: once memory was exhausted, the write rate dropped from 25k to 10k operations per second and then nearly stalled. Performance for repeated writes to the same key can be improved by increasing cache-flush-delay, and temporary write peaks can be absorbed by increasing cache-max-memory, but diskstore's write bottleneck is ultimately disk I/O.

3) The relationship between RDB and the new diskstore format
RDB is the storage format of traditional in-memory Redis, while diskstore is a different format. They are related as follows:

· The diskstore format can be saved as RDB at any time via BGSAVE, and RDB remains the format used for Redis replication and as the intermediate format between the different storage modes.

· An RDB file can be converted into the diskstore format with a tool.

The diskstore principle is sound, but the code is still an alpha-quality demo: diskstore.c is only about 300 lines including comments, and its approach is to save each value as a separate file whose name is the hash of the key. Diskstore therefore needs a more efficient and stable implementation before it can be used in production. Thanks to its clean interface design, however, diskstore.c could easily be replaced by a B-tree implementation, and many developers are actively exploring the feasibility of using BDB or InnoDB to replace the default diskstore.c.

Here is an introduction to the diskstore algorithm.

Diskstore is essentially a hashing scheme: the key is first converted into a 40-character hash value with the SHA1 algorithm; the first two characters of the hash become the first-level directory, the third and fourth characters the second-level directory, and the full hash value the file name, giving paths of the form "0b/ee/0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33". The algorithm is as follows:

dsKeyToPath(key):
    char path[1024];
    char *hashkey = sha1(key);
    path[0] = hashkey[0];
    path[1] = hashkey[1];
    path[2] = '/';
    path[3] = hashkey[2];
    path[4] = hashkey[3];
    path[5] = '/';
    memcpy(path + 6, hashkey, 40);
    return path;

Storage algorithm (for example, key == "apple"):

dsSet(key, value, expiretime):
    char *hashkey = sha1(key);          /* d0be2dc421be4fcd0172e5afceea3970e2f3d940 */
    char *path = dsKeyToPath(hashkey);  /* d0/be/d0be2dc421be4fcd0172e5afceea3970e2f3d940 */
    FILE *fp = fopen(path, "w");
    rdbSaveKeyValuePair(fp, key, value, expiretime);
    fclose(fp);

Get algorithm:

dsGet(key):
    char *hashkey = sha1(key);
    char *path = dsKeyToPath(hashkey);
    FILE *fp = fopen(path, "r");
    robj *val = rdbLoadObject(fp);
    return val;
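The path scheme can be reproduced in a few lines of Python; the result for the key "apple" matches the hash shown above (the function name mirrors the pseudocode, the implementation is a sketch):

```python
import hashlib

def ds_key_to_path(key):
    """SHA1 the key, then use hex chars [0:2] and [2:4] as two
    directory levels and the full 40-char digest as the file name."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return "%s/%s/%s" % (digest[0:2], digest[2:4], digest)
```

For example, ds_key_to_path("apple") returns "d0/be/d0be2dc421be4fcd0172e5afceea3970e2f3d940".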

Diskstore does have a drawback: two different keys could in principle produce the same SHA1 hash value, so data loss is possible. That situation is extremely unlikely, however, so it is acceptable. According to the author's plans, a B+tree may eventually replace this implementation, which depends heavily on the file system.

See also: http://www.hoterran.info/redis_persistence
