Demystifying Redis Persistence

Source: Internet
Author: User

This article is based on a blog post by the author of Redis. He says that, of all the discussions about Redis he has seen, the biggest misunderstandings concern Redis persistence, so he wrote a lengthy article to discuss it systematically.

What is persistence? Put simply, it means putting data on a device that does not lose it after a power outage, which is usually a hard drive.

The Process of a Write Operation

First, let's look at what a database does during a write operation. There are five main steps.

    1. The client sends a write operation to the server (the data is in the client's memory)

    2. The database server receives the data of the write request (the data is in the server's memory)

    3. The server calls write(2) to write the data to disk (the data is in the kernel buffer)

    4. The operating system transfers the data from the kernel buffer to the disk controller (the data is in the disk cache)

    5. The disk controller writes the data to the physical media (the data actually lands on the disk)

A write operation roughly goes through these five steps. Next, let's use them to look at failures at different levels.

    • When the database process fails but the kernel is still fine, the data is safe as long as step 3 has completed, because the operating system will carry out the remaining steps and the data will eventually land on the disk.

    • When the system loses power, all of the caches mentioned in the five steps are lost, and both the database and the operating system stop working. The data is therefore guaranteed to survive a power loss only after step 5 has completed; data that is still in any of the first four steps is lost.

The five steps raise the following questions:

    • How often does the database call write(2) to write data to the kernel buffer?

    • How often does the kernel write data from the kernel buffer to the disk controller?

    • When does the disk controller write the data in its cache to the physical media?

The first question is usually fully under the control of the database. For the second, the operating system has a default policy, but we can also force it to write data from the kernel buffer to the disk controller with the fsync family of system calls provided by the POSIX API. The third seems to be out of the database's reach, but in practice the disk write cache is disabled in most cases, or the cache is enabled only for reads, so writes are not cached and go directly to the disk. The recommended approach is to enable write caching only if the disk device has a battery backup.
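The hand-off between the layers described above can be sketched in a few lines of Python. This is only an illustration, not how Redis itself is written; the function name durable_write is made up for the example. The library buffer flush corresponds to write(2), and os.fsync corresponds to fsync(2).

```python
import os


def durable_write(path, payload):
    """Append payload to path and push it down through each buffer layer."""
    with open(path, "ab") as f:
        f.write(payload)      # data is in the process's own (user-space) buffer
        f.flush()             # write(2): data is now in the kernel buffer (step 3)
        os.fsync(f.fileno())  # fsync(2): ask the kernel to hand it to the disk (step 4)
    # Whether step 5 has happened at this point depends on the disk's write cache.
```

After flush returns, the data survives a crash of the process; after fsync returns, it should survive a power loss as long as the disk does not hold it in a volatile write cache.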

Data corruption means that the data cannot be recovered. So far we have discussed how to make sure data is actually written to disk, but being written to disk does not mean the data is not corrupted. For example, a single request may result in two separate writes, and if a failure happens in between, one write may complete safely while the other does not. If the database's data file is not organized properly, this can leave the data completely unrecoverable.

There are three common strategies for organizing data so that data files cannot be corrupted beyond recovery:

    1. The first is the crudest: the database does not try to guarantee recoverability through the way it organizes data. Instead, it relies on a synchronized backup (replication) to restore the data files after they are corrupted. This is in fact the situation with MongoDB when journaling is turned off and replica sets are configured.

    2. The second adds an operation log on top of that and records each operation, so that the log can be used for recovery. Because the operation log is written sequentially in append-only mode, it cannot end up in an unrecoverable state. This is similar to MongoDB with journaling turned on.

    3. The safest is for the database never to modify old data and to complete write operations only by appending, so that the data itself is a log and can never become unrecoverable. CouchDB is a good example of this approach.
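To make the append-only idea concrete, here is a minimal sketch of an operation log that stays recoverable even if the last write was cut short. The one-JSON-object-per-line record format and the names append_op and replay are assumptions made for this example; they are not the formats used by MongoDB, CouchDB or Redis.

```python
import json
import os


def append_op(path, op):
    """Append one operation as a single JSON line and fsync it."""
    line = json.dumps(op) + "\n"
    with open(path, "ab") as f:
        f.write(line.encode("utf-8"))
        f.flush()
        os.fsync(f.fileno())


def replay(path):
    """Rebuild the data set from the log, stopping at the first torn record."""
    data = {}
    with open(path, "rb") as f:
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # incomplete trailing record from an interrupted write
            try:
                op = json.loads(raw)
            except ValueError:
                break  # unreadable record: keep everything before it
            if op["cmd"] == "set":
                data[op["key"]] = op["value"]
            elif op["cmd"] == "del":
                data.pop(op["key"], None)
    return data
```

Because old records are never modified, a failure can only damage the tail of the file, and everything before the damaged record can still be replayed.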

RDB Snapshot

Let's start with Redis's first persistence strategy, the RDB snapshot. Redis can persist a snapshot of the current data into a data file. But how does a database that is continuously being written to generate a snapshot? Redis relies on the copy-on-write behavior of the fork system call. To generate a snapshot, the current process forks a child process, and the child iterates over all the data and writes it to an RDB file.
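The fork-based approach can be illustrated with a small Python sketch for POSIX systems. It is a simplification under stated assumptions: the data set is a plain dict, the snapshot is written as JSON rather than in the RDB format, and the function name snapshot_in_child is made up for the example.

```python
import json
import os


def snapshot_in_child(data, path):
    """Fork; the child dumps its frozen view of data, the parent returns at once."""
    pid = os.fork()
    if pid == 0:
        # Child: sees memory exactly as it was at fork time (copy-on-write pages).
        try:
            with open(path, "w") as f:
                json.dump(data, f)
            os._exit(0)
        except BaseException:
            os._exit(1)
    return pid  # parent: free to keep accepting writes
```

A possible usage: call snapshot_in_child, modify the dict in the parent right away, then wait for the child with os.waitpid. The file contains the data as it was at the moment of the fork, not the later modification.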

We can configure when RDB snapshots are generated with the Redis save directive. For example, you can configure a snapshot to be generated when there have been 100 writes within 10 minutes, or when there have been 1000 writes within 1 hour, or combine several rules. These rules are defined in the Redis configuration file, and you can also change them at runtime with the CONFIG SET command, without restarting Redis.
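How several save rules combine can be sketched as a simple check. The sketch assumes that each rule is a pair of seconds and changes, and that a snapshot is due when at least that many seconds have passed since the last save and at least that many writes have happened in the meantime. The function name and the RULES list are illustrative, using the numbers from the example above.

```python
def should_snapshot(rules, seconds_since_last_save, changes_since_last_save):
    """Return True if any (seconds, changes) rule is satisfied."""
    return any(
        seconds_since_last_save >= seconds and changes_since_last_save >= changes
        for seconds, changes in rules
    )


# The two example rules from the text: 100 writes in 10 minutes, 1000 writes in 1 hour.
RULES = [(600, 100), (3600, 1000)]
```

With these rules, 150 writes after 700 seconds would trigger a snapshot, while 5000 writes after only 599 seconds would not yet do so.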

A Redis RDB file cannot be left in a broken state, because it is written by the child process: when a new RDB file is generated, the child writes the data to a temporary file and then renames the temporary file to the RDB file with the atomic rename system call. As a result, a usable RDB file is available whenever a failure occurs.
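The write-to-a-temporary-file-then-rename pattern is general and can be sketched as follows. The function name atomic_write and the temporary file naming are assumptions for the example; os.replace performs the atomic rename as long as both paths are on the same filesystem.

```python
import os
import tempfile


def atomic_write(path, payload):
    """Replace path with payload so readers see the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="temp-", suffix=".rdb")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())   # make sure the new content is on disk first
        os.replace(tmp_path, path)   # atomic rename over the old file
    except BaseException:
        try:
            os.unlink(tmp_path)      # clean up the temporary file on failure
        except FileNotFoundError:
            pass
        raise
```

If the process dies before the rename, the old file is untouched; if it dies after, the new file is complete.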

The RDB file is also part of how Redis implements master-slave synchronization.

However, RDB has an obvious shortcoming: once the database has a problem, the data in the RDB file is not fully up to date, and everything written between the last RDB file generation and the Redis outage is lost. For some workloads this is tolerable, and we recommend RDB persistence for them, because the cost of enabling RDB is not high. But for applications with very high data safety requirements that cannot tolerate data loss, RDB is not enough, so Redis provides another important persistence mechanism: the AOF log.

AOF Log

The full name of the AOF log is append only file; as the name suggests, it is a log file that is only ever appended to. Unlike the binlog of a typical database, the AOF file is plain, readable text whose content is standard Redis commands. For example, let's run the following experiment with Redis 2.6, enabling AOF through a command-line option at startup:

./redis-server --appendonly yes

Then we execute the following commands:

redis 127.0.0.1:6379> set key1 Hello
OK
redis 127.0.0.1:6379> append key1 " World!"
(integer) 12
redis 127.0.0.1:6379> del key1
(integer) 1
redis 127.0.0.1:6379> del non_existing_key
(integer) 0

When we view the AOF log file, we get the following content:

$ cat appendonly.aof
*2
$6
SELECT
$1
0
*3
$3
set
$4
key1
$5
Hello
*3
$6
append
$4
key1
$7
 World!
*2
$3
del
$4
key1

As you can see, each write operation produces a corresponding command in the log. Note that the last del command is not recorded in the AOF log, because Redis determined that it did not change the data set, so there is no need to record this useless write. In addition, the AOF log is not always a literal copy of the client's requests. For example, the INCRBYFLOAT command is recorded in the AOF log as a SET, because floating-point operations may give different results on different systems; to prevent the same log from producing different data sets on different systems, only the result of the operation is recorded, using SET.
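The entries in the AOF file follow the Redis protocol framing: a line with an asterisk and the number of arguments, then for each argument a line with a dollar sign and its length in bytes, followed by the argument itself, with every line terminated by CRLF. A small sketch of an encoder for this framing is shown below; the function name encode_command is made up for the example.

```python
def encode_command(*args):
    """Encode a command and its arguments in the framing used by AOF entries."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        raw = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(raw), raw))
    return b"".join(parts)
```

Encoding set key1 Hello this way yields the same seven lines that appear for that command in the file above.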

AOF Rewrite

You might wonder: if every write command generates a log entry, won't the AOF file become very large? Yes, the AOF file keeps growing, so Redis provides a feature called AOF rewrite. It regenerates the AOF file so that each piece of data is recorded only once in the new file, unlike the old file, which may contain many operations on the same value. The process is similar to generating an RDB file: Redis forks a child process, which traverses the data directly and writes a new temporary AOF file. While the new file is being written, all write operations are still logged to the old AOF file and are also recorded in an in-memory buffer. When the child finishes, the buffered log entries are appended to the temporary file in one go. Redis then calls the atomic rename system call to replace the old AOF file with the new one.
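The effect of a rewrite can be shown with a toy model in which the log is a list of command tuples and the data set is a dict of strings. This is only a sketch of the idea, not Redis's implementation: the forked child is represented by a snapshot of the data at fork time, and the names apply_cmd, replay and rewrite are invented for the example.

```python
def apply_cmd(data, cmd):
    """Apply one logged command to the in-memory data set."""
    name, key, *rest = cmd
    if name == "set":
        data[key] = rest[0]
    elif name == "append":
        data[key] = data.get(key, "") + rest[0]
    elif name == "del":
        data.pop(key, None)


def replay(log):
    """Rebuild the data set by applying every command in order."""
    data = {}
    for cmd in log:
        apply_cmd(data, cmd)
    return data


def rewrite(snapshot, buffered):
    """Build the new log: one set per key at fork time, then the buffered writes."""
    new_log = [("set", key, value) for key, value in snapshot.items()]
    new_log.extend(buffered)
    return new_log
```

Replaying the rewritten log gives the same data set as replaying the old log followed by the buffered writes, but the new log has at most one entry per key plus the writes that arrived during the rewrite.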

From the above we can see that both RDB and AOF writes are sequential I/O and therefore perform well. Likewise, when the database is restored from an RDB file or an AOF log, the data is read sequentially and loaded into memory, so recovery does not cause random disk reads.

AOF Reliability Settings

AOF writes the operation log to disk through file writes, so it goes through the same five steps described above for a write operation. So how safe is writing to the AOF? This is configurable. After Redis calls write(2) to write to the AOF, the appendfsync option controls when fsync is called to push the data to disk. The three appendfsync settings below are listed in order of increasing safety.

appendfsync no

When appendfsync is set to no, Redis never calls fsync itself to synchronize the AOF log to disk, so the timing depends entirely on the operating system. On most Linux systems, the kernel flushes the buffered data to disk about every 30 seconds.

appendfsync everysec

When appendfsync is set to everysec, Redis by default calls fsync once per second to write the data in the buffer to disk. However, if an fsync call takes longer than 1 second, Redis adopts a deferred policy and waits up to one more second, because a write against a file descriptor that is still being fsynced would block. If the fsync has still not finished after two seconds in total, Redis goes ahead anyway, however long it takes, and the current write operation blocks. So the conclusion is that in the vast majority of cases Redis performs an fsync every second, and in the worst case one fsync every two seconds.

In most database systems this technique is called group commit: multiple write operations are combined and their log entries are written to disk in one go.

appendfsync always

When appendfsync is set to always, every write operation calls fsync. The data is safest in this mode, but performance suffers because fsync is executed on every write.
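The difference between the three settings can be sketched with a small log writer. This is a simplified model, not Redis's code: it calls fsync inline rather than in a background thread, it does not model the deferred write described for everysec, and the class name AofWriter, the fsync_calls counter and the injectable clock are assumptions made so that the behavior is easy to observe.

```python
import os
import time


class AofWriter:
    """Append-only writer with a selectable fsync policy: no, everysec or always."""

    def __init__(self, path, policy="everysec", clock=time.monotonic):
        if policy not in ("no", "everysec", "always"):
            raise ValueError("unknown policy: %r" % (policy,))
        self._f = open(path, "ab", buffering=0)  # unbuffered: each write is a write(2)
        self._policy = policy
        self._clock = clock
        self._last_sync = clock()
        self.fsync_calls = 0

    def append(self, record):
        self._f.write(record)  # data reaches the kernel buffer
        if self._policy == "always":
            self._sync()
        elif self._policy == "everysec" and self._clock() - self._last_sync >= 1.0:
            self._sync()       # one fsync covers every record written since the last one
        # policy "no": flushing is left entirely to the operating system

    def _sync(self):
        os.fsync(self._f.fileno())
        self._last_sync = self._clock()
        self.fsync_calls += 1

    def close(self):
        self._f.close()
```

With always, the number of fsync calls equals the number of appended records; with everysec, many records written within the same second share a single fsync, which is the group commit effect; with no, the writer never calls fsync at all.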

What's the difference for pipelining?

With pipelining, the client sends N commands at once and then waits for the N replies to come back together. Using pipelining means the client does not wait for the return value of each command before sending the next one. In this case the N commands are executed in one batch, so with appendfsync set to everysec the timing may deviate somewhat, because executing the N commands can take longer than 1 second or even 2 seconds. However, the delay is guaranteed not to exceed the execution time of the N commands.

Comparison with PostgreSQL and MySQL

There is not much to add here, because operating-system-level data safety has already been covered at length above, and different databases implement it in much the same way. In short, the conclusion is that with AOF enabled, Redis's data safety on a single machine is no weaker than that of these mature SQL databases.

What is persisted data used for? For data recovery after a restart, of course. Redis is an in-memory database, and both RDB and AOF are only means of recovering its data. When Redis recovers from an RDB or AOF file, it reads the file and reloads the data into memory. Compared with the startup time of MySQL and similar databases, this takes much longer, because MySQL does not need to load its data into memory at startup.

On the other hand, once MySQL has started serving requests, the hot data it accesses is only gradually loaded into memory. This is usually called warming up, and performance is not very high until warm-up is complete. The advantage of Redis is that the data is loaded into memory in one go, so it is fully warmed up as soon as loading finishes. Once Redis is up, it can serve requests at full speed.

Startup time also differs between RDB and AOF. RDB starts faster for two reasons. First, each piece of data appears only once in an RDB file, whereas the AOF log may contain many operations on the same data, so each piece of data only needs to be loaded once. Second, the storage format of the RDB file matches the in-memory encoding of Redis data, so no further encoding is needed and the CPU cost is much lower than loading an AOF log.

That is roughly the content of the article. For the complete version, see the Redis author's blog post: Redis persistence demystified. If anything in this article is inaccurate, corrections are welcome.

