Redis persistence

Last Update:2016-01-27 Source: Internet

Author: User

Tags savepoint


Redis has two persistence methods: snapshots (RDB files) and append-only files (AOF files).

RDB persistence saves a snapshot of the data at specified intervals.
AOF (append-only file) persistence records every write operation the server receives. When the server restarts, the recorded operations are replayed one by one to rebuild the original data. Write commands are recorded in the same format as the Redis protocol and are appended to the end of the file.
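To make the record format concrete, here is a minimal Python sketch of how a write command looks once it is encoded in the Redis protocol, which is the form that gets appended to the AOF. The helper name encode_command is hypothetical and exists only for this illustration.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a Redis protocol array of bulk strings."""
    # "*<count>" announces how many arguments follow.
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        # Each argument is "$<byte length>", then the bytes themselves.
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


if __name__ == "__main__":
    print(encode_command("SET", "name", "jack"))
```

Because every record carries its own lengths, a reader can walk the file command by command without any extra index.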

Persistence can be disabled entirely, and the two methods can also be enabled at the same time. When both are enabled, Redis uses the AOF file first to rebuild the data on restart.

1. RDB
RDB is snapshot storage and is the default persistence method. Data is saved to disk periodically according to a configured policy. The corresponding data file is dump.rdb, and the snapshot interval is defined by the save parameter in the configuration file. How does a database that is being written to continuously produce a consistent snapshot? Redis relies on the copy-on-write behavior of the fork system call: to generate a snapshot, it forks the current process, and the child process iterates over all the data and writes it to an RDB file.

A client can also ask Redis to take a snapshot with the SAVE or BGSAVE command. SAVE writes the snapshot in the main thread. Because Redis uses a single main thread to process all client requests, SAVE blocks every client while it runs and is therefore not recommended. Also note that every snapshot writes the complete in-memory data set to disk; it is not an incremental sync of only the dirty data. If the data set is large and writes are frequent, this causes a large amount of disk I/O and can seriously affect performance.

A Redis RDB file is never left half-written, because it is written by a separate process. When a new RDB file is generated, the child process first writes the data to a temporary file and then renames the temporary file to the RDB file with the atomic rename system call. As a result, a usable RDB file is always available, no matter when a failure occurs. The RDB file is also part of the internal implementation of Redis master-slave synchronization.
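The write-to-a-temporary-file-then-rename pattern can be sketched in Python as follows. This is only an illustration of the technique, not Redis code: the function name write_snapshot_atomically and the temp-file naming are assumptions made for the example.

```python
import os
import tempfile


def write_snapshot_atomically(path: str, payload: bytes) -> None:
    """Write payload to a temporary file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="temp-", suffix=".rdb")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk first
        os.replace(tmp_path, path)  # atomic rename on POSIX systems
    except BaseException:
        # On failure, remove the partial temp file; the old snapshot is untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A reader that opens the target path at any moment sees either the complete old file or the complete new one, never a mixture.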

Master-slave synchronization
The first synchronization from slave to master works as follows:

The slave sends a synchronization request to the master. The master first dumps an RDB file and transfers the entire file to the slave. The master then forwards the commands it buffered in the meantime to the slave, which completes the initial synchronization.

The second and subsequent synchronizations work as follows:

The master sends snapshots of the changed variables directly to each slave, in order and in real time. However, whenever a slave disconnects from the master and reconnects, for whatever reason, both steps above are repeated.

Redis master-slave replication is built on memory snapshots. As long as there is a slave, memory snapshots will be taken.

Working principle
Redis calls fork() to spawn a child process.

The parent process continues to serve client requests while the child process writes the in-memory data to a temporary RDB file. Thanks to the operating system's copy-on-write mechanism, parent and child share the same physical pages. When the parent handles a write request, the OS creates a copy of the page being modified instead of writing to the shared page, so the data in the child's address space remains a snapshot of the entire database at the moment of the fork.

When the child process finishes writing the snapshot to the temporary file, it replaces the original snapshot file with the temporary file and then exits.
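The three steps above can be imitated with a small Python sketch on a POSIX system. It is a toy under stated assumptions: JSON stands in for the RDB format, a dict stands in for the database, and the function name snapshot_with_fork and the key written_after_fork are made up for the example.

```python
import json
import os


def snapshot_with_fork(data: dict, path: str) -> None:
    """Fork; the child dumps data as it was at fork time, the parent keeps writing."""
    pid = os.fork()
    if pid == 0:
        # Child process: it sees the memory image from the moment of fork().
        status = 1
        try:
            tmp_path = path + ".tmp"
            with open(tmp_path, "w") as tmp:
                json.dump(data, tmp)
            os.replace(tmp_path, path)  # swap in the finished snapshot
            status = 0
        finally:
            os._exit(status)  # never fall through into the parent's code
    # Parent process: this write happens after the fork, so the child
    # never sees it and it is not part of the snapshot.
    data["written_after_fork"] = 1
    os.waitpid(pid, 0)
```

After the call, the file holds the data as it was before the fork, while the dict in the parent already contains the later write.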

Advantages
An RDB file is a simple, compact single file that holds the Redis data as of a point in time, which makes it well suited for backups. You can archive RDB files at regular intervals so that you can restore different versions of the data when needed.

RDB is well suited for disaster recovery: a single file is easy to transfer to a remote server.

RDB performs well. When persistence is needed, the main process forks a child process and hands the persistence work to it, so the main process itself performs no disk I/O for the snapshot.

With a large data set, Redis restarts faster from an RDB file than from an AOF.

Disadvantages
RDB can easily lose data. If a snapshot is saved every 5 minutes and Redis stops working for some reason, all data written between the last snapshot and the failure is lost.

RDB uses fork() to create the child process that persists the data. With a large data set the fork itself can take a while, during which Redis stops serving clients for some milliseconds. If the data set is very large and the CPU is slow, the pause can even reach 1 second.

File path and name
By default, Redis stores the snapshot as a file named dump.rdb in the current directory. To change the storage path and file name, edit the configuration file redis.conf:

# RDB file name, default is dump.rdb.
dbfilename dump.rdb

# The directory where files are stored. AOF files are also stored in this directory. The default is the current working directory.
dir ./
Savepoint (enable and disable RDB)
You can configure save points so that Redis saves a snapshot every N seconds if at least M changes were made to the data. For example, the following save point makes Redis save a snapshot automatically every 60 seconds if at least 1000 keys changed:

save 60 1000
You can set multiple save points. The default Redis configuration file sets 3 of them:

# The format is: save <seconds> <changes>
# Multiple save points can be set.
# After 900 seconds if at least 1 key changed
save 900 1
# After 300 seconds if at least 10 keys changed
save 300 10
# After 60 seconds if at least 10000 keys changed
save 60 10000
To disable snapshotting, comment out all "save" lines, or add the following line after the last "save" line:

save ""
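How several save points combine can be sketched as a small Python predicate. This is a simplified model, not Redis code: the function name should_snapshot is hypothetical, and the exact boundary comparisons (strictly more than N seconds, at least M changes) are assumptions of the sketch.

```python
def should_snapshot(save_points, seconds_since_last_save, changes_since_last_save):
    """Return True if any (seconds, changes) save point is satisfied."""
    return any(
        seconds_since_last_save > seconds and changes_since_last_save >= changes
        for seconds, changes in save_points
    )


# The three default save points from the configuration file.
DEFAULT_SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]
```

An empty list of save points never triggers a snapshot, which mirrors what disabling all "save" lines does.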
Error handling
By default, if a background snapshot fails, Redis stops accepting writes so that users notice that the data is not being persisted. If you monitor Redis and its persistence by other means, you can disable this behavior by setting the following option, shown here with its default value, to no.

stop-writes-on-bgsave-error yes
Data compression
By default, Redis compresses the data with LZF. To save some CPU time you can disable compression, but the data file will then be larger than a compressed one.

rdbcompression yes
Data validation
Starting with version 5 of the RDB format, a CRC64 checksum is placed at the end of the file. This helps verify the integrity of the file, at a performance cost of about 10% when saving or loading it. For maximum performance you can disable it: the checksum is then written as 0, and the loader skips the check when it reads 0.

rdbchecksum yes
Generate a snapshot manually
Redis provides two commands for manually generating snapshots.

SAVE
The SAVE command generates the RDB snapshot synchronously, which means all other client requests are blocked while it runs. It is therefore not recommended in production, unless for some reason you need to keep Redis from using a child process to snapshot in the background (for example, when calling fork(2) fails).

BGSAVE
The BGSAVE command saves the RDB file in the background. The command returns immediately: Redis forks a child process to do the work and goes straight back to serving clients. On the client side, the LASTSAVE command, which returns the UNIX time of the last successful save, can be used to check whether the operation succeeded.

127.0.0.1:6379> BGSAVE
Background saving started
127.0.0.1:6379> LASTSAVE
(integer) 1433936394
Disabling automatic snapshots in the configuration file does not affect the SAVE and BGSAVE commands.

2. AOF
Snapshots are not very durable: if the server crashes suddenly, the most recent data is lost. The AOF provides more reliable persistence. Whenever Redis receives a command that modifies the data set, it appends the command to the AOF file. When Redis restarts, the commands in the AOF are executed again to rebuild the data.

Principle of an AOF rewrite
Redis calls fork, so there are now a parent process and a child process.

The child process writes the commands needed to rebuild the database state to a temporary file, based on the snapshot of the database in its memory.

The parent process continues to serve client requests and keeps appending write commands to the original AOF file. It also buffers the write commands it receives. This guarantees that nothing is lost if the rewrite in the child process fails.

When the child process has written the snapshot contents to the temporary file as commands, it signals the parent process. The parent then appends the buffered write commands to the temporary file as well.

Finally, the parent process renames the temporary file to replace the old AOF file. Write commands received from then on are appended to the new AOF file.

Advantages
AOF is more durable than RDB. You can choose between different fsync policies: no fsync, fsync once per second, or fsync on every write. The default is fsync once per second, which means you lose at most about one second of data.

The AOF log is an append-only file, so even if the server crashes suddenly there are no seek or corruption problems. Even if a command is only half written to the log for some reason (for example, because the disk is full), it can easily be repaired with the redis-check-aof tool.

When the AOF file grows too large, Redis automatically rewrites it in the background. The rewrite is safe because it is performed on a new file while Redis keeps appending to the old one. The new file contains the minimal set of commands needed to rebuild the current data set. When the new file is complete, Redis switches from the old file to the new one and starts appending to the new file.

The AOF stores the commands one after another in a simple, easy-to-understand format, which makes it easy to export and use for data recovery. For example, if all data was erased by accident with the FLUSHALL command, then as long as the file has not been rewritten in the meantime, you can stop the server, delete the last command from the file, and restart the server to get the erased data back.
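The FLUSHALL recovery described in the last point amounts to cutting one record off the end of the log. A minimal Python sketch, assuming the command was logged in the Redis protocol format with no arguments; the names FLUSHALL_RECORD and drop_trailing_flushall are hypothetical, and in practice you would work on a backup copy with the server stopped:

```python
FLUSHALL_RECORD = b"*1\r\n$8\r\nFLUSHALL\r\n"


def drop_trailing_flushall(aof_bytes: bytes) -> bytes:
    """Remove a FLUSHALL record if it is the last command in the log."""
    tail = aof_bytes[-len(FLUSHALL_RECORD):]
    # Compare case-insensitively, since the command name may be lowercase.
    if tail.upper() == FLUSHALL_RECORD:
        return aof_bytes[: -len(FLUSHALL_RECORD)]
    return aof_bytes
```

If anything was written after the FLUSHALL, the log no longer ends with that record and the function leaves it unchanged.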

Disadvantages
For the same data set, the AOF file is generally larger than the RDB file.

Depending on the fsync policy, AOF can be slower than RDB. fsync is usually set to once per second, which still gives fairly high performance, and with fsync disabled AOF can be as fast as RDB.

In the past, a few very rare bugs caused the data rebuilt from an AOF to differ from the original data.

Enable AOF
Set the configuration item appendonly to yes:

appendonly yes
File path and name
# File storage directory, shared with RDB. The default is the current working directory.
dir ./

# The default file name is appendonly.aof
appendfilename "appendonly.aof"
Reliability
You can configure how often Redis calls fsync. There are three options:

Call fsync every time a new command is appended to the AOF. This is the slowest but safest option.

Call fsync once per second. This is fast (in version 2.4 roughly as fast as snapshotting) and still safe: at most 1 second of data is lost.

Never call fsync and leave flushing to the operating system. This is the fastest option, but durability is not guaranteed.

Fsync once per second is the recommended (and default) policy, because it is both fast and reasonably safe. The relevant configuration is as follows:

# appendfsync always
appendfsync everysec
# appendfsync no
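The difference between the three policies can be illustrated with a small Python sketch of an append-only log writer. It is a simplified model, not how Redis is implemented: the class name AppendOnlyLog and the fsync_calls counter are made up for the example, and the everysec branch is only checked when a record is appended rather than on a timer.

```python
import os
import time


class AppendOnlyLog:
    """Append records to a file and fsync according to a policy."""

    def __init__(self, path: str, policy: str = "everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError("unknown fsync policy: " + policy)
        self.policy = policy
        self.file = open(path, "ab")
        self.last_fsync = time.monotonic()
        self.fsync_calls = 0  # only here so the behavior can be observed

    def append(self, record: bytes) -> None:
        self.file.write(record)
        self.file.flush()  # hand the bytes to the operating system
        now = time.monotonic()
        if self.policy == "always" or (
            self.policy == "everysec" and now - self.last_fsync >= 1.0
        ):
            os.fsync(self.file.fileno())  # force the OS to write to disk
            self.last_fsync = now
            self.fsync_calls += 1

    def close(self) -> None:
        self.file.close()
```

With "always", every append pays for a disk sync; with "no", the data sits in the OS cache until the kernel decides to flush it, which is where the risk of losing recent writes comes from.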
Log rewrite
As write operations accumulate, the AOF file keeps growing. For example, if you increment a counter 100 times, the data set only holds the final value of the counter, but the AOF contains all 100 operations. Restoring that value takes only one command, so the 100 commands in the AOF can be reduced to one. Redis therefore supports rebuilding the AOF in the background without interrupting service.
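The counter example can be worked through in a few lines of Python. Note the assumption: Redis builds the new file from the data set in memory rather than by replaying the old log; this sketch replays a log of command tuples only to arrive at that final state. The function name rewrite_log and the three supported commands are choices made for the illustration.

```python
def rewrite_log(commands):
    """Replay SET/INCR/DEL commands and return a minimal equivalent log."""
    state = {}
    for command in commands:
        name, key = command[0].upper(), command[1]
        if name == "SET":
            state[key] = command[2]
        elif name == "INCR":
            state[key] = str(int(state.get(key, "0")) + 1)
        elif name == "DEL":
            state.pop(key, None)
        else:
            raise ValueError("unsupported command: " + name)
    # One SET per surviving key is enough to rebuild the same data set.
    return [("SET", key, value) for key, value in state.items()]
```

Keys that were deleted again leave no trace in the rewritten log at all.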

The working principle is as follows:

Redis calls fork () to spawn a child process.

The child process writes the new AOF to a temporary file.

The main process continues to write new changes to the buffer in memory, and also writes these new changes to the old AOF, so that even if the rewrite fails, the data security can be guaranteed.

When the child process finishes rewriting the file, the main process gets a signal and then appends the buffer in memory to the new AOF generated by the child process.
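The interplay between the old file, the in-memory buffer, and the new file can be modeled with a toy Python class. Everything here is an assumption made for illustration: lists of command tuples stand in for files, start_rewrite stands in for the fork, and the class and method names are hypothetical.

```python
class AofRewriteModel:
    """Toy model of an AOF rewrite; lists of command tuples stand in for files."""

    def __init__(self, old_aof):
        self.aof = list(old_aof)    # the AOF currently in use
        self.rewrite_buffer = None  # None means no rewrite is running
        self.temp_aof = None        # what the child process is writing

    def start_rewrite(self, compact_commands):
        # Stands in for fork(): the child writes a compact log of the
        # data set as it was at this moment.
        self.temp_aof = list(compact_commands)
        self.rewrite_buffer = []

    def write(self, command):
        # The main process always appends to the old AOF ...
        self.aof.append(command)
        # ... and also buffers the change while a rewrite is running.
        if self.rewrite_buffer is not None:
            self.rewrite_buffer.append(command)

    def finish_rewrite(self):
        # The child signalled completion: append the buffer, then switch files.
        self.temp_aof.extend(self.rewrite_buffer)
        self.aof, self.temp_aof, self.rewrite_buffer = self.temp_aof, None, None
```

If the rewrite is abandoned before finish_rewrite, the old AOF still contains every write, which is why a failed rewrite loses nothing.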

We can set the conditions for log rewriting through configuration:

# While the log is being rewritten, appended commands are not fsynced but kept in the buffer, to avoid contending with the rewrite for disk I/O.
# yes means new writes are not fsynced during a rewrite; they stay in memory and are written out after the rewrite completes. The default is no; yes is recommended here.
no-appendfsync-on-rewrite yes

# Redis remembers the size of the AOF file after the last rewrite (if no rewrite has happened since startup, it uses the size of the AOF at startup).
# A rewrite is triggered when the current file size exceeds the remembered size by the specified percentage.
# You also need to set a minimum file size. Only files larger than this value are rewritten, so that a small file is not rewritten just because it reached the percentage.

auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
To disable automatic log rewriting, set the percentage to 0:

auto-aof-rewrite-percentage 0
Redis 2.4 and above can automatically perform log rewriting. The previous version required manual operation of BGREWRITEAOF.
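The trigger condition described by the two settings can be written down as a small Python function. This is a sketch of the rule as the comments above describe it, not Redis source: the function name should_auto_rewrite is hypothetical, and the exact boundary comparisons are assumptions.

```python
def should_auto_rewrite(current_size, base_size, percentage=100,
                        min_size=64 * 1024 * 1024):
    """Model of the auto-aof-rewrite-percentage / auto-aof-rewrite-min-size check."""
    if percentage == 0:           # 0 disables automatic rewriting
        return False
    if current_size <= min_size:  # small files are never rewritten
        return False
    base = base_size if base_size > 0 else 1
    # Growth relative to the size remembered after the last rewrite, in percent.
    growth = current_size * 100 // base - 100
    return growth >= percentage
```

With the values shown above (100 and 64mb), a file remembered at 64 MB is rewritten once it has roughly doubled in size.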

Data corruption repair
If the AOF file is corrupted for some reason (for example, a server crash) and Redis cannot load it, you can repair it as follows:

Back up the AOF file.

Use the redis-check-aof command to repair the original AOF file:

$ redis-check-aof --fix <filename>
Optionally, use diff -u to see the differences between the backup and the repaired file.

Restart the Redis service with the repaired file.
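To give an idea of what repairing a half-written log involves, the following Python sketch finds the longest prefix of a log that consists only of complete commands. It is not how redis-check-aof is implemented; it only illustrates the idea, assumes the log contains plain protocol-encoded commands, and uses hypothetical function names.

```python
def valid_prefix_length(aof_bytes: bytes) -> int:
    """Return the length of the longest prefix made of complete commands."""
    pos = 0
    while pos < len(aof_bytes):
        end = _parse_command(aof_bytes, pos)
        if end is None:
            break  # the rest is truncated or malformed
        pos = end
    return pos


def _read_length(data, pos, marker):
    """Parse a '*<n>' or '$<n>' header line at pos; return (n, next_pos) or None."""
    if data[pos:pos + 1] != marker:
        return None
    line_end = data.find(b"\r\n", pos)
    if line_end == -1 or not data[pos + 1:line_end].isdigit():
        return None
    return int(data[pos + 1:line_end]), line_end + 2


def _parse_command(data, pos):
    """Return the offset just past one complete command, or None."""
    header = _read_length(data, pos, b"*")
    if header is None:
        return None
    argc, pos = header
    for _ in range(argc):
        bulk = _read_length(data, pos, b"$")
        if bulk is None:
            return None
        size, pos = bulk
        # The argument bytes must be followed by a line terminator.
        if data[pos + size:pos + size + 2] != b"\r\n":
            return None
        pos += size + 2
    return pos
```

Truncating a copy of the file to the returned length drops the incomplete tail while keeping every command that was fully written.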

Switch from RDB to AOF
Only the procedure for Redis >= 2.2 is described here:

Back up the latest dump.rdb file and put the backup file in a safe place.

Run the following two commands:

$ redis-cli config set appendonly yes
$ redis-cli config set save ""
Make sure the data is the same as before the switch.

Make sure the data is written correctly to the AOF file.

The second command disables RDB persistence. It is optional, because both persistence methods can be enabled at the same time.

Remember to also enable AOF in the configuration file redis.conf, because settings changed from the command line are lost when Redis restarts.

As the above shows, both RDB and AOF use sequential I/O, which performs well. Likewise, when the database is restored from the RDB file or the AOF log, the data is read sequentially and loaded into memory, so no random disk reads occur.

Which one should you choose? Here is the official recommendation:

In general, if you need a high degree of data safety, use both persistence methods. If you can live with losing a few minutes of data in a disaster, you can use RDB alone. Many users use AOF alone, but this is discouraged: RDB lets you take complete snapshots of the data from time to time and restarts faster, so it is best to use RDB as well.

In terms of data recovery, starting from an RDB file is faster for two reasons.

First, an RDB file holds exactly one record per piece of data, whereas the AOF log may hold many operations on the same piece of data. Each piece of data therefore only has to be loaded once.

Second, the storage format of the RDB file matches the in-memory encoding of Redis data, so no re-encoding is needed and much less CPU is used than when loading the AOF log.

Note:
One caveat about RDB snapshots: the child process forked to perform the dump shares its memory pages with the parent, but every page the parent modifies during the dump has to be copied. In the worst case, under heavy writes, the child ends up needing as much memory as the parent. For example, on a machine with 8 GB of memory where Redis already uses 6 GB, a save could then require up to another 6 GB, 12 GB in total, which is more than the 8 GB available. The system starts swapping, and if there is not enough virtual memory, the process crashes and data is lost. So when using Redis, be sure to plan the system's memory capacity.

A common design today is to use replication to make up for the performance cost of AOF and snapshots while still achieving durability: snapshots and AOF are disabled on the master to protect its read and write performance, and both are enabled on the slave to keep the data safe.

3. Testing Redis persistence
With the theory of snapshots and AOF covered above, let's run some tests.

1. redis.conf: snapshots enabled, AOF disabled
save 900 1
save 300 10
save 60 10000

rdbcompression no
rdbchecksum no
dbfilename redis.rdb
dir /home/backup/redis

appendonly no
Test
[root@localhost redis]# ./src/redis-cli
127.0.0.1:6379> keys *
1) "a"
127.0.0.1:6379> set b 2
OK
127.0.0.1:6379> set c 3
OK
127.0.0.1:6379> set d 4
OK
127.0.0.1:6379> keys *
1) "c"
2) "a"
3) "aa"
4) "b"
5) "d"
127.0.0.1:6379> save
OK
#SAVE persists the data. Every time SAVE is executed, a line is written to the log: "* DB saved on disk"

127.0.0.1:6379> lpush aa 1
(integer) 1
127.0.0.1:6379> lpush aa 2
(integer) 2
Verify persistence by restarting Redis:

127.0.0.1:6379> keys *
1) "c"
2) "a"
3) "aa"
4) "b"
5) "d"
The lpush operations came after the save, yet the data is still there after the restart.

To find out why, check the log:

6720:signal-handler (1453738444) Received SIGTERM scheduling shutdown...
6720:M 26 Jan 00:14:04.896 # User requested shutdown...
6720:M 26 Jan 00:14:04.896 * Saving the final RDB snapshot before exiting.
6720:M 26 Jan 00:14:04.932 * DB saved on disk
6720:M 26 Jan 00:14:04.932 * Removing the pid file.
6720:M 26 Jan 00:14:04.932 # Redis is now ready to exit, bye bye...
The log shows that when Redis is shut down normally, it performs a save before exiting. Stopping it with a plain kill has the same effect, because that is also a normal shutdown.

What about an abnormal shutdown, when the signal is sent with kill -9?

127.0.0.1:6379> set ss 1
Could not connect to Redis at 127.0.0.1:6379: Connection refused
not connected> get ss
Could not connect to Redis at 127.0.0.1:6379: Connection refused
not connected> get ss
(nil)
The test shows that with RDB persistence enabled, data is persisted when a save point is reached, on a manual save, and on a normal shutdown; data is lost when Redis is terminated abnormally.

2. redis.conf: snapshots disabled, AOF disabled
#save 900 1
#save 300 10
#save 60 10000
rdbcompression no
rdbchecksum no
dbfilename redis.rdb
dir ./

appendonly no
Operation

redis 127.0.0.1:6379> keys *
(empty list or set)
redis 127.0.0.1:6379> set name test
OK
redis 127.0.0.1:6379> save
OK
redis 127.0.0.1:6379> set aa 1
OK
redis 127.0.0.1:6379>

#Restart redis

redis 127.0.0.1:6379> keys * #found that the key that was not saved just now is lost
1) "name"
These results show that with persistence turned off, data is persisted only by a manual save, and unsaved data is lost even on a normal shutdown. If no manual save is performed between the first write and the shutdown, all data is lost. Since a manual save still works, snapshots are not really prohibited; what is disabled is only the automatic snapshot function.

3. redis.conf: snapshots disabled, AOF enabled
appendonly yes
appendfilename redis.aof
# appendfsync always
appendfsync everysec
# appendfsync no

no-appendfsync-on-rewrite no
auto-aof-rewrite-min-size 64mb
Operation

redis 127.0.0.1:6379> keys *
1) "name"

#Modify the AOF parameter and restart the database:

redis 127.0.0.1:6379> keys *
(empty list or set)
redis 127.0.0.1:6379>

#No records in the database
#View logs:
# * DB loaded from append only file: 0.000 seconds
#The data was loaded from the 0-byte AOF file. Why was the RDB data not loaded? Because the priority is hard-coded in the Redis source: AOF > RDB
To find it in the source code (redis.c): grep 'DB loaded from' ./ -R

void loadDataFromDisk(void) {
    long long start = ustime();
    if (server.aof_state == REDIS_AOF_ON) {
        if (loadAppendOnlyFile(server.aof_filename) == REDIS_OK)
            redisLog(REDIS_NOTICE, "DB loaded from append only file: %.3f seconds", (float)(ustime()-start)/1000000);
    } else {
        if (rdbLoad(server.rdb_filename) == REDIS_OK) {
            redisLog(REDIS_NOTICE, "DB loaded from disk: %.3f seconds",
                (float)(ustime()-start)/1000000);
        } else if (errno != ENOENT) {
            redisLog(REDIS_WARNING, "Fatal error loading the DB: %s. Exiting.", strerror(errno));
            exit(1);
        }
    }
}
Note: if AOF is turned on partway through and a restart makes it take effect, do not perform a second normal restart.

On the first restart with AOF enabled, Redis loads the (empty) AOF file because of the priority described above, so its data set is empty. On the second normal restart, Redis saves this empty data set to the RDB file on shutdown, which overwrites the original RDB data and loses it. To avoid this, always back up the RDB file before restarting Redis.

redis 127.0.0.1:6379> keys *
(empty list or set)
redis 127.0.0.1:6379> set name tt
OK
redis 127.0.0.1:6379> save
OK

#Turn on the aof parameter
#First restart
redis 127.0.0.1:6379> keys * #Because of the priority described above (aof > rdb), the result is empty
(empty list or set)

#Second normal restart: the empty data set is saved to the RDB file and the data is lost. The db is now empty and the log records "* DB saved on disk"
redis 127.0.0.1:6379> keys *
(empty list or set)

#The data set has been reset and the data is lost
This raises a question. Suppose only RDB persistence was enabled at first, and after a while you want to turn on AOF. How do you get the RDB data into the AOF file? There are 2 methods.

a. Before enabling AOF, run bgrewriteaof first, and then restart

redis 127.0.0.1:6379> keys * #Check if there is data
(empty list or set)
redis 127.0.0.1:6379> set name ttd
OK
redis 127.0.0.1:6379> keys *
1) "name"

redis 127.0.0.1:6379> bgsave #save data
Background saving started
redis 127.0.0.1:6379> keys *
1) "name"

#Only one RDB file, no AOF file

redis 127.0.0.1:6379> bgrewriteaof #Execute merge rewrite function to generate AOF file
Background append only file rewriting started

#Now set the aof parameter (appendonly yes) in the redis.conf file and restart for it to take effect.
#The log shows: * DB loaded from append only file: 0.000 seconds

redis 127.0.0.1:6379> keys * #the data is still there
1) "name"

#Check the file
[root@localhost data]# od -c redis.aof
0000000   *   2  \r  \n   $   6  \r  \n   S   E   L   E   C   T  \r  \n
0000020   $   1  \r  \n   0  \r  \n   *   3  \r  \n   $   3  \r  \n   S
0000040   E   T  \r  \n   $   4  \r  \n   n   a   m   e  \r  \n   $   4
0000060  \r  \n   j   a   c   k  \r  \n
0000070
b. Use CONFIG GET / SET to modify the configuration dynamically

redis 127.0.0.1:6379> BGSAVE
Background saving started
#At this time, only the rdb file

#Dynamic modification of parameters, turn on the AOF function: appendonly yes
redis 127.0.0.1:6379> CONFIG SET appendonly yes #dynamically modify parameters
OK
redis 127.0.0.1:6379> CONFIG GET append*
1) "appendonly"
2) "yes"
3) "appendfsync"
4) "everysec"
redis 127.0.0.1:6379>

#The aof file has been generated and contains the data (synchronized from the rdb)

#The log shows: * Background append only file rewriting started by pid 3165
#Parameters modified dynamically are lost after a restart, so also update the redis.conf file during maintenance
The results above show that when Redis restarts and loads data, the AOF file takes precedence over the RDB file. So try to enable AOF from the very beginning rather than turning it on partway through.

The log tells you clearly which file Redis loaded the data from:

RDB: * DB loaded from disk: 0.000 seconds
AOF: * DB loaded from append only file: 0.000 seconds
and how the data was saved:

RDB: * DB saved on disk
AOF: * Calling fsync() on the AOF file
4. redis.conf: snapshots enabled, AOF enabled
save 900 1
save 300 10
save 60 10000

appendonly yes
appendfilename zhoujy.aof
# appendfsync always
appendfsync everysec
# appendfsync no

no-appendfsync-on-rewrite no
auto-aof-rewrite-min-size 64mb
The tests above covered how RDB and AOF operate. As for loading at restart, the data is restored to memory according to the following priority:

If only AOF is configured, the AOF file is loaded to restore the data at startup.

If both RDB and AOF are configured, only the AOF file is loaded to restore the data at startup.

If only RDB is configured, the dump file is loaded to restore the data at startup.
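The priority rules above, and the pitfall of enabling AOF on an instance that only has an RDB file, can be captured in a short Python model. It mirrors the branch structure of the loadDataFromDisk function quoted earlier and is only a sketch: other Redis versions may behave differently, the function name load_data_from_disk is hypothetical, and only SET commands are replayed.

```python
def load_data_from_disk(aof_enabled, aof_commands, rdb_data):
    """Model of the startup choice: the AOF if enabled, otherwise the RDB file."""
    if aof_enabled:
        # The RDB file is ignored, even when the AOF is empty.
        dataset = {}
        for name, key, *rest in aof_commands:
            if name.upper() == "SET":
                dataset[key] = rest[0]
        return "aof", dataset
    return "rdb", dict(rdb_data)
```

Calling it with AOF enabled, an empty AOF, and a populated RDB returns an empty data set, which is exactly the situation seen in test 3.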

4. Redis data backup

Backing up is simple: just copy the RDB and AOF files.

#redisA: Generate test data on A
redis 127.0.0.1:6379> set name test
redis 127.0.0.1:6379> set age 17
OK
redis 127.0.0.1:6379> keys *
1) "age"

redis 127.0.0.1:6379> bgsave
Background saving started

#redisB: No data on B
redis 127.0.0.1:6380> keys *
(empty list or set)

#Copy A's files to B (rdb and aof files)
cp redis/* redis2/
#Modify permissions
chown -R redis.redis *
#Restart B to restore
redis 127.0.0.1:6380> keys *
1) "sex"
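The copy step above can also be scripted. The following Python sketch copies whichever persistence files exist into a timestamped directory. The function name, the default file names, and the directory layout are assumptions for the example; copying an AOF while Redis is still appending to it may capture a partially written command, so this is best done on a stopped or idle instance.

```python
import os
import shutil
import time


def backup_persistence_files(data_dir, backup_root,
                             names=("dump.rdb", "appendonly.aof")):
    """Copy the persistence files that exist into a timestamped directory."""
    target = os.path.join(backup_root, time.strftime("%Y%m%d-%H%M%S"))
    os.makedirs(target, exist_ok=True)
    copied = []
    for name in names:
        source = os.path.join(data_dir, name)
        if os.path.isfile(source):
            # copy2 also preserves timestamps, which helps when comparing backups.
            shutil.copy2(source, os.path.join(target, name))
            copied.append(name)
    return target, copied
```

Keeping each backup in its own timestamped directory makes it possible to restore an older version of the data later.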

reference

http://redis.io/topics/persistence
http://www.cnblogs.com/zhoujinyi/archive/2013/05/26/3098508.html
http://heylinux.com/archives/1932.html
http://database.51cto.com/art/201203/322144.htm

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
