Redis persistence, master-slave replication and data backup (2)

Source: Internet
Author: User
Tags disk usage

Redis is widely used in our projects. To improve its performance and reliability, we need to understand and apply the following.

Common memory optimization methods and parameters

Redis's performance depends entirely on memory, so we need to know how to control and save memory.

First, do not enable Redis's VM option, that is, the virtual memory feature. It was originally designed as a way for Redis to swap data between memory and disk so that it could store more data than fits in physical memory. However, its memory management cost is very high, so we should disable it: check that vm-enabled is set to no in your redis.conf file.

Next, it is best to set the maxmemory option in redis.conf. It tells Redis to start rejecting subsequent write requests once it has used that much physical memory. This parameter effectively protects Redis from the swapping caused by excessive physical memory use, which seriously degrades performance and can even cause a crash.

In addition, Redis provides a set of parameters for different data types to control memory usage. A Redis Hash is a HashMap stored inside the value. If the Map has few members, it is stored in a compact, one-dimensional linear format, which saves the memory overhead of a large number of pointers. This behavior is controlled by the following two items in the redis.conf configuration file:

hash-max-zipmap-entries 64
hash-max-zipmap-value 512

hash-max-zipmap-entries means that when the value Map contains no more than the given number of members, it is stored in the linear compact format. The default value is 64: a value with no more than 64 members uses linear compact storage, and once this number is exceeded it is automatically converted to a real HashMap.

hash-max-zipmap-value means that linear compact storage is used only while the length of every member value in the Map does not exceed the given number of bytes.

If either of the two limits is exceeded, the value is converted into a real HashMap and the memory saving is lost. So are larger limits always better? The answer is no. The advantage of a HashMap is that lookups and updates take O(1) time, whereas the one-dimensional compact format gives this up for O(n) time. With few members the impact is small; with many, performance suffers seriously. These limits therefore need to be weighed carefully, and the choice is fundamentally a trade-off between time cost and space cost.
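The trade-off can be illustrated with a toy structure. This is a minimal sketch only, not Redis's actual zipmap encoding: the class name SmallHash and its methods are hypothetical, and a Python list of pairs stands in for the compact linear format.

```python
class SmallHash:
    """Toy hash: flat list of pairs until a limit is exceeded, then a dict."""

    def __init__(self, max_entries=64, max_value_len=512):
        self.max_entries = max_entries      # plays the role of hash-max-zipmap-entries
        self.max_value_len = max_value_len  # plays the role of hash-max-zipmap-value
        self.flat = []      # compact form: [(field, value), ...], O(n) lookup
        self.table = None   # real hash table, used only after conversion

    def is_compact(self):
        return self.table is None

    def set(self, field, value):
        if self.table is not None:
            self.table[field] = value       # O(1) once converted
            return
        for i, (existing, _) in enumerate(self.flat):
            if existing == field:           # linear scan to find the field
                self.flat[i] = (field, value)
                break
        else:
            self.flat.append((field, value))
        # Exceeding either limit converts to a real hash table for good.
        if len(self.flat) > self.max_entries or len(value) > self.max_value_len:
            self.table = dict(self.flat)
            self.flat = []

    def get(self, field):
        if self.table is not None:
            return self.table.get(field)
        for existing, value in self.flat:   # O(n) scan in compact form
            if existing == field:
                return value
        return None
```

With small limits the conversion is easy to observe: a SmallHash(max_entries=2) stays compact for two fields and switches to the dict on the third.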

Similar parameters include:

list-max-ziplist-entries 512

Note: a list with no more than this number of nodes uses the compact, pointer-free storage format.

list-max-ziplist-value 64

Note: a list whose node values are each no larger than this number of bytes uses the compact storage format.

set-max-intset-entries 512

Note: a set whose members are all numeric and which contains no more than this number of nodes is stored in a compact format.

Redis's internal implementation does not optimize memory allocation heavily, so some memory fragmentation occurs, but in most cases this does not become a performance bottleneck. If most of the data stored in Redis is numeric, however, Redis uses shared integers internally to save allocation overhead. At startup it allocates a pool of n numeric objects. If a stored value falls within the pool's range, the object is taken directly from the pool and shared through reference counting, which saves memory and improves performance to some extent when the system stores a large number of numeric values. To change n you must modify the REDIS_SHARED_INTEGERS macro in the source code. The default value is 10000; you can change it to suit your needs and then recompile.
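The shared-integer idea can be sketched as follows. This is an illustration under assumptions, not the Redis source: IntObject and make_int_object are hypothetical names, and the pool is assumed to cover the integers from 0 up to (but not including) the pool size.

```python
SHARED_INTEGERS = 10000  # same value as the REDIS_SHARED_INTEGERS default


class IntObject:
    """Stand-in for a heap-allocated value object with a reference count."""

    def __init__(self, value):
        self.value = value
        self.refcount = 1


# Allocated once at startup. Assumption: one object per integer in
# the range [0, SHARED_INTEGERS).
_shared = [IntObject(i) for i in range(SHARED_INTEGERS)]


def make_int_object(value):
    """Return the pooled object when the value is in range, else allocate."""
    if 0 <= value < SHARED_INTEGERS:
        obj = _shared[value]
        obj.refcount += 1   # shared by reference counting, no new allocation
        return obj
    return IntObject(value)  # out of range: a fresh object every time
```

Two requests for the same in-range value return the very same object, while out-of-range values each cost a new allocation.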

Persistence

Redis is an in-memory database that supports persistence; that is, Redis regularly needs to synchronize in-memory data to disk to make it durable. Redis supports two persistence methods: snapshotting (snapshot) and append-only file (AOF).

Snapshotting

Snapshotting is the default persistence method. It writes the data in memory to a binary file as a snapshot. The default file name is dump.rdb. Snapshots can be taken automatically: we can configure Redis to create a snapshot if at least m keys are modified within n seconds. The following is the default snapshot configuration:

# Save a snapshot after 900 seconds if at least 1 key was modified.
save 900 1
# Save a snapshot after 300 seconds if at least 10 keys were modified.
save 300 10
# Save a snapshot after 60 seconds if at least 10000 keys were modified.
save 60 10000
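
How a set of save rules combines can be sketched in a few lines. This is only an illustration of the rule semantics described above: should_snapshot is a hypothetical helper, and the exact boundary comparison used inside the real server may differ.

```python
# (seconds, changes) pairs matching the default save lines.
SAVE_RULES = [(900, 1), (300, 10), (60, 10000)]


def should_snapshot(seconds_since_last_save, changes_since_last_save,
                    rules=SAVE_RULES):
    """True if any rule has both its time window and change count satisfied."""
    return any(
        seconds_since_last_save >= seconds
        and changes_since_last_save >= changes
        for seconds, changes in rules
    )
```

The rules are independent: a busy instance is caught by the 60-second rule, while a nearly idle one is still saved by the 900-second rule as soon as a single key has changed.
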
You can also run the following command to make Redis take a snapshot:

redis-cli -h ip -p port bgsave

Both the save and bgsave commands save a snapshot. The save command does so in the main thread. Because Redis uses a single main thread to process all client requests, save blocks every client request, so it is not recommended.

The snapshot generation process is roughly as follows:

  1. Redis calls fork, so there is now a child process and a parent process;
  2. The parent process continues to process client requests, while the child process writes the memory content to a temporary file. Thanks to the operating system's copy-on-write mechanism, the parent and child share the same physical pages. When the parent handles a write request, the OS creates a copy of the page to be modified for the parent instead of writing to the shared page. The data in the child's address space is therefore a snapshot of the entire database at the moment of the fork;
  3. After the child process has written the snapshot to the temporary file, it replaces the original snapshot file with the temporary file and then exits.
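
The three steps above can be sketched with os.fork on a POSIX system. This is a minimal illustration, not how Redis serializes data: background_save is a hypothetical function, the database is a plain dict, and JSON stands in for the rdb format.

```python
import json
import os
import tempfile


def background_save(db, path):
    """Fork; the child dumps `db` as it looked at fork time, then exits."""
    pid = os.fork()
    if pid == 0:
        # Child: its view of `db` is frozen at the moment of the fork.
        exit_code = 1
        try:
            directory = os.path.dirname(os.path.abspath(path))
            fd, tmp = tempfile.mkstemp(dir=directory)
            with os.fdopen(fd, "w") as f:
                json.dump(db, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, path)  # atomically swap in the new snapshot
            exit_code = 0
        finally:
            os._exit(exit_code)    # never fall back into the parent's code
    return pid                     # parent: returns at once, keeps serving
```

If the parent modifies db right after background_save returns and then waits for the child with os.waitpid, the file still holds the values from the moment of the fork.
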

Snapshotting also has a weakness: there is a time interval between two snapshot operations. If the database fails, the snapshot file is not fully up to date, and all data written between the last snapshot and the moment Redis went down is lost. If your business cannot tolerate this loss, you have to adopt the AOF persistence mechanism.

AOF

AOF is more durable than snapshotting. With AOF persistence enabled, Redis appends every write command it receives to a file (appendonly.aof by default) using the write function. When Redis restarts, it re-executes the write commands saved in the file to rebuild the entire database in memory. Because the OS caches writes in the kernel, however, they may not reach the disk immediately, so AOF persistence can also lose some modifications. We can use the configuration file to tell Redis when to force the OS to write to disk through the fsync function. There are three options (the default is to fsync once per second):

# Enable the AOF persistence method.
appendonly yes
# Write to disk immediately on every write command: slowest, but fully durable. Not recommended.
# appendfsync always
# Force a write to disk once per second: a good compromise between performance and durability. Recommended.
appendfsync everysec
# Rely entirely on the OS: best performance, but durability is not guaranteed.
# appendfsync no
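
The difference between handing data to the OS and forcing it to disk can be sketched as below. This is an illustration under assumptions: AppendOnlyLog and replay are hypothetical names, the command format is a simplified text line, and the everysec check happens only when a command is appended rather than on a background timer.

```python
import os
import time


class AppendOnlyLog:
    """Append commands to a file and fsync according to a policy."""

    def __init__(self, path, fsync_policy="everysec"):
        if fsync_policy not in ("always", "everysec", "no"):
            raise ValueError("unknown fsync policy: " + fsync_policy)
        self.policy = fsync_policy
        self.file = open(path, "a", encoding="utf-8")
        self.last_fsync = time.monotonic()
        self.fsync_calls = 0

    def append(self, command):
        self.file.write(command + "\n")
        self.file.flush()   # hands the bytes to the OS (kernel cache) only
        now = time.monotonic()
        if self.policy == "always" or (
            self.policy == "everysec" and now - self.last_fsync >= 1.0
        ):
            os.fsync(self.file.fileno())  # force the OS to write to disk
            self.last_fsync = now
            self.fsync_calls += 1

    def close(self):
        self.file.close()


def replay(path):
    """Rebuild a tiny key-value store by re-executing the logged commands."""
    db = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue
            if parts[0] == "SET":
                db[parts[1]] = parts[2]
            elif parts[0] == "INCR":
                db[parts[1]] = str(int(db.get(parts[1], "0")) + 1)
    return db
```

With the always policy every append pays for an fsync, which is where the performance cost comes from; with no, the sketch never calls fsync and leaves the timing to the OS.
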

The AOF method brings another problem: the persistence file keeps growing. For example, if we call the incr test command 100 times, all 100 commands must be saved in the file, although 99 of them are redundant, because a single set test 100 in the file is enough to restore the database state. To compact the AOF file, Redis provides the bgrewriteaof command. On receiving this command, Redis saves the data in memory to a temporary file as commands, in a way similar to taking a snapshot, and finally replaces the original file. The bgrewriteaof command is run as follows:

redis-cli -h ip -p port bgrewriteaof

The bgrewriteaof command is executed as follows:

  1. Redis calls fork, so there are now two processes: parent and child;
  2. Based on the database snapshot in its memory, the child process writes to a temporary file the commands needed to rebuild the database state;
  3. The parent process continues to process client requests and keeps appending write commands to the original AOF file. It also buffers the write commands it receives. This ensures that nothing goes wrong if the child's rewrite fails;
  4. When the child process has finished writing the snapshot content to the temporary file as commands, it signals the parent process. The parent then writes the buffered write commands to the temporary file as well;
  5. The parent process can now replace the old AOF file with the temporary file by renaming it. Subsequent write commands are appended to the new AOF file.
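
The rewrite steps can be simulated in memory. This is a simplified sketch, not the Redis implementation: lists of text commands stand in for the AOF files, and apply_command, rewrite_commands and simulate_rewrite are hypothetical helpers that only understand SET and INCR.

```python
def apply_command(db, command):
    """Execute one simplified text command against a dict."""
    op, key, *rest = command.split()
    if op == "SET":
        db[key] = rest[0]
    elif op == "INCR":
        db[key] = str(int(db.get(key, "0")) + 1)


def rewrite_commands(db):
    """One SET per key is enough to rebuild the current state."""
    return ["SET {} {}".format(key, value) for key, value in sorted(db.items())]


def simulate_rewrite(old_aof, writes_during_rewrite):
    # State at the fork moment, obtained here by replaying the old file.
    db = {}
    for command in old_aof:
        apply_command(db, command)
    new_aof = rewrite_commands(db)         # child: snapshot written as commands
    rewrite_buffer = []
    for command in writes_during_rewrite:  # parent: keeps serving clients
        old_aof.append(command)            # still appended to the old file
        rewrite_buffer.append(command)     # and buffered for the new file
    new_aof.extend(rewrite_buffer)         # parent: flush buffer to temp file
    return new_aof                         # then renamed over the old file
```

Starting from 100 INCR test commands, the rewritten file begins with a single SET test 100, followed by whatever was written while the rewrite was running.
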

The two persistence methods each have their own characteristics. Snapshotting has relatively little impact on performance, but a crash can lose a large amount of data. AOF offers much better data safety but has a larger impact on performance. Choose according to the needs of your business.

Master-slave Replication

Redis's master-slave replication is implemented through its rdb persistence file. The master first dumps an rdb file and transmits the full file to the slave, and then synchronizes subsequent write operations to the slave in real time.

To use the master-slave function, you need some simple configuration on the slave side:

# Enable this setting if this machine is a Redis slave instance.
slaveof master_ip master_port
# If the slave cannot synchronize with the master, make the slave unreadable so that monitoring scripts can detect the problem.
slave-serve-stale-data no

After the configuration is complete, start the slave to begin master-slave replication. The process is roughly as follows:

  1. The slaveof directive is added to the slave's configuration file, so the slave reads it at startup and its initial state is REDIS_REPL_CONNECT;
  2. In the scheduled task serverCron (an event triggered by Redis's internal timer), the slave connects to the master and sends the sync command, then blocks waiting for the master to send back its memory snapshot file (the latest version of Redis does not need to block the slave);
  3. When the master receives the sync command, it simply checks whether a memory snapshot child process is already running. If not, it starts a snapshot immediately; if so, it waits for that one to finish. After the snapshot is complete, the file is sent to the slave;
  4. The slave receives the memory snapshot file from the master and stores it locally. Once the file has been fully received, the slave clears its in-memory tables, reads the snapshot file, and rebuilds the data structures of the entire in-memory dataset. Its final state is set to REDIS_REPL_CONNECTED, and the slave's state machine flow is complete;
  5. While the master is sending the snapshot file, any commands that change the dataset are temporarily saved in a send buffer queue (a list data structure) attached to the slave's network connection. After the snapshot has been sent, these commands are sent to the slave in order, as are all commands received afterwards, and the state is set to REDIS_REPL_ONLINE.
The entire replication process is completed, as shown in the following figure:

From the replication process above, we can see that whenever a slave connects to the master, the master takes a memory snapshot and sends the entire snapshot file to the slave. In other words, there is no replication position as in MySQL, and therefore no incremental replication. If one master is connected to multiple slave instances, the master's performance will be affected.
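The full-synchronization flow can be modelled in memory. This is a toy sketch, not the Redis protocol: the Master and Slave classes and their methods are hypothetical, dicts stand in for the dataset, and the state strings only loosely mirror the REDIS_REPL_* names.

```python
class Slave:
    def __init__(self):
        self.db = {}
        self.state = "CONNECT"

    def load_snapshot(self, snapshot):
        self.db = dict(snapshot)   # clear local data, rebuild from snapshot
        self.state = "CONNECTED"

    def apply(self, key, value):
        self.db[key] = value


class Master:
    def __init__(self):
        self.db = {}
        self.slaves = []
        self.full_syncs = 0        # counts how often a full snapshot was sent

    def write(self, key, value):
        self.db[key] = value
        for slave in self.slaves:  # propagate to slaves that are online
            slave.apply(key, value)

    def sync(self, slave, writes_during_transfer=()):
        snapshot = dict(self.db)   # memory snapshot taken for this slave
        self.full_syncs += 1
        backlog = []
        for key, value in writes_during_transfer:
            self.write(key, value)        # master keeps serving writes
            backlog.append((key, value))  # queued for the new slave
        slave.load_snapshot(snapshot)
        for key, value in backlog:        # replay queued commands in order
            slave.apply(key, value)
        if slave not in self.slaves:
            self.slaves.append(slave)
```

Every call to sync copies the whole dataset, so attaching several slaves means several full snapshots, which is the cost the paragraph above describes.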

Data backup policy

The specific backup policy can be flexible. For example, it can be roughly as follows:

  1. To improve the master's performance, disable automatic persistence on the master. Instead, use the bgsave command from a scheduled job to take a snapshot during the low-traffic early morning hours, and save the snapshot file to the backup server;
  2. Enable the AOF mechanism on the slave, regularly run bgrewriteaof to compact the file, and save the compacted data file to the backup server;
  3. Regularly check whether the data on the master and the slave is consistent;
  4. When the master fails and needs to be recovered: to restore from the master's own backup snapshot, copy the backed-up dump.rdb to the corresponding path and restart. To recover from the slave, take a snapshot on the slave, copy the snapshot file to the master's path, and then restart the master. Note, however, that when the master restarts, the slave's data is overwritten, so the slave must be backed up before the master is restarted.
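
The file-copy part of such a policy might look like the sketch below. It is an illustration only: backup_snapshot and restore_snapshot are hypothetical helpers, the paths are placeholders, and the sketch assumes the snapshot has already been produced (for example by a scheduled bgsave) and that the server is stopped during a restore.

```python
import os
import shutil
import time


def backup_snapshot(rdb_path, backup_dir):
    """Copy the current snapshot file to a timestamped file in backup_dir."""
    os.makedirs(backup_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = os.path.join(backup_dir, "dump-{}.rdb".format(stamp))
    shutil.copy2(rdb_path, target)
    return target


def restore_snapshot(backup_path, rdb_path):
    """Copy a backup into place; run this while the Redis server is stopped."""
    shutil.copy2(backup_path, rdb_path)
```

Keeping timestamped copies rather than overwriting a single backup file makes it possible to go back further than the most recent snapshot.
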

Persistence disk I/O mode and problems

People with experience operating Redis in production will have noticed that when Redis uses a lot of physical memory, it can become unstable or even crash well before it exceeds the total physical memory capacity. Some people believe that the fork system call used for snapshot persistence doubles memory usage. This is inaccurate: the copy-on-write mechanism behind fork works at the level of operating system pages, so only dirty pages that have been written to are copied, and a typical system does not write to all of its pages in a short period and force them all to be copied. So what causes Redis to crash?

The answer is that Redis uses buffered I/O for persistence. Buffered I/O means that Redis reads and writes its persistence files through the page cache in physical memory, whereas most database systems use direct I/O to bypass the page cache and maintain a data cache of their own. When a Redis persistence file is very large (especially the snapshot file), reading or writing it causes the operating system to load the file's data into physical memory as a cache layer for the file. The data in this cache layer duplicates the data Redis already manages in its own memory. The kernel does evict page cache when physical memory runs short, but it may decide that a given page cache entry is more important and make your process swap instead, and at that point the system starts to become unstable or crashes. As a rule of thumb from experience, things start to become dangerous once Redis's physical memory usage exceeds 3/5 of the total memory capacity.
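
The 3/5 rule of thumb is easy to turn into a monitoring check. This is a minimal sketch: memory_at_risk is a hypothetical helper, and how the two byte counts are obtained (for example from Redis's INFO output and from the operating system) is left to the caller.

```python
DANGER_RATIO = 3 / 5  # rule of thumb from the text above


def memory_at_risk(redis_used_bytes, total_physical_bytes, ratio=DANGER_RATIO):
    """True when Redis uses more than `ratio` of total physical memory."""
    if total_physical_bytes <= 0:
        raise ValueError("total_physical_bytes must be positive")
    return redis_used_bytes / total_physical_bytes > ratio
```

For example, on a machine with 10 GB of physical memory the check reports a risk at 7 GB of Redis memory but not at 5 GB.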



1. Snapshot persistence to disk

Automatic persistence rule configuration:

save 900 1
save 300 10
save 60 10000

These rules mean the following:

# In the example below the behaviour will be to save:
# after 900 sec (15 min) if at least 1 key changed
# after 300 sec (5 min) if at least 10 keys changed
# after 60 sec if at least 10000 keys changed

Redis can also disable automatic persistence: comment out these save lines, or use save "".

If a background save to disk fails, Redis stops accepting writes:

stop-writes-on-bgsave-error yes

Use LZF to compress rdb files, which costs CPU but reduces disk usage:

rdbcompression yes

A checksum is verified when saving and loading rdb files. This guards against errors but costs about 10% in performance, so you can disable it to improve performance:

rdbchecksum yes

The file name of the exported rdb file:

dbfilename dump.rdb

The working directory. rdb files are written to this directory, and append-only files are stored here as well:

dir ./

Whether Redis saves a snapshot automatically or bgsave is called, the work is done by a background process, and other clients can still read from and write to the Redis server. Saving a snapshot in the background can occupy a lot of memory. Calling save to write in-memory data to disk blocks client requests until the save completes.

When the shutdown command is called, the Redis server first calls save and actually exits only after all data has been persisted to disk.

On data loss: if the server crashes, all data written after the last snapshot is lost. Therefore, set the save rules according to the loss your business can tolerate. For data-sensitive services, write appropriate logs in the application so that data can be restored after a server crash.

2. Append-only file persistence

The other method is incremental: every operation that changes data is persisted to a file, and when Redis restarts, the data is restored by replaying those operation commands. After each write command is executed, its data is written to server.aofbuf.

# appendfsync always
appendfsync everysec
# appendfsync no

When configured as always, the data in server.aofbuf is written to the file every time before the reply is returned to the client. This ensures that no data is lost, but the frequent I/O reduces performance. everysec writes once per second, which may lose up to one second of operations.

The biggest problem with AOF is that the append file grows very large over time, so we need the bgrewriteaof command to reorganize the file and keep only the latest key-value data.
