Background
Our company uses inotify+rsync for real-time synchronization to keep files consistent across a distributed cluster. But as the number of web files grew (millions of small files such as html and jpg), synchronization became slower and slower until it was no longer real-time at all. We tried the tuning methods found online and none of them solved the problem. After careful study I finally understood the root cause of the slowness. To state the conclusion first: inotifywait responds without delay, and rsync itself is also fast. If you have the same slowness problem, it is because the inotify+rsync tutorials circulating online lead you into a trap. The analysis follows.
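Analyzing inotifywait on its own
/usr/local/bin/inotifywait -mrq --format '%Xe %w%f' -e modify,create,delete,attrib /data/
Running the command above makes inotifywait watch the /data/ directory. Whenever a modify, create, delete, or attrib event occurs, it prints a line in the '%Xe %w%f' format: the event name(s), a space, then the path (%w is the watched directory and %f is the file name). In '%Xe', the character before the e is the separator placed between event names when several occur at once, here a literal X.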
Touch a few files in the /data/ directory:
touch /data/{1..5}
Then watch the inotifywait output:
ATTRIB /data/1    -- an ATTRIB event occurred on the path /data/1
ATTRIB /data/2
ATTRIB /data/3
ATTRIB /data/4
ATTRIB /data/5
Given this output, what we want is for rsync to take the list of files reported by inotifywait and synchronize just those files, rather than having rsync scan the entire directory every time to work out which files differ.
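As a small illustration of working from inotifywait's output, the sketch below splits one output line into its event list and its path. It assumes a format string such as '%e %w%f' (event names, one space, then the path); the function name parse_event_line is made up for this example and is not part of inotify-tools.

```shell
# Split one line of inotifywait output into the event list and the path.
# The event list never contains a space, so everything after the first
# space is the path, even when the path itself contains spaces.
parse_event_line() {
    line=$1
    ino_event=${line%% *}   # text before the first space
    ino_file=${line#* }     # text after the first space
    printf '%s\n%s\n' "$ino_event" "$ino_file"
}

# Example call: prints the event on one line and the path on the next.
parse_event_line "ATTRIB /data/my file.html"
```

Splitting on the first space only, instead of on every run of whitespace, keeps file names with spaces intact.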
Analysis of the inotify+rsync approach found online
Let's look at a typical online tutorial script, with my comments added. (The tutorials online are all basically the same; the wording differs, but the fatal flaw is identical.)
#!/bin/bash
/usr/bin/inotifywait -mrq --format '%w%f' -e create,close_write,delete /backup | while read file
# The changed files are read into $file one at a time and looped over, but what is the point? The command below never references $file; it just runs a full rsync.
do
    cd /backup && rsync -az --delete /backup/ rsync_backup@192.168.24.101::backup/ --password-file=/etc/rsync.password
done
# Look closely: the rsync here is a full sync every single time (this is the trap), and the loop over the file list is what triggers it. If 10 files change, 10 full rsync runs are triggered (a nightmare). You would be better off writing an endless loop that simply runs a full rsync.
# Many people will say that the log clearly shows only the changed files being synced. In fact that is simply how rsync works: it only prints the files that differ and need syncing. If you don't believe it, run this rsync line by itself and see.
# When the source directory holds a very large number of files, this approach simply cannot cope. It burns CPU and takes a long time, so real-time sync is impossible.
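To make the cost concrete, the loop can be simulated without rsync. In this minimal sketch, count_full_syncs is a made-up helper that reads paths the same way the `while read` loop does and counts how many full syncs would have been launched; the three paths are invented sample input.

```shell
# Read one changed path per line from stdin and count the loop iterations.
# In the tutorial script, every iteration runs a full rsync of the whole tree.
count_full_syncs() {
    n=0
    while IFS= read -r _changed; do
        n=$((n + 1))   # the tutorial script would start a full rsync here
    done
    echo "$n"
}

# Three changed files mean three full directory scans.
printf '%s\n' /backup/1 /backup/2 /backup/3 | count_full_syncs
```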
Note: backup is a module configured on the rsync server. Besides the script, you also need to set up an rsync server; for the server configuration see "http://www.ttlsa.com/linux/rsync-install-on-linux/"
Improved method
To achieve real-time sync, you have to cut down rsync's recursive scanning and comparison of directories, and as far as possible synchronize only the files that inotify reports as changed. Because of how rsync behaves, creations and modifications, deletions, and attribute changes are judged separately here, each with its own handling.
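Before the full script, here is a small sketch of the core idea: for each changed path, only its parent directory becomes the rsync source. The helper changed_dirs is hypothetical, and collapsing duplicates with sort -u is an assumption added for illustration; the author's script handles one event at a time and does not batch.

```shell
# Read changed paths (one per line) and print each parent directory once.
# These directories, not the whole tree, are what rsync would be pointed at.
changed_dirs() {
    while IFS= read -r path; do
        dirname -- "$path"
    done | sort -u
}

# Two files in ./img and one in ./news/2016 give two directories to sync.
printf '%s\n' ./img/a.jpg ./img/b.jpg ./news/2016/index.html | changed_dirs
```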
The script is as follows
#!/bin/bash
src=/data/                              # source path to synchronize
des=data                                # module name published by rsync --daemon on the target servers (rsync --daemon setup is not covered here; it is fairly simple and easy to find online)
rsync_passwd_file=/etc/rsyncd.passwd    # password file used for rsync authentication
ip1=192.168.0.18                        # target server 1
ip2=192.168.0.19                        # target server 2
user=root                               # authentication user defined for rsync --daemon
cd "${src}" || exit 1                   # Because of how rsync builds target paths, you must cd into the source directory first and have inotify watch ./ so that the directory structure is the same after syncing. If you are curious, experiment and watch the effect.
/usr/local/bin/inotifywait -mrq --format '%Xe %w%f' -e modify,create,delete,attrib,close_write,move ./ | while read -r file    # loop over the list of changed file paths
do
    INO_EVENT=${file%% *}    # split the inotify output: the part before the first space is the event type
    INO_FILE=${file#* }      # split the inotify output: the rest is the file path (it may contain spaces)
    echo "-------------------------------$(date)------------------------------------"
    echo "$file"
    # Create, modify, write-complete, and move-in events
    # Creation and modification share one branch because they are always file operations; even for a newly created directory, only an empty directory is synced, which does not affect speed.
    if [[ $INO_EVENT =~ 'CREATE' ]] || [[ $INO_EVENT =~ 'MODIFY' ]] || [[ $INO_EVENT =~ 'CLOSE_WRITE' ]] || [[ $INO_EVENT =~ 'MOVED_TO' ]]    # check the event type
    then
        echo 'CREATE or MODIFY or CLOSE_WRITE or MOVED_TO'
        rsync -avzcR --password-file=${rsync_passwd_file} "$(dirname "${INO_FILE}")" ${user}@${ip1}::${des} &&    # INO_FILE holds the path; -c compares files by checksum
        rsync -avzcR --password-file=${rsync_passwd_file} "$(dirname "${INO_FILE}")" ${user}@${ip2}::${des}
        # Look closely: the source in the rsync commands above is $(dirname ${INO_FILE}), so each run syncs only the directory containing the changed file. (Syncing only the single changed file can miss files in some extreme production situations; syncing its directory avoids missed files while keeping good speed, a reasonable balance.) The -R option then reproduces the source's relative directory structure on the target, which keeps the directory structure consistent.
    fi
    # Delete and move-out events
    if [[ $INO_EVENT =~ 'DELETE' ]] || [[ $INO_EVENT =~ 'MOVED_FROM' ]]
    then
        echo 'DELETE or MOVED_FROM'
        rsync -avzR --delete --password-file=${rsync_passwd_file} "$(dirname "${INO_FILE}")" ${user}@${ip1}::${des} &&
        rsync -avzR --delete --password-file=${rsync_passwd_file} "$(dirname "${INO_FILE}")" ${user}@${ip2}::${des}
        # Look at the rsync command: syncing the deleted path ${INO_FILE} directly fails with a "No such file or directory" error, so the source here is the parent directory of the deleted file or directory, with --delete added to remove files that exist on the target but not in the source. There is no way to delete one specific file here; the closer the deleted path is to the root, the more directories are synchronized and the longer the deletion sync takes. If anyone has a better method, please share it.
    fi
    # Attribute-change events: touch, chgrp, chmod, chown, and similar operations
    if [[ $INO_EVENT =~ 'ATTRIB' ]]
    then
        echo 'ATTRIB'
        if [ ! -d "$INO_FILE" ]    # If the attribute change is on a directory, do not sync it, because syncing a directory triggers a recursive scan. When a file inside that directory is synced later, rsync updates the directory along the way.
        then
            rsync -avzcR --password-file=${rsync_passwd_file} "$(dirname "${INO_FILE}")" ${user}@${ip1}::${des} &&
            rsync -avzcR --password-file=${rsync_passwd_file} "$(dirname "${INO_FILE}")" ${user}@${ip2}::${des}
        fi
    fi
done
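The branching in the script above can be checked without touching any server. The sketch below is a simplified dry-run version: rsync_opts_for_event is a hypothetical helper that only prints the rsync options the script would choose for one event, or "skip" when nothing would be synced. Unlike the script's independent if blocks, it picks exactly one action per line, which is an assumption made to keep the example short.

```shell
# Print the rsync options chosen for one event, or "skip".
# $1 is the event string from inotifywait, $2 is the path it refers to.
rsync_opts_for_event() {
    event=$1
    path=$2
    case $event in
        *DELETE*|*MOVED_FROM*)
            # The path is gone: sync its parent with --delete
            echo "-avzR --delete" ;;
        *CREATE*|*MODIFY*|*CLOSE_WRITE*|*MOVED_TO*)
            # New or changed content: checksum-based sync of the parent
            echo "-avzcR" ;;
        *ATTRIB*)
            # Attribute changes on directories are skipped to avoid a recursive scan
            if [ -d "$path" ]; then echo "skip"; else echo "-avzcR"; fi ;;
        *)
            echo "skip" ;;
    esac
}

rsync_opts_for_event CLOSE_WRITE ./news/index.html
rsync_opts_for_event DELETE ./news/old.html
```

Running it against a handful of sample events is a quick way to confirm which branch an event lands in before pointing the real script at production servers.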
Run a full sync once every two hours
inotify only sees changes while it is running; it knows nothing about file changes made while it was not started. So a full synchronization is run once every 2 hours to catch anything that was missed and keep the directories consistent.
crontab -e
# The minute field is 0 so the job runs once every two hours; with * it would run every minute of every second hour.
0 */2 * * * rsync -avz --password-file=/etc/rsync-client.pass /data/ root@192.168.0.18::data && rsync -avz --password-file=/etc/rsync-client.pass /data/ root@192.168.0.19::data
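With millions of files a full sync can take a long time, so it may be worth making sure two runs never overlap. The following is only a sketch under assumptions: the function name full_sync, the DRY_RUN switch, and the lock file path are invented for illustration, while the rsync arguments are copied from the cron entry above. The guard in the trailing comment follows the usual flock(1) subshell pattern from util-linux.

```shell
# Settings taken from the cron entry; adjust to your environment.
PASS_FILE=/etc/rsync-client.pass
TARGETS="192.168.0.18 192.168.0.19"

# Run the full sync against each target in turn.
# With DRY_RUN=1 the commands are printed instead of executed.
full_sync() {
    for ip in $TARGETS; do
        ${DRY_RUN:+echo} rsync -avz --password-file="$PASS_FILE" /data/ "root@${ip}::data"
    done
}

# In the script called from cron, guard the call so that a new run exits
# immediately when the previous one still holds the lock:
#   ( flock -n 9 || exit 0; full_sync ) 9>/var/lock/full_sync.lock
```

Calling `DRY_RUN=1 full_sync` prints the two rsync commands without contacting either server.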
After these improvements, even our millions of small files are synchronized in real time.
A description of the inotify parameters is attached below.
Introduction to inotify: inotify is a powerful, fine-grained, asynchronous file system event monitoring mechanism. The Linux kernel has included inotify since version 2.6.13. It can monitor additions, deletions, modifications, moves, and other events in the file system, and with this kernel interface you can watch all kinds of changes to files under the file system.
inotifywait parameter description
Parameter | Description
-m, --monitor | Keep listening for events instead of exiting after the first one
-r, --recursive | Watch directories recursively
-q, --quiet | Print only event information
--excludei | Exclude files or directories, case-insensitive
-t, --timeout | Timeout (exit if no event occurs within this time)
--timefmt | Specify the time output format
--format | Specify the output format
-e, --event | Specify which events to listen for (delete, create, modify, and so on)
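inotifywait events description
Event | Description
access | The contents of a file or directory were read
modify | The contents of a file or directory were modified
attrib | The attributes of a file or directory changed
close_write | A file or directory opened for writing was closed (its contents may have changed)
close_nowrite | A file or directory opened read-only was closed
close | A file or directory was closed, regardless of how it was opened
open | A file or directory was opened
moved_to | A file or directory was moved into the watched directory
moved_from | A file or directory was moved out of the watched directory
move | A file or directory was moved into or out of the watched directory
create | A file or directory was created in the watched directory
delete | A file or directory in the watched directory was deleted
delete_self | The watched file or directory itself was deleted
unmount | The file system containing the watched file or directory was unmounted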
Tuning inotify
# The /proc/sys/fs/inotify directory contains three files, each of which sets a limit on the inotify mechanism
[root@web ~]# ll /proc/sys/fs/inotify/
total 0
-rw-r--r-- 1 root root 0 Sep  9 23:36 max_queued_events
-rw-r--r-- 1 root root 0 Sep  9 23:36 max_user_instances
-rw-r--r-- 1 root root 0 Sep  9 23:36 max_user_watches
-----------------------------
max_user_watches     # maximum number of watches (watched files or directories) a single user may hold
max_user_instances   # maximum number of inotify instances (for example inotifywait or inotifywatch processes) each user may run
max_queued_events    # maximum number of events that an inotify instance's event queue can hold
----------------------------
[root@web ~]# echo 50000000 > /proc/sys/fs/inotify/max_user_watches    # add this command to /etc/rc.local so it is reapplied after every reboot
[root@web ~]# echo 50000000 > /proc/sys/fs/inotify/max_queued_events
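When choosing a value for max_user_watches, note that a recursive inotifywait places one watch on each directory, not on each file, so the number to compare against the limit is the directory count of the tree. The sketch below uses two hypothetical helpers, count_watch_dirs and read_inotify_limit; the optional second argument of read_inotify_limit exists only so the function can be pointed at a test directory instead of /proc.

```shell
# Count the directories under a tree (including the top directory itself).
count_watch_dirs() {
    find "$1" -type d | wc -l | tr -d ' '
}

# Print the current value of one inotify limit, e.g. max_user_watches.
# $2 overrides the base directory; it defaults to /proc/sys/fs/inotify.
read_inotify_limit() {
    cat "${2:-/proc/sys/fs/inotify}/$1"
}

# Typical use on the sync source:
#   count_watch_dirs /data
#   read_inotify_limit max_user_watches
# The same limit can also be set through sysctl as fs.inotify.max_user_watches.
```

If the directory count is far below the current limit, raising max_user_watches is unlikely to change anything.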