Synchronization means writing the dirty pages held in physical memory back to disk, so that the contents of the disk and of the in-memory pages are consistent.
A sync operation is triggered in three cases:
1. A periodic kernel thread scans for dirty pages, selects some of them according to certain rules, and writes those pages back to disk.
2. When there are too many dirty pages in the kernel, synchronization is triggered.
3. Other parts of the system request synchronization explicitly (for example, through calls to sync, fsync, and fdatasync).
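The first two triggers can be pictured as simple threshold checks. The sketch below is a hypothetical policy for illustration only. It is loosely inspired by Linux's vm.dirty_expire_centisecs and vm.dirty_background_ratio tunables, but the struct, the field names, and the thresholds are assumptions, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical writeback policy, for illustration only (not kernel code). */
struct wb_policy {
    unsigned expire_ms;       /* trigger 1: write back pages dirty longer than this */
    unsigned background_pct;  /* trigger 2: start writeback above this % of dirty memory */
};

/* Trigger 2: are there too many dirty pages overall? */
bool too_many_dirty(const struct wb_policy *p, unsigned long dirty, unsigned long total)
{
    assert(total > 0);
    return dirty * 100 > (unsigned long)p->background_pct * total;
}

/* Trigger 1: during a periodic scan, is this page old enough to write back? */
bool page_expired(const struct wb_policy *p, unsigned long now_ms, unsigned long dirtied_ms)
{
    return now_ms - dirtied_ms >= p->expire_ms;
}
```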
Traditional UNIX implementations have a buffer cache or page cache in the kernel, and most disk I/O passes through it. When data is written to a file, the kernel usually copies it into one of its buffers. If the buffer is not yet full, it is not queued for output. Instead, the kernel waits until the buffer fills, or until it needs to reuse the buffer for another disk block. Only then is the buffer placed on the output queue, and the actual I/O is performed when it reaches the head of that queue. This style of output is called delayed write (Bach [1986], Chapter 3, discusses the buffer cache in detail).
Delayed write reduces the number of disk reads and writes. It also delays file updates: data written to a file may not reach the disk for some time. If the system fails, this delay can cause updates to the file to be lost. To keep the file system on disk consistent with the contents of the buffer cache, UNIX systems provide three functions: sync, fsync, and fdatasync.
The sync function (or command) simply queues all modified block buffers for writing and returns, without waiting for the actual disk writes to finish. A system daemon, usually called update, calls sync periodically (typically every 30 seconds), which guarantees that the kernel's block buffers are flushed regularly. The sync(1) command also calls the sync function. POSIX allows sync to return before the writes complete; on Linux, however, sync waits for the I/O to finish. By default on Linux, almost no application data is saved to disk immediately, and embedded systems behave the same way. Mounting a file system with the sync option ensures that data is written immediately, but this causes many more disk writes and can shorten the life of the storage device. If data must be saved, you can explicitly run the sync command to flush all files, or call fsync to flush a single file. UNIX operating experience suggests running sync twice for reliability, because a single sync does not guarantee that the data has actually reached the disk. For example, after cp finishes, run sync to write the buffered contents to disk.
The fsync function applies only to the single file referred to by the file descriptor filedes, and it returns only after the disk writes have completed. fsync suits applications such as databases, which need to ensure that modified blocks are written to disk immediately.
The fdatasync function is similar to fsync, but it affects only the data portion of the file. fsync additionally updates the file's attributes synchronously.
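As a rough illustration of how the three calls differ in scope, the sketch below appends a string to a file and then flushes it with the chosen primitive. The helper name `append_and_flush` and the `flush_kind` enum are hypothetical. Only open, write, sync, fsync, fdatasync, and close are real POSIX calls.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical selector for the three primitives described above. */
enum flush_kind { FLUSH_ALL, FLUSH_FILE, FLUSH_DATA };

/* Append msg to the file at path, then flush according to kind.
 * A short write is treated as failure for simplicity.
 * Returns 0 on success, -1 on error (errno set by the failing call). */
int append_and_flush(const char *path, const char *msg, enum flush_kind kind)
{
    assert(path != NULL && msg != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(msg);
    ssize_t n = write(fd, msg, len);   /* only reaches the page cache */
    int rc = (n == (ssize_t)len) ? 0 : -1;

    if (rc == 0) {
        switch (kind) {
        case FLUSH_ALL:
            sync();                    /* all dirty buffers system-wide; returns nothing */
            break;
        case FLUSH_FILE:
            rc = fsync(fd);            /* this file's data and metadata; waits */
            break;
        case FLUSH_DATA:
            rc = fdatasync(fd);        /* data, plus only metadata needed to read it back */
            break;
        }
    }
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```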
For a database that supports transactions, committing a transaction requires that the transaction log (containing the modifications and a commit record) be fully written to the hard disk before the commit is reported back to the application tier.
A simple question: on a *nix operating system, how do you ensure that updates to a file are successfully persisted to the hard disk?
write is not enough; you need fsync
In general, a write to a file on a hard disk (or other persistent storage device) only updates the in-memory page cache. Dirty pages are not written to the disk immediately; the operating system schedules them. For example, a dedicated flusher kernel thread writes dirty pages to the disk (that is, into the device's I/O request queue) when certain conditions are met, such as a time interval elapsing or dirty pages reaching a certain percentage of memory. Because write returns without waiting for the disk I/O to complete, data can be lost if the OS crashes after write returns but before the pages reach the disk. Although this window is small, the "loosely asynchronous semantics" of write() are not enough for a database program that must guarantee transactional durability and consistency. Such programs usually need the synchronized I/O primitives provided by the OS:
#include <unistd.h>
int fsync(int fd);
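In practice, fsync is paired with a write loop, because write may transfer fewer bytes than requested and its success says nothing about the disk. A minimal sketch follows; the helper name `write_durable` is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes, then block in fsync() until the
 * device reports completion. Returns 0 only if both steps succeed;
 * otherwise -1 with errno set. */
int write_durable(int fd, const void *buf, size_t len)
{
    assert(fd >= 0);
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before writing anything: retry */
            return -1;
        }
        p += n;                    /* write() may be partial */
        len -= (size_t)n;
    }
    /* Up to this point the data may exist only in the page cache. */
    return fsync(fd);
}
```

Only a 0 return from fsync indicates that the data has been handed to stable storage. A database would treat an fsync error as a failed commit.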
fsync ensures that all modified contents of the file referred to by fd have been written to the hard disk. The call blocks until the device reports that the I/O has completed. PS: if you perform file I/O through a memory-mapped file (using mmap to map the file's page cache directly into the process's address space and modifying the file by writing to memory), a similar system call ensures that the modifications are fully synchronized to the disk:
#include <sys/mman.h>
int msync(void *addr, size_t length, int flags);
msync requires the caller to specify the address range to synchronize. This finer-grained control might seem more efficient than fsync, since the application usually knows where its dirty pages are. In practice, however, the (Linux) kernel has very efficient data structures for quickly finding a file's dirty pages, so fsync also synchronizes only the modified contents of the file.
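A minimal sketch of the mmap path follows. It assumes the file already exists and is at least as long as the new data; the name `mmap_overwrite_sync` is hypothetical. Because msync takes an explicit range, the sketch flushes only the bytes it touched.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: overwrite the first len bytes of an existing file
 * through a shared mapping, then flush that range with msync(MS_SYNC).
 * Returns 0 on success, -1 on error or if the file is shorter than len. */
int mmap_overwrite_sync(const char *path, const char *data, size_t len)
{
    assert(path != NULL && data != NULL && len > 0);
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || (size_t)st.st_size < len) {
        close(fd);
        return -1;
    }
    /* MAP_SHARED: stores go to the file's page cache, not a private copy. */
    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (map == MAP_FAILED)
        return -1;
    memcpy(map, data, len);         /* dirty the pages by writing memory */
    /* addr must be page-aligned; the address returned by mmap is. */
    int rc = msync(map, len, MS_SYNC);
    if (munmap(map, len) != 0)
        rc = -1;
    return rc;
}
```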
fsync performance problems, and fdatasync
In addition to the file's modified contents (dirty pages), fsync also synchronizes the file's metadata, including its size, access time (st_atime), modification time (st_mtime), and so on. A file's data and metadata usually live in different places on the disk, so fsync requires at least two I/O writes. The fsync man page says: "Unfortunately fsync() will always initiate two write operations: one for the newly written data and another one in order to update the modification time stored in the inode. If the modification time is not a part of the transaction concept fdatasync() can be used to avoid unnecessary inode disk write operations."
How expensive is the extra I/O? According to Wikipedia, the average seek time of current hard drives is about 3 to 15 ms, and the average rotational latency of a 7200 rpm drive is about 4 ms. One I/O operation therefore takes roughly 10 ms. The significance of this number is discussed below.
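The figures above can be checked with a little arithmetic. Average rotational latency is the time for half a revolution. If each commit costs one or two disk I/Os of about 10 ms each, that cost caps the commit rate. This back-of-the-envelope sketch assumes those I/O counts and the 10 ms estimate, and it ignores caches, queueing, and batching.

```c
#include <assert.h>

/* Average rotational latency: half a revolution, in milliseconds. */
double avg_rotational_latency_ms(double rpm)
{
    assert(rpm > 0);
    return 60.0 * 1000.0 / rpm / 2.0;
}

/* Upper bound on synchronous commits per second when each commit
 * needs ios_per_commit disk I/Os of io_ms milliseconds each
 * (an assumed, simplified cost model). */
double max_commits_per_sec(double io_ms, int ios_per_commit)
{
    assert(io_ms > 0 && ios_per_commit > 0);
    return 1000.0 / (io_ms * ios_per_commit);
}
```

For a 7200 rpm drive, the latency works out to about 4.17 ms. At 10 ms per I/O, two I/Os per commit (fsync writing data plus inode) cap throughput at 50 commits per second, while one I/O per commit allows 100.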
POSIX also defines fdatasync, which relaxes the synchronization semantics to improve performance:
#include <unistd.h>
int fdatasync(int fd);
fdatasync is similar to fsync but synchronizes metadata only when necessary, which reduces the I/O to a single write. What counts as "necessary"? The man page explains that fdatasync "does not flush modified metadata unless that metadata is needed in order to allow a subsequent data retrieval to be correctly handled." For example, if the file size (st_size) changes, it must be synchronized immediately. Otherwise, after an OS crash, the modified contents could not be read back even though the data portion was synchronized, because the metadata was not. The last access time (atime) and modification time (mtime) do not need to be synchronized every time; as long as the application has no strict requirements on these two timestamps, skipping them is essentially harmless. PS: the open flags O_SYNC and O_DSYNC have the same semantics as fsync and fdatasync respectively: each write blocks until the disk I/O completes. (Linux kernels before 2.6.33 actually implemented O_SYNC as O_DSYNC, which did not meet POSIX requirements; both flags provided the semantics of fdatasync.) Compared with calling fsync or fdatasync, these flags are less flexible and should be used sparingly.
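A minimal sketch of the O_DSYNC alternative follows. With this flag, every write() on the descriptor behaves like write() followed by fdatasync(). The helper names `open_dsync_log` and `log_append_line` are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open a log so that each write() returns only after the
 * data (and the metadata needed to read it back) reaches stable storage.
 * Use O_SYNC instead of O_DSYNC for fsync()-like semantics.
 * Returns the fd, or -1 on error. */
int open_dsync_log(const char *path)
{
    assert(path != NULL);
    return open(path, O_WRONLY | O_CREAT | O_APPEND | O_DSYNC, 0644);
}

/* Each call blocks until the device reports completion. Several records
 * cannot be batched into one flush, which is the inflexibility noted above.
 * Returns 0 on success, -1 on error or short write. */
int log_append_line(int fd, const char *line)
{
    size_t len = strlen(line);
    return write(fd, line, len) == (ssize_t)len ? 0 : -1;
}
```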
Using fdatasync to optimize log synchronization
As mentioned at the beginning of the article, a database usually must write its log files with synchronized I/O to satisfy transaction requirements. Because a commit has to wait for the disk I/O to complete, committing a transaction is often slow and becomes a performance bottleneck. Under Berkeley DB, with AUTO_COMMIT enabled (every independent write automatically gets transactional semantics) and the default synchronization level (a commit returns only after the log has been fully synchronized to disk), writing one record takes roughly 5 to 10 ms. That is about the same as one I/O operation (about 10 ms). We already know that fsync is inefficient. Using fdatasync to avoid the metadata update, however, requires that the file size not change across the write. Log files are inherently append-only and keep growing, so it seems hard to take advantage of fdatasync. Here is how Berkeley DB handles its log files:
1. Each log file has a fixed size of 10 MB. Numbering starts at 1, and names follow the format "log.%010d".
2. When a log file is created, its last page is written, which extends the file to 10 MB.
3. When records are appended to the log file, the file size does not change, so fdatasync can be used, which greatly speeds up log writes.
4. When a log file is full, a new one is created, which incurs the cost of synchronizing metadata only once.
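The four steps above can be sketched as follows. This is not Berkeley DB's code. The `log_file` struct, the function names, and the 4096-byte page size are assumptions for illustration, and a caller would pass 10 MB as the size to match step 1.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define LOG_PAGE 4096   /* assumed page size for this sketch */

/* Hypothetical log handle, not Berkeley DB's actual structure. */
struct log_file {
    int fd;
    off_t size;   /* fixed size chosen at creation (step 1) */
    off_t tail;   /* next append offset */
};

/* Steps 1, 2 and 4: create dir/log.NNNNNNNNNN, extend it to size by writing
 * its last page, and fsync once so the new size is durable. Returns 0 or -1. */
int log_create(struct log_file *lf, const char *dir, int number, off_t size)
{
    assert(size >= LOG_PAGE && size % LOG_PAGE == 0);
    char path[512];
    snprintf(path, sizeof path, "%s/log.%010d", dir, number);
    lf->fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0644);
    if (lf->fd < 0)
        return -1;
    static const char zeros[LOG_PAGE];
    if (pwrite(lf->fd, zeros, LOG_PAGE, size - LOG_PAGE) != LOG_PAGE ||
        fsync(lf->fd) != 0) {       /* the one metadata-syncing flush */
        close(lf->fd);
        return -1;
    }
    lf->size = size;
    lf->tail = 0;
    return 0;
}

/* Step 3: append one record at the tail. The file size never changes, so
 * fdatasync() does not need to write the inode's size. Returns 0 on success,
 * 1 if the file is full (the caller should create the next log), -1 on error. */
int log_append(struct log_file *lf, const void *rec, size_t len)
{
    if (lf->tail + (off_t)len > lf->size)
        return 1;
    if (pwrite(lf->fd, rec, len, lf->tail) != (ssize_t)len)
        return -1;
    if (fdatasync(lf->fd) != 0)
        return -1;
    lf->tail += (off_t)len;
    return 0;
}
```

On file systems that leave the unwritten middle of the file as a hole, the first write into each hole still allocates blocks, so fdatasync may flush some allocation metadata there.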
Two other related names: async (asynchronous I/O) and rsync (remote sync).
rsync is a powerful implementation of an efficient delta-transfer algorithm. Its most basic function is to mirror a file system efficiently. With rsync, you can flexibly use a range of network protocols, such as NFS, SMB, or SSH, to keep one file system in sync with another. Its second function, used by backup systems, is to archive older versions of files that have been changed or deleted.
rsync is a remote data synchronization tool that can quickly synchronize files between multiple hosts over a LAN or WAN. It was originally written as a replacement for rcp and is currently maintained at rsync.samba.org. rsync uses the so-called "rsync algorithm" to synchronize files between a local and a remote host. It transfers only the parts of the two files that differ rather than the whole file each time, which makes it quite fast. A machine running an rsync server is also called a backup server. One rsync server can back up data from several clients at the same time, and several rsync servers can back up one client's data.
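A key ingredient of the rsync algorithm is a cheap "weak" checksum that can be slid over a file one byte at a time. This is how rsync finds blocks the other side already has; candidate matches are then confirmed with a strong hash. The sketch below shows only that weak rolling checksum, in the two-sum form described for the rsync algorithm. It is simplified and is not rsync's actual source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Weak checksum of a window of bytes (simplified sketch):
 * a = sum of the bytes, b = position-weighted sum, both mod 2^16. */
struct rolling { uint32_t a, b; size_t len; };

void roll_init(struct rolling *r, const unsigned char *buf, size_t len)
{
    r->a = r->b = 0;
    r->len = len;
    for (size_t i = 0; i < len; i++) {
        r->a += buf[i];
        r->b += (uint32_t)(len - i) * buf[i];   /* first byte has weight len */
    }
    r->a &= 0xffff;
    r->b &= 0xffff;
}

/* Slide the window by one byte: drop `out`, append `in`, in O(1). */
void roll_step(struct rolling *r, unsigned char out, unsigned char in)
{
    r->a = (r->a - out + in) & 0xffff;
    r->b = (r->b - (uint32_t)r->len * out + r->a) & 0xffff;
}

/* Combine both sums into one 32-bit value. */
uint32_t roll_digest(const struct rolling *r)
{
    return r->a | (r->b << 16);
}
```

Each step of the update costs O(1). Scanning every offset of a file for matching blocks therefore costs O(n) rather than O(n times block size).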
rsync can be used over rsh or ssh, or it can even run in daemon mode. An rsync server listens on port 873 for connections from rsync clients. When a client connects, the server checks the password, and once authentication succeeds, the file transfer begins. On the first synchronization each file is transferred in full; later runs transfer only the parts that differ.
rsync supports most Unix-like systems and is well tested on Linux, Solaris, and BSD. Versions are also available for Windows; well-known ones include cwRsync and Sync2NAS.
The basic features of rsync are as follows:
1. It can mirror entire directory trees and file systems;
2. It can easily preserve original file permissions, timestamps, and soft and hard links;
3. It can be installed without special privileges;
4. Its transfer process is optimized for efficiency;
5. Files can be transferred via rcp, ssh, and similar tools, or over a direct socket connection;
6. It supports anonymous transfers.
Command syntax
The rsync command can take the following six forms:
rsync [OPTION]... SRC DEST
rsync [OPTION]... SRC [USER@]HOST:DEST
rsync [OPTION]... [USER@]HOST:SRC DEST
rsync [OPTION]... [USER@]HOST::SRC DEST
rsync [OPTION]... SRC [USER@]HOST::DEST
rsync [OPTION]... rsync://[USER@]HOST[:PORT]/SRC [DEST]
Corresponding to the six command formats above, rsync has six modes of operation:
1) Copy local files. This mode is used when neither the SRC nor the DEST path contains a single colon ":" separator.
2) Use a remote shell program (such as rsh or ssh) to copy the contents of the local machine to a remote machine. This mode is used when the DEST path contains a single colon ":" separator.
3) Use a remote shell program (such as rsh or ssh) to copy the contents of a remote machine to the local machine. This mode is used when the SRC path contains a single colon ":" separator.
4) Copy files from a remote rsync server to the local machine. This mode is used when the SRC path contains the "::" separator.
5) Copy files from the local machine to a remote rsync server. This mode is used when the DEST path contains the "::" separator.
6) List the files on a remote machine. This works like a transfer from a remote rsync server, except that the local destination is omitted from the command.
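The delimiter rules in the list above amount to a small decision procedure. The sketch below classifies each path argument and maps a SRC/DEST pair to mode numbers 1 to 6. It is a hypothetical illustration of those rules only; the function names are invented, and real rsync applies additional checks when parsing paths.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of one path argument (not rsync's parser). */
enum rsync_side { SIDE_LOCAL, SIDE_REMOTE_SHELL, SIDE_DAEMON };

enum rsync_side classify_path(const char *arg)
{
    assert(arg != NULL);
    if (strncmp(arg, "rsync://", 8) == 0 || strstr(arg, "::") != NULL)
        return SIDE_DAEMON;          /* talk to an rsync server */
    if (strchr(arg, ':') != NULL)
        return SIDE_REMOTE_SHELL;    /* single colon: via rsh/ssh */
    return SIDE_LOCAL;
}

/* Map SRC/DEST to modes 1-6 from the list; dest == NULL means no local
 * destination (listing). Returns 0 for combinations the list does not cover. */
int rsync_mode(const char *src, const char *dest)
{
    enum rsync_side s = classify_path(src);
    if (dest == NULL)
        return s == SIDE_DAEMON ? 6 : 0;
    enum rsync_side d = classify_path(dest);
    if (s == SIDE_LOCAL && d == SIDE_LOCAL)        return 1;
    if (s == SIDE_LOCAL && d == SIDE_REMOTE_SHELL) return 2;
    if (s == SIDE_REMOTE_SHELL && d == SIDE_LOCAL) return 3;
    if (s == SIDE_DAEMON && d == SIDE_LOCAL)       return 4;
    if (s == SIDE_LOCAL && d == SIDE_DAEMON)       return 5;
    return 0;
}
```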