Discussion on multi-process concurrent writing of the same file in a Linux system environment

Last Update:2018-07-26 Source: Internet

Author: User

To discuss what happens when multiple processes write to the same file concurrently, we first need some background on how open files are shared between processes.

1. File sharing

UNIX systems support the sharing of open files among different processes. To explain how, we first describe the data structures the kernel uses for all I/O. Note that the following description is conceptual and may or may not match a specific implementation.

The kernel uses three data structures to represent open files, and the relationship between them determines the possible impact a process can have on another process in file sharing.

(1) Each process has an entry in the process table. Within each process table entry is a table of open file descriptors, which can be treated as a vector with one entry per descriptor. Associated with each file descriptor are:

(a) The file descriptor flags (close-on-exec).

(b) A pointer to a file table entry.

(2) The kernel maintains a single file table for all open files. Each file table entry contains:

(a) The file status flags (read, write, append, sync, nonblocking, etc.).

(b) Current file offset.

(3) Each open file (or device) has a v-node structure. The v-node contains information about the type of file and pointers to the functions that operate on the file. For most files, the v-node also contains the i-node (index node) of the file. This information is read from disk into memory when the file is opened, so everything about the file is readily available. For example, the i-node contains the owner of the file, the size of the file, the device the file resides on, pointers to where the file's data blocks are located on disk, and so on.

Note: Linux does not use a v-node; it uses a generic i-node structure instead. Although the implementations differ, the v-node is conceptually the same as the generic i-node. Both point to an i-node structure specific to the file system.

We are ignoring some implementation details that do not affect our discussion. For example, the table of open file descriptors can be stored in the user area instead of the process table. These tables can also be implemented in various ways and need not be arrays; for example, they could be implemented as linked lists of structures. These details do not affect our discussion of file sharing.

Figure 1 shows the relationship between the three tables for a single process. The process has two different files open: one is open on standard input (file descriptor 0) and the other is open on standard output (file descriptor 1). The basic relationship between these three tables has remained the same since the early versions of the UNIX system [Thompson 1978]. This arrangement is critical to the way files are shared among different processes.

Figure 1 Kernel data structure for open files

Note: The v-node structure was created to support multiple file system types on a single computer system. The work was done independently by Peter Weinberger (Bell Labs) and Bill Joy (Sun). Sun called this the Virtual File System and called the file system-independent portion of the i-node the v-node [Kleiman 1986]. The v-node structure spread widely as other vendors added support for Sun's Network File System (NFS) to their implementations. The first BSD release to provide v-nodes was 4.3BSD Reno, when NFS was added. In SVR4, the v-node replaced the file system-independent i-node structure of SVR3. Solaris is derived from SVR4, so it also uses v-nodes. Linux does not split the related data structures into an i-node and a v-node; instead it uses a file system-independent i-node and a file system-dependent i-node.

If two separate processes each open the same file, we have the arrangement shown in Figure 2. Assume that the first process has the file open on file descriptor 3 and the second process has the same file open on file descriptor 4. Each process that opens the file gets its own file table entry, but only a single v-node table entry is needed for a given file. One reason each process gets its own file table entry is so that each process has its own current offset for the file.

Figure 2 Two separate processes each open the same file

Given these data structures, we can now be more specific about what happens during certain operations. After each write completes, the current file offset in the file table entry is incremented by the number of bytes written. If this causes the current file offset to exceed the current file length, the current file length in the i-node table entry is set to the current file offset. If a file is opened with the O_APPEND flag, a corresponding flag is set in the file status flags of the file table entry. Each time a write is performed on a file with this append flag set, the current file offset in the file table entry is first set to the file length from the i-node table entry. This forces every write to be appended to the current end of the file. If a file is positioned to its current end with lseek, all that happens is that the current file offset in the file table entry is set to the current file length from the i-node table entry. Note that this is not the same as opening the file with the O_APPEND flag. The lseek function modifies only the current file offset in the file table entry; it performs no file I/O.
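The difference between seeking and O_APPEND can be sketched in C as follows. This is a minimal illustration, not code from the original article: the helper name size_after_write_at_start and the scratch path under /tmp are assumptions. The helper creates a 10-byte file, reopens it with or without O_APPEND, seeks back to offset 0, writes 2 bytes, and reports the resulting file size.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX.1-2008 declarations */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: seeks to offset 0, writes 2 bytes, and returns
 * the resulting file size, or -1 on error. */
long size_after_write_at_start(int use_append)
{
    char path[] = "/tmp/append_demo_XXXXXX";   /* assumed scratch location */
    struct stat st;
    long result = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, "0123456789", 10);   /* file is now 10 bytes long */
    close(fd);
    if (n != 10) {
        unlink(path);
        return -1;
    }

    fd = open(path, O_WRONLY | (use_append ? O_APPEND : 0));
    if (fd >= 0) {
        /* lseek only changes the offset in the file table entry. */
        if (lseek(fd, 0, SEEK_SET) == 0 &&
            write(fd, "AB", 2) == 2 &&
            fstat(fd, &st) == 0)
            result = (long)st.st_size;
        close(fd);
    }
    unlink(path);
    return result;
}
```

With O_APPEND the kernel repositions to the end of the file before the write, so the file grows to 12 bytes even though the program had just sought to offset 0. Without the flag, the write lands at offset 0 and the size stays at 10.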

It is possible for more than one file descriptor to point to the same file table entry. We will see this when we discuss the dup function below. The same thing happens after fork: the parent and the child share the same file table entries, and therefore the same i-node or v-node, for every descriptor that was open at the time of the fork. (This was verified by testing on a modern Linux system; the test programs and results are given below. Correction: an earlier, incorrect analysis of the test results has been fixed.)
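The sharing of a file table entry across fork can be observed with a small C sketch. The helper name parent_offset_after_child_write and the scratch path are assumptions made for illustration. The child writes 5 bytes through the inherited descriptor, and the parent then queries its own current offset on the same descriptor.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX.1-2008 declarations */
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the parent's current file offset after
 * the child has written 5 bytes, or -1 on error. */
long parent_offset_after_child_write(void)
{
    char path[] = "/tmp/fork_demo_XXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* the file stays usable through fd */

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: write through the descriptor inherited from the parent. */
        _exit(write(fd, "hello", 5) == 5 ? 0 : 1);
    }

    int status;
    long off = -1;
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        off = (long)lseek(fd, 0, SEEK_CUR);   /* parent never wrote anything */
    close(fd);
    return off;
}
```

The parent sees an offset of 5 although it performed no write itself, because both descriptors refer to one file table entry and therefore to one current file offset.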

Note the difference in scope between the file descriptor flags and the file status flags. The former apply only to a single descriptor in a single process, whereas the latter apply to all descriptors in any process that point to the given file table entry. Everything described above works correctly for multiple processes reading the same file. Each process has its own file table entry with its own current file offset. However, when multiple processes write to the same file, unexpected results can occur. To see how to avoid them, we need to understand the concept of atomic operations.

2. Atomic operation

2.1 Appending to a file

Consider a process that appends data to the end of a file. Early versions of the UNIX system did not support the O_APPEND option of open, so the program was written in the following form:

    if (lseek(fd, 0L, 2) < 0)          /* position to EOF */
        err_sys("lseek error");
    if (write(fd, buf, 100) != 100)    /* and write */
        err_sys("write error");

This works correctly for a single process, but problems arise if multiple processes use this technique to append to the same file at the same time. (This can happen, for example, if several instances of the same program run concurrently and each appends messages to a log file.)

Suppose two independent processes, A and B, operate on the same file. Each has opened the file, but neither used the O_APPEND flag. The relationship between the data structures is then as shown in Figure 2: each process has its own file table entry, but they share a single v-node table entry. Assume that process A calls lseek, which sets A's current file offset to 1500 bytes (the current end of the file). The kernel then switches processes and B runs. Process B calls lseek, which also sets its current file offset to 1500 bytes (the current end of the file). B then calls write, which advances B's current file offset to 1600. Because the file has grown, the kernel also updates the current file length in the v-node to 1600. The kernel then switches processes again and A resumes. When A calls write, the data is written starting at A's current file offset, which is still 1500. This overwrites the data that B just wrote to the file.
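The interleaving described above can be replayed deterministically in a single program by opening the file twice, which gives two separate file table entries, one standing in for process A and one for process B. The following C sketch is only an illustration; the helper name replay_lost_update and the scratch path are assumptions.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX.1-2008 declarations */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: replays the A/B sequence on a 1500-byte file.
 * Returns the final file size and stores the byte found at offset 1500
 * in *byte_at_1500, or returns -1 on error. */
long replay_lost_update(char *byte_at_1500)
{
    char path[] = "/tmp/race_demo_XXXXXX";   /* assumed scratch location */
    char buf[1500];
    struct stat st;
    long size = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    memset(buf, '.', sizeof buf);
    ssize_t n = write(fd, buf, sizeof buf);   /* initial length: 1500 */
    close(fd);

    /* Two open calls yield two file table entries, as in Figure 2. */
    int fd_a = open(path, O_RDWR);
    int fd_b = open(path, O_WRONLY);
    if (n == (ssize_t)sizeof buf && fd_a >= 0 && fd_b >= 0) {
        off_t pos_a = lseek(fd_a, 0, SEEK_END);   /* A: offset 1500 */
        off_t pos_b = lseek(fd_b, 0, SEEK_END);   /* B: offset 1500 */
        memset(buf, 'B', 100);
        ssize_t nb = write(fd_b, buf, 100);       /* B: file grows to 1600 */
        memset(buf, 'A', 100);
        ssize_t na = write(fd_a, buf, 100);       /* A: still writes at 1500 */
        if (pos_a == 1500 && pos_b == 1500 && nb == 100 && na == 100 &&
            pread(fd_a, byte_at_1500, 1, 1500) == 1 &&
            fstat(fd_a, &st) == 0)
            size = (long)st.st_size;
    }
    if (fd_a >= 0)
        close(fd_a);
    if (fd_b >= 0)
        close(fd_b);
    unlink(path);
    return size;
}
```

The file ends up 1600 bytes long rather than 1700, and the byte at offset 1500 is an 'A': the 100 bytes written through B's descriptor are lost.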

The problem is that the logical operation "position to the end of the file and write" requires two separate function calls. The solution is to have the positioning and the write be a single atomic operation with respect to other processes. Any operation that requires more than one function call cannot be atomic, because the kernel may suspend the process between the two calls.

The UNIX system provides an atomic way to do this: set the O_APPEND flag when the file is opened. As described earlier, this causes the kernel to position the file to its current end before every write, so we no longer need to call lseek before each write.
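As a rough check of this behavior, the following C sketch starts several processes that each open the same file with O_APPEND and write fixed-size records. The helper names, the record length of 16 bytes, and the scratch path are assumptions made for illustration. The sketch assumes a local file system; some network file systems may not give the same guarantee.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX.1-2008 declarations */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { RECORD_LEN = 16 };   /* assumed record size for this sketch */

/* Child body: open the log in append mode, write nwrites records, exit. */
static void append_records(const char *path, int nwrites, char fill)
{
    char rec[RECORD_LEN];
    memset(rec, fill, RECORD_LEN - 1);
    rec[RECORD_LEN - 1] = '\n';

    int fd = open(path, O_WRONLY | O_APPEND);
    if (fd < 0)
        _exit(1);
    for (int i = 0; i < nwrites; i++)
        if (write(fd, rec, RECORD_LEN) != RECORD_LEN)
            _exit(1);
    _exit(0);
}

/* Hypothetical helper: returns the final log size, or -1 on error. */
long concurrent_append_size(int nproc, int nwrites)
{
    char path[] = "/tmp/log_demo_XXXXXX";   /* assumed scratch location */
    struct stat st;
    int ok = 1, started = 0;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    close(fd);

    for (int p = 0; p < nproc; p++) {
        pid_t pid = fork();
        if (pid == 0)
            append_records(path, nwrites, (char)('a' + p % 26));
        if (pid < 0) {
            ok = 0;
            break;
        }
        started++;
    }
    for (int p = 0; p < started; p++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            ok = 0;
    }
    long size = (ok && stat(path, &st) == 0) ? (long)st.st_size : -1;
    unlink(path);
    return size;
}
```

If no record is overwritten, the final size is exactly nproc * nwrites * RECORD_LEN. Running the same experiment with lseek followed by write instead of O_APPEND can produce a smaller file, which is the lost-update problem described above.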

2.2 pread and pwrite functions

The Single UNIX Specification includes XSI extensions that allow applications to seek and perform I/O atomically. These extensions are pread and pwrite.

    #include <unistd.h>

    ssize_t pread(int filedes, void *buf, size_t nbytes, off_t offset);
        Returns: number of bytes read, 0 if end of file, -1 on error

    ssize_t pwrite(int filedes, const void *buf, size_t nbytes, off_t offset);
        Returns: number of bytes written if OK, -1 on error

Calling pread is equivalent to calling lseek followed by read, with the following important differences: the positioning and the read cannot be interrupted when pread is used, and the current file offset is not updated.

Calling pwrite is equivalent to calling lseek followed by write, with similar differences.
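A minimal C sketch of these two properties follows. The helper name offset_after_positional_io and the scratch path are assumptions. It writes 11 bytes with write, then uses pwrite and pread at explicit offsets and checks that the current file offset is still where the original write left it.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX.1-2008 declarations */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns the current file offset after one pwrite
 * and one pread, or -1 on error. */
long offset_after_positional_io(void)
{
    char path[] = "/tmp/pread_demo_XXXXXX";   /* assumed scratch location */
    char buf[5];
    long off = -1;

    int fd = mkstemp(path);                   /* opened for reading and writing */
    if (fd < 0)
        return -1;
    unlink(path);

    if (write(fd, "hello world", 11) == 11 &&     /* offset becomes 11 */
        pwrite(fd, "HELLO", 5, 0) == 5 &&         /* overwrite bytes 0..4 */
        pread(fd, buf, sizeof buf, 6) == 5 &&     /* read bytes 6..10 */
        memcmp(buf, "world", 5) == 0)
        off = (long)lseek(fd, 0, SEEK_CUR);       /* query, do not move */
    close(fd);
    return off;
}
```

The reported offset is 11: neither pwrite nor pread moved it, so other code using the same descriptor is not disturbed by the positional I/O.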

2.3 Creating a File

We saw another example of an atomic operation in the description of the O_CREAT and O_EXCL options of the open function. When both options are specified, open fails if the file already exists. Checking whether the file exists and creating it are performed as a single atomic operation. Without this atomic operation, we might write the following:

    if ((fd = open(pathname, O_WRONLY)) < 0) {
        if (errno == ENOENT) {
            if ((fd = creat(pathname, mode)) < 0)
                err_sys("creat error");
        } else {
            err_sys("open error");
        }
    }

A problem occurs if another process creates the file between the open and the creat. For example, if another process creates the file between these two calls and writes some data to it, that data is erased when the original process executes the creat. Combining the test for existence and the creation into a single atomic operation avoids this problem.
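The atomic alternative can be sketched in C as a single open call with O_CREAT and O_EXCL. The helper names create_exclusive and errno_of_second_create, and the scratch path, are assumptions made for illustration. The second helper attempts to create the same path twice and reports the errno of the second attempt.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX.1-2008 declarations */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create path only if it does not exist yet.
 * Returns a descriptor, or -1 with errno set (EEXIST if it exists). */
int create_exclusive(const char *path, mode_t mode)
{
    return open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
}

/* Hypothetical demo: returns the errno of a second exclusive create of
 * the same path, 0 if it unexpectedly succeeded, or -1 on setup error. */
int errno_of_second_create(void)
{
    char path[] = "/tmp/excl_demo_XXXXXX";   /* assumed scratch location */
    int tmp = mkstemp(path);                 /* obtain a unique name */
    if (tmp < 0)
        return -1;
    close(tmp);
    unlink(path);                            /* the name is free again */

    int first = create_exclusive(path, S_IRUSR | S_IWUSR);
    if (first < 0)
        return -1;

    errno = 0;
    int second = create_exclusive(path, S_IRUSR | S_IWUSR);
    int err = (second < 0) ? errno : 0;      /* save before other calls */

    if (second >= 0)
        close(second);
    close(first);
    unlink(path);
    return err;
}
```

The second attempt fails with EEXIST instead of truncating the file, so data written by whoever created the file first is never erased.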

In general, an atomic operation is an operation that may consist of multiple steps. If it is performed atomically, either all of the steps are performed or none of them is; it is not possible for only a subset of the steps to be performed.

3. dup and dup2 functions

The following two functions can be used to duplicate an existing file descriptor:

    #include <unistd.h>

    int dup(int filedes);
    int dup2(int filedes, int filedes2);
        Both return: new file descriptor if OK, -1 on error

The new file descriptor returned by dup is guaranteed to be the lowest-numbered available file descriptor. With dup2, we specify the value of the new descriptor with the filedes2 argument. If filedes2 is already open, it is closed first. If filedes equals filedes2, then dup2 returns filedes2 without closing it.

The new file descriptor returned by these functions shares the same file table entry as the filedes argument, as shown in Figure 3.

Figure 3 Kernel data structure after DUP execution

In Figure 3, we assume that the process executes:

    newfd = dup(1);

We assume that when this function is called, the next available file descriptor is 3 (which is very likely, since 0, 1, and 2 are opened by the shell). Because both descriptors point to the same file table entry, they share the same file status flags (read, write, append, and so on) and the same current file offset.

Each file descriptor has its own set of file descriptor flags. The dup functions always clear the close-on-exec flag for the new descriptor.
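These properties can be checked with a short C sketch. The helper name dup_shares_file_table_entry and the scratch path are assumptions. The helper sets close-on-exec on the original descriptor, duplicates it, writes through the original, and then inspects the duplicate.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX.1-2008 declarations */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the duplicate shares offset and
 * status flags but not the close-on-exec flag, 0 if not, -1 on error. */
int dup_shares_file_table_entry(void)
{
    char path[] = "/tmp/dup_demo_XXXXXX";   /* assumed scratch location */
    int result = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    /* Set a file descriptor flag on the original descriptor only. */
    if (fcntl(fd, F_SETFD, FD_CLOEXEC) == 0) {
        int newfd = dup(fd);
        if (newfd >= 0) {
            if (write(fd, "data", 4) == 4) {
                /* The offset moved by fd is visible through newfd. */
                int shared_offset = lseek(newfd, 0, SEEK_CUR) == 4;
                /* File status flags live in the shared file table entry. */
                int same_status = fcntl(fd, F_GETFL) == fcntl(newfd, F_GETFL);
                /* File descriptor flags are per descriptor. */
                int orig_cloexec = (fcntl(fd, F_GETFD) & FD_CLOEXEC) != 0;
                int new_cloexec = (fcntl(newfd, F_GETFD) & FD_CLOEXEC) != 0;
                result = shared_offset && same_status &&
                         orig_cloexec && !new_cloexec;
            }
            close(newfd);
        }
    }
    close(fd);
    return result;
}
```

The write through fd advances the offset seen through newfd, while the close-on-exec flag set on fd does not carry over to the duplicate.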

Another way to duplicate a descriptor is with the fcntl function. In effect, the call

    dup(filedes);

is equivalent to

    fcntl(filedes, F_DUPFD, 0);

Similarly, the call

    dup2(filedes, filedes2);

is equivalent to

    close(filedes2);
    fcntl(filedes, F_DUPFD, filedes2);

In this last case, dup2 is not exactly the same as a close followed by an fcntl. The difference is that dup2 is an atomic operation, whereas the alternative involves two function calls. A signal handler could run between the close and the fcntl and modify the file descriptors. There are also some differences in the errno values reported by dup2 and fcntl.
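The following C sketch exercises both forms side by side. The helper name dup_variants_ok, the scratch path, and the minimum descriptor number 10 are assumptions made for illustration; the sketch assumes descriptor numbers of 10 and above are within the process's open file limit.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX.1-2008 declarations */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fcntl(F_DUPFD) and dup2 behave as
 * described, 0 if not, -1 on setup error. */
int dup_variants_ok(void)
{
    char path[] = "/tmp/dupfd_demo_XXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    int a = fcntl(fd, F_DUPFD, 0);         /* like dup(fd): lowest free number */
    int b = fcntl(fd, F_DUPFD, 10);        /* lowest free number that is >= 10 */
    int c = (a >= 0) ? dup2(fd, a) : -1;   /* closes a and reuses it atomically */
    int d = dup2(fd, fd);                  /* same number: returned, not closed */
    int still_open = fcntl(fd, F_GETFD) != -1;

    int ok = a >= 0 && b >= 10 && c == a && d == fd && still_open;

    if (a >= 0)
        close(a);
    if (b >= 0)
        close(b);
    close(fd);
    return ok;
}
```

Note that fcntl with F_DUPFD only guarantees a descriptor greater than or equal to its third argument, whereas dup2 returns exactly the requested number and performs the close and the duplication as one step.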

4. Test results

The test code and result data with the O_APPEND flag: O_append_text.rar

The test code and result data without the O_APPEND flag: Test.rar

As you can see, the result data file in O_append_text.rar is twice the size of the one in Test.rar. The reason is that after fork there is no synchronization between the parent and the child, so the two processes race when writing the file and the output becomes confused. The file sizes differ because, without O_APPEND, positioning and writing are not a single atomic step: a process can write at an offset that is no longer the current end of the file, overwriting data written by the other process instead of extending the file.

In addition, the result data suggests the following (the attached data may not show it fully, but it can be reproduced by adjusting the parameters of the test program and running it several more times): when the file is opened with the O_APPEND flag, each write is performed as an atomic operation, and so is each read. When the file is opened without the O_APPEND flag, the output of the parent and child processes is interleaved in a disorderly way.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
