1. The kernel uses three data structures to represent open files. The relationships among them determine how one process affects another when files are shared.
(1) Each process has an entry (task_struct) in the process table, which includes a table of open file descriptors (kept in kernel memory, not in user space). Each descriptor is one entry in this table, holding the file descriptor flags and a pointer to a file table entry.
(2) The kernel maintains a file table for all open files. Each file table entry contains:
A) The file status flags (read, write, append, sync, non-blocking). Note that file descriptor flags apply to a single descriptor in a single process, whereas file status flags apply to every descriptor, in any process, that points to the same file table entry. The fcntl function, described in section 6 below, can modify both kinds of flags.
B) The current file offset.
C) A pointer to the v-node table entry for the file.
(3) Each open file or device has a v-node structure containing the file type and pointers to the functions that operate on the file. For most files, the v-node also contains the file's i-node (the i-node holds the file owner, the file size, the device, pointers to where the file's data blocks are on disk, and so on). This information is read from disk into memory when the file is opened, so everything about the file is readily available.
Linux does not use a v-node; it uses a generic i-node structure instead. Although the implementations differ, the v-node and the generic i-node are conceptually the same: both point to an i-node structure that is specific to the file system.
The relationship between the three tables: suppose one process has two files open, standard input (fd = 0) and standard output (fd = 1). The open file descriptor table in its task_struct then contains these two entries (each with the fd flags and a file pointer). Each file pointer points to a file table entry (with the file status flags, the current file offset, and a v-node pointer), and the v-node pointer points to a v-node table entry (with the v-node information, the i-node information, and the current file size). Linux does not split these structures into i-node and v-node; it uses a file-system-independent i-node and a file-system-dependent i-node.
If two independent processes open the same file, each gets its own file table entry, which records that process's current offset in the file. There is only one v-node table entry for the file, however. After each write completes, the current offset in the file table entry is increased by the number of bytes written. If the offset then exceeds the current file size, the file size in the i-node entry is set to the offset. If a file is opened with O_APPEND, that flag is set in the file status flags of the file table entry. Before each write to such a file, the current offset is first set to the file size from the i-node entry, so the data is always appended at the end of the file. lseek, wherever it positions, changes only the current offset in the file table entry and performs no I/O.
After fork, the parent and child share the same file table entry for each open descriptor.
dup makes multiple file descriptors in one process point to the same file table entry.
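The difference between sharing a file table entry and sharing only the v-node can be observed from user space. The sketch below is illustrative only: compare_offsets is a hypothetical helper name, and the scratch file is whatever path the caller passes in. It writes through one descriptor and then reads the current offset through a dup()ed descriptor and through a second open() of the same file.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: writes 4 bytes through fd1, then reports the current
 * offset seen through a dup()ed descriptor (*dup_off) and through a second,
 * independent open() of the same path (*reopen_off).
 * Returns 0 on success, -1 on any error.
 */
int compare_offsets(const char *path, off_t *dup_off, off_t *reopen_off)
{
    int fd1 = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd1 < 0)
        return -1;

    int fd2 = dup(fd1);            /* same file table entry as fd1 */
    int fd3 = open(path, O_RDWR);  /* new file table entry, same i-node */
    int rc = -1;

    if (fd2 >= 0 && fd3 >= 0 && write(fd1, "abcd", 4) == 4) {
        *dup_off = lseek(fd2, 0, SEEK_CUR);     /* offset shared with fd1 */
        *reopen_off = lseek(fd3, 0, SEEK_CUR);  /* offset private to fd3 */
        rc = 0;
    }

    if (fd3 >= 0)
        close(fd3);
    if (fd2 >= 0)
        close(fd2);
    close(fd1);
    return rc;
}
```

Under the model described above, the dup()ed descriptor should report the offset left by the write, while the separately opened descriptor should still be at the start of the file.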
2. Because each process's current offset is recorded in its own file table entry, multiple processes can read the same file at the same time without interfering. When multiple processes write to the same file, however, the results can be unexpected, and atomic operations are required.
For example, suppose multiple processes append to the same file. Early UNIX versions had no O_APPEND option, so an append had to be written as follows:
    if (lseek(fd, 0L, SEEK_END) < 0)    /* position to the end of the file */
        err_sys("cannot seek");
    if (write(fd, buf, 100) != 100)     /* then write */
        err_sys("cannot write");
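Because positioning and writing are two separate calls, the kernel can switch to another process between them. If several processes append this way at the same time, one can overwrite data written by another. (A typical case is an application server in which multiple threads write log records for several components.) Opening the file with O_APPEND makes the kernel position to the end of the file and write as a single atomic operation.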
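A minimal sketch of the O_APPEND approach follows. The helper name append_record and the file mode 0644 are assumptions made for illustration; only open, write, and close are standard calls.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: appends len bytes to the file at path.
 * With O_APPEND the kernel moves to end-of-file and writes in one atomic
 * step, so no explicit lseek() is needed.
 * Returns 0 on success, -1 on error (a short write counts as an error).
 */
int append_record(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    int rc = (write(fd, buf, len) == (ssize_t)len) ? 0 : -1;

    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

Each process that logs to the shared file would call this helper independently; none of them needs to know the current file size.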
UNIX also provides pread and pwrite, which combine lseek with read or write into one atomic operation. The prototypes are:
    #include <unistd.h>
    ssize_t pread(int filedes, void *buf, size_t nbytes, off_t offset);
    ssize_t pwrite(int filedes, const void *buf, size_t nbytes, off_t offset);
They read or write nbytes bytes of filedes starting at offset, which is measured from the beginning of the file. The file's current offset does not change.
Calling pread is equivalent to calling lseek and then read, with two important differences:
(1) With pread, the positioning and the read cannot be interrupted.
(2) The current file offset is not updated.
Calling pwrite is equivalent to calling lseek and then write. It is likewise atomic and cannot be interrupted.
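The two properties of pread listed above can be checked with a short sketch. peek_file is a hypothetical name; the _XOPEN_SOURCE definition is only there to make sure the pread declaration is visible under strict compiler modes.

```c
#define _XOPEN_SOURCE 700   /* expose pread() under strict compiler modes */
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: reads up to len bytes located at absolute offset off
 * in the file at path, and stores in *cur_after the descriptor's current
 * offset after the call.
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t peek_file(const char *path, void *buf, size_t len, off_t off,
                  off_t *cur_after)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t n = pread(fd, buf, len, off);   /* positioned read, no lseek */
    *cur_after = lseek(fd, 0, SEEK_CUR);    /* pread leaves this untouched */

    close(fd);
    return n;
}
```

Since the descriptor is freshly opened and only pread touches it, the offset reported back should still be 0 regardless of off.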
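3. Another scenario that requires an atomic operation is file creation.
When O_CREAT and O_EXCL are both passed to open, checking whether the file exists and creating it happen as one atomic step: if the file already exists, open fails; if it does not exist, it is created. This prevents the contents of an existing file from being erased. Without these options, the code would have to be written like this: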
    if ((fd = open(pathname, O_WRONLY)) < 0) {
        if (errno == ENOENT) {          /* no such file or directory */
            if ((fd = creat(pathname, mode)) < 0)
                err_sys("creat error");
        } else {
            err_sys("open error");
        }
    }
The problem lies between the open and the creat. Suppose process A calls open and finds that the file does not exist, and the kernel then switches to process B, which creates the file and writes data to it. When process A resumes and calls creat, the file is truncated and the data written by process B is lost. Coordinating access to the contents of a shared file requires a further mechanism, record locking (APUE Section 14.3).
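For the creation race itself, the atomic form with O_CREAT | O_EXCL can be sketched as follows. The helper name try_create and its return convention are assumptions made for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: creates the file at path only if it does not exist.
 * The existence check and the creation are one atomic open() call, so a
 * file created by another process in the meantime is never truncated.
 * Returns 1 if this call created the file, 0 if it already existed,
 * and -1 on any other error.
 */
int try_create(const char *path, mode_t mode)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
    if (fd >= 0) {
        close(fd);
        return 1;
    }
    return (errno == EEXIST) ? 0 : -1;
}
```

In the scenario above, whichever of process A and process B calls open first creates the file; the other gets EEXIST and can decide what to do without destroying any data.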
Record locking lets a process that is reading or modifying part of a file prevent other processes from modifying the same region. Because the UNIX kernel has no notion of file records, byte-range locking is the more accurate term: a lock covers only a range of bytes in the file.
Linux supports flock (which locks the entire file), fcntl record locks, and lockf.
    #include <fcntl.h>
    int fcntl(int filedes, int cmd, ... /* struct flock *flockptr */);

    struct flock {
        short l_type;    /* F_RDLCK (shared read lock), F_WRLCK (exclusive
                            write lock), or F_UNLCK (unlock a region) */
        off_t l_start;   /* starting offset in bytes of the region,
                            relative to l_whence */
        short l_whence;  /* SEEK_SET, SEEK_CUR, or SEEK_END */
        off_t l_len;     /* length in bytes; 0 means lock to EOF */
        pid_t l_pid;     /* returned with F_GETLK */
    };
    /* To lock the entire file, set l_whence to SEEK_SET, l_start to 0,
       and l_len to 0. */
There are two types of locks: the shared read lock F_RDLCK and the exclusive write lock F_WRLCK. The basic rule is that any number of processes can hold a shared read lock on a given byte, but only one process can hold a write lock on it. If one or more read locks exist on a byte, no write lock can be placed on it; if a write lock exists on a byte, no read lock can be placed on it.
Lock requests between different processes:
    Region currently has      Request for read lock    Request for write lock
    No locks                  allowed                  allowed
    One or more read locks    allowed                  rejected
    One write lock            rejected                 rejected
Lock requests within a single process:
If a process already holds a lock on a range of a file and requests another lock on the same range, the new lock replaces the old one.
To obtain a read lock, the descriptor must be open for reading. To obtain a write lock, the descriptor must be open for writing.
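The three fcntl commands for record locking (flockptr is the struct flock pointer passed as the third argument):
F_GETLK: determines whether the lock described by flockptr would be blocked by another lock. If an existing lock would prevent the requested lock from being created, the information about that existing lock overwrites the structure pointed to by flockptr. If no such lock exists, l_type is set to F_UNLCK and the rest of the structure is left unchanged.
F_SETLK: sets or clears the lock described by flockptr. If the lock cannot be granted because of the compatibility rules above, fcntl returns immediately with errno set to EACCES or EAGAIN.
F_SETLKW: the blocking version of F_SETLK. If the requested lock cannot be granted, the calling process sleeps; it is awakened when the lock becomes available or when a signal interrupts it.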
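The struct flock fields and the three commands can be wrapped in small helpers, in the spirit of the lock functions used in APUE. The names lock_region and lock_holder below are hypothetical, and the sketch assumes a POSIX system where fcntl record locks are available on the target file system.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: applies cmd (F_SETLK or F_SETLKW) with lock type
 * F_RDLCK, F_WRLCK, or F_UNLCK to the byte range described by
 * start/whence/len. Returns what fcntl() returns: 0 on success, -1 on error.
 */
int lock_region(int fd, int cmd, int type, off_t start, int whence, off_t len)
{
    struct flock lk = {
        .l_type = (short)type,
        .l_whence = (short)whence,
        .l_start = start,
        .l_len = len,               /* 0 means "through end of file" */
    };
    return fcntl(fd, cmd, &lk);
}

/*
 * Hypothetical helper: asks with F_GETLK whether a lock of the given type
 * on the range would be blocked. Returns the PID of a process holding a
 * conflicting lock, 0 if there is none, or -1 on error.
 * A process's own locks never count as conflicts.
 */
pid_t lock_holder(int fd, int type, off_t start, int whence, off_t len)
{
    struct flock lk = {
        .l_type = (short)type,
        .l_whence = (short)whence,
        .l_start = start,
        .l_len = len,
    };
    if (fcntl(fd, F_GETLK, &lk) < 0)
        return -1;
    return (lk.l_type == F_UNLCK) ? 0 : lk.l_pid;
}
```

For example, lock_region(fd, F_SETLK, F_WRLCK, 0, SEEK_SET, 0) requests a non-blocking write lock on the whole file, and the same call with F_UNLCK releases it.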
Implicit inheritance and release of locks:
(1) When a process terminates, all of its locks are released.
(2) When a descriptor is closed, every lock that the process holds on the file referenced by that descriptor is released. For example:
    fd1 = open(pathname, ...);
    read_lock(fd1, ...);
    fd2 = dup(fd1);
    close(fd2);
After close(fd2), the lock obtained through fd1 is also released.
A child created by fork does not inherit the locks set by its parent. The child must call fcntl to obtain its own locks. If locks were inherited, the parent and the child could both write to the file at the same time.
After exec, the new program inherits the locks of the program that called exec. However, if close-on-exec is set for a file descriptor, that descriptor is closed during exec and all locks on the corresponding file are released.
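Two of the rules above (a forked child does not inherit the parent's locks, and closing any descriptor for the file drops the process's locks on it) can be observed with F_GETLK from a child process. This is only a sketch: write_lock_all and blocked_in_child are hypothetical names, and the child reports its finding through its exit status.

```c
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: takes a non-blocking write lock on the whole file. */
int write_lock_all(int fd)
{
    struct flock lk = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 0 };
    return fcntl(fd, F_SETLK, &lk);
}

/*
 * Hypothetical helper: forks a child that asks, via F_GETLK, whether it
 * could write-lock the whole file referenced by fd.
 * Returns 1 if another process's lock would block the child, 0 if not,
 * and -1 on error.
 */
int blocked_in_child(int fd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {     /* child: holds no locks of its own */
        struct flock probe = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                               .l_start = 0, .l_len = 0 };
        if (fcntl(fd, F_GETLK, &probe) < 0)
            _exit(2);
        _exit(probe.l_type == F_UNLCK ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return (WEXITSTATUS(status) == 2) ? -1 : WEXITSTATUS(status);
}
```

If the parent calls write_lock_all(fd) and then blocked_in_child(fd), the child should see the parent's lock as a conflict, which shows that the lock was not inherited. If the parent then runs close(dup(fd)) and probes again, the child should find no lock, because closing the duplicate released it.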
Advisory locks are also called cooperative locks. With this kind of lock, the kernel only provides the operations to set, release, and test locks; it does not enforce them. If an application does not check whether a file is locked, or writes to the file regardless of the lock, the kernel does nothing to stop it. An advisory lock therefore cannot prevent a process from operating on the file. It works only if every process voluntarily checks for locks and constrains its own behavior accordingly.
Mandatory locks are enforced by the OS kernel. On file operations such as open, read, and write, the kernel checks whether the operation conflicts with a mandatory lock on the file; if it does, the operation blocks or fails. In other words, the kernel forces applications to follow the rules.
For example, vim uses an advisory scheme when editing a file: if the same file is opened a second time, a message reports that the file is locked or already being edited. However, an editor that does not check for the lock can still open and edit the file. The final state of the file then depends on which process writes last.
4. dup and dup2 functions: duplicate an existing file descriptor.
    #include <unistd.h>
    int dup(int filedes);
    int dup2(int filedes, int filedes2);
dup returns the lowest-numbered file descriptor currently available. It is equivalent to fcntl(filedes, F_DUPFD, 0).
dup2 duplicates filedes onto the descriptor number given by filedes2. If filedes2 is already open, it is closed first. If filedes and filedes2 are equal, dup2 returns filedes2 without closing it. The call is roughly equivalent to close(filedes2); fcntl(filedes, F_DUPFD, filedes2); except that dup2 is atomic, so nothing else can change the descriptors between the close and the duplication.
The file descriptor returned by either function shares a file table entry with the filedes argument.
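A common use of dup2 is redirecting an existing descriptor to a file. The sketch below assumes a hypothetical helper name, redirect_to_file, and an arbitrary file mode of 0644.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: makes target_fd refer to the file at path, which is
 * created or truncated. Whatever target_fd referred to before is closed by
 * dup2() as part of the same atomic operation.
 * Returns 0 on success, -1 on error.
 */
int redirect_to_file(int target_fd, const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    if (fd == target_fd)    /* already in place; closing it would undo this */
        return 0;

    int rc = (dup2(fd, target_fd) < 0) ? -1 : 0;
    close(fd);              /* target_fd keeps the file table entry alive */
    return rc;
}
```

Calling redirect_to_file(STDOUT_FILENO, "out.log") would, for instance, send later writes on standard output to out.log, since both descriptors share one file table entry until the temporary one is closed.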
5. sync, fsync, and fdatasync functions
When data is written to a file, the kernel usually copies it into one of its buffers first. If the buffer is not yet full, it is not placed on the output queue. The kernel waits until the buffer fills, or until it needs to reuse the buffer for another disk block, and only then queues the buffer for output. The actual I/O happens when the buffer reaches the head of the queue. This output mode is called delayed write. It reduces the number of disk I/O operations, but it also delays the update of the file contents on disk, so a system failure can cause data loss. UNIX provides three functions, sync, fsync, and fdatasync, to keep the file system on disk consistent with the contents of the buffer cache.
    #include <unistd.h>
    int fsync(int filedes);
    int fdatasync(int filedes);
    void sync(void);
sync only queues all modified block buffers for writing and then returns; it does not wait for the disk writes to finish. A system daemon, traditionally called update, calls sync periodically (usually every 30 seconds). This guarantees that the kernel's block buffers are flushed regularly.
fsync applies only to the single file specified by the descriptor filedes, and it waits for the disk writes to complete before returning. Applications such as databases use fsync to make sure that modified blocks have been written to disk. (Oracle redo logs, for example, require this.)
fdatasync is similar to fsync, but it affects only the data portion of the file. fsync also synchronizes updates to the file's attributes.
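A minimal sketch of the database-style use of fsync follows. write_durable is a hypothetical helper name, and the example assumes that losing the data on a short write or a failed fsync should be reported as an error to the caller.

```c
#define _XOPEN_SOURCE 700   /* expose fsync() under strict compiler modes */
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: writes len bytes to the file at path (created or
 * truncated) and does not return success until fsync() reports that the
 * data and the file attributes have been handed to the storage device.
 * Returns 0 on success, -1 on error.
 */
int write_durable(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    int rc = 0;
    if (write(fd, buf, len) != (ssize_t)len)
        rc = -1;                    /* treat a short write as a failure */
    else if (fsync(fd) < 0)
        rc = -1;                    /* data may still be only in the cache */

    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

Replacing fsync with fdatasync in this sketch would skip the attribute update where the system supports it, which is the distinction drawn above.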
6. fcntl function: changes the properties of a file that is already open.
    #include <fcntl.h>
    int fcntl(int filedes, int cmd, ... /* an integer or a pointer to a record lock structure */);
Functions:
(1) Duplicate an existing file descriptor: cmd = F_DUPFD. The new file descriptor is returned as the function value.
(2) Get/set the file descriptor flags: cmd = F_GETFD or F_SETFD. Note: be careful with F_SETFD and F_SETFL. Fetch the current value first, modify it as needed, and then set the new value; otherwise flags that were already set are cleared.
(3) Get/set the file status flags: cmd = F_GETFL or F_SETFL. F_GETFL returns the file status flags as the function value. These are the flags beginning with O_ that are described with the open function. The three access mode flags O_RDONLY, O_WRONLY, and O_RDWR do not each occupy a separate bit; they are mutually exclusive values. To test them, mask the returned value with O_ACCMODE and compare the result with each of the three. F_SETFL sets the file status flags to the value of the third argument. Flags are usually turned on with the bitwise OR operator | and turned off with & and the complement operator ~. For example, to change O_APPEND, let flags = O_APPEND. First fetch the current value with val = fcntl(fd, F_GETFL, 0) and check that it is >= 0. To turn the flags on, use val |= flags; to turn them off, use val &= ~flags. Then pass val to F_SETFL.
If O_SYNC is set, each write waits until the data has been written to disk before it returns. Database systems need this to be sure that data reported as written is really on disk, so that it is not lost in a system crash. The cost is a noticeable increase in system time and wall clock time.
(4) Get/set asynchronous I/O ownership: cmd = F_GETOWN or F_SETOWN. These get or set the process ID or process group ID that receives the SIGIO and SIGURG signals.
(5) Get/set record locks: cmd = F_GETLK, F_SETLK, or F_SETLKW.
The following program prints the file status flags of the descriptor given on the command line:
    #include "apue.h"
    #include <fcntl.h>

    int
    main(int argc, char *argv[])
    {
        int val;

        if (argc != 2)
            err_quit("usage: a.out <descriptor#>");

        /* fetch the file status flags of the given descriptor */
        if ((val = fcntl(atoi(argv[1]), F_GETFL, 0)) < 0)
            err_sys("fcntl error for fd %d", atoi(argv[1]));

        /* the access mode is a value, not a bit: mask, then compare */
        switch (val & O_ACCMODE) {
        case O_RDONLY: printf("read only");  break;
        case O_WRONLY: printf("write only"); break;
        case O_RDWR:   printf("read write"); break;
        default:       err_dump("unknown access mode");
        }

        if (val & O_APPEND)
            printf(", append");
        if (val & O_NONBLOCK)
            printf(", nonblocking");
    #if defined(O_SYNC)
        if (val & O_SYNC)
            printf(", synchronous writes");
    #endif
        /* skip O_FSYNC where it is just another name for O_SYNC */
    #if !defined(_POSIX_C_SOURCE) && defined(O_FSYNC) && (O_FSYNC != O_SYNC)
        if (val & O_FSYNC)
            printf(", synchronous writes");
    #endif
        putchar('\n');
        exit(0);
    }
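The read-modify-write pattern described for F_SETFL in item (3) can be sketched as a pair of helpers. The names set_fl and clr_fl follow the naming used in APUE, but the bodies below are an independent sketch that returns an error code instead of calling the apue.h error routines.

```c
#include <fcntl.h>

/*
 * Turns on the given file status flags without disturbing the others:
 * fetch the current flags, OR in the new ones, write the result back.
 * Returns 0 on success, -1 on error.
 */
int set_fl(int fd, int flags)
{
    int val = fcntl(fd, F_GETFL, 0);
    if (val < 0)
        return -1;
    val |= flags;                       /* turn the flags on */
    return (fcntl(fd, F_SETFL, val) < 0) ? -1 : 0;
}

/*
 * Turns off the given file status flags, leaving the others as they are.
 * Returns 0 on success, -1 on error.
 */
int clr_fl(int fd, int flags)
{
    int val = fcntl(fd, F_GETFL, 0);
    if (val < 0)
        return -1;
    val &= ~flags;                      /* turn the flags off */
    return (fcntl(fd, F_SETFL, val) < 0) ? -1 : 0;
}
```

For example, set_fl(fd, O_NONBLOCK) switches a descriptor to non-blocking mode, and clr_fl(fd, O_NONBLOCK) switches it back. Which flags F_SETFL is allowed to change varies by system; the access mode bits cannot be changed this way.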