Basics of Linux programming (III): file descriptor and inode knowledge

Source: Internet
Author: User

In the Linux kernel, every process has a task_struct structure that holds its process-related information. This structure is known as the process descriptor; operating system theory calls it the process control block (PCB). task_struct contains a pointer (struct files_struct *files;) to a files_struct structure, which is called the file descriptor table. Each entry in the table contains a pointer to an open file, as shown in the figure.




A user program cannot access the file descriptor table in the kernel directly. It can only use indexes into the table (the numbers 0, 1, 2, 3, and so on). These indexes are called file descriptors and are stored in int variables. When open is called to open or create a file, the kernel allocates a file descriptor and returns it to the user program, and the corresponding pointer in the file descriptor table points to the newly opened file. To read or write the file, the user program passes the file descriptor to read or write; the kernel uses the descriptor to find the table entry, and then follows the pointer in that entry to the file.
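
The round trip described above can be sketched in a few lines of C. This is only an illustration: the helper name write_then_read, its parameters, and the file path used in the test are made up for this example and do not come from the original text.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: writes msg to path through one descriptor, then
 * reads it back through a second one. Returns the number of bytes read
 * back (buf is NUL-terminated), or -1 on error. */
ssize_t write_then_read(const char *path, const char *msg,
                        char *buf, size_t buflen)
{
    if (buflen == 0)
        return -1;

    /* open returns a small non-negative int, an index into this
     * process's file descriptor table. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open for writing");
        return -1;
    }
    printf("descriptor used for writing: %d\n", fd);

    /* write only needs the descriptor; the kernel finds the file. */
    size_t len = strlen(msg);
    ssize_t written = write(fd, msg, len);
    close(fd);
    if (written < 0 || (size_t)written != len)
        return -1;

    fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open for reading");
        return -1;
    }
    ssize_t got = read(fd, buf, buflen - 1);
    close(fd);
    if (got < 0)
        return -1;
    buf[got] = '\0';
    return got;
}
```

Calling write_then_read from main with a scratch file and a small buffer prints the descriptor number the kernel handed out; in a program that has only the three standard descriptors open, that number is normally 3.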

An open file is represented in the kernel by a file structure, and the pointers in the file descriptor table point to file structures. The file structure holds the file status flags (the f_flags member) and the current read/write position (the f_pos member). In the figure, process 1 and process 2 both open the same file, but each has its own file structure, so each can have different file status flags and a different read/write position. Another important member of the file structure is f_count, the reference count. System calls such as dup and fork can make several file descriptors point to the same file structure. For example, if fd1 and fd2 both refer to the same file structure, its reference count is 2. Calling close(fd1) does not release the file structure; it only reduces the reference count to 1. Calling close(fd2) as well reduces the count to 0, the file structure is released, and only then is the file really closed.

Each file structure points to a file_operations structure. All members of this structure are function pointers that point to the kernel functions implementing the various file operations. For example, when a user program calls read on a file descriptor, read enters the kernel through a system call (handled there by sys_read). The kernel finds the file structure that the descriptor points to, follows it to the file_operations structure, and calls the function that the read member points to in order to complete the request. The same happens for lseek, read, write, ioctl, open, and similar functions: the kernel calls the function that the corresponding file_operations member points to. The release member of file_operations serves the close request of the user program. It is called release rather than close because close does not necessarily close the file: it reduces the reference count, and the file is really closed only when the count reaches 0. For regular files opened on the same file system, operations such as read and write work the same way and use the same functions, so the file structures of the three open files in the figure point to the same file_operations structure. A character device file is different: its read and write operations do not access data blocks on disk but the hardware device, so its file structure points to a different file_operations structure, whose functions are implemented by the driver of that device.
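
The difference between sharing a file structure (dup) and having a separate one (a second open) can be observed from user space through the read/write position. The following is a minimal sketch; the helper name offsets_after_write and its out-parameters are invented for this illustration.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: writes 10 bytes through fd1, then reports the
 * current offset seen through a dup of fd1 and through a separate open
 * of the same path. Returns 0 on success, -1 on error. */
int offsets_after_write(const char *path, off_t *dup_off, off_t *open_off)
{
    int fd1 = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd1 < 0)
        return -1;

    int fd2 = dup(fd1);              /* second descriptor, same file structure */
    int fd3 = open(path, O_RDONLY);  /* second open, separate file structure */
    int rc = -1;

    if (fd2 >= 0 && fd3 >= 0 && write(fd1, "0123456789", 10) == 10) {
        /* Closing fd1 only drops one reference; fd2 remains usable. */
        close(fd1);
        fd1 = -1;

        *dup_off = lseek(fd2, 0, SEEK_CUR);   /* position shared with fd1 */
        *open_off = lseek(fd3, 0, SEEK_CUR);  /* position of the second open */
        if (*dup_off != (off_t)-1 && *open_off != (off_t)-1)
            rc = 0;
    }

    if (fd1 >= 0) close(fd1);
    if (fd2 >= 0) close(fd2);
    if (fd3 >= 0) close(fd3);
    return rc;
}
```

After the call, the offset seen through the duplicated descriptor has advanced by the 10 bytes written through fd1, while the offset of the independently opened descriptor is still 0.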


Each file structure has a pointer to a dentry structure; "dentry" is short for directory entry. The argument passed to functions such as open and stat is a path, for example /home/akaedu/a, and the kernel has to find the file's inode from that path. To reduce the number of disk reads, the kernel caches the directory tree in memory. This cache is called the dentry cache, and each of its nodes is a dentry structure. Looking up a path only requires finding the dentry for each component in turn: start at the root directory /, find the home directory, then the akaedu directory, and finally the file a. The dentry cache keeps only recently accessed directory entries; if an entry being looked up is not in the cache, it has to be read from disk into memory.
Each dentry structure has a pointer to an inode structure. The inode structure holds the information read from the inode on the disk partition, such as the owner, file size, file type, and permission bits. In the figure there are two dentries, for /home/akaedu/a and /home/akaedu/b. Both point to the same inode, which means that the two files are hard links to each other. Each inode structure has a pointer to an inode_operations structure, which also consists of pointers to kernel functions, in this case functions for file and directory operations. Unlike file_operations, inode_operations does not contain functions that operate on the contents of a file but functions that affect the layout of files and directories, such as creating and deleting files and directories or following symbolic links. inode structures that belong to the same file system can point to the same inode_operations structure.
The inode structure has a pointer to a super_block structure. The super_block structure holds the information read from the superblock of the disk partition, such as the file system type and block size. The s_root member of super_block is a pointer to a dentry that indicates where the root directory of this file system is mounted; in the figure, the partition is mounted on the /home directory.
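
User programs cannot see the kernel's inode structure, but the stat function returns much of the same information (owner, size, type, permission bits). A small sketch, assuming a hypothetical helper named print_inode_info:

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: prints inode-level metadata for path.
 * Returns 0 on success, -1 if stat fails. */
int print_inode_info(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0) {
        perror(path);
        return -1;
    }

    const char *type = S_ISREG(st.st_mode) ? "regular file"
                     : S_ISDIR(st.st_mode) ? "directory"
                     : S_ISCHR(st.st_mode) ? "character device"
                     : "other";

    printf("inode number: %llu\n", (unsigned long long)st.st_ino);
    printf("owner uid:    %lu\n", (unsigned long)st.st_uid);
    printf("size:         %lld bytes\n", (long long)st.st_size);
    printf("type:         %s\n", type);
    printf("permissions:  %04o\n", (unsigned)(st.st_mode & 07777));
    return 0;
}
```

Calling print_inode_info("/") reports the root directory as a directory; a path that does not exist makes stat fail and the helper return -1.
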
The file, dentry, inode, and super_block structures make up the core concepts of the VFS. The ext2 file system also has inodes and a superblock in its on-disk layout, so it maps easily onto the VFS concepts. Other file system formats come from non-UNIX systems (such as FAT32 and NTFS from Windows) and may have no notion of inodes or superblocks. To be mounted on Linux, their drivers have to improvise these structures. This is why the permission bits look wrong on FAT32 and NTFS partitions under Linux, with every file typically shown as rwxrwxrwx: these file systems have no concept of inodes or UNIX permission bits.

On a UNIX system, a user logs in through a terminal and gets a shell process, and that terminal becomes the controlling terminal of the shell process. The controlling terminal is part of the information stored in the PCB, and fork copies the information in the PCB, so the processes started by the shell have the same controlling terminal.
By default (with no redirection), the standard input, standard output, and standard error of every process point to the controlling terminal. When a program starts (before main is executed), the controlling terminal is already open three times, and the three open files are assigned to three FILE * pointers, stdin, stdout, and stderr, which are global variables defined in libc. The file descriptors of these three files are 0, 1, and 2, and each is stored in the corresponding FILE structure. A process reads the user's keyboard input from standard input, and what it writes to standard output or standard error appears on the display.
The header file unistd.h contains the following macro definitions for the three file descriptors:
#define STDIN_FILENO 0
#define STDOUT_FILENO 1
#define STDERR_FILENO 2
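
The link between the three FILE * pointers and the three descriptors can be checked with fileno, which returns the descriptor stored in a FILE structure. The helper names below are made up for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if stdin, stdout and stderr wrap
 * descriptors 0, 1 and 2, and 0 otherwise. */
int std_streams_match_fds(void)
{
    return fileno(stdin) == STDIN_FILENO
        && fileno(stdout) == STDOUT_FILENO
        && fileno(stderr) == STDERR_FILENO;
}

/* Hypothetical helper: writes a message straight to descriptor 2,
 * bypassing the FILE layer. Returns what write returns. */
ssize_t say_on_stderr(void)
{
    static const char msg[] = "written with write(STDERR_FILENO, ...)\n";
    return write(STDERR_FILENO, msg, sizeof msg - 1);
}
```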

Every process can access its controlling terminal through the special device file /dev/tty (a character device, type c). Each terminal device actually has its own device file, and /dev/tty provides a common interface: a process can reach its controlling terminal either through /dev/tty or through the device file of that particular terminal. The ttyname function returns the device file name for a file descriptor. The descriptor must refer to a terminal device, not an arbitrary file. The device file names of different terminals look like /dev/pts/? or /dev/tty?, and so on.
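
A short sketch of ttyname in use follows. Because ttyname only makes sense for terminals, the sketch checks isatty first; terminal_name and report_standard_fds are hypothetical names chosen for this example.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: returns the device file name of the terminal that
 * fd refers to, or NULL if fd is not a terminal. */
const char *terminal_name(int fd)
{
    if (!isatty(fd))
        return NULL;       /* ttyname only works on terminal devices */
    return ttyname(fd);    /* for example, a name under /dev/pts */
}

/* Prints what each of the three standard descriptors is attached to. */
void report_standard_fds(void)
{
    for (int fd = STDIN_FILENO; fd <= STDERR_FILENO; fd++) {
        const char *name = terminal_name(fd);
        printf("fd %d: %s\n", fd, name ? name : "not a terminal");
    }
}
```

Run from an interactive shell, report_standard_fds would normally print the same terminal name three times; with output redirected to a file or pipe, the redirected descriptor is reported as not a terminal.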


simba@simba-Aspire-4752:~$ ls -l /dev/tty
crw-rw-rw- 1 root tty 5, 0 Jan 29 09:46 /dev/tty

The leading c indicates that the file type is character device. The 5, 0 in the middle is the device number: the major device number is 5 and the minor device number is 0. The major number identifies a device driver in the kernel, and the minor number identifies one of the devices managed by that driver. The kernel uses the device number to find the right driver for operations on the device. For a regular file this column shows the file size, but for a device file it shows the device number. A device file has no file size attribute because it stores no data on disk: reading and writing a device file reads and writes the device, not data on disk.
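
The same device number can be read from a program with stat. The sketch below assumes Linux with glibc, where the major and minor macros come from sys/sysmacros.h; device_numbers is a hypothetical helper name.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major() and minor() on Linux with glibc */

/* Hypothetical helper: stores the major and minor device numbers of the
 * device file at path. Returns 0 on success, -1 if path cannot be
 * examined or is not a character or block device. */
int device_numbers(const char *path, unsigned int *major_out,
                   unsigned int *minor_out)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    if (!S_ISCHR(st.st_mode) && !S_ISBLK(st.st_mode))
        return -1;

    /* For device files, st_rdev holds the device number. */
    *major_out = major(st.st_rdev);
    *minor_out = minor(st.st_rdev);
    printf("%s: major %u, minor %u\n", path, *major_out, *minor_out);
    return 0;
}
```

For /dev/tty this should report the 5, 0 pair shown by ls -l above; for a regular file or directory the helper returns -1 because there is no device number to report.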

The file descriptor returned by open is always the smallest descriptor the process is not currently using. Because file descriptors 0, 1, and 2 are already open when the program starts, the first call to open usually returns 3, and the next call returns 4. This rule can be used to open a new file on standard input, standard output, or standard error, which implements redirection. For example, if a program first calls close on file descriptor 1 and then calls open on a regular file, open returns file descriptor 1 (provided descriptor 0 is still open). Standard output is then no longer the terminal but the regular file, and output from printf is written to that file instead of the screen.
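
The close-then-open trick can be sketched as follows. The helper print_to_file is hypothetical; in addition to what the text describes, it saves the old standard output with dup and restores it with dup2 afterwards so that the rest of the program is not affected.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: temporarily redirects standard output to path,
 * prints text with printf, then restores the previous standard output.
 * Returns 0 on success, -1 on error. */
int print_to_file(const char *path, const char *text)
{
    fflush(stdout);                     /* flush output meant for the old target */
    int saved = dup(STDOUT_FILENO);     /* keep the old target for restoring */
    if (saved < 0)
        return -1;

    close(STDOUT_FILENO);               /* descriptor 1 is now unused */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    int ok = (fd == STDOUT_FILENO);     /* smallest unused descriptor is 1 */

    if (ok) {
        printf("%s", text);             /* goes to the file, not the terminal */
        fflush(stdout);
    } else if (fd >= 0) {
        close(fd);
    }

    dup2(saved, STDOUT_FILENO);         /* put the original target back on 1 */
    close(saved);
    return ok ? 0 : -1;
}
```

The fflush calls matter because printf buffers its output in the FILE structure; without them, text could end up in whichever file happens to be on descriptor 1 when the buffer is finally written.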

Note that when a process terminates, the kernel calls close on every file descriptor the process has not closed, so even if the user program never calls close, all the files it opened are closed automatically when it terminates. A program that runs for years (such as a network server), however, must remember to close the file descriptors it no longer needs. Otherwise, as more and more files are opened, they tie up a large number of file descriptors and system resources.

------------------------------------------------------------------------------------------------------------------------------------

Traditional UNIX has both vnodes and inodes, and the vnode data structure contains the inode information. Linux does not use vnodes; it uses a generic inode instead. "Although the implementation is different, it is the same concept."
A vnode ("virtual node") exists only while the file is open. An inode locates the file on disk; its information is stored on disk and is read into memory when the file is opened.


The inode structure records a lot of information about a file, such as its length, the device it resides on, its physical location on that device, and its timestamps (last access, modification, and status change). Notably, it does not contain the file name. The names of all files and subdirectories in a directory are stored in the data blocks of the directory, that is, in the directory blocks. For a regular file, the file's data is stored in data blocks. A file normally occupies one inode but usually several data blocks. A data block is the smallest unit of storage in the partition, fixed when the file system is formatted; its size is 2^n times the sector size.
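
That names live in the directory rather than in the inode is visible through readdir, which returns each directory entry as a pair of name and inode number. A minimal sketch, with list_directory_entries as a hypothetical helper name:

```c
#include <dirent.h>
#include <stdio.h>

/* Hypothetical helper: prints the inode number and name of every entry
 * in dirpath. Returns the number of entries, or -1 if the directory
 * cannot be opened. */
int list_directory_entries(const char *dirpath)
{
    DIR *dir = opendir(dirpath);
    if (dir == NULL) {
        perror(dirpath);
        return -1;
    }

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* The directory supplies the name; the inode holds the rest. */
        printf("%10llu  %s\n",
               (unsigned long long)entry->d_ino, entry->d_name);
        count++;
    }

    closedir(dir);
    return count;
}
```

The output resembles that of ls -i: one inode number per name, with no further metadata, because everything else has to be fetched from the inode itself (for example with stat).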


  • A symbolic (soft) link is a separate file with its own inode, and its attributes mark it clearly as a symbolic link. It does not share data blocks with the file it links to: its content is the path of that file, and the kernel follows the path when the link is accessed. The -i option of ls shows inode numbers when listing a directory or file. For example, in the output of ls -li lsfile.sh, the first value is the inode number.
  • Several file names can share one inode; this is the principle of the hard link. The inode has a link counter, which increases by 1 each time another name is linked to the inode and decreases by 1 each time a name is removed. Only when the counter reaches 0 (and no process still has the file open) is the file actually deleted from disk. The counter is the second column in the output of ls -l.
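
Both kinds of link can be compared by inode number from a program. The sketch below is an illustration only: make_links and struct link_report are invented names, and the helper deletes any existing files at the three paths it is given, so it should only be pointed at scratch paths.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical result type for this example. */
struct link_report {
    ino_t target_ino;      /* inode of the original name */
    ino_t hard_ino;        /* inode reached through the hard link */
    ino_t soft_ino;        /* inode of the symbolic link itself */
    nlink_t target_nlink;  /* link counter of the shared inode */
};

/* Hypothetical helper: creates an empty file at target, a hard link to
 * it at hard and a symbolic link to it at soft, then fills in *out.
 * Existing files at these paths are removed. Returns 0 or -1. */
int make_links(const char *target, const char *hard, const char *soft,
               struct link_report *out)
{
    struct stat st_target, st_hard, st_soft;

    /* Remove leftovers from an earlier run; errors are ignored. */
    unlink(soft);
    unlink(hard);
    unlink(target);

    int fd = open(target, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    close(fd);

    if (link(target, hard) != 0 || symlink(target, soft) != 0)
        return -1;

    /* lstat examines the symbolic link itself instead of following it. */
    if (stat(target, &st_target) != 0 || stat(hard, &st_hard) != 0 ||
        lstat(soft, &st_soft) != 0)
        return -1;

    out->target_ino = st_target.st_ino;
    out->hard_ino = st_hard.st_ino;
    out->soft_ino = st_soft.st_ino;
    out->target_nlink = st_target.st_nlink;
    printf("target %llu, hard link %llu, symbolic link %llu, link count %lu\n",
           (unsigned long long)out->target_ino,
           (unsigned long long)out->hard_ino,
           (unsigned long long)out->soft_ino,
           (unsigned long)out->target_nlink);
    return 0;
}
```

The hard link reports the same inode number as the original name and raises the link count to 2, while the symbolic link has an inode number of its own.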

Reference: Linux C Programming one-stop learning (open source books)

Http://daoluan.net/blog/inode?vnodeand dentry/
