A file system is the operating system subsystem that manages persistent data, providing data storage and access. For server developers, the focus is on file systems in the UNIX (Linux) environment: the relationship between partitions and disks, disk space, file types and permission control, file links, and related topics.
Introduction to disk structure:
The file system is built on top of the physical disk, so a brief look at disk structure makes the concepts behind the file system easier to understand. The description here is fairly sketchy; interested readers can refer to the article "hard disk read and write principle", which is very detailed. Let's start with the structure of a single disk:
The relevant terms are explained as follows:
- Platter (disk surface): the storage media, stacked in parallel
- Head: each head corresponds to one disk surface and is responsible for reading and writing data on that surface
- Track: each disk surface is divided into concentric circles around its center; each circle is called a track
- Cylinder: the three-dimensional shape formed by the tracks at the same position on all platters is called a cylinder
- Sector: managing a disk in units of tracks is still too coarse, so each track is further divided into multiple sectors
- Arm: moves the heads across the platters
Logically, the disk is divided into sectors, and the sector is the most basic unit of disk access. A sector is addressed by three components: cylinder, head, and sector (CHS). Earlier operating systems needed to know all the parameters of a disk to read and write data. Disks have since become more complex, and sectors may have different sizes. Disks therefore provide a higher-level interface: the capacity of the disk is presented as a set of logical blocks, and the operating system operates directly on these logical blocks.
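The mapping from a CHS address to a logical block number can be sketched with the conventional formula, in which cylinders and heads count from 0 and sectors count from 1. The geometry used below (16 heads, 63 sectors per track) is a hypothetical example, not something taken from a real disk in this article.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Map a (cylinder, head, sector) triple to a logical block address.

    Cylinders and heads count from 0; sectors count from 1 by convention.
    """
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


# Hypothetical geometry: 16 heads, 63 sectors per track.
print(chs_to_lba(0, 0, 1, 16, 63))  # first sector of the disk -> 0
print(chs_to_lba(1, 0, 1, 16, 63))  # first sector of cylinder 1 -> 1008
```

With this mapping the operating system only needs the block number; the disk translates it back to a physical location.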
Reading and writing a disk may take three steps, so read and write performance depends on the speed of these three parts:
- Seek: the head moves to the specified track; very slow
- Rotational delay (rotation): waiting for the specified sector to rotate under the head; slow
- Data transfer (transfer): the actual transfer between disk and memory; relatively fast
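The three parts above can be added up in a rough model. The numbers used here (9 ms average seek, 7200 rpm, 100 MB/s transfer rate, 4 KB request) are assumptions chosen only for illustration; real disks vary.

```python
def access_time_ms(seek_ms, rpm, transfer_mb_per_s, request_kb):
    """Rough time for one disk request: seek + rotational delay + transfer."""
    rotation_ms = 0.5 * 60_000 / rpm  # on average, half a revolution
    transfer_ms = request_kb / 1024 / transfer_mb_per_s * 1000
    return seek_ms + rotation_ms + transfer_ms


# Assumed parameters: 9 ms seek, 7200 rpm, 100 MB/s, one 4 KB request.
total = access_time_ms(seek_ms=9, rpm=7200, transfer_mb_per_s=100, request_kb=4)
print(f"{total:.2f} ms")
```

Under these assumptions the transfer itself is well under a tenth of a millisecond, while seek and rotation together take more than 13 ms.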
Therefore, the main way to improve performance is to minimize the cost of seek (milliseconds) and rotation. Since seek is the most time-consuming step, overall performance can be improved by minimizing the number of seeks when several read and write requests are pending at the same time. Modern disks do their own scheduling, because the disk knows its own parameters better than the OS does. In a Linux environment, you can use fdisk to view information about a disk, including the total capacity of the disk, the size and number of sectors, the correspondence between the disk and its partitions, and so on. For example (the result from a virtual machine):
What is a file system:
The first thing to know is what a file is: a collection of data items with a symbolic name, consisting of a sequence of bytes. It is the basic unit of data in the file system. When we talk about a file we usually focus only on its contents, but the file itself also carries a lot of information, such as file name, type, location, size, permission control, creation time, modification time, and so on. This information is the metadata of the file; in UNIX systems, the metadata is recorded in the inode.
The file system is the subsystem that manages files in the operating system and provides storage of and access to file data. Specifically, the file system should provide the following functions:
- Allocate disk space for files: manage the blocks allocated to each file, including their location and order, manage all free space, and allocate space for new files with some allocation algorithm
- Manage the collection of files: organize file information in a structured way so that a file can be found by name and its contents read
- Data reliability and security: security is provided by several layers of protection, such as access rights control; reliability is ensured by persisting files, guarding against system crashes and errors, and by redundancy
Operating systems generally use a hierarchical file system: files are organized into directories, and directories can contain subdirectories, so the entire file system forms a tree. With this hierarchy, file name resolution (mapping a logical name to a physical resource) is relatively simple: follow the full path of the file and traverse the directories until the target file is found.
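The traversal described above can be sketched with a toy in-memory tree. The directory layout and the `resolve` helper are hypothetical and only illustrate the component-by-component lookup; a real file system walks on-disk directory structures instead of Python dicts.

```python
# Hypothetical directory tree: dicts are directories, strings are file contents.
ROOT = {
    "home": {"alice": {"notes.txt": "hello"}},
    "etc": {"fstab": "# mounts"},
}


def resolve(path, root=ROOT):
    """Walk the tree one path component at a time, starting at the root."""
    node = root
    for name in path.strip("/").split("/"):
        if not name:  # empty component, e.g. when resolving "/"
            continue
        if not isinstance(node, dict) or name not in node:
            raise FileNotFoundError(path)
        node = node[name]
    return node


print(resolve("/home/alice/notes.txt"))  # hello
```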
Because of differences in history and usage environments, there are many disk file systems, such as FAT, NTFS, EXT2, and EXT3, as well as network and distributed file systems, such as NFS and SMB. Different file systems have different functions, security requirements, and optimization goals. The virtual file system was introduced to deal with these many file systems while providing a consistent interface to the upper layers. A partition (that is, a file system) does not map one-to-one to a disk: one disk can hold multiple partitions, and multiple disks can back a single partition, for example with LVM or RAID.
The operating system maintains an open file table for each process, with a unique identifier (file descriptor) for each open file. The open file table is indexed by file descriptor, and each entry maintains the state of the open file and related information, such as:
- File pointer: the most recent read/write position, which should be familiar to anyone who has used C. Although a file can be opened by multiple processes at the same time, each process maintains its own file pointer
- File open count: the number of times the file is currently open; when the last process closes the file, it is removed from the open file table
- Disk information for the file: the operating system caches part of the accessed file content in memory for faster access, and information about the cached data is also recorded
- Access rights: the access mode of each process, chosen when the file is opened: read-only, write-only, or read-write
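The independent file pointer can be observed from Python with low-level descriptors. As a simplification, the sketch below opens the same file twice inside one process instead of using two processes; each `os.open` call gets its own offset in the same way. The file name `demo.txt` is made up.

```python
import os
import tempfile


def independent_offsets():
    """Open one file twice and show that each descriptor keeps its own offset."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")
        with open(path, "wb") as f:
            f.write(b"abcdefgh")
        fd1 = os.open(path, os.O_RDONLY)
        fd2 = os.open(path, os.O_RDONLY)
        try:
            first = os.read(fd1, 3)                # advances fd1 only
            pos1 = os.lseek(fd1, 0, os.SEEK_CUR)   # current offset of fd1
            pos2 = os.lseek(fd2, 0, os.SEEK_CUR)   # fd2 has not moved
            second = os.read(fd2, 3)               # still starts at byte 0
        finally:
            os.close(fd1)
            os.close(fd2)
    return first, pos1, pos2, second


print(independent_offsets())  # (b'abc', 3, 0, b'abc')
```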
From the user's perspective, files are persistent data structures. Inside the operating system, a file is a collection of data blocks, and the operating system reads and writes file data in units of blocks: even to read a single field, a whole block (1 KB, 2 KB, or 4 KB in size) must be read. The block is a logical storage unit and the sector is a physical storage unit, so the block size is not the same as the sector size; in general, several sectors make up one block.
File allocation:
File allocation decides which blocks are assigned to a file to store its data, including the location and order of the data blocks. The block size is very important here. First, most files are relatively small, so blocks cannot be too large and small files need good support. Second, some files are very large, so large files must be supported and accessed efficiently (if the block is too small, a large file needs a great many blocks). An allocation strategy is measured by two metrics: storage efficiency (external fragmentation) and read/write performance (access speed). There are three allocation methods:
- Contiguous allocation: the advantage is high access efficiency, with efficient sequential and random access. The downside is that it introduces external fragmentation and does not handle file growth well.
- Chained allocation: the advantage is that files are easy to create, grow, and shrink, with no external fragmentation. The disadvantage is that access is inefficient and true random access is not possible.
- Indexed allocation: the advantage is that files are easy to create, grow, and shrink, with no external fragmentation, and direct access is supported. The disadvantage is that the cost of storing the index is relatively high when the file is very small, and handling large files is also a problem to consider.
In Unix, multi-level index allocation is used.
The file header contains a total of 13 pointers (note: this description comes from a Tsinghua University course; according to Inode_pointer_structure, current operating systems use 15 block pointers), of which:
- The first 10 pointers point directly to data blocks
- The 11th pointer points to a single-level index block
- The 12th pointer points to a two-level index block
- The 13th pointer points to a three-level index block
Multi-level index allocation raises the limit on file size, allows data blocks to be allocated dynamically, and makes files easy to extend. Small files use the direct pointers; larger files can also be handled, though somewhat less efficiently.
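The size limit implied by this layout can be worked out with a short calculation. The sketch assumes 4-byte block pointers and the 10 direct pointers described above; both values, and the 4 KB block size in the example, are assumptions for illustration.

```python
def max_file_size(block_size, pointer_size=4, direct=10):
    """Upper bound on file size with `direct` direct pointers plus one
    single-, one double-, and one triple-indirect pointer."""
    per_block = block_size // pointer_size  # pointers that fit in one index block
    blocks = direct + per_block + per_block ** 2 + per_block ** 3
    return blocks * block_size


size = max_file_size(4096)
print(f"{size / 1024 ** 4:.2f} TiB")  # about 4 TiB with 4 KB blocks
```

Almost all of the capacity comes from the three-level index block, which is why small files never pay for it.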
File sharing and access control
In multi-user operating systems (such as UNIX), file sharing is necessary. First, some files are the same for every user (or are the same by default), so there is no need for each user to keep a copy. Second, users may need to work together on the same document. Sharing requires mutual exclusion, just as synchronization between processes and threads does. For example, if one process is reading a file while another is writing it, how should they be coordinated? The operating system does not solve the consistency problem of multiple processes sharing a file; application programs need to handle it themselves.
Access control has two levels: first, which users can access the file (access in the general sense, not limited to read, write, and execute), and second, how they can access it. A more general abstraction is the set of permissions a user has on a file: what the file (object) is, who the user (subject) is, and what the operation (action) is. From the file's point of view, a list of users and their permissions on that file is maintained, that is, an access control list (ACL); from the user's point of view, a list of files and the user's permissions on them is maintained, that is, capabilities.
Each file (object) has an ACL, so when there are many users and a file is shared among many of them, the ACL becomes very large. UNIX therefore introduces the concept of a group: users within the same group share the same permissions on the same file.
File System performance:
Due to physical device constraints, disk reads and writes are far slower than memory accesses. To improve read and write efficiency, the disk optimizes according to its own parameters, for example by trying to place a file (including its metadata) in adjacent sectors. The file system also performs some optimizations, such as:
- File buffer cache: caching part of the file contents in memory can greatly improve performance, because memory reads and writes are much faster. The operating system manages the cache centrally, all processes share it, and an algorithm such as LRU is used when a replacement is required.
- Delayed write (write-behind): write operations are not flushed to disk immediately. Instead, a queue of uncommitted blocks is maintained and flushed to disk periodically. The drawback is reduced reliability, which may lead to inconsistent data.
- Read-ahead: if the file system predicts that the process will read the next block, it reads that block into the cache in advance, which is very effective when files are read sequentially.
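The LRU replacement mentioned for the buffer cache can be sketched as follows. This is a toy model, not how any particular kernel implements its cache; the `BlockCache` class and the `read_block` callback standing in for a disk read are hypothetical names.

```python
from collections import OrderedDict


class BlockCache:
    """Tiny LRU cache keyed by block number."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block  # function that fetches a block from "disk"
        self.blocks = OrderedDict()   # oldest entry first, newest last
        self.misses = 0

    def get(self, block_no):
        if block_no in self.blocks:
            self.blocks.move_to_end(block_no)  # mark as most recently used
            return self.blocks[block_no]
        self.misses += 1
        data = self.read_block(block_no)
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict the least recently used block
        return data


cache = BlockCache(capacity=2, read_block=lambda n: f"block-{n}")
for n in (1, 2, 1, 3):  # block 2 is the least recently used when 3 arrives
    cache.get(n)
print(list(cache.blocks), cache.misses)  # [1, 3] 3
```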
RAID: Data Reliability
RAID stands for Redundant Array of Independent Disks. The basic idea is to combine a number of relatively inexpensive hard drives into a disk array whose performance matches or even exceeds that of an expensive, large-capacity drive. Depending on the RAID level chosen, an array has one or more of the following benefits over a single hard drive: enhanced data integration, enhanced fault tolerance, and increased throughput or capacity. To the computer, the disk array looks like a single hard disk or logical storage unit. The wiki has a comparison table of the various RAID levels.
Files under Linux
The file system in the Linux environment is organized in three layers, with the following basic data structures:
(1) File volume control block: superblock
One per file system; loaded into memory when the file system is mounted. It records details of the file system, such as the number of blocks, block size, free blocks, various counts, and pointers.
(2) Directory entry: dentry (directory entry)
One per directory entry; loaded into memory when a path is traversed. It contains information such as the file control block it points to, the parent directory, and subdirectories.
(3) File control block: inode
One per file; loaded into memory when the file is accessed. It records the metadata of the file.
In the Linux environment, you can use df (disk free) to view the remaining space of the file systems on the system and where they are mounted, and then use dumpe2fs xxx (where xxx is the file system name) to view file system information, including superblock information, total and available blocks and inodes, the size of each block and inode, and so on.
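The space figures that df reports can also be read from a program. The sketch below uses Python's `shutil.disk_usage`; the `free_space_report` helper is a made-up name, and it covers only total, used, and free space, not the superblock details that dumpe2fs prints.

```python
import shutil


def free_space_report(path="/"):
    """Summarize the space of the file system that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    gib = 1024 ** 3
    return (
        f"{path}: total {usage.total / gib:.1f} GiB, "
        f"used {usage.used / gib:.1f} GiB, "
        f"free {usage.free / gib:.1f} GiB"
    )


print(free_space_report("."))
```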
Metadata of the file
You can use ls -l to view part of a file's metadata, but the stat command gives more detail, for example:
The more important information includes:
File type: shown here as "regular file"; the first character of the Access field also indicates the file type.
Inode: index of the metadata location, unique to each file
Links: how many file names point to this file; this comes up again in the section on file links
Access: access permissions
atime: access time, when the file was last accessed
mtime: modification time, when the file content was last modified
ctime: change time, when the file attributes (metadata) were last modified
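The same fields can be read programmatically with `os.stat`. This is a minimal sketch; the `describe` helper and the keys of the dictionary it returns are made-up names chosen to mirror the list above.

```python
import os
import stat
import time


def describe(path):
    """Collect the metadata fields discussed above for one path."""
    st = os.stat(path)
    return {
        "type_and_access": stat.filemode(st.st_mode),  # string in ls -l style
        "inode": st.st_ino,
        "links": st.st_nlink,
        "size": st.st_size,
        "atime": time.ctime(st.st_atime),
        "mtime": time.ctime(st.st_mtime),
        "ctime": time.ctime(st.st_ctime),
    }


for key, value in describe(".").items():
    print(f"{key}: {value}")
```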
As shown above, the three time attributes of a file may hold different values, and they can affect one another as follows:
- A change to atime does not affect mtime or ctime
- A change to ctime does not affect mtime or atime
- A change to mtime also changes ctime, because the file size and modification time are part of the file's metadata. In practice atime often changes as well, since editing a file usually involves reading it first.
The type of a file is also very important to the file system, and other parts of the operating system need to know it. Under Windows, the file type is distinguished by the file name suffix; under UNIX, the type of a file is recorded with a magic number. The file types under Linux can be seen in the first character of the ls -l output, with the following types:
- Regular file: the first character is -
- Directory: the first character is d
- Pipe: the first character is p; used for interprocess communication (IPC)
- Socket: the first character is s; a UNIX domain socket, also used for interprocess communication
- Symbolic link: the first character is l; explained in detail later
- Block device: the first character is b; for example a disk device
- Character device: the first character is c; for example the mouse, keyboard, and other serial devices
Regular files are further divided into plain text files, binary files, and so on, which can be checked with the file command.
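The type letters in the list above correspond to predicates in Python's `stat` module. The sketch below maps a path to its letter; the `CHECKS` table and the `file_type` helper are made-up names.

```python
import os
import stat

# (predicate, letter shown by ls -l, description)
CHECKS = [
    (stat.S_ISREG, "-", "regular file"),
    (stat.S_ISDIR, "d", "directory"),
    (stat.S_ISFIFO, "p", "pipe"),
    (stat.S_ISSOCK, "s", "socket"),
    (stat.S_ISLNK, "l", "symbolic link"),
    (stat.S_ISBLK, "b", "block device"),
    (stat.S_ISCHR, "c", "character device"),
]


def file_type(path):
    """Return the ls -l type letter and a description for `path`."""
    mode = os.lstat(path).st_mode  # lstat: do not follow symbolic links
    for check, letter, name in CHECKS:
        if check(mode):
            return letter, name
    return "?", "unknown"


print(file_type("."))  # ('d', 'directory')
```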
Access Rights control
We all know that a user can have three kinds of permission on a file: r (read), w (write), and x (execute). For regular files these are easy to understand, but what do they mean for a directory, especially write and execute?
r: permission to read the directory listing, for example with ls
w: permission to change the directory listing, such as creating files and directories, deleting files and directories, renaming, and so on
x: whether the user can enter the directory and make it the working directory, for example with cd
This is why a cd into a directory can fail: the x permission is missing.
You may also see s, t, and so on in the permission bits; for their specific meaning, refer to "Linux Special permissions: SUID, SGID, SBIT". A brief summary:
- When a file has SUID permission, a user who executes this binary program temporarily runs with the permissions of the file owner
- When a directory has SGID permission, files that a user creates in this directory get the same group as the directory
- When a directory has SBIT permission, files that a user creates in this directory can be deleted only by that user and root
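The three special bits can be tested with constants from Python's `stat` module. The sketch uses hand-written mode values rather than real files, and the `special_bits` helper is a made-up name.

```python
import stat


def special_bits(mode):
    """List which of SUID, SGID, and SBIT are set in a mode value."""
    names = []
    if mode & stat.S_ISUID:
        names.append("SUID")
    if mode & stat.S_ISGID:
        names.append("SGID")
    if mode & stat.S_ISVTX:  # the sticky bit
        names.append("SBIT")
    return names


print(special_bits(0o4755))                   # ['SUID']
print(stat.filemode(stat.S_IFREG | 0o4755))   # -rwsr-xr-x
print(stat.filemode(stat.S_IFDIR | 0o1777))   # drwxrwxrwt
```

The s and t letters replace the x in the owner, group, or other position of the permission string.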
atime & noatime
The atime mentioned above is the last access time of the file. I tested this myself by reading a file with cat, but the timestamp was the same before and after, which means that accessing the file did not change its atime. I looked it up online and found the following explanation:
Readers who follow topics such as performance and optimization know that mounting a file system with noatime under Linux can significantly improve file system performance. By default, the Linux ext2/ext3 file system records timestamps when a file is accessed, created, or modified, such as the file creation time, the last modification time, and the last access time. Because a running system accesses a large number of files, reducing some of this work (such as the number of timestamp updates) significantly improves disk I/O efficiency and file system performance. Linux provides the noatime option to disable recording of the last access timestamp.
The Wikipedia article on stat also describes this problem in a section called "Criticism_of_atime", which lists the following options for updating atime:
- strictatime (formerly atime, and formerly the default; strictatime as of 2.6.30): always update atime, which conforms to the behavior defined by POSIX
- relatime ("relative atime", introduced in 2.6.20 and the default as of 2.6.30): only update atime under certain circumstances: if the previous atime is older than the mtime or ctime, or the previous atime is over 24 hours in the past
- nodiratime: never update atime of directories, but do update atime of other files
- noatime: never update atime of any file or directory; implies nodiratime; highest performance, but least compatible
- lazytime: update atime according to specific circumstances laid out below
Here noatime means that the atime of files and directories is never updated. It is configured in /etc/fstab. For example:
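As an illustration, the sketch below uses a hypothetical /etc/fstab entry (the device /dev/sda1 and the mount point /data are made up) and checks in Python whether noatime is among its mount options. The helper names are also made up.

```python
# Hypothetical /etc/fstab entry; the device and mount point are made up.
FSTAB_LINE = "/dev/sda1  /data  ext3  defaults,noatime  0  2"


def mount_options(fstab_line):
    """Return the option list (fourth field) of one fstab entry."""
    fields = fstab_line.split()
    return fields[3].split(",")


def atime_disabled(fstab_line):
    """True if the entry mounts the file system with noatime."""
    return "noatime" in mount_options(fstab_line)


print(mount_options(FSTAB_LINE))   # ['defaults', 'noatime']
print(atime_disabled(FSTAB_LINE))  # True
```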
Hard links and Symbolic Links:
Hard Links: Multiple file entries refer to the same file (same inode)
Symbolic link: the link file and the linked file have independent description information (metadata); the link file simply contains the path of another file. This implements a file alias, similar to a "Shortcut" in Windows.
Linux uses the command ln src target to create a hard link; target then points to the same file as src, with the same inode (metadata). In other words, one more logical name points to the file. For example: ln server.log server_hardlink.log
Compared with the metadata shown above, the only thing that changes after creating a hard link is that the value of Links becomes 2, because server_hardlink.log now also points to the file. The advantage of hard links is safety against deletion: using rm on a file only reduces its link count by 1, and the file is actually deleted only when the link count drops to 0.
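The link count behaviour can be reproduced in Python, where `os.link` plays the role of ln and `os.remove` the role of rm. The sketch works in a temporary directory and reuses the file names from the text; the `hard_link_demo` function is a made-up name.

```python
import os
import tempfile


def hard_link_demo():
    """Create a hard link, then delete the original name."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "server.log")
        dst = os.path.join(tmp, "server_hardlink.log")
        with open(src, "w") as f:
            f.write("line 1\n")
        os.link(src, dst)  # like: ln server.log server_hardlink.log
        same_inode = os.stat(src).st_ino == os.stat(dst).st_ino
        links_after_link = os.stat(src).st_nlink
        os.remove(src)     # like rm: only drops one name
        with open(dst) as f:
            content = f.read()  # data is still reachable through the other name
        links_after_rm = os.stat(dst).st_nlink
    return same_inode, links_after_link, links_after_rm, content


print(hard_link_demo())  # (True, 2, 1, 'line 1\n')
```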
Linux uses the command ln -s src target to create a symbolic link; src and target then point to different inodes.
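For comparison, a symbolic link can be inspected with `os.symlink`, `os.lstat`, and `os.readlink`. The name `server_symlink.log` and the `symlink_demo` function are made up for this sketch, which assumes a platform where unprivileged symbolic links are allowed (such as Linux).

```python
import os
import tempfile


def symlink_demo():
    """Create a symbolic link, inspect it, then delete the target."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "server.log")
        link = os.path.join(tmp, "server_symlink.log")
        with open(src, "w") as f:
            f.write("line 1\n")
        os.symlink(src, link)  # like: ln -s <src> <link>
        different_inode = os.lstat(link).st_ino != os.stat(src).st_ino
        stores_path = os.readlink(link) == src  # the link only holds a path
        links = os.stat(src).st_nlink           # no extra hard link is added
        os.remove(src)
        dangling = not os.path.exists(link)     # exists() follows the link
    return different_inode, stores_path, links, dangling


print(symlink_demo())  # (True, True, 1, True)
```

Unlike a hard link, the symbolic link is left dangling once the target is removed.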
For more on hard links and symbolic links, the article "Linux soft links and hard links" is more detailed and clear.
Summary:
This article records some basic knowledge about file systems, along with related commands for the Linux file system. It mainly covers points that were not clear to the author before and is not comprehensive; interested readers can look at "VBird's Linux Private Kitchen".
File system and Linux-related knowledge points