Source: http://www.ibm.com/developerworks/cn/linux/l-linux-filesystem/
Basic file system architecture
The Linux file system architecture is an interesting example of abstracting complexity. By using a common set of API functions, Linux can support many kinds of file systems on many kinds of storage devices. For example, the read function call reads a given number of bytes from a specified file descriptor. The read function is unaware of the file system type, such as ext3 or NFS. It is also unaware of the storage medium on which the file system resides, such as AT Attachment Packet Interface (ATAPI) disks, Serial-Attached SCSI (SAS) disks, or Serial Advanced Technology Attachment (SATA) disks. Yet when read is called on an open file, the data is returned as expected. This article explains how this mechanism is implemented and introduces the main structures of the Linux file system layer.
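To make the point concrete, here is a minimal Python sketch. Python's os module is a thin wrapper over the open, read, and close system calls, so it stands in for the C calls the article describes. The helper name read_bytes and the temporary file are assumptions made for illustration and do not come from the article.

```python
import os
import tempfile

def read_bytes(path, count):
    """Read up to `count` bytes from `path` using raw file-descriptor calls."""
    fd = os.open(path, os.O_RDONLY)   # open(2): returns a file descriptor
    try:
        return os.read(fd, count)     # read(2): no file system type is named
    finally:
        os.close(fd)                  # close(2)

# Demo on a temporary file; the caller never names the file system or device.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello, vfs")
    demo_path = tmp.name

first_five = read_bytes(demo_path, 5)
os.remove(demo_path)
```

The same three calls would work unchanged if the path lived on ext3, NFS, or any other mounted file system, which is exactly the abstraction described above.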
What is a file system?
Start with the most basic question: what is a file system? A file system is an organization of data and metadata on a storage device. Because the definition is so broad, the code that supports it has to cope with a great deal of variety. As mentioned earlier, there are many kinds of file systems and media. With that much variation, it is no surprise that the Linux file system interface is implemented as a layered architecture that separates the user interface layer, the file system implementations, and the drivers that manipulate the storage devices.
Mounting
In Linux, associating a file system with a storage device is called mounting. The mount command attaches a file system to the current file system hierarchy (root). When you mount, you provide a file system type, a file system, and a mount point.
To illustrate the capabilities of the Linux file system layer (and the use of mount), create a file system in a file within the current file system. First, use the dd command to create a file of a given size, copying from /dev/zero as the source. The result is a file initialized with zeros, as shown in Listing 1.
Listing 1. Creating an initialized file
$ dd if=/dev/zero of=file.img bs=1k count=10000
10000+0 records in
10000+0 records out
$
You now have a file named file.img that is about 10MB in size. Use the losetup command to associate a loop device with this file, so that it looks like a block device rather than a regular file in the file system:
$ losetup /dev/loop0 file.img
$
The file now appears as a block device (represented by /dev/loop0). Next, use mke2fs to create a file system on this device. The command creates a new ext2 file system of the specified size, as shown in Listing 2.
Listing 2. Creating an ext2 file system with the loop device
$ mke2fs -c /dev/loop0 10000
mke2fs 1.35 (28-Feb-2004)
max_blocks 1024000, rsv_groups = 1250, rsv_gdb = 39
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
2512 inodes, 10000 blocks
500 blocks (5.00%) reserved for the super user
...
$
Use the mount command to mount the file.img file, represented by the loop device (/dev/loop0), at the mount point /mnt/point1. Note that the file system type is specified as ext2. Once mounted, you can treat the mount point as a new file system, for example by running ls on it, as shown in Listing 3.
Listing 3. Creating a mount point and mounting the file system through the loop device
$ mkdir /mnt/point1
$ mount -t ext2 /dev/loop0 /mnt/point1
$ ls /mnt/point1
lost+found
$
As shown in Listing 4, you can continue this process: create a new file within the file system you just mounted, associate it with a loop device, and create another file system on it.
Listing 4. Creating a new loop file system within a loop file system
$ dd if=/dev/zero of=/mnt/point1/file.img bs=1k count=1000
$ losetup /dev/loop1 /mnt/point1/file.img
$ mke2fs -c /dev/loop1
$ mkdir /mnt/point2
$ mount -t ext2 /dev/loop1 /mnt/point2
$ ls /mnt/point2
$ ls /mnt/point1
file.img lost+found
$
This simple demonstration shows how powerful the Linux file system layer (and the loop device) can be. You can use the same approach to create an encrypted file system on a file with the loop device. This helps protect your data, because you can mount the file through the loop device temporarily, only when you need it.
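The first step of the demonstration, the dd call in Listing 1, can be sketched without root privileges. The Python below writes the same number of zero-filled 1 KB blocks and confirms the size: 10000 blocks of 1024 bytes is 10,240,000 bytes, or roughly 10MB. The function name create_zero_image is an assumption for illustration. The losetup, mke2fs, and mount steps need root and a real loop device, so they are not reproduced here.

```python
import os
import tempfile

BLOCK_SIZE = 1024      # corresponds to bs=1k
BLOCK_COUNT = 10000    # corresponds to count=10000

def create_zero_image(path, block_size=BLOCK_SIZE, count=BLOCK_COUNT):
    """Write `count` blocks of `block_size` zero bytes, like dd from /dev/zero."""
    zero_block = bytes(block_size)
    with open(path, "wb") as image:
        for _ in range(count):
            image.write(zero_block)
    return os.path.getsize(path)

with tempfile.TemporaryDirectory() as workdir:
    image_path = os.path.join(workdir, "file.img")
    image_size = create_zero_image(image_path)
    with open(image_path, "rb") as image:
        all_zero = not any(image.read())   # True when every byte is 0
```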
File System Architecture
Now that you have seen how a file system is constructed, look at the architecture of the Linux file system layer. This article examines the Linux file system from two perspectives. The first is a high-level architectural view. The second looks in depth at the major structures that implement the file system layer.
High-level architecture
Although most file system code lives in the kernel (except for the user-space file systems discussed later), the architecture in Figure 1 shows the relationships between the major file system components in both user space and the kernel.
Figure 1. Architecture of Linux file system components
User space contains the applications (in this example, the users of the file system) and the GNU C Library (glibc), which provides the user interface for the file system calls (open, read, write, and close). The system call interface acts as a switch, routing system calls from user space to the appropriate endpoints in kernel space.
The VFS is the primary interface to the underlying file systems. This component exports a set of interfaces and abstracts them over the individual file systems, whose behavior can differ greatly. Two caches exist for file system objects (inodes and dentries). Each caches the most recently used file system objects.
Each file system implementation (such as ext2, JFS, and so on) exports a common set of interfaces for use by the VFS. The buffer cache buffers requests between the file systems and the block devices they manipulate. For example, read and write requests to the underlying device drivers pass through the buffer cache. This allows requests to be cached there, which reduces the number of accesses to the physical device and speeds up access. The buffer cache is managed as a set of least recently used (LRU) lists. Note that you can use the sync command to flush the buffer cache to the storage media (forcing all unwritten data out to the device drivers and, from there, to the storage device).
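The idea of an LRU-managed, write-back cache can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the kernel's implementation: the class name BufferCache, the dictionary that stands in for a block device, and the counters are all invented for illustration.

```python
from collections import OrderedDict

class BufferCache:
    """Toy write-back block cache with least recently used eviction."""

    def __init__(self, device, capacity):
        self.device = device            # dict: block number -> bytes (stand-in disk)
        self.capacity = capacity
        self.blocks = OrderedDict()     # block number -> (data, dirty flag)
        self.device_reads = 0
        self.device_writes = 0

    def read(self, block_no):
        if block_no in self.blocks:
            self.blocks.move_to_end(block_no)      # mark as most recently used
            return self.blocks[block_no][0]
        self.device_reads += 1                     # cache miss: go to the device
        data = self.device.get(block_no, b"")
        self._insert(block_no, data, dirty=False)
        return data

    def write(self, block_no, data):
        self._insert(block_no, data, dirty=True)   # defer the device write

    def sync(self):
        """Flush every dirty block to the device, like the sync command."""
        for block_no, (data, dirty) in list(self.blocks.items()):
            if dirty:
                self._write_back(block_no, data)
                self.blocks[block_no] = (data, False)

    def _insert(self, block_no, data, dirty):
        self.blocks[block_no] = (data, dirty)
        self.blocks.move_to_end(block_no)
        while len(self.blocks) > self.capacity:
            old_no, (old_data, old_dirty) = self.blocks.popitem(last=False)
            if old_dirty:
                self._write_back(old_no, old_data)  # never drop unwritten data

    def _write_back(self, block_no, data):
        self.device[block_no] = data
        self.device_writes += 1

disk = {0: b"boot", 1: b"data"}
cache = BufferCache(disk, capacity=2)
cache.read(0)
cache.read(0)                 # second read is served from the cache
cache.write(1, b"new")        # held in the cache, not yet on the device
before_sync = disk[1]
cache.sync()
```

The second read of block 0 never reaches the device, and the write to block 1 only lands on the device when sync runs (or when the block is evicted).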
That is the high-level view of the VFS and the file system components. The next section looks at the major structures that implement this subsystem.
Major structures
Linux views all file systems through a common set of objects. These objects are the superblock, inode, dentry, and file. At the root of each file system is the superblock, which describes and maintains the state of the file system. Every object managed within a file system (file or directory) is represented in Linux as an inode. The inode contains all the metadata needed to manage the object in the file system, including the operations that can be performed on it. Another set of structures, called dentries, translates between names and inodes, and a directory cache keeps the most recently used dentries around. The dentry also maintains the relationships between directories and files, which supports traversing the file system. Finally, a VFS file represents an open file (it keeps state for the open file, such as the write offset).
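The split between the inode (one per object) and the file (one per open) is visible from user space. In this small Python sketch, the same path is opened twice: both descriptors refer to the same inode, but each open keeps its own offset. The temporary file and variable names are assumptions made for illustration.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
    path = tmp.name

fd_a = os.open(path, os.O_RDONLY)
fd_b = os.open(path, os.O_RDONLY)   # second, independent open of the same file

os.read(fd_a, 4)                               # advances only fd_a's offset
offset_a = os.lseek(fd_a, 0, os.SEEK_CUR)      # 4
offset_b = os.lseek(fd_b, 0, os.SEEK_CUR)      # 0

# Both descriptors resolve to the same underlying inode.
same_inode = os.fstat(fd_a).st_ino == os.fstat(fd_b).st_ino

os.close(fd_a)
os.close(fd_b)
os.remove(path)
```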
Virtual file system layer
The VFS acts as the root level of the file system interface. The VFS keeps track of the currently supported file systems and the currently mounted file systems.
File systems can be dynamically added to or removed from Linux through a set of registration functions. The kernel keeps a list of currently supported file systems, which can be viewed from user space through the /proc file system. This virtual file also shows the devices currently associated with these file systems. To add a new file system to Linux, call register_filesystem. Its argument is a reference to a file system structure (file_system_type) that defines the name of the file system, a set of attributes, and two superblock functions. A file system can also be unregistered.
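The registration idea can be modeled in user space with a small Python sketch. The classes FileSystemType and FileSystemRegistry are hypothetical stand-ins, not the kernel's file_system_type or its list handling, and the choice to reject duplicate names is an assumption of this model.

```python
class FileSystemType:
    """Illustrative stand-in for the kernel's file_system_type structure."""

    def __init__(self, name, requires_device=True):
        self.name = name
        self.requires_device = requires_device

class FileSystemRegistry:
    """Keeps the list of supported file systems, keyed by name."""

    def __init__(self):
        self._types = {}                 # name -> FileSystemType

    def register(self, fs_type):
        if fs_type.name in self._types:
            raise ValueError(f"{fs_type.name} is already registered")
        self._types[fs_type.name] = fs_type

    def unregister(self, name):
        if name not in self._types:
            raise KeyError(name)
        del self._types[name]

    def supported(self):
        return list(self._types)         # names in registration order

registry = FileSystemRegistry()
registry.register(FileSystemType("ext2"))
registry.register(FileSystemType("proc", requires_device=False))
registry.unregister("proc")
```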
Registering a new file system adds the file system and its related information to the file_systems list (see Figure 2 and linux/include/linux/mount.h). This list defines the file systems that can be supported. You can view the list by typing cat /proc/filesystems at the command line.
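The output of /proc/filesystems is simple enough to parse. Each line holds a file system name, preceded by nodev when the file system does not need a block device. The Python sketch below parses that format; the function name parse_filesystems and the sample text are assumptions for illustration, and the live read is skipped on hosts without /proc.

```python
import os

def parse_filesystems(text):
    """Return (name, needs_block_device) pairs from /proc/filesystems text."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "nodev" and len(fields) > 1:
            entries.append((fields[1], False))   # virtual: no backing device
        else:
            entries.append((fields[0], True))
    return entries

# Hand-written sample in the same layout as the real file.
sample = "nodev\tsysfs\nnodev\tproc\n\text2\n\text3\n"
parsed_sample = parse_filesystems(sample)

# On a Linux host, the live list can be read the same way.
if os.path.exists("/proc/filesystems"):
    with open("/proc/filesystems") as listing:
        live_entries = parse_filesystems(listing.read())
```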
Figure 2. File systems registered with the kernel
Another structure maintained in the VFS is the list of mounted file systems (see Figure 3). It provides the file systems that are currently mounted (see linux/include/linux/fs.h). Each entry links to the superblock structure, discussed next.
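From user space, the mounted file systems can be inspected through /proc/mounts, a file the article does not mention but which exposes this list one mount per line (device, mount point, type, options, and two numeric fields). The Python sketch below parses that layout. The names Mount and parse_mounts and the sample text are assumptions for illustration, and the sketch does not decode escaped characters in mount points.

```python
from collections import namedtuple

Mount = namedtuple("Mount", "device mount_point fs_type options")

def parse_mounts(text):
    """Parse /proc/mounts style text into Mount records."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue                     # skip blank or malformed lines
        mounts.append(Mount(fields[0], fields[1], fields[2], fields[3].split(",")))
    return mounts

# Hand-written sample that mirrors the loop mount from Listing 3.
sample = (
    "/dev/loop0 /mnt/point1 ext2 rw,relatime 0 0\n"
    "proc /proc proc rw,nosuid 0 0\n"
)
sample_mounts = parse_mounts(sample)
ext2_points = [m.mount_point for m in sample_mounts if m.fs_type == "ext2"]
```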
Figure 3. List of mounted file systems
Superblock
The superblock is a structure that represents a file system. It contains the information needed to manage the file system, including the file system name (such as ext2), the size and state of the file system, a reference to the block device, and metadata information (such as free lists). The superblock is typically stored on the storage medium, but it can be created in real time if one does not exist. You can find the superblock structure in ./linux/include/linux/fs.h (see Figure 4).
Figure 4. Superblock structure and inode operations
One important element of the superblock is the definition of the superblock operations. This structure defines the set of functions used to manage inodes within the file system. For example, inodes can be allocated with alloc_inode and deleted with destroy_inode. Inodes can be read and written with read_inode and write_inode, and the file system can be synchronized with sync_fs. You can find the super_operations structure in ./linux/include/linux/fs.h. Each file system provides its own inode methods, which implement these operations and present a common abstraction to the VFS layer.
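The pattern here is a table of functions that each file system fills in and the generic layer calls through. The Python sketch below models that indirection. It is not kernel code: the classes, the toy file system, and the helpers vfs_new_inode and vfs_drop_inode are hypothetical names chosen for illustration, and only alloc_inode and destroy_inode are modeled.

```python
class Inode:
    def __init__(self, number):
        self.number = number

class SuperOperations:
    """Per-file-system table of inode management functions (illustrative)."""

    def __init__(self, alloc_inode, destroy_inode):
        self.alloc_inode = alloc_inode
        self.destroy_inode = destroy_inode

class SuperBlock:
    def __init__(self, fs_name, s_op):
        self.fs_name = fs_name
        self.s_op = s_op              # operations supplied by the file system
        self.live_inodes = {}
        self.next_number = 1

# A toy file system's own implementations of the two operations.
def toy_alloc_inode(sb):
    inode = Inode(sb.next_number)
    sb.next_number += 1
    sb.live_inodes[inode.number] = inode
    return inode

def toy_destroy_inode(sb, inode):
    del sb.live_inodes[inode.number]

# Generic layer: calls through the table without knowing the file system.
def vfs_new_inode(sb):
    return sb.s_op.alloc_inode(sb)

def vfs_drop_inode(sb, inode):
    sb.s_op.destroy_inode(sb, inode)

toy_sb = SuperBlock("toyfs", SuperOperations(toy_alloc_inode, toy_destroy_inode))
first = vfs_new_inode(toy_sb)
second = vfs_new_inode(toy_sb)
vfs_drop_inode(toy_sb, first)
```

Supporting another file system in this model means supplying a different SuperOperations table; the two generic helpers stay unchanged.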
Inode and dentry
The inode represents an object in the file system with a unique identifier. Each file system provides methods for translating a file name into a unique inode identifier and then into an inode reference. Figure 5 shows a portion of the inode structure along with two related structures. Note in particular inode_operations and file_operations. These structures describe the operations that can be performed on the inode: inode_operations defines the operations that act directly on the inode, and file_operations defines the methods related to files and directories (the standard system calls).
Figure 5. Inode structure and associated operations
The inode cache and the directory cache keep the most recently used inodes and dentries available. Note that for each inode in the inode cache, there is a corresponding dentry in the directory cache. You can find the inode and dentry structures in ./linux/include/linux/fs.h.
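The name-to-inode mapping can be observed from user space with a hard link, where two directory entries refer to a single inode. The Python sketch below assumes a temporary directory on a file system that supports hard links; the file names are invented for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "original.txt")
    alias = os.path.join(workdir, "alias.txt")

    with open(original, "w") as handle:
        handle.write("shared data")

    os.link(original, alias)              # second directory entry, same inode

    info_original = os.stat(original)
    info_alias = os.stat(alias)
    same_inode = (info_original.st_ino == info_alias.st_ino
                  and info_original.st_dev == info_alias.st_dev)
    link_count = info_original.st_nlink   # names that point at this inode

    with open(alias) as handle:
        alias_text = handle.read()
```

Both names resolve to the same inode number on the same device, so reading through either name returns the same data.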
Buffer cache
Apart from the individual file system implementations (which can be found in ./linux/fs), the bottom of the file system layer is the buffer cache. This component keeps track of read and write requests between the file system implementations and the physical devices (through the device drivers). For efficiency, Linux caches these requests so that not every request has to go to the physical device. The most recently used buffers (pages) are kept in the cache and can be provided quickly to the individual file systems.
Interesting file systems
This article does not explore the individual file systems available in Linux, but they deserve at least a brief mention. Linux supports a wide range of file systems, including older ones such as MINIX, MS-DOS, and ext2. Linux also supports newer journaling file systems such as ext3, JFS, and ReiserFS. In addition, Linux supports cryptographic file systems (such as CFS) and virtual file systems (such as /proc).
A final file system worth noting is Filesystem in Userspace (FUSE). It routes file system requests through the VFS back into user space. So if you are interested in creating your own file system, FUSE is a good place to start.
Conclusion
Although the file system implementation is far from simple, it is a good example of a scalable and extensible architecture. The file system architecture has evolved over many years and has successfully supported many different types of file systems and many types of target storage devices. Built on a plug-in based architecture with multiple levels of function indirection, the Linux file system will be worth watching as it continues to evolve.