File System Mounting Prerequisites

Last Update: 2018-12-03 Source: Internet

Author: User


Like every traditional UNIX system, Linux has a system root filesystem: the kernel mounts it directly during the boot phase, and it holds the system initialization scripts and the most essential system programs.

Other filesystems are mounted either by the initialization scripts or directly by users, on directories of filesystems that are already mounted. Being a tree of directories, every filesystem has its own root directory. The directory on which a filesystem is mounted is called the mount point. A mounted filesystem is a child of the filesystem that owns the mount point directory. For example, the /proc virtual filesystem is a child of the system's root filesystem (and the system's root filesystem is the parent of /proc). The root directory of a mounted filesystem hides whatever the mount point directory of the parent filesystem previously contained, together with the whole subtree of the parent filesystem below the mount point.

The root directory of a filesystem can differ from the root directory of a process: the root directory of a process is the directory that corresponds to the "/" pathname. By default, the process's root directory coincides with the root directory of the system's root filesystem (more precisely, with the root directory of the root filesystem in the process's namespace; this matters and is discussed below), but a process can change its root directory by invoking the chroot() system call.

1. Namespaces

In a traditional Unix system there is only one tree of mounted filesystems: starting from the system's root filesystem, each process can potentially access every file in a mounted filesystem by specifying the proper pathname. Linux 2.6 is more refined in this respect: every process may have its own tree of mounted filesystems, called the namespace of the process.

Usually most processes share the same namespace, namely the tree of mounted filesystems that is rooted at the system's root filesystem and used by the init process. However, a process gets a new namespace if it is created by the clone() system call with the CLONE_NEWNS flag set. The new namespace is then inherited by the children of that process, as long as the parent does not create them with the CLONE_NEWNS flag.

When a process mounts or unmounts a filesystem, it modifies only its own namespace. The change is therefore visible to all processes that share that namespace, and only to them. A process can even change the root filesystem of its namespace by using the Linux-specific pivot_root() system call.

The namespace of a process is represented by a namespace structure, pointed to by the namespace field of the process descriptor:
struct namespace {
    atomic_t count;            /* reference counter (number of processes sharing the namespace) */
    struct vfsmount *root;     /* mounted filesystem descriptor for the root directory of the namespace */
    struct list_head list;     /* head of the list of all mounted filesystem descriptors (vfsmount) */
    wait_queue_head_t poll;    /* wait queue of the namespace */
    int event;                 /* event counter */
};

The list field is the head of a doubly linked circular list collecting all mounted filesystems that belong to the namespace. The root field points to the mounted filesystem that is the root of the namespace's tree of mounted filesystems. As we will see next, mounted filesystems are described by vfsmount structures.

2. Data Structures for File System Mounting

In most traditional Unix-like kernels, each filesystem can be mounted only once. Suppose that an ext2 filesystem stored on the /dev/fd0 floppy disk is mounted on the /second directory by issuing the command:
mount -t ext2 /dev/fd0 /second

Until the filesystem is unmounted with the umount command, every other mount command acting on /dev/fd0 fails. Linux is different: the same filesystem can be mounted several times. Of course, if a filesystem is mounted n times, its root directory can be reached through n mount points. Although the same filesystem can be accessed through different mount points, it is still a single filesystem. Therefore there is only one superblock object, no matter how many times the filesystem is mounted.

Mounted filesystems form a hierarchy: the mount point of a filesystem may be a directory of a second filesystem, which in turn is mounted on a third filesystem, and so on.

It is also possible to stack several mounts on a single mount point. Each new mount on the same mount point hides the previously mounted filesystem, although processes that already use files and directories under the old mount can continue to do so. When the topmost mount is removed, the next lower mount becomes visible again.

As you can imagine, keeping track of mounted filesystems can quickly become a nightmare. For each mount operation, the kernel must keep in memory the mount point and the mount flags, as well as the relationships between the filesystem being mounted and the other mounted filesystems. A data structure that holds this information is therefore needed to keep the relationships straight. The information is stored in a mounted filesystem descriptor, a data structure of type vfsmount:

struct vfsmount {
    struct list_head mnt_hash;          /* pointers for the hash table list */
    struct vfsmount *mnt_parent;        /* parent filesystem, on which this filesystem is mounted */
    struct dentry *mnt_mountpoint;      /* dentry of the mount point directory of this filesystem */
    struct dentry *mnt_root;            /* dentry of the root directory of this filesystem */
    struct super_block *mnt_sb;         /* superblock object of this filesystem */
    struct list_head mnt_mounts;        /* head of the list of all filesystem descriptors mounted on this filesystem */
    struct list_head mnt_child;         /* pointers for the mnt_mounts list of the parent filesystem */
    atomic_t mnt_count;                 /* usage counter (incremented to prevent the filesystem from being unmounted) */
    int mnt_flags;                      /* flags */
    int mnt_expiry_mark;                /* true if the filesystem is marked as expired
                                         * (if this flag is set and nobody uses the filesystem,
                                         * it can be unmounted automatically) */
    char *mnt_devname;                  /* device file name, for example /dev/dsk/hda1 */
    struct list_head mnt_list;          /* pointers for the namespace's list of mounted filesystem descriptors */
    struct list_head mnt_expire;        /* pointers for the list of expired filesystems */
    struct list_head mnt_share;         /* circular list of shared mounts */
    struct list_head mnt_slave_list;    /* list of slave mounts */
    struct list_head mnt_slave;         /* slave list entry */
    struct vfsmount *mnt_master;        /* slave is on master->mnt_slave_list */
    struct namespace *mnt_namespace;    /* namespace of the process that mounted the filesystem */
    int mnt_pinned;
};

The vfsmount data structures are kept in several doubly linked circular lists:

- A hash table indexed by the address of the vfsmount descriptor of the parent filesystem and the address of the dentry object of the mount point directory. The hash table is stored in the mount_hashtable array; as with the inode and dentry hash tables, its size depends on the amount of RAM in the system. Each entry of the table is the head of a doubly linked circular list holding all descriptors that have the same hash value. The mnt_hash field of the descriptor contains the pointers to the adjacent elements in this list.

- For each namespace, a doubly linked circular list of all mounted filesystem descriptors that belong to the namespace. The list field of the namespace structure stores the head of the list, and the mnt_list field of the vfsmount descriptor contains the pointers to the adjacent elements in the list.

- For each mounted filesystem, a doubly linked circular list of all child filesystems mounted on it. The head of each list is stored in the mnt_mounts field of the mounted filesystem descriptor, and the mnt_child field of each child descriptor stores the pointers to the adjacent elements in the list.

The vfsmount_lock spin lock protects the lists of mounted filesystem objects from concurrent access.

The mnt_flags field of the descriptor stores several flags that specify how certain kinds of files in the mounted filesystem are handled. These flags can be set through options of the mount command:
MNT_NOSUID: forbid setuid and setgid flags in the mounted filesystem
MNT_NODEV: forbid access to device files in the mounted filesystem
MNT_NOEXEC: disallow program execution in the mounted filesystem
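
As a rough illustration of how mount options could end up in mnt_flags, the sketch below translates request bits of the MS_* kind into per-mount bits of the MNT_* kind. It is a userspace model, not kernel code: the SKETCH_ constants are local copies whose values are assumed to match Linux 2.6, and the helper name ms_to_mnt_flags is hypothetical.

```c
#include <assert.h>

/* Local copies of the request bits; values assumed to match Linux 2.6. */
enum {
    SKETCH_MS_RDONLY = 1,
    SKETCH_MS_NOSUID = 2,
    SKETCH_MS_NODEV  = 4,
    SKETCH_MS_NOEXEC = 8
};

/* Local copies of the per-mount descriptor bits (also an assumption). */
enum {
    SKETCH_MNT_NOSUID = 0x01,
    SKETCH_MNT_NODEV  = 0x02,
    SKETCH_MNT_NOEXEC = 0x04
};

/* Hypothetical helper: keep only the three bits that the text lists for
 * mnt_flags and translate them; every other request bit is ignored here. */
int ms_to_mnt_flags(unsigned long ms_flags)
{
    int mnt_flags = 0;

    if (ms_flags & SKETCH_MS_NOSUID)
        mnt_flags |= SKETCH_MNT_NOSUID;
    if (ms_flags & SKETCH_MS_NODEV)
        mnt_flags |= SKETCH_MNT_NODEV;
    if (ms_flags & SKETCH_MS_NOEXEC)
        mnt_flags |= SKETCH_MNT_NOEXEC;

    /* The result must never contain bits outside the three known ones. */
    assert((mnt_flags & ~(SKETCH_MNT_NOSUID | SKETCH_MNT_NODEV | SKETCH_MNT_NOEXEC)) == 0);
    return mnt_flags;
}
```

In this model a request such as SKETCH_MS_NOSUID | SKETCH_MS_NOEXEC yields SKETCH_MNT_NOSUID | SKETCH_MNT_NOEXEC, while a read-only request alone yields 0 because it is not one of the three per-mount flags listed above.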

Next, we look at several common functions that handle mounted filesystem descriptors.

(1) Allocate and initialize a mounted filesystem descriptor.

struct vfsmount *alloc_vfsmnt(const char *name)
{
    struct vfsmount *mnt = kmem_cache_alloc(mnt_cache, GFP_KERNEL);
    if (mnt) {
        memset(mnt, 0, sizeof(struct vfsmount));
        atomic_set(&mnt->mnt_count, 1);
        INIT_LIST_HEAD(&mnt->mnt_hash);
        INIT_LIST_HEAD(&mnt->mnt_child);
        INIT_LIST_HEAD(&mnt->mnt_mounts);
        INIT_LIST_HEAD(&mnt->mnt_list);
        INIT_LIST_HEAD(&mnt->mnt_expire);
        INIT_LIST_HEAD(&mnt->mnt_share);
        INIT_LIST_HEAD(&mnt->mnt_slave_list);
        INIT_LIST_HEAD(&mnt->mnt_slave);
        if (name) {
            int size = strlen(name) + 1;
            char *newname = kmalloc(size, GFP_KERNEL);
            if (newname) {
                memcpy(newname, name, size);
                mnt->mnt_devname = newname;
            }
        }
    }
    return mnt;
}

(2) Release the mounted filesystem descriptor pointed to by mnt.

void free_vfsmnt(struct vfsmount *mnt)
{
    kfree(mnt->mnt_devname);
    kmem_cache_free(mnt_cache, mnt);
}

(3) Look up a descriptor in the hash table and return its address (the mnt parameter is a mounted filesystem, dentry is the mount point of a child filesystem mounted on it, and the function returns the vfsmount of that child filesystem).

struct vfsmount *lookup_mnt(struct vfsmount *mnt, struct dentry *dentry)
{
    struct vfsmount *child_mnt;
    spin_lock(&vfsmount_lock);
    if ((child_mnt = __lookup_mnt(mnt, dentry, 1)))
        mntget(child_mnt);
    spin_unlock(&vfsmount_lock);
    return child_mnt;
}

struct vfsmount *__lookup_mnt(struct vfsmount *mnt, struct dentry *dentry,
                              int dir)
{
    struct list_head *head = mount_hashtable + hash(mnt, dentry);
    struct list_head *tmp = head;
    struct vfsmount *p, *found = NULL;

    for (;;) {
        /* walk the hash chain forward (dir != 0) or backward (dir == 0) */
        tmp = dir ? tmp->next : tmp->prev;
        p = NULL;
        if (tmp == head)
            break;
        p = list_entry(tmp, struct vfsmount, mnt_hash);
        if (p->mnt_parent == mnt && p->mnt_mountpoint == dentry) {
            found = p;
            break;
        }
    }
    return found;
}

static inline struct vfsmount *mntget(struct vfsmount *mnt)
{
    if (mnt)
        atomic_inc(&mnt->mnt_count);
    return mnt;
}
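
The direction argument of the lookup can be tried out in userspace. The following is a minimal sketch, assuming a single hash chain and simplified types: struct node, struct mount_model and the model_ functions are hypothetical stand-ins for list_head, vfsmount and the kernel list helpers, and the mount point is compared by address just as the kernel compares dentry pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct list_head. */
struct node { struct node *next, *prev; };

/* Stand-in for struct vfsmount, reduced to what the lookup needs. */
struct mount_model {
    struct node hash;                  /* link in the hash chain */
    const struct mount_model *parent;  /* plays the role of mnt_parent */
    const void *mountpoint;            /* plays the role of mnt_mountpoint */
    int id;
};

void model_list_init(struct node *head)
{
    head->next = head->prev = head;
}

void model_list_add_tail(struct node *n, struct node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Walk the circular chain forward (dir != 0) or backward (dir == 0) and
 * return the first entry met whose parent and mount point both match. */
struct mount_model *model_lookup(struct node *head,
                                 const struct mount_model *parent,
                                 const void *mountpoint, int dir)
{
    struct node *tmp = head;

    assert(head != NULL);
    for (;;) {
        struct mount_model *p;

        tmp = dir ? tmp->next : tmp->prev;
        if (tmp == head)
            return NULL;               /* back at the head: no match */
        p = (struct mount_model *)((char *)tmp - offsetof(struct mount_model, hash));
        if (p->parent == parent && p->mountpoint == mountpoint)
            return p;
    }
}
```

If two entries with the same parent and mount point are appended to the chain, a forward walk returns the one that was added first and a backward walk returns the one that was added last.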

3. Copying Parameters into the Kernel

Having described these data structures, we can now look at what the kernel does when it mounts a filesystem. We start with the case in which a filesystem is mounted on top of an already mounted filesystem (we call this new filesystem an "ordinary" one).

An ordinary filesystem is mounted with the mount() system call. Its service routine, sys_mount(), acts on the following parameters:
- The pathname of the device file containing the filesystem, or NULL if none is required (for example, when the filesystem to be mounted is network-based)
- The pathname of the directory on which the filesystem will be mounted (the mount point)
- The filesystem type, which must be the name of a registered filesystem (see the previous blog post)
- The mount flags:

MS_RDONLY: files can only be read
MS_NOSUID: disable setuid and setgid flags
MS_NODEV: disable access to device files
MS_NOEXEC: disallow program execution
MS_SYNCHRONOUS: write operations on files and directories are performed immediately
MS_REMOUNT: remount a filesystem with changed mount flags
MS_MANDLOCK: mandatory locking is allowed
MS_DIRSYNC: write operations on directories are performed immediately
MS_NOATIME: do not update file access times
MS_NODIRATIME: do not update directory access times
MS_BIND: create a "bind mount", which makes a file or directory visible at another point of the system directory tree (the --bind option of the mount command)
MS_MOVE: atomically move a mounted filesystem to another mount point (the --move option of the mount command)
MS_REC: recursively create "bind mounts" for a directory subtree
MS_VERBOSE: generate kernel messages when a mount error occurs

- A pointer to a filesystem-dependent data structure (may be NULL)

asmlinkage long sys_mount(char __user *dev_name, char __user *dir_name,
                          char __user *type, unsigned long flags,
                          void __user *data)
{
    int retval;
    unsigned long data_page;
    unsigned long type_page;
    unsigned long dev_page;
    char *dir_page;

    retval = copy_mount_options(type, &type_page);
    if (retval < 0)
        return retval;

    dir_page = getname(dir_name);
    retval = PTR_ERR(dir_page);
    if (IS_ERR(dir_page))
        goto out1;

    retval = copy_mount_options(dev_name, &dev_page);
    if (retval < 0)
        goto out2;

    retval = copy_mount_options(data, &data_page);
    if (retval < 0)
        goto out3;

    lock_kernel();
    retval = do_mount((char *)dev_page, dir_page, (char *)type_page,
                      flags, (void *)data_page);
    unlock_kernel();
    free_page(data_page);

out3:
    free_page(dev_page);
out2:
    putname(dir_page);
out1:
    free_page(type_page);
    return retval;
}

The sys_mount() function first copies the parameter values into temporary kernel buffers, which are referenced by local variables on the kernel stack. Let us look at how the data is copied. Many system call service routines contain this step, so it is described in detail only once here; the same reasoning applies elsewhere.

The first call is retval = copy_mount_options(type, &type_page). This function copies up to one page of data from the user-mode address data into a newly allocated kernel page and stores the page's address in the variable that where points to; it returns -ENOMEM if no page can be allocated and -EFAULT if nothing can be copied:
int copy_mount_options(const void __user *data, unsigned long *where)
{
    int i;
    unsigned long page;
    unsigned long size;

    *where = 0;
    if (!data)
        return 0;

    if (!(page = __get_free_page(GFP_KERNEL)))
        return -ENOMEM;

    /* TASK_SIZE   = PAGE_OFFSET
     * PAGE_OFFSET = 0xc0000000
     * PAGE_SIZE   = 1 << 12 (4096 bytes)
     */
    /* copy_from_user cannot cross TASK_SIZE! */
    size = TASK_SIZE - (unsigned long)data;
    if (size > PAGE_SIZE)
        size = PAGE_SIZE;

    i = size - exact_copy_from_user((void *)page, data, size);
    if (!i) {
        free_page(page);
        return -EFAULT;
    }
    if (i != PAGE_SIZE)
        memset((char *)page + i, 0, PAGE_SIZE - i);
    *where = page;
    return 0;
}

The function first obtains a free page whose start address is page. It then computes how many bytes lie between data and TASK_SIZE, the upper bound of the user-mode linear addresses (0xc0000000); for a user-mode address, TASK_SIZE - data is greater than 0. The result is capped at PAGE_SIZE, so the copy covers at most one page and never crosses TASK_SIZE. Next, exact_copy_from_user(page, data, size) copies size bytes from the address data to page:
static long exact_copy_from_user(void *to, const void __user *from,
                                 unsigned long n)
{
    char *t = to;
    const char __user *f = from;
    char c;

    if (!access_ok(VERIFY_READ, from, n))
        return n;

    while (n) {
        if (__get_user(c, f)) {
            memset(t, 0, n);
            break;
        }
        *t++ = c;
        f++;
        n--;
    }
    /* number of bytes that could NOT be copied */
    return n;
}
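
The size clamp and the "bytes not copied" return convention can be modelled outside the kernel. This is only a sketch under stated assumptions: the MODEL_ constants repeat the values quoted in the listing above, a failing __get_user() is simulated by a readable parameter giving the number of bytes that can be read before the "fault", and both function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MODEL_TASK_SIZE 0xc0000000UL   /* value quoted in the listing above */
#define MODEL_PAGE_SIZE 4096UL         /* 1 << 12 */

/* Number of bytes the copy would attempt, starting at user address `data`:
 * the distance to MODEL_TASK_SIZE, capped at one page. */
unsigned long model_clamp_copy_size(unsigned long data)
{
    unsigned long size = MODEL_TASK_SIZE - data;

    if (size > MODEL_PAGE_SIZE)
        size = MODEL_PAGE_SIZE;
    return size;
}

/* Byte-by-byte copy that stops at the first unreadable byte, zero-fills the
 * rest of the destination, and returns how many bytes were NOT copied. */
unsigned long model_exact_copy(char *to, const char *from,
                               unsigned long n, unsigned long readable)
{
    unsigned long copied = 0;

    while (n) {
        if (copied >= readable) {      /* stands in for __get_user() failing */
            memset(to, 0, n);
            break;
        }
        *to++ = *from++;
        copied++;
        n--;
    }
    assert(n == 0 || copied == readable);
    return n;
}
```

With an 8-byte request of which only 3 bytes are readable, the model returns 5, so the caller's i = size - 5 = 3 bytes were copied and the remaining bytes of the destination are zero.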

exact_copy_from_user() first validates the arguments it was passed, using the access_ok macro:
#define access_ok(type, addr, size) (likely(__range_ok(addr, size) == 0))
#define __range_ok(addr, size) ({ \
    unsigned long flag, sum; \
    __chk_user_ptr(addr); \
    asm("addl %3,%1 ; sbbl %0,%0; cmpl %1,%4; sbbl $0,%0" \
        : "=&r" (flag), "=r" (sum) \
        : "1" (addr), "g" ((int)(size)), "rm" (current_thread_info()->addr_limit.seg)); \
    flag; })

__chk_user_ptr is an empty function on 80x86 systems. The access_ok macro checks the address range from addr to addr + size - 1. It first verifies whether addr + size (held in the internal variable sum), the highest address to be checked, is greater than 2^32 - 1; because GCC represents unsigned long integers and pointers with 32 bits on this architecture, this is equivalent to checking for overflow. The macro also checks whether addr + size exceeds the value stored in the addr_limit.seg field of the thread_info structure of the current process. Normally addr_limit.seg is PAGE_OFFSET, that is, 0xc0000000; for kernel threads it is 0xffffffff.
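
The two checks can be written out in portable C to make the logic easier to follow. This is a minimal sketch, assuming 32-bit addresses modelled with uint32_t; model_range_ok is a hypothetical name, and like __range_ok it returns 0 when the range is acceptable.

```c
#include <assert.h>
#include <stdint.h>

/* Return 0 if the range starting at addr with the given size passes both
 * checks, nonzero otherwise (same "0 means ok" convention as __range_ok). */
int model_range_ok(uint32_t addr, uint32_t size, uint32_t limit)
{
    uint32_t sum = addr + size;   /* unsigned arithmetic wraps modulo 2^32 */

    if (sum < addr)               /* wrapped around: addr + size overflowed */
        return 1;
    if (sum > limit)              /* range reaches past the address limit */
        return 1;

    assert(sum >= addr && sum <= limit);
    return 0;
}
```

For example, with a limit of 0xc0000000 a one-page range starting at 0xbffff000 passes, since its sum equals the limit exactly, while a range one byte longer fails.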

Back in exact_copy_from_user(), the __get_user macro copies the byte at the user-mode address f into the local kernel variable c. It performs no address check of its own, because the range has already been verified by access_ok:
#define __get_user(x, ptr) \
    __get_user_nocheck((x), (ptr), sizeof(*(ptr)))
#define __get_user_nocheck(x, ptr, size) \
({ \
    long __gu_err; \
    unsigned long __gu_val; \
    __get_user_size(__gu_val, (ptr), (size), __gu_err, -EFAULT); \
    (x) = (__typeof__(*(ptr)))__gu_val; \
    __gu_err; \
})
#define __get_user_size(x, ptr, size, retval, errret) \
do { \
    retval = 0; \
    __chk_user_ptr(ptr); \
    switch (size) { \
    case 1: __get_user_asm(x, ptr, retval, "b", "b", "=q", errret); break; /* copy a byte */ \
    case 2: __get_user_asm(x, ptr, retval, "w", "w", "=r", errret); break; /* copy a word (2 bytes) */ \
    case 4: __get_user_asm(x, ptr, retval, "l", "", "=r", errret); break;  /* copy a doubleword (4 bytes) */ \
    default: (x) = __get_user_bad(); \
    } \
} while (0)
#define __get_user_asm(x, addr, err, itype, rtype, ltype, errret) \
    __asm__ __volatile__( \
        "1:  mov"itype" %2,%"rtype"1\n" \
        "2:\n" \
        ".section .fixup,\"ax\"\n" \
        "3:  movl %3,%0\n" \
        "    xor"itype" %"rtype"1,%"rtype"1\n" \
        "    jmp 2b\n" \
        ".previous\n" \
        ".section __ex_table,\"a\"\n" \
        "    .align 4\n" \
        "    .long 1b,3b\n" \
        ".previous" \
        : "=r" (err), ltype (x) \
        : "m" (__m(addr)), "i" (errret), "0" (err))

The byte is then stored at the location indicated by the to parameter, and the loop advances:
    *t++ = c;
    f++;
    n--;

Back in sys_mount(), the next step copies the name of the mount point directory into the kernel buffer referenced by dir_page. The getname() function is important in its own right, because it is used whenever a file name has to be copied from user space:
char *getname(const char __user *filename)
{
    char *tmp, *result;

    result = ERR_PTR(-ENOMEM);
    tmp = __getname();
    if (tmp) {
        int retval = do_getname(filename, tmp);

        result = tmp;
        if (retval < 0) {
            __putname(tmp);
            result = ERR_PTR(retval);
        }
    }
    audit_getname(result);
    return result;
}
#define __getname() kmem_cache_alloc(names_cachep, SLAB_KERNEL)
static int do_getname(const char __user *filename, char *page)
{
    int retval;
    unsigned long len = PATH_MAX;

    if (!segment_eq(get_fs(), KERNEL_DS)) {
        if ((unsigned long)filename >= TASK_SIZE)
            return -EFAULT;
        if (TASK_SIZE - (unsigned long)filename < PATH_MAX)
            len = TASK_SIZE - (unsigned long)filename;
    }

    retval = strncpy_from_user(page, filename, len);
    if (retval > 0) {
        if (retval < len)
            return 0;
        return -ENAMETOOLONG;
    } else if (!retval)
        retval = -ENOENT;
    return retval;
}

The getname() function calls do_getname() to do the actual copying of the file name. The core of do_getname() is strncpy_from_user():
long strncpy_from_user(char *dst, const char __user *src, long count)
{
    long res = -EFAULT;
    if (access_ok(VERIFY_READ, src, 1))
        __do_strncpy_from_user(dst, src, count, res);
    return res;
}
#define __do_strncpy_from_user(dst, src, count, res) \
do { \
    int __d0, __d1, __d2; \
    might_sleep(); \
    __asm__ __volatile__( \
        "    testl %1,%1\n" \
        "    jz 2f\n" \
        "0:  lodsb\n" \
        "    stosb\n" \
        "    testb %%al,%%al\n" \
        "    jz 1f\n" \
        "    decl %1\n" \
        "    jnz 0b\n" \
        "1:  subl %1,%0\n" \
        "2:\n" \
        ".section .fixup,\"ax\"\n" \
        "3:  movl %5,%0\n" \
        "    jmp 2b\n" \
        ".previous\n" \
        ".section __ex_table,\"a\"\n" \
        "    .align 4\n" \
        "    .long 0b,3b\n" \
        ".previous" \
        : "=d" (res), "=c" (count), "=&a" (__d0), "=&S" (__d1), \
          "=&D" (__d2) \
        : "i" (-EFAULT), "0" (count), "1" (count), "3" (src), "4" (dst) \
        : "memory"); \
} while (0)
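
The return-value conventions of these two functions are easy to misread in the assembly, so here is a plain C model of them. It is a sketch under assumptions: model_strncpy and model_do_getname are hypothetical userspace stand-ins, ordinary pointers replace user-mode addresses, and page faults are not simulated.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Model of the copy loop: copy at most count bytes, stopping after the
 * terminating NUL. Returns the string length if the NUL was found within
 * count bytes, otherwise count. */
long model_strncpy(char *dst, const char *src, long count)
{
    long i;

    for (i = 0; i < count; i++) {
        dst[i] = src[i];
        if (src[i] == '\0')
            return i;
    }
    return count;
}

/* Model of how the copy result is turned into a status code:
 * 0 on success, -ENOENT for an empty name, -ENAMETOOLONG if no NUL
 * was found within len bytes. */
int model_do_getname(const char *filename, char *page, long len)
{
    long retval = model_strncpy(page, filename, len);

    assert(retval >= 0 && retval <= len);
    if (retval > 0) {
        if (retval < len)
            return 0;
        return -ENAMETOOLONG;
    }
    return -ENOENT;   /* retval == 0: empty name (or len == 0) */
}
```

Copying "/mnt" into a 16-byte buffer returns 0, an empty string returns -ENOENT, and a name that does not fit into the given length returns -ENAMETOOLONG.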

The remaining calls in sys_mount(), copy_mount_options(dev_name, &dev_page) and copy_mount_options(data, &data_page), work the same way. sys_mount() then acquires the big kernel lock with lock_kernel() and invokes do_mount(). Once do_mount() returns, the service routine releases the big kernel lock and frees the temporary kernel buffers.

For how do_mount() carries out the mount itself, please refer to the blog ......

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
