Path lookup is one of the main operations of the VFS: given a file name, find its inode. It is also one of the more tedious parts of the VFS, mainly because symbolic links, file system mount points, and unusual paths such as ".", ".." and "//" all introduce complexity.
The nameidata Data Structure
The lookup involves many function calls, and nameidata plays an important role throughout them: 1. it passes parameters to the lookup functions; 2. it stores the lookup result.
struct nameidata {
	struct dentry	*dentry;
	struct vfsmount	*mnt;
	struct qstr	last;
	unsigned int	flags;
	int		last_type;
	unsigned	depth;
	char *saved_names[MAX_NESTED_LINKS + 1];

	/* Intent data */
	union {
		struct open_intent open;
	} intent;
};
@dentry: after the lookup completes, holds the dentry of the file that was found. @mnt: the vfsmount on which that dentry lives.
@last: the name (path component) being looked up, stored as a "quick string" (struct qstr), which carries the string's length and a hash value in addition to the characters themselves.
@depth: the current symbolic-link nesting depth.
@saved_names: while symbolic links are being followed, the name being walked keeps changing, so this array saves the link's path name at each nesting level.
The kernel has many path lookup functions. Here we use path_lookup as the example; the kernel version is 2.6.24.
path_lookup
int fastcall path_lookup(const char *name, unsigned int flags,
			struct nameidata *nd)
{
	return do_path_lookup(AT_FDCWD, name, flags, nd);
}
It takes three parameters: @name is the file path name (absolute or relative); @flags are the lookup flags; @nd contains no useful information yet and is used to return the lookup result.
do_path_lookup
1119 static int fastcall do_path_lookup(int dfd, const char *name,
1120 				unsigned int flags, struct nameidata *nd)
1121 {
1122 	int retval = 0;
1123 	int fput_needed;
1124 	struct file *file;
1125 	struct fs_struct *fs = current->fs;
1126 
1127 	nd->last_type = LAST_ROOT; /* if there are only slashes... */
1128 	nd->flags = flags;
1129 	nd->depth = 0;
1130 
1131 	if (*name=='/') {
1132 		read_lock(&fs->lock);
1133 		if (fs->altroot && !(nd->flags & LOOKUP_NOALT)) {
1134 			nd->mnt = mntget(fs->altrootmnt);
1135 			nd->dentry = dget(fs->altroot);
1136 			read_unlock(&fs->lock);
1137 			if (__emul_lookup_dentry(name,nd))
1138 				goto out; /* found in altroot */
1139 			read_lock(&fs->lock);
1140 		}
1141 		nd->mnt = mntget(fs->rootmnt);
1142 		nd->dentry = dget(fs->root);
1143 		read_unlock(&fs->lock);
1144 	} else if (dfd == AT_FDCWD) {
1145 		read_lock(&fs->lock);
1146 		nd->mnt = mntget(fs->pwdmnt);
1147 		nd->dentry = dget(fs->pwd);
1148 		read_unlock(&fs->lock);
1149 	} else {
1150 		struct dentry *dentry;
1151 
1152 		file = fget_light(dfd, &fput_needed);
1153 		retval = -EBADF;
1154 		if (!file)
1155 			goto out_fail;
1157 		dentry = file->f_path.dentry;
1158 
1159 		retval = -ENOTDIR;
1160 		if (!S_ISDIR(dentry->d_inode->i_mode))
1161 			goto fput_fail;
1162 
1163 		retval = file_permission(file, MAY_EXEC);
1164 		if (retval)
1165 			goto fput_fail;
1166 
1167 		nd->mnt = mntget(file->f_path.mnt);
1168 		nd->dentry = dget(dentry);
1169 
1170 		fput_light(file, fput_needed);
1171 	}
1172 
1173 	retval = path_walk(name, nd);
1174 out:
1175 	if (unlikely(!retval && !audit_dummy_context() && nd->dentry &&
1176 				nd->dentry->d_inode))
1177 		audit_inode(name, nd->dentry);
1178 out_fail:
1179 	return retval;
1180 
1181 fput_fail:
1182 	fput_light(file, fput_needed);
1183 	goto out_fail;
1184 }
This function is a bit long, but its logic is clear: it prepares for the call to path_walk. On entry to do_path_lookup, @nd contains no useful information; by the time path_walk is called, @nd holds the starting point of the lookup.
So lines 1127 to 1171 set up the starting point of the lookup. There are three cases:
1. Lines 1131-1143: the file name is an absolute path, so the lookup starts at the current process's root directory (fs->root), after first trying the alternate root (altroot) if one is set.
2. Lines 1144-1148: the path is relative and @dfd is AT_FDCWD, so the lookup starts from the current working directory.
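3. Lines 1149-1171: the path is relative and the first parameter @dfd is a directory file descriptor, so the lookup starts from that directory, after checking that it really is a directory and that we have search (MAY_EXEC) permission on it.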
Line 1173: with everything ready, call path_walk to start the lookup.
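The choice of starting point can be modeled in user space. This is a hypothetical sketch: choose_start, enum start_point and SKETCH_AT_FDCWD are illustrative names, not kernel symbols, and the function only reports which of the three branches of do_path_lookup would run; it takes no references and checks no permissions.

```c
/* Hypothetical stand-in for the AT_FDCWD sentinel, defined locally so the
 * sketch does not depend on any particular header. */
#define SKETCH_AT_FDCWD (-100)

/* Which directory a lookup starts from (hypothetical enum). */
enum start_point {
	START_ROOT,  /* absolute path: process root (case 1) */
	START_CWD,   /* relative path, dfd == AT_FDCWD: cwd (case 2) */
	START_DIRFD  /* relative path, dfd is a directory fd (case 3) */
};

/* Mirrors the if / else if / else chain at lines 1131, 1144 and 1149. */
static enum start_point choose_start(const char *name, int dfd)
{
	if (*name == '/')
		return START_ROOT;   /* dfd is ignored for absolute paths */
	if (dfd == SKETCH_AT_FDCWD)
		return START_CWD;
	return START_DIRFD;      /* kernel then checks S_ISDIR and MAY_EXEC */
}
```

This is the same rule openat(2) documents for its dirfd argument: an absolute pathname ignores dirfd, and a relative one is resolved against the current working directory when dirfd is AT_FDCWD.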
path_walk
1042 static int fastcall path_walk(const char * name, struct nameidata *nd)
1043 {
1044 	current->total_link_count = 0;
1045 	return link_path_walk(name, nd);
1046 }
Symbolic links need special handling. Normally we follow them. Without symbolic links, the file system would be a perfect tree; symbolic links spoil that, and can even create loops. Therefore the number of symbolic links followed during one lookup is capped. In 2.6.24 the limit is hard-coded as 40.
Line 1044: before each new lookup, total_link_count is reset to 0.
link_path_walk
1018 static int fastcall link_path_walk(const char *name, struct nameidata *nd)
1019 {
1020 	struct nameidata save = *nd;
1021 	int result;
1022 
1023 	/* make sure the stuff we saved doesn't go away */
1024 	dget(save.dentry);
1025 	mntget(save.mnt);
1026 
1027 	result = __link_path_walk(name, nd);
1028 	if (result == -ESTALE) {
1029 		*nd = save;
1030 		dget(nd->dentry);
1031 		mntget(nd->mnt);
1032 		nd->flags |= LOOKUP_REVAL;
1033 		result = __link_path_walk(name, nd);
1034 	}
1035 
1036 	dput(save.dentry);
1037 	mntput(save.mnt);
1038 
1039 	return result;
1040 }
This function looks a little awkward, mainly because some lookups can fail with -ESTALE. In that case the lookup is restarted from the saved starting point with LOOKUP_REVAL set, so that cached dentries are revalidated instead of being trusted.
We ignore this special case here, so we only need to look at __link_path_walk.
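The save / try / restore-and-retry shape of link_path_walk can be sketched in plain C. Everything here is hypothetical: struct lookup_state stands in for struct nameidata, SKETCH_LOOKUP_REVAL for LOOKUP_REVAL, and walk() merely simulates a file system whose cached entry is stale. Reference counting (dget/mntget) is left out.

```c
#include <errno.h>

/* Hypothetical, simplified lookup state standing in for struct nameidata. */
struct lookup_state {
	unsigned int flags;
	int pos;            /* pretend progress marker that a walk modifies */
};

#define SKETCH_LOOKUP_REVAL 0x1u   /* hypothetical stand-in for LOOKUP_REVAL */

/* Hypothetical walker: fails with -ESTALE unless revalidation is forced,
 * simulating a stale cached entry on a network file system. */
static int walk(struct lookup_state *st, int cache_is_stale)
{
	st->pos = 42;       /* the walk changes the state as it goes */
	if (cache_is_stale && !(st->flags & SKETCH_LOOKUP_REVAL))
		return -ESTALE;
	return 0;
}

/* Same shape as link_path_walk: save the start, try once, and on -ESTALE
 * restore the saved start and retry with revalidation forced. */
static int walk_with_retry(struct lookup_state *st, int cache_is_stale, int *attempts)
{
	struct lookup_state save = *st;
	int result;

	*attempts = 1;
	result = walk(st, cache_is_stale);
	if (result == -ESTALE) {
		*st = save;                        /* like *nd = save */
		st->flags |= SKETCH_LOOKUP_REVAL;  /* like nd->flags |= LOOKUP_REVAL */
		++*attempts;
		result = walk(st, cache_is_stale);
	}
	return result;
}
```

Restoring the saved state matters because the first attempt has already moved the lookup forward; the retry must start from the original point.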
__link_path_walk
This function is close to 200 lines, more than I can keep in my head at once, so we go through it piece by piece.
826 static fastcall int __link_path_walk(const char * name, struct nameidata *nd)
827 {
828 	struct path next;
829 	struct inode *inode;
830 	int err;
831 	unsigned int lookup_flags = nd->flags;
832 
833 	while (*name=='/')
834 		name++;
835 	if (!*name)
836 		goto return_reval;
837 
838 	inode = nd->dentry->d_inode;
839 	if (nd->depth)
840 		lookup_flags = LOOKUP_FOLLOW | (nd->flags & LOOKUP_CONTINUE);
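Lines 833-836 first skip the '/' characters at the start of the path name. If nothing remains (the path consisted only of slashes), the walk jumps straight to return_reval.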
Next comes a large loop that processes the path components one at a time: each iteration splits the next component (the text between two slashes) off @name and resolves it. A flowchart of the code is provided.
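How @name is cut into components can be shown with a small user-space function. It is a hypothetical sketch (split_components is not a kernel function) that collapses runs of slashes as lines 833-834 do and records each component's offset and length; it ignores the kernel's special handling of the last component and of trailing slashes.

```c
#include <stddef.h>

/* Hypothetical helper: split `name` into components, storing up to `max`
 * start offsets and lengths. Returns the total number of components. */
static size_t split_components(const char *name, size_t starts[], size_t lens[], size_t max)
{
	const char *p = name;
	size_t count = 0;

	for (;;) {
		while (*p == '/')      /* skip leading and repeated slashes */
			p++;
		if (!*p)               /* nothing left: the path is finished */
			break;
		const char *start = p;
		while (*p && *p != '/')
			p++;               /* a component ends at '/' or NUL */
		if (count < max) {
			starts[count] = (size_t)(start - name);
			lens[count] = (size_t)(p - start);
		}
		count++;
	}
	return count;
}
```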
Permission check
848 		nd->flags |= LOOKUP_CONTINUE;
849 		err = exec_permission_lite(inode, nd);
850 		if (err == -EAGAIN)
851 			err = vfs_permission(nd, MAY_EXEC);
852 		if (err)
853 			break;
Calculate the path component hash
855 		this.name = name;
856 		c = *(const unsigned char *)name;
857 
858 		hash = init_name_hash();
859 		do {
860 			name++;
861 			hash = partial_name_hash(c, hash);
862 			c = *(const unsigned char *)name;
863 		} while (c && (c != '/'));
864 		this.len = name - (const char *) this.name;
865 		this.hash = end_name_hash(hash);
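The hash helpers used above are small enough to run in user space. The sketch below copies the 2.6.24 definitions of init_name_hash, partial_name_hash and end_name_hash (in the kernel they live in include/linux/dcache.h, and init_name_hash is a macro there; later kernels replaced this hash) and wraps the loop from lines 855-865 in a hypothetical hash_component function. struct qstr_sketch is an illustrative stand-in for struct qstr.

```c
/* User-space copies of the 2.6.24 name-hash helpers. */
static unsigned long init_name_hash(void) { return 0; }

/* Roughly 4 bits of mixing per character. */
static unsigned long partial_name_hash(unsigned long c, unsigned long prevhash)
{
	return (prevhash + (c << 4) + (c >> 4)) * 11;
}

static unsigned long end_name_hash(unsigned long hash)
{
	return (unsigned int) hash;
}

/* Hypothetical stand-in for struct qstr. */
struct qstr_sketch {
	const char *name;
	unsigned int len;
	unsigned int hash;
};

/* Same loop as lines 855-865. Like the kernel, it assumes *name is neither
 * '\0' nor '/'. Returns a pointer to the '/' or NUL that ended the component. */
static const char *hash_component(const char *name, struct qstr_sketch *qs)
{
	unsigned long hash;
	unsigned char c;

	qs->name = name;
	c = *(const unsigned char *)name;
	hash = init_name_hash();
	do {
		name++;
		hash = partial_name_hash(c, hash);
		c = *(const unsigned char *)name;
	} while (c && c != '/');
	qs->len = (unsigned int)(name - qs->name);
	qs->hash = (unsigned int)end_name_hash(hash);
	return name;
}
```

For "a", the single step gives (0 + (97 << 4) + (97 >> 4)) * 11 = 17138. The returned pointer lets the caller skip slashes and continue with the next component, as __link_path_walk does.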
Processing "." and ".."
874 		/*
875 		 * "." and ".." are special - ".." especially so because it has
876 		 * to be able to know about the current root directory and
877 		 * parent relationships.
878 		 */
879 		if (this.name[0] == '.') switch (this.len) {
880 			default:
881 				break;
882 			case 2:
883 				if (this.name[1] != '.')
884 					break;
885 				follow_dotdot(nd);
886 				inode = nd->dentry->d_inode;
887 				/* fallthrough */
888 			case 1:
889 				continue;
890 		}
"." denotes the current directory, so we simply skip this component and move on to the next one.
".." denotes the parent directory and is handled by calling follow_dotdot. follow_dotdot is not as simple as it looks, because it has to take mount points into account and, as the comment at lines 875-877 says, must know about the current root directory.
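The dispatch on "." and ".." can be modeled as follows. This is a hypothetical sketch: classify and walk_depth are illustrative names, the walk only tracks how deep it is, and it assumes the walk starts at the process root, so ".." at the root stays there (the real follow_dotdot likewise does not climb above the root). Mount points, which make the real follow_dotdot harder, are ignored.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification modeled on the switch at lines 879-890. */
enum component_kind { COMP_NORMAL, COMP_DOT, COMP_DOTDOT };

static enum component_kind classify(const char *name, size_t len)
{
	if (len >= 1 && name[0] == '.') {
		if (len == 1)
			return COMP_DOT;
		if (len == 2 && name[1] == '.')
			return COMP_DOTDOT;
	}
	return COMP_NORMAL;   /* includes ".x", "..." and ordinary names */
}

/* Hypothetical walk from the root that only tracks depth. */
static int walk_depth(const char *comps[], size_t n)
{
	int depth = 0;

	for (size_t i = 0; i < n; i++) {
		switch (classify(comps[i], strlen(comps[i]))) {
		case COMP_DOT:
			break;             /* stay in the same directory */
		case COMP_DOTDOT:
			if (depth > 0)     /* never climb above the root */
				depth--;
			break;
		case COMP_NORMAL:
			depth++;           /* descend into a child */
			break;
		}
	}
	return depth;
}
```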
Re-calculate hash
891 		/*
892 		 * See if the low-level filesystem might want
893 		 * to use its own hash..
894 		 */
895 		if (nd->dentry->d_op && nd->dentry->d_op->d_hash) {
896 			err = nd->dentry->d_op->d_hash(nd->dentry, &this);
897 			if (err < 0)
898 				break;
899 		}
Some file systems have their own hash functions. For example, FAT is not case sensitive, so its hash function must give names that differ only in case the same hash.
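A case-insensitive hash in the spirit of a d_hash hook can be sketched by folding each character before mixing it in. This is hypothetical: ci_name_hash and ci_name_equal are illustrative names, and using tolower() (plain ASCII folding in the C locale) is an assumption; FAT's real hooks fold characters through the mount's NLS table. The matching comparison (the d_compare hook's job) must fold case too, or names with equal hashes would still fail to match.

```c
#include <ctype.h>
#include <stddef.h>

/* Same mixing step as the 2.6.24 partial_name_hash. */
static unsigned long partial_name_hash(unsigned long c, unsigned long prevhash)
{
	return (prevhash + (c << 4) + (c >> 4)) * 11;
}

/* Hypothetical case-insensitive hash: lower-case each byte before mixing. */
static unsigned int ci_name_hash(const char *name, size_t len)
{
	unsigned long hash = 0;   /* init_name_hash() */

	for (size_t i = 0; i < len; i++)
		hash = partial_name_hash((unsigned long)tolower((unsigned char)name[i]), hash);
	return (unsigned int)hash; /* end_name_hash() */
}

/* Hypothetical case-insensitive comparison to pair with the hash. */
static int ci_name_equal(const char *a, size_t alen, const char *b, size_t blen)
{
	if (alen != blen)
		return 0;
	for (size_t i = 0; i < alen; i++)
		if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
			return 0;
	return 1;
}
```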
Call do_lookup
900 		/* This does the actual lookups.. */
901 		err = do_lookup(nd, &this, &next);
902 		if (err)
903 			break;
This function will be introduced later
Process Symbolic Links
913 		if (inode->i_op->follow_link) {
914 			err = do_follow_link(&next, nd);
915 			if (err)
916 				goto return_err;
917 			err = -ENOENT;
918 			inode = nd->dentry->d_inode;
919 			if (!inode)
920 				break;
921 			err = -ENOTDIR;
922 			if (!inode->i_op)
923 				break;
924 		} else
925 			path_to_nameidata(&next, nd);
If inode->i_op->follow_link is non-NULL, the inode is a symbolic link; for any other kind of file it is NULL.
do_follow_link handles the symbolic link.
Lines 924-925: the component is not a symbolic link, so the result held in next is copied into @nd.
do_lookup
779 /*
780  * It's more convoluted than I'd like it to be, but... it's still fairly
781  * small and for now I'd prefer to have fast path as straight as possible.
782  * It _is_ time-critical.
783  */
784 static int do_lookup(struct nameidata *nd, struct qstr *name,
785 		     struct path *path)
786 {
787 	struct vfsmount *mnt = nd->mnt;
788 	struct dentry *dentry = __d_lookup(nd->dentry, name);
789 
790 	if (!dentry)
791 		goto need_lookup;
792 	if (dentry->d_op && dentry->d_op->d_revalidate)
793 		goto need_revalidate;
794 done:
795 	path->mnt = mnt;
796 	path->dentry = dentry;
797 	__follow_mount(path);
798 	return 0;
799 
800 need_lookup:
801 	dentry = real_lookup(nd->dentry, name, nd);
802 	if (IS_ERR(dentry))
803 		goto fail;
804 	goto done;
805 
806 need_revalidate:
807 	dentry = do_revalidate(dentry, nd);
808 	if (!dentry)
809 		goto need_lookup;
810 	if (IS_ERR(dentry))
811 		goto fail;
812 	goto done;
813 
814 fail:
815 	return PTR_ERR(dentry);
816 }
@nd: input parameter; it specifies the parent directory (its dentry) and that directory's vfsmount.
@name: the path component to look up.
@path: output parameter that receives the lookup result.
Line 788: look up the dentry cache using the parent dentry and the component name. On a hit, @path is filled in at the done label, where __follow_mount handles mount points.
Line 790: if the dentry is not in the cache, the underlying file system has to do the lookup; real_lookup calls the file system's lookup method.
Line 792: if dentry->d_op->d_revalidate exists, the cached dentry is not necessarily up to date and must be revalidated first. The VFS does not implement this operation itself; it only provides the hook for file systems. NFS, for example, can end up with a local dentry cache that is out of sync with the files on the server. We ignore this case here.
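The fast-path / slow-path structure of do_lookup can be modeled with a tiny cache keyed by (parent, name). This is a hypothetical sketch: integer ids stand in for dentries, fake_fs_lookup plays the role of real_lookup calling the file system, and revalidation, negative entries and eviction are all left out. Names are assumed to be shorter than 32 bytes.

```c
#include <string.h>

#define CACHE_SLOTS 64
#define NAME_MAX_SKETCH 32    /* hypothetical name-length limit */

struct cache_entry {
	int used;
	int parent;                 /* stands in for the parent dentry */
	char name[NAME_MAX_SKETCH]; /* component name */
	int child;                  /* stands in for the resulting dentry */
};

struct dcache_sketch {
	struct cache_entry slots[CACHE_SLOTS];
	int real_lookups;           /* how often the slow path ran */
	int next_id;                /* ids handed out by the fake file system */
};

/* Hash the (parent, name) key to a starting slot. */
static unsigned int slot_of(int parent, const char *name)
{
	unsigned int h = (unsigned int)parent;

	while (*name)
		h = h * 31u + (unsigned char)*name++;
	return h % CACHE_SLOTS;
}

/* Slow path: plays the role of real_lookup() asking the file system. */
static int fake_fs_lookup(struct dcache_sketch *dc)
{
	dc->real_lookups++;
	return dc->next_id++;
}

/* Cache first, file system on a miss; returns the child id or -1. */
static int do_lookup_sketch(struct dcache_sketch *dc, int parent, const char *name)
{
	unsigned int i;

	if (strlen(name) >= NAME_MAX_SKETCH)
		return -1;                      /* too long for this sketch */
	i = slot_of(parent, name);
	for (unsigned int n = 0; n < CACHE_SLOTS; n++, i = (i + 1) % CACHE_SLOTS) {
		struct cache_entry *e = &dc->slots[i];

		if (!e->used) {                 /* miss: like __d_lookup returning NULL */
			e->used = 1;
			e->parent = parent;
			strcpy(e->name, name);      /* length checked above */
			e->child = fake_fs_lookup(dc);
			return e->child;
		}
		if (e->parent == parent && strcmp(e->name, name) == 0)
			return e->child;            /* hit: the fast path */
	}
	return -1;                          /* table full */
}
```

The second lookup of the same (parent, name) pair never reaches the file system, which is the whole point of the dentry cache.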
__follow_mount
There are two versions, __follow_mount and follow_mount. The difference is small; once you understand one, you understand the other.
689 /* no need for dcache_lock, as serialization is taken care in
690  * namespace.c
691  */
692 static int __follow_mount(struct path *path)
693 {
694 	int res = 0;
695 	while (d_mountpoint(path->dentry)) {
696 		struct vfsmount *mounted = lookup_mnt(path->mnt, path->dentry);
697 		if (!mounted)
698 			break;
699 		dput(path->dentry);
700 		if (res)
701 			mntput(path->mnt);
702 		path->mnt = mounted;
703 		path->dentry = dget(mounted->mnt_root);
704 		res = 1;
705 	}
706 	return res;
707 }
The code itself is simple; what makes it tricky are the concepts behind it, which are hard to explain clearly and a little tedious.
The lookup path may contain mount points. For example:
/mnt/sdcard/sd1 is a directory, and we mount an SD card containing the files file1 and file2 on /mnt/sdcard/sd1.
Now we look up /mnt/sdcard/sd1/file1. When we reach /mnt/sdcard/sd1, what we hold is the root file system's vfsmount and the root file system's dentry for /mnt/sdcard/sd1. To continue looking up file1, we must switch to the SD card's vfsmount and the dentry of its root directory.
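This is exactly what __follow_mount does. It loops because mounts can stack: if another device had already been mounted on /mnt/sdcard/sd1 before the SD card was mounted there, the SD card sits on top of that earlier mount, and the loop keeps crossing mount points until it reaches the top-most one.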
This also shows why vfsmount is essential: a dentry alone cannot uniquely identify a location in the directory tree; it takes the pair of vfsmount and dentry.
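The stacked-mount loop can be modeled with locations that are (mount, dentry) pairs, which is the point the paragraph above makes. This is a hypothetical sketch: integer ids stand in for vfsmounts and dentries, lookup_mount plays the role of lookup_mnt, and reference counting and the d_mountpoint shortcut are omitted.

```c
#include <stddef.h>

/* Hypothetical mount-table entry: mount `child` (whose root dentry is
 * `child_root`) is attached at dentry `mountpoint` of mount `parent`. */
struct mount_entry {
	int parent, mountpoint;
	int child, child_root;
};

/* A location needs both ids: the same dentry id means nothing by itself. */
struct location { int mnt; int dentry; };

/* Analogue of lookup_mnt(): find the mount attached at (mnt, dentry). */
static const struct mount_entry *lookup_mount(const struct mount_entry *tab, size_t n,
					      struct location loc)
{
	for (size_t i = 0; i < n; i++)
		if (tab[i].parent == loc.mnt && tab[i].mountpoint == loc.dentry)
			return &tab[i];
	return NULL;
}

/* Analogue of __follow_mount(): keep crossing mounts until the location is
 * no longer covered. Returns 1 if at least one mount was crossed. */
static int follow_mount_sketch(const struct mount_entry *tab, size_t n, struct location *loc)
{
	const struct mount_entry *m;
	int res = 0;

	while ((m = lookup_mount(tab, n, *loc)) != NULL) {
		loc->mnt = m->child;          /* like path->mnt = mounted */
		loc->dentry = m->child_root;  /* like path->dentry = mnt_root */
		res = 1;
	}
	return res;
}
```

In the test below, mount 1 is the root file system and dentry 10 is /mnt/sdcard/sd1; device 2 was mounted there first, and the SD card (mount 3) was then mounted on top, which attaches it at the root of mount 2. Following from (1, 10) therefore ends at the SD card's root (3, 30).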
do_follow_link
Symlinks and hard links add a lot of complexity to a file system. Hard links have no effect on path lookup, but symlinks cause some trouble.
637 /*
638  * This limits recursive symlink follows to 8, while
639  * limiting consecutive symlinks to 40.
640  *
641  * Without that kind of total limit, nasty chains of consecutive
642  * symlinks can cause almost arbitrarily long lookups.
643  */
644 static inline int do_follow_link(struct path *path, struct nameidata *nd)
645 {
646 	int err = -ELOOP;
647 	if (current->link_count >= MAX_NESTED_LINKS)
648 		goto loop;
649 	if (current->total_link_count >= 40)
650 		goto loop;
651 	BUG_ON(nd->depth >= MAX_NESTED_LINKS);
652 	cond_resched();
653 	err = security_inode_follow_link(path->dentry, nd);
654 	if (err)
655 		goto loop;
656 	current->link_count++;
657 	current->total_link_count++;
658 	nd->depth++;
659 	err = __do_follow_link(path, nd);
660 	current->link_count--;
661 	nd->depth--;
662 	return err;
663 loop:
664 	dput_path(path, nd);
665 	path_release(nd);
666 	return err;
667 }
During path lookup, the symlink file itself is not what we are looking for; the target of the lookup is the path the link points to.
To prevent endless lookups and other nasty cases ("nasty" is the kernel comment's own word), Linux allows at most 8 levels of nested (recursive) symlink following and at most 40 symlinks in total during one lookup.
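The recursion happens through line 659: __do_follow_link resolves the link by walking the path it contains, and that walk can reach do_follow_link again. link_count is incremented at line 656 before that call and decremented afterwards, so it tracks the current nesting depth; once it reaches 8 (MAX_NESTED_LINKS), the check at line 647 returns -ELOOP. total_link_count is not decremented during a lookup, so line 649 caps the total number of links followed at 40.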
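The two limits can be checked with a small model of the counters. This is a hypothetical sketch: struct task_sketch stands in for the link_count and total_link_count fields of current, follow_nested models a chain where each link's target is itself a link (the recursion through __do_follow_link), and follow_consecutive models a path whose components are separate, non-nested links. Each test starts from zeroed counters, as path_walk does at line 1044.

```c
#include <errno.h>

#define SKETCH_MAX_NESTED_LINKS 8   /* MAX_NESTED_LINKS in 2.6.24 */
#define SKETCH_MAX_TOTAL_LINKS 40   /* hard-coded in do_follow_link */

/* Hypothetical per-task counters. */
struct task_sketch { int link_count; int total_link_count; };

/* Follow a link whose target is a link, `nesting` levels deep. */
static int follow_nested(struct task_sketch *t, int nesting)
{
	int err;

	if (nesting == 0)
		return 0;                    /* reached a non-link: done */
	if (t->link_count >= SKETCH_MAX_NESTED_LINKS)
		return -ELOOP;               /* like line 647 */
	if (t->total_link_count >= SKETCH_MAX_TOTAL_LINKS)
		return -ELOOP;               /* like line 649 */
	t->link_count++;
	t->total_link_count++;
	err = follow_nested(t, nesting - 1);  /* the target contains a link */
	t->link_count--;                 /* nesting unwinds, the total does not */
	return err;
}

/* A path made of `n` separate symlinks, each nesting only one level. */
static int follow_consecutive(struct task_sketch *t, int n)
{
	for (int i = 0; i < n; i++) {
		int err = follow_nested(t, 1);

		if (err)
			return err;
	}
	return 0;
}
```

Eight nested links succeed and a ninth fails; forty consecutive links succeed and a forty-first fails, even though none of them nests more than one level.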