Understanding the Linux Virtual File System (VFS): path lookup with path_lookup


Path lookup is one of the main operations of VFS: given a file name, find its inode. It is also one of the more tedious parts of VFS, because symbolic links, file system mount points, and odd path elements such as ".", ".." and "//" all add complexity.

The nameidata data structure

A lookup involves many function calls, and nameidata plays an important role in all of them: (1) it passes parameters to the lookup functions; (2) it holds the lookup result.

    struct nameidata {
            struct dentry   *dentry;
            struct vfsmount *mnt;
            struct qstr     last;
            unsigned int    flags;
            int             last_type;
            unsigned        depth;
            char *saved_names[MAX_NESTED_LINKS + 1];

            /* Intent data */
            union {
                    struct open_intent open;
            } intent;
    };

After the lookup completes, @dentry holds the dentry of the file that was found, and @mnt holds the vfsmount that this dentry belongs to.

@last holds the name being looked up. It is a qstr (a "quick string"): besides the path string it also carries the length of the string and a hash value.

@depth is the current nesting depth of symbolic links (do_follow_link increments it on entry and decrements it on exit).

@saved_names: because the name being resolved keeps changing while symbolic links are processed, this array saves the path names used during symbolic link processing, one per nesting level.

The kernel has many path lookup functions. Here we take path_lookup as the example, using kernel version 2.6.24.

path_lookup

    int fastcall path_lookup(const char *name, unsigned int flags,
                            struct nameidata *nd)
    {
            return do_path_lookup(AT_FDCWD, name, flags, nd);
    }

It takes three parameters: @name is the path name of the file (absolute or relative); @flags holds the lookup flags; @nd contains no useful information on entry and returns the lookup result.

do_path_lookup

    1119 static int fastcall do_path_lookup(int dfd, const char *name,
    1120                                 unsigned int flags, struct nameidata *nd)
    1121 {
    1122         int retval = 0;
    1123         int fput_needed;
    1124         struct file *file;
    1125         struct fs_struct *fs = current->fs;
    1126
    1127         nd->last_type = LAST_ROOT; /* if there are only slashes... */
    1128         nd->flags = flags;
    1129         nd->depth = 0;
    1130
    1131         if (*name=='/') {
    1132                 read_lock(&fs->lock);
    1133                 if (fs->altroot && !(nd->flags & LOOKUP_NOALT)) {
    1134                         nd->mnt = mntget(fs->altrootmnt);
    1135                         nd->dentry = dget(fs->altroot);
    1136                         read_unlock(&fs->lock);
    1137                         if (__emul_lookup_dentry(name,nd))
    1138                                 goto out; /* found in altroot */
    1139                         read_lock(&fs->lock);
    1140                 }
    1141                 nd->mnt = mntget(fs->rootmnt);
    1142                 nd->dentry = dget(fs->root);
    1143                 read_unlock(&fs->lock);
    1144         } else if (dfd == AT_FDCWD) {
    1145                 read_lock(&fs->lock);
    1146                 nd->mnt = mntget(fs->pwdmnt);
    1147                 nd->dentry = dget(fs->pwd);
    1148                 read_unlock(&fs->lock);
    1149         } else {
    1150                 struct dentry *dentry;
    1151
    1152                 file = fget_light(dfd, &fput_needed);
    1153                 retval = -EBADF;
    1154                 if (!file)
    1155                         goto out_fail;
    1157                 dentry = file->f_path.dentry;
    1158
    1159                 retval = -ENOTDIR;
    1160                 if (!S_ISDIR(dentry->d_inode->i_mode))
    1161                         goto fput_fail;
    1162
    1163                 retval = file_permission(file, MAY_EXEC);
    1164                 if (retval)
    1165                         goto fput_fail;
    1166
    1167                 nd->mnt = mntget(file->f_path.mnt);
    1168                 nd->dentry = dget(dentry);
    1169
    1170                 fput_light(file, fput_needed);
    1171         }
    1172
    1173         retval = path_walk(name, nd);
    1174 out:
    1175         if (unlikely(!retval && !audit_dummy_context() && nd->dentry &&
    1176                                 nd->dentry->d_inode))
    1177                 audit_inode(name, nd->dentry);
    1178 out_fail:
    1179         return retval;
    1180
    1181 fput_fail:
    1182         fput_light(file, fput_needed);
    1183         goto out_fail;
    1184 }

This function is a bit long, but its logic is clear: it prepares for the call to path_walk. On entry to do_path_lookup, the parameter @nd contains no useful information; by the time path_walk is called, @nd holds the starting point of the lookup.

Lines 1127 to 1171 therefore set up the starting point of the lookup. There are three cases:

1. Lines 1131 to 1143: the path name is absolute, so the root directory of the file system is used as the starting point.

2. Lines 1144 to 1148: the path is not absolute and @dfd is AT_FDCWD, so the lookup starts from the current working directory.

3. Lines 1149 to 1171: the first parameter of the function, @dfd, is a directory file descriptor, so the lookup starts from that directory.
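
The three cases can be mirrored in a small user-space sketch. It only classifies where a lookup would start and does no actual lookup; the enum and the function name classify_start are hypothetical and are not kernel names. AT_FDCWD is the real constant from <fcntl.h>.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>   /* AT_FDCWD */
#include <stddef.h>

/* Hypothetical labels for the three starting points; not kernel names. */
enum lookup_start {
    START_ROOT,   /* absolute path: start at the process root */
    START_CWD,    /* relative path with AT_FDCWD: start at the cwd */
    START_DIRFD   /* relative path with a directory fd: start there */
};

/* Mirrors the if / else if / else chain of do_path_lookup. */
static enum lookup_start classify_start(int dfd, const char *name)
{
    assert(name != NULL);
    if (*name == '/')
        return START_ROOT;   /* dfd is ignored for absolute paths */
    if (dfd == AT_FDCWD)
        return START_CWD;
    return START_DIRFD;
}
```

Note that the absolute-path test comes first, so a directory descriptor passed together with an absolute path has no effect on the starting point.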

Line 1173: everything is ready, so path_walk is called to start the lookup.

path_walk

    1042 static int fastcall path_walk(const char * name, struct nameidata *nd)
    1043 {
    1044         current->total_link_count = 0;
    1045         return link_path_walk(name, nd);
    1046 }

Symbolic links need special handling, because normally we follow them. Without symbolic links a file system would be a perfect tree; symbolic links spoil that and can even create loops in the tree. For this reason a lookup limits the number of symbolic links it will follow. In 2.6.24 the limit is hard-coded to 40.

Line 1044: before a new lookup starts, the counter is reset to 0.

link_path_walk

    static int fastcall link_path_walk(const char *name, struct nameidata *nd)
    {
            struct nameidata save = *nd;
            int result;

            /* make sure the stuff we saved doesn't go away */
            dget(save.dentry);
            mntget(save.mnt);

            result = __link_path_walk(name, nd);
            if (result == -ESTALE) {
                    *nd = save;
                    dget(nd->dentry);
                    mntget(nd->mnt);
                    nd->flags |= LOOKUP_REVAL;
                    result = __link_path_walk(name, nd);
            }

            dput(save.dentry);
            mntput(save.mnt);

            return result;
    }

This function looks a bit awkward, mainly because some functions can return an -ESTALE error. When that happens, the path lookup has to be run again from the saved starting point with LOOKUP_REVAL set, so that the dcache is not trusted blindly.
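
The save, walk, restore and retry pattern can be sketched in user space as follows. This is only an illustration under stated assumptions: struct walk_state, SKETCH_REVAL and the demo walker are hypothetical stand-ins for struct nameidata, LOOKUP_REVAL and __link_path_walk, and the reference counting done by dget/mntget is left out.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_REVAL 0x1u     /* hypothetical stand-in for LOOKUP_REVAL */

struct walk_state {           /* hypothetical stand-in for struct nameidata */
    int position;             /* where the walk currently is */
    unsigned int flags;
};

typedef int (*walk_fn)(const char *name, struct walk_state *st);

/* Walk once; on -ESTALE restore the saved state and retry once,
 * this time with revalidation forced. */
static int walk_with_retry(walk_fn walk, const char *name,
                           struct walk_state *st)
{
    struct walk_state save = *st;
    int result;

    assert(walk != 0 && name != 0);
    result = walk(name, st);
    if (result == -ESTALE) {
        *st = save;                   /* undo whatever the walk changed */
        st->flags |= SKETCH_REVAL;
        result = walk(name, st);
    }
    return result;
}

/* Demo walker: fails with -ESTALE unless revalidation was requested. */
static int stale_unless_reval(const char *name, struct walk_state *st)
{
    (void)name;
    st->position++;                   /* the walk mutates the state */
    return (st->flags & SKETCH_REVAL) ? 0 : -ESTALE;
}
```

The copy in save is what makes the retry possible: the first walk has already modified the state by the time the error comes back.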

We do not care about this special case here, so we only need to look at __link_path_walk.

__link_path_walk

This function is close to 200 lines long, which is more than my brain can hold at once, so we go through it piece by piece.

     826 static fastcall int __link_path_walk(const char * name, struct nameidata *nd)
     827 {
     828         struct path next;
     829         struct inode *inode;
     830         int err;
     831         unsigned int lookup_flags = nd->flags;
     832
     833         while (*name=='/')
     834                 name++;
     835         if (!*name)
     836                 goto return_reval;
     837
     838         inode = nd->dentry->d_inode;
     839         if (nd->depth)
     840                 lookup_flags = LOOKUP_FOLLOW | (nd->flags & LOOKUP_CONTINUE);

Lines 833 to 836 first skip any '/' characters at the front of the path name.

Next comes a large loop that processes the path components one by one. On each pass @name is split to produce the next component, and each component is one name in the path. The steps of the loop body are walked through below.
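
To make the splitting concrete, here is a minimal user-space sketch of a component iterator. The names next_component and count_components are hypothetical; the kernel does this inline in the loop rather than through a helper like this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the start of the next component of *pp and store its length
 * in *len, advancing *pp past it. Returns NULL when no components remain.
 * A run of '/' characters counts as a single separator.
 */
static const char *next_component(const char **pp, size_t *len)
{
    const char *p, *start;

    assert(pp != NULL && *pp != NULL && len != NULL);
    p = *pp;
    while (*p == '/')            /* skip leading or repeated slashes */
        p++;
    if (!*p) {
        *pp = p;
        return NULL;
    }
    start = p;
    while (*p && *p != '/')
        p++;
    *len = (size_t)(p - start);
    *pp = p;
    return start;
}

/* Count the components of a path, as a simple driver for the iterator. */
static size_t count_components(const char *path)
{
    size_t n = 0, len;

    while (next_component(&path, &len))
        n++;
    return n;
}
```

With this, "//usr///lib/" yields the two components "usr" and "lib", and "/" yields none, which corresponds to the early exit at lines 835 to 836.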

Permission check

    nd->flags |= LOOKUP_CONTINUE;
    err = exec_permission_lite(inode, nd);
    if (err == -EAGAIN)
            err = vfs_permission(nd, MAY_EXEC);
    if (err)
            break;

Calculate the path component hash

    this.name = name;
    c = *(const unsigned char *)name;

    hash = init_name_hash();
    do {
            name++;
            hash = partial_name_hash(c, hash);
            c = *(const unsigned char *)name;
    } while (c && (c != '/'));
    this.len = name - (const char *) this.name;
    this.hash = end_name_hash(hash);
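
The same loop can be tried out in user space. In the sketch below, struct sketch_qstr and hash_component are hypothetical names, and the mixing step in sketch_partial_hash is written from my recollection of include/linux/dcache.h of that era, so treat the exact formula as an assumption and check it against your tree.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to match partial_name_hash() of that era; verify in your tree. */
static unsigned long sketch_partial_hash(unsigned long c, unsigned long prev)
{
    return (prev + (c << 4) + (c >> 4)) * 11;
}

struct sketch_qstr {             /* hypothetical stand-in for struct qstr */
    const char *name;
    size_t len;
    unsigned int hash;
};

/*
 * Hash the component that starts at name and fill in *out.
 * Returns a pointer to the '/' or NUL that ended the component.
 * The caller must have skipped leading slashes already.
 */
static const char *hash_component(const char *name, struct sketch_qstr *out)
{
    unsigned long hash = 0;                  /* init_name_hash() */
    unsigned long c;

    assert(name != NULL && out != NULL && *name != '\0' && *name != '/');
    out->name = name;
    c = (unsigned char)*name;
    do {
        name++;
        hash = sketch_partial_hash(c, hash);
        c = (unsigned char)*name;
    } while (c && c != '/');
    out->len = (size_t)(name - out->name);
    out->hash = (unsigned int)hash;          /* end_name_hash() */
    return name;
}
```

Because the loop stops at '/', hashing "usr/lib" and hashing "usr" give the same length and the same hash, which is what lets the dcache be keyed by single components.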


Handling "." and ".."

    /*
     * "." and ".." are special - ".." especially so because it has
     * to be able to know about the current root directory and
     * parent relationships.
     */
    if (this.name[0] == '.') switch (this.len) {
            default:
                    break;
            case 2:
                    if (this.name[1] != '.')
                            break;
                    follow_dotdot(nd);
                    inode = nd->dentry->d_inode;
                    /* fallthrough */
            case 1:
                    continue;
    }

"." refers to the current directory, so we only need to skip this component and move on to the next one.

".." refers to the parent directory, which is handled by calling follow_dotdot. follow_dotdot is not as simple as it looks, because it has to take mount points (and the current root directory) into account.

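As a rough illustration of the two special names, the sketch below resolves "." and ".." purely lexically on an absolute path. This is a simplification and not what the kernel does: follow_dotdot moves through real dentries and mounts, so the results differ once symbolic links or mount points are involved. The function name resolve_dots is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Lexically resolve "." and ".." in an absolute path into out.
 * ".." at the root stays at the root. No symlinks, no mounts.
 * Returns 0 on success, -1 if out is too small.
 */
static int resolve_dots(const char *path, char *out, size_t outsz)
{
    size_t n = 0;                     /* length of out so far; 0 means "/" */

    assert(path != NULL && path[0] == '/' && out != NULL && outsz >= 2);
    while (*path) {
        const char *start;
        size_t len;

        while (*path == '/')
            path++;
        if (!*path)
            break;
        start = path;
        while (*path && *path != '/')
            path++;
        len = (size_t)(path - start);

        if (len == 1 && start[0] == '.')
            continue;                 /* ".": stay where we are */
        if (len == 2 && start[0] == '.' && start[1] == '.') {
            while (n > 0 && out[--n] != '/')
                ;                     /* "..": drop the last component */
            continue;
        }
        if (n + 1 + len + 1 > outsz)
            return -1;
        out[n++] = '/';
        memcpy(out + n, start, len);
        n += len;
    }
    if (n == 0)
        out[n++] = '/';
    out[n] = '\0';
    return 0;
}
```

For example, "/a/./b/../c" becomes "/a/c", and "/../.." stays "/", which matches the kernel's rule that ".." cannot climb above the root.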

Recalculate the hash

    /*
     * See if the low-level filesystem might want
     * to use its own hash..
     */
    if (nd->dentry->d_op && nd->dentry->d_op->d_hash) {
            err = nd->dentry->d_op->d_hash(nd->dentry, &this);
            if (err < 0)
                    break;
    }

Some file systems have their own hash functions. For example, the FAT file system is not case sensitive, so its hash function has to be changed accordingly.
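
A case-insensitive hash can be sketched by folding each character to lower case before mixing it in. This is not the FAT driver's code, which has its own implementation that this sketch does not reproduce; hash_name_nocase is a hypothetical name, and the mixing step is the same assumed formula as for partial_name_hash.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hash len bytes of name, ignoring ASCII case (illustrative only). */
static unsigned int hash_name_nocase(const char *name, size_t len)
{
    unsigned long hash = 0;
    size_t i;

    assert(name != NULL);
    for (i = 0; i < len; i++) {
        /* fold case first so "README" and "readme" hash alike */
        unsigned long c = (unsigned long)tolower((unsigned char)name[i]);
        hash = (hash + (c << 4) + (c >> 4)) * 11;
    }
    return (unsigned int)hash;
}
```

The point of the d_hash hook is exactly this: names that the file system considers equal must land in the same hash bucket, otherwise the dcache lookup would miss them.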

Call do_lookup

    /* This does the actual lookups.. */
    err = do_lookup(nd, &this, &next);
    if (err)
            break;

This function is covered later.

Process symbolic links

     913                 if (inode->i_op->follow_link) {
     914                         err = do_follow_link(&next, nd);
     915                         if (err)
     916                                 goto return_err;
     917                         err = -ENOENT;
     918                         inode = nd->dentry->d_inode;
     919                         if (!inode)
     920                                 break;
     921                         err = -ENOTDIR;
     922                         if (!inode->i_op)
     923                                 break;
     924                 } else
     925                         path_to_nameidata(&next, nd);

If inode->i_op->follow_link is not NULL, the inode is a symbolic link; for any other kind of file it must be NULL.

do_follow_link is used to process symbolic links.

Lines 924 to 925: the component is not a symbolic link, so the result in next is copied into the return value @nd.

do_lookup

     779 /*
     780  *  It's more convoluted than I'd like it to be, but... it's still fairly
     781  *  small and for now I'd prefer to have fast path as straight as possible.
     782  *  It _is_ time-critical.
     783  */
     784 static int do_lookup(struct nameidata *nd, struct qstr *name,
     785                      struct path *path)
     786 {
     787         struct vfsmount *mnt = nd->mnt;
     788         struct dentry *dentry = __d_lookup(nd->dentry, name);
     789
     790         if (!dentry)
     791                 goto need_lookup;
     792         if (dentry->d_op && dentry->d_op->d_revalidate)
     793                 goto need_revalidate;
     794 done:
     795         path->mnt = mnt;
     796         path->dentry = dentry;
     797         __follow_mount(path);
     798         return 0;
     799
     800 need_lookup:
     801         dentry = real_lookup(nd->dentry, name, nd);
     802         if (IS_ERR(dentry))
     803                 goto fail;
     804         goto done;
     805
     806 need_revalidate:
     807         dentry = do_revalidate(dentry, nd);
     808         if (!dentry)
     809                 goto need_lookup;
     810         if (IS_ERR(dentry))
     811                 goto fail;
     812         goto done;
     813
     814 fail:
     815         return PTR_ERR(dentry);
     816 }

@nd is the input parameter. It specifies the parent directory and its vfsmount.

@name specifies the name of the path component.

@path is the output parameter. It holds the lookup result.

Line 788: look in the dentry cache, keyed by the parent dentry and the component name. If the entry is found, path is filled in at done, where __follow_mount takes care of mount points.

Line 790: if the entry is not in the dentry cache, the lookup function of the underlying file system has to be called. real_lookup is what calls it.

Line 792: if dentry->d_op->d_revalidate exists, the entry in the dentry cache is not necessarily up to date and has to be revalidated. VFS does not implement this function itself; it only provides the hook for the underlying file system. With NFS, for example, the local dentry cache can get out of sync with the remote contents. We do not care about this case here.
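
The control flow of do_lookup (valid cache hit, cache miss, failed revalidation) can be modeled with a toy cache. Everything here is hypothetical: struct dir_sketch stands in for a directory with its cached entries, the stale flag stands in for a failing d_revalidate, and slow_lookup stands in for real_lookup by simply returning the length of the name.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_SLOTS 4

struct cached {                  /* one toy cache entry */
    char name[16];
    int value;
    int stale;                   /* set when revalidation would fail */
    int used;
};

struct dir_sketch {
    struct cached slot[SKETCH_SLOTS];
    int next;                    /* round-robin replacement index */
    int slow_calls;              /* how often the slow path ran */
};

/* Slow path: pretend to ask the file system. */
static int slow_lookup(struct dir_sketch *d, const char *name)
{
    d->slow_calls++;
    return (int)strlen(name);
}

static int lookup(struct dir_sketch *d, const char *name)
{
    struct cached *e = NULL;
    int i;

    assert(strlen(name) < sizeof d->slot[0].name);
    for (i = 0; i < SKETCH_SLOTS; i++)
        if (d->slot[i].used && strcmp(d->slot[i].name, name) == 0)
            e = &d->slot[i];
    if (e && !e->stale)
        return e->value;                 /* fast path: valid cache hit */
    if (!e) {                            /* miss: claim a slot */
        e = &d->slot[d->next];
        d->next = (d->next + 1) % SKETCH_SLOTS;
        strcpy(e->name, name);
        e->used = 1;
    }
    e->value = slow_lookup(d, name);     /* miss or failed revalidation */
    e->stale = 0;
    return e->value;
}
```

As in the kernel function, the common case touches only the cache, and the slow path is reached from two places: a miss and an entry that no longer passes revalidation.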

__follow_mount

There are two versions of this function, __follow_mount and follow_mount. They differ very little, so once one is understood the other follows.

    /* no need for dcache_lock, as serialization is taken care in
     * namespace.c
     */
    static int __follow_mount(struct path *path)
    {
            int res = 0;
            while (d_mountpoint(path->dentry)) {
                    struct vfsmount *mounted = lookup_mnt(path->mnt, path->dentry);
                    if (!mounted)
                            break;
                    dput(path->dentry);
                    if (res)
                            mntput(path->mnt);
                    path->mnt = mounted;
                    path->dentry = dget(mounted->mnt_root);
                    res = 1;
            }
            return res;
    }

The code of this function is short, but the concepts behind it make it complicated, and they are hard to explain clearly.

We know that a lookup path may contain a mount point. For example:

Suppose /mnt/sdcard/sd1 is a directory, and an SD card holds the files file1 and file2. We mount the SD card on /mnt/sdcard/sd1.

Now we look up /mnt/sdcard/sd1/file1. When the walk reaches /mnt/sdcard/sd1, what we have is the vfsmount of the root file system and the dentry of /mnt/sdcard/sd1. To go on and find file1, we must switch to the vfsmount of the SD card and the root dentry of its file system, and continue the search for file1 from there.

That is the job of __follow_mount. The loop is needed because several file systems can be mounted on top of one another at /mnt/sdcard/sd1.

From this we can see that vfsmount is essential: a dentry alone cannot uniquely identify a directory entry, which is determined by the vfsmount and the dentry together.
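
The (vfsmount, dentry) pairing and the loop can be shown with a toy mount table. All type and function names below are hypothetical stand-ins, the table is a plain array rather than the kernel's hash table, and reference counting is omitted.

```c
#include <assert.h>
#include <stddef.h>

struct dentry_sketch { const char *name; };

struct mount_sketch {
    const char *dev;
    struct dentry_sketch *root;        /* root dentry of this file system */
    struct mount_sketch *parent;       /* mount that holds the mount point */
    struct dentry_sketch *mountpoint;  /* dentry that this mount covers */
};

struct path_sketch {
    struct mount_sketch *mnt;
    struct dentry_sketch *dentry;
};

/* Find the mount attached at exactly (p->mnt, p->dentry); NULL if none. */
static struct mount_sketch *find_mount(struct mount_sketch *tab, size_t n,
                                       const struct path_sketch *p)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tab[i].parent == p->mnt && tab[i].mountpoint == p->dentry)
            return &tab[i];
    return NULL;
}

/* Keep stepping onto whatever is mounted at p, like the loop in
 * __follow_mount. Returns 1 if p was changed, 0 otherwise. */
static int follow_mounts(struct mount_sketch *tab, size_t n,
                         struct path_sketch *p)
{
    struct mount_sketch *m;
    int crossed = 0;

    assert(tab != NULL && p != NULL);
    while ((m = find_mount(tab, n, p)) != NULL) {
        p->mnt = m;
        p->dentry = m->root;
        crossed = 1;
    }
    return crossed;
}
```

The lookup key is the pair, not the dentry alone: the same dentry reached through a different mount is not a mount point in this table, so follow_mounts leaves it untouched.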

do_follow_link

Symlinks and hard links both add complexity to a file system. Hard links have no effect on path lookup, but symlinks cause some trouble.

     637 /*
     638  * This limits recursive symlink follows to 8, while
     639  * limiting consecutive symlinks to 40.
     640  *
     641  * Without that kind of total limit, nasty chains of consecutive
     642  * symlinks can cause almost arbitrarily long lookups.
     643  */
     644 static inline int do_follow_link(struct path *path, struct nameidata *nd)
     645 {
     646         int err = -ELOOP;
     647         if (current->link_count >= MAX_NESTED_LINKS)
     648                 goto loop;
     649         if (current->total_link_count >= 40)
     650                 goto loop;
     651         BUG_ON(nd->depth >= MAX_NESTED_LINKS);
     652         cond_resched();
     653         err = security_inode_follow_link(path->dentry, nd);
     654         if (err)
     655                 goto loop;
     656         current->link_count++;
     657         current->total_link_count++;
     658         nd->depth++;
     659         err = __do_follow_link(path, nd);
     660         current->link_count--;
     661         nd->depth--;
     662         return err;
     663 loop:
     664         dput_path(path, nd);
     665         path_release(nd);
     666         return err;
     667 }

In a path lookup, the symlink file itself is not what we are looking for. The lookup target is the path that the symlink points to.

To prevent endless loops and other nasty cases ("nasty" is the word the kernel comment itself uses), Linux allows at most 8 levels of recursion and at most 40 links followed in total.

The recursion is counted at line 656, and the call to __do_follow_link at line 659 may itself end up calling do_follow_link again. So once the nesting level has reached 8, line 647 makes the function return -ELOOP.
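
The two limits behave differently, which a small sketch makes visible: the nesting counter is decremented again when a link has been processed, while the total counter only grows during one lookup. The struct and function names are hypothetical; the values 8 and 40 are the ones from the listing.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_MAX_NESTED 8     /* MAX_NESTED_LINKS in the listing */
#define SKETCH_MAX_TOTAL  40    /* the hard-coded 40 in the listing */

struct link_counters {
    int nested;                 /* like current->link_count */
    int total;                  /* like current->total_link_count */
};

/* Enter one symlink: 0 on success, -ELOOP when either limit is hit. */
static int link_enter(struct link_counters *c)
{
    if (c->nested >= SKETCH_MAX_NESTED || c->total >= SKETCH_MAX_TOTAL)
        return -ELOOP;
    c->nested++;
    c->total++;
    return 0;
}

/* Leave one symlink: only the nesting is undone, the total stays. */
static void link_leave(struct link_counters *c)
{
    assert(c->nested > 0);
    c->nested--;
}
```

So a path may contain many symlinks one after another as long as their total stays within 40, but a symlink whose target contains a symlink whose target contains a symlink, and so on, fails once the chain is more than 8 levels deep.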
