Linux kernel source scenario analysis: the device file system devfs

Source: Internet
Author: User

As we have said many times before, managing device files by major/minor device number is fundamentally flawed. This approach, in use since the early days of Unix, complicates the administration of device numbers and has also spoiled the structure of the /dev directory. Every other directory in a Unix/Linux system is hierarchical, but /dev is "flat". This is not merely a matter of style; it directly affects access efficiency and ease of management.

So what should the ideal /dev directory look like? First, it should be hierarchical and tree-like. Second, it should be scalable and not capped by a numeric limit (for example, 256 major device numbers). Third, the contents of /dev should reflect the devices and drivers actually present in the system. For example, a scheme like the following would be ideal:

1. When the system is first powered on, the /dev directory is empty.

2. During initialization the system scans and enumerates all connected devices, much like enumeration on a PCI bus. For each device found, a directory for its driver is created under /dev, and the device's number is used as the name of the lowest-level node, for example "/dev/ide/hd/1" and "/dev/ide/floppy/1".

3. Afterwards, whenever a device is plugged in or a loadable module is installed, the kernel adds one or more nodes to the /dev subtree.

4. Conversely, when a device is shut down or unplugged, or a loadable module is removed, the kernel deletes the corresponding nodes from the /dev subtree.

5. The scheme must remain compatible with the original one.

In fact, the device file system devfs follows the same idea as the special file system /proc.

I. The file system type devfs_fs_type is defined as follows:

static DECLARE_FSTYPE (devfs_fs_type, DEVFS_NAME, devfs_read_super, FS_SINGLE);

After GCC preprocessing it becomes the following definition:

static struct file_system_type devfs_fs_type = {
    name:       "devfs",
    read_super: devfs_read_super,
    fs_flags:   FS_SINGLE,
    owner:      THIS_MODULE,
};
During initialization the system calls init_devfs_fs to set up the special file system devfs, as follows:

int __init init_devfs_fs (void)
{
    int err;

    printk ("%s: v%s Richard Gooch ([email protected])\n",
            DEVFS_NAME, DEVFS_VERSION);
#ifdef CONFIG_DEVFS_DEBUG
    devfs_debug = devfs_debug_init;
    printk ("%s: devfs_debug: 0x%0x\n", DEVFS_NAME, devfs_debug);
#endif
    printk ("%s: boot_options: 0x%0x\n", DEVFS_NAME, boot_options);
    err = register_filesystem (&devfs_fs_type);   // register the file system type devfs_fs_type with the system
    if (!err)
    {
        struct vfsmount *devfs_mnt = kern_mount (&devfs_fs_type);   // initial mount of the special file system devfs
        err = PTR_ERR (devfs_mnt);
        if ( !IS_ERR (devfs_mnt) ) err = 0;
    }
    return err;
}

struct vfsmount *kern_mount(struct file_system_type *type)
{
    kdev_t dev = get_unnamed_dev();
    struct super_block *sb;
    struct vfsmount *mnt;

    if (!dev)
        return ERR_PTR(-EMFILE);
    sb = read_super(dev, NULL, type, 0, NULL, 0);
    if (!sb) {
        put_unnamed_dev(dev);
        return ERR_PTR(-EINVAL);
    }
    mnt = add_vfsmnt(NULL, sb->s_root, NULL);
    if (!mnt) {
        kill_super(sb, 0);
        return ERR_PTR(-ENOMEM);
    }
    type->kern_mnt = mnt;   // the key step
    return mnt;
}

static struct super_block * read_super(kdev_t dev, struct block_device *bdev,
                                       struct file_system_type *type, int flags,
                                       void *data, int silent)
{
    struct super_block * s;

    s = get_empty_super();
    if (!s)
        goto out;
    s->s_dev = dev;
    s->s_bdev = bdev;
    s->s_flags = flags;
    s->s_dirt = 0;
    sema_init(&s->s_vfs_rename_sem, 1);
    sema_init(&s->s_nfsd_free_path_sem, 1);
    s->s_type = type;
    sema_init(&s->s_dquot.dqio_sem, 1);
    sema_init(&s->s_dquot.dqoff_sem, 1);
    s->s_dquot.flags = 0;
    lock_super(s);
    if (!type->read_super(s, data, silent))   // for devfs this is devfs_read_super
        goto out_fail;
    unlock_super(s);
    /* tell bdcache that we are going to keep this one */
    if (bdev)
        atomic_inc(&bdev->bd_count);
out:
    return s;

out_fail:
    s->s_dev = 0;
    s->s_bdev = 0;
    s->s_type = NULL;
    unlock_super(s);
    return NULL;
}

static struct super_block *devfs_read_super (struct super_block *sb,
                                             void *data, int silent)
{
    struct inode *root_inode = NULL;

    if (get_root_entry () == NULL) goto out_no_root;
    atomic_set (&fs_info.devfsd_overrun_count, 0);
    init_waitqueue_head (&fs_info.devfsd_wait_queue);
    init_waitqueue_head (&fs_info.revalidate_wait_queue);
    fs_info.sb = sb;
    sb->u.generic_sbp = &fs_info;
    sb->s_blocksize = 1024;
    sb->s_blocksize_bits = 10;
    sb->s_magic = DEVFS_SUPER_MAGIC;
    sb->s_op = &devfs_sops;
    if ( ( root_inode = get_vfs_inode (sb, root_entry, NULL) ) == NULL )   // create an inode structure for the devfs root node
        goto out_no_root;
    sb->s_root = d_alloc_root (root_inode);   // create a dentry structure for the devfs root node, named "/", and save it in sb->s_root
    if (!sb->s_root) goto out_no_root;
#ifdef CONFIG_DEVFS_DEBUG
    if (devfs_debug & DEBUG_DISABLED)
        printk ("%s: read super, made devfs ptr: %p\n",
                DEVFS_NAME, sb->u.generic_sbp);
#endif
    return sb;

out_no_root:
    printk ("devfs_read_super: get root inode failed\n");
    if (root_inode) iput (root_inode);
    return NULL;
}

Like /proc, devfs has no counterpart on disk. A regular file system keeps tree-structured directory entries and inodes on disk; devfs has none, so it builds a tree-like data structure in memory instead. For devfs this data structure is devfs_entry, defined as follows:

struct devfs_entry
{
    void *info;
    union
    {
        struct directory_type dir;
        struct fcb_type fcb;
        struct symlink_type symlink;
        struct fifo_type fifo;
    }
    u;
    struct devfs_entry *prev;    /*  Previous entry in the parent directory  */
    struct devfs_entry *next;    /*  Next entry in the parent directory      */
    struct devfs_entry *parent;  /*  The parent directory                    */
    struct devfs_entry *slave;   /*  Another entry to unregister             */
    struct devfs_inode inode;
    umode_t mode;
    unsigned short namelen;  /*  I think 64k+ filenames are a way off...  */
    unsigned char registered:1;
    unsigned char show_unreg:1;
    unsigned char hide:1;
    unsigned char no_persistence:1;
    char name[1];            /*  This is just a dummy: the allocated array is
                                 bigger. This is NULL-terminated  */
};

The character array name[] in the structure holds the node name; its actual size is determined by the length of the specific string when space for a given structure is allocated. The devfs_entry structures are linked into a tree through the pointers prev, next, parent and slave, implementing a file system subtree. As the definition shows, there are four types of node in the devfs subtree. The first is dir, a directory, which needs no explanation. The second is fcb, the "file control block", which is a leaf node of the devfs subtree. The third is symlink, which is also self-explanatory. The last is fifo, used exclusively for pipe files. Depending on the node type, the union u in the devfs_entry structure is interpreted as a different data structure.
struct file_type
{
    unsigned long size;
};

struct device_type
{
    unsigned short major;
    unsigned short minor;
};

struct fcb_type  /*  File, char, block type  */
{
    uid_t default_uid;
    gid_t default_gid;
    void *ops;
    union
    {
        struct file_type file;
        struct device_type device;
    }
    u;
    unsigned char auto_owner:1;
    unsigned char aopen_notify:1;
    unsigned char removable:1;  /*  Belongs in device_type, but save space   */
    unsigned char open:1;       /*  Not entirely correct                     */
};

As can be seen, devfs has two kinds of leaf node: files and devices. For devices, the fcb_type structure provides 16-bit major and minor device numbers. In devfs, however, the major device number has no fixed correspondence to a particular driver; it is allocated dynamically.

The kernel also keeps a data structure named fs_info, defined as follows:

struct fs_info      /*  This structure is for each mounted devfs  */
{
    unsigned int num_inodes;    /*  Number of inodes created         */
    unsigned int table_size;    // current size of the array
    struct devfs_entry **table; // points to an array of devfs_entry pointers
    struct super_block *sb;
    volatile struct devfsd_buf_entry *devfsd_buffer;
    volatile unsigned int devfsd_buf_in;
    volatile unsigned int devfsd_buf_out;
    volatile int devfsd_sleeping;
    volatile int devfsd_buffer_in_use;
    volatile struct task_struct *devfsd_task;
    volatile struct file *devfsd_file;
    volatile unsigned long devfsd_event_mask;
    atomic_t devfsd_overrun_count;
    wait_queue_head_t devfsd_wait_queue;
    wait_queue_head_t revalidate_wait_queue;
};

With these data structures described, we return to devfs_read_super and look at get_root_entry next. The code is as follows:

static struct devfs_entry *get_root_entry (void)
{
    struct devfs_entry *new;

    /*  Always ensure the root is created  */
    if (root_entry != NULL) return root_entry;
    if ( ( root_entry = create_entry (NULL, NULL, 0) ) == NULL ) return NULL;   // first create the devfs root node root_entry
    root_entry->registered = TRUE;
    root_entry->mode = S_IFDIR;
    /*  Force an inode update, because lookup() is never done for the root  */
    update_devfs_inode_from_entry (root_entry);
    /*  And create the entry for ".devfsd"  */
    if ( ( new = create_entry (root_entry, ".devfsd", 0) ) == NULL )   // create a node ".devfsd" under the devfs root node
        return NULL;
    new->registered = TRUE;
    // .devfsd is an FCB node. In devfs the device number is assigned
    // automatically by the system and no longer plays the important role
    // it had in the original scheme.
    new->u.fcb.u.device.major = next_devnum_char >> 8;
    new->u.fcb.u.device.minor = next_devnum_char & 0xff;
    ++next_devnum_char;
    new->mode = S_IFCHR | S_IRUSR | S_IWUSR;
    new->u.fcb.default_uid = 0;
    new->u.fcb.default_gid = 0;
    new->u.fcb.ops = &devfsd_fops;   // ops points to devfsd_fops, shown below
    return root_entry;
}

static struct devfs_entry *create_entry (struct devfs_entry *parent,
                                         const char *name, unsigned int namelen)
{
    struct devfs_entry *new, **table;

    /*  First ensure table size is enough  */
    if (fs_info.num_inodes >= fs_info.table_size)   // on the first call num_inodes is 0 and table_size is 0
    {
        if ( ( table = kmalloc (sizeof *table *
                                (fs_info.table_size + INODE_TABLE_INC),
                                GFP_KERNEL) ) == NULL ) return NULL;   // the first allocation makes room for 250 pointers
        fs_info.table_size += INODE_TABLE_INC;   // INODE_TABLE_INC is 250
#ifdef CONFIG_DEVFS_DEBUG
        if (devfs_debug & DEBUG_I_CREATE)
            printk ("%s: create_entry(): grew inode table to: %u entries\n",
                    DEVFS_NAME, fs_info.table_size);
#endif
        if (fs_info.table)   // not the first allocation: copy the existing array to the new space and free the old one
        {
            memcpy (table, fs_info.table, sizeof *table * fs_info.num_inodes);
            kfree (fs_info.table);
        }
        fs_info.table = table;   // points to the array of devfs_entry pointers
    }
    if ( name && (namelen < 1) ) namelen = strlen (name);
    if ( ( new = kmalloc (sizeof *new + namelen, GFP_KERNEL) ) == NULL )
        return NULL;
    /*  Magic: this will set the ctime to zero, thus subsequent lookups will
        trigger the call to <update_devfs_inode_from_entry>  */
    memset (new, 0, sizeof *new + namelen);
    new->parent = parent;
    if (name) memcpy (new->name, name, namelen);
    new->namelen = namelen;
    new->inode.ino = fs_info.num_inodes + FIRST_INODE;   // FIRST_INODE is 1; the inode number grows with each new entry
    new->inode.nlink = 1;
    fs_info.table[fs_info.num_inodes] = new;   // record the entry in the devfs_entry pointer array
    ++fs_info.num_inodes;
    if (parent == NULL) return new;
    new->prev = parent->u.dir.last;   // link the new node to its parent and to the other nodes in the same directory
    /*  Insert into the parent directory's list of children  */
    if (parent->u.dir.first == NULL) parent->u.dir.first = new;
    else parent->u.dir.last->next = new;
    parent->u.dir.last = new;
    return new;
}

static struct file_operations devfsd_fops =
{
    read:    devfsd_read,
    ioctl:   devfsd_ioctl,
    release: devfsd_close,
};


Back in devfs_read_super, execution continues with get_vfs_inode, which creates an inode data structure for the devfs root node. The code is as follows:

static struct inode *get_vfs_inode (struct super_block *sb,
                                    struct devfs_entry *de,
                                    struct dentry *dentry)
{
    struct inode *inode;

    if (de->inode.dentry != NULL)
    {
        printk ("%s: get_vfs_inode(%u): old de->inode.dentry: %p \"%s\"  new dentry: %p \"%s\"\n",
                DEVFS_NAME, de->inode.ino,
                de->inode.dentry, de->inode.dentry->d_name.name,
                dentry, dentry->d_name.name);
        printk ("  old inode: %p\n", de->inode.dentry->d_inode);
        return NULL;
    }
    if ( ( inode = iget (sb, de->inode.ino) ) == NULL ) return NULL;   // look in the inode hash queue first; create a new inode if none is found
    de->inode.dentry = dentry;
    return inode;
}

d_alloc_root creates a dentry data structure for the devfs root node, named "/". The code is as follows:

struct dentry * d_alloc_root(struct inode * root_inode)
{
    struct dentry *res = NULL;

    if (root_inode) {
        res = d_alloc(NULL, &(const struct qstr) { "/", 1, 0 });
        if (res) {
            res->d_sb = root_inode->i_sb;
            res->d_parent = res;
            d_instantiate(res, root_inode);
        }
    }
    return res;
}

Finally, we return to kern_mount and continue with add_vfsmnt. The code is as follows:

static struct vfsmount *add_vfsmnt(struct nameidata *nd,
                                   struct dentry *root,
                                   const char *dev_name)
{
    struct vfsmount *mnt;
    struct super_block *sb = root->d_inode->i_sb;
    char *name;

    mnt = kmalloc(sizeof(struct vfsmount), GFP_KERNEL);
    if (!mnt)
        goto out;
    memset(mnt, 0, sizeof(struct vfsmount));

    if (nd || dev_name)
        mnt->mnt_flags = MNT_VISIBLE;

    /* It may be NULL, but who cares? */
    if (dev_name) {
        name = kmalloc(strlen(dev_name)+1, GFP_KERNEL);
        if (name) {
            strcpy(name, dev_name);
            mnt->mnt_devname = name;
        }
    }
    mnt->mnt_owner = current->uid;
    atomic_set(&mnt->mnt_count, 1);
    mnt->mnt_sb = sb;

    spin_lock(&dcache_lock);
    if (nd && !IS_ROOT(nd->dentry) && d_unhashed(nd->dentry))
        goto fail;
    mnt->mnt_root = dget(root);
    mnt->mnt_mountpoint = nd ? dget(nd->dentry) : dget(root);
    mnt->mnt_parent = nd ? mntget(nd->mnt) : mnt;

    if (nd) {
        list_add(&mnt->mnt_child, &nd->mnt->mnt_mounts);
        list_add(&mnt->mnt_clash, &nd->dentry->d_vfsmnt);
    } else {
        INIT_LIST_HEAD(&mnt->mnt_child);
        INIT_LIST_HEAD(&mnt->mnt_clash);
    }
    INIT_LIST_HEAD(&mnt->mnt_mounts);
    list_add(&mnt->mnt_instances, &sb->s_mounts);
    list_add(&mnt->mnt_list, vfsmntlist.prev);
    spin_unlock(&dcache_lock);
out:
    return mnt;
fail:
    spin_unlock(&dcache_lock);
    if (mnt->mnt_devname)
        kfree(mnt->mnt_devname);
    kfree(mnt);
    return NULL;
}


II. After the system's root file system has been mounted, system initialization also calls mount_devfs_fs to mount devfs a second time, this time at a visible mount point:

void __init mount_devfs_fs (void)
{
    int err;

    if ( (boot_options & OPTION_NOMOUNT) ) return;
    err = do_mount ("none", "/dev", "devfs", 0, "");
    if (err == 0) printk ("Mounted devfs on /dev\n");
    else printk ("Warning: unable to mount devfs, err: %d\n", err);
}

As can be seen, the mount point of the device file system devfs is "/dev". Because the FS_SINGLE flag in devfs_fs_type is set, what gets mounted on the "/dev" node is the instance reachable through the vfsmount pointer that kern_mount saved in devfs_fs_type, not a newly created one.

For do_mount, refer to the articles "Linux kernel source scenario analysis: file system installation" and "Linux kernel source scenario analysis: the special file system /proc".


III. Once devfs has been mounted, the drivers of the various devices can register with devfs through devfs_register_chrdev or devfs_register_blkdev and create the corresponding nodes under the /dev directory.

1. Register a class of devices with devfs through devfs_register_chrdev, supplying the major device number, the device name and the file_operations data structure, and so establishing the relationship among the three. The major device number can be specified statically, or devfs can be asked to allocate it dynamically.

2. Create a directory node through devfs_mk_dir.

3. Register the specific device through devfs_register, which creates a leaf node under the specified directory.

These three steps resemble what the special file system /proc does when it creates child nodes of type proc_dir_entry under the proc root node, as in the following code:

void __init proc_root_init(void)
{
    proc_misc_init();
    proc_net = proc_mkdir("net", 0);
#ifdef CONFIG_SYSVIPC
    proc_mkdir("sysvipc", 0);
#endif
#ifdef CONFIG_SYSCTL
    proc_sys_root = proc_mkdir("sys", 0);
#endif
    proc_root_fs = proc_mkdir("fs", 0);
    proc_root_driver = proc_mkdir("driver", 0);
#if defined(CONFIG_SUN_OPENPROMFS) || defined(CONFIG_SUN_OPENPROMFS_MODULE)
    /* just give it a mountpoint */
    proc_mkdir("openprom", 0);
#endif
    proc_tty_init();
#ifdef CONFIG_PROC_DEVICETREE
    proc_device_tree_init();
#endif
    proc_bus = proc_mkdir("bus", 0);
}

After these three steps are complete, the specific devices can be accessed through paths such as "/dev/ide/hd/1" and "/dev/ide/floppy/1" (the corresponding files and directories are visible from the shell). Operating on them is no different from operating on the original "/dev/hd1" and "/dev/floppy1".
