Detailed description of the new file system change notification mechanism in the Linux 2.6 kernel

This article describes inotify, the file system change notification mechanism introduced in the Linux 2.6.13 kernel, and illustrates its use and typical applications.

I. Introduction

Compared with Mac OS or Windows, the Linux desktop still leaves much to be desired. To improve this situation, the open-source community proposed that the kernel provide mechanisms that let user space learn promptly what is happening in the kernel or the underlying hardware, so that it can manage devices better and offer users better service. Hotplug, udev, and inotify all grew out of this need. Hotplug is a kernel mechanism that notifies user-space applications of device hot-plug events, which the desktop can use to manage devices effectively. Udev dynamically maintains the device files under /dev. Inotify is a file system change notification mechanism: events such as file creation and deletion are reported to user space immediately. It was driven by Beagle, the well-known desktop search engine project, and is used in projects such as Gamin.

Before inotify there was a similar mechanism called dnotify, but it has many defects:

1. Every monitored directory requires an open file descriptor, so monitoring many directories consumes many descriptors. Worse, if a monitored directory is on removable media (such as a CD or USB disk), the file system cannot be unmounted, because the descriptor held by the dnotify application keeps it busy.

2. Dnotify is directory-based: it only reports directory change events. A change to a file does trigger a change event on its parent directory, but to learn which file changed, the application has to cache a large amount of stat data and compare it.

3. The dnotify interface is unfriendly: it delivers notifications through signals.

Inotify was designed to replace dnotify. It overcomes these defects and provides a simpler yet more powerful file change notification mechanism:

1. Inotify does not need an open file descriptor for each monitored target. If the target is on removable media and the file system is unmounted, the watch on the target is removed automatically and an unmount event is generated.

2. Inotify can monitor both files and directories.

3. Inotify delivers file system events through system calls rather than SIGIO.

4. Inotify uses a file descriptor as its interface, so the usual I/O multiplexing calls select and poll can be used to wait for file system changes.

Inotify can monitor the following file system events:

  • IN_ACCESS: the file was accessed (read).
  • IN_MODIFY: the file was written.
  • IN_ATTRIB: file attributes were changed, for example by chmod, chown, or touch.
  • IN_CLOSE_WRITE: a file opened for writing was closed.
  • IN_CLOSE_NOWRITE: a file not opened for writing was closed.
  • IN_OPEN: the file was opened.
  • IN_MOVED_FROM: a file was moved out of the watched directory, for example by mv.
  • IN_MOVED_TO: a file was moved into the watched directory, for example by mv.
  • IN_CREATE: a new file was created in the watched directory.
  • IN_DELETE: a file was deleted from the watched directory, for example by rm.
  • IN_DELETE_SELF: the watched file or directory itself was deleted.
  • IN_MOVE_SELF: the watched file or directory itself was moved.
  • IN_UNMOUNT: the file system containing the watched object was unmounted.
  • IN_CLOSE: the file was closed; equivalent to (IN_CLOSE_WRITE | IN_CLOSE_NOWRITE).
  • IN_MOVE: the file was moved; equivalent to (IN_MOVED_FROM | IN_MOVED_TO).

Note: The files mentioned above also include directories.
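Because each event is a single bit, a mask can be decoded by testing it against each constant in turn. The following is a minimal sketch of that idea, assuming a current glibc that provides the IN_* constants in <sys/inotify.h>; the helper name describe_mask and its table are hypothetical and not part of any inotify API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/inotify.h>

/* One entry per single-bit event from the list above. */
static const struct {
    uint32_t bit;
    const char *name;
} mask_names[] = {
    { IN_ACCESS,        "IN_ACCESS" },
    { IN_MODIFY,        "IN_MODIFY" },
    { IN_ATTRIB,        "IN_ATTRIB" },
    { IN_CLOSE_WRITE,   "IN_CLOSE_WRITE" },
    { IN_CLOSE_NOWRITE, "IN_CLOSE_NOWRITE" },
    { IN_OPEN,          "IN_OPEN" },
    { IN_MOVED_FROM,    "IN_MOVED_FROM" },
    { IN_MOVED_TO,      "IN_MOVED_TO" },
    { IN_CREATE,        "IN_CREATE" },
    { IN_DELETE,        "IN_DELETE" },
    { IN_DELETE_SELF,   "IN_DELETE_SELF" },
    { IN_MOVE_SELF,     "IN_MOVE_SELF" },
    { IN_UNMOUNT,       "IN_UNMOUNT" },
};

/*
 * Hypothetical helper: write the names of the event bits set in `mask`
 * into `out`, joined by '|'. Stops at the first name that does not fit.
 * Returns the number of names written.
 */
static int describe_mask(uint32_t mask, char *out, size_t outlen)
{
    size_t used = 0;
    int written = 0;

    assert(out != NULL && outlen > 0);
    out[0] = '\0';

    for (size_t i = 0; i < sizeof(mask_names) / sizeof(mask_names[0]); i++) {
        if (!(mask & mask_names[i].bit))
            continue;
        size_t n = strlen(mask_names[i].name);
        size_t sep = written ? 1 : 0;
        if (used + sep + n + 1 > outlen)
            break;                      /* no room for this name plus NUL */
        if (sep)
            out[used++] = '|';
        memcpy(out + used, mask_names[i].name, n + 1);  /* copies the NUL too */
        used += n;
        written++;
    }
    return written;
}
```

Passing the combined mask IN_CLOSE to this helper yields both IN_CLOSE_WRITE and IN_CLOSE_NOWRITE, which shows that the two "equivalent to" entries in the list are simply unions of two bits.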

II. User Interfaces

In user space, inotify is used through three system calls plus ordinary file I/O operations on the returned file descriptor. The first step is to create an inotify instance:

                 int fd = inotify_init ();         

Each inotify instance corresponds to an independent ordered queue.

File system change events are managed through objects called watches. Each watch is a pair (target, event mask): the target can be a file or a directory, and the event mask selects the inotify events the application cares about, with each bit corresponding to one event. A watch is referenced by its watch descriptor and is added by the path name of a file or directory. A watch on a directory also reports events that occur on all files in that directory.

The following function is used to add a watch:

                 int wd = inotify_add_watch (fd, path, mask);         

fd is the file descriptor returned by inotify_init(), path is the path name of the monitored target (a file or directory name), and mask is the event mask; the header file linux/inotify.h defines the event represented by each bit. Calling the same function again on an already watched path modifies its event mask, that is, changes which inotify events are reported. wd is the watch descriptor.

The following function is used to delete a watch:

         int ret = inotify_rm_watch (fd, wd);         

Fd is the file descriptor returned by inotify_init (), and wd is the watch descriptor returned by inotify_add_watch. Ret is the return value of the function.
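The three calls fit together as a simple lifecycle. The sketch below is illustrative only and assumes a current glibc, which declares inotify_init, inotify_add_watch, and inotify_rm_watch in <sys/inotify.h>; the function name watch_lifecycle is hypothetical.

```c
#include <assert.h>
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Hypothetical helper: run the full watch lifecycle on `path`:
 * create an instance, add a watch, change its mask, remove it, close.
 * Returns 0 if every step behaved as expected, -1 otherwise.
 */
static int watch_lifecycle(const char *path)
{
    assert(path != NULL);

    int fd = inotify_init();
    if (fd < 0)
        return -1;

    int wd = inotify_add_watch(fd, path, IN_MODIFY);
    if (wd < 0) {
        close(fd);
        return -1;
    }

    /* Adding the same path again updates the mask and returns the same wd. */
    int wd2 = inotify_add_watch(fd, path, IN_MODIFY | IN_ATTRIB);
    int ok = (wd2 == wd);

    if (inotify_rm_watch(fd, wd) != 0)
        ok = 0;

    /* close() would also have dropped any watch still attached to fd. */
    if (close(fd) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

For example, watch_lifecycle(".") exercises all four steps on the current directory, while a path that does not exist fails at the inotify_add_watch step.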

File events are represented by an inotify_event structure and are obtained by calling the ordinary file read function on the file descriptor returned by inotify_init():

        struct inotify_event {
                __s32           wd;             /* watch descriptor */
                __u32           mask;           /* watch mask */
                __u32           cookie;         /* cookie to synchronize two events */
                __u32           len;            /* length (including nulls) of name */
                char            name[0];        /* stub for possible name */
        };

In this structure, wd is the watch descriptor of the monitored target, mask is the event mask, and len is the length of the name string. When the watch is on a directory, name is the name of the file inside that directory that the event refers to. The name field is only a stub that gives the user a convenient way to reference the file name: the name itself is variable-length and follows the structure in memory. It is padded with zero bytes so that the next event structure is 4-byte aligned, and len includes these padding bytes.

A single read call can return multiple events, as long as the supplied buffer is large enough.

                 ssize_t len = read (fd, buf, BUF_LEN);

buf is a buffer that receives a sequence of inotify_event structures, and BUF_LEN is the maximum number of bytes to read, so buf must be at least BUF_LEN bytes long. How many events the call returns depends on BUF_LEN and on the lengths of the file names in the events. len is the number of bytes actually read, that is, the total length of the events obtained.
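Because each record is a fixed header followed by len bytes of padded name, the buffer returned by read has to be walked record by record rather than indexed as an array. The following sketch shows one way to do that, assuming the glibc header <sys/inotify.h>; the names walk_events, event_fn, and print_event are hypothetical. The buffer is assumed to be suitably aligned for struct inotify_event.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/inotify.h>

/* Hypothetical callback type: invoked once per event found in the buffer. */
typedef void (*event_fn)(const struct inotify_event *ev, void *ctx);

/*
 * Walk `len` bytes previously filled by read() on an inotify descriptor.
 * Each record is sizeof(struct inotify_event) bytes followed by ev->len
 * bytes of NUL-padded name. Returns the number of complete events seen.
 */
static size_t walk_events(const char *buf, size_t len, event_fn fn, void *ctx)
{
    size_t off = 0, count = 0;

    assert(buf != NULL || len == 0);
    while (len - off >= sizeof(struct inotify_event)) {
        const struct inotify_event *ev =
            (const struct inotify_event *)(buf + off);
        size_t rec = sizeof(*ev) + ev->len;   /* header + padded name */
        if (rec > len - off)
            break;                            /* incomplete trailing record */
        if (fn)
            fn(ev, ctx);
        off += rec;
        count++;
    }
    return count;
}

/* Example callback: print one line per event to the FILE passed as ctx. */
static void print_event(const struct inotify_event *ev, void *ctx)
{
    fprintf((FILE *)ctx, "wd=%d mask=%08x name=%s\n",
            ev->wd, (unsigned)ev->mask, ev->len ? ev->name : "(none)");
}
```

A caller would typically pass the byte count returned by read as len, for example walk_events(buf, (size_t)n, print_event, stdout) after a successful read of n bytes.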

You can use select() or poll() on the file descriptor fd returned by inotify_init(), and the ioctl command FIONREAD on fd returns the number of bytes currently queued. close(fd) removes all watches added to fd and performs the necessary cleanup.

The prototypes of the three system calls are:

        int inotify_init (void);
        int inotify_add_watch (int fd, const char *path, __u32 mask);
        int inotify_rm_watch (int fd, __u32 wd);
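The poll() and FIONREAD usage mentioned above can be sketched as follows. This is an illustration under assumptions, not a reference implementation: it uses the glibc wrappers from <sys/inotify.h>, the helper names wait_for_events, pending_bytes, and demo_poll are hypothetical, and the scratch directory under /tmp is an arbitrary choice.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Wait up to timeout_ms for events: 1 = readable, 0 = timeout, -1 = error. */
static int wait_for_events(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc < 0)
        return -1;
    return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Number of bytes currently queued on fd, or -1 on error. */
static int pending_bytes(int fd)
{
    int n = 0;

    assert(fd >= 0);
    return ioctl(fd, FIONREAD, &n) < 0 ? -1 : n;
}

/*
 * Watch a scratch directory for IN_CREATE, check that nothing is pending,
 * create a file, then check that poll() and FIONREAD both see the event.
 * Returns 0 if all three observations match, -1 otherwise.
 */
static int demo_poll(void)
{
    char dir[64], file[96];
    int result = -1;

    snprintf(dir, sizeof dir, "/tmp/inotify-poll-%ld", (long)getpid());
    snprintf(file, sizeof file, "%s/a", dir);
    if (mkdir(dir, 0700) != 0 && errno != EEXIST)
        return -1;

    int fd = inotify_init();
    if (fd >= 0 && inotify_add_watch(fd, dir, IN_CREATE) >= 0) {
        int idle = wait_for_events(fd, 0);        /* nothing has happened yet */
        int cfd = open(file, O_CREAT | O_WRONLY, 0600);
        if (cfd >= 0)
            close(cfd);
        int ready = wait_for_events(fd, 1000);
        int bytes = pending_bytes(fd);
        if (idle == 0 && ready == 1 &&
            bytes >= (int)sizeof(struct inotify_event))
            result = 0;
    }
    if (fd >= 0)
        close(fd);
    unlink(file);
    rmdir(dir);
    return result;
}
```

Using poll() with a timeout lets a program combine inotify with other descriptors in one event loop instead of blocking in read.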

III. Kernel Implementation Mechanism

In the kernel, each inotify instance corresponds to an inotify_device structure:

        struct inotify_device {
                wait_queue_head_t       wq;             /* wait queue for i/o */
                struct idr              idr;            /* idr mapping wd -> watch */
                struct semaphore        sem;            /* protects this bad boy */
                struct list_head        events;         /* list of queued events */
                struct list_head        watches;        /* list of watches */
                atomic_t                count;          /* reference count */
                struct user_struct      *user;          /* user who opened this dev */
                unsigned int            queue_size;     /* size of the queue (bytes) */
                unsigned int            event_count;    /* number of pending events */
                unsigned int            max_events;     /* maximum number of events */
                u32                     last_wd;        /* the last wd allocated */
        };

wq is a wait queue; a process blocked in a read call sleeps on it. idr maps watch descriptors to the corresponding inotify_watch. sem synchronizes access to the structure. events is the list of events that have occurred on this inotify instance: every event monitored by the instance is inserted into this list when it occurs. watches is the list of watches belonging to the instance; inotify_add_watch inserts each newly added watch into it. count is the reference count, and user describes the user who created the instance. queue_size is the size of the instance's event queue in bytes, event_count is the number of events in the events list, max_events is the maximum number of events allowed, and last_wd is the most recently allocated watch descriptor.

Each watch corresponds to an inotify_watch structure:

        struct inotify_watch {
                struct list_head        d_list; /* entry in inotify_device's list */
                struct list_head        i_list; /* entry in inode's list */
                atomic_t                count;  /* reference count */
                struct inotify_device   *dev;   /* associated device */
                struct inode            *inode; /* associated inode */
                s32                     wd;     /* watch descriptor */
                u32                     mask;   /* event mask for this watch */
        };

d_list links the watch into the watch list of its inotify_device, and i_list links it into the watch list of the monitored inode. count is the reference count. dev points to the inotify_device of the inotify instance the watch belongs to, and inode points to the inode being monitored. wd is the descriptor assigned to the watch, and mask is its event mask, indicating which file system events it is interested in.

The inotify_device structure is created when user space calls inotify_init() and released when the file descriptor returned by inotify_init() is closed. The inotify_watch structure is created when user space calls inotify_add_watch() and released when user space calls inotify_rm_watch() or close(fd).

Both directories and files are represented by an inode structure in the kernel. The inotify system adds two fields to the inode structure:

        #ifdef CONFIG_INOTIFY
                struct list_head inotify_watches; /* watches on this inode */
                struct semaphore inotify_sem;     /* protects the watches list */
        #endif

inotify_watches is the list of watches on the monitored target. Each time user space calls inotify_add_watch(), the kernel creates an inotify_watch structure for the new watch and inserts it into the inotify_watches list of the target's inode. inotify_sem synchronizes access to the inotify_watches list. When one of the events listed in the first section occurs in the file system, the corresponding file system code explicitly calls an fsnotify_* function to report the event to the inotify system, where * is the event name. The current implementation includes:

  • fsnotify_move: a file was moved from one directory to another.
  • fsnotify_nameremove: a file was removed from a directory.
  • fsnotify_inoderemove: the inode itself was deleted.
  • fsnotify_create: a new file was created.
  • fsnotify_mkdir: a new directory was created.
  • fsnotify_access: a file was read.
  • fsnotify_modify: a file was written.
  • fsnotify_open: a file was opened.
  • fsnotify_close: a file was closed.
  • fsnotify_xattr: an extended attribute of a file was modified.
  • fsnotify_change: a file's attributes or metadata were modified.

One exception is inotify_unmount_inodes, which is called when a file system is unmounted to report the unmount event to the inotify system.

All of the notification functions above eventually call inotify_inode_queue_event (inotify_unmount_inodes calls inotify_dev_queue_event directly). This function first checks whether the inode is monitored at all by testing whether its inotify_watches list is empty; if the inode is not monitored, it returns immediately without doing anything. Otherwise it traverses the inotify_watches list and, for each watch whose mask includes the current event, calls inotify_dev_queue_event; watches that are not interested in the event are skipped.

inotify_dev_queue_event first checks whether the event duplicates the previous event; if so, it discards the event and returns. Otherwise it checks whether the event queue of the inotify instance (that is, of the inotify_device) has overflowed. If it has, an overflow event is generated; otherwise the file operation event itself is generated. Events are constructed by kernel_event, which creates an inotify_kernel_event structure. That structure is inserted into the events list of the corresponding inotify_device, and the wait queue wq in the inotify_device is then woken up. A user-space process that calls read on the inotify instance (that is, on the file descriptor returned by inotify_init()) when no event is pending sleeps on the wq wait queue.
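The queueing decisions described above (drop a duplicate of the previous event, then check for overflow) can be modelled in user space. The following is a toy model written for illustration only: it is not kernel code, every name in it (toy_device, toy_event, toy_queue_event, TOY_MAX_EVENTS) is hypothetical, and it ignores file names, locking, and wake-ups.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_EVENTS 4          /* hypothetical limit, stands in for max_events */
#define TOY_Q_OVERFLOW 0x4000u    /* same bit value as IN_Q_OVERFLOW */

struct toy_event {
    int      wd;
    uint32_t mask;
    uint32_t cookie;
};

struct toy_device {
    struct toy_event events[TOY_MAX_EVENTS + 1]; /* +1 slot for the overflow marker */
    unsigned int     event_count;
};

/*
 * Toy model of the queueing decision.
 * Returns 1 if something was appended to the queue (the event itself or
 * the overflow marker), 0 if the event was dropped.
 */
static int toy_queue_event(struct toy_device *dev, int wd,
                           uint32_t mask, uint32_t cookie)
{
    assert(dev != NULL);

    /* 1. Coalescing: drop the event if it repeats the last queued one. */
    if (dev->event_count > 0) {
        const struct toy_event *last = &dev->events[dev->event_count - 1];
        if (last->wd == wd && last->mask == mask && last->cookie == cookie)
            return 0;
    }

    /* 2. The overflow marker is already queued: drop everything else. */
    if (dev->event_count > TOY_MAX_EVENTS)
        return 0;

    /* 3. The queue is full: append one overflow marker instead of the event. */
    if (dev->event_count == TOY_MAX_EVENTS) {
        dev->events[dev->event_count++] =
            (struct toy_event){ -1, TOY_Q_OVERFLOW, 0 };
        return 1;
    }

    /* 4. Normal case: append the event. */
    dev->events[dev->event_count++] = (struct toy_event){ wd, mask, cookie };
    return 1;
}
```

The model makes two consequences visible: only an immediate repeat is coalesced, so the same event can still appear twice if another event arrives in between, and once the queue is full the reader sees a single overflow marker rather than the events that were lost.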

IV. Example

The following is an example of using inotify to monitor File System Events:

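        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <unistd.h>
        #include <errno.h>              /* errno is set by the _syscallN stubs */
        #include <linux/unistd.h>       /* _syscallN and __NR_inotify_* */
        #include <linux/inotify.h>      /* struct inotify_event and IN_* masks */

        _syscall0(int, inotify_init)
        _syscall3(int, inotify_add_watch, int, fd, const char *, path, __u32, mask)
        _syscall2(int, inotify_rm_watch, int, fd, __u32, wd)

        char * monitored_files[] = {
                "./tmp_file",
                "./tmp_dir",
                "/mnt/sda3/windows_file"
        };

        struct wd_name {
                int wd;
                char * name;
        };

        #define WD_NUM 3
        struct wd_name wd_array[WD_NUM];

        /* Indexed by bit position in the event mask; bit 12 is unused. */
        char * event_array[] = {
                "File was accessed",
                "File was modified",
                "File attributes were changed",
                "writtable file closed",
                "Unwrittable file closed",
                "File was opened",
                "File was moved from X",
                "File was moved to Y",
                "Subfile was created",
                "Subfile was deleted",
                "Self was deleted",
                "Self was moved",
                "",
                "Backing fs was unmounted",
                "Event queued overflowed",
                "File was ignored"
        };
        #define EVENT_NUM 16
        #define MAX_BUF_SIZE 1024

        int main(void)
        {
                int fd;
                int wd;
                char buffer[MAX_BUF_SIZE];
                char * offset = NULL;
                struct inotify_event * event;
                int len, tmp_len;
                char strbuf[16];
                int i = 0;

                fd = inotify_init();
                if (fd < 0) {
                        printf("Fail to initialize inotify.\n");
                        exit(-1);
                }

                /* Add one watch per monitored path and remember its descriptor. */
                for (i=0; i<WD_NUM; i++) {
                        wd_array[i].name = monitored_files[i];
                        wd = inotify_add_watch(fd, wd_array[i].name, IN_ALL_EVENTS);
                        if (wd < 0) {
                                printf("Can't add watch for %s.\n", wd_array[i].name);
                                exit(-1);
                        }
                        wd_array[i].wd = wd;
                }

                /* Each read may return several events packed back to back. */
                while ((len = read(fd, buffer, MAX_BUF_SIZE)) > 0) {
                        offset = buffer;
                        printf("Some event happens, len = %d.\n", len);
                        event = (struct inotify_event *)buffer;
                        while (((char *)event - buffer) < len) {
                                if (event->mask & IN_ISDIR) {
                                        memcpy(strbuf, "Directory", 10);
                                }
                                else {
                                        memcpy(strbuf, "File", 5);
                                }
                                printf("Object type: %s\n", strbuf);
                                for (i=0; i<WD_NUM; i++) {
                                        if (event->wd != wd_array[i].wd) continue;
                                        printf("Object name: %s\n", wd_array[i].name);
                                        break;
                                }
                                printf("Event mask: %08X\n", event->mask);
                                for (i=0; i<EVENT_NUM; i++) {
                                        if (event_array[i][0] == '\0') continue;
                                        if (event->mask & (1<<i)) {
                                                printf("Event: %s\n", event_array[i]);
                                        }
                                }
                                /* Step over the header plus the padded name. */
                                tmp_len = sizeof(struct inotify_event) + event->len;
                                event = (struct inotify_event *)(offset + tmp_len);
                                offset += tmp_len;
                        }
                }
                return 0;
        }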

This program monitors all file system events on tmp_dir and tmp_file in the current directory, and also on the file /mnt/sda3/windows_file. Here /mnt/sda3 is the mount point of partition 3 of a SATA hard disk.

Careful readers may notice that the program declares the inotify system calls with _syscallN. These system calls were introduced in 2.6.13, the latest stable kernel at the time of writing, and glibc does not yet provide library wrappers for them. To use them in a program, you must therefore declare them through _syscallN, where N is the number of parameters of the system call. Note that the system header files must match the running kernel. For the program above to compile, the kernel header directories (include/linux/*, include/asm/*, and include/asm-generic/*) must be in the header search path, and preferably the first path searched, because _syscallN needs linux/unistd.h and asm/unistd.h from these directories. They contain the system call numbers __NR_inotify_init, __NR_inotify_add_watch, and __NR_inotify_rm_watch of the three inotify system calls.

Therefore, to compile the program, you only need to copy the header files of the compiled kernel into the directory containing the program and compile with the following command:

        $ gcc -o inotify_example -I. inotify_example.c

Note: the current directory must contain the three header directories linux, asm, and asm-generic from the compiled 2.6.13 kernel. asm is a symbolic link, so when copying it you need to copy both asm and asm-ARCH (asm-i386 on the x86 platform). To run the program, create the file tmp_file and the directory tmp_dir in the current directory. The path /mnt/sda3/windows_file depends on your system; it might instead be /mnt/dosc/windows_file, where /mnt/dosc is a FAT32 Windows partition, so adjust /mnt/sda3 in the source to match your setup before compiling. windows_file is a file on the mounted disk and must exist before the program is run.

The following are some results of running this program on Red Hat 9.0.

While the program is running, execute cat ./tmp_file on another virtual terminal. The program prints:

        Some event happens, len = 48.
        Object type: File
        Object name: ./tmp_file
        Event mask: 00000020
        Event: File was opened
        Object type: File
        Object name: ./tmp_file
        Event mask: 00000001
        Event: File was accessed
        Object type: File
        Object name: ./tmp_file
        Event mask: 00000010
        Event: Unwrittable file closed

These events show what cat does with the file: it opens it, reads it, and closes it. The open and the close appear as their own events, and the read appears as the access event.

In addition, running vi ./tmp_file shows that vi works on a copy of the file while editing, before the file is saved. When you modify the file, save, and exit, you find that vi actually deletes the original file and renames the copy to the original name. Note: the event "File was ignored" indicates that the system removed the watch for the file from the inotify instance's watch list because the file was deleted. Readers can also run the commands echo "abc" > ./tmp_file, rm -f tmp_file, ls tmp_dir, cd tmp_dir; touch c.txt, rm c.txt, and umount /mnt/sda3 (substituting your actual mount point), and then analyze the results. umount triggers two events: one indicates that the backing file system was unmounted, so the file is no longer available, and the other that the file's watch was removed from the watch list.
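The touch c.txt and rm c.txt experiment can also be reproduced from a single program. The sketch below is an illustration under assumptions: it uses the wrappers that current glibc declares in <sys/inotify.h> instead of _syscallN, it watches a scratch directory under /tmp rather than tmp_dir, and the names seen_event and create_delete_demo are hypothetical. Only IN_CREATE and IN_DELETE are requested so that the result does not depend on how the file is opened.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/inotify.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_SEEN 8

struct seen_event {
    uint32_t mask;
    char     name[32];
};

/*
 * Watch a scratch directory for IN_CREATE | IN_DELETE, then create and
 * remove "c.txt" inside it. Events read back are stored in `seen`.
 * Returns the number of events captured, or -1 on error.
 */
static int create_delete_demo(struct seen_event *seen, int max_seen)
{
    char dir[64], file[96];
    char buf[1024] __attribute__((aligned(__alignof__(struct inotify_event))));
    int count = 0;

    assert(seen != NULL && max_seen > 0);
    snprintf(dir, sizeof dir, "/tmp/inotify-demo-%ld", (long)getpid());
    snprintf(file, sizeof file, "%s/c.txt", dir);
    if (mkdir(dir, 0700) != 0 && errno != EEXIST)
        return -1;

    int fd = inotify_init();
    if (fd < 0 || inotify_add_watch(fd, dir, IN_CREATE | IN_DELETE) < 0) {
        if (fd >= 0)
            close(fd);
        rmdir(dir);
        return -1;
    }

    int cfd = open(file, O_CREAT | O_WRONLY, 0600);   /* like: touch c.txt */
    if (cfd < 0) {                 /* nothing was queued, so do not read() */
        close(fd);
        rmdir(dir);
        return -1;
    }
    close(cfd);
    unlink(file);                                     /* like: rm c.txt */

    /* Both events are already queued and fit into one read. */
    ssize_t len = read(fd, buf, sizeof buf);
    ssize_t off = 0;
    while (len > 0 && off < len && count < max_seen) {
        const struct inotify_event *ev =
            (const struct inotify_event *)(buf + off);
        memset(&seen[count], 0, sizeof seen[count]);
        seen[count].mask = ev->mask;
        if (ev->len)
            strncpy(seen[count].name, ev->name, sizeof seen[count].name - 1);
        count++;
        off += (ssize_t)(sizeof(*ev) + ev->len);
    }

    close(fd);
    rmdir(dir);
    return count;
}
```

Because the watch is on the directory, each captured event carries the name of the affected file in its name field, which is how a program tells which entry in the directory changed.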

V. Typical applications

Beagle is a desktop search engine project for GNOME, and inotify was driven entirely by its needs. A desktop search engine usually runs as a low-priority background process that is scheduled only when the system has nothing else to run. Its main job is to build an index database of the files in the system's file systems, so that users who need a file but cannot remember where it is stored can find it quickly by keyword or feature, as easily as using an Internet search engine such as Google. A characteristic of file systems is that only a few files change at any time. Once the index database has been built for the first time, there is therefore no need to traverse all files again to rebuild it: the search engine only has to update the index entries of modified files, add entries for new files, and delete entries for removed files, which greatly reduces its workload. Inotify was designed for exactly this purpose. Beagle adds watches for the directories or files it wants to monitor and then waits for file system events on the inotify instance. While no files change, Beagle incurs no overhead; only when a monitored event occurs is it woken up to update the index for the affected files, after which it goes back to sleep until the next event. Beagle is included in SUSE 9.3 and the forthcoming 10.0, where it can index documents, email, music, images, and applications. Readers who have used the Windows desktop search tools from Google, Yahoo, or Microsoft will already be familiar with the idea; those who are interested can install SUSE and try it.
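The incremental indexing idea above amounts to mapping each event to one of three index operations. The following sketch shows such a mapping; it is a hypothetical illustration, not Beagle's actual code, and the enum index_action and the functions action_for_event and action_name are invented for this example. It assumes the IN_* constants from the glibc header <sys/inotify.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/inotify.h>

/* Hypothetical index operations for an incremental indexer. */
enum index_action { INDEX_NONE, INDEX_ADD, INDEX_UPDATE, INDEX_REMOVE };

/* Choose the index operation for one event mask. */
static enum index_action action_for_event(uint32_t mask)
{
    /* The file left the watched tree: drop its index entry. */
    if (mask & (IN_DELETE | IN_DELETE_SELF | IN_MOVED_FROM))
        return INDEX_REMOVE;
    /* A new file appeared: create an index entry. */
    if (mask & (IN_CREATE | IN_MOVED_TO))
        return INDEX_ADD;
    /* Content or attributes changed: refresh the existing entry. */
    if (mask & (IN_MODIFY | IN_CLOSE_WRITE | IN_ATTRIB))
        return INDEX_UPDATE;
    /* Opens, reads, and read-only closes do not affect the index. */
    return INDEX_NONE;
}

/* Human-readable name of an action, for logging. */
static const char *action_name(enum index_action a)
{
    static const char *const names[] = { "none", "add", "update", "remove" };

    assert(a >= INDEX_NONE && a <= INDEX_REMOVE);
    return names[a];
}
```

An indexer built this way could request only the events it maps to an action when adding its watches, so that reads and opens of indexed files never wake it up.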

VI. Summary

Inotify is a new feature introduced in 2.6.13 that gives user space powerful support for monitoring file system changes. This article has described its origin, kernel implementation, user interfaces, and usage. Interested readers can read the relevant 2.6.13 source code to learn more about its implementation details.
