Analysis of the FUSE user-mode and kernel-mode communication mechanism

Source: Internet
Author: User

There are many articles on FUSE user-mode file systems, such as http://my.debugman.net/program/fuse-180.html, and they cover the topic thoroughly. However, few articles discuss how FUSE's user mode and kernel mode communicate; I have found only a handful.

This article mainly analyzes the communication mechanism between the kernel mode and the user mode of FUSE. The main flow of a FUSE request is as follows:

When a user-space program performs a POSIX file system operation, glibc turns it into a system call, which is passed to the VFS. The VFS hands it to the FUSE kernel module, which sends a request to the user-space FUSE process according to the type of system call and then waits for that process to respond. After receiving the response, the FUSE kernel module passes it back to the VFS, which returns the final result to the user-space program.

So how does FUSE let user space and the kernel communicate? The source code shows this clearly.

First, in the kernel code fs/fuse/dev.c:

    /* Define a misc device for fuse */
    static struct miscdevice fuse_miscdevice = {
        .minor = FUSE_MINOR,
        .name  = "fuse",    /* the misc device will appear as /dev/fuse */
        .fops  = &fuse_dev_operations,
    };

    int __init fuse_dev_init(void)
    {
        int err = -ENOMEM;

        fuse_req_cachep = kmem_cache_create("fuse_request",
                                            sizeof(struct fuse_req),
                                            0, 0, NULL);
        if (!fuse_req_cachep)
            goto out;

        /* Register as a misc device; misc devices use major number 10 */
        err = misc_register(&fuse_miscdevice);
        if (err)
            goto out_cache_clean;

        return 0;

    out_cache_clean:
        kmem_cache_destroy(fuse_req_cachep);
    out:
        return err;
    }
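The claim that misc devices use major number 10 can be checked from user space. The following is a minimal sketch, assuming Linux with glibc; the helper names char_dev_numbers and print_dev_numbers are hypothetical and not part of FUSE.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>  /* major(), minor() on Linux/glibc */
#include <sys/types.h>

/*
 * Hypothetical helper: look up the device numbers of a character device
 * node. Returns 0 on success, -1 if the path is missing or is not a
 * character device.
 */
int char_dev_numbers(const char *path, unsigned *maj, unsigned *min)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    if (!S_ISCHR(st.st_mode))
        return -1;
    *maj = major(st.st_rdev);
    *min = minor(st.st_rdev);
    return 0;
}

/* Print the numbers for a node, e.g. print_dev_numbers("/dev/fuse"). */
void print_dev_numbers(const char *path)
{
    unsigned maj, min;

    if (char_dev_numbers(path, &maj, &min) == 0)
        printf("%s: major %u, minor %u\n", path, maj, min);
    else
        printf("%s: not a character device (or not present)\n", path);
}
```

On a machine where the FUSE module is loaded, calling print_dev_numbers("/dev/fuse") should report major 10; on a machine without FUSE it reports that the node is not present.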

Calling fuse_dev_init registers a misc device, which appears as /dev/fuse. A misc device is similar to a character device, except that its major device number is 10 and a device file is created automatically under /dev based on the device name. The user-mode code opens this device file and registers the functions it uses to communicate with the FUSE kernel module through the following function:

    struct fuse_chan *fuse_kern_chan_new(int fd)
    {
        struct fuse_chan_ops op = {
            .receive = fuse_kern_chan_receive,
            .send    = fuse_kern_chan_send,
            .destroy = fuse_kern_chan_destroy,
        };
        size_t bufsize = getpagesize() + 0x1000;

        bufsize = bufsize < MIN_BUFSIZE ? MIN_BUFSIZE : bufsize;
        return fuse_chan_new(&op, fd, bufsize, NULL);
    }

The fuse_kern_chan_receive function reads a kernel request from /dev/fuse with res = read(fuse_chan_fd(ch), buf, size);. The fuse_kern_chan_send function then sends data to the kernel module with ssize_t res = writev(fuse_chan_fd(ch), iov, count);.
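The read/writev pairing described above can be sketched in isolation. This is a simplified illustration, not libfuse's actual implementation: chan_receive and chan_send are hypothetical names, they take a plain file descriptor instead of a struct fuse_chan, and the retry on EINTR is a choice made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Read one request from the channel fd, retrying if a signal interrupts
 * the call. Returns the byte count, or -errno on failure.
 */
ssize_t chan_receive(int fd, void *buf, size_t size)
{
    ssize_t res;

    assert(buf != NULL);
    do {
        res = read(fd, buf, size);
    } while (res == -1 && errno == EINTR);
    return res == -1 ? -errno : res;
}

/*
 * Send a reply made of several segments with a single writev() call,
 * so that the header and the payload are submitted together.
 * Returns the byte count, or -errno on failure.
 */
ssize_t chan_send(int fd, const struct iovec *iov, int count)
{
    ssize_t res = writev(fd, iov, count);

    return res == -1 ? -errno : res;
}
```

Using writev here means a reply header and its payload can live in separate buffers and still be handed to the kernel in one call, which is why the send side takes an iovec array rather than a single buffer.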

Back in the kernel module, again in fs/fuse/dev.c, FUSE registers the following operation callbacks for the /dev/fuse device file to support user-mode read and write operations:

    const struct file_operations fuse_dev_operations = {
        .owner     = THIS_MODULE,
        .llseek    = no_llseek,        /* seek is not supported */
        .read      = do_sync_read,     /* generic synchronous read */
        .aio_read  = fuse_dev_read,    /* asynchronous read provided by fuse for user mode */
        .write     = do_sync_write,    /* generic synchronous write */
        .aio_write = fuse_dev_write,   /* asynchronous write provided by fuse for user mode */
        .poll      = fuse_dev_poll,    /* check whether the file has pending operations; if not, sleep until it does */
        .release   = fuse_dev_release, /* called when user space closes the fd of this device file */
        .fasync    = fuse_dev_fasync,  /* enable or disable I/O event notification through signals */
    };

do_sync_read calls ret = filp->f_op->aio_read(&kiocb, &iov, 1, kiocb.ki_pos). Similarly, do_sync_write calls ret = filp->f_op->aio_write(&kiocb, &iov, 1, kiocb.ki_pos). The synchronous entry points therefore reuse the asynchronous ones, and FUSE does not need to implement them separately.
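The delegation pattern can be illustrated with a toy user-space model. This is only a sketch of the idea: struct toy_ops, toy_sync_read and hello_vec_read are hypothetical names, and the kiocb and file position handling of the real kernel functions is left out.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Toy stand-in for file_operations: only the vectored read is provided. */
struct toy_ops {
    ssize_t (*vec_read)(void *ctx, const struct iovec *iov, int nr_segs);
};

/*
 * Generic single-buffer read: wrap the buffer in a one-element iovec
 * and delegate to the vectored callback, in the way do_sync_read
 * delegates to aio_read.
 */
ssize_t toy_sync_read(const struct toy_ops *ops, void *ctx,
                      void *buf, size_t len)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };

    return ops->vec_read(ctx, &iov, 1);
}

/* Example vectored callback: copy a fixed message across the segments. */
ssize_t hello_vec_read(void *ctx, const struct iovec *iov, int nr_segs)
{
    const char *msg = ctx;
    size_t left = strlen(msg);
    size_t done = 0;

    for (int i = 0; i < nr_segs && left > 0; i++) {
        size_t n = iov[i].iov_len < left ? iov[i].iov_len : left;

        memcpy(iov[i].iov_base, msg + done, n);
        done += n;
        left -= n;
    }
    return (ssize_t)done;
}
```

A driver written this way only has to provide the vectored callback; the single-buffer path comes for free from the generic wrapper.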

The FUSE kernel module has a fuse_conn structure, which represents the connection used for communication between user space and the kernel. It is defined as follows:

    /**
     * A Fuse connection.
     *
     * This structure is created, when the filesystem is mounted, and is
     * destroyed, when the client device is closed and the filesystem is
     * unmounted.
     */
    struct fuse_conn {
        /** Lock protecting accesses to members of this structure */
        spinlock_t lock;
        /** Mutex protecting against directory alias creation */
        struct mutex inst_mutex;
        /** Refcount */
        atomic_t count;
        /** The user id for this mount */
        uid_t user_id;
        /** The group id for this mount */
        gid_t group_id;
        /** The fuse mount flags for this mount */
        unsigned flags;
        /** Maximum read size */
        unsigned max_read;
        /** Maximum write size */
        unsigned max_write;
        /** Readers of the connection are waiting on this */
        wait_queue_head_t waitq;
        /** The list of pending requests */
        struct list_head pending;
        /** The list of requests being processed */
        struct list_head processing;
        /** The list of requests under I/O */
        struct list_head io;
        /** The next unique kernel file handle */
        u64 khctr;
        /** rbtree of fuse_files waiting for poll events indexed by ph */
        struct rb_root polled_files;
        /** Maximum number of outstanding background requests */
        unsigned max_background;
        /** Number of background requests at which congestion starts */
        unsigned congestion_threshold;
        /** Number of requests currently in the background */
        unsigned num_background;
        /** Number of background requests currently queued for userspace */
        unsigned active_background;
        /** The list of background requests set aside for later queuing */
        struct list_head bg_queue;
        /** Pending interrupts */
        struct list_head interrupts;
        /** Flag indicating if connection is blocked. This will be
            the case before the INIT reply is received, and if there
            are too many outstanding background requests */
        int blocked;
        /** waitq for blocked connection */
        wait_queue_head_t blocked_waitq;
        /** waitq for reserved requests */
        wait_queue_head_t reserved_req_waitq;
        /** The next unique request id */
        u64 reqctr;
        /** Connection established, cleared on umount, connection
            abort and device release */
        unsigned connected;
        /** Connection failed (version mismatch). Cannot race with
            setting other bitfields since it is only set once in INIT
            reply, before any other request, and never cleared */
        unsigned conn_error:1;
        /** Connection successful. Only set in INIT */
        unsigned conn_init:1;
        /** Do readpages asynchronously? Only set in INIT */
        unsigned async_read:1;
        /** Do not send separate SETATTR request before open(O_TRUNC) */
        unsigned atomic_o_trunc:1;
        /** Filesystem supports NFS exporting. Only set in INIT */
        unsigned export_support:1;
        /** Set if bdi is valid */
        unsigned bdi_initialized:1;

        /*
         * The following bitfields are only for optimization purposes
         * and hence races in setting them will not cause malfunction
         */

        /** Is fsync not implemented by fs? */
        unsigned no_fsync:1;
        /** Is fsyncdir not implemented by fs? */
        unsigned no_fsyncdir:1;
        /** Is flush not implemented by fs? */
        unsigned no_flush:1;
        /** Is setxattr not implemented by fs? */
        unsigned no_setxattr:1;
        /** Is getxattr not implemented by fs? */
        unsigned no_getxattr:1;
        /** Is listxattr not implemented by fs? */
        unsigned no_listxattr:1;
        /** Is removexattr not implemented by fs? */
        unsigned no_removexattr:1;
        /** Are file locking primitives not implemented by fs? */
        unsigned no_lock:1;
        /** Is access not implemented by fs? */
        unsigned no_access:1;
        /** Is create not implemented by fs? */
        unsigned no_create:1;
        /** Is interrupt not implemented by fs? */
        unsigned no_interrupt:1;
        /** Is bmap not implemented by fs? */
        unsigned no_bmap:1;
        /** Is poll not implemented by fs? */
        unsigned no_poll:1;
        /** Do multi-page cached writes */
        unsigned big_writes:1;
        /** Don't apply umask to creation modes */
        unsigned dont_mask:1;

        /** The number of requests waiting for completion */
        atomic_t num_waiting;
        /** Negotiated minor version */
        unsigned minor;
        /** Backing dev info */
        struct backing_dev_info bdi;
        /** Entry on the fuse_conn_list */
        struct list_head entry;
        /** Device ID from super block */
        dev_t dev;
        /** Dentries in the control filesystem */
        struct dentry *ctl_dentry[FUSE_CTL_NUM_DENTRIES];
        /** Number of dentries used in the above array */
        int ctl_ndents;
        /** O_ASYNC requests */
        struct fasync_struct *fasync;
        /** Key for lock owner ID scrambling */
        u32 scramble_key[4];
        /** Reserved request for the DESTROY message */
        struct fuse_req *destroy_req;
        /** Version counter for attribute changes */
        u64 attr_version;
        /** Called on final put */
        void (*release)(struct fuse_conn *);
        /** Super block for this connection. */
        struct super_block *sb;
        /** Read/write semaphore to hold when accessing sb. */
        struct rw_semaphore killsb;
    };

The pointer to the fuse_conn struct is stored in file->private_data, and the struct is used every time the kernel sends a request to user space. The fuse_dev_read function works as follows:

    static ssize_t fuse_dev_read(struct kiocb *iocb, const struct iovec *iov,
                                 unsigned long nr_segs, loff_t pos)
    {
        struct fuse_in *in;    /* the request that user space reads from the kernel */
        /* ... other variable definitions omitted ... */
        struct fuse_conn *fc = fuse_get_conn(file);    /* get the fuse_conn pointer */
        if (!fc)
            return -EPERM;

    restart:
        spin_lock(&fc->lock);
        err = -EAGAIN;
        /* In non-blocking mode, check whether a request is waiting in the
           queue; if there is none, return immediately */
        if ((file->f_flags & O_NONBLOCK) && fc->connected &&
            !request_pending(fc))
            goto err_unlock;

        request_wait(fc);    /* block until a request from the kernel arrives */
        /* ...... */

        /* If an interrupt request needs to be sent, send it first */
        if (!list_empty(&fc->interrupts)) {
            req = list_entry(fc->interrupts.next, struct fuse_req,
                             intr_entry);
            return fuse_read_interrupt(fc, req, iov, nr_segs);
        }

        /* Take the next request from the pending queue */
        req = list_entry(fc->pending.next, struct fuse_req, list);
        req->state = FUSE_REQ_READING;
        list_move(&req->list, &fc->io);    /* move the request to the io queue */

        in = &req->in;
        reqsize = in->h.len;
        /* If request is too large, reply with an error and restart the read */
        /* ........ */
        spin_unlock(&fc->lock);
        fuse_copy_init(&cs, fc, 1, req, iov, nr_segs);    /* prepare to copy the request to user space */
        err = fuse_copy_one(&cs, &in->h, sizeof(in->h));  /* copy the request header to user space */
        if (!err)
            /* Copy the request body to user space; if it carries several
               arguments, they are copied one after another */
            err = fuse_copy_args(&cs, in->numargs, in->argpages,
                                 (struct fuse_arg *) in->args, 0);
        fuse_copy_finish(&cs);    /* finish the copy and release the memory */
        spin_lock(&fc->lock);
        req->locked = 0;
        /* Handling of errors that occurred while sending is omitted .... */
        if (!req->isreply)
            /* No reply is expected, so end the request */
            request_end(fc, req);
        else {
            /* User space must return the result of this request */
            req->state = FUSE_REQ_SENT;
            /* Move the request to the processing queue, where
               fuse_dev_write will handle it */
            list_move_tail(&req->list, &fc->processing);
            if (req->interrupted)
                queue_interrupt(fc, req);
            spin_unlock(&fc->lock);
        }
        return reqsize;

    err_unlock:
        spin_unlock(&fc->lock);
        return err;
    }
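The way a request travels from the pending queue through the io queue to the processing queue can be modeled in a few lines of user-space C. This is a single-threaded toy under stated assumptions: every name beginning with toy_ or queue_ is hypothetical, a simple FIFO replaces the kernel's list_head lists, and locking, interrupts and the actual copy are left out.

```c
#include <stddef.h>

enum req_state { REQ_PENDING, REQ_READING, REQ_SENT, REQ_FINISHED };

struct toy_req {
    enum req_state state;
    int isreply;            /* nonzero if user space must answer */
    struct toy_req *next;
};

/* Minimal FIFO queue, standing in for the kernel's list_head lists. */
struct toy_queue {
    struct toy_req *head, *tail;
};

struct toy_conn {
    struct toy_queue pending, io, processing;
};

void queue_push(struct toy_queue *q, struct toy_req *r)
{
    r->next = NULL;
    if (q->tail)
        q->tail->next = r;
    else
        q->head = r;
    q->tail = r;
}

struct toy_req *queue_pop(struct toy_queue *q)
{
    struct toy_req *r = q->head;

    if (r) {
        q->head = r->next;
        if (!q->head)
            q->tail = NULL;
        r->next = NULL;
    }
    return r;
}

/* Queue a new request for the user-space daemon. */
void toy_submit(struct toy_conn *c, struct toy_req *r)
{
    r->state = REQ_PENDING;
    queue_push(&c->pending, r);
}

/*
 * Model of one device read: take the oldest pending request, hold it
 * on the io queue while it is "copied", then either finish it or park
 * it on the processing queue until the reply is written back.
 * Returns the request that was handed out, or NULL if none is pending.
 */
struct toy_req *toy_dev_read(struct toy_conn *c)
{
    struct toy_req *r = queue_pop(&c->pending);

    if (!r)
        return NULL;
    r->state = REQ_READING;
    queue_push(&c->io, r);

    /* ... the copy to user space would happen here ... */

    queue_pop(&c->io);      /* single-threaded model: r is the only entry */
    if (!r->isreply) {
        r->state = REQ_FINISHED;
    } else {
        r->state = REQ_SENT;
        queue_push(&c->processing, r);
    }
    return r;
}
```

In this model a request that expects a reply ends up on the processing queue in state REQ_SENT, which is where the write path would later look it up to deliver the answer.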

The fuse_in struct is as follows:

    /** The request input */
    struct fuse_in {
        /** The request header */
        struct fuse_in_header h;
        /** True if the data for the last argument is in req->pages */
        unsigned argpages:1;
        /** Number of arguments */
        unsigned numargs;
        /** Array of arguments */
        struct fuse_in_arg args[3];
    };

The two other structs contained in this struct are:

    struct fuse_in_header {
        __u32 len;      /* total length of the request */
        __u32 opcode;   /* operation code, indicating the operation type */
        __u64 unique;   /* unique ID of the request */
        __u64 nodeid;   /* ID of the file node being operated on, similar to ino */
        __u32 uid;
        __u32 gid;
        __u32 pid;
        __u32 padding;  /* padding that keeps the header size aligned */
    };

    /** One input argument of a request */
    struct fuse_in_arg {
        unsigned size;        /* argument length */
        const void *value;    /* pointer to the argument */
    };
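To show how a header and its arguments end up as one contiguous message, here is a minimal user-space sketch. It mirrors the structs above with fixed-width types; the names toy_in_header, toy_in_arg, toy_in, toy_pack_request and toy_parse_header are hypothetical, and the packing routine only imitates the idea of copying the header first and the arguments after it, not the kernel's fuse_copy_* functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* User-space mirror of fuse_in_header, using fixed-width types. */
struct toy_in_header {
    uint32_t len, opcode;
    uint64_t unique, nodeid;
    uint32_t uid, gid, pid, padding;
};

/* 4+4+8+8+4+4+4+4 bytes, laid out without gaps. */
_Static_assert(sizeof(struct toy_in_header) == 40, "unexpected header size");

struct toy_in_arg {
    unsigned size;
    const void *value;
};

struct toy_in {
    struct toy_in_header h;
    unsigned numargs;
    struct toy_in_arg args[3];
};

/*
 * Flatten a request into buf: the header first, then each argument in
 * order. Fills in h.len with the total size. Returns the number of
 * bytes written, or 0 if the request is malformed or does not fit.
 */
size_t toy_pack_request(struct toy_in *in, void *buf, size_t buflen)
{
    size_t total = sizeof(in->h);
    unsigned char *p = buf;

    if (in->numargs > 3)
        return 0;
    for (unsigned i = 0; i < in->numargs; i++)
        total += in->args[i].size;
    if (total > buflen)
        return 0;

    in->h.len = (uint32_t)total;
    memcpy(p, &in->h, sizeof(in->h));
    p += sizeof(in->h);
    for (unsigned i = 0; i < in->numargs; i++) {
        memcpy(p, in->args[i].value, in->args[i].size);
        p += in->args[i].size;
    }
    return total;
}

/*
 * Reader side: copy the header out of a raw buffer. Returns 0 on
 * success, -1 if the buffer is shorter than a header or shorter than
 * the length the header announces.
 */
int toy_parse_header(const void *buf, size_t buflen,
                     struct toy_in_header *out)
{
    if (buflen < sizeof(*out))
        return -1;
    memcpy(out, buf, sizeof(*out));   /* memcpy avoids unaligned access */
    if (out->len < sizeof(*out) || out->len > buflen)
        return -1;
    return 0;
}
```

Because len covers the header plus all arguments, a reader can validate a message against the number of bytes it actually received before looking at the opcode.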
