Kernel implementation mechanism of poll system calls

Source: Internet
Author: User

All rights reserved. Copyright by Xu Xing.

An earlier article analyzed in detail how the kernel executes a system call. This article continues with the poll function in the Linux 2.6.38 kernel source. select is implemented in essentially the same way: both use the same mechanism, and both eventually call the file's poll method (file->f_op->poll).

As the earlier analysis showed, when an application calls open, read, or write, the system call raises a software interrupt exception, which starts exception handling. The exception handler obtains the system call number passed in from user mode and uses it to look up the actual handler in the system call table, for example sys_open, sys_read, or sys_write. These kernel functions in turn lead to the open, read, and write functions in the driver.

The poll mechanism is no exception. When poll or select is called in user space, sys_poll or sys_select is called in kernel space. The following uses sys_poll to analyze how a user-space poll call is implemented:
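For orientation, the user-space side of this call can be sketched as follows. This is a minimal illustration and not part of the original analysis: wait_readable and demo_wait_readable are hypothetical helper names, and a pipe stands in for whatever descriptor (socket, character device) is actually being monitored.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until fd becomes readable.
 * Returns 1 if readable, 0 on timeout, -1 on error or hang-up.
 */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int ret;

    /* poll() traps into the kernel, where sys_poll() does the real work. */
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);   /* retry (with the full timeout) after a signal */

    if (ret < 0)
        return -1;
    if (ret == 0)
        return 0;                          /* timeout: no event occurred */
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Demo on a pipe: the first call times out, the second sees data. Returns 0 on success. */
int demo_wait_readable(void)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;

    int before = wait_readable(fds[0], 10);   /* nothing written yet */
    ssize_t n = write(fds[1], "x", 1);
    int after = wait_readable(fds[0], 10);    /* one byte is pending */

    close(fds[0]);
    close(fds[1]);
    assert(n == 1);
    return (before == 0 && after == 1) ? 0 : -1;
}
```

Calling demo_wait_readable() from a test program exercises both outcomes discussed later in the article: a return of 0 from poll on timeout and a positive return once an event is pending.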

User-space call: poll
Kernel: asmlinkage long sys_poll(struct pollfd __user *ufds, unsigned int nfds, long timeout);
that is, SYSCALL_DEFINE3(poll, struct pollfd __user *, ufds, unsigned int, nfds, long, timeout_msecs) // macro

The implementation is as follows:

fs/select.c

    SYSCALL_DEFINE3(poll, struct pollfd __user *, ufds, unsigned int, nfds,
                    long, timeout_msecs)
    {
        struct timespec end_time, *to = NULL;
        int ret;

        if (timeout_msecs >= 0) {
            to = &end_time;
            poll_select_set_timeout(to, timeout_msecs / MSEC_PER_SEC,
                    NSEC_PER_MSEC * (timeout_msecs % MSEC_PER_SEC));
        }

        ret = do_sys_poll(ufds, nfds, to);

        if (ret == -EINTR) {
            struct restart_block *restart_block;

            restart_block = &current_thread_info()->restart_block;
            restart_block->fn = do_restart_poll;
            restart_block->poll.ufds = ufds;
            restart_block->poll.nfds = nfds;

            if (timeout_msecs >= 0) {
                restart_block->poll.tv_sec = end_time.tv_sec;
                restart_block->poll.tv_nsec = end_time.tv_nsec;
                restart_block->poll.has_timeout = 1;
            } else
                restart_block->poll.has_timeout = 0;

            ret = -ERESTART_RESTARTBLOCK;
        }
        return ret;
    }
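The two arguments passed to poll_select_set_timeout split the millisecond timeout into whole seconds and leftover nanoseconds. The arithmetic can be tried out in user space with the sketch below. msecs_to_timespec is a hypothetical helper, and the two constants are redefined locally with the values their names imply. As the variable name end_time suggests, the kernel goes on to turn this relative value into a deadline; that step is left out here.

```c
#include <assert.h>
#include <time.h>

#define MSEC_PER_SEC   1000L      /* local copies of the kernel constants */
#define NSEC_PER_MSEC  1000000L

/*
 * Hypothetical helper: split a millisecond timeout into seconds and
 * nanoseconds, using the same expressions as the sys_poll() code above.
 * Returns 0 (and leaves *ts untouched) for a negative timeout, which
 * poll treats as "wait forever"; otherwise fills *ts and returns 1.
 */
int msecs_to_timespec(long timeout_msecs, struct timespec *ts)
{
    if (timeout_msecs < 0)
        return 0;

    ts->tv_sec  = timeout_msecs / MSEC_PER_SEC;
    ts->tv_nsec = NSEC_PER_MSEC * (timeout_msecs % MSEC_PER_SEC);

    /* The remainder is below 1000 ms, so tv_nsec stays below one second. */
    assert(ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L);
    return 1;
}
```

For example, a timeout of 1500 ms yields tv_sec = 1 and tv_nsec = 500000000, while a timeout of -1 leaves the pointer `to` in sys_poll as NULL, meaning no deadline.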

To keep the structure simple and intuitive, only the call hierarchy is listed:

User space: poll
Kernel: asmlinkage long sys_poll(struct pollfd __user *ufds, unsigned int nfds, long timeout);
that is, SYSCALL_DEFINE3(poll, struct pollfd __user *, ufds, unsigned int, nfds, long, timeout_msecs) // macro

    poll_select_set_timeout(to, timeout_msecs / MSEC_PER_SEC, NSEC_PER_MSEC * (timeout_msecs % MSEC_PER_SEC)); // configure the timeout value
    do_sys_poll(ufds, nfds, to);
        poll_initwait(&table); // initialize the wait queue
            init_poll_funcptr(&pwq->pt, __pollwait);
                pt->qproc = qproc; /* i.e. table.pt.qproc = __pollwait; for details, see Note 1-2 */
        do_poll(nfds, head, &table, end_time);
            for (;;) {
                for (; pfd != pfd_end; pfd++) { // for each monitored descriptor
                    /* do_pollfd calls the driver's poll function. poll_wait inside
                       that function finally calls pt->qproc (i.e. __pollwait), which
                       adds the wait queue head that may signal a change in the
                       monitored content to the query table. The driver's poll function
                       also checks whether an event has occurred and, if so, returns a
                       non-zero mask. For details, see Note 1-1 */
                    if (do_pollfd(pfd, pt)) {
                        count++;   // the driver's poll function returned a non-zero mask: one more ready descriptor
                        pt = NULL; // an event was found, so no further wait queue registration is needed
                    }
                }
                if (!count) { /* count is 0: no awaited event has occurred */
                    count = wait->error;
                    if (signal_pending(current)) /* check whether a signal is pending */
                        count = -EINTR;
                }
                /* leave the loop if count is non-zero (an event occurred, or an
                   error or signal was recorded) or if timed_out is non-zero */
                if (count || timed_out)
                    break;
                /* otherwise sleep until an awaited event wakes the process up
                   or the timeout expires */
                if (!poll_schedule_timeout(wait, TASK_INTERRUPTIBLE, to, slack))
                    timed_out = 1;
            }
        poll_freewait(&table); // release the resources held by poll_wqueues, including the wait queue entries added to the query table
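The control flow of do_poll (scan every descriptor, leave if anything is ready or the timeout has expired, otherwise sleep and scan again) can be modelled in plain user-space C. The sketch below is a toy model under stated assumptions, not kernel code: every toy_ name is hypothetical, a simple integer tick counter replaces real time, and signal handling and wait queues are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy descriptor: becomes "ready" once the simulated clock reaches ready_at. */
struct toy_fd {
    int ready_at;
    int revents;    /* filled in by the scan, like pollfd.revents */
};

/* Stand-in for do_pollfd(): returns a non-zero mask once the event has occurred. */
int toy_pollfd(struct toy_fd *fd, int now)
{
    fd->revents = (now >= fd->ready_at);
    return fd->revents;
}

/* Stand-in for poll_schedule_timeout(): "sleep" for one tick.
 * Returns 0 when the deadline has already been reached. */
int toy_sleep(int *now, int deadline)
{
    if (*now >= deadline)
        return 0;
    (*now)++;
    return 1;
}

/* Same loop shape as do_poll(). Returns the number of ready descriptors,
 * or 0 if the deadline passed without any event. */
int toy_do_poll(struct toy_fd *fds, size_t nfds, int *now, int deadline)
{
    int count = 0, timed_out = 0;

    assert(now != NULL);
    for (;;) {
        /* Every pass scans the whole set, ready or not. */
        for (size_t i = 0; i < nfds; i++)
            if (toy_pollfd(&fds[i], *now))
                count++;

        if (count || timed_out)
            break;

        if (!toy_sleep(now, deadline))
            timed_out = 1;   /* one final scan, then leave the loop */
    }
    return count;
}
```

With three toy descriptors that become ready at ticks 5, 2, and 9 and a deadline of 10, toy_do_poll returns 1 at tick 2, after three full scans of the set. With a descriptor that never becomes ready, it returns 0 once the deadline is reached.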

Note 1-1:

do_pollfd(pfd, pt)

    mask = file->f_op->poll(file, pwait); /* call the poll function in the driver. A non-zero return value means that an event has occurred, which makes do_poll execute count++ */

The call relationship is as follows:

do_pollfd(pfd, pt)
    mask = file->f_op->poll(file, pwait); // call the poll function in the driver (written by the driver author)
        poll_wait(filp, wait_address, p); // add the wait queue head that may signal a change in the monitored content to the poll_table query table; implemented as:
            p->qproc(filp, wait_address, p); // p->qproc = __pollwait was initialized earlier
                __pollwait(struct file *filp, wait_queue_head_t *wait_address, poll_table *p) // the call above resolves to this function
                {
                    entry->wait_address = wait_address;         // record the wait queue head in the query table
                    add_wait_queue(wait_address, &entry->wait); // add the process's wait entry to that wait queue
                }

In addition, the driver's poll function checks whether an event has occurred. If one has, it returns a non-zero mask, so the condition in do_poll is true and count++ is executed; do_poll then leaves its loop instead of sleeping.

The following is an example of a poll driver function:

    static unsigned forth_drv_poll(struct file *file, poll_table *wait)
    {
        unsigned int mask = 0;

        poll_wait(file, &button_waitq, wait); // add the button_waitq wait queue head to the query table

        if (ev_press) // has the event occurred? If so, return a non-zero (true) mask
            mask |= POLLIN | POLLRDNORM;

        return mask;
    }

Note 1-2:

1. The main purpose here is to initialize the function pointer so that it points to __pollwait. The driver's poll function calls poll_wait, which in turn calls __pollwait to add the current process to the wait queue.

2. __pollwait contains the following two statements: entry->wait_address = wait_address; add_wait_queue(wait_address, &entry->wait); They record in the query table the wait queue head that may signal a change in the monitored content, and add the current process's wait entry to that queue. table.pt.qproc = __pollwait is the initialization of the qproc function pointer.
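The indirection described in these two notes (poll_wait is only a thin wrapper that calls whatever function the qproc pointer was initialized with) can be reproduced in a small user-space model. The sketch below is an illustration under stated assumptions: all toy_ types and functions are hypothetical and greatly simplified, a waiter counter replaces the real wait queue, and the value 1u stands in for POLLIN | POLLRDNORM.

```c
#include <assert.h>
#include <stddef.h>

/* Toy wait queue head: only counts how many waiters were registered. */
struct toy_wait_queue_head { int nr_waiters; };

struct toy_poll_table;
typedef void (*toy_poll_queue_proc)(struct toy_wait_queue_head *,
                                    struct toy_poll_table *);

/* Toy poll_table: just the function pointer that the notes discuss. */
struct toy_poll_table { toy_poll_queue_proc qproc; };

/* Plays the role of __pollwait(): register a waiter on the driver's queue. */
void toy_pollwait(struct toy_wait_queue_head *head, struct toy_poll_table *p)
{
    (void)p;
    head->nr_waiters++;
}

/* Plays the role of init_poll_funcptr(): pt->qproc = qproc. */
void toy_init_poll_funcptr(struct toy_poll_table *pt, toy_poll_queue_proc qproc)
{
    assert(qproc != NULL);
    pt->qproc = qproc;
}

/* Plays the role of poll_wait(): forwards to p->qproc, or does nothing
 * when no table is passed (as after "pt = NULL" in the do_poll loop). */
void toy_poll_wait(struct toy_wait_queue_head *head, struct toy_poll_table *p)
{
    if (p && head)
        p->qproc(head, p);
}

/* Driver-style poll method: register first, then report the current state. */
unsigned int toy_drv_poll(struct toy_wait_queue_head *head, int ev_press,
                          struct toy_poll_table *wait)
{
    unsigned int mask = 0;

    toy_poll_wait(head, wait);
    if (ev_press)
        mask |= 1u;   /* stand-in for POLLIN | POLLRDNORM */
    return mask;
}
```

Calling toy_drv_poll with an initialized table registers one waiter and returns 0 while no event is pending. Calling it with a NULL table returns the mask without registering anything, which mirrors why do_poll can set pt to NULL once a ready descriptor has been found.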

According to the above analysis, poll traverses the entire set even if only one descriptor is ready. If few descriptors in the set are active, most of the traversal is wasted and its relative overhead is large; if most of the descriptors are active, the traversal overhead is negligible. poll is therefore efficient when most of the descriptors in the set are active. In addition, every time user space calls poll, the whole descriptor array must be copied into the kernel, which adds copying overhead.
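The same linear cost is visible from user space: poll only reports how many descriptors are ready, so the caller has to walk the array again to find them. The sketch below illustrates this with eight pipes of which only the last one has data. find_ready, demo_find_ready, and the array size NPIPES are hypothetical choices made for the example.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define NPIPES 8   /* arbitrary set size chosen for the illustration */

/* Returns the index of the first readable entry, or -1 on timeout or error. */
int find_ready(struct pollfd *pfds, int nfds, int timeout_ms)
{
    assert(nfds > 0);

    /* The whole array is handed to the kernel and scanned there. */
    int n = poll(pfds, (nfds_t)nfds, timeout_ms);
    if (n <= 0)
        return -1;

    /* poll() returns only a count, so the caller scans the array again. */
    for (int i = 0; i < nfds; i++)
        if (pfds[i].revents & POLLIN)
            return i;
    return -1;
}

/* Demo: only the last pipe has data. Returns the index reported as ready. */
int demo_find_ready(void)
{
    int fds[NPIPES][2];
    struct pollfd pfds[NPIPES];
    int made = 0, result = -1;

    for (; made < NPIPES; made++) {
        if (pipe(fds[made]) != 0)
            break;
        pfds[made].fd = fds[made][0];
        pfds[made].events = POLLIN;
        pfds[made].revents = 0;
    }

    if (made == NPIPES && write(fds[NPIPES - 1][1], "x", 1) == 1)
        result = find_ready(pfds, NPIPES, 100);

    for (int i = 0; i < made; i++) {
        close(fds[i][0]);
        close(fds[i][1]);
    }
    return result;
}
```

Here seven idle descriptors are examined, both in the kernel and in find_ready, to locate the single active one, which is the case the paragraph above describes as expensive.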

Reference: http://watter1985.iteye.com/blog/1614039
