Implementation principles of select/poll/epoll under Linux (Part 1)

Source: Internet
Author: User

Recently I took a brief look at how the IO event detection APIs select/poll/epoll are implemented in the Linux 3.10.25 kernel (linux-3.10.25), and I am recording some notes here.
All three share the same basic principle. The flow is as follows:

    1. For each fd to be monitored, call the struct file.f_op->poll() method of its struct file (if one is provided) to check whether an IO event is already ready on that fd.
    2. If an IO event is already ready, return the collected IO events directly; the call ends.
    3. If no IO event is ready yet, decide whether to wait based on the given timeout parameter.
    4. If the timeout parameter indicates no wait, the call ends and no IO event is returned.
    5. If the timeout parameter indicates a wait (for a period of time or indefinitely), suspend the task that called select/poll/epoll.
    6. When a new IO event occurs on any of the monitored fds, the waiting task is woken. It then repeats the IO event collection of step 1, returns the events collected this time, and the call ends.
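You can observe the first four steps from user space with the real poll(2) call on a pipe. The sketch below is a minimal illustration that assumes a Linux/POSIX system, and the helper name poll_fresh_pipe() is hypothetical. A timeout of 0 corresponds to step 4 (no wait). A timeout of -1 requests an indefinite wait, but because data is already present, the call ends at step 2 without sleeping.

```c
#include <poll.h>
#include <unistd.h>

/* Creates a pipe, optionally writes one byte into it, then polls the read
 * end once with the given timeout (0 = do not wait, -1 = wait forever).
 * Returns poll()'s result and stores the read end's revents.
 * Do not call with write_first == 0 and timeout_ms == -1: nothing would
 * ever make the pipe readable, so poll() would sleep forever. */
static int poll_fresh_pipe(int write_first, int timeout_ms, short *revents)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    if (write_first && write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    struct pollfd pfd = { .fd = fds[0], .events = POLLIN, .revents = 0 };
    /* Steps 1-2: data already present -> poll() returns at once with POLLIN.
     * Steps 3-4: no data and timeout_ms == 0 -> poll() returns 0 at once. */
    int n = poll(&pfd, 1, timeout_ms);
    *revents = pfd.revents;

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Steps 5 and 6 (sleeping and being woken) would require a second thread or process to write to the pipe while poll() waits. They are left out to keep the sketch single-threaded.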

As you can see, the flow is not complicated. Following it, this article analyzes the implementation of select/poll in more detail. The implementation of epoll is more complex and will be described separately.

The key question in the flow above is:
1. If no IO event is ready at first and the calling task is suspended, how is that task woken once an IO event occurs on an fd?

The rest of this article walks through the implementation of poll() to explain how this works.
Two steps are critical to the wake-up process:
1) When struct file.f_op->poll() is first called as described above, a pointer of type poll_table must be passed in along with the fd's struct file pointer.
This poll_table is provided by the poll() implementation itself.
2) Each fd's struct file.f_op->poll() method must call poll_wait() on its corresponding wait queue.

The key point:
The struct file.f_op->poll() method directly queries the IO events currently ready on the fd.
poll_wait() adds the task that called poll() to the fd's IO event wait queue, but only when needed, so the addition is optional.
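The sketch below shows in user space the usual shape of an f_op->poll method, to make the division of labour concrete. All types (struct file, struct wait_queue_head, poll_table) are simplified, hypothetical stand-ins rather than the kernel's definitions. The names mydev, mydev_poll() and count_qproc() are invented for illustration. The method calls poll_wait() to (optionally) register on its wait queue and then returns the events that are ready right now, without sleeping.

```c
#include <poll.h>    /* POLLIN, POLLOUT */
#include <stddef.h>

/* Hypothetical user-space stand-ins for kernel types (NOT kernel definitions). */
struct file { void *private_data; };
struct wait_queue_head { int nr_waiters; };

typedef struct poll_table_struct poll_table;
typedef void (*poll_queue_proc)(struct file *, struct wait_queue_head *, poll_table *);
struct poll_table_struct {
    poll_queue_proc _qproc;
    unsigned long _key;
};

/* Same guard as the kernel's poll_wait(): register only if asked to. */
static inline void poll_wait(struct file *filp, struct wait_queue_head *wait_address,
                             poll_table *p)
{
    if (p && p->_qproc && wait_address)
        p->_qproc(filp, wait_address, p);
}

/* Hypothetical device state. */
struct mydev {
    struct wait_queue_head wq;
    size_t bytes_available;   /* readable when > 0 */
    size_t space_left;        /* writable when > 0 */
};

/* Typical shape of an f_op->poll method: hand the wait queue to poll_wait(),
 * then report what is ready right now. It never sleeps itself. */
static unsigned int mydev_poll(struct file *filp, poll_table *wait)
{
    struct mydev *dev = filp->private_data;
    unsigned int mask = 0;

    poll_wait(filp, &dev->wq, wait);
    if (dev->bytes_available > 0)
        mask |= POLLIN;
    if (dev->space_left > 0)
        mask |= POLLOUT;
    return mask;
}

/* Stand-in queueing callback: only records that the caller was queued. */
static void count_qproc(struct file *filp, struct wait_queue_head *wq, poll_table *p)
{
    (void)filp;
    (void)p;
    wq->nr_waiters++;
}
```

Passing a NULL table to mydev_poll() still returns the ready mask, but the poller is not queued. This is the "query only" use of the same method.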

Note that the addition is "optional". Why? It depends on the implementation of poll_wait(), an inline function, shown below:

    static inline void poll_wait(struct file *filp, wait_queue_head_t *wait_address, poll_table *p)
    {
        if (p && p->_qproc && wait_address)
            p->_qproc(filp, wait_address, p);
    }

It is defined in include/linux/poll.h.
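The guard in poll_wait() can be exercised on its own. The following is a user-space model, not kernel code. The types are hypothetical stand-ins, and count_qproc() (a hypothetical name) only counts how often the queueing callback runs. A registration happens only when the table, its _qproc and the wait queue are all present.

```c
#include <stddef.h>

/* Hypothetical user-space stand-ins (NOT kernel definitions). */
struct file;
struct wait_queue_head { int nr_waiters; };

typedef struct poll_table_struct {
    void (*_qproc)(struct file *, struct wait_queue_head *, struct poll_table_struct *);
    unsigned long _key;
} poll_table;

/* The guard from include/linux/poll.h, reproduced for the model. */
static inline void poll_wait(struct file *filp, struct wait_queue_head *wait_address,
                             poll_table *p)
{
    if (p && p->_qproc && wait_address)
        p->_qproc(filp, wait_address, p);
}

/* Stand-in for the real queueing callback: just counts registrations. */
static void count_qproc(struct file *filp, struct wait_queue_head *wq, poll_table *p)
{
    (void)filp;
    (void)p;
    wq->nr_waiters++;
}
```

For example, poll_wait(NULL, &wq, NULL) and a call with a table whose _qproc is NULL both leave wq untouched. Only a table with count_qproc installed increments nr_waiters.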

As you can see, when struct file.f_op->poll() is called, poll_wait() only has an effect if the poll_table is non-NULL with its _qproc member set, and the wait queue address is non-NULL.
As mentioned above, "the poll_table is provided by the poll() implementation itself". Specifically, it is initialized by calling poll_initwait(&table).
The implementation of poll_initwait() is as follows:

    void poll_initwait(struct poll_wqueues *pwq)
    {
        init_poll_funcptr(&pwq->pt, __pollwait);
        pwq->polling_task = current;
        pwq->triggered = 0;
        pwq->error = 0;
        pwq->table = NULL;
        pwq->inline_index = 0;
    }

where init_poll_funcptr() is:

    static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
    {
        pt->_qproc = qproc;
        pt->_key   = ~0UL; /* all events enabled */
    }
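poll_initwait() arms _qproc with __pollwait. In the 3.10 sources, do_poll() in fs/select.c later sets pt->_qproc to NULL in three cases: in the no-wait (zero timeout) case, as soon as an fd reports an event, and after the first full pass. As a result, each fd is registered at most once. The sketch below models only that pass structure in user space. Everything in it (struct fake_fd, fake_do_poll(), wake_last()) is hypothetical, and the sleep is replaced by a callback that marks an fd ready.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model (NOT kernel code) of the pass structure of do_poll().
 * Each "fd" is a ready flag plus a count of how often the poller was
 * queued on it. */
struct fake_fd {
    bool ready;
    int waiters;
};

typedef struct {
    void (*_qproc)(struct fake_fd *fd);   /* armed by poll_initwait() in the kernel */
} poll_table;

/* Stand-in for __pollwait(): queue the poller on this fd. */
static void fake_pollwait(struct fake_fd *fd) { fd->waiters++; }

/* Stand-in for f_op->poll(): optional registration, then report readiness. */
static bool fake_fd_poll(struct fake_fd *fd, poll_table *pt)
{
    if (pt->_qproc)
        pt->_qproc(fd);
    return fd->ready;
}

/* Example "event source": makes the last fd ready. */
static void wake_last(struct fake_fd *fds, size_t n)
{
    if (n > 0)
        fds[n - 1].ready = true;
}

/* no_wait models a zero timeout; max_sleeps bounds the simulated sleeps;
 * wake() runs in place of sleeping. Returns the number of ready fds. */
static int fake_do_poll(struct fake_fd *fds, size_t n, bool no_wait, int max_sleeps,
                        void (*wake)(struct fake_fd *, size_t))
{
    poll_table table = { fake_pollwait };
    poll_table *pt = &table;
    int count = 0;

    if (no_wait)
        pt->_qproc = NULL;              /* zero timeout: never queue */

    for (;;) {
        for (size_t i = 0; i < n; i++) {
            if (fake_fd_poll(&fds[i], pt)) {
                count++;
                pt->_qproc = NULL;      /* we will return: stop queueing */
            }
        }
        pt->_qproc = NULL;              /* all waiters queued: never again */
        if (count > 0 || no_wait || max_sleeps-- <= 0)
            return count;
        wake(fds, n);                   /* stands in for sleeping until woken */
    }
}
```

With two idle fds and one simulated wake-up, each fd is queued exactly once even though it is scanned twice. With no_wait set, nothing is queued at all.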

That is, during the first pass over the fds given to poll(), while the initial IO events are being collected, every poll_wait() call made from f_op->poll() ends up in __pollwait().
It is in __pollwait() that the task calling poll() is added to the fd's wait queue, together with the callback to run when the queue is woken.
In the poll() implementation this callback is pollwake(). pollwake() calls the kernel's wake-up API (default_wake_function()) to wake the task that called poll().
(Naturally, the wake-up logic also lives in the poll() implementation, since that is where the task was suspended in the first place.)
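The registration and wake-up pairing can be modelled with a tiny wait queue. This is a user-space sketch, and each function plays the role of a kernel function:

- model_pollwait() plays __pollwait(): it queues an entry whose callback is the wake function.
- model_pollwake() plays pollwake(), but it only sets a triggered flag instead of calling default_wake_function().
- model_wake_up() walks the queue the way a wake_up() call would.

The names and the fixed-size entry array are assumptions for illustration. The array is only loosely inspired by the inline entries that struct poll_wqueues keeps.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ENTRIES 4

/* Hypothetical user-space model of a wait queue (NOT kernel code). */
struct wait_entry;
typedef int (*wait_func_t)(struct wait_entry *entry, unsigned long key);

struct wait_entry {
    wait_func_t func;           /* callback run on wake-up */
    void *private;              /* points back at the poller */
    struct wait_entry *next;
};

struct wait_queue_head {
    struct wait_entry *head;
};

/* Poller state with a small fixed array of entries, so no allocation. */
struct poller {
    bool triggered;
    struct wait_entry entries[MAX_ENTRIES];
    int used;
};

/* Role of pollwake(): note that the poller should wake up. The kernel version
 * ends up in default_wake_function(), making the sleeping task runnable. */
static int model_pollwake(struct wait_entry *entry, unsigned long key)
{
    struct poller *p = entry->private;
    (void)key;
    p->triggered = true;
    return 1;
}

/* Role of __pollwait(): queue one entry with model_pollwake as its callback. */
static bool model_pollwait(struct poller *p, struct wait_queue_head *wq)
{
    if (p->used >= MAX_ENTRIES)
        return false;
    struct wait_entry *e = &p->entries[p->used++];
    e->func = model_pollwake;
    e->private = p;
    e->next = wq->head;
    wq->head = e;
    return true;
}

/* Role of wake_up(): run every queued callback; returns how many fired. */
static int model_wake_up(struct wait_queue_head *wq, unsigned long key)
{
    int woken = 0;
    for (struct wait_entry *e = wq->head; e != NULL; e = e->next)
        woken += e->func(e, key);
    return woken;
}
```

The design point mirrored here is that the fd's side (model_wake_up) knows nothing about poll(). It just runs whatever callback the poller left behind when it registered.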

The above describes the preparation poll() makes on its side for waiting and being woken. Now let's look at how the waiting task is woken once an IO event on the fd becomes ready.
eventfd is used as the example, because its implementation is simple and easy to describe.

Briefly, eventfd maintains a 64-bit counter internally. When the counter is greater than 0, the fd has a readable event. It has a writable event while the counter is below its maximum value of ULLONG_MAX - 1, that is, while at least 1 can still be added.
The implementation code shows that the counter is updated in eventfd_ctx_read()/eventfd_write().
eventfd_ctx_read() decreases the internal counter. It ends with the following fragment, which wakes the waiting tasks recorded in the internal wait queue:

    if (likely(res == 0)) {
        eventfd_ctx_do_read(ctx, cnt);
        if (waitqueue_active(&ctx->wqh))
            wake_up_locked_poll(&ctx->wqh, POLLOUT);
    }

eventfd_write() increases the internal counter. It ends with the following fragment, which wakes the waiting tasks recorded in the internal wait queue:

    if (likely(res > 0)) {
        ctx->count += ucnt;
        if (waitqueue_active(&ctx->wqh))
            wake_up_locked_poll(&ctx->wqh, POLLIN);
    }
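The two eventfd fragments can be mirrored in a small user-space model. The model assumes a single waiter and replaces wake_up_locked_poll() with model_wake_up_poll(). Like the key check in pollwake() in fs/select.c, model_wake_up_poll() ignores a wake-up whose key shares no bits with the events the waiter asked for. The struct and function names are hypothetical. The overflow check follows the eventfd rule that the counter never exceeds ULLONG_MAX - 1.

```c
#include <poll.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model (NOT kernel code): one eventfd-like counter with a
 * single optional waiter standing in for ctx->wqh. */
struct waiter {
    bool queued;              /* is anyone on the wait queue? */
    unsigned long interest;   /* events the waiter asked for */
    bool woken;
};

struct efd_model {
    uint64_t count;
    struct waiter w;
};

/* Stand-in for wake_up_locked_poll(): skip if nobody waits (waitqueue_active)
 * and, like pollwake(), skip wake-ups whose key matches none of the waiter's events. */
static void model_wake_up_poll(struct efd_model *ctx, unsigned long key)
{
    if (!ctx->w.queued)
        return;
    if (key && !(key & ctx->w.interest))
        return;
    ctx->w.woken = true;
}

/* Mirrors eventfd_ctx_read() in non-semaphore mode: take the whole count,
 * then wake writers with POLLOUT. */
static bool model_read(struct efd_model *ctx, uint64_t *cnt)
{
    if (ctx->count == 0)
        return false;             /* real code would block or fail with -EAGAIN */
    *cnt = ctx->count;
    ctx->count = 0;
    model_wake_up_poll(ctx, POLLOUT);
    return true;
}

/* Mirrors eventfd_write(): refuse a write that would push the counter to
 * UINT64_MAX or beyond; otherwise add and wake readers with POLLIN. */
static bool model_write(struct efd_model *ctx, uint64_t ucnt)
{
    if (ucnt == UINT64_MAX || UINT64_MAX - ctx->count <= ucnt)
        return false;             /* real code would block or fail */
    ctx->count += ucnt;
    model_wake_up_poll(ctx, POLLIN);
    return true;
}
```

A waiter interested only in POLLIN is woken by model_write() but not by model_read(), whose wake-up carries POLLOUT.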

At this point the wake-up notification path is complete.
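The eventfd readiness rules are visible from user space through the real eventfd(2) and poll(2) calls, assuming Linux with glibc's <sys/eventfd.h>. The helper eventfd_revents() is a hypothetical name. It polls with a zero timeout, so it never sleeps.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Polls an eventfd for both directions without waiting (timeout 0) and
 * returns the revents bits, or -1 if poll() fails. */
static int eventfd_revents(int efd)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN | POLLOUT, .revents = 0 };
    if (poll(&pfd, 1, 0) < 0)
        return -1;
    return pfd.revents;
}
```

Here is how the reported events change over a short sequence:

- A fresh eventfd(0, EFD_NONBLOCK) has a counter of 0, so it reports only POLLOUT.
- After writing the 8-byte value 3, it reports POLLIN | POLLOUT.
- A read in the default (non-semaphore) mode returns 3 and resets the counter, so only POLLOUT remains.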

The select() implementation follows exactly the same flow as poll(). What differs is how the fds to monitor are passed in and how the resulting events are returned: poll() records them internally in a linked list, while select() uses bitmaps.
For details, see the implementations of poll() and select() in fs/select.c.
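The difference in how interest and results are expressed shows up directly in the user-space APIs. The sketch below (helper names hypothetical) asks a pipe the same non-blocking question in two ways:

- Through select(), one fd_set bitmap carries both the request and the result, and the call overwrites it.
- Through poll(), the request goes in .events and the result comes back in .revents.

```c
#include <poll.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* select(): the same fd_set is the input interest set and the output result. */
static int readable_via_select(int fd)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };   /* do not wait */
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);              /* fd must be < FD_SETSIZE */
    if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
        return -1;
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}

/* poll(): interest in .events, result in .revents, no fd limit of that kind. */
static int readable_via_poll(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    if (poll(&pfd, 1, 0) < 0)
        return -1;
    return (pfd.revents & POLLIN) ? 1 : 0;
}
```

Because select() overwrites its fd_set, a caller that loops must rebuild the set before every call. With poll(), only .revents changes.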

epoll checks for events in much the same way as poll()/select(). One obvious difference is that each fd is registered with epoll once and then managed there, whereas poll/select must copy the fds from user space into the kernel on every call.
In addition, epoll records its fds in an rbtree and has some unique features of its own, which will be described in another article. This article ends here.
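The register-once model is visible in the real epoll calls. In the sketch below (helper names hypothetical), epoll_ctl(EPOLL_CTL_ADD) is called a single time. Later epoll_wait() calls pass only the epoll fd and a buffer for results, rather than the full fd list that poll()/select() copy in on every call.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Registers fd with a new epoll instance once; returns the epoll fd or -1. */
static int epoll_watch_readable(int fd)
{
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }
    return epfd;
}

/* Non-blocking check (timeout 0): returns the number of ready fds (0 or 1
 * here) and, if one is ready, which fd it was. */
static int epoll_ready_now(int epfd, int *ready_fd)
{
    struct epoll_event out;
    int n = epoll_wait(epfd, &out, 1, 0);
    if (n == 1)
        *ready_fd = out.data.fd;
    return n;
}
```

The pipe keeps reporting ready on repeated epoll_wait() calls until it is drained, because epoll is level-triggered unless EPOLLET is requested.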

~ ~ End ~ ~

