Linux Kernel Notes: epoll Implementation Principles (II)

Source: Internet
Author: User
Tags epoll

When a file descriptor is added to an epoll instance through epoll_ctl(2), an entry is added to the monitored file's wait queue with ep_poll_callback() as its callback function. The following analyzes the ep_poll_callback() function.

1004 static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *key)
1005 {
1006         int pwake = 0;
1007         unsigned long flags;
1008         struct epitem *epi = ep_item_from_wait(wait);
1009         struct eventpoll *ep = epi->ep;
1010         int ewake = 0;

Line 1008 first calls ep_item_from_wait() to get the struct epitem associated with the monitored file descriptor; it is obtained with the container_of macro.

Line 1009 then gets the struct eventpoll that represents the epoll instance from the ep field of the struct epitem.
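As a rough userspace illustration of the container_of step, the sketch below recovers an item from a pointer to a wait entry embedded in a wrapper structure. The struct layouts are simplified stand-ins (assumptions for this example), not the kernel's real definitions, and item_from_wait() is a hypothetical analogue of ep_item_from_wait().

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins; the real kernel structs have many more fields. */
struct eventpoll { int id; };
struct epitem { struct eventpoll *ep; int fd; };
struct wait_entry { int flags; };          /* stand-in for wait_queue_t */

/* Wrapper that embeds the wait entry and points back to its epitem. */
struct eppoll_entry {
    struct epitem *base;
    struct wait_entry wait;
};

/* Hypothetical analogue of ep_item_from_wait(): step back from the
 * embedded wait entry to its wrapper, then follow the base pointer. */
struct epitem *item_from_wait(struct wait_entry *w)
{
    return container_of(w, struct eppoll_entry, wait)->base;
}
```

Given only the address of the embedded wait member, the wrapper's address is computed by subtracting the member's offset, which is why the callback needs no lookup table to find its epitem.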

1012         if ((unsigned long)key & POLLFREE) {
1013                 ep_pwq_from_wait(wait)->whead = NULL;
1014                 /*
1015                  * whead = NULL above can race with ep_remove_wait_queue()
1016                  * which can do another remove_wait_queue() after us, so we
1017                  * can't use __remove_wait_queue(). whead->lock is held by
1018                  * the caller.
1019                  */
1020                 list_del_init(&wait->task_list);
1021         }

This checks whether the POLLFREE flag is set in the returned event mask (when is this flag set?). If it is, the current wait entry is removed from the file descriptor's wait queue. (Questions: what does the comment mean, and why is no lock needed here? On the second point, the comment itself notes that whead->lock is already held by the caller.)
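One possible reading of the comment is that list_del_init() leaves the removed entry linked to itself, so a second removal of the same entry (for example by a racing remove_wait_queue()) does not corrupt the queue. The sketch below is a minimal userspace reimplementation of the list helpers, written for illustration only; it is an interpretation, not something the original text confirms.

```c
/* Minimal doubly linked circular list, modelled on the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

void init_list_head(struct list_head *l)
{
    l->next = l;
    l->prev = l;
}

void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Unlink the entry, then re-initialise it so it points at itself. */
void list_del_init(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    init_list_head(entry);
}

int list_empty(const struct list_head *head)
{
    return head->next == head;
}
```

Because the entry is self-linked after the first call, a second unlink only rewrites the entry's own pointers and leaves the rest of the queue intact.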

Next, take the lock of the epoll instance:

1023         spin_lock_irqsave(&ep->lock, flags);

Next, check whether the event mask in the epitem contains no poll(2) events at all; if so, unlock and return immediately:

1025         /*
1026          * If the event mask does not contain any poll(2) event, we consider the
1027          * descriptor to be disabled. This condition is likely the effect of the
1028          * EPOLLONESHOT bit that disables the descriptor when an event is received,
1029          * until the next EPOLL_CTL_MOD will be issued.
1030          */
1031         if (!(epi->event.events & ~EP_PRIVATE_BITS))
1032                 goto out_unlock;

When does this happen? As the comment says, it happens when the EPOLLONESHOT flag is set. EPOLLONESHOT is handled on the return path of epoll_wait(): when ep_send_events_proc() is called and the EPOLLONESHOT flag is set, all flag bits other than EP_PRIVATE_BITS are cleared:

1552                         if (epi->event.events & EPOLLONESHOT)
1553                                 epi->event.events &= EP_PRIVATE_BITS;
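The effect of this masking can be observed from userspace. The sketch below is an illustrative test, assuming a Linux system with a pipe as the monitored file: it registers the read end with EPOLLIN | EPOLLONESHOT, makes it readable, and records the results of three non-blocking epoll_wait() calls. The function name oneshot_demo() and the results array are made up for this example.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Fills results[] with the return values of three non-blocking
 * epoll_wait() calls. Returns 0 on success, -1 on a setup failure.
 */
int oneshot_demo(int results[3])
{
    int p[2];
    int epfd;
    int rc = -1;
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT };
    struct epoll_event out;

    if (pipe(p) < 0)
        return -1;
    epfd = epoll_create1(0);
    if (epfd < 0)
        goto close_pipe;

    ev.data.fd = p[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) < 0)
        goto close_all;

    /* Make the read end readable; the byte is never consumed. */
    if (write(p[1], "x", 1) != 1)
        goto close_all;

    results[0] = epoll_wait(epfd, &out, 1, 0); /* event is delivered once */
    results[1] = epoll_wait(epfd, &out, 1, 0); /* descriptor is now disabled */

    /* Re-arm with EPOLL_CTL_MOD, as the kernel comment describes. */
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, p[0], &ev) < 0)
        goto close_all;
    results[2] = epoll_wait(epfd, &out, 1, 0); /* delivered again after re-arm */
    rc = 0;

close_all:
    close(epfd);
close_pipe:
    close(p[0]);
    close(p[1]);
    return rc;
}
```

If the kernel behaves as described above, the three calls should return 1, 0 and 1: the pipe stays readable throughout, but the second call sees nothing because the descriptor's event mask has been reduced to its private bits.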

Next, check whether the reported events include any that the user is actually interested in. If not, unlock and return; otherwise continue.

1034         /*
1035          * Check the events coming with the callback. At this stage, not
1036          * every device reports the events in the "key" parameter of the
1037          * callback. We need to be able to handle both cases here, hence the
1038          * test for "key" != NULL before the event match test.
1039          */
1040         if (key && !((unsigned long) key & epi->event.events))
1041                 goto out_unlock;
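The two early-exit checks so far can be summarised as a small predicate. This is a userspace sketch only: callback_accepts() is a hypothetical helper, and EP_PRIVATE_BITS is redefined here on the assumption that it covers EPOLLWAKEUP, EPOLLONESHOT, EPOLLET and EPOLLEXCLUSIVE, since the kernel-internal macro is not exported to userspace.

```c
#include <stdint.h>
#include <sys/epoll.h>

/* Fallbacks for older C library headers (values from the kernel UAPI). */
#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)
#endif
#ifndef EPOLLWAKEUP
#define EPOLLWAKEUP (1u << 29)
#endif

/* Assumed definition of the kernel-internal private-bit mask. */
#define EP_PRIVATE_BITS \
    ((uint32_t)(EPOLLWAKEUP | EPOLLONESHOT | EPOLLET | EPOLLEXCLUSIVE))

/*
 * Returns 1 if the callback would go on to queue the item, 0 if it
 * would jump to out_unlock. 'interest' plays the role of
 * epi->event.events and 'key' the event mask passed by the waker.
 */
int callback_accepts(uint32_t interest, uint32_t key)
{
    /* No poll(2) events left: the descriptor is considered disabled. */
    if (!(interest & ~EP_PRIVATE_BITS))
        return 0;

    /* A non-zero key that matches none of the requested events. */
    if (key && !(key & interest))
        return 0;

    return 1;
}
```

Note that a key of 0 is accepted: as the kernel comment explains, not every device reports its events through the key parameter, so a missing key cannot be treated as "no matching event".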

If the ready list rdllist is not being accessed by another process at this point, the current file descriptor is added directly to rdllist; otherwise it is added to the ovflist linked list. The default value of ovflist is EP_UNACTIVE_PTR. epoll_wait() sets ovflist to NULL before traversing rdllist and restores it to EP_UNACTIVE_PTR afterwards, so checking whether ovflist equals EP_UNACTIVE_PTR tells us whether rdllist is being accessed right now.

1049         if (unlikely(ep->ovflist != EP_UNACTIVE_PTR)) {
1050                 if (epi->next == EP_UNACTIVE_PTR) {
1051                         epi->next = ep->ovflist;
1052                         ep->ovflist = epi;
1053                         if (epi->ws) {
1054                                 /*
1055                                  * Activate ep->ws since epi->ws may get
1056                                  * deactivated at any time.
1057                                  */
1058                                 __pm_stay_awake(ep->ws);
1059                         }
1060
1061                 }
1062                 goto out_unlock;
1063         }
1064
1065         /* If this file is already in the ready list we exit soon */
1066         if (!ep_is_linked(&epi->rdllink)) {
1067                 list_add_tail(&epi->rdllink, &ep->rdllist);
1068                 ep_pm_stay_awake_rcu(epi);
1069         }
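The ovflist protocol can be modelled in a few lines of userspace C. In this sketch the ready list is reduced to a flag and a counter, and the names on_event(), scan_begin() and scan_end() are hypothetical. The requeue loop in scan_end() reflects my understanding of what the scanning side does when it finishes and is an assumption beyond what the text above states.

```c
#include <stddef.h>

/* Sentinel meaning "not chained" / "no scan in progress". */
#define EP_UNACTIVE_PTR ((void *)-1L)

struct item {
    struct item *next;  /* ovflist link, EP_UNACTIVE_PTR when not chained */
    int on_ready;       /* stand-in for ep_is_linked(&epi->rdllink) */
};

struct ep {
    struct item *ovflist;
    int nready;         /* stand-in for the length of rdllist */
};

void ep_init(struct ep *ep)
{
    ep->ovflist = EP_UNACTIVE_PTR;
    ep->nready = 0;
}

void item_init(struct item *it)
{
    it->next = EP_UNACTIVE_PTR;
    it->on_ready = 0;
}

/* Callback side: returns 1 if the overflow path was taken, else 0. */
int on_event(struct ep *ep, struct item *it)
{
    if (ep->ovflist != EP_UNACTIVE_PTR) {
        /* A scan is in progress: chain the item once on ovflist. */
        if (it->next == EP_UNACTIVE_PTR) {
            it->next = ep->ovflist;
            ep->ovflist = it;
        }
        return 1;
    }
    if (!it->on_ready) {
        it->on_ready = 1;
        ep->nready++;
    }
    return 0;
}

/* Scanner side: open the window in which events go to ovflist. */
void scan_begin(struct ep *ep)
{
    ep->ovflist = NULL;
}

/* Scanner side: move chained items to the ready list, close the window. */
void scan_end(struct ep *ep)
{
    struct item *it = ep->ovflist;

    while (it != NULL) {
        struct item *nxt = it->next;

        it->next = EP_UNACTIVE_PTR;
        if (!it->on_ready) {
            it->on_ready = 1;
            ep->nready++;
        }
        it = nxt;
    }
    ep->ovflist = EP_UNACTIVE_PTR;
}
```

The sentinel does double duty: on the eventpoll it marks "no scan in progress", and on an item it marks "not yet chained", which is why NULL can serve as the ordinary end-of-list marker during a scan.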

If the descriptor was added to the ovflist linked list, an epoll_wait() call is already about to return, so there is no need to wake up the wait queue of the epoll instance; that is why line 1062 jumps straight to the unlock. Otherwise, wake up a process that called epoll_wait() and is sleeping on the wait queue of the epoll instance (only one process is woken here):

1075         if (waitqueue_active(&ep->wq)) {
1076                 if ((epi->event.events & EPOLLEXCLUSIVE) &&
1077                                         !((unsigned long)key & POLLFREE)) {
1078                         switch ((unsigned long)key & EPOLLINOUT_BITS) {
1079                         case POLLIN:
1080                                 if (epi->event.events & POLLIN)
1081                                         ewake = 1;
1082                                 break;
1083                         case POLLOUT:
1084                                 if (epi->event.events & POLLOUT)
1085                                         ewake = 1;
1086                                 break;
1087                         case 0:
1088                                 ewake = 1;
1089                                 break;
1090                         }
1091                 }
1092                 wake_up_locked(&ep->wq);
1093         }
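The ewake decision inside this block can be isolated as a pure function. The sketch below is a userspace transcription for illustration: compute_ewake() is a hypothetical name, POLLFREE_BIT is a local stand-in for the kernel-internal POLLFREE flag (assumed here to be 0x4000), and EPOLLIN | EPOLLOUT stands in for EPOLLINOUT_BITS.

```c
#include <stdint.h>
#include <sys/epoll.h>

/* Fallback for older C library headers (value from the kernel UAPI). */
#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)
#endif

/* Local stand-in for the kernel-internal POLLFREE flag (assumed value). */
#define POLLFREE_BIT 0x4000u

/*
 * Mirrors the switch in the listing above, assuming someone is
 * sleeping on ep->wq. 'interest' is epi->event.events, 'key' is the
 * event mask reported by the waker.
 */
int compute_ewake(uint32_t interest, uint32_t key)
{
    int ewake = 0;

    if ((interest & EPOLLEXCLUSIVE) && !(key & POLLFREE_BIT)) {
        switch (key & (EPOLLIN | EPOLLOUT)) {
        case EPOLLIN:
            if (interest & EPOLLIN)
                ewake = 1;
            break;
        case EPOLLOUT:
            if (interest & EPOLLOUT)
                ewake = 1;
            break;
        case 0:
            /* The waker gave no in/out hint: count it as a wakeup. */
            ewake = 1;
            break;
        }
    }
    return ewake;
}
```

One detail that falls out of the switch as written: when the key carries both the input and output bits, neither case label matches and ewake stays 0.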

If the poll queue of the epoll instance is not empty, the processes waiting on that queue also need to be woken up, but only after the lock has been released:

1094         if (waitqueue_active(&ep->poll_wait))
1095                 pwake++;

Finally, unlock and return:

1097 out_unlock:
1098         spin_unlock_irqrestore(&ep->lock, flags);
1099
1100         /* We have to call this outside the lock */
1101         if (pwake)
1102                 ep_poll_safewake(&ep->poll_wait);
1103
1104         if (epi->event.events & EPOLLEXCLUSIVE)
1105                 return ewake;
1106
1107         return 1;

Note that the return value of ep_poll_callback() depends on the EPOLLEXCLUSIVE flag. This flag addresses the case where different epoll instances in multiple processes monitor the same file descriptor: when an event occurs on that descriptor, all of the epoll instances are woken up, which can cause a "thundering herd". EPOLLEXCLUSIVE is described in the epoll_ctl(2) manual page.
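To see why the return value matters, it helps to look at how a wait queue walk treats exclusive entries. The sketch below is a simplified userspace model of that walk, written from my understanding of the kernel's wake-up loop rather than taken from the text above; struct waiter, its fields and wake_up_common() are illustrative names.

```c
#define WQ_FLAG_EXCLUSIVE 0x01u

/* One wait queue entry; 'ret' is what its callback will return. */
struct waiter {
    unsigned flags;
    int ret;
    int called;
};

/*
 * Walk the queue in order and invoke each callback. The walk stops
 * once nr_exclusive exclusive entries have returned non-zero.
 * Returns the number of callbacks that were invoked.
 */
int wake_up_common(struct waiter *w, int n, int nr_exclusive)
{
    int invoked = 0;

    for (int i = 0; i < n; i++) {
        w[i].called = 1;
        invoked++;
        if (w[i].ret && (w[i].flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
            break;
    }
    return invoked;
}
```

In this model, an exclusive callback that returns 0 (for example because nobody was sleeping in epoll_wait() on that instance) does not use up the wakeup budget, so the walk moves on to the next epoll instance; one that returns 1 ends the walk and the remaining instances are left alone.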
