epoll_create, epoll_ctl & epoll_wait kernel implementation (kernel 3.0.8)

Source: Internet
Author: User
Tags epoll
1. Related Data Structures
#define EPOLLIN     0x00000001
#define EPOLLPRI    0x00000002
#define EPOLLOUT    0x00000004
#define EPOLLERR    0x00000008
#define EPOLLHUP    0x00000010
#define EPOLLRDNORM 0x00000040
#define EPOLLRDBAND 0x00000080
#define EPOLLWRNORM 0x00000100
#define EPOLLWRBAND 0x00000200
#define EPOLLMSG    0x00000400
#define EPOLLET     0x80000000

#define EPOLL_CTL_ADD 1
#define EPOLL_CTL_DEL 2
#define EPOLL_CTL_MOD 3

typedef union epoll_data {
    void *ptr;
    int fd;
    unsigned int u32;
    unsigned long long u64;
} epoll_data_t;

struct epoll_event {
    unsigned int events;  // such as EPOLLIN, EPOLLOUT
    epoll_data_t data;
};

int epoll_create(int size);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);  // op: EPOLL_CTL_ADD, EPOLL_CTL_DEL
int epoll_wait(int epfd, struct epoll_event *events, int max, int timeout);
Common event types:
EPOLLIN: the corresponding file descriptor is readable;
EPOLLOUT: the corresponding file descriptor is writable;
EPOLLPRI: the corresponding file descriptor has urgent (out-of-band) data to read;
EPOLLERR: an error occurred on the corresponding file descriptor;
EPOLLHUP: the corresponding file descriptor was hung up;
EPOLLET: requests edge-triggered notification for the corresponding file descriptor (the default is level-triggered).
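
As a minimal user-space sketch of how these flags are typically combined and tested, the helpers below build an epoll_event for a descriptor and check a returned mask. The names make_event and is_error_or_hangup are hypothetical and only serve this illustration.

```c
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>

/* Hypothetical helper: build an epoll_event that watches `fd` for `mask`. */
struct epoll_event make_event(int fd, uint32_t mask)
{
    struct epoll_event ev;

    memset(&ev, 0, sizeof(ev));
    ev.events = mask;   /* e.g. EPOLLIN | EPOLLET */
    ev.data.fd = fd;    /* handed back unchanged by epoll_wait */
    return ev;
}

/* Hypothetical helper: 1 if a returned mask reports an error or a hang-up. */
int is_error_or_hangup(uint32_t revents)
{
    return (revents & (EPOLLERR | EPOLLHUP)) != 0;
}
```

For example, make_event(fd, EPOLLIN | EPOLLET) asks for edge-triggered readability notifications on fd.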
2. epoll_create

For the bionic implementation, see epoll_create.S, which invokes the system call directly. For the system call table, see the kernel header src/include/linux/syscalls.h. Searching the kernel for the string "epoll_create" leads to the corresponding implementation, SYSCALL_DEFINE1(epoll_create, int, size), in the kernel file eventpoll.c.

SYSCALL_DEFINE1(epoll_create, int, size)
{
    if (size <= 0)
        return -EINVAL;

    return sys_epoll_create1(0);
}

On closer inspection, the size argument is ignored as long as it is greater than 0. The function does the following:

1) Finds an unused fd (file descriptor) in the files table of the current process.

2) Creates a struct file instance (its fops is eventpoll_fops, and its private data is a newly created struct eventpoll object).

3) Stores the new struct file instance in files->fdt->fd[fd] of the current process.

4) Returns the fd (file descriptor) to user space.
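
The size check can be observed from user space. The following is a small sketch, assuming a Linux system; try_epoll_create is a hypothetical helper name, not part of any library.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper: call epoll_create(size) and report the outcome.
 * Returns 0 if a descriptor was created (it is closed again right away),
 * otherwise the errno value set by the failed call.
 */
int try_epoll_create(int size)
{
    int epfd = epoll_create(size);

    if (epfd < 0)
        return errno;
    close(epfd);
    return 0;
}
```

With this helper, try_epoll_create(0) and try_epoll_create(-1) are expected to yield EINVAL, while any positive size such as 1 or 1024 behaves the same and succeeds.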

 

3. epoll_ctl

User space: int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

Kernel space: SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd, struct epoll_event __user *, event). This is the control interface of the eventpoll file, used to insert, remove, and modify file descriptors in the monitored set. The epoll_event argument describes the events of interest and carries the source fd. The code proceeds as follows:

1) Get the struct file instance that corresponds to the eventpoll file descriptor epfd.

2) Get the struct file instance that corresponds to the target file descriptor fd.

3) Make sure the struct file of the target supports the poll operation, that is, (tfile->f_op && tfile->f_op->poll).

4) Convert the private data of the eventpoll file to an eventpoll object. The key of the eventpoll red-black tree is an epoll_filefd:

struct epoll_filefd {
    struct file *file;
    int fd;
};

5) Look up the target file instance and fd in the red-black tree to obtain a struct epitem, the data structure of the nodes in the red-black tree.

6) Perform the insert, remove, or modify operation selected by op. The rest of this section describes insert (op is EPOLL_CTL_ADD).

7) Call int ep_insert(struct eventpoll *ep, struct epoll_event *event, struct file *tfile, int fd).

7.1) Create a struct epitem object. (Every file descriptor added to the eventpoll interface has its own epitem object, which is inserted into the red-black tree of the eventpoll.)

7.2) Initialize the three linked lists in the epitem and save the eventpoll, the target fd, the target file instance, and the epoll_event.

7.3) Register the callback function through f_op->poll of the target file. The relevant code is as follows:

struct ep_pqueue epq;
epq.epi = epi;
init_poll_funcptr(&epq.pt, ep_ptable_queue_proc);
revents = tfile->f_op->poll(tfile, &epq.pt); // See pipe_poll for an example; it ends up calling ep_ptable_queue_proc.

ep_ptable_queue_proc adds our wait queue to the wakeup lists of the target file.
7.4) Insert the epitem object into the red-black tree.
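
The red-black tree lookup in step 5 is visible from user space through the error codes of epoll_ctl. The sketch below assumes Linux; ctl_result is a hypothetical helper that runs one operation and returns 0 or the resulting errno.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: run one epoll_ctl operation, return 0 or errno. */
int ctl_result(int epfd, int op, int fd, unsigned int events)
{
    struct epoll_event ev = { 0 };

    ev.events = events;
    ev.data.fd = fd;
    return epoll_ctl(epfd, op, fd, &ev) == 0 ? 0 : errno;
}
```

Adding the read end of a pipe once succeeds. Adding it a second time is expected to fail with EEXIST because the lookup finds the existing entry, and removing a descriptor that is not registered fails with ENOENT. According to the epoll_ctl man page, a target that does not support epoll, such as a regular file or a directory, is rejected with EPERM, which corresponds to the poll check in step 3.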

/*
 * This is the callback that is used to add our wait queue to the
 * target file wakeup lists.
 */
static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
                                 poll_table *pt)
{
    struct epitem *epi = ep_item_from_epqueue(pt);
    struct eppoll_entry *pwq;

    if (epi->nwait >= 0 && (pwq = kmem_cache_alloc(pwq_cache, GFP_KERNEL))) {
        init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
        pwq->whead = whead;
        pwq->base = epi;
        add_wait_queue(whead, &pwq->wait);
        list_add_tail(&pwq->llink, &epi->pwqlist);
        epi->nwait++;
    } else {
        /* We have to signal that an error occurred */
        epi->nwait = -1;
    }
}
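
ep_item_from_epqueue recovers the epitem from the poll_table pointer that the callback receives. The kernel does this with container_of, that is, by subtracting the offset of the embedded member from its address. The following is a user-space sketch of that idea only; every type and name ending in _sketch, as well as item_from_pqueue, is a simplified stand-in and not the kernel definition.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types; the field sets are assumptions. */
struct poll_table_sketch { void (*qproc)(void); };
struct epitem_sketch { int fd; int nwait; };

/* Mirrors the idea of struct ep_pqueue: a poll table bundled with its epitem. */
struct ep_pqueue_sketch {
    struct poll_table_sketch pt;
    struct epitem_sketch *epi;
};

/* User-space equivalent of the kernel's container_of() macro. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the epitem from a pointer to the embedded poll table. */
struct epitem_sketch *item_from_pqueue(struct poll_table_sketch *pt)
{
    return container_of_sketch(pt, struct ep_pqueue_sketch, pt)->epi;
}
```

Because only the embedded member is passed around, the poll function of the target file never needs to know about epoll's own structures.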

 

/*
 * This is the callback that is passed to the wait queue wakeup
 * mechanism. It is called by the target file descriptors when they
 * have events to report.
 */
static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *key)
{
    ...
}

How is ep_poll_callback executed?

Take a pipe as an example. Assume the code above was used to watch for readable data on a pipe. When data is written to the pipe, ep_poll_callback is called.

init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);

static inline void init_waitqueue_func_entry(wait_queue_t *q,
                                             wait_queue_func_t func)
{
    q->flags = 0;
    q->private = NULL;
    q->func = func;
}

ep_poll_callback is stored in q->func.

The following shows how q->func gets called:

static ssize_t
pipe_write(struct kiocb *iocb, const struct iovec *_iov,
           unsigned long nr_segs, loff_t ppos)
{
    ...
    if (do_wakeup) {
        wake_up_interruptible_sync_poll(&pipe->wait, POLLIN | POLLRDNORM);
        kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
        do_wakeup = 0;
    }
    ...
}

#define wake_up_interruptible_sync_poll(x, m) \
    __wake_up_sync_key((x), TASK_INTERRUPTIBLE, 1, (void *) (m))

/**
 * __wake_up_sync_key - wake up threads blocked on a waitqueue.
 */
void __wake_up_sync_key(wait_queue_head_t *q, unsigned int mode,
                        int nr_exclusive, void *key)
{
    ...
    __wake_up_common(q, mode, nr_exclusive, wake_flags, key);
    ...
}

/*
 * The core wakeup function. Non-exclusive wakeups (nr_exclusive == 0) just
 * wake everything up. If it's an exclusive wakeup (nr_exclusive == small +ve
 * number) then we wake all the non-exclusive tasks and one exclusive task.
 *
 * There are circumstances in which we can try to wake a task which has already
 * started to run but is not in state TASK_RUNNING. try_to_wake_up() returns
 * zero in this (rare) case, and we handle it by continuing to scan the queue.
 */
static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
                             int nr_exclusive, int wake_flags, void *key)
{
    wait_queue_t *curr, *next;

    list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
        unsigned flags = curr->flags;

        if (curr->func(curr, mode, wake_flags, key) &&
            (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
            break;
    }
}
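
The loop in __wake_up_common can be modeled in user space to see how the stored callbacks get invoked. This is a simplified sketch under stated assumptions: it uses a singly linked list instead of the kernel list API, drops the mode and wake_flags arguments, and all names ending in _sketch are hypothetical.

```c
#include <stddef.h>

#define WQ_EXCLUSIVE_SKETCH 0x01  /* stand-in for WQ_FLAG_EXCLUSIVE */

struct wait_entry_sketch;
typedef int (*wake_func_sketch)(struct wait_entry_sketch *entry, void *key);

/* Simplified wait queue entry: flags, callback, private data, list link. */
struct wait_entry_sketch {
    unsigned flags;
    wake_func_sketch func;
    void *private_data;
    struct wait_entry_sketch *next;
};

/*
 * Walk the queue and invoke every callback, stopping once nr_exclusive
 * exclusive waiters have reported a successful wake-up.
 * Returns the number of callbacks invoked.
 */
int wake_up_common_sketch(struct wait_entry_sketch *head, int nr_exclusive,
                          void *key)
{
    struct wait_entry_sketch *curr, *next;
    int called = 0;

    for (curr = head; curr != NULL; curr = next) {
        unsigned flags = curr->flags;

        next = curr->next;  /* read first, the callback may unlink curr */
        called++;
        if (curr->func(curr, key) &&
            (flags & WQ_EXCLUSIVE_SKETCH) && !--nr_exclusive)
            break;
    }
    return called;
}

/* Example callback in the role of ep_poll_callback: counts its invocations. */
int count_callback_sketch(struct wait_entry_sketch *entry, void *key)
{
    int *counter = entry->private_data;

    (void)key;
    (*counter)++;
    return 1;
}
```

In this model, a queue holding one non-exclusive entry followed by two exclusive entries invokes two callbacks when nr_exclusive is 1 and all three when nr_exclusive is 0.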

To sum up, take a pipe fd as the target file. epoll creates an epitem for the new target file, holding the pipe fd and the events to monitor, and bundles that epitem together with the address of ep_ptable_queue_proc into an ep_pqueue structure. It then passes the poll table embedded in that structure as the argument to the poll function of the pipe fd (pipe_poll). While pipe_poll runs, ep_ptable_queue_proc is invoked, and it recovers the epitem pointer by computing an offset from the poll table address it receives. ep_ptable_queue_proc then bundles the epoll callback ep_poll_callback together with the epitem pointer into another structure, an eppoll_entry, and inserts the wait_queue_t embedded in that eppoll_entry into the wait queue of the target pipe fd.

When a state change on the pipe wakes its wait queue (pipe_write calls wake_up_interruptible_sync_poll(&pipe->wait, POLLIN | POLLRDNORM), which wakes up the threads blocked on the waitqueue pipe->wait), the ep_poll_callback function stored in the queue entry is called. It recovers the epitem from the address of the wait queue entry it receives, again by offset. The callback checks the event mask passed in key against the events the epitem is watching; the poll function of the target is called again later, when epoll_wait collects the events. If the mask matches, the callback inserts the epitem into rdllist in the eventpoll and wakes the processes waiting on the epoll fd, which return the event to user space. This is how events on the target fd are monitored.
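
The whole path can be exercised from user space with a pipe. The sketch below assumes Linux; pipe_reports_readable is a hypothetical demo function that registers the read end of a pipe, optionally writes one byte, and asks epoll_wait (with a zero timeout) whether EPOLLIN is reported.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical demo: returns 1 if epoll reports EPOLLIN for the read end
 * of a fresh pipe, 0 if nothing is ready, and -1 on a setup error.
 */
int pipe_reports_readable(int write_first)
{
    int p[2];
    int result = -1;
    int epfd = epoll_create(1);

    if (epfd < 0)
        return -1;
    if (pipe(p) < 0) {
        close(epfd);
        return -1;
    }

    struct epoll_event ev = { 0 };
    ev.events = EPOLLIN;
    ev.data.fd = p[0];

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        (!write_first || write(p[1], "x", 1) == 1)) {
        struct epoll_event out;
        int n = epoll_wait(epfd, &out, 1, 0);  /* timeout 0: do not block */

        if (n == 0)
            result = 0;
        else if (n == 1 && out.data.fd == p[0] && (out.events & EPOLLIN))
            result = 1;
    }

    close(p[0]);
    close(p[1]);
    close(epfd);
    return result;
}
```

Without a write the ready list stays empty and the function returns 0. After the write, the wake-up on the pipe's wait queue has already queued the epitem, so the function returns 1.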

 

4. epoll_wait

epoll_wait reads the ready epoll_event entries from epfd and saves them to the events array.

User space: int epoll_wait(int epfd, struct epoll_event *events, int max, int timeout);

Kernel space: SYSCALL_DEFINE4(epoll_wait, int, epfd, struct epoll_event __user *, events, int, maxevents, int, timeout)

1) Get the file instance that corresponds to epfd.

2) Convert the private data of the eventpoll file to the eventpoll object.

3) Call int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, int maxevents, long timeout) to fetch the epoll_event entries. This function collects the ready events and saves them to the events buffer supplied by the caller.

3.1) ep_poll uses an hrtimer to implement the timeout.

3.2) Call int ep_events_available(struct eventpoll *ep) to check whether any event is pending.

3.3) Call int ep_send_events(struct eventpoll *ep, struct epoll_event __user *events, int maxevents) to fetch the events and copy them to the events buffer in user space.

3.3.1) Call ep_scan_ready_list(ep, ep_send_events_proc, &esed);

3.3.2) In the callback function ep_send_events_proc, the fields of each epoll_event come from the following sources:

uevent is the epoll_event buffer provided by the user.

revents = epi->ffd.file->f_op->poll(epi->ffd.file, NULL) & epi->event.events;

if (revents) {
    if (__put_user(revents, &uevent->events) ||
        __put_user(epi->event.data, &uevent->data)) {
        list_add(&epi->rdllink, head);
        return eventcnt ? eventcnt : -EFAULT;
    }

    ...

}
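
The effect of maxevents and of a zero timeout can be checked from user space. The following sketch assumes Linux; count_reported and MAX_PIPES_SKETCH are hypothetical names. It makes nready pipes readable, registers them all, and returns what a single non-blocking epoll_wait reports.

```c
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_PIPES_SKETCH 8

/*
 * Hypothetical demo: make `nready` pipes readable, then return the number
 * of events one epoll_wait call reports when its buffer holds `maxevents`
 * entries. Returns -1 on invalid arguments or a setup error.
 */
int count_reported(int nready, int maxevents)
{
    int fds[MAX_PIPES_SKETCH][2];
    struct epoll_event out[MAX_PIPES_SKETCH];
    int created = 0, ok = 1, result = -1;

    if (nready < 0 || nready > MAX_PIPES_SKETCH ||
        maxevents < 1 || maxevents > MAX_PIPES_SKETCH)
        return -1;

    int epfd = epoll_create(1);
    if (epfd < 0)
        return -1;

    for (int i = 0; i < nready && ok; i++) {
        if (pipe(fds[i]) < 0) {
            ok = 0;
            break;
        }
        created++;

        struct epoll_event ev = { 0 };
        ev.events = EPOLLIN;
        ev.data.fd = fds[i][0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i][0], &ev) < 0 ||
            write(fds[i][1], "x", 1) != 1)
            ok = 0;
    }

    if (ok)
        result = epoll_wait(epfd, out, maxevents, 0);  /* returns at once */

    for (int i = 0; i < created; i++) {
        close(fds[i][0]);
        close(fds[i][1]);
    }
    close(epfd);
    return result;
}
```

With three readable pipes, a buffer of eight entries reports three events, while a buffer of one entry reports only one; the remaining events stay queued for the next call.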

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 
