epoll_create, epoll_ctl and epoll_wait Kernel Implementation (Kernel 3.0.8)

1. Related data structures
#define EPOLLIN          0x00000001
#define EPOLLPRI         0x00000002
#define EPOLLOUT         0x00000004
#define EPOLLERR         0x00000008
#define EPOLLHUP         0x00000010
#define EPOLLRDNORM      0x00000040
#define EPOLLRDBAND      0x00000080
#define EPOLLWRNORM      0x00000100
#define EPOLLWRBAND      0x00000200
#define EPOLLMSG         0x00000400
#define EPOLLET          0x80000000

#define EPOLL_CTL_ADD    1
#define EPOLL_CTL_DEL    2
#define EPOLL_CTL_MOD    3

typedef union epoll_data
{
    void *ptr;
    int fd;
    unsigned int u32;
    unsigned long long u64;
} epoll_data_t;

struct epoll_event
{
    unsigned int events;  /* such as EPOLLIN, EPOLLOUT */
    epoll_data_t data;
};

int epoll_create(int size);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);  /* op: EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD */
int epoll_wait(int epfd, struct epoll_event *events, int max, int timeout);
Common event types:
EPOLLIN: the corresponding file descriptor can be read;
EPOLLOUT: the corresponding file descriptor can be written;
EPOLLPRI: the corresponding file descriptor has urgent data to read;
EPOLLERR: an error occurred on the corresponding file descriptor;
EPOLLHUP: the corresponding file descriptor was hung up;
EPOLLET: puts the corresponding file descriptor in edge-triggered mode (an event is reported only when the state changes), as opposed to the default level-triggered mode;
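
As a small illustration of how these flags combine, the sketch below builds an epoll_event in user space and tests a returned mask. It assumes Linux with <sys/epoll.h>; the helper names make_event and is_readable are hypothetical and not part of the epoll API.

```c
#include <stdint.h>
#include <sys/epoll.h>

/* Hypothetical helper: build an epoll_event for fd with the given mask.
 * The fd is stored in the data union so epoll_wait can hand it back. */
struct epoll_event make_event(int fd, uint32_t mask)
{
    struct epoll_event ev = {0};

    ev.events = mask;
    ev.data.fd = fd;
    return ev;
}

/* Hypothetical helper: returns 1 if the mask reports readable data. */
int is_readable(const struct epoll_event *ev)
{
    return (ev->events & EPOLLIN) != 0;
}
```

For example, make_event(fd, EPOLLIN | EPOLLET) asks for edge-triggered read notifications on fd.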
2. epoll_create

In the bionic implementation, see epoll_create.S: it goes straight to the system call. For the system call table, see kernel:src/include/linux/syscalls.h. Following the kernel's naming rules, searching for the string "epoll_create" finds the corresponding implementation function: SYSCALL_DEFINE1(epoll_create, int, size), implemented in the kernel file eventpoll.c.

SYSCALL_DEFINE1(epoll_create, int, size)
{
	if (size <= 0)
		return -EINVAL;

	return sys_epoll_create1(0);
}

Looking closely, the size parameter only has to be greater than 0; it has no other use. The function does the following:

1) Find a free fd (file handle) in the files of the current process

2) Create a struct file instance whose f_op is eventpoll_fops and whose private_data is the newly created struct eventpoll object

3) Set files->fdt->fd[fd] of the current process to the newly created struct file instance

4) Return the fd (file handle) to user space
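
The size check can be observed from user space. The following is a minimal sketch, assuming Linux with <sys/epoll.h>; the function name check_epoll_create_size is hypothetical.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Returns 0 if epoll_create rejects size <= 0 with EINVAL and accepts
 * any positive size, -1 otherwise. */
int check_epoll_create_size(void)
{
    int small, large, ok;
    int bad = epoll_create(0);

    if (bad != -1 || errno != EINVAL) {
        if (bad >= 0)
            close(bad);
        return -1;
    }

    /* Both calls end up in sys_epoll_create1(0); size is not used. */
    small = epoll_create(1);
    large = epoll_create(100000);
    ok = (small >= 0 && large >= 0) ? 0 : -1;

    if (small >= 0)
        close(small);
    if (large >= 0)
        close(large);
    return ok;
}
```

Calling check_epoll_create_size() from main and testing for 0 is enough to try it out.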

3. epoll_ctl

User space: int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

Kernel: SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd, struct epoll_event __user *, event). It implements the control interface for inserting, deleting, and modifying file descriptors in the set of monitored files (epoll_event describes the events of interest and the source fd). Its processing steps are:

1) Get the file instance (struct file) corresponding to the eventpoll file handle epfd

2) Get the file instance (struct file) corresponding to the target file handle fd

3) Ensure that the file instance (struct file) corresponding to the target file handle fd supports poll operations, that is, (tfile->f_op && tfile->f_op->poll)

4) Convert the private data of the eventpoll file to an eventpoll object. The key of the eventpoll red-black tree is struct epoll_filefd:

struct epoll_filefd {
	struct file *file;
	int fd;
};

5) Look up the target fd and file instance in the red-black tree to obtain the struct epitem, which is the data structure of a red-black tree node

6) Perform the insert, remove, or modify operation according to op. Insertion (op is EPOLL_CTL_ADD) is described below

7) Call int ep_insert(struct eventpoll *ep, struct epoll_event *event, struct file *tfile, int fd)

7.1) Create a struct epitem object (every file descriptor added to the eventpoll interface has an epitem object, and this object is inserted into the eventpoll red-black tree)

7.2) Initialize the three lists of the epitem, and save the eventpoll, target fd, target file instance, and epoll_event in it

7.3) Register the callback function through f_op->poll of the target file. The relevant code is as follows:

struct ep_pqueue epq;
epq.epi = epi;
init_poll_funcptr(&epq.pt, ep_ptable_queue_proc);
revents = tfile->f_op->poll(tfile, &epq.pt); /* For details, see pipe_poll, which ultimately calls ep_ptable_queue_proc */

ep_ptable_queue_proc: used to add our wait queue to the target file's wakeup lists

7.4) Insert this object into the red-black tree
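
Several of the steps above are visible from user space through the error codes of epoll_ctl. The sketch below assumes Linux and relies on the errors documented in the epoll_ctl(2) man page: ENOENT when the fd is not registered (lookup in the tree fails), EEXIST when it already is, and EPERM when the target file does not support poll, which the man page says can happen for a directory. The function name check_epoll_ctl_errors is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Returns 0 if every call behaves as expected, otherwise the number of
 * the first check that failed. */
int check_epoll_ctl_errors(void)
{
    int failed = 0;
    int pipefd[2] = {-1, -1};
    int dfd = -1;
    struct epoll_event ev = {0};
    int epfd = epoll_create(1);

    if (epfd < 0 || pipe(pipefd) < 0) {
        failed = 1;
        goto out;
    }
    ev.events = EPOLLIN;
    ev.data.fd = pipefd[0];

    /* Step 5: the fd is not in the tree yet, so MOD fails. */
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, pipefd[0], &ev) != -1 || errno != ENOENT)
        failed = 2;
    /* Steps 6 and 7: the first ADD inserts a new item. */
    else if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) != 0)
        failed = 3;
    /* The second ADD finds the existing item. */
    else if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) != -1 || errno != EEXIST)
        failed = 4;
    else if (epoll_ctl(epfd, EPOLL_CTL_DEL, pipefd[0], &ev) != 0)
        failed = 5;
    else {
        /* Step 3: a directory has no poll operation. */
        dfd = open("/", O_RDONLY);
        if (dfd < 0 || epoll_ctl(epfd, EPOLL_CTL_ADD, dfd, &ev) != -1 || errno != EPERM)
            failed = 6;
    }
out:
    if (dfd >= 0)
        close(dfd);
    if (pipefd[0] >= 0)
        close(pipefd[0]);
    if (pipefd[1] >= 0)
        close(pipefd[1]);
    if (epfd >= 0)
        close(epfd);
    return failed;
}
```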

/*
 * This is the callback that is used to add our wait queue to the
 * target file wakeup lists.
 */
static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
				 poll_table *pt)
{
	struct epitem *epi = ep_item_from_epqueue(pt);
	struct eppoll_entry *pwq;

	if (epi->nwait >= 0 && (pwq = kmem_cache_alloc(pwq_cache, GFP_KERNEL))) {
		init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
		pwq->whead = whead;
		pwq->base = epi;
		add_wait_queue(whead, &pwq->wait);
		list_add_tail(&pwq->llink, &epi->pwqlist);
		epi->nwait++;
	} else {
		/* We have to signal that an error occurred */
		epi->nwait = -1;
	}
}
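
ep_item_from_epqueue recovers the epitem from the poll_table pointer because the poll_table is embedded in the ep_pqueue that also holds the epitem pointer. The following user-space sketch shows the technique with simplified stand-in types (only the fields needed here; the real kernel structures have more members), and a container_of macro written with offsetof rather than the kernel's own definition.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types. */
typedef struct poll_table_struct {
    void (*qproc)(void);
} poll_table;

struct epitem {
    int fd;
};

struct ep_pqueue {
    poll_table pt;
    struct epitem *epi;
};

/* Step back from a member pointer to the enclosing structure. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Given the embedded poll_table, return the epitem stored next to it. */
struct epitem *ep_item_from_epqueue(poll_table *p)
{
    return container_of(p, struct ep_pqueue, pt)->epi;
}
```

The poll function of the target file only ever sees a poll_table pointer, so this offset calculation is what lets the callback find its epitem without any extra argument.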

/*
 * This is the callback that is passed to the wait queue wakeup
 * mechanism. It is called by the target file descriptors when they
 * have events to report.
 */
static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *key)
{
   ...
}

How does ep_poll_callback get invoked?

Take a pipe as an example. Assuming the code above is monitoring the pipe for readable data, this function is called when data is written to the pipe.

init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
static inline void init_waitqueue_func_entry(wait_queue_t *q,
					wait_queue_func_t func)
{
	q->flags = 0;
	q->private = NULL;
	q->func = func;
}


ep_poll_callback is stored in q->func.

Here is how q->func gets called:

static ssize_t
pipe_write(struct kiocb *iocb, const struct iovec *_iov,
	   unsigned long nr_segs, loff_t ppos)
{
	...
	if (do_wakeup) {
		wake_up_interruptible_sync_poll(&pipe->wait, POLLIN | POLLRDNORM);
		kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
		do_wakeup = 0;
	}
	...
}

#define wake_up_interruptible_sync_poll(x, m)				\
	__wake_up_sync_key((x), TASK_INTERRUPTIBLE, 1, (void *) (m))

/**
 * __wake_up_sync_key - wake up threads blocked on a waitqueue.
 */
void __wake_up_sync_key(wait_queue_head_t *q, unsigned int mode,
			int nr_exclusive, void *key)
{
	...
	__wake_up_common(q, mode, nr_exclusive, wake_flags, key);
	...
}

/*
 * The core wakeup function. Non-exclusive wakeups (nr_exclusive == 0) just
 * wake everything up. If it's an exclusive wakeup (nr_exclusive == small +ve
 * number) then we wake all the non-exclusive tasks and one exclusive task.
 *
 * There are circumstances in which we can try to wake a task which has already
 * started to run but is not in state TASK_RUNNING. try_to_wake_up() returns
 * zero in this (rare) case, and we handle it by continuing to scan the queue.
 */
static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
			int nr_exclusive, int wake_flags, void *key)
{
	wait_queue_t *curr, *next;

	list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
		unsigned flags = curr->flags;

		if (curr->func(curr, mode, wake_flags, key) &&
				(flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
			break;
	}
}
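
The loop in __wake_up_common can be modelled in user space to see which callbacks run. This is only a sketch under simplifying assumptions: a singly linked list instead of the kernel's list_head, entries always appended at the tail, and a callback that just counts its invocations. All names except WQ_FLAG_EXCLUSIVE are hypothetical.

```c
#include <stddef.h>

#define WQ_FLAG_EXCLUSIVE 0x01

struct wait_entry;
typedef int (*wait_func_t)(struct wait_entry *w, void *key);

struct wait_entry {
    unsigned flags;
    wait_func_t func;          /* plays the role of q->func */
    struct wait_entry *next;
    int hits;                  /* how often func was called */
};

struct wait_head {
    struct wait_entry *first;
};

/* Stand-in for ep_poll_callback: record the call, report success. */
static int count_hit(struct wait_entry *w, void *key)
{
    (void)key;
    w->hits++;
    return 1;
}

static void add_wait(struct wait_head *q, struct wait_entry *w)
{
    struct wait_entry **pp = &q->first;

    while (*pp)
        pp = &(*pp)->next;
    w->next = NULL;
    *pp = w;
}

/* Same control flow as __wake_up_common: call every entry's func and
 * stop after nr_exclusive exclusive entries were woken successfully. */
static void wake_up_common(struct wait_head *q, int nr_exclusive, void *key)
{
    struct wait_entry *curr, *next;

    for (curr = q->first; curr != NULL; curr = next) {
        unsigned flags = curr->flags;

        next = curr->next;
        if (curr->func(curr, key) &&
            (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
            break;
    }
}

/* Queue: one non-exclusive entry, then two exclusive ones. */
int demo_wakeup(void)
{
    struct wait_head q = { NULL };
    struct wait_entry a = { 0, count_hit, NULL, 0 };
    struct wait_entry b = { WQ_FLAG_EXCLUSIVE, count_hit, NULL, 0 };
    struct wait_entry c = { WQ_FLAG_EXCLUSIVE, count_hit, NULL, 0 };

    add_wait(&q, &a);
    add_wait(&q, &b);
    add_wait(&q, &c);
    wake_up_common(&q, 1, NULL);
    return a.hits * 100 + b.hits * 10 + c.hits;
}
```

With nr_exclusive set to 1, as in wake_up_interruptible_sync_poll, the non-exclusive entry and the first exclusive entry are called and the scan stops there, so demo_wakeup() returns 110.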



To sum up, take a pipe fd as the target file. ep_insert creates an epitem for the new target file, holding the pipe fd and the events to monitor, and stores the epitem together with the address of ep_ptable_queue_proc in an ep_pqueue structure. The address of the poll_table field of that structure is then passed as a parameter to the poll function of the pipe fd (pipe_poll). While pipe_poll runs, ep_ptable_queue_proc is executed; from the poll_table address it was passed, it computes the offset back to the enclosing ep_pqueue and so obtains the epitem pointer. ep_ptable_queue_proc binds the epoll callback ep_poll_callback and the epitem pointer into another structure, eppoll_entry (the callback address is stored in its wait_queue_t), and inserts that wait_queue_t into the wait queue of the target pipe fd.

When the pipe's state changes, pipe_write calls wake_up_interruptible_sync_poll(&pipe->wait, POLLIN | POLLRDNORM). This wakes the threads blocked on the wait queue (pipe->wait) and invokes the ep_poll_callback stored in the queue. From the wait_queue_t address it receives, the callback again uses an offset to get the epitem. It then checks whether an event of interest occurred (in this kernel version by comparing the event mask passed as the wakeup key with the requested events; pipe_poll itself is called again later, when the events are delivered). If so, it inserts the epitem into the rdllist of the eventpoll and wakes the process waiting on the epoll fd, and the event is returned to user space. This is how events on the target fd are monitored.
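
The whole chain can be exercised from user space with a pipe. The following is a minimal sketch, assuming Linux; the function name check_pipe_wakeup is hypothetical, and the comments map each call to the kernel path described above.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Returns 0 if the pipe read end is reported readable only after a
 * write to the pipe, -1 otherwise. */
int check_pipe_wakeup(void)
{
    int ok = -1;
    int pipefd[2] = {-1, -1};
    struct epoll_event ev = {0}, out = {0};
    int epfd = epoll_create(1);

    if (epfd < 0 || pipe(pipefd) < 0)
        goto done;

    /* ep_insert: creates the epitem and hooks into pipe->wait. */
    ev.events = EPOLLIN;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) != 0)
        goto done;

    /* Nothing written yet, so no event is ready. */
    if (epoll_wait(epfd, &out, 1, 0) != 0)
        goto done;

    /* pipe_write wakes pipe->wait, which runs ep_poll_callback. */
    if (write(pipefd[1], "x", 1) != 1)
        goto done;

    if (epoll_wait(epfd, &out, 1, 1000) == 1 &&
        (out.events & EPOLLIN) && out.data.fd == pipefd[0])
        ok = 0;
done:
    if (pipefd[0] >= 0)
        close(pipefd[0]);
    if (pipefd[1] >= 0)
        close(pipefd[1]);
    if (epfd >= 0)
        close(epfd);
    return ok;
}
```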

4. epoll_wait

Reads the ready epoll_event entries from epfd and saves them to the events array.

User space: int epoll_wait(int epfd, struct epoll_event *events, int max, int timeout);

Kernel: SYSCALL_DEFINE4(epoll_wait, int, epfd, struct epoll_event __user *, events, int, maxevents, int, timeout)

1) Get the file instance corresponding to epfd

2) Convert the private data of the eventpoll file to an eventpoll object

3) Call int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, int maxevents, long timeout) to get the epoll_event entries. This function fetches the events that are ready and saves them to the event buffer provided by the caller.

3.1) ep_poll uses an hrtimer to implement its timeout

3.2) Call int ep_events_available(struct eventpoll *ep) to check for events

3.3) Call int ep_send_events(struct eventpoll *ep, struct epoll_event __user *events, int maxevents) to actually fetch the events and copy them to the events buffer in user space

3.3.1) It calls ep_scan_ready_list(ep, ep_send_events_proc, &esed);

3.3.2) In the callback function, the two fields of epoll_event are filled as follows:

uevent is the user-supplied epoll_event.

revents = epi->ffd.file->f_op->poll(epi->ffd.file, NULL) & epi->event.events;

if (revents) {
	if (__put_user(revents, &uevent->events) ||
	    __put_user(epi->event.data, &uevent->data)) {
		list_add(&epi->rdllink, head);
		return eventcnt ? eventcnt : -EFAULT;
	}

	...

}
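
Two consequences of this code can be checked from user space: readiness is polled again at delivery time, so an item whose data was already drained is not reported, and the reported mask is limited to the requested events while data is copied back unchanged. The sketch below assumes Linux; the function name check_ready_recheck and the value 42 are arbitrary choices for illustration.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Returns 0 if epoll_wait behaves as described, -1 otherwise. */
int check_ready_recheck(void)
{
    int ok = -1;
    char c;
    int pipefd[2] = {-1, -1};
    struct epoll_event ev = {0}, out = {0};
    int epfd = epoll_create(1);

    if (epfd < 0 || pipe(pipefd) < 0)
        goto done;

    ev.events = EPOLLIN;
    ev.data.u64 = 42;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) != 0)
        goto done;

    /* Write and drain again: the item was queued as ready, but the
     * poll call at delivery time now returns no requested event. */
    if (write(pipefd[1], "x", 1) != 1 || read(pipefd[0], &c, 1) != 1)
        goto done;
    if (epoll_wait(epfd, &out, 1, 0) != 0)
        goto done;

    /* With data pending, the event is delivered. */
    if (write(pipefd[1], "y", 1) != 1)
        goto done;
    if (epoll_wait(epfd, &out, 1, 1000) != 1)
        goto done;

    /* EPOLLRDNORM was not requested, so it is masked out;
     * data comes back exactly as registered. */
    if ((out.events & EPOLLIN) && !(out.events & EPOLLRDNORM) &&
        out.data.u64 == 42)
        ok = 0;
done:
    if (pipefd[0] >= 0)
        close(pipefd[0]);
    if (pipefd[1] >= 0)
        close(pipefd[1]);
    if (epfd >= 0)
        close(epfd);
    return ok;
}
```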
