Linux Epoll Summary

Source: Internet
Author: User

What is Epoll

What is epoll? According to the man page, it is an improved poll designed to handle large numbers of file descriptors. Strictly speaking, it was not introduced in the 2.6 kernel but in 2.5.44 (the man page states that the epoll API was introduced in Linux kernel 2.5.44). It has almost all of the advantages discussed below and is widely considered the best-performing multiplexed I/O readiness notification mechanism on Linux 2.6.

Epoll-related system calls

Epoll has only three system calls: epoll_create, epoll_ctl, and epoll_wait.

1. int epoll_create (int size);

Creates an epoll handle. Since Linux 2.6.8, the size parameter is ignored, although it must still be greater than zero. Note that the epoll handle itself occupies a file descriptor: on Linux you can see it under /proc/<pid>/fd/. So when you are done with epoll you must call close() on it; otherwise the process may eventually run out of file descriptors.
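To make the point about the epoll fd concrete, here is a minimal sketch assuming a Linux system with /proc mounted. The helper names proc_fd_target and open_epoll are hypothetical, chosen for this illustration. While the epoll fd is open, /proc/self/fd/<fd> is a readable link (typically pointing at anon_inode:[eventpoll]); after close() the link disappears.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the target of /proc/self/fd/<fd> into buf.
 * Returns the link length, or -1 if fd is not open (readlink fails). */
static ssize_t proc_fd_target(int fd, char *buf, size_t len)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
    ssize_t n = readlink(path, buf, len - 1);
    if (n >= 0)
        buf[n] = '\0';   /* readlink does not NUL-terminate */
    return n;
}

/* Create an epoll instance. Since Linux 2.6.8 the size argument is
 * ignored, but it must still be greater than zero. */
static int open_epoll(void)
{
    return epoll_create(1);
}
```

A caller would open the instance, use it, and always pair it with close(ep); once closed, proc_fd_target(ep, ...) returns -1 because the descriptor slot is free again.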

2. int epoll_ctl (int epfd, int op, int fd, struct epoll_event *event);

The event registration function of epoll. Unlike select(), which is told which events to watch at the moment it starts waiting, epoll requires you to register the types of events you want to monitor in advance.

The first parameter is the return value of epoll_create().

The second parameter is the operation, expressed by one of three macros:

EPOLL_CTL_ADD: register a new fd with epfd;

EPOLL_CTL_MOD: modify the monitored events of an already registered fd;

EPOLL_CTL_DEL: remove an fd from epfd;

The third parameter is the fd to be monitored.

The fourth parameter tells the kernel what to monitor. struct epoll_event is defined as follows:

    typedef union epoll_data {
        void *ptr;
        int fd;
        __uint32_t u32;
        __uint64_t u64;
    } epoll_data_t;

    /* events of interest, and events that were triggered */
    struct epoll_event {
        __uint32_t events;   /* epoll events (bit mask) */
        epoll_data_t data;   /* user data variable */
    };
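The data member is a union, so only one of its fields is meaningful at a time, and the kernel never interprets it: it simply hands back whatever was stored at registration. A common pattern is to store a pointer to per-connection state in data.ptr. The sketch below assumes a hypothetical struct conn and helper names (watch_conn, next_ready_conn) that are not part of the epoll API.

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-connection state; not part of the epoll API. */
struct conn {
    int fd;
    const char *name;
};

/* Register c->fd for readability, storing a pointer to c in data.ptr. */
static int watch_conn(int epfd, struct conn *c)
{
    struct epoll_event ev;
    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;
    ev.data.ptr = c;   /* returned untouched by epoll_wait */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, c->fd, &ev);
}

/* Wait up to timeout_ms for one event and return the struct conn stored
 * at registration time, or NULL on timeout or error. */
static struct conn *next_ready_conn(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    return n == 1 ? (struct conn *)ev.data.ptr : NULL;
}
```

Storing a pointer avoids a separate fd-to-state lookup table; the trade-off is that the program must remove the fd from epoll before freeing the struct it points to.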

events can be a combination (bitwise OR) of the following macros:

EPOLLIN: the corresponding file descriptor is readable (including when the peer socket has shut down gracefully);

EPOLLOUT: the corresponding file descriptor is writable;

EPOLLPRI: the corresponding file descriptor has urgent data to read (this indicates the arrival of out-of-band data);

EPOLLERR: an error occurred on the corresponding file descriptor;

EPOLLHUP: the corresponding file descriptor was hung up;

EPOLLET: put the descriptor into edge-triggered mode, as opposed to the default level-triggered mode;

EPOLLONESHOT: report only one event. After that event has been delivered, the descriptor is disabled; if you still need to monitor the socket, you must re-arm it with EPOLL_CTL_MOD.
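The three epoll_ctl operations and the EPOLLONESHOT re-arm rule can be exercised with a pipe. This is a minimal sketch; set_interest and ready_now are hypothetical helper names, and a pipe stands in for a socket.

```c
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>   /* pipe(), write(), close() for the usage below */

/* Thin wrapper over epoll_ctl: add, modify, or delete fd with a mask. */
static int set_interest(int epfd, int op, int fd, uint32_t mask)
{
    struct epoll_event ev;
    memset(&ev, 0, sizeof ev);
    ev.events = mask;
    ev.data.fd = fd;
    return epoll_ctl(epfd, op, fd, &ev);
}

/* Zero-timeout poll: how many events are ready right now? */
static int ready_now(int epfd)
{
    struct epoll_event evs[8];
    return epoll_wait(epfd, evs, 8, 0);
}
```

With EPOLLIN | EPOLLONESHOT and unread data in the pipe, the first ready_now() reports 1 and the second reports 0 because the fd is disabled; set_interest(ep, EPOLL_CTL_MOD, fd, EPOLLIN | EPOLLONESHOT) re-arms it, and EPOLL_CTL_DEL removes it entirely. Adding it again with EPOLL_CTL_ADD instead would fail, since the fd is still registered.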

3. int epoll_wait (int epfd, struct epoll_event * events, int maxevents, int timeout);

Collects the events that have occurred among those epoll is monitoring. The events parameter is an array of epoll_event structures allocated by the caller, into which epoll places the ready events (events cannot be a null pointer: the kernel only copies data into the array and does not allocate user-space memory for us). maxevents tells the kernel how large the events array is and must be greater than zero. The timeout parameter is in milliseconds: 0 returns immediately and -1 blocks indefinitely. On success, the function returns the number of file descriptors ready for I/O; a return value of 0 means the call timed out, and -1 indicates an error.
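A typical epoll_wait call site looks roughly like the sketch below: the caller owns a fixed-size array, passes its length as maxevents, and retries if a signal interrupts the wait. The helper name wait_ready_fds and the MAX_EVENTS value are assumptions for illustration, and it assumes each fd was registered with data.fd set.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>   /* pipe(), write() for the usage below */

#define MAX_EVENTS 16   /* arbitrary size of the caller-owned array */

/* Wait for events and copy the ready fds into out[].
 * timeout_ms: 0 = return immediately, -1 = block indefinitely.
 * Returns the number of ready fds, 0 on timeout, -1 on error. */
static int wait_ready_fds(int epfd, int timeout_ms, int out[MAX_EVENTS])
{
    /* The caller owns this storage; the kernel only copies events into it. */
    struct epoll_event events[MAX_EVENTS];
    int n;
    do {
        n = epoll_wait(epfd, events, MAX_EVENTS, timeout_ms);
    } while (n == -1 && errno == EINTR);   /* retry after a signal */
    for (int i = 0; i < n; i++)
        out[i] = events[i].data.fd;
    return n;
}
```

Passing maxevents <= 0 makes epoll_wait fail with EINVAL; the value is unrelated to the size argument given to epoll_create.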

Two modes of operation: level-triggered (LT) and edge-triggered (ET)

Level-triggered (LT): the default mode. If a descriptor is ready, the kernel notifies you; if you do not handle it, the kernel will notify you again the next time.

Edge-triggered (ET): the kernel notifies you only when the descriptor's readiness changes, and does not notify you again about readiness that already existed. The program must therefore read all the data in the buffer, or write out all pending data, each time it is notified, which is why ET mode should be used with non-blocking descriptors.
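Reading "all the data in the buffer" in ET mode usually means looping on a non-blocking fd until read() fails with EAGAIN. The sketch below assumes that pattern; set_nonblocking and drain_fd are hypothetical names, and the data is simply counted rather than processed.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>   /* memset() for the usage below */
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode, as ET mode expects. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Read until the kernel reports EAGAIN. Returns the number of bytes read,
 * or -1 on a real error. *eof is set to 1 if the writer closed its end. */
static ssize_t drain_fd(int fd, int *eof)
{
    char buf[4096];
    ssize_t total = 0;
    *eof = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;      /* a real program would process buf here */
        } else if (n == 0) {
            *eof = 1;        /* peer closed: no more data will arrive */
            return total;
        } else if (errno == EINTR) {
            continue;        /* interrupted by a signal: retry */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return total;    /* drained: safe to wait for the next edge */
        } else {
            return -1;       /* genuine error */
        }
    }
}
```

Only after drain_fd returns because of EAGAIN is it safe to go back to epoll_wait; stopping earlier risks leaving data that no further ET notification will announce.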

For reads: if a single read does not consume all the data in the buffer, no further read-ready notification will arrive, so the data remaining in the buffer goes unread until new data arrives. For writes: because fds in ET mode are normally non-blocking, the main concern is how to make sure that all the data the user wants to write actually gets written out.
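The read-side difference between LT and ET can be observed directly, in the spirit of the pipe scenario in the epoll(7) man page: write two bytes, read only one, and ask epoll again. This is a sketch; events_after_partial_read is a hypothetical name and a pipe stands in for a socket.

```c
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Write "ab" into a fresh pipe, read only 1 byte, then report how many
 * events a second zero-timeout epoll_wait sees. mode is 0 (LT) or EPOLLET.
 * Returns -1 if setup fails. */
static int events_after_partial_read(uint32_t mode)
{
    int p[2], ep, n = -1;
    struct epoll_event ev;
    char c;

    if (pipe(p) == -1)
        return -1;
    ep = epoll_create(1);
    if (ep == -1)
        goto out_pipe;

    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN | mode;
    ev.data.fd = p[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ev) == -1)
        goto out_ep;

    if (write(p[1], "ab", 2) != 2 ||
        epoll_wait(ep, &ev, 1, 0) != 1 ||   /* first notification: both modes */
        read(p[0], &c, 1) != 1)             /* leave "b" in the pipe */
        goto out_ep;

    n = epoll_wait(ep, &ev, 1, 0);          /* LT: 1 (data remains); ET: 0 */
out_ep:
    close(ep);
out_pipe:
    close(p[0]);
    close(p[1]);
    return n;
}
```

In LT mode the leftover byte keeps the fd on the ready list, so it is reported again; in ET mode nothing changed since the last notification, so the second wait reports nothing even though "b" is still unread.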

Advantages of epoll over select/poll:

1. A single process can handle a large number of socket descriptors (FDs)

The most intolerable limitation of select is that the number of FDs a process can monitor is capped by FD_SETSIZE, which is 1024 by default with glibc. That is clearly too small for an IM server that must support a large number of connections. One option is to change the macro and recompile, but reports indicate this reduces network efficiency. Another is a multi-process design (the traditional Apache approach); although creating a process on Linux is relatively cheap, the cost is still not negligible, and synchronizing data between processes is far less efficient than between threads, so this is not a perfect solution either. Epoll has no such restriction: the number of FDs it supports is bounded only by the maximum number of open files, which is generally far larger than 1024 (for example, around 100,000 on a machine with 1 GB of memory). The exact value can be checked with cat /proc/sys/fs/file-max; in general it depends heavily on the amount of system memory.
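The limits mentioned above can be inspected programmatically. This sketch assumes Linux with /proc mounted; system_file_max and process_fd_limit are hypothetical helper names. Note that besides the system-wide fs/file-max value, each process is also bounded by its RLIMIT_NOFILE soft limit (what ulimit -n shows), which applies to fds watched by epoll as well.

```c
#include <stdio.h>
#include <sys/resource.h>
#include <sys/select.h>   /* FD_SETSIZE */

/* System-wide limit on open files (the value of /proc/sys/fs/file-max).
 * Returns 0 if the file cannot be read. */
static unsigned long long system_file_max(void)
{
    unsigned long long v = 0;
    FILE *f = fopen("/proc/sys/fs/file-max", "r");
    if (f) {
        if (fscanf(f, "%llu", &v) != 1)
            v = 0;
        fclose(f);
    }
    return v;
}

/* Per-process soft limit on open descriptors (what `ulimit -n` shows). */
static rlim_t process_fd_limit(void)
{
    struct rlimit rl;
    return getrlimit(RLIMIT_NOFILE, &rl) == 0 ? rl.rlim_cur : 0;
}
```

Comparing FD_SETSIZE with these two numbers shows the gap: select's fd_set cannot represent any fd at or above FD_SETSIZE, whereas epoll is constrained only by the open-file limits.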

2. I/O efficiency does not decrease linearly as the number of FDs grows

Another fatal weakness of traditional select/poll: when you have a large set of sockets, network latency means that only some of them are "active" at any given moment, yet every select/poll call linearly scans the entire set, so efficiency degrades linearly. Epoll does not have this problem; it only operates on "active" sockets. This is because, in the kernel implementation, epoll relies on a callback attached to each fd, and only "active" sockets invoke that callback; idle sockets do not. In this sense epoll implements a kind of "pseudo" AIO, because the driving force lies in the OS kernel. In some benchmarks where essentially all sockets are active, as in a LAN environment, epoll is no more efficient than select/poll; conversely, with many idle connections, as in a simulated WAN environment, epoll is far more efficient than select/poll.
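The "only active sockets" behavior can be seen from user space: register many idle descriptors, make a few of them readable, and epoll_wait returns only those. This sketch does not measure performance; it uses pipes in place of sockets, and count_reported and NPIPES are hypothetical.

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

#define NPIPES 64   /* simulated connections (arbitrary count) */

/* Register NPIPES pipe read ends, make only `active` of them readable,
 * and return how many events epoll_wait reports (-1 on setup failure). */
static int count_reported(int active)
{
    int fds[NPIPES][2], made = 0, ep, n = -1;
    struct epoll_event ev, out[NPIPES];

    ep = epoll_create(1);
    if (ep == -1)
        return -1;
    for (; made < NPIPES; made++) {
        if (pipe(fds[made]) == -1)
            goto out;
        memset(&ev, 0, sizeof ev);
        ev.events = EPOLLIN;
        ev.data.fd = fds[made][0];
        if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[made][0], &ev) == -1) {
            made++;   /* this pipe exists, so close it too */
            goto out;
        }
    }
    for (int i = 0; i < active && i < NPIPES; i++)
        if (write(fds[i][1], "x", 1) != 1)
            goto out;
    /* Only the active pipes are on the ready list; idle ones are not copied out. */
    n = epoll_wait(ep, out, NPIPES, 0);
out:
    for (int i = 0; i < made; i++) {
        close(fds[i][0]);
        close(fds[i][1]);
    }
    close(ep);
    return n;
}
```

With select/poll, the program would have to pass in and then scan all 64 descriptors after every call; here the result array already contains just the active ones, so per-call user-space work scales with activity rather than with the total number of connections.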

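3. Avoiding unnecessary copies between the kernel and user space

select, poll, and epoll all need the kernel to notify user space about FD events, so avoiding unnecessary memory copies is important. This is often credited to epoll having the kernel and user space share the same piece of memory via mmap, and anyone who followed epoll from the 2.5 kernel will remember that early versions required this mmap step to be done by hand. The epoll interface in mainline Linux, however, does not share memory this way: epoll_wait copies only the ready events into the array supplied by the caller. The saving compared with select/poll comes from copying only ready events, and from not passing the whole set of monitored fds into the kernel again on every call.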

epoll mechanism

When a process calls epoll_create, the Linux kernel creates an eventpoll structure. Two of its members are closely related to how epoll is used:

    struct eventpoll {
        ...
        /* Root of the red-black tree that stores every event added to this epoll instance */
        struct rb_root rbr;
        /* Doubly linked list of ready events that epoll_wait will return to the user */
        struct list_head rdllist;
        ...
    };

Each epoll object has its own eventpoll structure, which holds the events added to it through epoll_ctl. These events are stored in a red-black tree, so duplicate registrations can be detected efficiently (insertion into a red-black tree takes O(log n) time, where n is the number of nodes in the tree).
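Duplicate detection is visible through the API: adding an fd that is already registered fails with EEXIST. The epoll(7) man page also notes that a duplicated descriptor (for example from dup()) can be added separately, since the entry is identified by the open file together with the fd number. The helper try_add is a hypothetical name used for this sketch.

```c
#include <errno.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>   /* pipe(), dup(), close() for the usage below */

/* Try to register fd with epfd for EPOLLIN.
 * Returns 0 on success, otherwise the errno value from epoll_ctl. */
static int try_add(int epfd, int fd)
{
    struct epoll_event ev;
    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0 ? 0 : errno;
}
```

A second try_add on the same fd returns EEXIST because the lookup finds the existing node, while try_add on dup(fd) succeeds and creates a separate entry.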

Every event added to epoll sets up a callback relationship with the device driver (for example, the network card driver): when the corresponding event occurs, the callback is invoked. In the kernel this callback is ep_poll_callback, and it adds the event that occurred to the rdllist doubly linked list.

For each event, epoll creates an epitem structure:

    struct epitem {
        struct rb_node rbn;         /* red-black tree node */
        struct list_head rdllink;   /* doubly linked (ready) list node */
        struct epoll_filefd ffd;    /* event handle (file descriptor) information */
        struct eventpoll *ep;       /* the eventpoll object this item belongs to */
        struct epoll_event event;   /* the expected event types */
    };

When epoll_wait is called to check for events, it only needs to check whether the rdllist doubly linked list in the eventpoll object contains any epitem elements. If rdllist is not empty, the events that occurred are copied to user space and the number of events is returned to the user.
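The interplay of the interest set, the callback, and the ready list can be modeled in a few dozen lines of user-space C. This is a toy, single-threaded illustration, not kernel code: every toy_* name is hypothetical, the interest set is a plain array where the kernel uses a red-black tree, the callback looks the item up by fd where the kernel reaches the epitem directly from its wait-queue entry, and LT re-queueing is not modeled.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX 32

struct toy_item {
    int fd;                  /* like epitem.ffd */
    uint32_t events;         /* like epitem.event.events */
    uint32_t pending;        /* events reported by the "driver" */
    struct toy_item *next;   /* like epitem.rdllink */
    int on_ready;            /* already queued on the ready list? */
};

struct toy_ep {
    struct toy_item items[TOY_MAX];            /* stands in for the rbr tree */
    int nitems;
    struct toy_item *ready_head, *ready_tail;  /* stands in for rdllist */
};

struct toy_event { int fd; uint32_t events; };

static struct toy_item *toy_find(struct toy_ep *ep, int fd)
{
    for (int i = 0; i < ep->nitems; i++)
        if (ep->items[i].fd == fd)
            return &ep->items[i];
    return NULL;
}

/* Like EPOLL_CTL_ADD: -1 if fd is already present or the table is full. */
static int toy_add(struct toy_ep *ep, int fd, uint32_t events)
{
    if (toy_find(ep, fd) || ep->nitems == TOY_MAX)
        return -1;
    struct toy_item *it = &ep->items[ep->nitems++];
    *it = (struct toy_item){ .fd = fd, .events = events };
    return 0;
}

/* Like ep_poll_callback: the "driver" reports activity on fd, and the item
 * is appended to the ready list once, only for events it asked for. */
static void toy_callback(struct toy_ep *ep, int fd, uint32_t happened)
{
    struct toy_item *it = toy_find(ep, fd);
    if (!it || !(it->events & happened))
        return;
    it->pending |= happened & it->events;
    if (!it->on_ready) {
        it->on_ready = 1;
        it->next = NULL;
        if (ep->ready_tail)
            ep->ready_tail->next = it;
        else
            ep->ready_head = it;
        ep->ready_tail = it;
    }
}

/* Like epoll_wait with timeout 0: copy up to maxevents ready items into
 * out[] and unlink them. Idle items are never visited. */
static int toy_wait(struct toy_ep *ep, struct toy_event *out, int maxevents)
{
    int n = 0;
    while (ep->ready_head && n < maxevents) {
        struct toy_item *it = ep->ready_head;
        ep->ready_head = it->next;
        if (!ep->ready_head)
            ep->ready_tail = NULL;
        it->on_ready = 0;
        out[n].fd = it->fd;
        out[n].events = it->pending;
        it->pending = 0;
        n++;
    }
    return n;
}
```

The point of the model is that toy_wait's cost depends only on the length of the ready list, which mirrors why epoll_wait does not need to walk every registered fd.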



