Linux epoll event model details

Source: Internet
Author: User

epoll is more efficient than select and poll for two reasons. First, it reuses the registered file descriptor set to deliver results, so developers do not have to rebuild the set of monitored descriptors before every wait. Second, when collecting events it does not traverse the whole monitored set; it only walks the descriptors that kernel I/O events have asynchronously woken and placed on the ready queue.

 

The epoll interface is very simple. There are three functions in total:
1. int epoll_create(int size);
Creates an epoll handle. size tells the kernel roughly how many descriptors will be monitored; since Linux 2.6.8 the value is ignored, but it must still be greater than zero. This differs from the first parameter of select(), which is the highest monitored fd plus 1. Note that the epoll handle itself occupies an fd value: on Linux you can see it under /proc/<process id>/fd/. So after you finish using epoll you must call close() on the handle, otherwise the process may eventually run out of fds.
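As a minimal sketch of the point above, assuming a Linux system, the hypothetical helper create_and_close() below creates an epoll handle, checks that it behaves like any other descriptor, and releases it again. The helper name and the size value 256 are illustrative only.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if the epoll handle could be created,
 * was a valid descriptor while open, and was released by close(). */
int create_and_close(void)
{
    /* size is only a hint (ignored since Linux 2.6.8) but must be > 0. */
    int epfd = epoll_create(256);
    if (epfd < 0)
        return -1;

    /* The handle is an ordinary descriptor, so fcntl() accepts it. */
    int fdflags = fcntl(epfd, F_GETFD);
    assert(fdflags != -1);
    (void)fdflags;

    if (close(epfd) != 0)
        return -1;

    /* Once closed, the descriptor number is no longer valid. */
    return fcntl(epfd, F_GETFD) == -1 ? 0 : -1;
}
```

Calling create_and_close() from main() and checking for a 0 return is enough to try it.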

2. int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
The epoll event registration function: it registers the event types to listen for.
The first parameter is the value returned by epoll_create().
The second parameter is the action, given by one of three macros:
EPOLL_CTL_ADD: register a new fd with epfd;
EPOLL_CTL_MOD: modify the events monitored for an already registered fd;
EPOLL_CTL_DEL: remove an fd from epfd.
The third parameter is the fd to be monitored.
The fourth parameter tells the kernel what to listen for. struct epoll_event is defined as follows:

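    typedef union epoll_data {
        void     *ptr;
        int       fd;
        uint32_t  u32;
        uint64_t  u64;
    } epoll_data_t;

    struct epoll_event {
        uint32_t     events;  /* epoll events (bit mask) */
        epoll_data_t data;    /* user data, handed back by epoll_wait() */
    };

events can be a combination of the following macros: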

EPOLLIN: the file descriptor is readable (this includes an orderly shutdown by the peer socket);
EPOLLOUT: the file descriptor is writable;
EPOLLPRI: the file descriptor has urgent data to read (that is, out-of-band data has arrived);
EPOLLERR: an error has occurred on the file descriptor;
EPOLLHUP: the file descriptor has been hung up;
EPOLLET: puts the descriptor in Edge Triggered mode, as opposed to the default Level Triggered mode;
EPOLLONESHOT: report only one event. After that event has been delivered the descriptor is disabled; to keep monitoring the socket you must re-arm it with epoll_ctl() and EPOLL_CTL_MOD.
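The three actions can be sketched on the read end of a pipe. The helper name ctl_roundtrip() is hypothetical, and a pipe stands in for a socket only to keep the example self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: walks one descriptor through ADD, MOD and DEL.
 * Returns 0 if every epoll_ctl() call behaved as expected. */
int ctl_roundtrip(void)
{
    int p[2];
    int rc = pipe(p);
    assert(rc == 0);
    int epfd = epoll_create(1);
    assert(epfd >= 0);

    struct epoll_event ev = {0};
    ev.events = EPOLLIN;     /* level-triggered read interest */
    ev.data.fd = p[0];       /* handed back later by epoll_wait() */
    int ok = epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0;

    /* Registering the same fd a second time fails with EEXIST. */
    ok = ok && epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == -1
            && errno == EEXIST;

    /* Change the monitored events: switch to edge-triggered. */
    ev.events = EPOLLIN | EPOLLET;
    ok = ok && epoll_ctl(epfd, EPOLL_CTL_MOD, p[0], &ev) == 0;

    /* The event argument is ignored for DEL; passing a valid pointer
     * keeps the call portable to kernels older than 2.6.9. */
    ok = ok && epoll_ctl(epfd, EPOLL_CTL_DEL, p[0], &ev) == 0;

    close(p[0]);
    close(p[1]);
    close(epfd);
    return ok ? 0 : -1;
}
```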

3. int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
Waits for events to occur. events is the array that receives the ready events from the kernel, and maxevents tells the kernel how many entries that array can hold; it must be greater than zero (it is not limited by the size passed to epoll_create()). timeout is the timeout in milliseconds: 0 returns immediately and -1 blocks indefinitely. The function returns the number of events that need to be handled; 0 means the call timed out and -1 means an error occurred.
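A small sketch of the return values, again using a pipe in place of a socket. The helper wait_demo() and its two output parameters are made up for this illustration; the 100 ms timeout is arbitrary.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: stores the result of an epoll_wait() that times
 * out in *idle and of one that finds data in *ready.
 * Returns 0 if the ready event carried the registered descriptor. */
int wait_demo(int *idle, int *ready)
{
    int p[2];
    int rc = pipe(p);
    assert(rc == 0);
    int epfd = epoll_create(1);
    assert(epfd >= 0);

    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.fd = p[0];
    rc = epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev);
    assert(rc == 0);

    struct epoll_event events[8];   /* maxevents = capacity of this array */

    /* Nothing to read yet: the 100 ms timeout expires and 0 is returned. */
    *idle = epoll_wait(epfd, events, 8, 100);

    /* Once a byte is available, even an infinite wait returns at once. */
    ssize_t w = write(p[1], "x", 1);
    assert(w == 1);
    *ready = epoll_wait(epfd, events, 8, -1);
    int same_fd = (*ready == 1 && events[0].data.fd == p[0]);

    close(p[0]);
    close(p[1]);
    close(epfd);
    return same_fd ? 0 : -1;
}
```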

4. epoll has two trigger modes:

Edge Triggered (ET): an event is reported only when new data arrives, regardless of whether unread data is still in the buffer.
Level Triggered (LT): an event is reported as long as data is available.

For example:
1. We add a file descriptor (RFD), used to read data from a pipe, to the epoll descriptor.
2. 2 KB of data is written at the other end of the pipe.
3. epoll_wait(2) is called and returns RFD, indicating that it is ready to read.
4. We read 1 KB of data.
5. epoll_wait(2) is called again.
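The five steps can be reproduced with a real pipe. The sketch below is only an illustration: the helper step5_result() is hypothetical, and it polls with a zero timeout in step 5 so that the call reports "nothing ready" instead of hanging.

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: runs steps 1-5 on a fresh pipe. flags is
 * EPOLLIN for LT or EPOLLIN | EPOLLET for ET. Returns what the
 * epoll_wait() call of step 5 reports. */
int step5_result(unsigned int flags)
{
    int p[2];
    int rc = pipe(p);
    assert(rc == 0);
    int epfd = epoll_create(1);
    assert(epfd >= 0);

    struct epoll_event ev = {0}, out;
    ev.events = flags;
    ev.data.fd = p[0];
    rc = epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev);   /* step 1 */
    assert(rc == 0);

    char buf[2048];
    memset(buf, 'a', sizeof buf);
    ssize_t n = write(p[1], buf, sizeof buf);         /* step 2: 2 KB in */
    assert(n == 2048);

    rc = epoll_wait(epfd, &out, 1, 0);                /* step 3 */
    assert(rc == 1);

    n = read(p[0], buf, 1024);                        /* step 4: 1 KB out */
    assert(n == 1024);

    rc = epoll_wait(epfd, &out, 1, 0);                /* step 5 */

    close(p[0]);
    close(p[1]);
    close(epfd);
    return rc;
}
```

With EPOLLIN alone the function returns 1, because 1 KB is still unread. With EPOLLIN | EPOLLET it returns 0: the only edge was consumed in step 3, so a blocking wait at this point would not return until new data is written.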
Edge Triggered working mode:
If the EPOLLET flag was used when RFD was added to the epoll descriptor in step 1, the epoll_wait(2) call in step 5 will probably hang, even though the remaining data is still in the file's input buffer and the sender may be waiting for a response to the data it has already sent. ET mode reports an event only when something changes on the monitored file descriptor. So in step 5 the caller may end up waiting for data that is already sitting in the input buffer. In the example above, an event is generated on RFD by the write in step 2, and that event is consumed in step 3. Because the read in step 4 does not empty the input buffer, the epoll_wait(2) call in step 5 may block indefinitely. When epoll works in ET mode you must use non-blocking file descriptors, so that a blocking read or write on one descriptor does not starve the task that is handling many descriptors. The recommended way to use epoll in ET mode is:

i. use non-blocking file descriptors;
ii. wait for the next event (call epoll_wait) only after read(2) or write(2) has returned EAGAIN.

This does not mean that every read() has to be repeated until EAGAIN is returned. On a stream-oriented descriptor (pipe, FIFO, stream socket), when read() returns fewer bytes than were requested you can conclude that the buffer is now empty and treat the read event as fully handled.

Level Triggered mode works the opposite way. Used in LT mode, epoll is simply a faster poll(2) and can be used wherever poll(2) is used, because the two share the same semantics.

Even in ET mode, several events may be generated when data arrives in several chunks. The caller can set the EPOLLONESHOT flag so that, once epoll_wait(2) has delivered an event, epoll disables the file descriptor associated with it. When EPOLLONESHOT is set, re-arming the descriptor with epoll_ctl(2) and EPOLL_CTL_MOD becomes the caller's responsibility.
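The EPOLLONESHOT behaviour can be sketched as follows, assuming a pipe as the monitored descriptor. The helper oneshot_demo() and its three output parameters are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: records three zero-timeout epoll_wait() results
 * for a descriptor registered with EPOLLONESHOT. Returns 0 if the
 * re-arming EPOLL_CTL_MOD call succeeded. */
int oneshot_demo(int *first, int *disabled, int *rearmed)
{
    int p[2];
    int rc = pipe(p);
    assert(rc == 0);
    int epfd = epoll_create(1);
    assert(epfd >= 0);

    struct epoll_event ev = {0}, out;
    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = p[0];
    rc = epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev);
    assert(rc == 0);

    ssize_t w = write(p[1], "x", 1);   /* this byte is never read */
    assert(w == 1);

    *first = epoll_wait(epfd, &out, 1, 0);      /* event delivered once */
    *disabled = epoll_wait(epfd, &out, 1, 0);   /* fd now disabled */

    /* Re-arm with EPOLL_CTL_MOD; the fd is still registered, so
     * EPOLL_CTL_ADD would fail with EEXIST here. */
    rc = epoll_ctl(epfd, EPOLL_CTL_MOD, p[0], &ev);
    *rearmed = epoll_wait(epfd, &out, 1, 0);    /* reported again */

    close(p[0]);
    close(p[1]);
    close(epfd);
    return rc;
}
```

The second wait reports nothing even though the byte is still unread, which is what makes the flag useful when several threads call epoll_wait() on the same epoll descriptor.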

ET and LT in more detail:
LT (level triggered) is the default working mode and supports both blocking and non-blocking sockets. In this mode the kernel tells you whether a file descriptor is ready, and you can then perform I/O on that ready fd. If you do nothing, the kernel keeps notifying you, so programming errors are less likely in this mode. The traditional select/poll model is representative of this mode.

ET (edge triggered) is the high-speed working mode and supports only non-blocking sockets. In this mode the kernel tells you through epoll when a descriptor changes from not ready to ready. It then assumes you know the descriptor is ready and sends no further readiness notifications for it until you do something that makes it not ready again (for example, you were sending, receiving, or accepting a connection, and a transfer of less than the requested amount ended in an EWOULDBLOCK error). Note, however, that if you never perform I/O on this fd (so that it never becomes not ready again), the kernel sends no more notifications; it notifies only once. Whether ET mode actually speeds up TCP workloads still needs to be confirmed by more benchmarks.
In many tests we see that, without a large number of idle or dead connections, epoll is not much more efficient than select/poll. With a large number of idle connections (for example, the many slow connections found in a WAN environment), epoll turns out to be much more efficient than select/poll. (I have not tested this myself.)
In addition, when working in epoll ET mode and an EPOLLIN event arrives, keep in mind while reading that if recv() returns exactly the requested size, there is very likely still unread data in the buffer. The event has then not been fully handled, so you need to read again:

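    /* Fragment: activeevents, i, buf and buflen come from the surrounding code. */
    int rs = 1;                     /* 1 while another recv() is needed */
    while (rs) {
        buflen = recv(activeevents[i].data.fd, buf, sizeof(buf), 0);
        if (buflen < 0) {
            /* Non-blocking socket: EAGAIN means the buffer is drained,
               so the event is treated as handled. */
            if (errno == EAGAIN)
                break;
            else
                return;
        } else if (buflen == 0) {
            /* The peer closed the connection normally. */
        }
        if ((size_t)buflen == sizeof(buf))
            rs = 1;                 /* buffer was filled: read again */
        else
            rs = 0;
    }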
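The read-until-EAGAIN rule can also be sketched on its own. The version below is an illustration under stated assumptions rather than the article's code: set_nonblocking(), drain() and drain_demo() are hypothetical helpers, and a pipe replaces the socket so the example runs without a network peer.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Puts fd into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int fl = fcntl(fd, F_GETFL, 0);
    if (fl < 0)
        return -1;
    return fcntl(fd, F_SETFL, fl | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Reads until the kernel buffer is empty (EAGAIN) or the peer closes.
 * Returns the number of bytes consumed, or -1 on any other error. */
long drain(int fd)
{
    char buf[1024];
    long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == 0)
            return total;                      /* end of file */
        if (errno == EINTR)
            continue;                          /* interrupted: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;                      /* fully drained */
        return -1;
    }
}

/* Writes 3000 bytes into a pipe and drains them again. */
long drain_demo(void)
{
    int p[2];
    int rc = pipe(p);
    assert(rc == 0);
    rc = set_nonblocking(p[0]);
    assert(rc == 0);

    char data[3000] = {0};
    ssize_t w = write(p[1], data, sizeof data);
    assert(w == 3000);

    long got = drain(p[0]);
    close(p[0]);
    close(p[1]);
    return got;
}
```

Because the descriptor is non-blocking, the final read() fails with EAGAIN instead of hanging, and only then is it safe to go back to epoll_wait() in ET mode.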

Likewise, if the sender produces data faster than the receiver consumes it (that is, the epoll program writes faster than the peer socket reads), remember that a successful send() only means the data was copied into the socket's send buffer, not that it has reached the receiving end. Once that buffer is full, send() on a non-blocking socket fails with EAGAIN (see man send) and the data of that call is not sent. A wrapper function socket_send() is therefore needed to handle this case. It tries to write all of the data before returning, and returns -1 on error. Inside socket_send(), when the write buffer is full (send() returns -1 and errno is EAGAIN), it waits and then tries again. This approach is not perfect: in theory it may block inside socket_send() for a long time, but there is no better way.

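    ssize_t socket_send(int sockfd, const char *buffer, size_t buflen)
    {
        ssize_t tmp;
        size_t total = buflen;
        const char *p = buffer;

        while (1) {
            tmp = send(sockfd, p, total, 0);
            if (tmp < 0) {
                /* Interrupted by a signal: the write could be retried,
                   but this wrapper reports -1. */
                if (errno == EINTR)
                    return -1;
                /* Non-blocking socket with a full send buffer:
                   wait briefly, then retry. */
                if (errno == EAGAIN) {
                    usleep(1000);   /* delay value is an assumption */
                    continue;
                }
                return -1;
            }
            if ((size_t)tmp == total)
                return buflen;      /* everything has been written */
            total -= tmp;
            p += tmp;
        }
    }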

g++ myepoll.cpp lxx_net.cc -g -o myepoll

 

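    #include <sys/socket.h>
    #include <sys/epoll.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>
    #include <iostream>
    #include "lxx_net.h"   // assumed header for the lxx_net_* helpers in lxx_net.cc (not shown)

    #define MAX_EPOLL_SIZE 500
    #define MAX_CLIENT_SIZE 500
    #define MAX_IP_LEN      16
    #define MAX_CLIENT_BUFF_LEN 1024
    #define QUEUE_LEN 500
    #define BUFF_LEN 1024
    #define CLIENT_FLAG 0x10000000u   // assumed: the original flag constant was lost

    int fd_epoll = -1;
    int fd_listen = -1;

    typedef struct {
        int  fd;
        char host[MAX_IP_LEN];
        int  port;
        int  len;
        char buff[MAX_CLIENT_BUFF_LEN];
        bool status;                  // true while the slot is in use
    } client_t;

    client_t *ptr_cli = NULL;

    int epoll_add(int fd_epoll, int fd, struct epoll_event *ev)
    {
        if (fd_epoll < 0 || fd < 0 || ev == NULL) return -1;
        if (epoll_ctl(fd_epoll, EPOLL_CTL_ADD, fd, ev) < 0) return -1;
        return 0;
    }

    int epoll_del(int fd_epoll, int fd)
    {
        struct epoll_event ev_del;
        if (fd_epoll < 0 || fd < 0) return -1;
        if (epoll_ctl(fd_epoll, EPOLL_CTL_DEL, fd, &ev_del) < 0) return -1;
        return 0;
    }

    // Cleanup steps are assumed; this part of the listing was lost.
    static void drop_client(int idx)
    {
        epoll_del(fd_epoll, ptr_cli[idx].fd);
        close(ptr_cli[idx].fd);
        ptr_cli[idx].status = false;
    }

    void do_read_data(int idx)
    {
        if (idx < 0 || idx >= MAX_CLIENT_SIZE) return;
        int n;
        int pos = ptr_cli[idx].len;
        if (0 == (n = recv(ptr_cli[idx].fd, ptr_cli[idx].buff + pos, MAX_CLIENT_BUFF_LEN - pos, 0))) {
            fprintf(stdout, "[%s:%d] closed\n", ptr_cli[idx].host, ptr_cli[idx].port);   // message text assumed
            drop_client(idx);
        } else if (n > 0) {
            ptr_cli[idx].len += n;
            if (ptr_cli[idx].len == MAX_CLIENT_BUFF_LEN)
                ptr_cli[idx].len = 0;   // assumed: discard a full buffer so recv() never gets length 0
        } else if (errno != EAGAIN) {
            fprintf(stdout, "[%s:%d] recv error %d\n", ptr_cli[idx].host, ptr_cli[idx].port, errno);   // message text assumed
            drop_client(idx);
        }
    }

    static void do_accept_client()
    {
        struct epoll_event ev;
        struct sockaddr_in cliaddr;
        socklen_t cliaddr_len = sizeof(cliaddr);

        int conn_fd = lxx_net_accept(fd_listen, (struct sockaddr *)&cliaddr, &cliaddr_len);
        if (conn_fd >= 0) {
            if (lxx_net_set_socket(conn_fd, false) != 0) {   // second argument lost; false assumed
                close(conn_fd);
                return;
            }
            int i = 0;
            bool flag = false;        // set once a free slot is found
            for (i = 0; i < MAX_CLIENT_SIZE; i++) {
                if (!ptr_cli[i].status) {
                    ptr_cli[i].fd = conn_fd;
                    ptr_cli[i].port = ntohs(cliaddr.sin_port);
                    inet_ntop(AF_INET, &cliaddr.sin_addr, ptr_cli[i].host, MAX_IP_LEN);
                    ptr_cli[i].len = 0;
                    ptr_cli[i].status = true;
                    flag = true;
                    break;
                }
            }
            if (flag) {
                ev.events = EPOLLIN;
                ev.data.u32 = i | CLIENT_FLAG;   // slot index tagged as a client event
                if (epoll_add(fd_epoll, conn_fd, &ev) < 0) {
                    ptr_cli[i].status = false;
                    close(conn_fd);
                }
            } else {
                close(conn_fd);       // client table is full
            }
        }
    }

    int main(int argc, char **argv)
    {
        unsigned short port = 12345;  // assumed default; the original value was lost
        if (argc == 2) port = atoi(argv[1]);

        if ((fd_listen = lxx_net_listen(port, QUEUE_LEN)) < 0) return -1;

        fd_epoll = epoll_create(MAX_EPOLL_SIZE);
        if (fd_epoll < 0) { close(fd_listen); return -1; }

        struct epoll_event ev, events[MAX_EPOLL_SIZE];
        ev.events = EPOLLIN;
        ev.data.fd = fd_listen;
        if (epoll_add(fd_epoll, fd_listen, &ev) < 0) {
            close(fd_epoll);  fd_epoll = -1;
            close(fd_listen); fd_listen = -1;
            return -1;
        }

        ptr_cli = new client_t[MAX_CLIENT_SIZE]();   // every slot starts free

        for (;;) {
            int nfds = epoll_wait(fd_epoll, events, MAX_EPOLL_SIZE, -1);   // timeout lost; -1 assumed
            if (nfds < 0) {
                int err = errno;
                if (err != EINTR) break;
                continue;
            }
            for (int i = 0; i < nfds; i++) {
                if (events[i].data.u32 & CLIENT_FLAG)
                    do_read_data(events[i].data.u32 & ~CLIENT_FLAG);
                else if (events[i].data.fd == fd_listen)
                    do_accept_client();
            }
        }
        return 0;
    }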

 

 

 

 
