Epoll Edge Trigger Learning

Last Update: 2018-07-26 Source: Internet

Author: User



In Linux network programming, select has long been the standard way to wait for I/O events. Newer Linux kernels provide a mechanism to replace it: epoll.
The biggest advantage of epoll over select is that its efficiency does not degrade as the number of monitored file descriptors (fds) grows. The kernel implements select by polling every fd in the set, so the more fds there are, the longer each call takes. In addition, the header linux/posix_types.h contains this definition:
#define __FD_SETSIZE 1024
This means select can monitor at most 1024 fds at once. You can raise the limit by editing the header and recompiling the kernel, but that is hardly a permanent fix.
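The same limit is visible from user space. On glibc, <sys/select.h> exposes it as FD_SETSIZE, and calling FD_SET() with a descriptor at or above that value is undefined behavior, so select()-based code has to reject such descriptors. A minimal sketch of that check, assuming a glibc-based Linux system (where FD_SETSIZE is 1024); the helper name safe_fd_set is hypothetical:

```c
#include <assert.h>
#include <sys/select.h>

/* Hypothetical helper: add fd to an fd_set only if it fits.
   FD_SET() on a descriptor >= FD_SETSIZE is undefined behavior,
   so such descriptors simply cannot be watched with select(). */
int safe_fd_set(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;          /* out of range for select() */
    FD_SET(fd, set);
    return 0;
}
```

epoll has no such per-set ceiling, which is one reason it scales to far more connections.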

The epoll interface is very simple; it consists of just three functions:
1. int epoll_create(int size);
Creates an epoll handle. size tells the kernel roughly how many descriptors will be monitored. Unlike the first argument of select(), it is not the highest fd plus one. (Since Linux 2.6.8 the value is ignored, but it must still be greater than zero.) Note that the epoll handle itself occupies a file descriptor: on Linux you can see it under /proc/<pid>/fd/. So when you are done with epoll you must close() it, or you may eventually run out of fds.
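A minimal sketch of the create/close lifecycle, assuming Linux 2.6.27 or later for epoll_create1(); the names make_epoll and make_epoll_legacy are hypothetical. epoll_create1() drops the obsolete size hint entirely, and EPOLL_CLOEXEC makes the kernel close the descriptor automatically across exec().

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: open an epoll instance with the modern call.
   Returns the new descriptor, or -1 with errno set. */
int make_epoll(void)
{
    return epoll_create1(EPOLL_CLOEXEC);
}

/* Hypothetical helper: the older call still works; its size argument
   is only a hint but must be greater than zero. */
int make_epoll_legacy(void)
{
    return epoll_create(1);
}
```

Either way the caller owns the returned descriptor and must close() it; epoll_create(0) fails with EINVAL.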

2. int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
The epoll event registration function. Unlike select(), where you state which events to watch each time you wait, epoll requires you to register the event types to watch in advance. The first argument is the return value of epoll_create(), and the second is the operation, expressed as one of three macros:
EPOLL_CTL_ADD: register a new fd with epfd;
EPOLL_CTL_MOD: change the events watched for an already registered fd;
EPOLL_CTL_DEL: remove an fd from epfd.
The third argument is the fd to monitor, and the fourth tells the kernel what to watch for. struct epoll_event is defined as follows:
struct epoll_event {
    __uint32_t events;    /* epoll events */
    epoll_data_t data;    /* user data variable */
};
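A sketch of the three operations on a single descriptor, assuming fd is any pollable descriptor such as a socket or the read end of a pipe; ctl_demo is a hypothetical name. data.fd is one member of the epoll_data_t union, and whatever is stored in data comes back unchanged from epoll_wait().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo: register fd for reading, switch it to
   reading + writing, then remove it again.
   Returns 0 if every epoll_ctl() call succeeded, -1 otherwise. */
int ctl_demo(int epfd, int fd)
{
    struct epoll_event ev = {0};

    ev.events = EPOLLIN;            /* what to listen for */
    ev.data.fd = fd;                /* returned as-is by epoll_wait() */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0)
        return -1;

    ev.events = EPOLLIN | EPOLLOUT; /* change the watched events */
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev) < 0)
        return -1;

    /* The event argument is ignored for EPOLL_CTL_DEL
       (kernels before 2.6.9 required it to be non-NULL). */
    if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL) < 0)
        return -1;
    return 0;
}
```

Adding an fd that is already registered fails with EEXIST, and deleting one that is not registered fails with ENOENT.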

events is a bit mask combining any of the following macros:
EPOLLIN: the fd is readable (this includes the peer closing the socket normally);
EPOLLOUT: the fd is writable;
EPOLLPRI: the fd has urgent data to read (here this should mean that out-of-band data has arrived);
EPOLLERR: an error occurred on the fd;
EPOLLHUP: the fd was hung up;
EPOLLET: puts this fd in edge-triggered (ET) mode, as opposed to the default level-triggered (LT) mode;
EPOLLONESHOT: deliver only one event; after it fires, if you still want to monitor this socket you must re-arm it with epoll_ctl() and EPOLL_CTL_MOD.
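Because events is a bit mask, a value returned by epoll_wait() can have several flags set at once (for example EPOLLIN together with EPOLLHUP when a pipe's write end is closed while unread data remains). A hypothetical debugging helper that names the flags listed above might look like this; it knows only these seven flags and ignores any other bits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>

/* Hypothetical helper: write a readable form of an event mask,
   e.g. "EPOLLIN|EPOLLET", into out (capacity n). Returns out. */
const char *epoll_events_str(uint32_t events, char *out, size_t n)
{
    static const struct { uint32_t bit; const char *name; } flags[] = {
        { EPOLLIN, "EPOLLIN" },   { EPOLLOUT, "EPOLLOUT" },
        { EPOLLPRI, "EPOLLPRI" }, { EPOLLERR, "EPOLLERR" },
        { EPOLLHUP, "EPOLLHUP" }, { EPOLLET, "EPOLLET" },
        { EPOLLONESHOT, "EPOLLONESHOT" },
    };
    size_t len = 0;

    if (n == 0)
        return out;
    out[0] = '\0';
    for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        if (!(events & flags[i].bit))
            continue;
        int w = snprintf(out + len, n - len, "%s%s",
                         len ? "|" : "", flags[i].name);
        if (w < 0 || (size_t)w >= n - len)
            break;              /* out is full: result is truncated */
        len += (size_t)w;
    }
    return out;
}
```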

3. int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
Waits for events, similar to select(). events receives the set of events from the kernel; maxevents tells the kernel how many entries events can hold and must be greater than zero (it is not limited by the size passed to epoll_create()). timeout is in milliseconds: 0 returns immediately, and -1 blocks indefinitely until an event arrives. The function returns the number of events to handle; 0 means the timeout expired, and -1 means an error (for example EINTR when a signal interrupts the wait).
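A sketch of a single wait, assuming epfd already has descriptors registered; count_readable and MAX_EVENTS are hypothetical names, and 16 is an arbitrary capacity. Each returned entry carries the data value registered with epoll_ctl(), which is how the caller tells descriptors apart.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 16   /* arbitrary array capacity for this sketch */

/* Hypothetical helper: wait up to timeout_ms and count the returned
   events that report watch_fd as readable. Returns that count,
   or -1 if epoll_wait() fails. */
int count_readable(int epfd, int watch_fd, int timeout_ms)
{
    struct epoll_event events[MAX_EVENTS];
    int n = epoll_wait(epfd, events, MAX_EVENTS, timeout_ms);
    int hits = 0;

    if (n < 0)
        return -1;      /* e.g. EINTR when a signal arrives */
    /* n == 0: the timeout expired with nothing ready. */
    for (int i = 0; i < n; i++)
        if (events[i].data.fd == watch_fd && (events[i].events & EPOLLIN))
            hits++;
    return hits;
}
```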

--------------------------------------------------------------------------------------------

The man page describes ET and LT as follows.

epoll supports two event modes:
Edge-triggered (ET)
Level-triggered (LT)

Consider the following example:
1. We add the file descriptor for the read end of a pipe (rfd) to the epoll descriptor.
2. The other end of the pipe writes 2 KB of data.
3. We call epoll_wait(2), and it returns rfd, indicating that it is ready for reading.
4. We then read 1 KB of the data.
5. We call epoll_wait(2) again...
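The scenario can be reproduced on a pipe. The sketch below (replay_steps is a hypothetical name) runs steps 1 to 5 and, to avoid actually hanging, uses a timeout of 0 in step 5; the return value is the number of events reported at that point. On Linux one would expect 1 in level-triggered mode and 0 in edge-triggered mode.

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo of steps 1-5. Returns the number of events the
   step-5 epoll_wait() reports (timeout 0), or -1 on error. */
int replay_steps(int edge_triggered)
{
    int p[2], epfd, result = -1;
    char buf[2048];
    struct epoll_event ev = {0}, out;

    if (pipe(p) < 0)
        return -1;
    epfd = epoll_create1(0);
    if (epfd < 0)
        goto close_pipe;

    /* Step 1: register the read end, rfd = p[0]. */
    ev.events = EPOLLIN | (edge_triggered ? EPOLLET : 0);
    ev.data.fd = p[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) < 0)
        goto close_all;

    /* Step 2: the other end writes 2 KB. */
    memset(buf, 'a', sizeof buf);
    if (write(p[1], buf, sizeof buf) != (ssize_t)sizeof buf)
        goto close_all;

    /* Step 3: rfd is reported as readable in both modes. */
    if (epoll_wait(epfd, &out, 1, 0) != 1)
        goto close_all;

    /* Step 4: read only 1 KB, leaving 1 KB in the pipe. */
    if (read(p[0], buf, 1024) != 1024)
        goto close_all;

    /* Step 5: LT reports rfd again; ET reports nothing, because no
       new data has arrived since the event consumed in step 3. */
    result = epoll_wait(epfd, &out, 1, 0);

close_all:
    close(epfd);
close_pipe:
    close(p[0]);
    close(p[1]);
    return result;
}
```

With a timeout of -1 instead of 0, the edge-triggered run would block in step 5, which is exactly the hang described for ET mode.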

Edge-triggered mode:
If we used the EPOLLET flag when adding rfd to the epoll descriptor in step 1, the call to epoll_wait(2) in step 5 may hang even though the remaining data is still in the file's input buffer, while the sender may be waiting for a response to the data it already sent. This is because ET mode reports an event only when a change occurs on the monitored file handle, so in step 5 the caller may end up waiting for data that is already sitting in the input buffer. In the example, the write in step 2 generates an event on rfd, and step 3 consumes that event. Since the read in step 4 does not consume all the data in the input buffer, the epoll_wait(2) call in step 5 may block indefinitely. When epoll works in ET mode, you must use non-blocking file descriptors, so that a blocking read or write on one file handle does not starve a task that handles multiple file descriptors. The recommended way to use epoll in ET mode, which avoids these pitfalls, is:
I. use non-blocking file handles;
II. wait for an event only after read(2) or write(2) returns EAGAIN. This does not mean that every read() must loop until it gets EAGAIN before the event counts as handled: for stream-oriented files such as pipes and stream sockets, when read() returns less data than requested, you can conclude that the buffer has been drained and the read event has been handled.
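Point I usually means setting O_NONBLOCK with fcntl(). The sketch below (both helper names are hypothetical) does that and shows the EAGAIN signal from point II: on an empty non-blocking pipe or socket, read() fails immediately with EAGAIN instead of blocking. read_would_block() consumes one byte when data is available, so it is only a demonstration, not a real read path.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode while keeping
   its other status flags. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical demo: try to read one byte. Returns 1 if the read
   fails with EAGAIN/EWOULDBLOCK (time to go back to epoll_wait()),
   0 otherwise. Consumes one byte if data is available. */
int read_would_block(int fd)
{
    char c;
    return read(fd, &c, 1) < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);
}
```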

Level-triggered mode:
By contrast, when epoll is used in LT mode it is simply a faster poll(2), and it can be used wherever poll(2) is used, since both share the same semantics. Because even ET mode can generate multiple events when data arrives in multiple chunks, the caller can set the EPOLLONESHOT flag so that, after epoll_wait(2) delivers an event, epoll disables the associated file handle. When EPOLLONESHOT is set, it is therefore the caller's job to re-arm the file handle with epoll_ctl(2) using EPOLL_CTL_MOD.
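Re-arming after EPOLLONESHOT takes a single EPOLL_CTL_MOD call. A sketch, with rearm_oneshot as a hypothetical name: the caller passes the event mask it wants (EPOLLIN, EPOLLOUT, ...), and the helper adds EPOLLONESHOT back so that the next event disables the descriptor again.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: re-enable fd after a one-shot event fired.
   Returns 0 on success, -1 on error (errno set by epoll_ctl). */
int rearm_oneshot(int epfd, int fd, uint32_t events)
{
    struct epoll_event ev = {0};
    ev.events = events | EPOLLONESHOT;  /* stay one-shot next time too */
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}
```

Until rearm_oneshot() is called, epoll_wait() reports nothing for that descriptor, even if unread data is still waiting.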

ET and LT in more detail:

LT (level-triggered) is the default mode and works with both blocking and non-blocking sockets. In this mode the kernel tells you whether a file descriptor is ready, and you can then perform I/O on it. If you do nothing, the kernel keeps notifying you, so this mode is less error-prone to program. Traditional select/poll follow this model.

ET (edge-triggered) is the high-performance mode and is meant to be used only with non-blocking sockets. In this mode, the kernel notifies you through epoll when a descriptor changes from not ready to ready. It then assumes you know the descriptor is ready and sends no further readiness notifications for it until you do something that makes it not ready again (for example, when a send, receive, or accept call fails with EWOULDBLOCK, or transfers less than the requested amount of data). Note, however, that if you never perform I/O on the descriptor (so it never becomes not ready again), the kernel sends no further notifications: you are notified only once. For TCP, whether ET mode actually speeds things up still needs more benchmarking to confirm (the author did not fully understand this sentence).
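In practice, "doing something that makes the descriptor not ready" means reading (or writing) until the call fails with EAGAIN. A sketch of such a loop for a non-blocking descriptor, with drain as a hypothetical name: once it returns, an ET descriptor is reported again only when new data arrives.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: read a non-blocking fd until it would block,
   so an edge-triggered descriptor becomes "not ready" again.
   Returns the number of bytes read (data is discarded in this
   sketch), or -1 on a real error. A 0-byte read (EOF) also ends it. */
ssize_t drain(int fd)
{
    char buf[512];
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;         /* process buf[0..n) here */
            continue;
        }
        if (n == 0)
            return total;       /* EOF: the writer closed its end */
        if (errno == EINTR)
            continue;           /* interrupted: just retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;       /* drained: wait for the next edge */
        return -1;
    }
}
```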

Many tests show that when there are few idle or dead connections, epoll is not much more efficient than select/poll. With a large number of idle connections, however (for example, many slow connections in a WAN environment), epoll turns out to be significantly more efficient than select/poll. (The author has not tested this.)

In addition, when working in epoll ET mode and an EPOLLIN event arrives, keep in mind when reading that if recv() returns exactly as many bytes as requested, the socket buffer probably still holds unread data and the event has not been fully handled, so you need to read again:
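#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Handle an EPOLLIN event on a non-blocking socket in ET mode
   (the function name is illustrative). Returns 0 once the data is
   drained, 1 if the peer closed the connection, -1 on other errors. */
int handle_epollin(struct epoll_event activeevents)
{
    char buf[1024];      /* buffer size chosen for illustration */
    ssize_t buflen;
    int rs = 1;          /* 1 while unread data may remain */

    while (rs)
    {
        buflen = recv(activeevents.data.fd, buf, sizeof(buf), 0);
        if (buflen < 0)
        {
            /* Non-blocking mode: EAGAIN means nothing is left to read,
               so this event has been fully handled. */
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;
            return -1;
        }
        else if (buflen == 0)
        {
            /* The peer closed the socket normally. */
            return 1;
        }

        /* ... process the buflen bytes in buf here ... */

        if ((size_t)buflen == sizeof(buf))
            rs = 1;      /* buffer filled completely: read again */
        else
            rs = 0;      /* short read: the socket buffer is drained */
    }
    return 0;
}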

Also, if data arrives faster than it can be forwarded (that is, the program reads from epoll faster than it can send on the outgoing socket), keep in mind that on a non-blocking socket send() returning only means the data was copied into the kernel's send buffer, not that it actually reached the receiver. If the program keeps reading and sending, the send buffer eventually fills up; send() then fails with EAGAIN (see man send), and the data from that call is not sent. To handle this, send() is wrapped in a function, socket_send(), which keeps trying until all the data has been written and returns -1 on error. Inside socket_send(), when the send buffer is full (send() returns -1 and errno is EAGAIN), it sleeps briefly and retries. This is not perfect: in theory socket_send() may block for a long time, but the author found no better approach.

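#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send all buflen bytes of buffer on a non-blocking socket.
   Returns buflen on success, -1 on error. */
ssize_t socket_send(int sockfd, const char *buffer, size_t buflen)
{
    ssize_t tmp;
    size_t total = buflen;      /* bytes still to send */
    const char *p = buffer;     /* next byte to send */

    while (1)
    {
        tmp = send(sockfd, p, total, 0);
        if (tmp < 0)
        {
            /* Interrupted by a signal: writing could continue,
               but this version reports an error to the caller. */
            if (errno == EINTR)
                return -1;

            /* On a non-blocking socket this means the send buffer
               is full: wait a moment and try again. */
            if (errno == EAGAIN || errno == EWOULDBLOCK)
            {
                usleep(1000);
                continue;
            }

            return -1;
        }

        if ((size_t)tmp == total)
            return (ssize_t)buflen;   /* everything has been sent */

        total -= (size_t)tmp;
        p += tmp;                     /* skip the bytes already sent */
    }
}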
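One common alternative to the fixed usleep(1000), not part of the original code, is to block in poll() until the socket reports POLLOUT, so the retry happens as soon as the send buffer has room rather than on a timer. It still blocks the calling thread, just like socket_send(). The name socket_send_wait is hypothetical, and this is a sketch rather than a drop-in replacement.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical variant of socket_send(): when the send buffer is full,
   block in poll() until the socket is writable instead of sleeping a
   fixed 1 ms. Returns buflen on success, -1 on error. */
ssize_t socket_send_wait(int sockfd, const char *buffer, size_t buflen)
{
    size_t left = buflen;       /* bytes still to send */
    const char *p = buffer;     /* next byte to send */

    while (left > 0) {
        ssize_t n = send(sockfd, p, left, 0);
        if (n >= 0) {
            left -= (size_t)n;
            p += n;
            continue;
        }
        if (errno == EINTR)
            continue;           /* interrupted before sending: retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;

        /* Send buffer full: sleep until the kernel reports POLLOUT. */
        struct pollfd pfd = { .fd = sockfd, .events = POLLOUT };
        if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
            return -1;
    }
    return (ssize_t)buflen;
}
```

In a fully event-driven server, one would instead keep the unsent remainder in a per-connection buffer, register EPOLLOUT for the socket, and finish the write when epoll_wait() reports it, so that no single slow receiver blocks the event loop.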

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
