From the epoll(7) man page:
The epoll event distribution interface is able to behave both as edge-triggered (ET) and as level-triggered (LT). The difference between the ET and LT event distribution mechanisms can be described as follows. Suppose that this scenario happens:

1 The file descriptor that represents the read side of a pipe (rfd) is added to the epoll device.
2 A pipe writer writes 2 kB of data on the write side of the pipe.
3 A call to epoll_wait(2) is done that will return rfd as a ready file descriptor.
4 The pipe reader reads 1 kB of data from rfd.
5 A call to epoll_wait(2) is done.

If the rfd file descriptor has been added to the epoll interface using the EPOLLET flag, the call to epoll_wait(2) done in step 5 will probably hang despite the available data still present in the file input buffer, while the remote peer might be expecting a response based on the data it already sent. The reason for this is that edge-triggered event distribution delivers events only when events happen on the monitored file. So, in step 5 the caller might end up waiting for some data that is already present inside the input buffer. In the above example, an event on rfd will be generated because of the write done in 2, and the event is consumed in 3. Since the read operation done in 4 does not consume the whole buffer data, the call to epoll_wait(2) done in step 5 might block indefinitely. The epoll interface, when used with the EPOLLET flag (edge-triggered), should use non-blocking file descriptors to avoid having a blocking read or write starve the task that is handling multiple file descriptors. The suggested way to use epoll as an edge-triggered (EPOLLET) interface follows (a minimal sketch of the pattern comes after the two rules):

I with non-blocking file descriptors
II by waiting for an event only after read(2) or write(2) return EAGAIN
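A minimal sketch of that pattern in C (assuming epfd is an existing epoll instance and every monitored descriptor was registered with EPOLLIN | EPOLLET and set to O_NONBLOCK beforehand; error handling trimmed):

#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

static void et_loop(int epfd)
{
    struct epoll_event ev;
    char buf[4096];

    for (;;) {
        if (epoll_wait(epfd, &ev, 1, -1) <= 0)
            continue;                  /* interrupted or no events */
        /* Drain the descriptor completely before waiting again. */
        for (;;) {
            ssize_t r = read(ev.data.fd, buf, sizeof(buf));
            if (r == -1 && errno == EAGAIN)
                break;                 /* drained: safe to epoll_wait again */
            if (r <= 0)
                break;                 /* EOF or real error: close the fd */
            /* ... consume r bytes ... */
        }
    }
}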
On the contrary, when used as a level-triggered interface, epoll is by all means a faster poll(2), and can be used wherever the latter is used since it shares the same semantics. Since even with edge-triggered epoll multiple events can be generated upon receipt of multiple chunks of data, the caller has the option to specify the EPOLLONESHOT flag, to tell epoll to disable the associated file descriptor after the receipt of an event with epoll_wait(2). When the EPOLLONESHOT flag is specified, it is the caller's responsibility to rearm the file descriptor using epoll_ctl(2) with EPOLL_CTL_MOD.
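For example, rearming a one-shot descriptor might look like this (a sketch; epfd and fd are assumed to be an existing epoll instance and an already registered descriptor):

#include <sys/epoll.h>

/* After an event on an fd registered with EPOLLONESHOT has been
 * handled, the fd is disabled inside epoll and must be rearmed. */
static int rearm_oneshot(int epfd, int fd)
{
    struct epoll_event ev;
    ev.events = EPOLLIN | EPOLLET | EPOLLONESHOT;  /* same interest set */
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}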
The epoll event distribution system can run in two modes:
Edge triggered (ET)
Level triggered (LT)
The following describes the differences between the ET and LT event distribution mechanisms. Assume the following scenario:
1. We have added the file handle (rfd) that represents the read side of a pipe to the epoll descriptor.
2. 2 kB of data is then written on the write side of the pipe.
3. epoll_wait(2) is called and returns rfd, indicating that it is ready to read.
4. We then read 1 kB of data.
5. epoll_wait(2) is called again ...
Edge triggered working mode:
If the EPOLLET flag was used when rfd was added to the epoll descriptor in step 1, the call to epoll_wait(2) in step 5 may hang, even though the remaining data still exists in the input buffer of the file and the sending end is still waiting for a response to the data it sent. This is because the ET working mode reports an event only when an event occurs on the monitored file handle. Therefore, in step 5, the caller may end up waiting for data that is already in the file's input buffer. In the above example, an event is generated on the rfd handle because of the write performed in step 2, and that event is consumed in step 3. Because the read operation in step 4 does not empty the file's input buffer, the epoll_wait(2) call in step 5 may block indefinitely. When epoll works in ET mode, it is necessary to use non-blocking file descriptors to avoid a blocking read or blocking write on one file handle starving the task that processes multiple file descriptors. It is best to call the epoll interface in ET mode in the following ways to avoid possible defects:
I based on non-blocking file handles
II blocking to wait for an event only after read(2) or write(2) returns EAGAIN
Level triggered working mode:
On the contrary, when the epoll interface is called in the LT mode, it is equivalent to a faster poll(2), and it can be used wherever poll(2) is used since it has the same semantics. Because even in ET mode multiple events are still generated when data arrives in multiple chunks, the caller can set the EPOLLONESHOT flag: after epoll_wait(2) delivers an event, epoll disables the file handle associated with that event. When EPOLLONESHOT is set, rearming the file handle with epoll_ctl(2) and EPOLL_CTL_MOD becomes the caller's responsibility.
//////////////////////////////// I am the legendary divider ////////////////////////////////
The following two descriptions of epoll come from the Internet:
LT (level-triggered) is the default working mode and supports both blocking and non-blocking sockets. In this mode, the kernel tells you whether a file descriptor is ready, and then you can perform I/O operations on that ready fd. If you do nothing, the kernel will keep notifying you, so the chance of programming errors in this mode is lower. The traditional select/poll model is representative of this mode.
ET (edge-triggered) is a high-speed working mode that only supports non-blocking sockets. In this mode, the kernel tells you through epoll when a descriptor changes from not ready to ready. It then assumes you know the file descriptor is ready, and will not send more readiness notifications for that file descriptor until you do something that causes it to become not ready again (for example, an EWOULDBLOCK error occurs while sending or receiving, or the amount of data sent or received is less than a certain amount). Note, however, that if no I/O is ever performed on the fd, the kernel still will not send more notifications (only once). Also, over TCP the speedup from ET mode reportedly still needs more benchmark validation (this sentence is hard to make sense of).
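In code, the only registration difference between the two modes is the EPOLLET bit; ET additionally requires a non-blocking descriptor. A sketch (hypothetical helper):

#include <fcntl.h>
#include <sys/epoll.h>

static int add_fd(int epfd, int fd, int edge_triggered)
{
    struct epoll_event ev;
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (edge_triggered) {
        ev.events |= EPOLLET;
        /* ET only makes sense with a non-blocking descriptor. */
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
    }
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}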
In LT mode, epoll_wait() puts a file that still has events back onto the ready list (rdllist), so that the next epoll_wait() call will check it again:
if (epi->event.events & EPOLLONESHOT)
        epi->event.events &= EP_PRIVATE_BITS;
else if (!(epi->event.events & EPOLLET)) { /* LT mode */
        /*
         * If this file has been added with Level
         * Trigger mode, we need to insert back inside
         * the ready list, so that the next call to
         * epoll_wait() will check again the events
         * availability. At this point, no one can insert
         * into ep->rdllist besides us. The epoll_ctl()
         * callers are locked out by
         * ep_scan_ready_list() holding "mtx" and the
         * poll callback will queue them in ep->ovflist.
         */
        list_add_tail(&epi->rdllink, &ep->rdllist);
}
In LT mode, state is not lost, and the program can be driven entirely by epoll.
In ET mode, the program must first drive its own logic; only when it runs into an EAGAIN error does it fall back on epoll_wait() to drive it forward. In that case, epoll merely helps the program get past the blockage.
ET events depend on wakeups of the socket's sk_sleep wait queue, which happen only when a new packet arrives: an incoming data packet causes POLLIN, and an ACK that lets sk_buffs be destroyed causes POLLOUT. This is not a one-to-one relationship but many-to-one (multiple network packets may generate a single POLLIN/POLLOUT event).
A common ET mistake: recv() returns the expected length, the program finishes processing, and then epoll_wait() is called. At this point the program may block for a long time, because there may still be data in the socket's receive buffer, or the peer may have already closed the connection; that information was delivered in the previous notification, and you did not finish consuming it before calling epoll_wait().
The correct method is: recv() the expected length and let the program process it; then recv() again, and only if it returns EAGAIN call epoll_wait().
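In code, that rule looks roughly like this (a sketch; buffer management and the handling of a closed peer are elided):

#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Keep reading until the kernel reports the receive buffer empty. */
static void drain_socket(int fd, char *buf, size_t bufsz)
{
    for (;;) {
        ssize_t n = recv(fd, buf, bufsz, 0);
        if (n > 0) {
            /* ... process n bytes ... */
            continue;
        }
        if (n == -1 && errno == EAGAIN)
            break;    /* nothing left: now it is safe to epoll_wait */
        break;        /* n == 0: peer closed; otherwise a real error */
    }
}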
This issue is especially important when multiple threads call epoll_wait() on one epoll instance at the same time, or when a listenfd is associated with several epollfd instances and several epoll_wait() calls are all waiting on it....
Summary:
ET usage rule: epoll_wait() is called only after an EAGAIN error has occurred.
If an accept() call has returned, then besides establishing the current connection you cannot immediately call epoll_wait(); you need to keep looping on accept() until it returns -1 with errno == EAGAIN. Sample code from TAF:
if (ev.events & EPOLLIN)
{
    do
    {
        struct sockaddr_in stSockAddr;
        socklen_t iSockAddrSize = sizeof(sockaddr_in);
        TC_Socket cs;
        cs.setOwner(false);
        // Receive the connection
        TC_Socket s;
        s.init(fd, false, AF_INET);
        int iRetCode = s.accept(cs, (struct sockaddr *)&stSockAddr, iSockAddrSize);
        if (iRetCode > 0)
        {
            ... // establish the connection
        }
        else
        {
            // Stop accepting only when EAGAIN occurs
            if (errno == EAGAIN)
            {
                break;
            }
        }
    } while (true);
}
Similarly, all calls such as recv()/send() must be driven until errno == EAGAIN.
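The send side follows the same pattern (a sketch with a hypothetical send_all() helper): keep calling send(2) until everything is written or EAGAIN appears, then wait for EPOLLOUT:

#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Returns the number of bytes sent, or -1 on a real error;
 * stops early when the kernel send buffer fills up. */
static ssize_t send_all(int fd, const char *buf, size_t len)
{
    size_t off = 0;
    while (off < len) {
        ssize_t n = send(fd, buf + off, len - off, 0);
        if (n > 0) {
            off += (size_t)n;
        } else if (n == -1 && errno == EAGAIN) {
            break;        /* buffer full: wait for EPOLLOUT */
        } else {
            return -1;
        }
    }
    return (ssize_t)off;
}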
For ET, of course, things do not have to end only when the send/recv condition (EAGAIN) is reached; you can also take control at any time, that is, set the IN and OUT events as appropriate:
1 If you actively epoll_ctl(EPOLL_CTL_MOD) the OUT event, then as long as the handle can send data (the send buffer is not full), epoll_wait() will respond (this mechanism is sometimes used to wake up a thread blocked in epoll_wait(); see the sketch after these two points).
2 If you actively epoll_ctl(EPOLL_CTL_MOD) the IN event, then as long as the handle has data to read, epoll_wait() will respond.
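The wake-up trick in point 1 can be sketched as follows (hypothetical helper; rearming OUT on a socket whose send buffer is not full makes epoll_wait() return immediately):

#include <sys/epoll.h>

static int kick_epoll(int epfd, int fd)
{
    struct epoll_event ev;
    ev.events = EPOLLIN | EPOLLOUT | EPOLLET;
    ev.data.fd = fd;          /* must match the original registration */
    /* MOD re-polls the fd; if it is writable, an event fires at once. */
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}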
This logic is not needed in ordinary services and may only be needed in some special situations. But note that if you issue an epoll MOD on every call, efficiency drops significantly! Therefore, when writing a service framework on top of ET, the simplest solution is:
When a connection is established, epoll_ctl(EPOLL_CTL_ADD) both the IN and OUT events once and leave them untouched afterwards. Each read/write operation then ends in one of two cases (sketched below):
1 EAGAIN
2 the actual number of bytes read/written is smaller than the number of bytes requested
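A sketch of that read-side end test (hypothetical names): a short read already implies the buffer is drained, so the extra read(2) that would only return EAGAIN can be skipped:

#include <errno.h>
#include <unistd.h>

static void drain_until_done(int fd, char *buf, size_t bufsz)
{
    for (;;) {
        ssize_t n = read(fd, buf, bufsz);
        if (n == -1 && errno == EAGAIN)
            break;                 /* end condition 1: EAGAIN */
        if (n <= 0)
            break;                 /* EOF or a real error */
        /* ... process n bytes ... */
        if ((size_t)n < bufsz)
            break;                 /* end condition 2: short read */
    }
}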