In Linux network programming, select was used for event notification for a long time. Newer Linux kernels provide a mechanism to replace it: epoll. Compared with select, the efficiency of epoll does not drop as the number of monitored FDs grows. In the kernel implementation of select, ready descriptors are found by polling, and the more FDs there are to poll, the more time it takes. In addition, the linux/posix_types.h header file contains the following declaration:
#define __FD_SETSIZE 1024
It means that select can monitor at most 1024 FDs at the same time. Of course, you can enlarge this number by modifying the header file and recompiling the kernel, but that does not seem to be a real cure.
The epoll interface is very simple. There are three functions in total:
1. int epoll_create(int size);
Creates an epoll handle. size tells the kernel the expected total number of descriptors to be monitored. This parameter is different from the first parameter of select(), which is the maximum FD value plus 1. Note that after the epoll handle has been created, it itself occupies an FD value; on Linux you can see it under /proc/<process id>/fd/. Therefore close() must be called once epoll is no longer needed, otherwise FDs may be exhausted.
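For illustration, here is a minimal sketch (not from the original article) of creating an epoll handle and releasing its FD with close(); the size hint of 1024 is an arbitrary example value:

#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

int main(void)
{
    int epfd = epoll_create(1024);    /* size is only a hint to the kernel */
    if (epfd < 0) {
        perror("epoll_create");
        return 1;
    }
    /* ... register FDs with epoll_ctl() and wait with epoll_wait() ... */
    close(epfd);                      /* release the FD occupied by the epoll handle */
    return 0;
}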
2. int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
The epoll event registration function. Unlike select(), which tells the kernel what events to watch for at the moment you wait for them, epoll_ctl registers the event types to be monitored in advance.
The first parameter is the value returned by epoll_create().
The second parameter represents the action, expressed by one of three macros:
EPOLL_CTL_ADD: register a new FD with epfd;
EPOLL_CTL_MOD: modify the events monitored for an already registered FD;
EPOLL_CTL_DEL: remove an FD from epfd.
The third parameter is the FD to be monitored.
The fourth parameter tells the kernel which events to monitor (a registration sketch follows the list of event macros below). The struct epoll_event structure is defined as follows:
struct epoll_event {
    __uint32_t events;    /* epoll events */
    epoll_data_t data;    /* user data variable */
};
events can be a combination of the following macros:
EPOLLIN: the corresponding file descriptor is readable (including a normal close of the peer socket);
EPOLLOUT: the corresponding file descriptor is writable;
EPOLLPRI: the corresponding file descriptor has urgent data to read (this should mean that out-of-band data has arrived);
EPOLLERR: an error occurred on the corresponding file descriptor;
EPOLLHUP: the corresponding file descriptor was hung up;
EPOLLET: set epoll to edge-triggered mode, as opposed to level-triggered;
EPOLLONESHOT: listen for only one event. After this event has been delivered, if you want to keep monitoring this socket, you need to add it to the epoll queue again.
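As a small sketch (assuming epfd was returned by epoll_create() and listen_fd is a socket created elsewhere), registration with epoll_ctl() looks roughly like this:

#include <sys/epoll.h>

struct epoll_event ev;
ev.events = EPOLLIN | EPOLLET;            /* readable events, edge-triggered */
ev.data.fd = listen_fd;                   /* user data: remember which FD this is */
if (epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev) < 0) {
    /* registration failed, e.g. perror("epoll_ctl") */
}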
3. int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
Waits for events to occur, similar to a select() call. The events parameter is used to receive the set of events from the kernel; maxevents tells the kernel how large this events array is, and must not be greater than the size passed to epoll_create(). The timeout parameter is the timeout in milliseconds (0 returns immediately, -1 blocks indefinitely, that is, it waits until an event arrives). The function returns the number of events that need to be handled; a return value of 0 means the call timed out.
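A minimal event-loop sketch around epoll_wait() might look as follows (MAX_EVENTS, listen_fd and the two handler functions are placeholders invented for this illustration):

#include <errno.h>
#include <sys/epoll.h>

#define MAX_EVENTS 64

struct epoll_event events[MAX_EVENTS];

for (;;) {
    int n = epoll_wait(epfd, events, MAX_EVENTS, -1);   /* -1: block until events arrive */
    if (n < 0) {
        if (errno == EINTR)
            continue;                                   /* interrupted by a signal: retry */
        break;                                          /* real error */
    }
    for (int i = 0; i < n; i++) {
        if (events[i].data.fd == listen_fd)
            handle_accept(epfd, listen_fd);             /* hypothetical: accept new connections */
        else if (events[i].events & EPOLLIN)
            handle_read(events[i].data.fd);             /* hypothetical: read from a ready FD */
    }
}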
--------------------------------------------------------------------------------------------
ET and LT are described in more detail as follows:
epoll has two working modes for events:
Edge triggered (ET)    // high-speed working mode, more error-prone; only supports non-blocking sockets
Level triggered (LT)   // the default working mode; supports both blocking and non-blocking sockets, and is less error-prone
Consider the following example (in LT mode, the default, the kernel keeps telling you the data can be read; in ET mode it notifies you once when the data becomes readable). A code sketch of these steps follows the list:
1. We add the file handle (RFD) that will be used to read data from a pipe to the epoll descriptor.
2. At this point 2 KB of data is written at the other end of the pipe.
3. epoll_wait(2) is called and returns RFD, indicating that it is ready to read.
4. We then read 1 KB of the data.
5. epoll_wait(2) is called again ...
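The five steps above can be sketched in code roughly like this (error handling omitted; pipefd[0] plays the role of RFD; in ET mode the second epoll_wait() may block even though 1 KB is still buffered):

#include <sys/epoll.h>
#include <unistd.h>

int pipefd[2];
char data[2048] = {0};
char buf[1024];
struct epoll_event ev, out;

pipe(pipefd);                                     /* pipefd[0] is the read end (RFD) */
int epfd = epoll_create(1024);

ev.events = EPOLLIN | EPOLLET;                    /* step 1: add RFD, edge-triggered */
ev.data.fd = pipefd[0];
epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev);

write(pipefd[1], data, sizeof(data));             /* step 2: 2 KB written to the pipe */

epoll_wait(epfd, &out, 1, -1);                    /* step 3: returns RFD as readable */
read(pipefd[0], buf, sizeof(buf));                /* step 4: read only 1 KB */
epoll_wait(epfd, &out, 1, -1);                    /* step 5: in ET mode this may block */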
Edge triggered working mode:
If the EPOLLET flag was used when RFD was added to the epoll descriptor in step 1, then even though 2 KB was written in step 2, the call to epoll_wait(2) in step 5 may hang, because the remaining data still sits in the file's input buffer and the sender may still be waiting for an acknowledgement of the data it sent. The ET mode reports an event only when something new happens on the monitored file handle, so in step 5 the caller may end up waiting for data that is already in the input buffer. In the example above, an event is generated on the RFD handle because of the write performed in step 2, and that event is consumed in step 3. Since the read in step 4 does not drain the file's input buffer, it is uncertain whether the call to epoll_wait(2) in step 5 will hang. When epoll works in ET mode, non-blocking sockets must be used, to avoid starving the task that handles multiple file descriptors because a blocking read or write on one file handle. It is best to call the epoll interface in ET mode according to the two rules below to avoid possible pitfalls (LT mode does not have this defect); a sketch of setting a descriptor to non-blocking follows the rules.
I. Use non-blocking file descriptors.
II. Wait for the next event only after read(2) or write(2) returns EAGAIN. However, this does not mean that every read() must loop until EAGAIN is produced: when read() returns less data than requested, you can already conclude that the buffer is empty and the read event has been fully handled.
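Setting a file descriptor to non-blocking is usually done with fcntl(2); a minimal helper, sketched here for illustration, could look like this:

#include <fcntl.h>

int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);                  /* read the current file status flags */
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);      /* add the O_NONBLOCK flag */
}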
Level triggered working mode (default working mode)
By contrast, when the epoll interface is called in LT mode it is equivalent to a faster poll(2), and they behave the same regardless of whether the later data is used. Even with ET-mode epoll, multiple events will still be generated when data arrives in several chunks. The caller can set the EPOLLONESHOT flag; then, after epoll_wait(2) delivers an event, epoll disables the file handle associated with that event in the epoll descriptor. Therefore, when EPOLLONESHOT is set, re-arming the file handle with epoll_ctl(2) and the EPOLL_CTL_MOD flag becomes the caller's responsibility.
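A hedged sketch of re-arming a descriptor registered with EPOLLONESHOT once its event has been handled (epfd and fd are assumed to exist):

struct epoll_event ev;
ev.events = EPOLLIN | EPOLLET | EPOLLONESHOT;   /* deliver exactly one more event */
ev.data.fd = fd;
epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);        /* the caller must re-arm explicitly */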
Now ET and LT are explained in more detail:
// If you do not perform I/O on the ready FD, the kernel keeps notifying you.
LT (level-triggered) is the default working mode, and supports both blocking and non-blocking sockets. In this mode the kernel tells you when a file descriptor is ready, and you can then perform I/O on that ready FD. If you do nothing, the kernel will keep notifying you, so the chance of programming errors in this mode is lower. The traditional select/poll model is representative of this mode.
// If you do not perform I/O on the ready FD, the kernel will not notify you again.
ET (edge-triggered) is the high-speed working mode, and only supports non-blocking sockets. In this mode the kernel tells you, via epoll, when a descriptor changes from not ready to ready. It then assumes you know the file descriptor is ready and will not send any more ready notifications for it until you do something that makes it not ready again (for example, you send or receive until a request returns an EWOULDBLOCK error, or you transfer less than the requested amount of data). Note, however, that if you never perform I/O on this FD (so that it never becomes not ready again), the kernel will not send further notifications (it notifies only once). Whether ET mode actually speeds up TCP still needs more benchmarking (the original author notes this point is unclear).
In addition, when epoll works in the ET model and an EPOLLIN event is generated, you must bear in mind while reading that if the size returned by recv() equals the requested size, there is most likely still data left in the buffer that has not been read, which means the event has not been fully handled, so you need to read again:
while (rs)    /* ET model */
{
    buflen = recv(activeevents[i].data.fd, buf, sizeof(buf), 0);
    if (buflen < 0)
    {
        /* In non-blocking mode, EINTR means the call was interrupted (you can
           read again, or wait for the next epoll/select notification);
           EAGAIN/EWOULDBLOCK means there is currently no data to read (again,
           you can retry, or wait for the next epoll/select notification). */
        if (errno == EAGAIN || errno == EINTR || errno == EWOULDBLOCK)
        {
            /* Option 1: sleep briefly and retry; this keeps the reader blocked
               here, looping until data arrives. */
            usleep(1000);
            continue;
            /* Option 2: break; -- give up for now; when data arrives, epoll or
               select will trigger again and the read resumes. Do not block
               here in that case. */
        }
        else
            return;    /* a real failure: close the connection */
    }
    else if (buflen == 0)
    {
        /* The peer socket was closed normally: close(activeevents[i].data.fd);
           with select the FD must also be removed from the set; with epoll the
           kernel removes the FD from the interest list when it is closed. */
    }

    if (buflen == sizeof(buf))
        rs = 1;    /* the buffer was filled: there may be more data, read again
                      (but first consume the data already in buf) */
    else
        rs = 0;    /* short read: no need to read again */
}
Important:
In addition, if the sender's rate is higher than the receiver's (that is, the epoll-driven program reads faster than it can forward to the outgoing socket), then even though send() returns, the data may not actually have reached the receiving end; once the send buffer fills up, an EAGAIN error is produced (see man send) and the data of that call would otherwise be lost. Therefore send() needs to be wrapped in a socket_send() function that handles this case. The function keeps trying to write the data before returning; a return value of -1 indicates an error. Inside socket_send(), when the write buffer is full (send() returns -1 with errno set to EAGAIN), it waits and tries again. This approach is not perfect: in theory it may block inside socket_send() for a long time, but there is no better way. It is similar to the readn/writen wrappers (I wrote them myself; they are also described in Advanced Programming in the UNIX Environment).
ssize_t socket_send(int sockfd, const char *buffer, size_t buflen)
{
    ssize_t tmp;
    size_t total = buflen;
    const char *p = buffer;

    while (1)
    {
        tmp = send(sockfd, p, total, 0);
        if (tmp < 0)
        {
            /* EINTR: a signal interrupted the send; it is safe to write again,
               or to wait for the next epoll/select notification before writing.
               EAGAIN/EWOULDBLOCK: the write buffer is full; either keep waiting
               and retry, or wait for the next epoll/select writable
               notification and then write. */
            if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
            {
                /* Option 1: sleep briefly and retry; this blocks the current
                   process in the send loop here. */
                usleep(1000);
                continue;
                /* Option 2: break; -- wait for the next select/epoll writable
                   notification, then write again. */
            }
            return -1;
        }

        if ((size_t)tmp == total)
            return buflen;

        total -= tmp;
        p += tmp;
    }
    return tmp;
}