A blog post about the Epoll model

Source: Internet
Author: User
Tags: epoll

I read this article about epoll a while ago and have come back to review it; knowledge fades quickly when you don't revisit it. Although my current project doesn't use epoll, I'd like to build a simple version of it myself when I have time, so that the material really sticks.

Original source: Http://blog.163.com/[email protected]/blog/static/73483745201181824629285/

epoll is a new mechanism for improving network I/O performance introduced in the Linux 2.6 kernel. I/O multiplexing is used in many TCP network servers, most commonly through the select function.

1. Why select lags behind
First, the fd_set used by select in the Linux kernel is limited in size: the kernel parameter __FD_SETSIZE defines the number of handles each fd_set can hold. In the 2.6.15-25-386 kernel I use, the value is 1024; searching the kernel source confirms it:
include/linux/posix_types.h:
#define __FD_SETSIZE 1024
That is, if you want to detect the readable state of 1025 handles simultaneously, select cannot do it; likewise, it cannot detect the writable state of 1025 handles at the same time. Second, select is implemented in the kernel by polling: each call iterates over every handle in the fd_set, so select's execution time is proportional to the number of handles it monitors. The more handles select has to check, the longer it takes. I did not mention poll above, but anyone who has used select has probably tried poll too; I personally find select and poll much the same, and simply prefer select.
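To make that cost concrete, here is a minimal select() sketch of my own (not from the original article); sockfd is an assumed, already-open descriptor:

#include <sys/select.h>

/* Wait until sockfd is readable. sockfd must be below FD_SETSIZE (1024 here),
 * which is exactly the limit discussed above. */
int wait_readable(int sockfd)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(sockfd, &readfds);
    /* The kernel scans every descriptor from 0 through sockfd on each call,
     * which is the linear cost described above. */
    if (select(sockfd + 1, &readfds, NULL, NULL, NULL) <= 0)
        return 0;
    return FD_ISSET(sockfd, &readfds);
}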

2. epoll, the kernel's new method of improving I/O performance
What is epoll? According to the man pages, it is an improved poll for handling large numbers of handles. Using epoll requires only three system calls: epoll_create(2), epoll_ctl(2), and epoll_wait(2).
It was not introduced in the 2.6 kernel but in 2.5.44 ("epoll(4) is a new API introduced in Linux kernel 2.5.44").
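As a minimal sketch of my own showing the three calls together (error handling omitted; listen_fd is an assumed, already-created socket):

#include <sys/epoll.h>

/* Register listen_fd with a new epoll instance and wait once for events. */
int wait_once(int listen_fd)
{
    int epfd = epoll_create(256);               /* the size argument is only a hint */

    struct epoll_event ev;
    ev.events = EPOLLIN;                        /* we care about readability */
    ev.data.fd = listen_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    struct epoll_event events[64];
    return epoll_wait(epfd, events, 64, -1);    /* block until something is ready */
}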

An introduction to epoll in the Linux 2.6 kernel
Let me first recommend a book: "The Linux Networking Architecture--Design and Implementation of Network Protocols in the Linux Kernel", which explains the 2.4 kernel's Linux TCP/IP implementation and is quite good. A real-world implementation has to make many trade-offs, and it is more practical to study a tried and tested system. For example, the sk_buff structure in the Linux kernel sacrifices some memory for speed and safety: when sending TCP packets, no matter how small the application-layer data is, the sk_buff occupies at least 272 bytes. For socket application-layer programming, another book, "UNIX Network Programming Volume 1", is more relevant. As of 2003 the latest edition is the 3rd, but the main content is still that of the 2nd edition; chapter 6, "I/O Multiplexing", is the most important. Stevens presents the basic models of network I/O, the most important here being the select model and the asynchronous I/O (AIO) model. In theory AIO seems the most efficient: your I/O operation returns immediately, and you then wait for the OS to tell you the operation has completed. But there has never been a perfect way to implement it. The most famous AIO implementation, the Windows completion port, actually uses a thread pool internally, with the result that the I/O side has a thread pool and your application needs one too; many documents have pointed out the thread context-switch cost this brings. On Linux, network AIO has long been the most turbulent area: around the 2.4 era there were many AIO kernel patches, the best known being SGI's. But right up to the release of the 2.6 kernel, network AIO never entered a stable kernel version (most patches simulated it with user-space threads, which on Linux with NPTL is essentially the same as the Windows completion port). The 2.6 kernel's AIO support is specific to disk I/O---io_submit() and io_getevents()---plus direct I/O support (bypassing the VFS buffer cache to write straight to disk, which is very helpful for keeping a streaming server's memory use smooth).
So the select model remains basically our only choice on Linux. In fact, if you add non-blocking socket configuration, you can build a "pseudo" AIO, except the driving force is you rather than the OS. The traditional select/poll functions, however, have some unbearable shortcomings, so improving them was an ongoing task of the 2.4-2.5 development kernels, producing /dev/poll, real-time signals, and so on. Eventually, Davide Libenzi's epoll entered the 2.6 kernel as the official solution.

3. The advantages of epoll
<1> Support for a single process opening large numbers of socket descriptors (fd)
The most unbearable thing about select is that the number of fds a single process can open is limited, set by FD_SETSIZE, whose default value is 2048. That is clearly too small for an IM server that must support tens of thousands of connections. You can modify the macro and recompile the kernel, but reports suggest this degrades network efficiency. You could also choose a multi-process solution (the traditional Apache approach); although the cost of creating a process on Linux is relatively small, it still cannot be ignored, and data synchronization between processes is far less efficient than synchronization between threads, so that is not a perfect solution either. epoll has no such restriction: the fd limit it supports is the maximum number of open files, which is generally far greater than 2048---around 100,000 on a machine with 1GB of memory. The exact number can be checked with cat /proc/sys/fs/file-max; in general it is closely related to system memory.

<2> I/O efficiency does not decrease linearly as the number of fds grows
Another Achilles heel of traditional select/poll: when you hold a large set of sockets, network latency means only some of them are "active" at any moment, yet every select/poll call scans the whole set linearly, so efficiency falls linearly. epoll does not have this problem, because it only operates on "active" sockets: the kernel implements epoll with a callback function on each fd, so only "active" sockets invoke their callback, while idle sockets do not. In this sense epoll implements a "pseudo" AIO, with the driving force in the OS kernel. In some benchmarks, if basically all sockets are active---a high-speed LAN environment, say---epoll is no more efficient than select/poll, and if epoll_ctl is used heavily, efficiency even drops slightly. But once you use idle connections to simulate a WAN environment, epoll's efficiency is far ahead of select/poll.

<3> Use of mmap to accelerate message passing between kernel and user space
This touches on epoll's concrete implementation. Whether select, poll, or epoll, the kernel must notify user space of fd events, and avoiding unnecessary memory copies is essential; epoll does this by having the kernel and user space mmap the same region of memory. If, like me, you have followed epoll since the 2.5 kernel, you will not have forgotten the manual mmap step.

<4> Kernel fine-tuning
This is not really an advantage of epoll itself but of the whole Linux platform. You may have doubts about the Linux platform, but you cannot deny that it gives you the ability to fine-tune the kernel. For example, the kernel TCP/IP stack uses a memory pool to manage sk_buff structures, and the size of this pool (skb_head_pool) can be adjusted dynamically at runtime via echo xxxx > /proc/sys/net/core/hot_list_length. Another example is the second parameter of the listen function (the queue length for connections that have completed the TCP 3-way handshake), which can also be adjusted dynamically according to your platform's memory size. You can even try the latest NAPI NIC driver architecture on a special system where the number of packets is huge but each packet itself is small.
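For the listen() parameter just mentioned, a small sketch of my own (listen_fd and the backlog value are assumptions for illustration):

#include <stdio.h>
#include <sys/socket.h>

/* The second argument is the backlog discussed above: the queue length for
 * connections that have completed the TCP 3-way handshake. */
void start_listening(int listen_fd)
{
    int backlog = 128;                /* assumed value; tune to your memory/load */
    if (listen(listen_fd, backlog) < 0)
        perror("listen");
}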

4. epoll's modes of operation
Happily, the epoll of the 2.6 kernel is much simpler than the /dev/epoll of the 2.5 development versions; in most cases, powerful things are simple. The only complication is that epoll has two working modes: LT and ET.
LT (level triggered) is the default working mode and supports both blocking and non-blocking sockets. In this mode the kernel tells you whether a file descriptor is ready, and you can then perform I/O on the ready fd. If you do nothing, the kernel will keep notifying you, so programming errors are less likely in this mode. Traditional select/poll are representatives of this model.
ET (edge triggered) is the high-speed working mode and supports only non-blocking sockets. In this mode, the kernel tells you via epoll when a descriptor changes from not ready to ready. It then assumes you know the fd is ready and sends no further readiness notifications for it until you do something that makes it not ready again (for example, sending or receiving until a send, recv, or accept returns an EWOULDBLOCK error). Note that if the fd is never given an I/O operation that makes it not ready again, the kernel will not send more notifications (it notifies only once). Whether ET mode's speed advantage holds up under the TCP protocol, though, still needs more benchmarking to confirm.
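To illustrate the ET discipline described above, here is a sketch of my own (not the author's code): after an EPOLLIN notification on a non-blocking fd, you must keep reading until EAGAIN/EWOULDBLOCK, or you may never be woken for that fd again:

#include <errno.h>
#include <unistd.h>

/* Drain a non-blocking fd after an edge-triggered EPOLLIN notification. */
void drain_fd(int fd)
{
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0)
            continue;                  /* process the n bytes here, then keep reading */
        if (n == 0)
            break;                     /* peer closed the connection */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;                     /* drained: epoll will notify on the next edge */
        break;                         /* a real error: handle it or close fd */
    }
}
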
epoll has only the epoll_create, epoll_ctl, and epoll_wait system calls. For detailed usage, see http://www.xmailserver.org/linux-patches/nio-improve.html; http://www.kegel.com/rn/ also has a complete example, where we can see how a Leader/Follower-style thread pool works together with epoll.

5. How to use epoll
First create an epoll handle with epoll_create(int maxfds), where maxfds is the maximum number of handles your epoll instance supports. This function returns a new epoll handle, and all subsequent operations go through it. When you are done, remember to close the epoll handle with close(). Then, in each frame of your network main loop, call epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout) to query all network interfaces and see which can be read and which can be written. The basic syntax is:
nfds = epoll_wait(kdpfd, events, maxevents, -1);
Here kdpfd is the handle created with epoll_create, and events is a struct epoll_event* pointer: when epoll_wait succeeds, all the ready read/write events are stored in events. maxevents is the number of socket handles currently being monitored. The last argument, timeout, is epoll_wait's timeout in milliseconds: 0 means return immediately, -1 means wait until an event arrives, and any positive value means wait at most that long, returning even if no event occurs. In general, if the network main loop runs as its own thread, you can use -1, which guarantees some efficiency; if it shares a thread with the main logic, using 0 keeps the main loop efficient.
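To make the timeout semantics concrete, a small sketch of my own using the names above:

/* The three timeout behaviours of epoll_wait (timeout is in milliseconds): */
nfds = epoll_wait(kdpfd, events, maxevents, -1);   /* block until an event arrives    */
nfds = epoll_wait(kdpfd, events, maxevents, 0);    /* poll: return immediately        */
nfds = epoll_wait(kdpfd, events, maxevents, 500);  /* wait at most 500 ms, then return */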



The epoll model is mainly responsible for handling the requests of large numbers of concurrent users and completing the data exchange between server and clients. The concrete implementation steps are as follows:
(a) Use the epoll_create() function to create a file descriptor, setting the maximum number of socket descriptors it will manage.
(b) Create receive threads associated with epoll; the application can create several receive threads to handle read-notification events from epoll, with the number of threads depending on the program's needs.
(c) Create a listening socket descriptor listensock; set it to non-blocking mode; call the listen() function on the socket to watch for new connection requests; in the epoll_event structure set the event type to EPOLLIN and the working mode to EPOLLET, to improve efficiency; register the event with epoll_ctl(); and finally start the network monitoring thread (see the sketch after this list).
(d) The network monitoring thread starts a loop, calling epoll_wait() to wait for epoll events.
(e) If an epoll event indicates a new connection request, call the accept() function, add the user's socket descriptor to the epoll_data union, set the descriptor to non-blocking, and in the epoll_event structure set the event types to be handled to read and write, with working mode EPOLLET.
(f) If an epoll event indicates data readable on a socket descriptor, add that descriptor to the readable queue, notify a receive thread to read the data, put the received data into the received-data list, and after logical processing put the response packet into the send-data list, to await transmission by the send thread.
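A sketch of my own for steps (a)-(c), with assumed names (MAX_FDS does not appear in the original) and error handling trimmed:

#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>

#define MAX_FDS 1024                                     /* assumed capacity */

int setup_listener(int epfd, unsigned short port)        /* epfd: from epoll_create(MAX_FDS), step (a) */
{
    int listensock = socket(AF_INET, SOCK_STREAM, 0);
    fcntl(listensock, F_SETFL,
          fcntl(listensock, F_GETFL, 0) | O_NONBLOCK);   /* set non-blocking mode */

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    bind(listensock, (struct sockaddr *) &addr, sizeof addr);
    listen(listensock, 128);                             /* watch for new connection requests */

    struct epoll_event ev;
    ev.events = EPOLLIN | EPOLLET;                       /* read events, edge-triggered */
    ev.data.fd = listensock;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listensock, &ev);     /* step (c): register the event */
    return listensock;
}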

The processing flow is as follows:

C++ language: how to use epoll

// After epoll_wait returns, loop through all the events of interest:
for (n = 0; n < nfds; ++n)
{
    if (events[n].data.fd == listener)
    {   // the event is on the listening socket: a new connection has arrived; handle it
        client = accept(listener, (struct sockaddr *) &local, &addrlen);
        if (client < 0)
        {
            perror("accept");
            continue;
        }
        setnonblocking(client);        // put the new connection into non-blocking mode
        ev.events = EPOLLIN | EPOLLET; // and add it to epoll's watch queue
        // Note: EPOLLIN | EPOLLET does not watch the socket for writes, so if a
        // write happens epoll will not return an event; to watch write
        // operations too, use EPOLLIN | EPOLLOUT | EPOLLET
        ev.data.fd = client;
        if (epoll_ctl(kdpfd, EPOLL_CTL_ADD, client, &ev) < 0)
        {
            /*
             * After filling in the event, add it to epoll's watch queue via
             * epoll_ctl: EPOLL_CTL_ADD adds a new epoll event, EPOLL_CTL_DEL
             * removes one, and EPOLL_CTL_MOD changes how an event is monitored.
             */
            fprintf(stderr, "epoll set insertion error: fd=%d", client);
            return -1;
        }
    }
    else                              // not the listening socket's event, so it is
        do_use_fd(events[n].data.fd); // a user socket's event: handle the user socket,
                                      // e.g. read(fd, ...) or other processing
}


Yes, using epoll really is that simple: a total of four APIs---epoll_create, epoll_ctl, epoll_wait, and close.
If you are unclear about epoll's efficiency, see my earlier articles on network programming for online games.

Previously the company's servers used HTTP connections, but this made the mobile client not only slow but also unstable under current network conditions. So everyone agreed to switch to socket connections. Although sockets may increase users' costs (since they require CMNET rather than CMWAP), following the principle of putting user experience first, I believe everyone can accept that (and hope the players can stay calm when the bill arrives at the end of the month...).
The most important breakthrough in this server design is the use of the epoll model. Although my understanding of it is still superficial, it has already withstood the harsh testing of large PC online games, so I trusted it would not let us down, and indeed its performance after deployment has been excellent. Below I mainly introduce the structure of this model.
6. An epoll programming example under Linux
The epoll model seems to have essentially one form, so as long as you can follow my code below, you will understand epoll. The code is explained in the comments:

C++ language:

while (TRUE)
{
    int nfds = epoll_wait(m_epoll_fd, m_events, MAX_EVENTS, EPOLL_TIME_OUT);
    // wait for an epoll event to happen, equivalent to listening on the
    // relevant ports, which need to be bound when epoll is initialized
    if (nfds <= 0)
        continue;
    m_bOnTimeChecking = FALSE;
    g_CurTime = time(NULL);
    for (int i = 0; i < nfds; i++)
    {
        try
        {
            if (m_events[i].data.fd == m_listen_http_fd)
            {   // a newly monitored HTTP user connected to the bound HTTP port;
                // establish the new connection. Since we adopted plain socket
                // connections, this is basically unused.
                OnAcceptHttpEpoll();
            }
            if (m_events[i].data.fd == m_listen_sock_fd)
            {   // a newly monitored socket user connected to the bound socket
                // port; establish the new connection
                OnAcceptSockEpoll();
            }
            if (m_events[i].events & EPOLLIN)
            {   // a user that is already connected sent data; read it in
                OnReadEpoll(i);
            }

            OnWriteEpoll(i); // see whether the current active connection has data to write out
        }
        catch (int)
        {
            printf("catch error\n");
            continue;
        }
    }
    m_bOnTimeChecking = TRUE;
    OnTimer(); // do some timed work, mainly removing short-lived idle users and the like
}

In fact, the essence of epoll is just the few short passages of code above. Times really have changed: the once-hard problem of accepting huge numbers of user connections is now fixed so easily that one can only sigh in admiration.
