Some details about the development of network servers on Linux: poll and epoll

With epoll fully supported by the 2.6 kernel, many articles and code samples on the net tell you that replacing the traditional poll with epoll improves the performance of network service applications. Few of them, however, explain where the improvement comes from. Here I will try to analyze how poll and epoll work by walking through the kernel (2.6.21.1) code, and then back up the comparison with some test data.

POLL:

Most Unix/Linux programmers are familiar with poll and select. The two are similar in principle and show no obvious performance difference, but select imposes a limit on the number of file descriptors it can monitor, so poll is used for the explanation here.

poll is a single system call. Its kernel entry point is sys_poll, which calls do_sys_poll directly without any further processing. The execution of do_sys_poll can be divided into three parts (a user-space sketch of the resulting costs follows the list):

1. Copy the caller's pollfd array into kernel space. Since the cost of the copy is proportional to the array length, this is an O(n) operation. In do_sys_poll, this step covers the code from the start of the function up to the call to do_poll.

2. Query the status of the device behind each file descriptor. If a device is not ready, add an entry to that device's wait queue and move on to query the next one. If no device is ready after all of them have been queried, the current process must be suspended until a device becomes ready or the timeout expires; the suspension is done by calling schedule_timeout. Once a device becomes ready and the process is woken, it traverses all the devices again to find the ready ones. This step walks the whole set of devices twice, so its time complexity is O(n), not counting the time spent waiting. The relevant code is in the do_poll function.

3. Copy the gathered results back to user space and clean up, e.g. free memory and detach from the wait queues. Copying the data out and detaching from the wait queues are again O(n) operations. This step covers the code in do_sys_poll from the call to do_poll to the end of the function.
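To make these costs concrete, here is a minimal sketch of the user-space side of a poll loop (my own illustration, not code from the kernel or the article; fds, nfds and the commented-out handle_client are placeholders):

    #include <poll.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Minimal poll loop. On every iteration the whole pollfd array is
     * copied into the kernel (step 1), every device is queried (step 2)
     * and the results are copied back out (step 3) -- each pass O(n). */
    void poll_loop(struct pollfd *fds, nfds_t nfds)
    {
        for (;;) {
            int ready = poll(fds, nfds, -1);  /* block until >=1 fd ready */
            if (ready < 0) {
                perror("poll");
                exit(EXIT_FAILURE);
            }
            /* The caller must scan the entire array to find the ready
             * descriptors -- yet another O(n) pass, in user space. */
            for (nfds_t i = 0; i < nfds; i++) {
                if (fds[i].revents & POLLIN) {
                    /* handle_client(fds[i].fd);  placeholder handler */
                }
            }
        }
    }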

EPOLL:

Now for epoll. Unlike poll/select, epoll is no longer a single system call but three: epoll_create, epoll_ctl and epoll_wait. We will see the benefit of this split shortly.

First, sys_epoll_create (the kernel function behind epoll_create). It mostly does preparatory work: create a data structure, initialize it, and finally return a file descriptor representing the newly created virtual epoll file. This can be regarded as a constant-time operation.

Epoll is implemented as a virtual file system, which has at least two advantages:

1. Some information, such as the full set of monitored file descriptors, can be maintained in the kernel and persists across multiple epoll_wait calls;

2. The epoll fd itself can in turn be monitored with poll/epoll;

The details of the epoll virtual file system implementation are irrelevant to this performance analysis.

Reading sys_epoll_create also shows that the size parameter of epoll_create is meaningless at this stage, as long as it is greater than zero.
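As a quick illustration (a sketch of my own, not code from the article), any positive size works and the returned descriptor refers to the virtual epoll file:

    #include <sys/epoll.h>
    #include <stdio.h>

    int main(void)
    {
        /* The size argument must be > 0 but is otherwise ignored;
         * it survives only for backward compatibility. */
        int epfd = epoll_create(1);
        if (epfd < 0) {
            perror("epoll_create");
            return 1;
        }
        /* epfd now refers to the newly created virtual epoll file. */
        return 0;
    }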

Next is sys_epoll_ctl (the kernel function behind epoll_ctl). Note that each call to sys_epoll_ctl handles exactly one file descriptor. Taking op == EPOLL_CTL_ADD as the example: sys_epoll_ctl performs some safety checks and then enters ep_insert, which registers ep_poll_callback as the wake-up callback on the device's wait queue (assuming the device is not yet ready). Since each epoll_ctl call touches only one file descriptor, it too can be regarded as an O(1) operation.
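A minimal sketch of the corresponding user-space call (the function name add_to_epoll and its parameters are placeholders of mine):

    #include <sys/epoll.h>
    #include <stdio.h>

    /* Register a single descriptor with an epoll instance. Each call
     * handles exactly one fd, which is why it can be treated as O(1). */
    int add_to_epoll(int epfd, int fd)
    {
        struct epoll_event ev;
        ev.events = EPOLLIN;   /* interested in readability */
        ev.data.fd = fd;       /* handed back unchanged by epoll_wait */
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
            perror("epoll_ctl");
            return -1;
        }
        return 0;
    }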

The ep_poll_callback function is the key. When a device being waited on becomes ready, the kernel invokes this callback, which performs two operations (a toy model follows the list):

1. Add the ready device to the ready list. This avoids having to re-poll every device after one becomes ready, as poll does, cutting that step from O(n) to O(1);

2. Wake up the process waiting on the virtual epoll file;
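To illustrate those two steps, here is a toy model. This is emphatically not the real kernel code (which lives in fs/eventpoll.c and involves locking and wait-queue details omitted here); the struct and function names below are invented for the illustration:

    /* Toy model of ep_poll_callback -- NOT the real kernel source. */
    struct item {                 /* stands in for the kernel's epitem   */
        int fd;
        struct item *next;
    };

    struct instance {             /* stands in for struct eventpoll      */
        struct item *ready_head;  /* the ready list                      */
        int waiter_asleep;        /* models the epoll file's wait queue  */
    };

    /* Invoked when one monitored fd becomes ready. */
    void on_ready(struct instance *ep, struct item *it)
    {
        /* 1. Link just this one item onto the ready list: O(1); the
         *    other monitored descriptors are never touched. */
        it->next = ep->ready_head;
        ep->ready_head = it;

        /* 2. Wake the process blocked in epoll_wait (the kernel does
         *    this with wake_up() on the epoll file's wait queue). */
        ep->waiter_asleep = 0;
    }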

The last step is sys_epoll_wait, whose real work happens in ep_poll. This function puts the calling process on the wait queue of the virtual epoll file until it is woken (see the ep_poll_callback description above), and finally runs ep_events_transfer to copy the results to user space. Since only information about the ready devices is copied, the copy here can be regarded as an O(1) operation.
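A minimal sketch of the user-space side (epfd and the commented-out handler are placeholders of mine):

    #include <sys/epoll.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_EVENTS 64

    /* Minimal epoll_wait loop. Only ready descriptors come back, so
     * the user-space scan is proportional to the number of ready fds,
     * not to the total number being monitored. */
    void epoll_loop(int epfd)
    {
        struct epoll_event events[MAX_EVENTS];

        for (;;) {
            int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
            if (n < 0) {
                perror("epoll_wait");
                exit(EXIT_FAILURE);
            }
            for (int i = 0; i < n; i++) {
                /* handle_client(events[i].data.fd);  placeholder */
            }
        }
    }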

Another point of interest is epoll's handling of EPOLLET, i.e. edge-triggered mode. A rough read of the code suggests that it simply hands over to the user the work the kernel would do in level-triggered mode; intuitively this should not have a big impact on performance. Discussion is welcome if you are interested.
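For reference, the usual edge-triggered pattern looks like this (my sketch, not code from the article): the fd is made non-blocking and read until EAGAIN, since epoll will not report it again until a new event arrives:

    #include <errno.h>
    #include <unistd.h>

    /* Drain a non-blocking fd after an EPOLLET notification. Reading
     * until EAGAIN is precisely the bookkeeping that level-triggered
     * mode would otherwise leave to the kernel. */
    void drain_fd(int fd)
    {
        char buf[4096];

        for (;;) {
            ssize_t n = read(fd, buf, sizeof(buf));
            if (n > 0) {
                /* process buf[0..n) here -- placeholder */
                continue;
            }
            if (n == 0)
                break;               /* peer closed the connection */
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;               /* fully drained; wait for next event */
            break;                   /* real error; handle/close here */
        }
    }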

POLL/EPOLL comparison:

On the surface, one poll call can be viewed as the combination of one epoll_create, several epoll_ctl calls, one epoll_wait and one close. The reason epoll splits poll's work into these pieces is the way poll is actually used in server software (such as web servers):

1. A large number of file descriptors need to be polled simultaneously;

2. After each poll returns, the ready file descriptors are only a very small fraction of all the polled descriptors;

3. Between successive poll calls, the changes to the file descriptor array (ufds) are small.

The traditional poll function effectively starts from scratch on every call: it reads the whole ufds in from user space and copies the whole thing back out, and every call adds each device to a wait queue and removes it again at least once. This is the source of its inefficiency.

epoll addresses all of the situations above. There is no need to read in a complete ufds on each call: epoll_ctl adjusts only the small part that changed. There is no need to perform the add/remove wait-queue operations on every epoll_wait. And thanks to the callback mechanism, there is no need to search the whole device array once a device becomes ready, which improves efficiency further. Most visibly, on the user side an application using epoll no longer has to scan the entire result set after each call to pick out the ready part: O(n) becomes O(1), which by itself is a considerable performance gain.

A further thought: would changing epoll_ctl so that a single call can process multiple fds at once (the way semctl does) improve performance even more? Presumably yes, especially if system calls themselves are expensive. The cost of system calls will be analyzed later, however.

Comparison of POLL/EPOLL test data:

Test environment: I wrote three programs to simulate a server, an active client and a dead client. The server runs on a self-compiled 2.6.11 kernel on a PIII 933 box; the two clients run on other PCs whose hardware outperforms the server's, mainly to make sure the server can easily be saturated. The three machines are connected through a 100M switch.

The server accepts all connections and polls them; when a request arrives it sends back a reply and continues polling.

The active client simulates a number of concurrent active connections that continuously send requests and receive replies.

The dead client simulates clients that connect but never send a request, purely to occupy the server's poll descriptor slots.

Test process: keep 10 active concurrent connections while varying the number of dead concurrent connections, and record the performance difference between poll and epoll at the different ratios. The numbers of dead concurrent connections tested: 80, 160, 320, 640, 1280.

On the resulting chart, the horizontal axis is the ratio of dead to active concurrent connections, and the vertical axis is the time, in seconds, taken to complete 40000 request/reply exchanges. The red line is poll, the green line epoll. As the number of monitored file descriptors grows, poll's time consumption increases linearly, while epoll stays flat, almost unaffected by the descriptor count.

When all the monitored clients are active, poll is actually slightly more efficient than epoll (mainly near the origin, i.e. where the number of dead concurrent connections is 0; this is hard to see on the chart). epoll's implementation is more complex than poll's, and monitoring a small number of descriptors is not its strong suit.