Linux Socket Details (10) --- I/O multiplexing server in epoll mode

Source: Internet
Author: User
Tags: benchmark, epoll

Introduction to the Epoll model

Epoll is currently the popular choice for developing large-scale concurrent network programs under Linux. It was formally introduced in the Linux 2.6 kernel. Like select, it is an I/O multiplexing technique, and there is nothing mysterious about it.

In fact, there has never been a shortage of approaches for writing concurrent network programs under Linux: the typical Apache model (Process Per Connection, abbreviated PPC), the TPC (Thread Per Connection) model, as well as the select model and the poll model. So why introduce epoll? That still needs to be explained...

Disadvantages of common models

Without first laying out the shortcomings of the other models, how can we appreciate the advantages of epoll?

Multi-process PPC / multi-threaded TPC model

These two models share the same idea: let each incoming connection handle its own work and not bother anyone else. PPC forks a process for each connection, and TPC spawns a thread. But "not bothering me" has a price in time and space: once there are many connections, the cost of switching among so many processes or threads adds up.

As a result, the maximum number of connections such models can accept is not high, typically a few hundred.

Select model --- O(n)

Since the multi-process/multi-threaded models are heavyweight and cumbersome, we have the select model:

int  select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
void FD_CLR(int fd, fd_set *set);
int  FD_ISSET(int fd, fd_set *set);
void FD_SET(int fd, fd_set *set);
void FD_ZERO(fd_set *set);

The select system call lets a program monitor multiple file descriptors for state changes. select() watches an array of file descriptors; when it returns, the kernel has modified the descriptor sets to mark which ones are ready, so the process can pick them out and perform the subsequent read and write operations.

The program blocks in select and waits until one or more of the monitored file descriptors has changed state.

The select() mechanism provides the fd_set data structure, which is essentially an array of long integers in which each bit can be associated with an open file descriptor. Establishing those associations is the programmer's job. When select() is called, the kernel modifies the contents of fd_set according to the I/O state, telling the process that executed select() which sockets or files are readable and writable.

When some descriptors become readable or writable, select returns (select is synchronous, so it only returns when there is something to read or write, or the timeout expires); the program then scans the fd_set to find the descriptors with pending I/O and handles them. The time complexity is O(n).

Performance is much higher than the blocking multi-process or multi-threaded models, but it is still not enough, because select has several restrictions:

    1. Maximum concurrency limit: the FDs (file descriptors) a process can pass to select are limited by FD_SETSIZE (see the in-depth analysis of why select can only listen on 1024 descriptors), whose default value is 1024/2048, so the maximum concurrency of the select model is limited accordingly. Users can modify FD_SETSIZE themselves and recompile, but in practice this is not recommended.

      Under Linux, fd_set is a 1024-bit bitmap, and each bit represents one FD value; on return the whole bitmap has to be scanned, which is one reason for the low efficiency. The performance problem aside, the correctness problem deserves even more attention.

      Because this is a 1024-bit bitmap, when an FD value inside the process is >= 1024 it goes out of bounds and may cause a crash. For server programs, FD >= 1024 is easy to reach: it only takes a large enough number of connections plus open files (see the sketch after this list).

      include/linux/posix_types.h:#define __FD_SETSIZE         1024
    2. Efficiency problem: every call to select linearly scans the entire FD set, so efficiency declines linearly as the set grows. The only result of enlarging FD_SETSIZE is that every call takes longer for everybody.

    3. Kernel/user-space memory copy problem: how does the kernel deliver FD notifications to user space? select answers this with a memory copy on every call.
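As a hedge against the out-of-bounds problem in item 1, a defensive select-based server can refuse to monitor any descriptor whose value is at or above FD_SETSIZE. The sketch below is illustrative only; add_to_read_set() is a hypothetical helper, not part of any standard API.

#include <sys/select.h>
#include <stdio.h>

/* Hypothetical helper: add fd to the read set only if it fits in the bitmap.
 * Returns 0 on success, -1 if fd would overflow fd_set (FD_SETSIZE, usually 1024). */
static int add_to_read_set(int fd, fd_set *readfds, int *maxfd)
{
    if (fd < 0 || fd >= FD_SETSIZE) {
        fprintf(stderr, "fd %d exceeds FD_SETSIZE (%d); refusing to monitor it\n",
                fd, FD_SETSIZE);
        return -1;
    }
    FD_SET(fd, readfds);
    if (fd > *maxfd)
        *maxfd = fd;           /* select() needs maxfd + 1 as its first argument */
    return 0;
}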

Poll model

The implementation of poll is very similar to select; the only difference is how the FD set is described. poll uses the pollfd structure instead of select's fd_set structure; everything else is the same.

It registers a set of events, returns when an event occurs, and the caller still has to walk through the pollfd array to find which file descriptors are ready, and the data still has to be copied back and forth between kernel space and user space. The time complexity is O(n).

So poll only solves select's problem 1 (the descriptor limit); problems 2 and 3 remain.
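For comparison, here is a minimal poll-based read loop. It is a sketch under the assumption that the descriptors in conns[] are already-connected sockets (and that nconns <= 1024); it is not a complete server.

#include <poll.h>
#include <unistd.h>

/* Minimal poll() loop: register interest in readability, then scan the whole
 * array after every return -- this linear scan is the O(n) cost mentioned above. */
void poll_loop(int *conns, int nconns)
{
    struct pollfd fds[1024];                 /* assumes nconns <= 1024 */
    for (int i = 0; i < nconns; i++) {
        fds[i].fd = conns[i];                /* unlike fd_set, the fd value itself is unlimited */
        fds[i].events = POLLIN;
    }

    for (;;) {
        int n = poll(fds, nconns, -1);       /* block until at least one fd is ready */
        if (n < 0)
            break;
        for (int i = 0; i < nconns; i++) {   /* still O(n): check every entry */
            if (fds[i].revents & POLLIN) {
                char buf[4096];
                read(fds[i].fd, buf, sizeof(buf));   /* handle the data here */
            }
        }
    }
}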

Epoll model

Performance improvement of Epoll

Having criticized the other models one by one, let's look at epoll's improvements; in fact, the shortcomings of select, taken in reverse, are exactly epoll's advantages.

    1. epoll has no hard maximum-concurrency limit: the upper bound on FDs is the maximum number of open files, which is generally far greater than 2048 and depends largely on system memory. The exact number can be checked with cat /proc/sys/fs/file-max.

    2. Efficiency: epoll's biggest advantage is that its cost depends only on your "active" connections, not on the total number of connections, so in a realistic network environment epoll is much more efficient than select and poll.

    3. Memory copy: epoll uses "shared memory" for this, so the memory copy is avoided.

How epoll solves the above 3 disadvantages

Since epoll is an improvement on select and poll, it should avoid the three drawbacks mentioned above. So how does epoll actually solve them?

Before that, let's look at the difference in calling interfaces between epoll and select/poll: select and poll each provide only a single function, the select or poll function.

epoll, on the other hand, provides three functions: epoll_create, epoll_ctl, and epoll_wait:

    • epoll_create creates an epoll handle;

    • epoll_ctl registers the type of event to be monitored;

    • epoll_wait waits for events to occur.
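To make the division of labour concrete, here is a minimal sketch of an epoll-based accept/read loop. Socket setup and error handling are omitted, and listen_fd is assumed to already be bound, listening, and non-blocking.

#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_EVENTS 64

void epoll_skeleton(int listen_fd)          /* listen_fd: assumed ready-made listening socket */
{
    int epfd = epoll_create(256);           /* 1. create the epoll handle */

    struct epoll_event ev, events[MAX_EVENTS];
    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);       /* 2. register interest */

    for (;;) {
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1); /* 3. wait for events */
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == listen_fd) {
                int conn = accept(listen_fd, NULL, NULL); /* new connection */
                ev.events = EPOLLIN;
                ev.data.fd = conn;
                epoll_ctl(epfd, EPOLL_CTL_ADD, conn, &ev);
            } else {
                char buf[4096];
                ssize_t r = read(fd, buf, sizeof(buf));   /* data from a client */
                if (r <= 0) {                             /* closed or error */
                    epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
                    close(fd);
                }
            }
        }
    }
}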

Support for a process opening a large number of socket descriptors (FDs)

For the first disadvantage, the limit on concurrency:

epoll does not have this restriction; the limit on the FDs it supports is the maximum number of open files, which is generally far greater than 2048. On a machine with 1 GB of memory it is roughly 100,000. The exact number can be checked with cat /proc/sys/fs/file-max; in general it is closely related to system memory.

What is hardest to bear about select is that the number of FDs a process can monitor is fixed, set by FD_SETSIZE, with a default of 2048. That is clearly too small for an IM server that needs to support tens of thousands of connections. One option is to modify the macro and recompile the kernel, but it has been reported that this reduces network efficiency; another is the multi-process solution (the traditional Apache scheme), but although the cost of creating a process on Linux is relatively small, it still cannot be ignored, and data synchronization between processes is far less efficient than synchronization between threads, so it is not a perfect solution either. epoll, as noted above, simply does not have this restriction.

I/O efficiency does not decrease linearly as the number of FDs increases

For the second disadvantage, the linear cost of polling descriptors:

epoll's solution does not repeatedly add current to the FD's device wait queue the way select or poll does; it adds it only once, at epoll_ctl time (which is unavoidable), and specifies a callback function for each FD. When the device becomes ready and wakes up the waiters on its queue, this callback is invoked, and the callback adds the ready FD to a ready list. epoll_wait's job is essentially just to check whether there is any ready FD in this list.

Another Achilles heel of traditional select/poll appears when you hold a large set of sockets but, because of network latency, only some of them are "active" at any one time; select/poll still scans the entire set linearly on every call, so efficiency falls off linearly. epoll does not have this problem: it only operates on "active" sockets, because the kernel implementation of epoll is based on the per-FD callback described above. Only "active" sockets invoke the callback; idle sockets do not. In this sense epoll implements a "pseudo" AIO, because the driving force is in the OS kernel. In some benchmarks, if essentially all the sockets are active, as in a high-speed LAN environment, epoll is no more efficient than select/poll; on the contrary, excessive use of epoll_ctl can make it slightly slower. But once idle connections are used to simulate a WAN environment, epoll is far more efficient than select/poll.

Using mmap to speed up message passing between the kernel and user space

For the third disadvantage, copying data between kernel space and user space:

epoll's solution lies in the epoll_ctl function. Each time a new event is registered into the epoll handle (with EPOLL_CTL_ADD in epoll_ctl), the corresponding FD is copied into the kernel then, rather than being copied repeatedly at epoll_wait time. epoll thus guarantees that each FD is copied only once over the whole process.

This touches on the concrete implementation of epoll. select, poll, and epoll all need the kernel to deliver FD notifications to user space, so avoiding unnecessary memory copies is important; epoll achieves this by having the kernel and user space mmap the same piece of memory. If you have followed epoll since the 2.5 kernel, you will remember having to perform that mmap step manually.

Summary

(1) select and poll must themselves repeatedly poll the entire FD set until some device is ready, possibly alternating between sleeping and waking several times along the way. epoll also calls epoll_wait repeatedly to poll the ready list, and may also alternate between sleeping and waking several times, but when a device becomes ready it calls the callback function, puts the ready FD onto the ready list, and wakes the process sleeping in epoll_wait. Although both alternate between sleeping and waking, select and poll traverse the entire FD set while "awake", whereas epoll, while "awake", only has to check whether the ready list is empty, which saves a great deal of CPU time. This is the performance gain the callback mechanism brings.

(2) select and poll copy the FD set from user space to kernel space once per call, and hang current on each device's wait queue once per call; epoll needs only one copy, and hangs current on a wait queue only once (at the start of epoll_wait; note that this wait queue is not a device wait queue but one defined internally by epoll). This also saves a lot of overhead.

Working modes of epoll

Happily, the epoll in the 2.6 kernel is much simpler to use than the /dev/epoll of the 2.5 development versions; in most cases, powerful things turn out to be simple. The only wrinkle is that epoll has two working modes: LT and ET.

LT (level triggered) is the default working mode and supports both blocking and non-blocking sockets. In this mode the kernel tells you whether a file descriptor is ready, and you can then perform I/O on the ready FD. If you do nothing, the kernel will keep notifying you, so programming errors are less likely in this mode. Traditional select/poll are representatives of this model.

ET (edge triggered) is the high-speed working mode and supports only non-blocking sockets. In this mode the kernel tells you, via epoll, when a descriptor goes from not ready to ready. It then assumes you know the file descriptor is ready and sends no more ready notifications for it until you do something that makes the descriptor not ready again (for example, a send or recv that returns an EWOULDBLOCK error, or that transfers less than the amount requested). Note that if no I/O operation is performed on the FD (so that it never becomes not ready again), the kernel will not send further notifications (only the one). Even so, for the TCP protocol the acceleration benefit of ET mode still needs more benchmark confirmation.
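To illustrate the ET contract described above, here is a minimal sketch of an edge-triggered read: the socket is assumed to already be non-blocking, and the loop drains it until read() reports EAGAIN, since no further notification will arrive for data already buffered.

#include <sys/epoll.h>
#include <errno.h>
#include <unistd.h>

/* Register fd in edge-triggered mode (EPOLLET); fd must be non-blocking. */
void add_et(int epfd, int fd)
{
    struct epoll_event ev;
    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* On an EPOLLIN event in ET mode, read until EAGAIN/EWOULDBLOCK:
 * only then is the descriptor "not ready" again and a new edge can fire. */
void drain_et(int fd)
{
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0)
            continue;                  /* process buf[0..n) here */
        if (n == 0)
            break;                     /* peer closed the connection */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;                     /* kernel buffer drained; wait for the next edge */
        break;                         /* real error; handle/close elsewhere */
    }
}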

epoll has only three system calls: epoll_create, epoll_ctl, and epoll_wait. For their specific usage please refer to http://www.xmailserver.org/linux-patches/nio-improve.html; there is also a complete example at http://www.kegel.com/rn/, where we will see how a thread pool implemented in the Leader/Follower pattern works together with epoll.

epoll's efficiency is inseparable from the design of its data structures, which is discussed below.

Epoll Key Data Structures

The key data structures behind epoll's speed are:

struct epoll_event {
    __uint32_t   events;    // epoll events
    epoll_data_t data;      // user data variable
};

typedef union epoll_data {
    void        *ptr;
    int          fd;
    __uint32_t   u32;
    __uint64_t   u64;
} epoll_data_t;

As you can see, epoll_data is a union that can hold many kinds of application information: an FD, a pointer, and so on. With it, the application can locate the target of an event directly.
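For example, here is a sketch of storing a per-connection pointer in epoll_data so that each event carries its own context; the connection struct and its fields are made up for illustration.

#include <sys/epoll.h>
#include <stddef.h>

/* Hypothetical per-connection state; the fields are illustrative only. */
struct connection {
    int    fd;
    char   inbuf[4096];
    size_t inlen;
};

/* Attach the connection object itself to the event via the data.ptr member. */
void watch_connection(int epfd, struct connection *c)
{
    struct epoll_event ev;
    ev.events = EPOLLIN;
    ev.data.ptr = c;                       /* not just the fd: the whole context */
    epoll_ctl(epfd, EPOLL_CTL_ADD, c->fd, &ev);
}

/* Later, in the event loop, the context comes back without any lookup: */
void on_event(struct epoll_event *e)
{
    struct connection *c = e->data.ptr;    /* direct access, no search over all FDs */
    (void)c;                               /* read into c->inbuf, update c->inlen, ... */
}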

Using Epoll

Recall the select model first: when an I/O event arrives, select only notifies the application that there is an event to handle; the application must then poll the entire FD set, test every FD for events, and handle them. The code looks like this:

// assume: fd_set readfds prepared with FD_SET(), struct timeval timeout set
int res = select(maxfd + 1, &readfds, NULL, NULL, &timeout);
if (res > 0)
{
    for (int i = 0; i < MAX_CONNECTION; i++)
    {
        if (FD_ISSET(allConnection[i], &readfds))
        {
            handleEvent(allConnection[i]);
        }
    }
}
// if (res == 0) handle timeout; if (res < 0) handle error

epoll not only tells the application that an I/O event has arrived, it also hands back the data the application supplied when it registered the event, so the application can locate the event's target directly without traversing the entire FD set.

int res = epoll_wait(epfd, events, 20, 120);
for (int i = 0; i < res; i++)
{
    handleEvent(events[i]);
}

Start by creating an epoll handle with epoll_create(int size), where size is the maximum number of descriptors this epoll instance is expected to monitor. This function returns a new epoll handle, and all subsequent operations go through this handle. When you are done, remember to close the epoll handle with close(). Then, inside your network main loop, call epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout) on each iteration to query all the monitored descriptors and see which can be read and which can be written. The basic syntax is:

nfds = epoll_wait(kdpfd, events, maxevents, -1);

Here kdpfd is the handle created with epoll_create, and events is a pointer to an array of epoll_event; when epoll_wait succeeds, the pending read and write events are stored in that array. maxevents is the maximum number of events to return at once (the capacity of the events array, typically sized to the number of sockets being monitored). The last parameter is epoll_wait's timeout: 0 means return immediately, -1 means wait until an event arrives, and any positive integer means wait at most that many milliseconds, returning even if no event has occurred. Generally, if the network main loop runs in its own thread you can use -1 to guarantee efficiency; if it shares a thread with the main program logic, you can use 0 to keep the main loop responsive.
Since epoll is so much better than select, how is it used? Will it be very cumbersome? Look at the following three functions and you will see that epoll is easy to use.

int epoll_create(int size);

This generates an epoll-specific file descriptor; in effect, it asks the kernel to set aside space to record the socket FDs you want to watch and the events that occur on them. size is the maximum number of socket FDs you intend to monitor on this epoll FD; choose it as you like, as long as memory allows.

int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

This controls events on an epoll file descriptor: register, modify, or delete. The parameter epfd is the epoll-specific file descriptor created by epoll_create(). It plays the same role as the FD_SET and FD_CLR macros in the select model.
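A small sketch of the three operations (the fd and the event masks are illustrative):

#include <sys/epoll.h>
#include <stddef.h>

void ctl_examples(int epfd, int fd)
{
    struct epoll_event ev;

    ev.events = EPOLLIN;                          /* start by watching for readability */
    ev.data.fd = fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);      /* register */

    ev.events = EPOLLIN | EPOLLOUT;               /* now also watch for writability */
    epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);      /* modify */

    epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);     /* delete (event argument is ignored) */
}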

int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

This waits for I/O events to occur and returns the number of events.

Its role is similar to that of the select function.

Parameter description:

Parameter     Description
epfd          epoll-specific file descriptor generated by epoll_create()
events        array of epoll_event structures that receives the events to be handled
maxevents     maximum number of events that can be handled at a time
timeout       timeout value for waiting for an I/O event (in milliseconds)

