select, poll, and epoll in detail.


(The following is compiled from material found on the web plus my own summary; thanks again to the experts who shared their insights online.)


Before exploring select, poll, and epoll, we first need to know what multiplexing is.

Next, let's look at why multiplexing is used:

First, consider the complete flow of a client request to a server: the request comes in, a connection is established, data is received, and then a response is sent back.

Down at the bottom of the system, these are read and write events. When a read or write event is not ready, the operation simply cannot proceed: unless the call is made in a non-blocking way, it blocks, and the caller can only wait until the event is ready before continuing. A blocking call traps into the kernel, and the CPU is handed to other processes. You might suggest creating one process per connection, but when there are many read/write events the process context switches eat up too much CPU. Others suggest threads instead, but thread context switches also cost resources, and because threads share the same memory, they introduce synchronization and mutual-exclusion problems.
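As an aside, here is a minimal sketch (mine, not from the original text) of how a socket is usually switched to non-blocking mode with fcntl, so that a call which would otherwise block returns immediately instead; set_nonblocking is a hypothetical helper name:

#include <fcntl.h>

/* Sketch: make a file descriptor non-blocking with fcntl(). */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);              /* read current file status flags */
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);  /* add O_NONBLOCK */
}

With O_NONBLOCK set, a read when no data has arrived returns -1 with errno set to EAGAIN/EWOULDBLOCK instead of suspending the process, which is the behavior the multiplexing calls discussed below build on.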


So consider letting one process watch many I/O events. Think of fishing: each fish taking the hook is an event. With 100 possible events you could put 100 people out there, each holding one rod, while you are responsible only for collecting the fish. If no fish are biting, those 100 people all sit there blocked and waiting, and you are idle too; here you can be compared to the CPU and the individual anglers to processes. If several fish bite at once, several people report to you at the same time and the order of reporting becomes chaotic. We can improve on this: have one dedicated person hold many rods and pull up whichever rod gets a bite. That saves manpower and also solves the problem.


Now for a more realistic story:

Suppose you are an air traffic controller at an airport. You need to manage every flight that passes through: inbound, outbound, some flights that need to wait on the tarmac, and some that need to go to a boarding gate to pick up passengers.

What would you do?
The simplest way is to recruit a large number of controllers and have each one watch a single aircraft full-time, from approach, passenger pickup, taxiing, and departure through en-route monitoring, all the way to handover to the next airport.
So here's the problem:
Soon the control tower is packed with controllers, and as soon as things get a little busy, new controllers can't even squeeze in.
Controllers need to coordinate with each other. With 1 or 2 people in the room that's fine; with many more, it basically turns into a vegetable market.
Controllers constantly need to update shared information, such as the departure board or the schedule for the next hour, and in the end you are surprised to find that everyone's time is ultimately spent fighting over these shared resources. (Resource sharing between threads.)

In reality, a single controller handling dozens of aircraft at once is perfectly routine. How do they do it?
They use this thing.


This thing is called a flight progress strip. Each strip represents a flight, different bays (slots) represent different states, and one controller can manage a rack of such strips (a set of flights). The job is simply to move the corresponding strip into a different bay whenever the flight's status is updated.


This thing has not been retired even today; it has merely gone electronic.
Doesn't that feel far more efficient? One control tower can now dispatch several to dozens of times as many flights as with the earlier approach.
If you treat each flight as a socket (an I/O stream), the controller is your server's socket-management code.
The first approach is the traditional multi-process concurrency model (each new incoming I/O stream is assigned a new process to manage it).
The second approach is I/O multiplexing (a single thread records the state of every I/O stream (socket) and thereby manages many I/O streams at the same time).


In fact, the Chinese translation of "I/O multiplexing" is probably the reason this concept is so hard to understand. In English it is simply called I/O multiplexing, and if you search for "multiplexing", you will basically get a picture like this:


So most people jump straight to the idea of "one network cable reused by many sockets", including several of the answers above. In fact, whether you use multiple processes or I/O multiplexing, the network cable is doing just fine: sharing one network cable among multiple sockets is implemented in the kernel and driver layer.

The important thing bears repeating: the "multiplexing" in I/O multiplexing means that a single thread tracks the state of each socket (I/O stream) (corresponding to the flight progress strips in the control tower) in order to manage multiple I/O streams. It was invented to raise a server's throughput as far as possible.

In the same process, multiple I/O streams are carried at the same time by switching among them. (Anyone who studied EE can now stand up righteously and declare that this is called time-division multiplexing.)


With all these stories out of the way, let's get to our focus: select, poll, and epoll.

select, poll, and epoll are all concrete implementations of I/O multiplexing. The reason these three exist at all is that they appeared one after another.

When the concept of I/O multiplexing was first raised, select was the first implementation (in BSD, around 1983).

Once select was implemented, many problems were quickly exposed.
select modifies the argument array (fd_set) passed to it, which is very unfriendly for a function that needs to be called over and over.
If any one socket (I/O stream) has data, select merely returns; it does not tell you which socket the data is on, so you have to go looking for it yourself. With 10 sockets that may be tolerable, but with tens of thousands (in fact select cannot support tens of thousands), this unnecessary overhead becomes positively extravagant.
select can only monitor 1024 descriptors. This is not arbitrary, by the way: it is defined in the Linux header files; see FD_SETSIZE.
select is not thread-safe. If you have added a socket to a select and another thread then discovers that the socket is no longer needed and should be closed: sorry, select does not support that. If you insist on closing that socket anyway, select's standard behavior is, uh, unspecified. And this really is written in the documentation:
"If a file descriptor being monitored by select() is closed in another thread, the result is unspecified."
Domineering, isn't it?


So 14 years later (1997), a group of people implemented poll, which fixed many of select's problems. For example:
poll removed the 1024-descriptor limit, so monitor as many as you like; whatever makes you happy, boss.
poll was designed not to modify the array passed in, although this still depends on the platform, so out in the wild, tread carefully.
The reason it dragged on for 14 years was actually not efficiency, but that the hardware of that era was too weak: a server handling more than 1000 connections was practically divine, so select met the need for a very long time.

But poll is still not thread-safe, which means that no matter how powerful the machine is, a single thread can still only handle one set of I/O streams. You can of course run more processes, but then you get all the usual problems of multiple processes.

So 5 years later, in 2002, the great Davide Libenzi implemented epoll.

epoll can be considered the most recent implementation of I/O multiplexing; it fixed most of the problems of poll and select. For example:
epoll is now thread-safe.
epoll not only tells you that there is data somewhere in the socket set, it also tells you which sockets have data, so you no longer have to hunt for them yourself.

But epoll has one fatal flaw: only Linux supports it. (Whether that counts as a flaw is for you to decide, boss.) The corresponding implementation on BSD, for example, is kqueue.


PS: All of the comparisons above are made under heavy concurrency. If your number of concurrent connections is small, it really makes no difference which one you use.

On the Linux network side, you have only one network card but need to handle n connections at the same time; that is where multiplexing is needed.

Up at the software level, the main concern is a handful of system calls: whenever I/O (network sockets included) is one-to-many, a multiplexing mechanism has to be provided.

Now, briefly, my own understanding.
I/O divides into disk I/O and network I/O; what follows is about network I/O. We know that data between computers is transferred as a stream, and a computer has only one network I/O channel.

This is the single-process case.
In the most basic client/server demo, send/recv sends and receives data over a single I/O channel. That is basic network I/O, but such an operation cannot "fill" the I/O: most of the I/O capacity goes unused because there is only one I/O operation at a time. You can of course open multiple processes or threads, but the cost is easy to imagine.

This is where I/O multiplexing comes in. In my own words: multiplex the network I/O so that multiple I/O operations can be carried over it.
Linux network I/O communicates through sockets. The ordinary I/O model can only listen on one socket, while I/O multiplexing can listen on many sockets at the same time.

I/O multiplexing avoids blocking on any single I/O (in fact select, poll, and epoll themselves all block; they only let you set a timeout). What used to require multiple processes or threads to receive messages from multiple connections becomes a single process or thread that saves the state of multiple sockets and then processes them by polling (epoll improves on this, as discussed later).


Implementing I/O multiplexing needs support from the system: select/poll and epoll on Linux, IOCP on Windows, and kqueue on BSD.


Select
The following is the function interface for select:
int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

select monitors three categories of file descriptors: readfds, writefds, and exceptfds. Internally they are implemented as three bitmaps.

After the call, select blocks until some descriptor becomes ready (readable, writable, or with an exceptional condition) or until the timeout expires. The timeout argument specifies how long to wait: a NULL timeout blocks indefinitely, while a zeroed struct timeval makes select return immediately.

When select returns, you find the ready descriptors by iterating over the fd_set.
select is supported on almost all platforms today, and this good cross-platform support is one of its advantages. One disadvantage is that the maximum number of file descriptors a single process can monitor is 1024 on Linux; this can be raised by changing the macro definition or even recompiling the kernel, but doing so also reduces efficiency.
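To make this concrete, here is a minimal sketch of using select; listen_fd is a hypothetical, already-created socket, and this is only an illustration of the calling pattern, not code from the original article:

#include <stdio.h>
#include <sys/select.h>

/* Sketch: wait up to 5 seconds for listen_fd to become readable. */
int wait_readable(int listen_fd)
{
    fd_set readfds;
    struct timeval tv = { 5, 0 };        /* 5-second timeout */

    /* select() modifies the set in place, so it must be rebuilt
     * before every call; the first argument is the highest fd + 1. */
    FD_ZERO(&readfds);
    FD_SET(listen_fd, &readfds);

    int ready = select(listen_fd + 1, &readfds, NULL, NULL, &tv);
    if (ready == -1) { perror("select"); return -1; }
    if (ready == 0)  { printf("timed out\n"); return 0; }

    /* select only says "something is ready"; we still test each fd. */
    return FD_ISSET(listen_fd, &readfds) ? 1 : 0;
}

Note how the fd_set has to be rebuilt before every call and how the caller still probes each descriptor with FD_ISSET; these are exactly the per-call copying and scanning costs criticized above.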



Poll
int poll(struct pollfd *fds, nfds_t nfds, int timeout);
Unlike select, which uses three bitmaps to represent the three fd_sets, poll uses an array of pollfd structures:
struct pollfd {
    int   fd;       /* file descriptor */
    short events;   /* requested events to watch */
    short revents;  /* returned events witnessed */
};
The pollfd structure contains both the events to be monitored and the events that actually occurred, so select's "parameter-value" style of passing is no longer needed. At the same time, pollfd imposes no maximum count (though performance still degrades as the number grows). Internally it is implemented with a linked list of structures.

As with select, after poll returns you still need to walk the pollfd array to find the ready descriptors.
To summarize so far: after select and poll return, you must traverse the file descriptors to find the ready sockets. In practice, of a large number of simultaneously connected clients, only a very small fraction may be ready at any moment, so the efficiency of both calls falls off linearly as the number of monitored descriptors grows.
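For illustration, here is a minimal poll sketch; conn_fds is a hypothetical array of already-connected sockets, and the names are mine, not the article's:

#include <poll.h>
#include <stdio.h>

/* Sketch: poll up to 1024 connected sockets for readable data. */
void poll_connections(const int *conn_fds, nfds_t n)
{
    struct pollfd fds[1024];              /* sized for this sketch only */
    for (nfds_t i = 0; i < n; i++) {
        fds[i].fd = conn_fds[i];
        fds[i].events = POLLIN;           /* interested in readability */
    }

    int ready = poll(fds, n, 1000);       /* wait at most 1000 ms */
    if (ready <= 0)
        return;                           /* error or timeout */

    /* Like select, we still scan the whole array to find ready fds. */
    for (nfds_t i = 0; i < n; i++)
        if (fds[i].revents & POLLIN)
            printf("fd %d is readable\n", fds[i].fd);
}

The calling pattern is the same as select's, just without the 1024 limit and without the argument being modified in place.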


Epoll
The epoll interface is as follows:

int epoll_create(int size);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

typedef union epoll_data {
    void     *ptr;
    int       fd;
    uint32_t  u32;
    uint64_t  u64;
} epoll_data_t;

struct epoll_event {
    uint32_t     events;    /* epoll events */
    epoll_data_t data;      /* user data variable */
};



The three main functions are epoll_create, epoll_ctl, and epoll_wait.

epoll_create creates an epoll file descriptor. (Since Linux 2.6.8) the size parameter does not limit the maximum number of descriptors epoll can monitor; it is only a hint to the kernel for how much internal data structure to allocate initially. The return value is an epoll descriptor; -1 indicates that creation failed.

epoll_ctl performs operation op on the specified descriptor fd, that is, it manipulates the events being listened for on fd. There are three op operations: EPOLL_CTL_ADD, EPOLL_CTL_DEL, and EPOLL_CTL_MOD, which add, delete, and modify the monitored events for fd, respectively.

epoll_wait waits for I/O events on epfd, returning at most maxevents events; timeout is the wait timeout in milliseconds.
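Putting the three calls together, a minimal sketch might look like this; listen_fd is a hypothetical, already-created socket, and the loop only reports which descriptors are ready:

#include <stdio.h>
#include <sys/epoll.h>

#define MAX_EVENTS 64

/* Sketch: register listen_fd with an epoll instance and wait for events. */
int run_epoll(int listen_fd)
{
    int epfd = epoll_create(1);           /* size is only a hint since 2.6.8 */
    if (epfd == -1) { perror("epoll_create"); return -1; }

    struct epoll_event ev;
    ev.events = EPOLLIN;                  /* notify when readable */
    ev.data.fd = listen_fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev) == -1) {
        perror("epoll_ctl");
        return -1;
    }

    struct epoll_event events[MAX_EVENTS];
    for (;;) {
        /* Unlike select/poll, only the ready fds come back to us. */
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        if (n == -1) { perror("epoll_wait"); return -1; }
        for (int i = 0; i < n; i++)
            printf("fd %d is ready\n", events[i].data.fd);
    }
}

Notice that the fd is handed to the kernel once through epoll_ctl, and each epoll_wait call then returns only the descriptors that actually have activity.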


Why Epoll is efficient (compared to select)

1. From the calling conventions alone you can already see one advantage of epoll over select/poll: select/poll require every call to pass all of the fds to be monitored into the system call (which means copying the fd list from user space to kernel space on every call), and this becomes inefficient when the number of fds is large. epoll_wait (the call that corresponds to select/poll) does not pass the fd list to the kernel each time; instead the fds are told to the kernel through epoll_ctl, which does not have to copy all of the fds every time but only the incremental changes.

Therefore, once epoll_create has been called, the kernel has already begun preparing a kernel-space data structure to hold the fds to be monitored, and each subsequent epoll_ctl call simply maintains that data structure.

2. In addition, the kernel uses the slab mechanism to provide epoll with a fast data structure:
In the kernel, everything is a file, so epoll registers a file system with the kernel to store the monitored fds. When epoll_create is called, a file node is created inside this virtual epoll filesystem.

Of course, this is no ordinary file; it exists only to serve epoll. epoll is initialized by the kernel at operating-system boot and sets aside its own kernel cache area. Every fd we want to monitor is kept in this cache in the form of a red-black tree, which supports fast lookup, insertion, and deletion. This kernel cache area is built from contiguous physical memory pages with a slab layer on top: put simply, memory objects of the desired size are allocated physically, and each use takes one of the pre-allocated idle objects.

3. epoll's third advantage: even after we have added millions of fds through epoll_ctl, epoll_wait can still return quickly and hand only the fds that have activity back to the user.

This is because when epoll_create is called, besides creating a file node for us in the epoll file system and building a red-black tree in the kernel cache to store the fds later added by epoll_ctl, the kernel also creates a ready list. When epoll_wait is called, it simply checks whether this list contains anything: if there is data, it returns; if not, it sleeps, and when the timeout expires it returns even if the list is still empty.

So epoll_wait is very efficient. Moreover, even when we monitor millions of fds, usually only a very small number are ready at any given time, so epoll_wait only has to copy a small number of fds from kernel space to user space.

So how is this ready list maintained? When epoll_ctl is executed, besides placing the fd into the red-black tree in the epoll file system, a callback is registered with the kernel's interrupt handler, telling the kernel: if an interrupt arrives for this fd, put it on the ready list. Therefore, when data arrives for an fd (say a socket), the kernel copies the data from the device (for example the NIC) into the kernel and then inserts that fd (socket) into the ready list.



In short, a red-black tree, a ready list of fds, and a small amount of kernel cache together solve the problem of handling fds (sockets) under heavy concurrency.

1. When epoll_create is executed, the red-black tree and the ready list are created.

2. When epoll_ctl is executed to add an fd (socket), the kernel checks whether it already exists in the red-black tree: if so, it returns immediately; if not, the fd is added to the tree, and a callback is registered with the kernel so that when an interrupt event occurs the fd is inserted into the ready list.
3. When epoll_wait is executed, the data in the ready list is returned immediately.

epoll has two event models:
Edge triggered (ET)
Level triggered (LT)

LT (level triggered) is the default mode and supports both blocking and non-blocking sockets. In this mode the kernel tells you whether a file descriptor is ready, and you may then perform I/O on the ready fd. If you do nothing, the kernel will keep notifying you, so programming mistakes are less likely in this mode. The traditional select/poll are representatives of this model.

ET (edge triggered) is the high-speed mode and supports only non-blocking sockets. In this mode the kernel tells you via epoll when a descriptor goes from not-ready to ready; it then assumes you know the file descriptor is ready and sends no further readiness notifications for it until you do something that causes it to become not ready again (for example, a send or receive that transfers less data than requested and results in an EWOULDBLOCK error). Note that if you never perform I/O on the fd (so it never becomes not-ready again), the kernel will not send another notification (it notifies only once).
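As an illustration of what ET mode demands, here is a minimal sketch of the usual drain-until-EAGAIN loop; it assumes fd was registered with EPOLLIN | EPOLLET and has already been set non-blocking, and the helper name is mine, not the article's:

#include <errno.h>
#include <unistd.h>

/* Sketch: with edge-triggered epoll the readiness notification arrives
 * only once, so we must keep reading until the socket is drained. */
void drain_socket(int fd)
{
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0)
            continue;                     /* process the n bytes here */
        if (n == 0)
            break;                        /* peer closed the connection */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;                        /* fully drained; wait for the next event */
        break;                            /* real error: handle/close fd */
    }
}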



