When it comes to high-performance network programming, Windows developers talk about IOCP, and Linux developers talk about epoll. We all know that epoll is an I/O multiplexing mechanism that can handle millions of socket handles very efficiently, far more efficiently than the older select and poll. Using epoll feels fast, but why can it handle so many concurrent connections at such speed?
Let's start with a brief review of the three epoll system calls in the C library.
- int epoll_create(int size);
- int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
- int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
Usage is straightforward. First, call epoll_create to create an epoll object. The size parameter is the maximum number of handles the kernel guarantees to handle correctly; beyond that number, behavior is not guaranteed. (Since Linux 2.6.8 the argument is only a hint and is ignored, though it must still be greater than zero.)
epoll_ctl manipulates the epoll object created above: for example, it can add a newly created socket to epoll so that it is monitored, or remove a socket handle that epoll is currently monitoring so that it is no longer watched.
epoll_wait returns to the calling user-space process when an event occurs on any of the monitored handles, or when the given timeout expires.
From these calls you can already see epoll's advantage over select/poll: with the latter, every call must pass the full list of sockets to be monitored to the select/poll system call, which means copying the user-space socket list into kernel space on each call. With a million handles, copying hundreds of kilobytes into the kernel on every call is very inefficient. Calling epoll_wait is the equivalent of calling select/poll, but without passing the socket handles to the kernel, because the kernel already received the list of handles to monitor through epoll_ctl.
In fact, once you call epoll_create, the kernel is ready to store the handles you want to monitor in kernel space, and each subsequent epoll_ctl call simply inserts the new socket handle into the kernel's data structure.
In the kernel, everything is a file. Accordingly, epoll registers a file system with the kernel for storing the monitored sockets described above. When you call epoll_create, a file node is created in this virtual epoll filesystem. Of course, this is not an ordinary file; it exists only to serve epoll.
epoll is initialized when the kernel boots (at operating system startup), and it sets up its own kernel cache for the sockets we want to monitor. These sockets are stored in the kernel cache as a red-black tree, to support fast lookup, insertion, and deletion. This kernel cache is built on contiguous physical memory pages via the slab layer: put simply, memory objects of the desired size are physically allocated up front, and each use draws from the pool of free, pre-allocated objects.
```c
static int __init eventpoll_init(void)
{
        ... ...

        /* Allocates slab cache used to allocate "struct epitem" items */
        epi_cache = kmem_cache_create("eventpoll_epi", sizeof(struct epitem),
                        0, SLAB_HWCACHE_ALIGN | EPI_SLAB_DEBUG | SLAB_PANIC,
                        NULL, NULL);

        /* Allocates slab cache used to allocate "struct eppoll_entry" */
        pwq_cache = kmem_cache_create("eventpoll_pwq",
                        sizeof(struct eppoll_entry), 0,
                        EPI_SLAB_DEBUG | SLAB_PANIC, NULL, NULL);

        ... ...
}
```
epoll's efficiency shows in the fact that even after we have added a million handles with epoll_ctl, epoll_wait can still return quickly and hand ready events to us efficiently. This is because when we call epoll_create, the kernel not only builds a file node in the epoll filesystem and a red-black tree in the kernel cache to store the sockets later added by epoll_ctl, it also builds a ready list for storing ready events. When epoll_wait is called, it simply checks whether this ready list contains any entries: if it does, the entries are returned; if not, the process sleeps, and when the timeout expires epoll_wait returns even if the list is still empty. That is why epoll_wait is so efficient.
Moreover, even when we monitor millions of handles, usually only a small number are ready at any given moment, so epoll_wait only needs to copy a small number of handles from kernel space to user space. How could that not be efficient?
So how is this ready list maintained? When we execute epoll_ctl, besides placing the socket on the red-black tree of the epoll file object, the kernel also registers a callback function with the interrupt handler for that handle, which says: when an interrupt arrives for this handle, put it on the ready list. So when data arrives on a socket, the kernel inserts that socket into the ready list right after copying the data from the NIC into the kernel.
Thus a red-black tree, a ready list, and a small amount of kernel cache together solve the problem of handling sockets under heavy concurrency. When epoll_create executes, the red-black tree and the ready list are created. When epoll_ctl executes to add a socket handle, the kernel checks whether it already exists in the red-black tree: if it does, epoll_ctl returns immediately; if not, the handle is added to the tree and the callback function is registered with the kernel, so that when an interrupt event occurs the handle is inserted into the ready list. When epoll_wait executes, it immediately returns the data in the ready list.
Finally, let's look at epoll's two distinctive modes, LT (level-triggered) and ET (edge-triggered). Both modes work with the mechanism described above. The difference is that in LT mode, as long as an event on a handle has not been fully processed, subsequent epoll_wait calls keep returning that handle, whereas in ET mode the handle is returned only the first time.
How does this happen? When an event occurs on a socket handle, the kernel inserts the handle into the ready list described above. When we call epoll_wait, the ready sockets are copied into user-space memory and the ready list is emptied. Finally, epoll_wait does one more thing: it checks these sockets, and for those not in ET mode (that is, LT-mode handles) that still have unhandled events, it puts the handle back onto the just-emptied ready list. So for a non-ET handle, as long as it still has events pending, every epoll_wait call will return it. An ET-mode handle, on the other hand, will not be returned from epoll_wait again, even if its events have not been processed, unless a new interrupt arrives.
That is how efficient epoll is.