How to efficiently handle millions of connections under Linux Epoll

Source: Internet
Author: User
Tags: epoll

When it comes to high-performance network programming, Windows developers will mention IOCP and Linux developers will mention epoll. Epoll is an I/O multiplexing technology that can efficiently handle millions of socket handles, and it is far more efficient than the older select and poll.

We have all used epoll and found it impressively fast. But why exactly can it handle so many concurrent connections so quickly?


Let's first recall the three epoll system calls as exposed by the C library.

    int epoll_create(int size);
    int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
    int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

Their usage is straightforward. First, call epoll_create to create an epoll object. The size parameter tells the kernel the maximum number of handles it should guarantee to handle correctly; beyond that number, the kernel makes no guarantees about behavior.


epoll_ctl operates on the epoll object created above: for example, it can add a newly created socket to the epoll object for monitoring, or remove a socket handle that epoll is currently monitoring so that it is no longer watched.

epoll_wait returns to the calling user-space process when an event occurs on any of the monitored handles, or when the given timeout expires.
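To make the flow concrete, here is a minimal sketch of how the three calls fit together in an event loop. Error handling is omitted, and listen_fd stands for a listening socket assumed to be created and bound elsewhere:

    #include <sys/epoll.h>
    #include <unistd.h>

    #define MAX_EVENTS 64

    /* A minimal sketch: listen_fd is assumed to be an already bound,
     * listening socket created elsewhere. */
    void event_loop(int listen_fd)
    {
        int epfd = epoll_create(256);          /* size is only a hint on modern kernels */

        struct epoll_event ev = {0};
        ev.events = EPOLLIN;
        ev.data.fd = listen_fd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);  /* start monitoring listen_fd */

        struct epoll_event events[MAX_EVENTS];
        for (;;) {
            /* Block up to 1000 ms waiting for events on any monitored handle */
            int n = epoll_wait(epfd, events, MAX_EVENTS, 1000);
            for (int i = 0; i < n; i++) {
                if (events[i].data.fd == listen_fd) {
                    /* accept() the new connection and epoll_ctl(ADD) it here */
                } else {
                    /* read()/write() on the ready socket here */
                }
            }
        }
    }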


From these calls you can already see epoll's advantage over select/poll: with the latter, every call must pass the entire list of sockets you want to monitor to the select/poll system call, which means the user-space socket list is copied into the kernel on every call. With tens of thousands of handles, several hundred KB of memory get copied into kernel space each time, which is very inefficient.

Calling epoll_wait is the equivalent of calling select/poll, except that we no longer have to pass the socket handles to the kernel, because the kernel already received the list of handles to monitor through epoll_ctl.


In other words, once you call epoll_create, the kernel is ready to store the handles you want to monitor in kernel space. Each subsequent call to epoll_ctl simply plugs a new socket handle into that kernel data structure.


In the kernel, everything is a file, so epoll registers a file system with the kernel and uses it to store the monitored sockets described above.

When you call epoll_create, a file node is created in this virtual epoll file system. Of course, this file is not an ordinary file; it exists only to serve epoll.
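You can observe this yourself: the descriptor returned by epoll_create is a real file descriptor, but on recent kernels /proc reports it as being backed by the special eventpoll (anonymous-inode) file rather than by an ordinary file. A small sketch, assuming a Linux system with procfs mounted:

    #include <sys/epoll.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int epfd = epoll_create(1);

        /* Resolve what kind of "file" the epoll descriptor points at. */
        char path[64], target[256];
        snprintf(path, sizeof(path), "/proc/self/fd/%d", epfd);
        ssize_t len = readlink(path, target, sizeof(target) - 1);
        if (len > 0) {
            target[len] = '\0';
            /* Typically prints something like: anon_inode:[eventpoll] */
            printf("fd %d -> %s\n", epfd, target);
        }

        close(epfd);
        return 0;
    }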


epoll is initialized when the kernel boots. At the same time, epoll sets up its own fast kernel cache area to hold each socket we want to monitor; these sockets are stored in the kernel cache as a red-black tree, which supports fast lookup, insertion, and deletion.

This fast kernel cache area is built by allocating contiguous physical memory pages and then building the slab layer on top of them. Simply put, memory objects of the size you need are pre-allocated physically, and each time you need one you take it from the pool of spare, already-allocated objects.

    static int __init eventpoll_init(void)
    {
        ... ...
        /* Allocates slab cache used to allocate "struct epitem" items */
        epi_cache = kmem_cache_create("eventpoll_epi", sizeof(struct epitem),
                0, SLAB_HWCACHE_ALIGN | EPI_SLAB_DEBUG | SLAB_PANIC,
                NULL, NULL);

        /* Allocates slab cache used to allocate "struct eppoll_entry" */
        pwq_cache = kmem_cache_create("eventpoll_pwq",
                sizeof(struct eppoll_entry), 0,
                EPI_SLAB_DEBUG | SLAB_PANIC, NULL, NULL);
        ... ...
    }


The efficiency of epoll lies in the fact that even after we have stuffed a million handles in via epoll_ctl, epoll_wait can still return quickly and deliver the ready events to the user process efficiently.

This is because when we call epoll_create, the kernel does more than build a file node in the epoll file system: it also builds a red-black tree in the kernel cache to store the sockets that later arrive through epoll_ctl, and it sets up a ready list (a linked list) to store events that are ready. When epoll_wait is called, it only has to look at this ready list: if there is data on it, it returns; if not, it sleeps, and once the timeout expires it returns even if the list is still empty.

So, epoll_wait is very efficient.


Moreover, even when we monitor millions of handles, at any given moment usually only a very small number of them are ready, so epoll_wait only needs to copy that small number of handles from kernel space to user space. How could it not be efficient?!


So how is this ready list maintained? When we call epoll_ctl, besides placing the socket on the red-black tree of the corresponding file object in the epoll file system, a callback function is registered with the kernel interrupt handler, which tells the kernel: when an interrupt arrives for this handle, put it on the ready list. So as soon as a socket has data, that is, once the data from the NIC has been copied into the kernel, the kernel inserts that socket into the ready list.


In short: one red-black tree, one ready list, and a small amount of kernel cache are what let us handle the large-concurrency socket processing problem. When you call epoll_create, the red-black tree and the ready list are created. When you call epoll_ctl to add a socket handle, the kernel checks whether it already exists in the red-black tree and returns immediately if it does; otherwise it adds the handle to the tree and registers the callback function with the kernel, which on an interrupt event inserts the handle into the ready list. When you call epoll_wait, it simply returns the data already sitting on the ready list.
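To picture this, the kernel-side layout looks roughly like the sketch below. The names follow the kernel's fs/eventpoll.c, but the structures are heavily simplified for illustration and are not the real definitions:

    /* Simplified sketch of the structures in fs/eventpoll.c (kernel types,
     * heavily trimmed; not the real kernel code). */
    struct eventpoll {
        struct rb_root      rbr;      /* red-black tree of all monitored fds
                                         (filled by epoll_ctl ADD) */
        struct list_head    rdllist;  /* ready list: epitems whose fds have
                                         pending events, filled by the callback */
        wait_queue_head_t   wq;       /* processes sleeping in epoll_wait */
    };

    struct epitem {
        struct rb_node      rbn;      /* node in eventpoll.rbr */
        struct list_head    rdllink;  /* node in eventpoll.rdllist when ready */
        int                 fd;       /* the monitored handle (simplified) */
        struct epoll_event  event;    /* events the user asked for */
    };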


Finally, let's look at epoll's two distinctive modes, LT and ET. Both modes work on top of the process described above.

The difference is that in LT mode, as long as the events on a handle have not been fully processed, the handle is returned again the next time epoll_wait is called; in ET mode it is returned only the first time.


How does this happen? When an event occurs on a socket handle, the kernel inserts the handle into the ready list described above. Later we call epoll_wait, which copies the ready sockets into user-space memory and then clears the ready list. Finally, epoll_wait does one more thing: it checks those sockets, and if a handle is not in ET mode (i.e., it is an LT-mode handle) and still has unprocessed events, the handle is put back onto the ready list that was just emptied. So a non-ET handle is returned by every epoll_wait call as long as it has events on it. An ET-mode handle, on the other hand, will not be returned from epoll_wait again unless a new interrupt arrives, even if its events have not been fully processed.
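In practice this means that with ET mode you must drain a ready socket until read returns EAGAIN; otherwise the remaining data stays in the socket buffer and, with no new interrupt, the handle will not reappear on the ready list. A sketch of this pattern, assuming epfd and conn_fd are set up as in the earlier example:

    #include <sys/epoll.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Register conn_fd for edge-triggered readiness; the socket should be
     * non-blocking so the drain loop below can stop on EAGAIN. */
    static void add_et(int epfd, int conn_fd)
    {
        fcntl(conn_fd, F_SETFL, fcntl(conn_fd, F_GETFL, 0) | O_NONBLOCK);

        struct epoll_event ev = {0};
        ev.events = EPOLLIN | EPOLLET;
        ev.data.fd = conn_fd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, conn_fd, &ev);
    }

    /* After epoll_wait reports conn_fd readable in ET mode, drain it fully. */
    static void drain(int conn_fd)
    {
        char buf[4096];
        for (;;) {
            ssize_t n = read(conn_fd, buf, sizeof(buf));
            if (n > 0) {
                /* process n bytes of data here */
            } else if (n == 0) {
                close(conn_fd);          /* peer closed the connection */
                break;
            } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
                break;                   /* buffer emptied; wait for the next edge */
            } else {
                break;                   /* real error; handle/close as needed */
            }
        }
    }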

