How to efficiently handle millions of connections under Linux Epoll

Source: Internet
Author: User
Tags: epoll

When it comes to high-performance network programming, Windows developers will mention IOCP and Linux developers will mention epoll. Epoll is an I/O multiplexing technology that can efficiently handle millions of socket handles, and it is far more efficient than the older select and poll.

We have all used epoll and found it impressively fast. But why exactly can it handle so many concurrent connections so quickly?


Let's first recall the three epoll system calls as exposed by the C library.

    int epoll_create(int size);
    int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
    int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

Their usage is straightforward. First, call epoll_create to create an epoll object. The size parameter tells the kernel the maximum number of handles it should guarantee to handle correctly; beyond that number, the kernel makes no guarantees about behavior.


epoll_ctl operates on the epoll object created above: for example, it can add a newly created socket to the epoll object for monitoring, or remove a socket handle that epoll is currently monitoring so that it is no longer watched.

epoll_wait returns to the calling user-space process when an event occurs on any of the monitored handles, or when the given timeout expires.
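To make the flow concrete, here is a minimal sketch of how the three calls fit together in an event loop. Error handling is omitted, and listen_fd stands for a listening socket assumed to be created and bound elsewhere:

    #include <sys/epoll.h>
    #include <unistd.h>

    #define MAX_EVENTS 64

    /* A minimal sketch: listen_fd is assumed to be an already bound,
     * listening socket created elsewhere. */
    void event_loop(int listen_fd)
    {
        int epfd = epoll_create(256);          /* size is only a hint on modern kernels */

        struct epoll_event ev = {0};
        ev.events = EPOLLIN;
        ev.data.fd = listen_fd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);  /* start monitoring listen_fd */

        struct epoll_event events[MAX_EVENTS];
        for (;;) {
            /* Block up to 1000 ms waiting for events on any monitored handle */
            int n = epoll_wait(epfd, events, MAX_EVENTS, 1000);
            for (int i = 0; i < n; i++) {
                if (events[i].data.fd == listen_fd) {
                    /* accept() the new connection and epoll_ctl(ADD) it here */
                } else {
                    /* read()/write() on the ready socket here */
                }
            }
        }
    }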


From these calls you can already see epoll's advantage over select/poll: with the latter, every call must pass the entire list of sockets you want to monitor to the select/poll system call, which means the user-space socket list is copied into the kernel on every call. With tens of thousands of handles, several hundred KB of memory get copied into kernel space each time, which is very inefficient.

Calling epoll_wait is the equivalent of calling select/poll, except that we no longer have to pass the socket handles to the kernel, because the kernel already received the list of handles to monitor through epoll_ctl.


In other words, once you call epoll_create, the kernel is ready to store the handles you want to monitor in kernel space. Each subsequent call to epoll_ctl simply plugs a new socket handle into that kernel data structure.


In the kernel, everything is a file, so epoll registers a file system with the kernel and uses it to store the monitored sockets described above.

When you call epoll_create, a file node is created in this virtual epoll file system. Of course, this file is not an ordinary file; it exists only to serve epoll.
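You can observe this yourself: the descriptor returned by epoll_create is a real file descriptor, but on recent kernels /proc reports it as being backed by the special eventpoll (anonymous-inode) file rather than by an ordinary file. A small sketch, assuming a Linux system with procfs mounted:

    #include <sys/epoll.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int epfd = epoll_create(1);

        /* Resolve what kind of "file" the epoll descriptor points at. */
        char path[64], target[256];
        snprintf(path, sizeof(path), "/proc/self/fd/%d", epfd);
        ssize_t len = readlink(path, target, sizeof(target) - 1);
        if (len > 0) {
            target[len] = '\0';
            /* Typically prints something like: anon_inode:[eventpoll] */
            printf("fd %d -> %s\n", epfd, target);
        }

        close(epfd);
        return 0;
    }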


epoll is initialized when the kernel boots. At the same time, epoll sets up its own fast kernel cache area to hold each socket we want to monitor; these sockets are stored in the kernel cache as a red-black tree, which supports fast lookup, insertion, and deletion.

This fast kernel cache area is built by allocating contiguous physical memory pages and then building the slab layer on top of them. Simply put, memory objects of the size you need are pre-allocated physically, and each time you need one you take it from the pool of spare, already-allocated objects.

    static int __init eventpoll_init(void)
    {
        ... ...
        /* Allocates slab cache used to allocate "struct epitem" items */
        epi_cache = kmem_cache_create("eventpoll_epi", sizeof(struct epitem),
                0, SLAB_HWCACHE_ALIGN | EPI_SLAB_DEBUG | SLAB_PANIC,
                NULL, NULL);

        /* Allocates slab cache used to allocate "struct eppoll_entry" */
        pwq_cache = kmem_cache_create("eventpoll_pwq",
                sizeof(struct eppoll_entry), 0,
                EPI_SLAB_DEBUG | SLAB_PANIC, NULL, NULL);
        ... ...
    }


The efficiency of epoll lies in the fact that even after we have stuffed a million handles in via epoll_ctl, epoll_wait can still return quickly and deliver the ready events to the user process efficiently.

This is because when we call epoll_create, the kernel does more than build a file node in the epoll file system: it also builds a red-black tree in the kernel cache to store the sockets that later arrive through epoll_ctl, and it sets up a ready list (a linked list) to store events that are ready. When epoll_wait is called, it only has to look at this ready list: if there is data on it, it returns; if not, it sleeps, and once the timeout expires it returns even if the list is still empty.

So, epoll_wait is very efficient.


Moreover, even when we monitor millions of handles, at any given moment usually only a very small number of them are ready, so epoll_wait only needs to copy that small number of handles from kernel space to user space. How could it not be efficient?!


So how is this ready list maintained? When we call epoll_ctl, besides placing the socket on the red-black tree of the corresponding file object in the epoll file system, a callback function is registered with the kernel interrupt handler, which tells the kernel: when an interrupt arrives for this handle, put it on the ready list. So as soon as a socket has data, that is, once the data from the NIC has been copied into the kernel, the kernel inserts that socket into the ready list.


In short: one red-black tree, one ready list, and a small amount of kernel cache are what let us handle the large-concurrency socket processing problem. When you call epoll_create, the red-black tree and the ready list are created. When you call epoll_ctl to add a socket handle, the kernel checks whether it already exists in the red-black tree and returns immediately if it does; otherwise it adds the handle to the tree and registers the callback function with the kernel, which on an interrupt event inserts the handle into the ready list. When you call epoll_wait, it simply returns the data already sitting on the ready list.
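To picture this, the kernel-side layout looks roughly like the sketch below. The names follow the kernel's fs/eventpoll.c, but the structures are heavily simplified for illustration and are not the real definitions:

    /* Simplified sketch of the structures in fs/eventpoll.c (kernel types,
     * heavily trimmed; not the real kernel code). */
    struct eventpoll {
        struct rb_root      rbr;      /* red-black tree of all monitored fds
                                         (filled by epoll_ctl ADD) */
        struct list_head    rdllist;  /* ready list: epitems whose fds have
                                         pending events, filled by the callback */
        wait_queue_head_t   wq;       /* processes sleeping in epoll_wait */
    };

    struct epitem {
        struct rb_node      rbn;      /* node in eventpoll.rbr */
        struct list_head    rdllink;  /* node in eventpoll.rdllist when ready */
        int                 fd;       /* the monitored handle (simplified) */
        struct epoll_event  event;    /* events the user asked for */
    };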


Finally, let's look at epoll's two distinctive modes, LT and ET. Both modes work on top of the process described above.

The difference is that in LT mode, as long as the events on a handle have not been fully processed, the handle is returned again the next time epoll_wait is called; in ET mode it is returned only the first time.


How does this happen? When an event occurs on a socket handle, the kernel inserts the handle into the ready list described above. Later we call epoll_wait, which copies the ready sockets into user-space memory and then clears the ready list. Finally, epoll_wait does one more thing: it checks those sockets, and if a handle is not in ET mode (i.e., it is an LT-mode handle) and still has unprocessed events, the handle is put back onto the ready list that was just emptied. So a non-ET handle is returned by every epoll_wait call as long as it has events on it. An ET-mode handle, on the other hand, will not be returned from epoll_wait again unless a new interrupt arrives, even if its events have not been fully processed.
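In practice this means that with ET mode you must drain a ready socket until read returns EAGAIN; otherwise the remaining data stays in the socket buffer and, with no new interrupt, the handle will not reappear on the ready list. A sketch of this pattern, assuming epfd and conn_fd are set up as in the earlier example:

    #include <sys/epoll.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Register conn_fd for edge-triggered readiness; the socket should be
     * non-blocking so the drain loop below can stop on EAGAIN. */
    static void add_et(int epfd, int conn_fd)
    {
        fcntl(conn_fd, F_SETFL, fcntl(conn_fd, F_GETFL, 0) | O_NONBLOCK);

        struct epoll_event ev = {0};
        ev.events = EPOLLIN | EPOLLET;
        ev.data.fd = conn_fd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, conn_fd, &ev);
    }

    /* After epoll_wait reports conn_fd readable in ET mode, drain it fully. */
    static void drain(int conn_fd)
    {
        char buf[4096];
        for (;;) {
            ssize_t n = read(conn_fd, buf, sizeof(buf));
            if (n > 0) {
                /* process n bytes of data here */
            } else if (n == 0) {
                close(conn_fd);          /* peer closed the connection */
                break;
            } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
                break;                   /* buffer emptied; wait for the next edge */
            } else {
                break;                   /* real error; handle/close as needed */
            }
        }
    }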

