[Repost] kqueue and epoll mechanisms

Last Update:2018-10-25 Source: Internet

Author: User


First, we introduce blocking and non-blocking:
What is blocking? Suppose you are waiting for a courier delivery but do not know when it will arrive, and you have nothing else to do (or whatever you do next depends on the delivery). You can simply go to sleep, because you know the courier will phone you on arrival (assuming the call is sure to wake you up).

Non-blocking busy polling. Continuing the example above, with polling you would need the courier's mobile number and would call him every minute: "Have you arrived yet?"

Obviously, few people would use the second approach: it is not only mindless, it also wastes phone charges and takes up a lot of the courier's time.
Most programs avoid the second approach as well, because the first is economical and simple. Economical here means that it consumes very little CPU time: a sleeping thread is taken off the system's scheduling queue and, for the time being, is given no share of the valuable CPU time slices.

To understand how blocking is implemented, we first discuss buffers and kernel buffers, and then explain I/O events. Buffers are introduced to reduce the frequent system calls that frequent I/O operations would otherwise cause. When you operate on a stream, most operations are performed on the buffer; this is the view from user space. The kernel needs a buffer of its own as well.
Assume there is a pipe. Process A is the writer of the pipe, and process B is the reader.

Assume that the kernel buffer is empty at the beginning, so B, the reader, is blocked. Then A writes data to the pipe. The kernel buffer changes from empty to non-empty, and the kernel generates an event to wake B up. We call this event "the buffer is not empty".
Suppose that after the "buffer is not empty" event has notified B, B still does not read the data. The kernel promises not to discard data written to the pipe, so the data written by A stays in the kernel buffer. If B continues not to read, the kernel buffer eventually fills up, and an I/O event is generated that tells process A to wait (block). We call this event "the buffer is full".
Suppose B finally starts reading and frees space in the kernel buffer. The kernel then tells A that there is room in the buffer, so A can wake up and continue writing. We call this event "the buffer is not full".
Perhaps the "buffer is not full" event has notified A, but A has no more data to write, while B keeps reading until the kernel buffer is empty. At this point the kernel tells B that it must block. We call this event "the buffer is empty".

These four situations correspond to four I/O events on the kernel buffer: buffer full, buffer empty, buffer not empty, and buffer not full. These four I/O events are the foundation of blocking synchronization.
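The four events can be observed directly on a pipe. The following is a minimal sketch, assuming a POSIX system; the helper names set_nonblocking, fill_until_full, and drain_until_empty are hypothetical and do not come from the original article. With both ends in non-blocking mode, the kernel reports "buffer is full" and "buffer is empty" as EAGAIN instead of putting the caller to sleep.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd) {
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Writer side (process A): write until the kernel reports "buffer is full".
 * Returns the number of bytes the kernel buffer accepted, or -1 on error. */
long fill_until_full(int wfd) {
    char chunk[512] = {0};
    long total = 0;
    for (;;) {
        ssize_t n = write(wfd, chunk, sizeof chunk);
        if (n > 0) { total += n; continue; }
        if (n == -1 && errno == EINTR) continue;
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;               /* "buffer is full": a blocking writer would sleep here */
        return -1;
    }
}

/* Reader side (process B): read until the kernel reports "buffer is empty".
 * Returns the number of bytes read, or -1 on error. */
long drain_until_empty(int rfd) {
    char chunk[512];
    long total = 0;
    for (;;) {
        ssize_t n = read(rfd, chunk, sizeof chunk);
        if (n > 0) { total += n; continue; }
        if (n == 0) return total;       /* writer closed its end */
        if (errno == EINTR) continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;               /* "buffer is empty": a blocking reader would sleep here */
        return -1;
    }
}
```

After drain_until_empty has emptied the pipe, a further write succeeds again, which is the "buffer is not full" transition described above.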

In blocking I/O mode, a thread can handle the I/O events of only one stream. To handle multiple streams at the same time, you need either multiple processes (fork) or multiple threads (pthread_create); unfortunately, neither approach is efficient.
So let us consider non-blocking busy polling. With it, a single thread can handle multiple streams at the same time:

while true {
    for i in stream[] {
        if i has data
            read until unavailable
    }
}
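One pass of this loop might look as follows in C. This is only a sketch under stated assumptions: the streams are file descriptors already switched to non-blocking mode, and the names set_nonblocking and busy_poll_pass are hypothetical. The caller supplies the outer "while true".

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* One pass of the busy-polling loop: ask every stream in turn and read each
 * one until it has nothing more to give. Every fd must be non-blocking.
 * Returns the total number of bytes read in this pass, or -1 on a read error.
 * A return value of 0 means the whole pass was wasted CPU time. */
long busy_poll_pass(const int *fds, int nfds) {
    char buf[4096];
    long total = 0;

    assert(fds != NULL && nfds >= 0);
    for (int i = 0; i < nfds; i++) {
        for (;;) {
            ssize_t n = read(fds[i], buf, sizeof buf);
            if (n > 0) { total += n; continue; }   /* got data, ask again */
            if (n == 0) break;                     /* end of stream */
            if (errno == EINTR) continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;                             /* nothing available right now */
            return -1;
        }
    }
    return total;
}
```

Calling busy_poll_pass in a tight loop keeps one CPU core fully busy even when every pass returns 0.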

We simply keep asking every stream in turn, from the first to the last, and then start again from the beginning. This handles multiple streams, but it is clearly not a good approach: if none of the streams has data, it only wastes CPU. In blocking mode the kernel handles I/O events by blocking and waking the thread, whereas in non-blocking mode the I/O events are handed to something else to handle. To avoid spinning the CPU, we can introduce a proxy (select). This proxy can watch the I/O events of many streams at once. While nothing is happening, it blocks the current thread; when one or more streams have an I/O event, the thread wakes up from the blocked state, and our program then polls all the streams.

while true {
    select(streams[])
    for i in streams[] {
        if i has data
            read until unavailable
    }
}

If no I/O event occurs, our program blocks in select. But there is still a problem: select only tells us that an I/O event has occurred, not which streams it occurred on (there may be one, several, or even all of them). We can only poll every stream indiscriminately, find the ones that can be read or written, and operate on them.
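The scan that follows a select call can be sketched in C as below. The function names select_readable and select_demo are hypothetical, and the sketch watches for readability only.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Blocks in select() until at least one descriptor in fds is readable or
 * timeout_ms elapses, then scans every descriptor to find the ready ones.
 * Stores the ready descriptors in ready[] (capacity nfds) and returns their
 * count, 0 on timeout, or -1 on error. */
int select_readable(const int *fds, int nfds, int *ready, int timeout_ms) {
    fd_set readset;
    struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };
    int maxfd = -1;

    assert(fds != NULL && ready != NULL);
    FD_ZERO(&readset);
    for (int i = 0; i < nfds; i++) {
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], &readset);
        if (fds[i] > maxfd) maxfd = fds[i];
    }

    int n = select(maxfd + 1, &readset, NULL, NULL, &tv);
    if (n <= 0) return n;

    /* select() only reported how many descriptors are ready. Finding out
     * which ones requires testing all nfds entries. */
    int count = 0;
    for (int i = 0; i < nfds; i++)
        if (FD_ISSET(fds[i], &readset)) ready[count++] = fds[i];
    return count;
}

/* Usage sketch: of two pipes, only the second receives data, so select()
 * should report exactly that one. Returns 1 if it did, 0 otherwise. */
int select_demo(void) {
    int a[2], b[2], ready[2];
    if (pipe(a) == -1) return 0;
    if (pipe(b) == -1) { close(a[0]); close(a[1]); return 0; }

    int fds[2] = { a[0], b[0] };
    int ok = write(b[1], "x", 1) == 1
          && select_readable(fds, 2, ready, 1000) == 1
          && ready[0] == b[0];

    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
    return ok;
}
```

Note that the final loop runs over all nfds descriptors regardless of how many are actually ready.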
So with select, this indiscriminate polling has O(n) complexity: the more streams we handle at once, the longer each round of polling takes. select and poll obtain the ready state by scanning. After you call select or poll, it blocks until a file descriptor is ready, the timeout expires, or the call is interrupted. The return value is the number of ready file descriptors; to find out which ones they are, you must traverse the bit set (select) or array (poll) of file descriptors that you passed in. This is why epoll was introduced:
epoll can be understood as "event poll". Unlike busy polling and indiscriminate polling, epoll tells us which I/O event happened on which stream, so every operation we perform on those streams is meaningful. The complexity drops to O(k), where k is the number of streams with I/O events (some describe this as O(1)). Instead of scanning, epoll learns about ready descriptors through notification inside the kernel. You call epoll_create to create an instance, epoll_ctl to add or remove monitored file descriptors, and epoll_wait to block until a file descriptor is ready; the ready descriptors and their events are returned through the epoll_event argument.

epoll_create creates an epoll object. Typically: epollfd = epoll_create(size), where size must be greater than zero
epoll_ctl (with the EPOLL_CTL_ADD and EPOLL_CTL_DEL operations) adds an event of a stream to the epoll object or removes it
For example (simplified notation; the real call passes the event mask inside a struct epoll_event):
epoll_ctl(epollfd, EPOLL_CTL_ADD, socket, EPOLLIN); // epoll_wait returns when there is data in the buffer
epoll_ctl(epollfd, EPOLL_CTL_DEL, socket, EPOLLOUT); // stops monitoring the socket; to be woken when the buffer can be written, register EPOLLOUT with EPOLL_CTL_ADD
epoll_wait(epollfd, ...) waits until a registered event occurs.
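With the real Linux signatures, registration and removal could be sketched as follows. The wrapper names watch_fd, unwatch_fd, and epoll_registration_demo are hypothetical; epoll_create1, epoll_ctl, and epoll_wait are the actual system calls, and the example is Linux-only.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Start watching fd for the given events (for example EPOLLIN or EPOLLOUT).
 * Returns 0 on success, -1 on error. */
int watch_fd(int epfd, int fd, uint32_t events) {
    struct epoll_event ev = {0};
    assert(epfd >= 0 && fd >= 0);
    ev.events = events;
    ev.data.fd = fd;    /* handed back by epoll_wait so we know which stream fired */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Stop watching fd. The event argument is not used for EPOLL_CTL_DEL. */
int unwatch_fd(int epfd, int fd) {
    return epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
}

/* Usage sketch: the write end of an empty pipe is writable, so an EPOLLOUT
 * registration fires at once; after removal nothing is reported any more.
 * Returns 1 if both observations hold, 0 otherwise. */
int epoll_registration_demo(void) {
    int p[2];
    struct epoll_event out;
    int epfd = epoll_create1(0);
    if (epfd == -1) return 0;
    if (pipe(p) == -1) { close(epfd); return 0; }

    int ok = watch_fd(epfd, p[1], EPOLLOUT) == 0
          && epoll_wait(epfd, &out, 1, 1000) == 1
          && out.data.fd == p[1]
          && (out.events & EPOLLOUT) != 0
          && unwatch_fd(epfd, p[1]) == 0
          && epoll_wait(epfd, &out, 1, 0) == 0;

    close(p[0]); close(p[1]); close(epfd);
    return ok;
}
```

Storing the descriptor in ev.data.fd is one common choice; the data field is a union, so an application can store a pointer to its own per-connection state there instead.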

(Note: when a non-blocking stream's buffer is full on write, or empty on read, write/read returns -1 and sets errno = EAGAIN. epoll only cares about the "buffer not full" and "buffer not empty" events.)
Code in the epoll style looks like this:

while true {
    active_stream[] = epoll_wait(epollfd)
    for i in active_stream[] {
        read or write till unavailable
    }
}
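A single iteration of this loop, for the read side only, might be written as below. This is a Linux-only sketch; add_reader, epoll_pass, and MAX_EVENTS are hypothetical names, and a real server would also deregister and close a stream once read returns 0.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 16

/* Switch fd to non-blocking mode and register it for readability.
 * Returns 0 on success, -1 on error. */
int add_reader(int epfd, int fd) {
    struct epoll_event ev = {0};
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) return -1;
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* One iteration of the epoll loop: sleep until some watched stream is
 * readable (or timeout_ms passes), then read only the streams that fired.
 * Returns the total bytes read, 0 on timeout, or -1 on error. */
long epoll_pass(int epfd, int timeout_ms) {
    struct epoll_event active[MAX_EVENTS];
    char buf[4096];
    long total = 0;

    assert(epfd >= 0);
    int k = epoll_wait(epfd, active, MAX_EVENTS, timeout_ms);
    if (k <= 0) return k;

    for (int i = 0; i < k; i++) {           /* k ready streams, not all n */
        int fd = active[i].data.fd;
        for (;;) {
            ssize_t n = read(fd, buf, sizeof buf);
            if (n > 0) { total += n; continue; }
            if (n == 0) break;              /* peer closed the stream */
            if (errno == EINTR) continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;                      /* buffer empty: move to the next stream */
            return -1;
        }
    }
    return total;
}
```

Unlike a busy-polling pass, a call to epoll_pass with a positive timeout sleeps in the kernel while no stream is ready, and the inner loop only touches descriptors that epoll_wait returned.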

The principle behind epoll is:
You hand the files whose reads and writes you want monitored to the kernel (epoll_ctl with EPOLL_CTL_ADD)
You set the events you care about (epoll_ctl), such as read events
Then you wait (epoll_wait). If none of the files has an event you care about, the thread sleeps until an event wakes it.
epoll_wait then returns the events. To achieve concurrency, this must be combined with non-blocking reads and writes: you take turns reading and writing each ready file (socket), so that no single slow file blocks the rest.

The advantage of epoll is that the OS, which receives the data, is responsible for telling you when data is ready to be operated on, because the OS knows when data has arrived.
In terms of the analogy, you can get on with other things; when a courier arrives, he calls you to come and collect the parcel, and until then you are free.
Unlike blocking, you do not need to stand by the window watching for the delivery, and unlike select, you do not need to keep phoning to ask whether it has arrived. The difference is especially clear when there are many deliveries: with select you must check every parcel one by one to see whether it is yours and whether it has arrived, whereas epoll tells you exactly which parcel has arrived.

kqueue is very similar to epoll. It is a high-performance event notification interface originally developed by Jonathan Lemon on FreeBSD in 2000. After a batch of socket descriptors has been registered with kqueue, whenever their status changes kqueue notifies the application, in a single call, of which descriptors are readable, writable, or in error.

The kqueue interface consists of the kqueue() and kevent() system calls and the struct kevent structure:

kqueue() creates a kernel event queue and returns a file descriptor for it. The other API calls use this descriptor to operate on the kqueue.
kevent() registers events with the kernel or deregisters them, and returns ready events or error events.
struct kevent is the basic event structure that kevent() operates on.

struct kevent {
    uintptr_t ident;   /* event identifier */
    short     filter;  /* event filter */
    u_short   flags;   /* action flags */
    u_int     fflags;  /* filter-specific flags */
    intptr_t  data;    /* filter-specific data */
    void     *udata;   /* opaque application data, passed through unchanged */
};

In a kqueue, the pair {ident, filter} uniquely identifies an event:

ident
The event identifier, which is usually a file descriptor.

filter
You can think of a kqueue filter as an event type. The kernel monitors the state of the filter registered on ident and notifies the application when that state changes. kqueue defines many filters:

Filters related to socket reads and writes:

EVFILT_READ: For a TCP listening socket, this event is reported when the completed-connection queue is non-empty (that is, the final ACK of the three-way handshake has been received). An application receiving the notification normally calls accept(), and data gives the number of connections waiting in the completed queue. For a stream or datagram socket, the event is reported when the socket layer of the protocol stack has data in its receive buffer, and data is set to the number of readable bytes.
EVFILT_WRITE: Reported when the socket layer's write buffer can be written to. data gives the number of bytes of free space currently in the buffer.

Flags:

EV_ADD: adds the event to the kqueue.
EV_DELETE: removes the event from the kqueue.
EV_ENABLE: the filter event is enabled. Events are enabled by default when registered.
EV_DISABLE: the filter event is disabled. The application is not notified when the descriptor becomes readable or writable.

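Registering an event with kqueue

#include <stdbool.h>
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

bool Register(int kq, int fd) {
    struct kevent changes[1];
    EV_SET(&changes[0], fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
    /* eventlist = NULL, nevents = 0: register only, do not wait for events */
    int ret = kevent(kq, changes, 1, NULL, 0, NULL);
    return ret != -1;
}

Register adds fd to kq. Registration is done by calling kevent() with eventlist set to NULL and nevents set to 0; the function reports failure if kevent() returns -1.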

People generally set socket I/O to non-blocking mode to improve read/write performance and to avoid an I/O call blocking unexpectedly. For various purposes, some also use getsockopt to peek at the amount of data in the socket's read buffer or the free space in its write buffer. When kevent returns, it already tells the application the number of readable bytes or the amount of writable space in the buffer. Because of this, kqueue applications generally do not need non-blocking I/O: on each read, they read everything in the receive buffer in one go, based on the readable byte count returned by kevent, and on each send, they write no more than the writable space that kevent reported for the write buffer.
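Reading by the size that kevent reports could be sketched as follows. This is an illustration under stated assumptions: kqueue exists only on BSD-derived systems (including macOS), so the code is wrapped in a platform guard and compiles to nothing elsewhere, and the function name kqueue_read_available is hypothetical.

```c
#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Waits up to timeout_ms for fd to become readable, then reads no more than
 * the byte count kevent() reported in ev.data, so the read itself does not
 * have to wait even on a blocking descriptor.
 * Returns the bytes read, 0 on timeout, or -1 on error. */
ssize_t kqueue_read_available(int kq, int fd, char *buf, size_t cap, int timeout_ms) {
    struct kevent change, ev;
    struct timespec ts = { timeout_ms / 1000, (timeout_ms % 1000) * 1000000L };

    assert(buf != NULL && cap > 0);
    /* EV_ONESHOT: report the event once, then remove it from the kqueue. */
    EV_SET(&change, fd, EVFILT_READ, EV_ADD | EV_ONESHOT, 0, 0, NULL);

    /* Register and wait in a single call. */
    int n = kevent(kq, &change, 1, &ev, 1, &ts);
    if (n <= 0) return n;                   /* 0: timeout, -1: error */
    if (ev.flags & EV_ERROR) return -1;     /* registration failed */
    if (ev.data <= 0) return 0;             /* nothing to read, e.g. EOF */

    size_t want = (size_t)ev.data < cap ? (size_t)ev.data : cap;
    return read(fd, buf, want);
}
#endif
```

The same idea applies to EVFILT_WRITE, where ev.data gives the free space in the write buffer and bounds how much to send in one call.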

 


The answer to epoll and select: https://www.zhihu.com/question/20122137

(Original address: https://www.cnblogs.com/FG123/p/5256553.html)

Supplement: other good articles:
1. What is event multiplexing?
https://www.cnblogs.com/moonz-wu/p/4740908.html
http://people.eecs.berkeley.edu/~sangjin/2012/12/21/epoll-vs-kqueue.html

