The Five Kinds of IO Models in Linux, Introduced in Detail


Concept Description

User space and kernel space

Modern operating systems use virtual memory. For a 32-bit operating system the addressable space (virtual address space) is 4 GB (2^32). The core of the operating system is the kernel, which is independent of ordinary applications; it can access the protected memory space and has full permission to access the underlying hardware devices. To ensure that user processes cannot operate on the kernel directly and to keep the kernel safe, the operating system divides the virtual address space into two parts: kernel space and user space. For Linux, the highest 1 GB (virtual addresses 0xC0000000 to 0xFFFFFFFF) is reserved for the kernel and is called kernel space, while the lower 3 GB (virtual addresses 0x00000000 to 0xBFFFFFFF) is used by each process and is called user space.

Process switching

To control the execution of processes, the kernel must be able to suspend a process running on the CPU and resume the execution of a previously suspended process. This behavior is referred to as process switching. It can therefore be said that any process runs with the support of the operating system kernel and is closely tied to the kernel.

Switching from one process to another involves the following steps:

    1. Save the processor context, including the program counter and other registers.
    2. Update the PCB information.
    3. Move the PCB of the process into the appropriate queue, such as the ready queue or the queue for processes blocked on an event.
    4. Select another process to execute and update its PCB.
    5. Update the memory management data structures.
    6. Restore the processor context.

Process blocking

When an executing process is waiting for some expected event that has not yet occurred, such as a failed request for a system resource, the completion of some operation, the arrival of new data, or new work to do, the system automatically executes the blocking primitive (block), which moves the process from the running state into the blocked state. Blocking is therefore an active behavior of the process itself, and only a running process (one that holds the CPU) can enter the blocked state. While a process is blocked, it does not consume CPU resources.

File descriptor

A file descriptor is a term from computer science; it is an abstract concept that describes a reference to a file.

Formally, a file descriptor is a non-negative integer. In practice it is an index into a per-process table, maintained by the kernel, that records the files the process has opened. When a program opens an existing file or creates a new file, the kernel returns a file descriptor to the process. In programming, low-level code is often written around file descriptors. The concept of the file descriptor, however, usually applies only to operating systems such as UNIX and Linux.
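
As a minimal sketch of this idea (the path /tmp/example.txt is only an illustration): a program obtains a file descriptor from open(), uses it to read(), and releases it with close().

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* open() returns a small non-negative integer: the file descriptor. */
        int fd = open("/tmp/example.txt", O_RDONLY);
        if (fd == -1) {
            perror("open");
            return 1;
        }

        char buf[128];
        /* read() identifies the open file by its descriptor. */
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n >= 0)
            printf("read %zd bytes via descriptor %d\n", n, fd);

        close(fd);  /* release the descriptor back to the kernel */
        return 0;
    }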

Cache IO

Cached IO is also known as standard IO; most file systems' default IO operations are cached IO. In the Linux cached IO mechanism, the operating system caches IO data in the file system's page cache: data is first copied into the operating system's kernel buffer and then copied from the kernel buffer into the application's address space.

Disadvantages of cached IO:

During the transfer, data must be copied between the application address space and the kernel, and the CPU and memory overhead of these copy operations can be very large.

Synchronous and asynchronous & blocking and Non-blocking

When doing network programming, we often encounter four kinds of invocation styles: synchronous (sync), asynchronous (async), blocking, and non-blocking. Let us first clarify these concepts.

1. Synchronous and asynchronous

Synchronous and asynchronous are concerned with the message communication mechanism (synchronous communication vs. asynchronous communication). A synchronous call does not return until the result has been obtained; once the call returns, the return value is available. In other words, the caller actively waits for the result of the call.

An asynchronous call, by contrast, returns immediately after it is issued, so no result comes back with the return. In other words, when an asynchronous call is issued, the caller does not get the result right away; instead, the callee notifies the caller through state, notification, or a callback function after the call has been processed.

A typical asynchronous programming model is Node.js.

2016.4.17 Update:

POSIX defines the two terms as follows:

Synchronous I/O operation: causes the requesting process to be blocked until the I/O operation completes.

Asynchronous I/O operation: does not cause the requesting process to be blocked.

2. Blocking and Non-blocking

Blocking and non-blocking are concerned with the state of the program while it waits for the result of a call (a message or return value).

A blocking call means the current thread is suspended until the result of the call comes back; the calling thread resumes only after the result is obtained. A non-blocking call means the call returns immediately even if the result is not yet available, so the current thread is not blocked.

A more figurative metaphor for blocking/non-blocking & synchronous/asynchronous

Lao Zhang loves tea. No nonsense: he wants to boil water. Cast: Lao Zhang and two kettles (an ordinary kettle, and a whistling kettle that makes a sound when the water boils).

1. Lao Zhang puts the ordinary kettle on the fire and stands there waiting for the water to boil. (Synchronous blocking.) Lao Zhang thinks this is a bit silly.

2. Lao Zhang puts the ordinary kettle on the fire, goes to the living room to watch TV, and from time to time goes back to the kitchen to see whether the water has boiled. (Synchronous non-blocking.) Lao Zhang still feels a bit silly, so he upgrades and buys the kettle that whistles. When the water boils, it makes a loud noise.

3. Lao Zhang puts the whistling kettle on the fire and stands there waiting for the water to boil. (Asynchronous blocking.) Lao Zhang doesn't think standing there like this makes much sense.

4. Lao Zhang puts the whistling kettle on the fire, goes to the living room to watch TV, and does not check the kettle before it whistles; when it whistles, he goes to get the water. (Asynchronous non-blocking.) Lao Zhang thinks he is smart.

The so-called synchronous/asynchronous distinction is about the kettle. The ordinary kettle is synchronous; the whistling kettle is asynchronous. Both can do the job, but the whistling kettle can, on its own, tell Lao Zhang that the water has boiled; the ordinary kettle cannot. With a synchronous kettle, the caller can only poll by himself (as in case 2), which makes Lao Zhang inefficient.

The so-called blocking/non-blocking distinction is about Lao Zhang. If he stands there waiting, he is blocked; if he goes off to watch TV, he is not blocked. In cases 1 and 3 Lao Zhang is blocked; if his wife calls him, he won't even notice. Although case 3 uses the whistling (asynchronous) kettle, it is of little use to a Lao Zhang who just stands there waiting. Therefore, asynchronous is generally used together with non-blocking, so that the asynchrony actually pays off.

Five kinds of IO models under Linux

    1. Blocking IO
    2. Non-blocking IO (nonblocking IO)
    3. IO multiplexing (select and poll)
    4. Signal-driven IO (SIGIO)
    5. Asynchronous IO (the POSIX aio_ functions)

The first four are synchronous; only the last one is asynchronous IO.

Blocking IO Model

In this model, to perform a read operation the application invokes the corresponding system call, handing control to the kernel and then waiting (it is in fact blocked). The kernel executes the system call and returns the response to the application when it finishes; the application receives the response, stops blocking, and carries on with the work that follows.
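
A minimal sketch of this model, assuming a connected socket descriptor sockfd (hypothetical): recv() simply blocks until the kernel has data to hand back.

    #include <stdio.h>
    #include <sys/socket.h>

    /* Blocking IO: the calling thread sleeps inside recv() until data
     * arrives (or the peer closes the connection / an error occurs). */
    ssize_t blocking_read(int sockfd, char *buf, size_t len) {
        ssize_t n = recv(sockfd, buf, len, 0);   /* blocks here */
        if (n < 0)
            perror("recv");
        return n;                                /* only runs after the kernel returns */
    }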

Non-blocking IO

Under Linux, an application can set the O_NONBLOCK flag on a file descriptor so that IO calls return immediately, but this does not guarantee that the IO operation succeeds. That is, when the application has set O_NONBLOCK and invokes a write operation, the corresponding system call returns from the kernel immediately; at the moment it returns, however, the data may not actually have been written to the specified location. In other words, the kernel merely returns from the system call right away (so the application is not blocked by this IO operation), but the work the system call describes (writing the data) may not be finished. As for the application, although the IO call returned quickly, it does not know whether the IO operation really succeeded. To find out, there are generally two strategies: first, the application actively polls the kernel in a loop (this approach is synchronous non-blocking IO); second, it uses an IO notification mechanism, such as IO multiplexing (which this article classes as asynchronous blocking IO) or signal-driven IO (which it classes as asynchronous non-blocking IO).
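
A minimal sketch of the first strategy (synchronous non-blocking IO), assuming an already-open descriptor fd (hypothetical): the descriptor is switched to non-blocking mode with fcntl(), and read() is retried whenever the kernel reports EAGAIN/EWOULDBLOCK.

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Put the descriptor into non-blocking mode. */
    int set_nonblocking(int fd) {
        int flags = fcntl(fd, F_GETFL, 0);
        if (flags == -1)
            return -1;
        return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
    }

    /* Synchronous non-blocking IO: poll the kernel in a loop. */
    ssize_t nonblocking_read(int fd, char *buf, size_t len) {
        for (;;) {
            ssize_t n = read(fd, buf, len);
            if (n >= 0)
                return n;                        /* data (or EOF) arrived */
            if (errno != EAGAIN && errno != EWOULDBLOCK) {
                perror("read");
                return -1;                       /* real error */
            }
            usleep(10000);   /* nothing ready yet; back off briefly and ask again */
        }
    }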

IO multiplexing (asynchronous blocking IO)

As before, to perform a read operation the application issues a system call, which passes control to the kernel. On the application side, however, the read call itself returns immediately rather than waiting for the kernel's result; instead, the application calls a function that takes multiple file descriptors, such as select(), poll(), or epoll_wait(), and blocks there until the system call has a result and notifies the application. In other words, in this model the IO functions themselves are non-blocking, and the blocking select, poll, and epoll system calls are used to determine when one or more IO descriptors can be operated on. So, judging by the actual effect of the IO operation, this asynchronous blocking IO is the same as the first model, synchronous blocking IO: the application still waits until the IO operation succeeds (the data has been written or read) before starting the work that follows. The difference is that asynchronous blocking IO uses a function such as select to provide notification for many descriptors at once, which increases concurrency. For example, if there are 10,000 concurrent read requests and no data has yet arrived on the network, all 10,000 reads would block at the same time; with a function such as select, poll, or epoll, one function is made responsible for blocking and monitoring the state of those 10,000 requests and for issuing notifications once data arrives, so the 10,000 individual waits are handed over to one dedicated function to manage. The difference from the second model, non-blocking IO, is that synchronous non-blocking IO requires the application to poll the kernel in a loop to ask whether there is data it can operate on, whereas asynchronous blocking IO uses IO multiplexing functions such as select and poll to monitor many event descriptors at once and tell the application whether data is available.

Signal-driven IO (SIGIO)

The application submits a read request via a system call, and the kernel then begins processing the corresponding IO operation. The application does not wait for the kernel's response but goes on to perform other work (it is not blocked by the IO operation). When the kernel finishes and returns the response to the read, a signal is delivered or a thread-based callback function is executed to complete the IO processing.
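
A minimal sketch of arming SIGIO on a socket, assuming an already-open descriptor sockfd (hypothetical): the process asks the kernel to send it SIGIO when the descriptor becomes readable, and the signal handler then performs the read.

    #include <fcntl.h>
    #include <signal.h>
    #include <unistd.h>

    static int g_fd = -1;

    /* Invoked asynchronously when the kernel signals readiness. */
    static void sigio_handler(int signo) {
        char buf[512];
        (void)signo;
        ssize_t n = read(g_fd, buf, sizeof(buf));    /* data should be ready now */
        if (n > 0)
            write(STDOUT_FILENO, buf, (size_t)n);    /* async-signal-safe output */
    }

    int enable_sigio(int sockfd) {
        g_fd = sockfd;
        signal(SIGIO, sigio_handler);                   /* install the handler */
        if (fcntl(sockfd, F_SETOWN, getpid()) == -1)    /* deliver SIGIO to this process */
            return -1;
        int flags = fcntl(sockfd, F_GETFL, 0);
        return fcntl(sockfd, F_SETFL, flags | O_ASYNC); /* turn on signal-driven IO */
    }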

Strictly speaking, blocking IO, IO multiplexing, and signal-driven IO are all synchronous IO models, because in all three the actual IO read or write is performed by the application itself after the IO event has occurred. The asynchronous IO model defined by the POSIX specification is different: with asynchronous IO the user does not perform the read or write directly; the user tells the kernel which buffer to read from or write to and how the kernel should notify the application after the IO operation completes. Asynchronous IO read and write calls always return immediately, regardless of whether the IO would block, because the actual read and write work has been taken over by the kernel. In other words, the synchronous IO models require the user code to perform the IO operation itself (copy data from the kernel buffer into the user buffer, or from the user buffer into the kernel buffer), whereas in the asynchronous IO model the kernel performs the IO operation (the movement of data between kernel buffers and user buffers is done by the kernel in the background). One can say that synchronous IO notifies the application of IO readiness events, while asynchronous IO notifies the application of IO completion events. In a Linux environment, the functions declared in the aio.h header provide support for asynchronous IO.

Asynchronous IO (the POSIX aio_ functions)

Asynchronous IO follows the asynchronous concept described above: when an asynchronous call is issued, the caller does not get the result immediately; after the party that actually handles the call has finished, it informs the caller of the result of the input or output operation through state, notification, or a callback. The working mechanism of asynchronous IO is: tell the kernel to start an operation and let the kernel notify us after the entire operation is complete. This model differs from signal-driven IO in that with signal-driven IO the kernel tells us when we can start an IO operation (via a user-defined signal handler), whereas with the asynchronous IO model the kernel tells us when the IO operation has completed. To implement asynchronous IO, a set of APIs beginning with aio_ is defined, such as aio_read.
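
A minimal sketch using the POSIX AIO interface (on older glibc, link with -lrt); the path /tmp/example.txt is only an illustration. The read is carried out in the background, and the application checks for completion with aio_error() and collects the result with aio_return().

    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("/tmp/example.txt", O_RDONLY);
        if (fd == -1) { perror("open"); return 1; }

        char buf[256];
        struct aiocb cb;
        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes = fd;             /* which descriptor to read from */
        cb.aio_buf    = buf;            /* where the data should be placed */
        cb.aio_nbytes = sizeof(buf);    /* how much to read */
        cb.aio_offset = 0;              /* from which file offset */

        if (aio_read(&cb) == -1) { perror("aio_read"); return 1; }

        /* The call returned immediately; do other work, then check completion. */
        while (aio_error(&cb) == EINPROGRESS)
            usleep(1000);               /* pretend to do other useful work here */

        ssize_t n = aio_return(&cb);    /* result of the completed operation */
        printf("asynchronously read %zd bytes\n", n);

        close(fd);
        return 0;
    }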

Summary: The first four models, blocking IO, non-blocking IO, IO multiplexing, and signal-driven IO, all belong to the synchronous category, because in each of them the actual IO operation (the read or write function) blocks the process; only the asynchronous IO model truly makes the IO operation asynchronous.

IO multiplexing

To explain this term, we first need to understand the concept of multiplexing (reuse). The word itself is common, but here it is still somewhat abstract, so consider how multiplexing is used in the field of communications: to make full use of the physical medium of a network link, time-division multiplexing or frequency-division multiplexing is often used so that multiple signals can be carried over the same link. That gives us the basic meaning of multiplexing: one shared "medium" is made to do as much of the same kind of work as possible. So what is the "medium" in IO multiplexing? Consider the server programming model: for each client request the server creates a process to serve it, but the system cannot create processes without limit, so to handle a large number of client connections IO multiplexing is introduced, meaning a single process can serve multiple client requests at the same time. In other words, the "medium" being multiplexed in IO multiplexing is a process (more precisely, what is reused is select and poll, because the process drives them by calling select and poll): one process, through select and poll, serves many IO streams. Although the IO issued by clients is concurrent, the data each IO needs to read or write is usually not yet ready, so a single function (select or poll) can monitor the state of the data each IO requires, and once some IO has data ready to read or write, the process goes and services that IO.

With IO multiplexing understood, let us look at the differences and connections between the three APIs that implement it: select, poll, and epoll. All three are IO multiplexing mechanisms. IO multiplexing means that, through a single mechanism, multiple descriptors can be monitored, and once a descriptor is ready (typically read-ready or write-ready) the application can be notified to perform the corresponding read or write. However, select, poll, and epoll are all essentially synchronous IO, because the application must do the reading and writing itself after the event is ready, which means the read/write step itself can still block; asynchronous IO, by contrast, does not make the application responsible for reading and writing: the asynchronous IO implementation copies the data from the kernel into user space on the application's behalf. The prototypes of the three are as follows:

    1. int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
    2. int poll(struct pollfd *fds, nfds_t nfds, int timeout);
    3. int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

Select

The first parameter of select, nfds, is the largest descriptor value in the fd_set plus 1. The fd_set is a bit array whose size is limited to __FD_SETSIZE (1024); each bit of the array indicates whether the corresponding descriptor should be checked. The second, third, and fourth parameters are the file-descriptor bit arrays for the read, write, and error events of interest. They serve both as input parameters and as output parameters: the kernel may modify them to indicate which descriptors have events of interest, so the fd_sets must be re-initialized before every call to select. The timeout parameter is the timeout; the kernel modifies this structure as well, setting its value to the time remaining of the timeout period.
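
A minimal sketch of the calling pattern, assuming an already-open descriptor sockfd (hypothetical); note that both the fd_set and the timeout are rebuilt before each call, for the reason described above.

    #include <stdio.h>
    #include <sys/select.h>

    /* Wait up to 5 seconds for sockfd to become readable. */
    int wait_readable(int sockfd) {
        fd_set readfds;
        struct timeval tv;

        FD_ZERO(&readfds);          /* must be re-initialized on every call ...  */
        FD_SET(sockfd, &readfds);
        tv.tv_sec  = 5;             /* ... and so must the timeout, since the    */
        tv.tv_usec = 0;             /* kernel overwrites both                    */

        int ready = select(sockfd + 1, &readfds, NULL, NULL, &tv);
        if (ready == -1) {
            perror("select");
            return -1;
        }
        if (ready == 0)
            return 0;               /* timed out, nothing readable */
        return FD_ISSET(sockfd, &readfds) ? 1 : 0;
    }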

Inside the kernel, a call to select proceeds roughly through the following steps:

    1. Copy the fd_set from user space into kernel space using copy_from_user.
    2. Register the callback function __pollwait.
    3. Iterate over all the descriptors and call each one's poll method (for a socket this poll method is sock_poll, which calls tcp_poll, udp_poll, or datagram_poll depending on the situation).
    4. Taking tcp_poll as an example, its core implementation is __pollwait, the callback function registered above.
    5. The main job of __pollwait is to hang current (the current process) on the device's wait queue. Different devices have different wait queues; for tcp_poll the wait queue is sk->sk_sleep (note that hanging a process on a wait queue does not put it to sleep). When the device receives a message (network device) or finishes filling in file data (disk device), it wakes up the processes sleeping on its wait queue, and current is then woken.
    6. The poll method returns a mask describing whether read/write operations are ready, and the fd_set is assigned according to this mask.
    7. If iterating over all the descriptors does not yield a ready mask, schedule_timeout is called to put the process that called select (i.e. current) to sleep. When a device driver finds that its resource has become readable or writable, it wakes up the processes sleeping on its wait queue. If a certain timeout passes (as specified by schedule_timeout) and nobody wakes it, the process that called select is woken anyway, gets the CPU again, and iterates over the descriptors once more to check whether any is ready.
    8. Copy the fd_set from kernel space back to user space.

A summary of the major disadvantages of select:

(1) Every call to select requires copying the fd set from user space into the kernel, which is costly when there are many descriptors.
(2) Every call to select also requires the kernel to traverse all the descriptors passed in, which is likewise costly when there are many descriptors.
(3) The number of file descriptors supported by select is too small; the default is 1024.

Poll

Poll, unlike select, passes an array of struct pollfd to the kernel to convey the events of interest, so there is no limit on the number of descriptors. The events field and the revents field in struct pollfd are used to indicate, respectively, the events of interest and the events that occurred, so the pollfd array only needs to be initialized once.

The implementation mechanism of poll is similar to that of select; it corresponds to sys_poll in the kernel. The difference is that poll passes a pollfd array to the kernel and then checks each descriptor in that array, which is more efficient than working with an fd_set. After poll returns, the revents value of each element in the pollfd array must be checked to see whether its event occurred.
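
A minimal sketch of the calling pattern, again assuming an already-open descriptor sockfd (hypothetical); unlike the fd_set used by select, the pollfd array is set up once, and only revents needs to be examined after each call.

    #include <poll.h>
    #include <stdio.h>

    /* Wait up to 5 seconds for sockfd to become readable. */
    int wait_readable_poll(int sockfd) {
        struct pollfd pfd;
        pfd.fd     = sockfd;
        pfd.events = POLLIN;                /* events of interest: readable */

        int ready = poll(&pfd, 1, 5000);    /* timeout in milliseconds */
        if (ready == -1) {
            perror("poll");
            return -1;
        }
        if (ready == 0)
            return 0;                       /* timed out */
        return (pfd.revents & POLLIN) ? 1 : 0;   /* events that actually occurred */
    }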

Epoll

It was not until Linux 2.6 that the kernel gained direct support for this implementation method: epoll, which is recognized as the best multiplexed IO readiness-notification method on Linux 2.6. Epoll supports both level-triggered and edge-triggered modes (edge-triggered means the kernel tells the process only when a file descriptor has just become ready, and it says so only once; if we take no action, it will not notify us again). In theory edge triggering has higher performance, but the code that uses it is considerably more complex. Epoll also reports only the descriptors that are ready: when we call epoll_wait() to obtain the ready file descriptors, what is returned is not the actual descriptor set but the number of ready descriptors; we then only need to fetch that many file descriptors from an array designated by epoll. Memory mapping (mmap) techniques are used here, which eliminates the overhead of copying these file descriptors during the system call. Another essential improvement is that epoll uses event-based readiness notification. With select/poll, the kernel scans all the monitored file descriptors only after the corresponding system call has been invoked, whereas with epoll a file descriptor is registered once with epoll_ctl(); once that descriptor becomes ready, the kernel uses a callback-like mechanism to activate it quickly, and the process is notified when it calls epoll_wait().

Since epoll is an improvement over select and poll, it should be able to avoid the three drawbacks above. How does epoll solve them? Before answering, let us look at the difference in the calling interfaces of epoll versus select and poll. Select and poll each provide a single function: select or poll. Epoll provides three functions: epoll_create, epoll_ctl, and epoll_wait. epoll_create creates an epoll handle; epoll_ctl registers the type of event to listen for; epoll_wait waits for events to occur.
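
A minimal sketch of the three calls working together, assuming an already-open descriptor sockfd (hypothetical):

    #include <stdio.h>
    #include <sys/epoll.h>
    #include <unistd.h>

    #define MAX_EVENTS 16

    /* Register sockfd with epoll once, then wait for readiness events. */
    int epoll_wait_readable(int sockfd) {
        int epfd = epoll_create(1);          /* 1: create an epoll handle (the size hint is ignored on modern kernels) */
        if (epfd == -1) { perror("epoll_create"); return -1; }

        struct epoll_event ev = {0};
        ev.events  = EPOLLIN;                /* interested in readability; add EPOLLET for edge-triggered mode */
        ev.data.fd = sockfd;
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, sockfd, &ev) == -1) {   /* 2: register the descriptor (copied into the kernel once) */
            perror("epoll_ctl");
            close(epfd);
            return -1;
        }

        struct epoll_event events[MAX_EVENTS];
        int n = epoll_wait(epfd, events, MAX_EVENTS, 5000);        /* 3: wait up to 5 s; returns the number of ready descriptors */
        if (n > 0) {
            for (int i = 0; i < n; i++)
                printf("fd %d is ready\n", events[i].data.fd);
        }
        close(epfd);
        return n;
    }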

For the first drawback, epoll's solution lies in the epoll_ctl function. Each time a new event is registered in the epoll handle (by specifying EPOLL_CTL_ADD in epoll_ctl), the fd is copied into the kernel there, rather than being copied again in epoll_wait. Epoll guarantees that each fd is copied only once in the whole process.

For the second drawback, epoll's solution, unlike select or poll, does not add current to each fd's device wait queue on every call. It adds current only once, during epoll_ctl (that one time is unavoidable), and it specifies a callback function for each fd. When the device becomes ready and wakes up the waiters on its queue, this callback function is invoked, and the callback adds the ready fd to a ready list. The job of epoll_wait is really just to check whether this ready list contains any ready fd (it uses schedule_timeout() to sleep for a while and check again, similar to step 7 in the select implementation).

For the third drawback, epoll has no such restriction. The maximum number of fds it supports is the maximum number of files that can be opened, which is generally far larger than 2048; for example, on a machine with 1 GB of memory it is about 100,000. The exact number can be read with cat /proc/sys/fs/file-max, and it is generally related to the amount of system memory.

Summary

(1) Select and poll have to poll the entire fd set themselves, repeatedly, until a device becomes ready, possibly sleeping and waking several times in the process. Epoll also calls epoll_wait to repeatedly check the ready list, and it too may alternate between sleeping and waking several times, but when a device becomes ready it calls the callback function, puts the ready fd into the ready list, and wakes the process that went to sleep in epoll_wait. Although both approaches sleep and wake repeatedly, select and poll traverse the entire fd set each time they are "awake", whereas epoll, when "awake", only needs to check whether the ready list is empty. This saves a great deal of CPU time and is the performance gain brought by the callback mechanism.

(2) Select and poll copy the fd set from user space into the kernel once per call and hang current on the device wait queues once per call, whereas epoll copies the fds only once (at epoll_ctl) and hangs current only once (at the start of epoll_wait; note that the wait queue here is not a device wait queue but a wait queue defined inside epoll). This, too, saves considerable overhead.

Thank you for reading; I hope this article helps.
