Python: Introduction to Event-Driven Programming, Blocking I/O, Non-blocking I/O, Synchronous I/O, and Asynchronous I/O

Last Update:2017-10-17 Source: Internet

Author: User

Tags epoll


Event-driven introduction

In general, when we write a server, there are several process models to choose from:
(1) For each request received, create a new process to handle it;
(2) For each request received, create a new thread to handle it;
(3) Place each request into an event list, and let the main process handle the requests through non-blocking I/O.

Approach (1) is relatively simple to implement, but creating a new process is expensive, so server performance can be poor. Approach (2) involves thread synchronization, so it may run into deadlocks and similar problems. With approach (3), the application logic is more complex to write than with the previous two. Taking these factors together, (3) is generally considered to be the approach most network servers adopt.

The event-driven model

In UI programming we often need to respond to mouse clicks. First, how do we find out that the mouse was clicked?
Mode one: create a thread that loops continuously, checking whether a mouse click has occurred. This method has the following disadvantages:
1. It wastes CPU. Mouse clicks may be infrequent, yet the scanning thread keeps polling, which burns a lot of CPU time. And what if the interface used to scan for mouse clicks is blocking?
2. If it is blocking, another problem appears: if we need to scan not only for mouse clicks but also for key presses, then because the mouse scan blocks, the keyboard may never be scanned;
3. If one loop iteration has to scan many devices, response time suffers.
So this method is a poor one.

Mode two: the event-driven model
Most UI programming today uses an event-driven model. For example, many UI platforms provide an onClick() event, which represents a mouse press. The event-driven model broadly works as follows:
1. There is an event (message) queue;
2. When the mouse is pressed, a click event (message) is added to this queue;
3. A loop continuously takes events off the queue and, depending on the event, calls different functions such as onClick(), onKeyDown(), and so on;
4. Events (messages) generally hold a pointer to their own handler, so that each message has its own processing function.
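
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a real UI toolkit: the event dictionaries, the handler names on_click and on_key_down, and the HANDLERS table are all hypothetical names chosen for this example.

```python
from collections import deque

# Hypothetical handlers; the names mirror the onClick()/onKeyDown() callbacks above.
def on_click(event):
    return f"click at {event['pos']}"

def on_key_down(event):
    return f"key {event['key']} pressed"

# Step 4: each event type is tied to its own processing function.
HANDLERS = {"click": on_click, "keydown": on_key_down}

def run_event_loop(queue):
    """Step 3: take events off the queue and dispatch each to its handler."""
    results = []
    while queue:
        event = queue.popleft()
        handler = HANDLERS[event["type"]]
        results.append(handler(event))
    return results

# Steps 1 and 2: an event queue, with events appended as input arrives.
events = deque()
events.append({"type": "click", "pos": (10, 20)})
events.append({"type": "keydown", "key": "a"})

for line in run_event_loop(events):
    print(line)
```

A real event loop would wait for new events instead of returning when the queue is empty; the sketch stops early only so that it terminates.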

Event-driven programming is a programming paradigm in which the execution flow of a program is determined by external events. It is characterized by an event loop that uses a callback mechanism to trigger the corresponding processing when an external event occurs. Two other common programming paradigms are single-threaded synchronous programming and multithreaded programming.

Let's use an example to compare and contrast the single-threaded, multithreaded, and event-driven programming models. The figure shows the work done by a program in each of these three models over time. The program has 3 tasks to complete, each of which blocks while waiting for an I/O operation. The time spent blocked on I/O is marked with a gray box.

In the single-threaded synchronous model, the tasks are executed in order. If a task blocks on I/O, all the other tasks must wait until it completes before they can run in turn. This explicit, serialized execution order is easy to reason about. However, if the tasks do not depend on each other but still have to wait for one another, the program runs more slowly than necessary.

In the multithreaded version, the 3 tasks run in separate threads. These threads are managed by the operating system and can run in parallel on a multiprocessor system or be interleaved on a single-processor system. This lets other threads continue executing while one thread is blocked on a resource. It is more efficient than an equivalent synchronous program, but the programmer must write code to protect shared resources from being accessed by several threads at the same time. Multithreaded programs are harder to reason about, because they have to deal with thread safety through synchronization mechanisms such as locks, reentrant functions, thread-local storage, or other means, and an incorrect implementation can lead to subtle and painful bugs.

In the event-driven version of the program, the 3 tasks are interleaved but still run within a single thread of control. When a task performs I/O or another expensive operation, it registers a callback with the event loop, and execution continues when the I/O operation completes. The callback describes how to handle an event. The event loop polls for events and, when an event arrives, dispatches it to the callback function waiting for it. This approach lets the program make as much progress as possible without needing additional threads. Event-driven programs are easier to reason about than multithreaded ones, because the programmer does not need to worry about thread safety.
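
As a rough illustration of the event-driven version, the sketch below runs three tasks on one event loop using Python's asyncio. The task names and delays are made up for the example, and asyncio.sleep stands in for a blocking I/O wait; coroutines are used here in place of hand-written callbacks, which is an assumption of this sketch rather than something the description above prescribes.

```python
import asyncio
import threading
import time

async def task(name, delay, log):
    # Record which thread runs the task, then wait on simulated I/O.
    log.append((name, "start", threading.get_ident()))
    await asyncio.sleep(delay)  # hands control back to the event loop while waiting
    log.append((name, "done", threading.get_ident()))

async def main():
    log = []
    started = time.monotonic()
    # Run three tasks concurrently on a single event loop.
    await asyncio.gather(
        task("A", 0.2, log),
        task("B", 0.1, log),
        task("C", 0.3, log),
    )
    elapsed = time.monotonic() - started
    return log, elapsed

log, elapsed = asyncio.run(main())
for name, phase, _ in log:
    print(name, phase)
print("threads used:", len({tid for _, _, tid in log}))
print(f"elapsed: {elapsed:.2f}s")
```

All three tasks run on the same thread, and the total time is close to the longest single wait rather than the sum of the three waits.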

The event-driven model is usually a good choice when faced with the following environments:

There are many tasks in the program, and ...
The tasks are highly independent (so they don't need to communicate with each other, or wait for each other) and ...
Some tasks are blocked while waiting for an event to arrive.

It is also a good choice when an application needs to share mutable data between tasks, because no synchronization is required.

Network applications often have these characteristics, which makes them well suited to the event-driven programming model.

In the event-driven model above, a task only has to register an event when it starts an I/O operation, and the main program can then go on to do other things; when the I/O completes, the previously interrupted task is resumed. How is this actually implemented?

Logic diagram:

Introduction to blocking I/O, non-blocking I/O, synchronous I/O, and asynchronous I/O

For an I/O access (take read as an example), the data is first copied into a buffer in the operating system kernel and only then copied from the kernel buffer into the application's address space. So, when a read operation occurs, it goes through two phases:
1. Waiting for the data to be ready;
2. Copying the data from the kernel to the process.

Precisely because of these two phases, Linux provides the following five network I/O models:
- Blocking I/O (blocking IO)
- Non-blocking I/O (nonblocking IO)
- I/O multiplexing (IO multiplexing)
- Signal-driven I/O (signal driven IO)
- Asynchronous I/O (asynchronous IO)

Note: Since signal-driven I/O is rarely used in practice, only the remaining four I/O models are discussed here.

1. Concept Description

1.1. User space and kernel space

Modern operating systems use virtual memory. On a 32-bit operating system, the addressable space (the virtual address space) is 4 GB (2 to the power of 32 bytes). The core of the operating system is the kernel, which is independent of ordinary applications, can access the protected memory space, and has full permission to access the underlying hardware devices. To ensure that user processes cannot manipulate the kernel directly, and so to keep the kernel safe, the operating system divides the virtual address space into two parts: kernel space and user space. On 32-bit Linux, the highest 1 GB (virtual addresses 0xC0000000 to 0xFFFFFFFF) is used by the kernel and is called kernel space, while the lower 3 GB (virtual addresses 0x00000000 to 0xBFFFFFFF) is used by each process and is called user space.
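
The sizes quoted above follow directly from the address ranges. The short check below is only arithmetic on the addresses given in the text; the variable names are chosen for this example.

```python
GIB = 2 ** 30  # bytes in one gigabyte (binary)

total = 2 ** 32  # size of a 32-bit virtual address space
kernel_start, kernel_end = 0xC0000000, 0xFFFFFFFF
user_start, user_end = 0x00000000, 0xBFFFFFFF

# Ranges are inclusive, so add 1 to get the number of addresses.
kernel_size = kernel_end - kernel_start + 1
user_size = user_end - user_start + 1

print(total // GIB, kernel_size // GIB, user_size // GIB)  # prints: 4 1 3
```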

1.2. Process switching

To control the execution of a process, the kernel must have the ability to suspend a process that is running on the CPU and resume execution of a previously suspended process. This behavior is called process switching. So it can be said that any process that runs under the support of the operating system kernel is closely related to the kernel.

Switching from one running process to another involves the following steps:
1. Save the processor context, including the program counter and other registers.
2. Update the PCB information.
3. Move the PCB of the process into the appropriate queue, such as the ready queue or the queue of processes blocked on some event.
4. Select another process to execute and update its PCB.
5. Update the memory management data structures.
6. Restore the processor context.

In short, process switching is very resource-intensive. For details, refer to the article on process switching.

Note: The process control block (PCB) is a data structure in the operating system kernel that mainly represents the state of a process. Its purpose is to turn a program (with its data), which cannot run independently in a multiprogramming environment, into a basic unit that can run independently, that is, a process that can execute concurrently with other processes. In other words, the OS controls and manages concurrently executing processes based on the PCB. The PCB usually occupies a contiguous storage area in system memory and holds all the information the operating system needs to describe and control the process.

1.3. Process Blocking

A running process blocks when an event it expects has not occurred: for example, a request for a system resource failed, it is waiting for an operation to complete, new data has not arrived, or there is no new work to do. In that case the system automatically executes the blocking primitive (block), and the process moves itself from the running state to the blocked state. Blocking is therefore an active behavior of the process itself, so only a process in the running state (one that holds the CPU) can become blocked. When a process enters the blocked state, it does not consume CPU resources.

1.4. File Descriptor FD

File descriptor, a term in computer science, is an abstraction that describes a reference to a file.

Formally, a file descriptor is a non-negative integer. In practice it is an index into the table of open files that the kernel maintains for each process. When a program opens an existing file or creates a new one, the kernel returns a file descriptor to the process. Low-level programming often revolves around file descriptors. However, the concept of file descriptors generally applies only to operating systems such as UNIX and Linux.
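
A quick way to see file descriptors as plain integers is Python's os module, which exposes the low-level calls directly. The sketch below uses a pipe rather than a file on disk so that it has no side effects; the variable names are just for this example.

```python
import os

# os.pipe() asks the kernel for two new descriptors: one to read, one to write.
read_fd, write_fd = os.pipe()
print(type(read_fd).__name__, read_fd >= 0, write_fd >= 0)

# Low-level I/O works directly on the integers.
os.write(write_fd, b"hello")
data = os.read(read_fd, 5)
print(data)

# Closing releases the entries in the process's open-file table.
os.close(read_fd)
os.close(write_fd)
```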

1.5. Cache I/O

Cache I/O is also known as standard I/O, and most file system default I/O operations are cache I/O. In the Linux cache I/O mechanism, the operating system caches the I/O data in the file system's page cache, which means that the data is copied into the buffer of the operating system kernel before it is copied from the operating system kernel buffer to the application's address space.

Disadvantages of cache I/O:
During transfer, the data has to be copied several times between the application address space and the kernel, and the CPU and memory overhead of these copy operations is substantial.

2. Blocking I/O (blocking IO)

In Linux, all sockets are blocking by default. A typical read operation looks roughly like this:

When the user process invokes the recvfrom system call, the kernel begins the first phase of I/O: preparing the data. For network I/O, the data has often not yet arrived at this point (for example, a complete UDP packet has not been received), so the kernel has to wait for enough data to arrive. This waiting takes time, because the data must first be copied into a buffer in the operating system kernel. On the user process side, the whole process is blocked (by the process's own choice, of course). Once the kernel has the data ready, it copies the data from the kernel into user memory and returns the result; only then does the user process leave the blocked state and resume running.

Therefore, blocking I/O is characterized by the process being blocked during both phases of I/O execution.
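
The flow described above can be observed with a small UDP example. This is a minimal sketch under some assumptions: it uses the loopback address 127.0.0.1, a helper thread (send_later, a name invented here) plays the role of the remote peer, and the 0.2 second delay is arbitrary.

```python
import socket
import threading
import time

def send_later(addr, delay):
    # Simulate data that arrives on the network only after a delay.
    time.sleep(delay)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(b"ping", addr)

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
addr = receiver.getsockname()

sender = threading.Thread(target=send_later, args=(addr, 0.2))
sender.start()

started = time.monotonic()
# Sockets are blocking by default: this call does not return until a
# datagram has arrived and been copied into the process.
data, peer = receiver.recvfrom(1024)
waited = time.monotonic() - started

sender.join()
receiver.close()
print(data, f"blocked for about {waited:.1f}s")
```

The calling thread can do nothing else between issuing recvfrom and receiving the result, which is exactly the cost the later models try to avoid.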

3. Non-blocking I/O (nonblocking IO)

Under Linux, a socket can be made non-blocking by setting an option on it. When you perform a read operation on a non-blocking socket, the flow looks like this:

When the user process issues a read operation and the data in the kernel is not ready, the kernel does not block the user process but returns an error immediately. From the user process's point of view, it initiates a read and gets a result at once, without waiting. When the user process sees that the result is an error, it knows the data is not ready yet, so it can issue the read again. Once the data in the kernel is ready and the kernel receives another system call from the user process, it immediately copies the data into user memory and returns.

Therefore, non-blocking I/O is characterized by the user process having to keep actively asking the kernel whether the data is ready.
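
The ask-again loop can be sketched as follows. In Python, a read on a non-blocking socket with no data raises BlockingIOError, which corresponds to the immediate error return described above. The loopback address, the decision to send the datagram on the third failed attempt, and the 10 ms pause are all arbitrary choices made so that the example terminates.

```python
import socket
import time

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
receiver.setblocking(False)      # switch the socket to non-blocking mode
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

attempts = 0
data = None
while data is None:
    attempts += 1
    try:
        data, _ = receiver.recvfrom(1024)
    except BlockingIOError:
        # No datagram yet: the call returned at once instead of blocking.
        if attempts == 3:
            sender.sendto(b"ping", addr)  # make the data arrive eventually
        time.sleep(0.01)  # a real program would do other work here

sender.close()
receiver.close()
print(data, "after", attempts, "attempts")
```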

4. I/O multiplexing (IO multiplexing)

I/O multiplexing is what we call select, poll, and epoll; in some places this I/O model is also called event-driven I/O. The benefit of select/epoll is that a single process can handle the I/O of multiple network connections at the same time. The basic principle is that the select, poll, or epoll function continuously polls all the sockets it is responsible for and notifies the user process when data arrives on any of them.

When the user process calls select, the whole process is blocked. At the same time, the kernel "monitors" all the sockets that select is responsible for, and as soon as the data in any one of them is ready, select returns. The user process then invokes the read operation, copying the data from the kernel to the user process.

This flow is not much different from that of blocking I/O; in fact, it is somewhat worse, because it needs two system calls (select and recvfrom), whereas blocking I/O needs only one (recvfrom). The advantage of select, however, is that it can handle multiple connections at the same time.

Therefore, if the number of connections being handled is not high, a web server using select/epoll does not necessarily perform better than one using multithreading plus blocking I/O, and its latency may even be higher. The advantage of select/epoll is not that it handles a single connection faster, but that it can handle more connections.

In the I/O multiplexing model, each socket is in practice usually set to non-blocking. Even so, the whole user process is in fact blocked the entire time; it is just blocked by the select function rather than by socket I/O.

Therefore, I/O multiplexing is characterized by a mechanism that lets one process wait on multiple file descriptors at the same time: as soon as any one of these file descriptors (socket descriptors) becomes ready for reading, the select() function returns.
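
A minimal sketch of waiting on several descriptors at once, using select.select from the Python standard library. To keep it self-contained, socket.socketpair() stands in for two client connections; the socket names and the 1 second timeout are assumptions of this example.

```python
import select
import socket

# Two connected socket pairs stand in for two client connections.
a_server, a_client = socket.socketpair()
b_server, b_client = socket.socketpair()

# Only the second "client" sends data.
b_client.sendall(b"hello from b")

# First system call: block until at least one watched socket is readable
# (or until the 1 second timeout expires).
readable, _, _ = select.select([a_server, b_server], [], [], 1.0)

# Second system call: actually read from whichever sockets are ready.
messages = [sock.recv(1024) for sock in readable]
print(len(readable), messages)

for s in (a_server, a_client, b_server, b_client):
    s.close()
```

Only the socket that has data is reported as readable, so one process can serve whichever connection is ready without dedicating a thread to each.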

5. Asynchronous I/O (asynchronous IO)

Asynchronous I/O is in fact rarely used under Linux. Let's look at its flow:

After the user process initiates the read operation, it can immediately start doing other things. From the kernel's point of view, when it receives an asynchronous read it returns at once, so the user process is not blocked at all. The kernel then waits for the data to be ready and copies it into user memory. When all of this is done, the kernel sends a signal to the user process to tell it that the read operation is complete.
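
Python's standard library does not wrap Linux kernel asynchronous I/O directly, so the sketch below only imitates the shape of the flow: a worker thread plays the part of the kernel, and a completion callback plays the part of the signal. The names slow_read and on_complete, and the delays, are invented for this illustration; it is not how kernel asynchronous I/O is implemented.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def slow_read():
    # Stand-in for the kernel waiting for data and copying it to user memory.
    time.sleep(0.2)
    return b"payload"

done = threading.Event()
result = {}

def on_complete(future):
    # Plays the role of the completion signal: the data is already available.
    result["data"] = future.result()
    done.set()

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_read)        # returns immediately
    future.add_done_callback(on_complete)
    work_done = 0
    while not done.is_set():               # the caller keeps doing other work
        work_done += 1
        time.sleep(0.01)

print(result["data"], "received; other work units done:", work_done)
```

The point to notice is that the caller never waits for the data itself: by the time it is notified, both phases (waiting and copying) have already finished.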

6. Summary

6.1. The difference between blocking and non-blocking

A blocking I/O call blocks the calling process until the operation is complete, whereas a non-blocking I/O call returns immediately even while the kernel is still preparing the data.

6.2. The difference between synchronous IO and asynchronous IO

The difference is that synchronous I/O blocks the process while the "I/O operation" is carried out. By this definition, the blocking I/O, non-blocking I/O, and I/O multiplexing models described above are all synchronous I/O.

Some will object that non-blocking I/O does not block. Here is the subtle point: the "I/O operation" in the definition refers to the real I/O operation, which in our example is the recvfrom system call. With non-blocking I/O, calling recvfrom does not block the process if the kernel data is not ready. However, once the data in the kernel is ready, recvfrom copies the data from the kernel into user memory, and during that copy the process is blocked.

Asynchronous I/O is different: when the process initiates an I/O operation, the call returns immediately and the process pays no further attention to it until the kernel sends a signal telling the process that the I/O is complete. Throughout this whole procedure the process is never blocked.

Comparison of each IO model:

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
