Python Socket Programming IO Model Introduction (multiplexing)


1. I/O Basic Knowledge

1.1 What is a file descriptor?

In network programming, a socket object corresponds to a file descriptor, and for files a file handle (that is, a file object) also corresponds to a file descriptor. A file descriptor can be understood as a "pointer" or "handle" that refers to a socket or file object; when the underlying file or socket changes state, the object the descriptor refers to changes accordingly.
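
For example, both socket objects and file objects in Python expose their underlying file descriptor through fileno(). (A minimal sketch; the file path is only an illustration.)

import socket

# A socket object wraps a file descriptor managed by the kernel.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print("socket fd:", sock.fileno())    # e.g. 3

# A regular file object wraps a file descriptor as well.
with open("example.txt", "w") as f:
    print("file fd:", f.fileno())     # e.g. 4

sock.close()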

1.2 What is I/O?

1. First, what is I/O?

I/O stands for input/output. The operating system assigns addresses to I/O devices, and each I/O device uses the address assigned by the operating system to handle its own input and output.

2. Commonly used I/O models

Blocking IO, non-blocking IO, synchronous IO, and asynchronous IO.

3. The objects and stages involved in an IO operation. For a network IO operation (take read as the example), two system objects are involved: the process (or thread) that issues the IO request, and the system kernel. When a read operation occurs, it goes through two stages:
(1) Waiting for the data to be ready: the kernel waits until the socket is readable.
(2) Copying the data from the kernel to the process: the kernel copies the received data into the process's buffer.
It is important to remember these two stages, because the differences between the IO models come down to how each model behaves in each of them.

1.3 What are I/O operations?
    • Network operations: creating socket objects, establishing connections, sending and receiving data, handling requests and responses, etc.
    • File operations: creating file objects, reading and writing files.
    • Terminal operations: interactive input and output, for example.

Note: On Windows only socket operations are supported; other systems support all three kinds of I/O operations listed above, but none of them can monitor ordinary file operations, that is, they cannot detect whether a regular file has changed or read it automatically.

2. Introduction to the Network IO model

Server-side programming often requires a high-performance IO model. There are four common IO models:

(1) Synchronous blocking IO (blocking IO): the traditional IO model, in which the user thread is blocked for the entire duration of the kernel's IO operation.

As shown in Figure 1, the user thread initiates an IO read operation through the read system call, transferring control from user space to kernel space. The kernel waits until the packet arrives and then copies the received data into user space, completing the read operation.

The pseudo-code description of the user thread using the synchronous blocking IO model is:

{
    read(socket, buffer);
    process(buffer);
}

That is, the user thread waits until read has copied the data from the socket into buffer, and only then goes on to process the received data. During the entire IO request the user thread is blocked, so it can do nothing else after issuing the request, and CPU utilization suffers.
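
As a concrete illustration (a minimal sketch, not taken from the article; the host, port and buffer size are arbitrary), a Python socket is blocking by default, so recv() shows exactly this behavior:

import socket

# Blocking IO: connect(), sendall() and recv() all block the calling thread
# until the kernel has finished the corresponding operation.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("example.com", 80))
sock.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")

buffer = sock.recv(4096)   # blocks: waits for data, then copies it into user space
print(buffer[:80])         # process(buffer)
sock.close()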

(2) Synchronous non-blocking IO (non-blocking IO): a socket is blocking when created by default, so non-blocking IO requires the socket to be set to non-blocking mode. Note that the NIO mentioned here is not the Java NIO (New IO) library. In this model the user thread returns immediately after initiating an IO request.

As shown in Figure 2, because the socket is non-blocking, the user thread returns as soon as it issues the IO request. However, no data has been read yet, so the user thread must keep re-issuing the IO request until the data arrives and is actually read, and only then can it continue.

The pseudo-code description of the user thread using the synchronous nonblocking IO model is:

{
    while (read(socket, buffer) != SUCCESS)
        ;
    process(buffer);
}
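
A rough Python equivalent (a sketch only; the busy-polling loop is deliberately naive to mirror the pseudo-code above) puts the socket into non-blocking mode with setblocking(False), after which recv() raises BlockingIOError instead of waiting:

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("example.com", 80))
sock.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")

sock.setblocking(False)           # switch the socket to non-blocking mode

while True:
    try:
        buffer = sock.recv(4096)  # returns immediately, whether or not data is ready
        break                     # data was actually read: leave the polling loop
    except BlockingIOError:
        # The kernel has nothing ready yet; the thread keeps re-issuing the
        # request (busy polling), which is what wastes CPU in this model.
        continue

print(buffer[:80])                # process(buffer)
sock.close()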

(3) IO multiplexing (IO multiplexing): the classic Reactor design pattern, sometimes called asynchronous blocking IO; Java's Selector and epoll on Linux both follow this model. The IO multiplexing model is built on the demultiplexing function select provided by the kernel, and using select avoids the polling wait of the synchronous non-blocking IO model.

As shown in Figure 3, the user first registers the socket that needs IO with select and then blocks waiting for the select system call to return. When data arrives, the socket is activated and select returns. The user thread then formally issues the read request, reads the data, and continues execution.

Looking at the flow alone, an IO request made with select does not differ much from the synchronous blocking model; in fact it adds the extra steps of registering the socket and calling select. The great advantage of select, however, is that a single thread can handle IO requests on multiple sockets at once: the user registers several sockets and then repeatedly calls select to obtain the activated ones, processing multiple IO requests in the same thread. In the synchronous blocking model, the same goal can only be achieved with multiple threads.

The pseudo-code description of the user thread using the Select function is:

{
    select(socket);
    while (1) {
        sockets = select();
        for (socket in sockets) {
            if (can_read(socket)) {
                read(socket, buffer);
                process(buffer);
            }
        }
    }
}

Here, the socket is registered with select before the while loop; inside the loop, select is called repeatedly to obtain the activated sockets, and once a socket is readable, read is called to fetch the data from it.
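
In Python the same loop can be written with the standard select module (a minimal echo-server sketch; the listening address, port and buffer size are assumptions):

import select
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", 9000))
server.listen()

inputs = [server]                  # sockets registered with select

while True:
    readable, _, _ = select.select(inputs, [], [])   # blocks until a socket is activated
    for sock in readable:
        if sock is server:
            conn, _ = server.accept()                # new connection: monitor it as well
            inputs.append(conn)
        else:
            buffer = sock.recv(4096)                 # will not block: select said it is readable
            if buffer:
                sock.sendall(buffer)                 # process(buffer): here we simply echo it back
            else:
                inputs.remove(sock)                  # peer closed the connection
                sock.close()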

However, the advantages of select do not stop there. Although the approach above allows multiple IO requests to be handled within a single thread, each individual IO request is still blocking (it blocks in the select call), and the average time may even be longer than in the synchronous blocking IO model. If the user thread could simply register the sockets or IO requests it is interested in, go off and do its own work, and process the data only when it arrives, CPU utilization could be improved.

The IO multiplexing model implements this mechanism using the reactor design pattern.

As shown in Figure 4, the EventHandler abstract class represents an IO event handler; it owns an IO file handle (obtained through get_handle) and a handle_event method for operating on the handle (read/write, etc.). Subclasses of EventHandler can customize the event handler's behavior. The Reactor class manages EventHandlers (registration, removal, and so on) and implements the event loop in handle_events, which repeatedly calls the demultiplexing function select of the synchronous event demultiplexer (typically the kernel). select blocks until some file handle is activated (readable/writable, etc.) and then returns, at which point handle_events calls handle_event on the event handler associated with that file handle.

As shown in Figure 5, with a Reactor the polling of IO operation state is handed off uniformly to the handle_events event loop. After registering its event handler, the user thread can continue doing other work (asynchronously), while the Reactor thread is responsible for calling the kernel's select function to check socket status. When a socket is activated, the corresponding user thread is notified (or a callback registered by the user thread is executed) so that handle_event can read and process the data. Because the select function blocks, the multiplexing IO model is also known as the asynchronous blocking IO model. Note that the blocking here refers to the thread being blocked in the select call, not on the socket. In practice the sockets are usually set to non-blocking when the IO multiplexing model is used, but this makes no difference: by the time the user thread issues its IO request the data has already arrived, so the user thread will not block in any case.

The pseudo-code description for the user thread using the IO multiplexing model is:

void UserEventHandler::handle_event() {
    if (can_read(socket)) {
        read(socket, buffer);
        process(buffer);
    }
}

{
    Reactor.register(new UserEventHandler(socket));
}

The user overrides EventHandler's handle_event function to read and process the data, and the user thread simply registers its EventHandler with the Reactor. The pseudo-code for the Reactor's handle_events event loop is roughly as follows.

Reactor::handle_events() {
    while (1) {
        sockets = select();
        for (socket in sockets) {
            get_event_handler(socket).handle_event();
        }
    }
}

The event loop repeatedly calls select to obtain the activated sockets and then, for each of them, invokes the handle_event function of the EventHandler associated with that socket.
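
Python's standard selectors module makes it easy to sketch the same Reactor structure (the EchoHandler class and the port number below are made up purely for illustration):

import selectors
import socket

sel = selectors.DefaultSelector()       # the synchronous event demultiplexer

class EchoHandler:
    # Plays the role of EventHandler: owns a handle and a handle_event method.
    def __init__(self, conn):
        self.conn = conn

    def handle_event(self):
        buffer = self.conn.recv(4096)
        if buffer:
            self.conn.sendall(buffer)   # process(buffer): echo it back
        else:
            sel.unregister(self.conn)
            self.conn.close()

def accept(server):
    conn, _ = server.accept()
    conn.setblocking(False)
    handler = EchoHandler(conn)
    # Register the handler so the event loop can find it when the socket activates.
    sel.register(conn, selectors.EVENT_READ, handler.handle_event)

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", 9001))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, lambda: accept(server))

def handle_events():                    # the Reactor event loop
    while True:
        for key, _ in sel.select():     # blocks until some handle is activated
            key.data()                  # invoke the registered handle_event callback

handle_events()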

IO multiplexing is the most commonly used IO model, but it is not "thoroughly" asynchronous, because its select system call still blocks the thread. IO multiplexing can therefore only be called asynchronous blocking IO, not true asynchronous IO.

(4) Asynchronous IO (asynchronous IO): the classic Proactor design pattern, also known as asynchronous non-blocking IO.

"Real" asynchronous IO requires stronger support from the operating system. In the IO multiplexing model, the event loop notifies the user thread of the status of the file handle, and the user thread reads the data and processes the data itself. In the asynchronous IO model, when the user thread receives the notification, the data has been read by the kernel and placed in the buffer area specified by the user thread, and the kernel notifies the user thread to use it directly after the IO is completed.

The asynchronous IO model implements this mechanism using the proactor design pattern.

As shown in Figure 6, the Proactor pattern is structurally similar to the Reactor pattern but differs greatly in how the user (client) uses it. In the Reactor pattern, the user thread registers the events it is interested in with the Reactor object, which invokes the event handler when an event fires. In the Proactor pattern, the user thread registers the AsynchronousOperation (read/write, etc.), the Proactor, and the CompletionHandler to be called when the operation completes with the AsynchronousOperationProcessor. The AsynchronousOperationProcessor uses the Facade pattern to provide a set of asynchronous operation APIs (read/write, etc.) for the user; when the user thread calls an asynchronous API it continues executing its own task, while the AsynchronousOperationProcessor starts a separate kernel thread to perform the asynchronous operation, achieving true asynchrony. When the asynchronous IO operation completes, the AsynchronousOperationProcessor retrieves the Proactor and CompletionHandler that the user thread registered with the AsynchronousOperation and forwards the CompletionHandler, together with the result data of the IO operation, to the Proactor. The Proactor is responsible for calling back the handle_event of the completion handler of each asynchronous operation. Although every asynchronous operation in the Proactor pattern can be bound to a Proactor object, in an operating system the Proactor is typically implemented as a singleton so that operation-completion events can be dispatched centrally.

As shown in Figure 7, in the asynchronous IO model the user thread issues a read request directly through the asynchronous IO API provided by the kernel and returns immediately, continuing to execute its own code. At this point, however, the user thread has already registered the AsynchronousOperation and the CompletionHandler with the kernel, and the operating system starts a separate kernel thread to handle the IO operation. When the data for the read request arrives, the kernel is responsible for reading the data from the socket and writing it into the user-specified buffer. Finally the kernel hands the data it has read, together with the CompletionHandler registered by the user thread, to the kernel-internal Proactor, which notifies the user thread that the IO is complete (typically by calling the completion event handler registered by the user thread), finishing the asynchronous IO.

The pseudo-code description of the user thread using the asynchronous IO model is:

void UserCompletionHandler::handle_event(buffer) {
    process(buffer);
}

{
    aio_read(socket, new UserCompletionHandler);
}

The user overrides the CompletionHandler's handle_event function to process the data; the parameter buffer holds the data that the Proactor has already prepared. The user thread simply calls the asynchronous IO API provided by the kernel and registers the overridden CompletionHandler with it.
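
Python does not expose aio_read directly, but the standard asyncio library gives user code the same shape: issue the operation, yield control, and be handed the finished data (a minimal sketch; the host and port are placeholders). Note that on most platforms CPython's default event loop is itself built on the IO multiplexing primitives described above (Windows also offers an IOCP-based proactor loop), which matches the observation in the next paragraph about simulating asynchronous IO.

import asyncio

async def fetch():
    # open_connection suspends this coroutine until the connection is ready;
    # meanwhile the event loop is free to run other tasks.
    reader, writer = await asyncio.open_connection("example.com", 80)
    writer.write(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
    await writer.drain()

    buffer = await reader.read(4096)   # resumed only when the data has been read
    print(buffer[:80])                 # process(buffer)

    writer.close()
    await writer.wait_closed()

asyncio.run(fetch())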

Compared with the IO multiplexing model, asynchronous IO is not yet widely used; many high-performance concurrent server programs meet their requirements with the IO multiplexing model combined with multithreaded task processing. Moreover, current operating system support for asynchronous IO is not particularly mature; more often the IO multiplexing model is used to simulate asynchronous IO (when the IO event fires, the user thread is not notified directly; instead the data is first read or written into the user-specified buffer). Asynchronous IO has been supported in Java since Java 7, and interested readers can try it.

This article briefly describes the structure and principles of the four high-performance IO models on three levels: basic concepts, workflow, and code examples, and clears up the easily confused concepts of synchronous, asynchronous, blocking, and non-blocking. By understanding these high-performance IO models, we can choose the model that best fits the actual business when developing server-side programs, and thereby improve service quality. I hope this article is of some help to you.

The concepts of synchronous and asynchronous describe how the user thread interacts with the kernel: synchronous means the user thread issues an IO request and then waits for, or polls for, the completion of the kernel's IO operation before continuing; asynchronous means the user thread continues executing after issuing the IO request, and when the kernel IO operation completes it notifies the user thread or invokes a callback the user thread registered. The concepts of blocking and non-blocking describe how the user thread invokes a kernel IO operation: blocking means the IO operation does not return to user space until it has fully completed, while non-blocking means the IO call returns to the user immediately without waiting for the operation to finish. In addition, the signal-driven IO model described in Richard Stevens's Unix Network Programming Volume 1 is not covered in this article, because that model is rarely used. These definitions underlie the four common IO models introduced above.

