Python learning: IO model of concurrent programming


The content of this section:
    1. Introduction to I/O models
    2. Blocking I/O (blocking IO)
    3. Non-blocking I/O (non-blocking IO)
    4. Multiplexed I/O (IO multiplexing)
    5. Asynchronous I/O (asynchronous IO)
    6. I/O model comparison and analysis
    7. The selectors module
1. Introduction to I/O models

What is the difference between synchronous and asynchronous I/O, and what are blocking and non-blocking I/O, respectively? Different people may give different answers; some wiki articles, for example, treat asynchronous I/O and non-blocking I/O as the same thing. This is because people with different backgrounds discuss the question in different contexts. Therefore, to answer it properly, this article first fixes the context of the discussion.

This article discusses network I/O in a Linux environment. Its most important reference is section 6.2, "I/O Models", of Richard Stevens's "UNIX Network Programming, Volume 1, Third Edition: The Sockets Networking API", where Stevens describes in detail the features and differences of the various I/O models. If your English is good enough, it is recommended to read that section directly; Stevens's writing is famously clear, so don't worry about difficulty. The flowcharts in this article are also taken from that reference.

Stevens compares five I/O models in the book:
* Blocking IO
* nonblocking IO
* IO Multiplexing
* Signal Driven IO
* Asynchronous IO
Because signal-driven I/O is not commonly used in practice, this article mainly introduces the remaining four models.

First, let us look at the objects and stages involved when an I/O operation occurs. For a network I/O operation (take read as the example here), two system objects are involved: the process (or thread) that calls the I/O, and the system kernel. When a read operation occurs, it goes through two stages:

1) Waiting for the data to be ready (waiting for the data to be ready)
2) Copying the data from the kernel into the process (copying the data from the kernel to the process)

It is important to keep these two stages in mind, because the differences between the I/O models come down to how each model behaves in these two phases.

2. Blocking I/O (blocking IO):

In Linux, all sockets are blocking by default, and a typical read operation flow looks roughly like this:

When the user process invokes the recvfrom system call, the kernel begins the first phase of I/O: preparing the data. For network I/O, the data often has not all arrived yet (for example, a complete UDP packet has not been received), so the kernel waits for enough data to arrive.

On the user-process side, the entire process is blocked. When the kernel has waited until the data is ready, it copies the data from the kernel into the user's memory and then returns the result; only then does the user process leave the blocked state and resume running.
Therefore, blocking I/O is characterized by being blocked in both phases of the I/O execution (waiting for data and copying data).
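As a minimal sketch of the two blocked phases, the snippet below uses socket.socketpair() as a stand-in for a real network connection (an assumption made so the example is self-contained): recv() blocks in stage 1 until data is ready, then blocks in stage 2 while copying it out of the kernel.

```python
import socket

# socketpair() is a self-contained stand-in for a real client/server
# connection; recv() below behaves exactly as in the blocking model.
a, b = socket.socketpair()

b.sendall(b'hello')   # the peer writes, so stage 1 can complete
data = a.recv(1024)   # blocks until data is ready, then copies it in
print(data)           # b'hello'

# If nothing had been sent, a.recv(1024) would have blocked the whole
# process in stage 1 (waiting for data) until the peer wrote something.
a.close()
b.close()
```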

Almost all programmers first come into contact with network programming through interfaces such as listen(), send(), and recv(), which make it very convenient to build a server/client model. However, most socket interfaces are of the blocking type.

PS: a so-called blocking interface is a system call (typically an I/O interface) that does not return a result until the call completes or a timeout error occurs, keeping the current thread blocked in the meantime.

Virtually all I/O interfaces (including socket interfaces) are blocking unless otherwise specified. This poses a big problem for network programming: for example, while a thread is blocked in recv(1024), it cannot perform any other operation or respond to any other network request.

A simple solution:

#Use multithreading (or multiple processes) on the server side. The purpose of multithreading (or multiprocessing) is to give each connection its own thread (or process), so that blocking on any one connection does not affect the other connections.
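The thread-per-connection idea can be sketched as follows. The handler and server names are illustrative, and port 0 asks the OS for a free port so the sketch is self-contained; in a real server you would bind a fixed address.

```python
import socket
import threading

def handle(conn):
    """Echo loop for one connection; recv() blocks only this thread."""
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:
                break
            conn.sendall(data.upper())

def serve(srv):
    while True:
        conn, addr = srv.accept()  # blocks only the acceptor thread
        threading.Thread(target=handle, args=(conn,), daemon=True).start()

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(('127.0.0.1', 0))   # port 0: let the OS pick a free port
srv.listen(5)
threading.Thread(target=serve, args=(srv,), daemon=True).start()

# quick self-check from a client in the same process
c = socket.create_connection(srv.getsockname())
c.sendall(b'hello')
print(c.recv(1024))   # b'HELLO'
c.close()
```

Because each blocking recv() sits in its own thread, a stalled client delays only its own handler, not the accept loop or the other connections.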

The problem with this approach is:

#With one process or thread per connection, when faced with hundreds or thousands of simultaneous connection requests, the many threads or processes heavily occupy system resources and reduce the system's responsiveness to the outside world, and the threads and processes themselves are more likely to hang.

An improved scheme:

#Many programmers will consider using a thread pool or connection pool. A thread pool reduces the frequency of creating and destroying threads by maintaining a reasonable number of threads and letting idle threads take on new tasks. A connection pool maintains a cache of connections, reusing existing connections as much as possible and reducing the frequency with which connections are created and closed. Both of these techniques reduce system overhead and are widely used in many large systems, such as WebSphere, Tomcat, and various databases.
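As a sketch of the thread-pool idea, the standard library's concurrent.futures.ThreadPoolExecutor caps the number of worker threads and reuses idle ones; the handler and max_workers value here are illustrative choices, not part of the original text.

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor

def handle(conn):
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:
                break
            conn.sendall(data.upper())

# At most 4 connections are served concurrently; further connections
# wait for an idle pool thread instead of spawning a new one.
pool = ThreadPoolExecutor(max_workers=4)

def serve(srv):
    while True:
        conn, addr = srv.accept()
        pool.submit(handle, conn)   # reuse an idle pool thread

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(('127.0.0.1', 0))
srv.listen(5)
threading.Thread(target=serve, args=(srv,), daemon=True).start()

c = socket.create_connection(srv.getsockname())
c.sendall(b'pool')
print(c.recv(1024))   # b'POOL'
c.close()
```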

There are still problems with the improved scheme:

#Thread pools and connection pools only mitigate, to some extent, the resource consumption caused by frequently invoking the I/O interface. Moreover, a "pool" always has an upper limit; when requests greatly exceed that limit, a pooled system's responsiveness to the outside world is not much better than one with no pool at all. So when using a pool, you must consider the scale of the requests it faces and adjust the pool size accordingly.

The "thread pool" or "connection pool" may alleviate some of the pressure, but it cannot remove all of it when faced with the thousands of client requests that may appear in the example above. In short, the multithreaded model can easily and efficiently handle small-scale service requests, but it hits a bottleneck under large-scale service requests; non-blocking interfaces can be tried as a way around the problem.

3. Non-blocking I/O (non-blocking IO):

Under Linux, a socket can be made non-blocking by setting it accordingly. When you perform a read operation on a non-blocking socket, the flow looks like this:

As you can see, when the user process issues a read operation, if the data in the kernel is not ready, the kernel does not block the user process but immediately returns an error. From the user process's point of view, it initiates a read and gets a result immediately, without waiting. When the user process sees that the result is an error, it knows the data is not yet ready, so it can do something else in the interval before issuing the read again, or simply retry the read right away. Once the data in the kernel is ready and the kernel again receives the user process's system call, it immediately copies the data into the user's memory (this phase is still blocking) and returns.

That is, after a non-blocking recvfrom system call, the process is not blocked: the kernel returns to the process immediately, and if the data is not ready, it returns an error. After the call returns, the process can do something else before issuing the recvfrom system call again. Repeating this process, issuing recvfrom system calls over and over, is often called polling: the process polls the kernel until the data is ready, and the data is then copied into the process for handling. Note that the data-copy phase is still blocking.

Therefore, in non-blocking I/O, the user process actually has to keep proactively asking the kernel whether the data is ready.
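The polling loop can be sketched in miniature before the full server below. Again socket.socketpair() stands in for a real connection, and the point at which data "arrives" is simulated; the key behavior is that a non-blocking recv() raises BlockingIOError while stage 1 is unfinished.

```python
import socket
import time

a, b = socket.socketpair()   # stand-in for a real network connection
a.setblocking(False)         # recv() now returns immediately, ready or not

attempts = 0
while True:
    try:
        data = a.recv(1024)      # succeeds only once data is ready
        break
    except BlockingIOError:      # stage 1 not finished: no data yet
        attempts += 1
        if attempts == 3:        # simulate the data arriving now
            b.sendall(b'ready')
        time.sleep(0.01)         # do other work between polls

print(data)             # b'ready'
print(attempts >= 3)    # True: we polled several times before success
a.close()
b.close()
```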

# Server
from socket import *

s = socket(AF_INET, SOCK_STREAM)
s.bind(('127.0.0.1', 8080))
s.listen(5)
s.setblocking(False)  # set the socket interface to non-blocking

conn_l = []
del_l = []
while True:
    try:
        conn, addr = s.accept()
        conn_l.append(conn)
    except BlockingIOError:
        print(conn_l)
        for conn in conn_l:
            try:
                data = conn.recv(1024)
                if not data:
                    del_l.append(conn)
                    continue
                conn.send(data.upper())
            except BlockingIOError:
                pass
            except ConnectionResetError:
                del_l.append(conn)

        for conn in del_l:
            conn_l.remove(conn)
            conn.close()
        del_l = []


# Client
from socket import *

c = socket(AF_INET, SOCK_STREAM)
c.connect(('127.0.0.1', 8080))

while True:
    msg = input('>>: ')
    if not msg:
        continue
    c.send(msg.encode('utf-8'))
    data = c.recv(1024)
    print(data.decode('utf-8'))
A non-blocking I/O example

However, the non-blocking I/O model is never recommended.

We cannot deny its advantage: the process can do other things while waiting for a task to complete (including submitting other tasks; in other words, multiple tasks can run "in the background").

But it is also hard to hide its drawbacks:

#1. Cyclically calling recv() significantly pushes up CPU usage; this is why polling code of this kind typically sleeps between attempts (e.g. a time.sleep(2)), without which it is very easy to saturate a low-powered host machine.
#2. The response latency for task completion increases, because the task may complete at any moment between two polls of the read operation. This reduces overall data throughput.

Moreover, in this scheme recv() mostly plays the role of testing "is the operation complete?". The operating system actually provides more efficient interfaces for detecting completion, such as the select() multiplexing interface, which can detect whether multiple connections are active at once.

4. Multiplexed I/O (IO multiplexing):

The term I/O multiplexing may be a bit unfamiliar, but if I say select/epoll you will probably get it. Some places also call this I/O model event-driven I/O. As we all know, the benefit of select/epoll is that a single process can handle the I/O of multiple network connections at once. The basic principle is that select/epoll constantly polls all the sockets it is responsible for, and when data arrives on any socket, it notifies the user process. Its flow:

When the user process calls select, the whole process is blocked; at the same time, the kernel "monitors" all the sockets that select is responsible for, and when the data in any one socket is ready, select returns. The user process then invokes the read operation, copying the data from the kernel into the user process.
This figure is not much different from the blocking I/O diagram; in fact it looks even worse, because two system calls (select and recvfrom) are needed, while blocking I/O makes only one (recvfrom). However, the advantage of using select is that it can handle multiple connections at the same time.

Two points to emphasize:

1. If the number of connections handled is not high, a web server using select/epoll does not necessarily perform better than one using multithreading plus blocking I/O, and its latency may even be larger. The advantage of select/epoll is not that it handles a single connection faster, but that it can handle more connections.

2. In the multiplexing model, each socket is generally set to non-blocking; however, as shown above, the user process is in fact blocked the whole time. It is just that now the process is blocked by the select function rather than by socket I/O.

Conclusion: the advantage of select is that it can handle multiple connections; it offers no advantage for a single connection.

# Server
from socket import *
import select

s = socket(AF_INET, SOCK_STREAM)
s.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
s.bind(('127.0.0.1', 8081))
s.listen(5)
s.setblocking(False)  # set the socket interface to non-blocking

read_l = [s, ]
while True:
    r_l, w_l, x_l = select.select(read_l, [], [])
    print(r_l)
    for ready_obj in r_l:
        if ready_obj == s:
            conn, addr = ready_obj.accept()  # here ready_obj equals s
            read_l.append(conn)
        else:
            try:
                data = ready_obj.recv(1024)  # here ready_obj equals conn
                if not data:
                    ready_obj.close()
                    read_l.remove(ready_obj)
                    continue
                ready_obj.send(data.upper())
            except ConnectionResetError:
                ready_obj.close()
                read_l.remove(ready_obj)


# Client
from socket import *

c = socket(AF_INET, SOCK_STREAM)
c.connect(('127.0.0.1', 8081))

while True:
    msg = input('>>: ')
    if not msg:
        continue
    c.send(msg.encode('utf-8'))
    data = c.recv(1024)
    print(data.decode('utf-8'))
Select network I/O model

How select monitors FD changes, step by step:

#The user process creates a socket object and copies the FDs to be monitored into kernel space; each FD corresponds to an entry in the system file table. When an FD in kernel space has data to respond to, the kernel signals the user process that data has arrived.
#The user process then issues a system call (for example, accept) to copy the kernel-space data into user space and to clear the received data from kernel space, so that the monitored FD can respond again when new data arrives (on the sending side, because the protocol is TCP-based, data is cleared only after a reply is received).

Advantages of this model:

#Compared with other models, the event-driven model using select() runs in a single thread (process), consumes fewer resources, does not consume too much CPU, and can serve multiple clients at the same time. If you are trying to build a simple event-driven server program, this model has some reference value.

Disadvantages of this model:

#First, the select() interface is not the best choice for implementing event-driven servers, because when the number of handles to probe is large, select() itself consumes a lot of time polling each handle. Many operating systems provide more efficient interfaces: Linux provides epoll, BSD provides kqueue, Solaris provides /dev/poll, and so on. If you need to implement a more efficient server program, an epoll-like interface is recommended. Unfortunately, the epoll-like interfaces of different operating systems differ greatly, so using them to implement a server with good cross-platform capability is difficult.
#Second, this model mixes event detection and event response together; once the event-response body is large, it is catastrophic for the whole model.
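The table of contents above mentions the selectors module, and it speaks directly to the cross-platform drawback just described: Python's standard selectors module wraps select/epoll/kqueue behind one portable API and picks the most efficient mechanism available. A minimal sketch, again using socket.socketpair() as a self-contained stand-in for a real connection:

```python
import selectors
import socket

# DefaultSelector chooses the best available mechanism on this
# platform (epoll on Linux, kqueue on BSD/macOS, select elsewhere).
sel = selectors.DefaultSelector()

a, b = socket.socketpair()   # stand-in for a real connection
a.setblocking(False)
sel.register(a, selectors.EVENT_READ)

b.sendall(b'ping')           # make the registered fd readable

for key, mask in sel.select(timeout=1):
    data = key.fileobj.recv(1024)
    print(data)              # b'ping'

sel.unregister(a)
sel.close()
a.close()
b.close()
```

The same register/select loop works unchanged on every platform, which is exactly what hand-written epoll or kqueue code cannot offer.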
