IO model of Python concurrent programming

Source: Internet
Author: User
Tags: connection pooling, epoll

Let's start with the stages involved in an IO operation. Take read as an example; it goes through two stages:

1) Waiting for the data to be ready

2) Copying the data from the kernel into the process

Two, blocking IO (blocking IO)

In Linux, all sockets are blocking by default. A typical read on a blocking socket goes like this: the user process calls read and then blocks, first while the kernel waits for the data to arrive, and again while the data is copied into user space.

So blocking IO is characterized by blocking in both phases of the IO operation (waiting for the data and copying the data).

Almost all programmers first encounter network programming through interfaces such as listen(), send(), and recv(), which make it very convenient to build a server/client model. However, most of these socket interfaces are blocking.

PS: A so-called blocking interface is a system call (typically an IO call) that does not return until it has a result or a timeout error occurs; the calling thread stays blocked the whole time.
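To make that concrete, here is a minimal sketch (not from the original article) of a blocking echo server built from exactly these calls; the 127.0.0.1:8080 address is simply the one used by the examples later in this article. Both accept() and recv() stop the whole process until they return.

# Minimal blocking echo server: a sketch only.
from socket import *

server = socket()                      # defaults to AF_INET, SOCK_STREAM
server.bind(('127.0.0.1', 8080))
server.listen(5)

while True:
    conn, addr = server.accept()       # blocks until a client connects
    while True:
        data = conn.recv(1024)         # blocks until data arrives or the peer closes
        if not data:
            break
        conn.send(data.upper())        # may also block if the send buffer is full
    conn.close()

While one client is being served, every other client has to wait, which is exactly the limitation the rest of this section works around.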

So there's a simple solution:

# Use multiple threads (or processes) on the server side. The goal is to give each connection its own thread (or process), so that a block on any one connection does not affect the others (a thread-per-connection sketch follows).
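A rough sketch of that idea, assuming the same echo-style service as the other examples in this article: one thread per accepted connection, so a recv() that blocks in one thread does not stall the rest.

# Thread-per-connection sketch (illustrative only).
from socket import *
from threading import Thread

def handle(conn):
    # Each connection is serviced in its own thread, so a blocking recv()
    # here only stalls this thread, not the whole server.
    while True:
        data = conn.recv(1024)
        if not data:
            break
        conn.send(data.upper())
    conn.close()

server = socket()
server.bind(('127.0.0.1', 8080))
server.listen(5)

while True:
    conn, addr = server.accept()
    Thread(target=handle, args=(conn,), daemon=True).start()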

The problem with this approach:

# Spawning a process or thread per connection does not scale. When the server has to respond to hundreds or thousands of simultaneous connection requests, all of those threads or processes occupy a large share of system resources, the system's responsiveness to the outside world drops, and the threads and processes themselves are more likely to hang.

An improved approach:

# Many programmers will consider using a thread pool or a connection pool. A thread pool reduces the frequency of creating and destroying threads by maintaining a reasonable number of threads and letting idle threads take on new tasks. A connection pool maintains a cache of established connections, reusing existing connections as much as possible and reducing how often connections are created and closed. Both techniques reduce system overhead and are widely used in large systems such as WebSphere, Tomcat, and various databases. (A thread-pool sketch follows.)
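As one possible illustration (not part of the original text), the thread-pool idea maps naturally onto Python's standard concurrent.futures.ThreadPoolExecutor; the pool size of 50 below is an arbitrary choice for the sketch.

# Thread-pool sketch using the standard library.
from socket import *
from concurrent.futures import ThreadPoolExecutor

def handle(conn):
    while True:
        data = conn.recv(1024)
        if not data:
            break
        conn.send(data.upper())
    conn.close()

server = socket()
server.bind(('127.0.0.1', 8080))
server.listen(5)

# At most 50 connections are serviced at once; further connections queue up
# until a worker thread becomes free.
pool = ThreadPoolExecutor(max_workers=50)
while True:
    conn, addr = server.accept()
    pool.submit(handle, conn)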

The improved approach still has problems:

# Thread pools and connection pools only mitigate, to some extent, the resource consumption caused by frequent IO calls. Moreover, a "pool" always has an upper bound; when requests greatly exceed that bound, the system's responsiveness is not much better than it would be without a pool. So when using a pool you must consider the scale of the load it will face and size the pool accordingly.

The "thread pool" or "Connection pool" may alleviate some of the stress, but not all of them, in response to the thousands or even thousands of client requests that may appear in the previous example. In short, multithreaded models can easily and efficiently solve small-scale service requests, but in the face of large-scale service requests, multithreading model will encounter bottlenecks, you can use non-blocking interface to try to solve the problem.

Three, non-blocking IO (non-blocking IO)

Under Linux, a socket can be set to non-blocking. A read operation on a non-blocking socket proceeds as follows:

As you can see, when the user process issues a read, if the data in the kernel is not ready yet, the call does not block the user process but returns an error immediately. From the user process's point of view, it initiates a read and gets a result right away, without waiting. When it sees that the result is an error, it knows the data is not ready, so it can do something else before issuing the read again, or simply retry the read immediately. Once the data in the kernel is ready and the process issues the system call again, the kernel copies the data into user memory (this phase still blocks) and returns.

That is, after a non-blocking recvfrom system call the process is not blocked: the kernel returns to the process immediately, and if the data is not ready it returns an error. The process can then do other work before issuing recvfrom again, repeating this cycle over and over. This is usually called polling: the process keeps polling the kernel until the data is ready, at which point the data is copied into the process and handled. Note that the copy phase is still blocking.

So in non-blocking IO the user process has to keep asking the kernel, proactively and repeatedly, whether the data is ready.

# server
from socket import *
import time        # left here for an optional time.sleep() throttle in the polling loop

s = socket()
s.bind(('127.0.0.1', 8080))
s.listen(5)
s.setblocking(False)            # put the listening socket in non-blocking mode

r_list = []                     # connections to read from
w_list = []                     # (connection, data) pairs waiting to be sent
while True:
    try:
        conn, addr = s.accept()
        r_list.append(conn)
    except BlockingIOError:     # no pending connection: do other work instead
        print('you can do other work')
        print('rlist:', len(r_list))

        del_rlist = []
        for conn in r_list:
            try:
                data = conn.recv(1024)
                if not data:
                    conn.close()
                    del_rlist.append(conn)
                    continue
                w_list.append((conn, data.upper()))
            except BlockingIOError:         # nothing to read yet on this connection
                continue
            except ConnectionResetError:
                conn.close()
                del_rlist.append(conn)

        del_wlist = []
        for item in w_list:
            try:
                conn = item[0]
                res = item[1]
                conn.send(res)
                del_wlist.append(item)
            except BlockingIOError:         # send buffer full: try again next round
                continue
            except ConnectionResetError:
                conn.close()
                del_wlist.append(item)

        for conn in del_rlist:
            r_list.remove(conn)
        for item in del_wlist:
            w_list.remove(item)

# client
from socket import *
import os

client = socket()
client.connect(('127.0.0.1', 8080))
while True:
    data = '%s say hello' % os.getpid()
    client.send(data.encode('utf-8'))
    res = client.recv(1024)
    print(res.decode('utf-8'))

That said, the non-blocking IO model is by no means recommended.

Its advantage is undeniable: the process can do other things while waiting for the operation to complete (including submitting other tasks, i.e. several tasks can be "in flight" in the background at the same time).

But it's also hard to hide its drawbacks:

1. Calling recv() in a loop pushes CPU usage up sharply. This is why such code usually keeps a time.sleep(2) in the polling loop; without it, a low-spec host can easily appear to freeze (see the sketch after these two points).

2. The response latency for completed tasks goes up, because a read is only attempted once per polling cycle; a task may finish at any point between two polls, which lowers overall data throughput.
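For reference, a minimal sketch of such a throttled polling loop on the client side (the 2-second value is only illustrative, and the server at 127.0.0.1:8080 is assumed to be running):

# Polling a single non-blocking socket, with a sleep between polls.
from socket import *
import time

client = socket()
client.connect(('127.0.0.1', 8080))
client.send(b'hello')
client.setblocking(False)              # recv() will now return immediately

while True:
    try:
        data = client.recv(1024)       # raises BlockingIOError if nothing has arrived yet
        print(data.decode('utf-8'))
        break
    except BlockingIOError:
        # ... do other work here ...
        time.sleep(2)                  # caps CPU usage, but adds up to 2 s of latency

The sleep is exactly the trade-off described above: less CPU spent spinning, more delay before a completed operation is noticed.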
Four, multiplexed IO (IO multiplexing)

IO multiplexing is also known by the names select/epoll; its benefit is that a single process can handle the IO of multiple network connections at the same time.

The basic principle is that select/epoll continually polls all of the sockets it is responsible for and notifies the user process when data arrives on any of them. The flow is as follows:

When the user process calls select, the whole process is blocked while select watches all the sockets it is responsible for; as soon as the data in any one socket is ready, select returns. The user process then calls read, copying the data from the kernel into the user process.

Two points to emphasize:

1. If the number of connections being handled is not high, a web server using select/epoll does not necessarily perform better than one using multithreading + blocking IO, and its latency may even be larger. The advantage of select/epoll is not faster handling of a single connection, but the ability to handle more connections.

2. In the multiplexing model, each socket is generally set to non-blocking; however, the user process is in fact blocked the whole time, only it is blocked by the select call rather than by socket IO.

Conclusion:

The strength of select is that it can handle many connections at once; it is not faster for a single connection.

# server
from socket import *
import select

s = socket()
s.bind(('127.0.0.1', 8080))
s.listen(5)
s.setblocking(False)

r_list = [s, ]      # sockets watched for readability (starts with the listening socket)
w_list = []         # sockets watched for writability
w_data = {}         # data waiting to be sent, keyed by socket
while True:
    print('detected r_list:', len(r_list))
    print('detected w_list:', len(w_list))
    rl, wl, xl = select.select(r_list, w_list, [])   # blocks until some socket is ready
    for r in rl:
        if r == s:                      # the listening socket is readable: a new connection
            conn, addr = r.accept()
            r_list.append(conn)
        else:
            try:
                data = r.recv(1024)
                if not data:
                    r.close()
                    r_list.remove(r)
                    continue
                w_list.append(r)
                w_data[r] = data.upper()
            except ConnectionResetError:
                r.close()
                r_list.remove(r)
                continue
    for w in wl:
        w.send(w_data[w])
        w_list.remove(w)
        w_data.pop(w)

# client
from socket import *
import os

client = socket()
client.connect(('127.0.0.1', 8080))
while True:
    data = '%s say hello' % os.getpid()
    client.send(data.encode('utf-8'))
    res = client.recv(1024)
    print(res.decode('utf-8'))

Advantages of the model:

Compared with other models, an event-driven server built on select() runs in a single thread (process), consumes few resources, does not eat too much CPU, and can serve multiple clients. If you are trying to build a simple event-driven server program, this model has some reference value.

Disadvantages of the model:

First, the select() interface is not the best choice for "event-driven" programming, because select() itself spends a lot of time polling each handle when the number of handles to probe is large. Many operating systems provide more efficient interfaces; Linux, for example, provides epoll, and so on. If you need a more efficient server program, an epoll-like interface is recommended; unfortunately, the epoll-style interfaces offered by different operating systems differ greatly, so it is hard to use them to build a server with good cross-platform support. (A sketch using Python's selectors module, which wraps epoll where available, follows these two points.)

Second, this model mixes event detection and event handling together; once the event handler becomes heavyweight, it is disastrous for the whole model.
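As a rough illustration of the epoll-style approach (not from the original article), Python's standard selectors module hides these platform differences: DefaultSelector picks the most efficient mechanism available, which is epoll on Linux.

# Echo server sketch on top of the selectors module.
from socket import *
import selectors

sel = selectors.DefaultSelector()      # epoll on Linux, kqueue on BSD/macOS, etc.

server = socket()
server.bind(('127.0.0.1', 8080))
server.listen(5)
server.setblocking(False)
sel.register(server, selectors.EVENT_READ)

while True:
    for key, events in sel.select():   # the process blocks here, on the selector
        sock = key.fileobj
        if sock is server:             # the listening socket is readable: new connection
            conn, addr = server.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            data = sock.recv(1024)
            if data:
                sock.send(data.upper())        # sketch: assumes the send buffer has room
            else:
                sel.unregister(sock)
                sock.close()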

Five, asynchronous IO (asynchronous IO)

Asynchronous IO on Linux is actually not used much; it was introduced in kernel version 2.6. Let's look at its flow first:

After the user process initiates the read, it can immediately go and do other things. From the kernel's side, when it receives an asynchronous read it returns at once, so the user process is never blocked. The kernel then waits for the data to become ready and copies it into user memory; when all of that is done, the kernel sends a signal to the user process to tell it that the read has completed.
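In Python, the everyday way to program in this style is asyncio, which is built on an event loop rather than on Linux kernel AIO, so the sketch below is only an analogy: the program starts an operation, keeps doing other work, and the coroutine is resumed when the operation completes.

# asyncio echo server sketch (event-loop based, not kernel AIO).
import asyncio

async def handle(reader, writer):
    while True:
        data = await reader.read(1024)     # suspends this coroutine, not the process
        if not data:
            break
        writer.write(data.upper())
        await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(handle, '127.0.0.1', 8080)
    async with server:
        await server.serve_forever()

asyncio.run(main())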
