Python Full Stack Development Foundation "27th": IO Models


Related terminology

Synchronous: when one task is being executed, the caller must wait for it to finish before it can continue.

# Synchronous means that when a function call is issued, the call does not return until the result is obtained. By this definition, the vast majority of functions are synchronous calls. But generally, when we talk about synchronous and asynchronous, we mean tasks that require other components to cooperate or that take time to complete.
# Examples:
# 1. multiprocessing.Pool().apply()  # after issuing the synchronous call, just wait for the task to finish, regardless of whether it is computing or blocked on IO; simply wait it out
# 2. concurrent.futures.ProcessPoolExecutor().submit(func,).result()
# 3. concurrent.futures.ThreadPoolExecutor().submit(func,).result()
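A minimal sketch of the synchronous pattern listed above, assuming a trivial placeholder worker square (not from the original article): submit() followed immediately by result() makes the caller wait until the answer is available.

# Synchronous call sketch: result() blocks the caller until the task finishes.
# square is only an illustrative placeholder worker.
from concurrent.futures import ThreadPoolExecutor

def square(n):
    return n * n

if __name__ == '__main__':
    with ThreadPoolExecutor(max_workers=2) as pool:
        res = pool.submit(square, 7).result()  # synchronous: wait here for the result
        print(res)  # 49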

Asynchronous (asynchronous):

# Asynchronous is the opposite of synchronous. When an asynchronous function call is issued, the caller does not get the result immediately. When the asynchronous call completes, the caller is informed through status, notification, or a callback. If the asynchronous function reports completion through status, the caller has to check it again and again, which is very inefficient (some beginners in multithreaded programming like to use a loop to check the value of a variable, which is actually a serious mistake). Using notification is efficient, because the asynchronous call needs almost no extra work. A callback function is not much different from a notification.
# Examples:
# 1. multiprocessing.Pool().apply_async()  # after issuing the asynchronous call, do not wait for the task to finish; instead, immediately get a temporary result (not the final result, but a wrapping object such as a Future)
# 2. concurrent.futures.ProcessPoolExecutor(3).submit(func,)
# 3. concurrent.futures.ThreadPoolExecutor(3).submit(func,)
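A minimal sketch of the asynchronous pattern, again with the placeholder worker square: submit() returns a Future immediately (the temporary, wrapped result mentioned above), and attaching a callback is one way to get the callback-style notification described; on_done is an illustrative name, not from the original article.

# Asynchronous call sketch: submit() returns at once; the callback is invoked when the task completes.
from concurrent.futures import ThreadPoolExecutor

def square(n):
    return n * n

def on_done(future):
    print('task finished, result:', future.result())

if __name__ == '__main__':
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut = pool.submit(square, 7)       # returns immediately with a Future, does not wait
        fut.add_done_callback(on_done)     # notified via callback when the task completes
        print('submitted, doing other work...')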

Blocking (blocking):

# A blocking call means that before the result returns, the current thread is suspended (for example when an IO operation is encountered). The function wakes the blocked thread only after it has obtained the result. Some people equate blocking calls with synchronous calls, but they are in fact different: with a synchronous call the current thread is often still active, it is just that logically the current function has not yet returned.
# Examples:
# 1. Synchronous call: apply a task that performs 100 million additions; the call waits until the task returns its result, but it is not blocked (it stays in the ready state, competing for the CPU's execution rights).
# 2. Blocking call: when a socket works in blocking mode, calling recv with no data available suspends the current thread until data arrives.

Non-blocking (non-blocking):

# Non-blocking is the opposite of blocking: if the result cannot be obtained immediately, the call returns immediately anyway and does not block the current thread.

Summary:

# 1. Synchronous and asynchronous describe how a call gets its result: a synchronous call waits for the result itself, while an asynchronous call is informed later through status, notification, or a callback.
# 2. Blocking and non-blocking describe the state of a process or thread: a blocking call suspends the process when the request cannot be satisfied immediately, while a non-blocking call does not suspend the current process.

I. Introduction to the IO Model

When IO occurs, certain objects and steps are involved. A network IO involves two system objects: one is the process (or thread) that calls the IO, and the other is the system kernel. When a read operation occurs, it goes through two stages:

# 1) Waiting for the data to be ready (waiting for the data to be ready)
# 2) Copying the data from the kernel into the process (copying the data from the kernel to the process)

II. Blocking IO (blocking IO)

Characteristics of blocking IO: both phases of the IO execution (waiting for data and copying data) are blocked.

Virtually all IO interfaces (including the socket interface) are blocking unless otherwise specified. This poses a big problem for network programming: for example, while calling recv(1024) the thread is blocked, and during that time it cannot perform any other operation or respond to any other network request.

# Server
from socket import *

server = socket(AF_INET, SOCK_STREAM)
server.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
server.bind(('127.0.0.1', 8080))
server.listen(5)
print('Start running...')
while True:
    conn, addr = server.accept()  # IO operation: while blocked in accept, we cannot do recv's work
    print(addr)
    while True:
        try:
            data = conn.recv(1024)  # IO operation
            conn.send(data.upper())
        except Exception:
            break
    conn.close()
server.close()
# This is the blocking IO model: once it blocks, it stays stuck there until the data has reached
# the operating system and the operating system has copied it from the kernel into the application.
# Blocking IO blocks in both phases.

# Client
from socket import *

client = socket(AF_INET, SOCK_STREAM)
client.connect(('127.0.0.1', 8080))
while True:
    cmd = input('>>: ').strip()
    if not cmd:
        continue
    client.send(cmd.encode('utf-8'))
    data = client.recv(1024)
    print('Accepted: %s' % data.decode('utf-8'))
client.close()

  

A simple solution:

# Use multithreading (or multiprocessing) on the server side. The aim of multithreading (or multiprocessing) is to give each connection its own thread (or process), so that blocking on any one connection does not affect the other connections.
# The problem with this approach: when facing hundreds or thousands of simultaneous connection requests, both the multithreaded and the multiprocess approach heavily occupy system resources, reducing the system's responsiveness to the outside world, and the threads and processes themselves are more prone to hang.
# Improvement: many programmers would consider using a thread pool or connection pool. A thread pool reduces the frequency of creating and destroying threads by maintaining a reasonable number of threads and letting idle threads take on new tasks. A connection pool maintains a cache of connections, reusing existing connections as much as possible and reducing the frequency with which connections are created and closed. Both techniques reduce system overhead and are widely used in many large systems, such as WebSphere, Tomcat, and various databases.
# Problems that remain after the improvement: the "thread pool" and "connection pool" techniques only mitigate, to some extent, the resource consumption caused by frequent IO calls. Moreover, a "pool" always has an upper limit; when requests greatly exceed that limit, a system built on a pool responds to the outside world little better than one without a pool. So when using a pool you must consider the scale of requests it will face and size the pool accordingly.
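A minimal sketch of the thread-per-connection approach described above, not from the original article: each accepted connection gets its own thread, so a blocking recv() on one client does not stall the others. The handler name handle_client is illustrative; the address, port, and echo behaviour mirror the blocking-IO example.

# Thread-per-connection sketch: accept() blocks only the main thread; each client is served in its own thread.
from socket import *
from threading import Thread

def handle_client(conn):
    while True:
        try:
            data = conn.recv(1024)
            if not data:
                break
            conn.send(data.upper())
        except ConnectionResetError:
            break
    conn.close()

server = socket(AF_INET, SOCK_STREAM)
server.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
server.bind(('127.0.0.1', 8080))
server.listen(5)
while True:
    conn, addr = server.accept()  # still a blocking call, but it only blocks the main thread
    Thread(target=handle_client, args=(conn,), daemon=True).start()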

The "thread pool" or "Connection pool" may alleviate some of the stress, but not all of them, in response to the thousands or even thousands of client requests that may appear in the previous example. In short, multithreaded models can easily and efficiently solve small-scale service requests, but in the face of large-scale service requests, multithreading model will encounter bottlenecks, you can use non-blocking interface to try to solve the problem.

III. Non-blocking IO (non-blocking IO)

Multithreading, multiprocessing, process pools, and thread pools can all achieve concurrency, but they still do not solve the IO problem itself, so let us look at non-blocking IO.

As you can see, when the user process issues a read operation, if the data in the kernel is not ready, the kernel does not block the user process but returns an error immediately. From the user process's point of view, it initiates a read and gets a result right away without waiting. When the user process sees that the result is an error, it knows the data is not ready yet, so it can do something else in the interval before issuing the next read, or simply issue the read again. Once the data in the kernel is ready and the kernel receives another system call from the user process, it immediately copies the data into the user's memory (this phase still blocks) and returns.

That is, after a non-blocking recvfrom system call, the process is not blocked; the kernel returns to the process immediately, and if the data is not ready an error is returned. After the call returns, the process can do something else before issuing the recvfrom system call again. Repeating this process, issuing recvfrom over and over, is usually called polling: the process polls the kernel until the data is ready, and then the data is copied into the process for processing. Note that the copy-data phase is still blocking.

Therefore, in non-blocking IO the user process actually has to keep proactively asking the kernel whether the data is ready.

server.setblocking(False)  # the default is True; setting it to False makes the socket's blocking operations non-blocking
# So in non-blocking IO the user process has to keep proactively asking whether the kernel data is ready:
# the waiting-for-data phase is non-blocking, but the copy-data phase still blocks.

Server

# This program does achieve single-threaded concurrency, but it keeps the CPU heavily occupied
from socket import *
import time

server = socket(AF_INET, SOCK_STREAM)
server.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
server.bind(('127.0.0.1', 8080))
server.listen(5)
server.setblocking(False)  # the default is True (False makes the blocking socket operations non-blocking)
print('starting...')

conn_l = []
del_l = []
while True:
    try:
        print(conn_l)
        conn, addr = server.accept()  # raises BlockingIOError when no connection is ready yet
        print(conn)
        conn_l.append(conn)
    except BlockingIOError:
        # use the time when there is nothing to accept to serve the existing connections
        for conn in conn_l:
            try:
                data = conn.recv(1024)
                conn.send(data.upper())
            except BlockingIOError:
                pass
            except ConnectionResetError:
                # the client disconnected abruptly; remember the connection so it can be removed below
                del_l.append(conn)
        for obj in del_l:
            obj.close()
            conn_l.remove(obj)
        del_l.clear()

Client

from socket import *

client = socket(AF_INET, SOCK_STREAM)
client.connect(('127.0.0.1', 8080))
while True:
    cmd = input('>>: ').strip()
    if not cmd:
        continue
    client.send(cmd.encode('utf-8'))
    data = client.recv(1024)
    print(data.decode('utf-8'))

Note for the server: a ConnectionResetError occurs if the client disconnects abruptly.

So we handle that exception, as shown in the server code above.

But non-blocking IO models are never recommended.

The advantage of the non-blocking IO model is that it can do other work while waiting for a task to complete (including submitting other tasks; in other words, multiple tasks can run "in the background").

Non-blocking IO Model disadvantages:

  1. Cyclically calling recv() significantly pushes up CPU usage; this is why a time.sleep(2) is sometimes left in such polling code, otherwise a low-end host can very easily be overwhelmed.

  2. The response latency for task completion increases, because the read operation is only issued at each polling interval, and the task may complete at any point between two polls. This reduces overall data throughput.

IV. IO Multiplexing (IO multiplexing)

When the user process calls select, the entire process is blocked, and at the same time the kernel "monitors" all the sockets that select is responsible for; when the data in any one socket is ready, select returns. The user process then calls the read operation to copy the data from the kernel into the user process.
This flow is not much different from blocking IO; in fact it is worse, because two system calls (select and recvfrom) are needed, while blocking IO only makes one system call (recvfrom). The advantage of using select, however, is that it can handle multiple connections at the same time.

Emphasize:

1. If the number of connections handled is not high, a web server using select/epoll does not necessarily perform better than a web server using multithreading + blocking IO, and the latency may even be larger. The advantage of select/epoll is not that it handles a single connection faster, but that it can handle more connections.

2. In the multiplexing model, each individual socket is generally set to non-blocking; however, the user process as a whole is in fact blocked the entire time, it is just blocked by the select function rather than by socket IO.

Conclusion: the advantage of select is that it can handle multiple connections, not that it is faster for a single connection.

# select IO model
# Server
from socket import *
import select

s = socket(AF_INET, SOCK_STREAM)
s.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
s.bind(('127.0.0.1', 8081))
s.listen(5)
s.setblocking(False)  # set the socket interface to non-blocking
read_l = [s, ]
while True:
    r_l, w_l, x_l = select.select(read_l, [], [])
    print(r_l)
    for ready_obj in r_l:
        if ready_obj == s:
            conn, addr = ready_obj.accept()  # here ready_obj is s
            read_l.append(conn)
        else:
            try:
                data = ready_obj.recv(1024)  # here ready_obj is conn
                if not data:
                    read_l.remove(ready_obj)
                    continue
                ready_obj.send(data.upper())
            except ConnectionResetError:
                read_l.remove(ready_obj)

# Client
from socket import *
c = socket(AF_INET, SOCK_STREAM)
c.connect(('127.0.0.1', 8081))
while True:
    msg = input('>>: ')
    if not msg:
        continue
    c.send(msg.encode('utf-8'))
    data = c.recv(1024)
    print(data.decode('utf-8'))
#服务端 (multiplexed io) # Select module detects the socket with the Select method is ready, that is, to collect the data (and our # non-blocking IO you do not know that the socket is ready, then use the Select module to solve the problem #) # Select can also detect multiple sockets # So the Select is more efficient than non-blocking IO from socket import *import selectserver = socket (af_inet,sock_stream) Server.setsockopt (sol_socket,so_reuseaddr,1) server.bind ((' 127.0.0.1 ', 8081)) server.setblocking (False) # Set socket sockets as Nonblocking Server.listen (5) Print (' Start running ... ') read_l = [Server,] #因为不只就那么一个列表要检测. So don't die in the argument. While true:r_l,w_l,x_l = Select.select (read_l,[],[]) #select () method has four parameters print (r_l) #一开始服务端运行的时候, just wait, when your guest When the user has a link, he detects the data (detects that data is ready) for obj in r_l:if obj = = Server:conn,addr = Obj.ac Cept () #accept要经历两个阶段, but if the program goes this far, it must be the data is ready #当数据已经准备好的时候, the accept only goes through the stage of the copy data # PRI NT (addr) read_l.append (conn) #在监听一下conn套接字 (this time has been monitored by two: accept,conn respectively) Else:data = Obj.recv (1024) # At this time the Obj=conn obj.send (Data.upper ()) # obj.close () # server.close ()
#客户端 (multiplexed IO) from socket import *import selectclient = socket (af_inet,sock_stream) client.connect ((' 127.0.0.1 ', 8081)) While True:    cmd = input (' >>: ')    client.send (Cmd.encode (' Utf-8 '))    data = Client.recv (1024x768)    Print (' Received:%s '%data.decode (' Utf-8 ')) Client.close ()

Analysis of how select monitors FD (file descriptor) changes:

# The user process creates the socket objects and copies the monitored FDs into kernel space; each FD corresponds to an entry in the system file table. When an FD in kernel space has data ready, it signals the user process that data has arrived.
# The user process then issues a system call (such as accept) to copy the data from kernel space into user space, and at the same time clears the received data from kernel space, so that when the FD is monitored again it can respond when new data arrives (on the sending side, because TCP is an acknowledgement-based protocol, the data is only cleared after an acknowledgement is received).

Advantages of the select model:
# Compared with other models, an event-driven server using select() runs in a single thread (process), consumes fewer resources, does not use too much CPU, and can still serve multiple clients. If you are trying to build a simple event-driven server program, this model has some reference value.
Disadvantages of the select model:
# First, the select() interface is not the best choice for implementing "event driven". When the number of handles to probe is large, select() itself wastes a lot of time polling each handle. Many operating systems provide more efficient interfaces, such as epoll on Linux, kqueue on BSD, and /dev/poll on Solaris. If you need to implement a more efficient server program, an epoll-like interface is recommended. Unfortunately, the epoll-style interfaces differ greatly between operating systems, so using them to implement a server with good cross-platform support is difficult.
# Second, this model mixes event detection and event response together; once the event response body is large, it is catastrophic for the whole model.
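Python exposes the epoll interface mentioned above as select.epoll (Linux only). The following is a minimal, illustrative sketch (not from the original article) of an epoll-based echo server, structured like the select example; the port 8082 is an arbitrary choice.

# Linux-only sketch using select.epoll: each file descriptor is registered once and the kernel
# reports the ready ones, avoiding select()'s cost of scanning every handle on each call.
from socket import *
import select

server = socket(AF_INET, SOCK_STREAM)
server.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
server.bind(('127.0.0.1', 8082))
server.listen(5)
server.setblocking(False)

epoll = select.epoll()
epoll.register(server.fileno(), select.EPOLLIN)
fd_to_sock = {server.fileno(): server}          # map fds back to their socket objects

while True:
    for fd, event in epoll.poll():              # blocks until some registered fd is ready
        sock = fd_to_sock[fd]
        if sock is server:
            conn, addr = server.accept()
            conn.setblocking(False)
            epoll.register(conn.fileno(), select.EPOLLIN)
            fd_to_sock[conn.fileno()] = conn
        else:
            data = sock.recv(1024)
            if data:
                sock.send(data.upper())
            else:                                # the client closed the connection
                epoll.unregister(fd)
                del fd_to_sock[fd]
                sock.close()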

V. Asynchronous IO (asynchronous IO)

After the user process initiates the read operation, it can immediately start doing other things. From the kernel's perspective, when it receives an asynchronous read it returns immediately, so the user process is not blocked at all. The kernel then waits for the data to be ready and copies the data into the user's memory; when all of this is done, the kernel sends a signal to the user process to tell it that the read operation is complete.
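The original article gives no code for this model. As an illustration only: Python's standard library does not expose kernel-level asynchronous socket IO directly, but asyncio offers an asynchronous-style interface in which the caller is resumed when the IO completes (under the hood asyncio is built on IO multiplexing via the selectors module, not on kernel AIO). A minimal sketch, with the handler name and port as arbitrary choices:

# Illustrative asyncio echo server: the coroutine, not the thread, is suspended while waiting for IO,
# and is resumed when the data is ready; internally asyncio relies on IO multiplexing (selectors).
import asyncio

async def handle_echo(reader, writer):
    while True:
        data = await reader.read(1024)      # suspends this coroutine, not the whole thread
        if not data:
            break
        writer.write(data.upper())
        await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(handle_echo, '127.0.0.1', 8083)
    async with server:
        await server.serve_forever()

if __name__ == '__main__':
    asyncio.run(main())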

VI. IO Model Comparison

As described above, the difference between non-blocking IO and asynchronous IO is clear. In non-blocking IO, although the process is not blocked most of the time, it still has to actively check, and once the data is ready the process must still actively call recvfrom to copy the data into user memory. Asynchronous IO is completely different: it is as if the user process hands the entire IO operation over to someone else (the kernel), who sends a signal notification when it is finished. During that time the user process neither needs to check the status of the IO operation nor needs to copy the data itself.

VII. The selectors Module

These three IO multiplexing mechanisms (select, poll, epoll) have different levels of support on different platforms; poll and epoll are not supported on Windows. But we have the selectors module, which by default picks the most suitable mechanism for the current platform for us.

# selectors
# Server
from socket import *
import selectors

sel = selectors.DefaultSelector()

def accept(server_fileobj, mask):
    conn, addr = server_fileobj.accept()
    sel.register(conn, selectors.EVENT_READ, read)

def read(conn, mask):
    try:
        data = conn.recv(1024)
        if not data:
            print('closing', conn)
            sel.unregister(conn)
            conn.close()
            return
        conn.send(data.upper() + b'_SB')
    except Exception:
        print('closing', conn)
        sel.unregister(conn)
        conn.close()

server_fileobj = socket(AF_INET, SOCK_STREAM)
server_fileobj.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
server_fileobj.bind(('127.0.0.1', 8088))
server_fileobj.listen(5)
server_fileobj.setblocking(False)  # set the socket interface to non-blocking
sel.register(server_fileobj, selectors.EVENT_READ, accept)  # like appending the file handle server_fileobj to
                                                            # select's read list, and binding the callback accept to it
while True:
    events = sel.select()  # detect all registered fileobjs; returns those whose wait-data phase is complete
    for sel_obj, mask in events:
        callback = sel_obj.data          # callback = accept or read
        callback(sel_obj.fileobj, mask)  # e.g. accept(server_fileobj, 1)

# Client
from socket import *
c = socket(AF_INET, SOCK_STREAM)
c.connect(('127.0.0.1', 8088))
while True:
    msg = input('>>: ')
    if not msg:
        continue
    c.send(msg.encode('utf-8'))
    data = c.recv(1024)
    print(data.decode('utf-8'))

VIII. Summary

IO multiplexing (select)
select detects which sockets are ready (while detecting, it waits, i.e. it is blocked)


select is better than blocking IO, because select can monitor multiple sockets
select shows its advantage when there are multiple connections
But when you have many sockets, how do you know which one is ready? You have to loop over and traverse them,
so when the number is large, it is not efficient.

epoll: supported only on Linux (it solves the problem of select's low efficiency)
epoll is more efficient than poll and select


The selectors module is easier to use and smooths over the select/poll/epoll differences described above (it picks the best mechanism available)

With the socketserver module, the IO problem is handled and concurrency is implemented as well
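A minimal sketch of the socketserver approach mentioned above, not from the original article: ThreadingTCPServer spawns a thread per connection, so the framework takes care of the concurrency. The handler name EchoHandler and the port are illustrative choices.

# socketserver sketch: ThreadingTCPServer gives each connection its own thread,
# so one client's blocking IO does not stall the others.
import socketserver

class EchoHandler(socketserver.BaseRequestHandler):
    def handle(self):
        while True:
            data = self.request.recv(1024)  # self.request is the client socket
            if not data:
                break
            self.request.send(data.upper())

if __name__ == '__main__':
    socketserver.ThreadingTCPServer.allow_reuse_address = True
    with socketserver.ThreadingTCPServer(('127.0.0.1', 8084), EchoHandler) as srv:
        srv.serve_forever()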
