Python Basics: I/O Models
I. I/O model
IO stands for Input/Output. A program and its runtime data reside in memory and are executed by the CPU, which is extremely fast. Whenever data has to be exchanged with the outside world, usually a disk or the network, an IO interface is needed.
What is the difference between synchronous and asynchronous IO, and between blocking and non-blocking IO?
Different people may answer this question differently. For example, Wikipedia treats asynchronous IO and non-blocking IO as the same thing. This is because people have different backgrounds and discuss the question in different contexts. To give a clear answer, we first need to fix the context of this article.
The context of this article is network IO on Linux.
In UNIX Network Programming, W. Richard Stevens compares five IO models:
- Blocking IO
- Non-blocking IO
- IO multiplexing
- Asynchronous IO
- Signal-driven IO
Because signal driven IO is not commonly used in practice, I will only mention the remaining four IO models.
Let's talk about the objects and steps involved when I/O occurs.
A network IO operation (we use read as the example) involves two system objects: the process (or thread) that calls the IO, and the system kernel. A read operation goes through two phases:
- Phase 1: waiting for the data to be ready.
- Phase 2: copying the data from the kernel into the process.
Keep these two phases in mind, because the IO models differ precisely in how they behave during each one.
II. Blocking IO
On Linux, all sockets are blocking by default. A typical read on a blocking socket proceeds as follows.
When the user process makes the recvfrom system call, the kernel starts the first phase of IO: preparing the data. For network IO the data often has not arrived yet (for example, a complete UDP packet has not been received), so the kernel waits for enough data to arrive. Meanwhile, the entire user process is blocked. Once the data is ready, the kernel copies it from kernel space into user memory and returns the result. Only then does the user process leave the blocked state and run again.
So the defining feature of blocking IO is that the process is blocked in both phases of the IO operation.
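To make the two blocked phases visible from Python, here is a minimal sketch. It is not from the original article. It uses `socket.socketpair()` in place of a real network connection, and a `threading.Timer` to deliver the data late. The helper name `blocking_read_demo` is hypothetical. The point is that `recv()` does not return until the data has arrived and has been copied out:

```python
import socket
import threading
import time


def blocking_read_demo(delay=0.5):
    """Return the bytes read and roughly how long recv() was blocked."""
    # socketpair() returns two connected sockets; like all sockets they are
    # blocking by default.
    reader, writer = socket.socketpair()
    try:
        # Another thread sends the data after `delay` seconds, so the data
        # has not "arrived" yet when recv() is called (phase 1 must wait).
        sender = threading.Timer(delay, writer.sendall, args=(b"hello",))
        sender.start()
        start = time.monotonic()
        data = reader.recv(1024)  # the caller stays blocked through both phases
        elapsed = time.monotonic() - start
        sender.join()
        return data, elapsed
    finally:
        reader.close()
        writer.close()


if __name__ == "__main__":
    data, elapsed = blocking_read_demo()
    print(data, f"(recv blocked for about {elapsed:.2f}s)")
```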
III. Non-blocking IO
On Linux, a socket can be set to non-blocking mode. A read on a non-blocking socket proceeds as follows.
Suppose the user process issues a read while the data in the kernel is not ready yet. The call does not block the process; it returns an error immediately. From the process's point of view, it gets a result right away instead of waiting. When it sees that the result is an error, it knows the data is not ready and can issue the read again. Once the data in the kernel is ready and the process makes the system call again, the kernel immediately copies the data into user memory and returns. In other words, the user process has to keep actively asking the kernel whether the data is ready.
Note:
In network IO, non-blocking IO still calls the recvfrom system call to check whether the data is ready. The difference from blocking IO is that non-blocking IO "splits one long block into many small ones," so the process repeatedly gets a chance to run on the CPU. Between recvfrom calls, control stays with the process, which can do other work in the meantime.
After a non-blocking recvfrom call, the kernel returns to the process immediately instead of blocking it. If the data is not ready, an error is returned. The process can then do something else and call recvfrom again. Calling recvfrom repeatedly in a loop like this is usually called polling: the process keeps checking the kernel until the data is ready, and then the data is copied into the process for processing. Note that the copy itself still blocks the process.
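Below is a minimal sketch of this polling loop. A connected pair of local sockets from `socket.socketpair()` stands in for a network connection, and `poll_until_ready` is a hypothetical helper. In Python, the error the kernel returns for a not-ready non-blocking socket surfaces as `BlockingIOError` (EAGAIN/EWOULDBLOCK):

```python
import socket
import threading
import time


def poll_until_ready(sock, interval=0.05):
    """Poll a non-blocking socket until recv() succeeds.

    Returns (data, attempts); `attempts` counts how many recv() calls were
    made, i.e. how often the process got control back in between.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            # Once the data is ready, phase 2 (the copy out of the kernel)
            # happens inside this call, which blocks for that short time.
            return sock.recv(1024), attempts
        except BlockingIOError:
            # Phase 1 not finished: the kernel returned an error at once.
            # The process is free to do other work here before retrying.
            time.sleep(interval)


if __name__ == "__main__":
    reader, writer = socket.socketpair()
    reader.setblocking(False)  # recv() on `reader` never waits for data
    threading.Timer(0.3, writer.sendall, args=(b"hello",)).start()
    data, attempts = poll_until_ready(reader)
    print(data, "after", attempts, "attempts")
    reader.close()
    writer.close()
```

The `time.sleep(interval)` stands in for "doing something else". The trade-off is latency: data that arrives just after a failed attempt waits up to one interval before it is noticed.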
Example:

```python
import time
import socket

sk = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sk.bind(('127.0.0.1', 8080))
sk.listen(5)
sk.setblocking(False)  # Set the socket to non-blocking mode

while True:
    try:
        print('waiting for client connection .......')
        connection, address = sk.accept()  # The process actively polls
        connection.setblocking(True)  # make sure recv() waits for the client's data
        print("++", address)
        client_message = connection.recv(1024)
        print(str(client_message, 'utf8'))
        connection.close()
    except BlockingIOError as e:  # no client yet: accept() failed immediately
        print(e)
        time.sleep(4)

############################ client

import time
import socket

sk = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

while True:
    sk.connect(('127.0.0.1', 8080))
    print("hello")
    sk.sendall(bytes("hello", "utf8"))
    time.sleep(2)
    break
```
In the example above, the server polls every 4 seconds. If no client has connected yet, accept() fails immediately with an error, which is printed, and polling continues.
IV. IO multiplexing
IO multiplexing may be an unfamiliar term; it is sometimes also called event-driven IO. The benefit of select/epoll is that a single process can handle IO on many network connections at the same time. The basic principle is that select/epoll monitors all the sockets it is responsible for and notifies the user process when any of them has data. The flow is as follows.
When the user process calls select, the whole process blocks. Meanwhile, the kernel monitors all the sockets passed to select, and select returns as soon as data is ready on any of them. The user process then calls read (recvfrom) to copy the data from the kernel into user space.
This flow is not very different from blocking IO. In fact it is slightly worse, because it needs two system calls (select and recvfrom) where blocking IO needs only one (recvfrom). The advantage of select is that it can handle many connections at once. So if the number of connections is not very high, a web server using select/epoll will not necessarily perform better than one using multithreading plus blocking IO, and its latency may even be higher. The advantage of select/epoll is not that it handles a single connection faster, but that it can handle more connections.
In practice, each socket in the IO multiplexing model is usually set to non-blocking. Even so, the user process as a whole is still blocked. It is blocked by the select call rather than by socket IO.
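The sketch below shows the key property. It uses two local `socketpair()` connections rather than real clients, and `ready_sockets` is a hypothetical name. A single `select()` call watches several sockets, so the process blocks in select rather than in recv. Only the sockets that are actually ready come back for the recv (copy) step:

```python
import select
import socket


def ready_sockets():
    """Watch two connections with one select() call and report which was readable."""
    a_reader, a_writer = socket.socketpair()
    b_reader, b_writer = socket.socketpair()
    try:
        for s in (a_reader, b_reader):
            s.setblocking(False)  # as is usual with multiplexing
        b_writer.sendall(b"data for b")  # only connection "b" receives data
        # Phase 1: the process blocks here, in select, not in recv.
        # The 1.0 s timeout only keeps the demo from hanging forever.
        readable, _, _ = select.select([a_reader, b_reader], [], [], 1.0)
        names = ["a" if s is a_reader else "b" for s in readable]
        # Phase 2: copy the data out of the kernel for each ready socket.
        payloads = [s.recv(1024) for s in readable]
        return names, payloads
    finally:
        for s in (a_reader, a_writer, b_reader, b_writer):
            s.close()


if __name__ == "__main__":
    print(ready_sockets())
```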
Example:

```python
import select
import socket

sock = socket.socket()
sock.bind(('127.0.0.1', 8080))
sock.listen(5)

sock.setblocking(False)
listen_obj = [sock]

while True:
    r, w, e = select.select(listen_obj, [], [])  # block until a socket is readable

    for obj in r:
        if obj is sock:          # listening socket: a new client is connecting
            conn, addr = obj.accept()
            print('conn', conn)
            print('addr', addr)
            listen_obj.append(conn)
        else:                    # client socket: it has sent data
            data = obj.recv(1024)
            if not data:         # empty read: the client closed the connection
                listen_obj.remove(obj)
                obj.close()
                continue
            print(data.decode('utf8'))
            send_data = input('>>>')
            obj.send(send_data.encode('utf8'))

############################# client

import socket

sock = socket.socket()
sock.connect(('127.0.0.1', 8080))

while True:
    data = input('>>>')
    if data == 'q':              # type q to quit (added so close() is reachable)
        break
    sock.send(data.encode('utf8'))
    recv_data = sock.recv(1024)
    print(recv_data.decode('utf8'))
sock.close()
```
In the example above, select has the kernel monitor every socket object in listen_obj. When any of them becomes readable, the server checks which one it is. If it is the listening socket, a new connection is accepted and added to listen_obj. Otherwise, data is received from that client connection.
V. Asynchronous IO
Asynchronous IO is rarely used on Linux. Its flow is as follows.
After the user process initiates the read, it can immediately go on to do other things. On the kernel's side, an asynchronous read request returns immediately, so the user process is never blocked. The kernel then waits for the data to be ready and copies it into user memory. When all of this is done, the kernel sends a signal to the user process to tell it that the read has completed.
Because the process is not blocked in either phase, asynchronous IO can make better use of the process's time than synchronous IO. Its drawback is a more complicated programming model.
VI. IO model comparison
Four IO models have now been introduced. Let's return to the questions raised at the beginning: what is the difference between blocking and non-blocking IO, and what is the difference between synchronous and asynchronous IO?
Start with the simpler one: blocking vs. non-blocking. The previous sections already describe the difference clearly. A blocking IO call blocks the process until the operation completes, whereas a non-blocking IO call returns immediately while the kernel is still preparing the data.
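In Python the two behaviours are a per-socket setting. The socket module documents that `setblocking(False)` is equivalent to `settimeout(0.0)` and that `setblocking(True)` is equivalent to `settimeout(None)`. This small sketch just reports the flags. `blocking_modes` is a hypothetical helper, and `getblocking()` requires Python 3.7+:

```python
import socket


def blocking_modes():
    """Report (getblocking(), gettimeout()) for one socket in each mode."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # New sockets are blocking unless socket.setdefaulttimeout() was called.
        modes = {"default": (s.getblocking(), s.gettimeout())}
        s.setblocking(False)  # equivalent to s.settimeout(0.0)
        modes["non-blocking"] = (s.getblocking(), s.gettimeout())
        s.setblocking(True)   # equivalent to s.settimeout(None)
        modes["blocking"] = (s.getblocking(), s.gettimeout())
        return modes
    finally:
        s.close()


if __name__ == "__main__":
    for mode, flags in blocking_modes().items():
        print(mode, flags)
```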
Before explaining the difference between synchronous and asynchronous IO, we need to define them. Stevens gives the following definitions (which are in fact the POSIX definitions):
- A synchronous I/O operation causes the requesting process to be blocked until that I/O operation completes;
- An asynchronous I/O operation does not cause the requesting process to be blocked;
The difference is that synchronous IO blocks the process while the "IO operation" is performed. By this definition, the blocking IO, non-blocking IO, and IO multiplexing described above are all synchronous IO. Some may object that non-blocking IO is not blocked. This is the tricky point: the "IO operation" in the definition means the actual IO operation, i.e. the recvfrom system call in the examples. When non-blocking IO calls recvfrom and the kernel data is not ready, the process is not blocked. But once the data is ready, recvfrom copies it from the kernel into user memory, and the process is blocked during that copy. Asynchronous IO is different. When the process initiates an IO operation, the call returns immediately. The process does not deal with the operation again until the kernel sends a signal saying the IO is complete. Throughout, the process is never blocked.
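Comparison of IO models, phase by phase (phase 1: waiting for data / phase 2: copying data to the process):
- Blocking IO: blocked / blocked
- Non-blocking IO: not blocked, the process polls with repeated recvfrom calls / blocked
- IO multiplexing: blocked in select / blocked in recvfrom
- Asynchronous IO: not blocked / not blocked; the kernel signals when the whole operation is done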
From the descriptions above, the difference between non-blocking IO and asynchronous IO is clear. With non-blocking IO, the process is not blocked most of the time, but it still has to check actively whether the data is ready. Once the data is ready, the process must also call recvfrom again itself to copy the data into user memory. Asynchronous IO is entirely different. The user process hands the whole IO operation over to someone else (the kernel), who sends a signal when it is finished. In the meantime, the user process neither checks the status of the IO operation nor copies the data itself.
VII. The selectors module
This module allows high-level and efficient I/O multiplexing, built upon the select module primitives. Users are encouraged to use this module instead, unless they want precise control over the OS-level primitives used.
selectors wraps the lower-level select primitives and makes efficient IO multiplexing easy to use. It is the recommended choice.
Server:

```python
import selectors
import socket

sock = socket.socket()
sock.bind(('127.0.0.1', 8080))
sock.listen(5)
sock.setblocking(False)
# DefaultSelector picks the most efficient mechanism available on the platform
# (epoll | kqueue | devpoll > poll > select); on Linux it is epoll.
sel = selectors.DefaultSelector()


def read(conn, mask):
    try:
        data = conn.recv(1024)
        if data:
            print(data.decode('utf8'))
            re_data = input('>>>')
            conn.send(re_data.encode('utf8'))
            return
    except ConnectionError:
        pass
    # Empty read or connection error: the client is gone, so stop watching it.
    sel.unregister(conn)  # revoke the event registration
    conn.close()


def accept(sock, mask):
    conn, addr = sock.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, read)  # when conn is readable, call read


sel.register(sock, selectors.EVENT_READ, accept)  # when sock is readable, call accept

while True:
    print('waiting...')
    events = sel.select()  # block until at least one registered socket is ready
    for key, mask in events:
        func = key.data    # the callback: accept or read
        obj = key.fileobj  # the socket: sock or conn
        func(obj, mask)    # accept(sock, mask) or read(conn, mask)
```
Client:

```python
import socket

sock = socket.socket()
sock.connect(('127.0.0.1', 8080))

while True:
    data = input('>>>')
    if data == 'q':  # type q to quit (added so close() is reachable)
        break
    sock.send(data.encode('utf8'))
    recv_data = sock.recv(1024)
    print(recv_data.decode('utf8'))
sock.close()
```
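The callback-dispatch pattern in the server, which stores the handler in `key.data` and calls it with `key.fileobj`, can be exercised without a network or a keyboard. This is a sketch under stated assumptions. Two `socketpair()` connections stand in for clients, and `dispatch_demo` and `on_read` are hypothetical names. An empty `recv()` result is treated as the peer closing, which is when the socket is unregistered:

```python
import selectors
import socket


def dispatch_demo():
    """Dispatch read events to a callback stored in key.data; return what was read."""
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS, ...
    received = []

    def on_read(conn, mask):
        data = conn.recv(1024)
        if data:
            received.append(data)
        else:
            # Empty read: the peer closed its end, so stop watching this socket.
            sel.unregister(conn)
            conn.close()

    pairs = [socket.socketpair() for _ in range(2)]
    for reader, _ in pairs:
        reader.setblocking(False)
        sel.register(reader, selectors.EVENT_READ, on_read)  # callback -> key.data

    for (_, writer), payload in zip(pairs, (b"one", b"two")):
        writer.sendall(payload)
        writer.close()  # the reader will see EOF after the payload

    # Keep dispatching until every reader has been unregistered.
    while sel.get_map():
        for key, mask in sel.select(timeout=1.0):
            key.data(key.fileobj, mask)  # the same func(obj, mask) call as in the server
    sel.close()
    return sorted(received)


if __name__ == "__main__":
    print(dispatch_demo())
```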