Issues left over from the previous section:
Coroutines switch away on an IO operation.
But when do you switch back? How do you know the IO is done?
I. Introduction to the event-driven model
Typically, when we write a server program, there are several models for handling requests:
(1) Each time a request is received, create a new process to handle it;
(2) Each time a request is received, create a new thread to handle it;
(3) Each time a request is received, put it into an event list and let the main process handle all requests through non-blocking I/O.
The third model is the coroutine-style, event-driven approach, and it is generally accepted that model (3) is the one most web servers use.
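Model (2) can be sketched in a few lines. The example below is a minimal, illustrative echo server (the echo logic stands in for "processing the request", and port 0 lets the OS pick a free port so the demo is self-contained):

```python
import socket
import threading

# A minimal sketch of model (2): spawn a new thread per accepted connection.
def handle(conn):
    data = conn.recv(1024)
    conn.send(data)          # "process" the request by echoing it back
    conn.close()

server = socket.socket()
server.bind(('localhost', 0))   # port 0: the OS picks a free port
server.listen(5)
port = server.getsockname()[1]

client = socket.socket()
client.connect(('localhost', port))
conn, addr = server.accept()
worker = threading.Thread(target=handle, args=(conn,))
worker.start()               # each request gets its own thread

client.send(b"hello")
reply = client.recv(1024)
print(reply)                 # b'hello'

worker.join()
client.close()
server.close()
```

With many concurrent clients this model pays the cost of one thread per connection, which is exactly what model (3) avoids.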
The event-driven model
In UI programming, we often need to respond to mouse clicks. How do we get a mouse click in the first place?
Mode one: create a thread that loops forever, polling for mouse clicks. This approach has several drawbacks:
- It wastes CPU. Mouse clicks may be infrequent, but the scanning thread still loops constantly, wasting a lot of CPU. And what if the call that scans for a mouse click blocks?
- If it blocks, another problem appears: if we need to scan not only for mouse clicks but also for key presses, and the mouse scan is blocked, we may never get around to scanning the keyboard.
- If one loop has to scan many devices, response time suffers.
So this approach is very bad.
Mode two: the event-driven model.
Most UI programming today is event-driven; for example, many UI platforms provide an OnClick() event, which represents a mouse-click event. The event-driven model works roughly as follows:
- There is an event (message) queue;
- When the mouse is pressed, a click event (message) is added to this queue;
- A loop continuously takes events off the queue and calls different handler functions, such as onclick() or OnKeyDown(), depending on the event;
- Events (messages) typically carry their own handler pointers, so each message has its own handler function.
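The loop described above can be sketched in a few lines of Python (the handler names and queue contents are illustrative, not from any real UI toolkit):

```python
import queue

# A minimal sketch of the event loop: each event carries its own handler,
# so the loop just takes events off the queue and dispatches.
events = queue.Queue()

def on_click():
    return "clicked"

def on_key_down():
    return "key pressed"

# A producer (e.g. the windowing system) would enqueue events like this:
events.put(on_click)
events.put(on_key_down)

results = []
while not events.empty():
    handler = events.get()      # take the next event (message) off the queue
    results.append(handler())   # call its handler function

print(results)   # ['clicked', 'key pressed']
```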
Event-driven programming is a programming paradigm in which the program's flow of execution is determined by external events. It is characterized by an event loop that uses a callback mechanism to trigger the corresponding handler when an external event occurs. Two other common programming paradigms are (single-threaded) synchronous programming and multithreaded programming.
Let's use an example to compare and contrast the single-threaded, multithreaded, and event-driven programming models. Consider a program with 3 tasks to complete, each of which blocks itself while waiting for an I/O operation; in the original timing figure, the time spent blocked on I/O is marked with a gray box.
Back to the initial problem: how do we know the IO operation is finished so we can switch back? Through a callback function.
II. select/poll/epoll: "asynchronous IO" == IO multiplexing
Earlier we relied on blocking IO as the trigger for automatic switching. So how are coroutines actually implemented, and what is the underlying principle? How do we implement automatic switching on blocking IO in an event-driven setting, and what is this technique formally called? == IO multiplexing.
For example, with SocketServer, multiple clients connect and a single thread achieves a concurrency effect; this is called multiplexing.
What is the difference between synchronous IO and asynchronous IO? What are blocking IO and non-blocking IO, respectively? Different people in different contexts give different answers, so let's first fix the context of this article: network IO in a Linux environment.
1. Blocking IO, non-blocking IO, synchronous IO, asynchronous IO: an introduction
Before explaining them, a few concepts need to be introduced:
- User space and kernel space
- Process switching
- Blocking of processes
- File descriptor
- Cache I/O
User space and kernel space
Modern operating systems use virtual memory. On a 32-bit operating system, the addressing space (virtual address space) is 4G (2^32 bytes).
The core of the operating system is the kernel, which is independent of ordinary applications; it can access the protected memory space and has full permission to access the underlying hardware devices.
To ensure that user processes cannot directly manipulate the kernel, and to keep the kernel secure, the operating system divides the virtual address space into two parts: kernel space and user space.
For the Linux operating system, the highest 1G bytes (virtual addresses 0xC0000000 to 0xFFFFFFFF) are used by the kernel and are called kernel space, while the lower 3G bytes (virtual addresses 0x00000000 to 0xBFFFFFFF) are used by each process and are called user space.
Process Switching
To control the execution of processes, the kernel must be able to suspend a process running on the CPU and resume execution of a previously suspended process. This behavior is called process switching. So it can be said that any process runs with the support of the operating system kernel and is closely related to it.
The switch from one process to another goes through the following steps:
- Save the processor context, including the program counter and other registers.
- Update the PCB (process control block) information.
- Move the process's PCB into the appropriate queue, such as the ready queue or the blocking queue for some event.
- Select another process to execute and update its PCB.
- Update the memory-management data structures.
- Restore the processor context.
Note: in short, process switching is very expensive.
Blocking of processes
When an executing process finds that some expected event has not occurred, such as a failed request for a system resource, an operation it is waiting on to complete, new data that has not yet arrived, or simply no new work to do, the system automatically executes the blocking primitive (block), changing the process from the running state to the blocked state. Blocking is therefore an active behavior of the process itself, and only a running process (one holding the CPU) can enter the blocked state. A process in the blocked state consumes no CPU resources.
File Descriptor fd
A file descriptor is a term from computer science: an abstraction used to refer to a file.
Formally, a file descriptor is a non-negative integer. It is in fact an index into a per-process table, maintained by the kernel, recording the files that process has opened. When a program opens an existing file or creates a new one, the kernel returns a file descriptor to the process. Much low-level programming revolves around file descriptors. The concept, however, generally applies only to operating systems such as UNIX and Linux.
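A small illustration (not from the original article): Python's os.open() exposes exactly this integer, and low-level IO operates on it directly. The temporary file and its name are assumptions for the demo:

```python
import os
import tempfile

# os.open() returns the raw file descriptor: a non-negative integer that
# indexes this process's table of open files.
with tempfile.TemporaryDirectory() as d:
    fd = os.open(os.path.join(d, "demo.txt"), os.O_CREAT | os.O_WRONLY)
    print(fd, isinstance(fd, int))   # e.g. 3 True
    os.write(fd, b"hello")           # low-level I/O works on the descriptor itself
    os.close(fd)
```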
cache I/O
Cache I/O is also known as standard I/O, and the default I/O operations of most file systems are cache I/O. In the Linux cache I/O mechanism, the operating system caches I/O data in the file system's page cache: data is first copied into the kernel's buffer and then copied from the kernel buffer into the application's address space. Since user space cannot access kernel space directly, this kernel-to-user data copy is unavoidable.
Disadvantage of cache I/O:
Data must be copied multiple times between the application address space and the kernel during transfer, and the CPU and memory overhead of these copy operations is considerable.
2. IO modes
As noted above, for an IO access (take read as an example), data is first copied into the operating system kernel buffer and then from the kernel buffer into the application's address space. So when a read operation occurs, it goes through two stages:
- Waiting for the data to be ready
- Copying the data from the kernel to the process
Precisely because of these two stages, Linux offers the following five network IO models:
- Blocking I/O (blocking IO)
- Non-blocking I/O (nonblocking IO)
- I/O multiplexing (IO multiplexing)
- Signal-driven I/O (signal driven IO)
- Asynchronous I/O (asynchronous IO)
Note: since signal-driven IO is rarely used in practice, this article covers only the remaining four IO models.
1) Blocking I/O (blocking IO)
In Linux, all sockets are blocking by default, and a typical read operation flows roughly like this:
When the user process invokes the recvfrom system call, the kernel begins the first phase of IO: preparing the data (for network IO, the data often has not arrived yet; for example, a complete UDP packet has not been received, so the kernel must wait for enough data to arrive). This waiting takes time while data is gathered into the kernel's buffer. On the user-process side, the whole process is blocked (by its own choice, of course). When the kernel has the data ready, it copies the data from the kernel into user memory and returns the result; the user process then leaves the blocked state and runs again.
Therefore, blocking IO is characterized by blocking during both stages of IO execution.
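A tiny demonstration of this default behavior: with blocking sockets, recv() simply waits until the kernel has data. (socketpair() gives a connected pair of sockets, so the demo is self-contained.)

```python
import socket

# Sockets are blocking by default: recv() below would wait forever if the
# peer never sent anything.
a, b = socket.socketpair()
b.send(b"data")
msg = a.recv(1024)   # blocks until the kernel has copied the data in
print(msg)           # b'data'
a.close()
b.close()
```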
2) Non-blocking I/O (nonblocking IO)
Under Linux, a socket can be set to be non-blocking. A read operation on a non-blocking socket goes like this:
When the user process issues a read, if the data in the kernel is not yet ready, the call does not block the user process but returns an error immediately. From the user process's point of view, it initiates a read and gets a result immediately instead of waiting. When it sees that the result is an error, it knows the data is not ready, so it can issue the read again. Once the data in the kernel is ready and a system call from the user process arrives again, the kernel immediately copies the data into user memory and returns.
Therefore, non-blocking IO is characterized by the user process constantly and actively asking the kernel whether the data is ready.
This makes multiple concurrent connections possible, but the copy from kernel space to user space still blocks.
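The "ask, get an error, retry" pattern just described looks like this in Python (socketpair() and the payload are assumptions for a self-contained demo):

```python
import select
import socket

# On a non-blocking socket, recv() returns an error immediately instead of
# waiting for data.
a, b = socket.socketpair()
a.setblocking(False)

try:
    a.recv(1024)              # kernel has no data yet
    got_error = False
except BlockingIOError:
    got_error = True          # "data not ready" -- the caller retries later
print(got_error)              # True

b.send(b"ping")
select.select([a], [], [], 1)  # wait (up to 1s) until the kernel reports data
msg = a.recv(1024)             # the retried read now succeeds immediately
print(msg)                     # b'ping'
```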
3) I/O multiplexing (IO multiplexing)
IO multiplexing is what we call select, poll, and epoll; in some places this IO mode is also called event-driven IO. The benefit of select/epoll is that a single process can handle the IO of multiple network connections at once. The basic principle is that select/poll/epoll continuously polls all the sockets it is responsible for, and when data arrives on some socket, it notifies the user process.
When the user process calls select, the whole process is blocked; meanwhile, the kernel "monitors" all the sockets select is responsible for, and when the data in any one of them is ready, select returns. The user process then calls the read operation to copy the data from the kernel into the user process.
Therefore, IO multiplexing is characterized by a mechanism in which one process can wait on multiple file descriptors at once: as soon as any of those file descriptors (socket descriptors) becomes read-ready, select() returns.
4) asynchronous I/O (asynchronous IO)
Asynchronous IO under Linux is actually used very little. Its flow looks like this:
After the user process initiates the read operation, it can immediately go do other things. From the kernel's perspective, when it receives an asynchronous read it returns immediately, so the user process is never blocked. The kernel then waits for the data to be ready and copies it into user memory; when all of this is done, the kernel sends a signal to the user process telling it the read operation is complete.
An analogy: after shopping online, you go straight off to do other things, and the courier later delivers the package to your door.
3. Summary
The difference between blocking IO and non-blocking IO:
- A blocking call blocks the corresponding process until the operation completes.
- Non-blocking IO returns immediately even while the kernel is still preparing the data.
The difference between synchronous IO and asynchronous IO:
- Synchronous IO blocks the process during the "IO operation". Blocking IO, non-blocking IO, and IO multiplexing are all synchronous IO.
- Asynchronous IO is different: when the process initiates an IO operation, the call returns directly and the process pays no further attention until the kernel sends a signal telling it the IO is complete. Throughout this whole process, the process is never blocked.
4. select, poll, epoll: IO multiplexing introduction
First, the differences among the three: select, poll, and epoll.
select
select first appeared in 4.2BSD in 1983. Through a single select() system call it monitors an array of multiple file descriptors; when select() returns, the kernel has modified the flag bits of the ready file descriptors in the array, so the process can find them and perform the subsequent read and write operations.
select is currently supported on almost all platforms.
One disadvantage of select is that the maximum number of file descriptors a single process can monitor is limited (1024 on Linux), though this can be raised by modifying a macro definition or even recompiling the kernel.
In addition, the data structure select() maintains holds a large number of file descriptors, and as their number grows, the cost of copying it grows linearly. At the same time, network latency leaves many TCP connections inactive, yet calling select() still linearly scans all the sockets, which also wastes some overhead.
poll
It is not substantially different from select in nature, but poll has no limit on the maximum number of file descriptors.
It is generally not used; it is effectively a transitional stage.
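For completeness, a short sketch of the poll() interface (socketpair() is an assumption for the demo; note poll() is unavailable on Windows). Unlike select(), descriptors are registered individually, so there is no FD_SETSIZE-style cap:

```python
import select
import socket

a, b = socket.socketpair()
p = select.poll()
p.register(a.fileno(), select.POLLIN)  # interested in "readable" events

empty = p.poll(0)       # nothing to read yet: []
print(empty)
b.send(b"x")
events = p.poll(1000)   # [(fd, POLLIN)] once data arrives (timeout in ms)
print(events)
```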
epoll
Not until Linux 2.6 did the kernel directly support this implementation, namely epoll, a multiplexed I/O readiness notification method considered the best-performing under Linux 2.6. Windows does not support it.
There is no limit on the maximum number of file descriptors.
For example, with 100 connections of which two are active, epoll tells the user process exactly which two are active so it can take them directly, whereas select loops over all of them.
(For understanding) epoll supports both level triggering and edge triggering. Edge triggering tells the process only which file descriptors have just become ready; it says so only once, and if we take no action it will not tell us again. Edge triggering theoretically performs better, but the code implementation is considerably more complex.
Another essential improvement is epoll's event-based readiness notification. With select/poll, the kernel scans all monitored file descriptors only after a certain method is called, whereas epoll registers a file descriptor beforehand with epoll_ctl(); once that file descriptor becomes ready, the kernel uses a callback-like mechanism to activate it quickly, and the process is notified when it calls epoll_wait().
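The register-then-wait pattern maps directly onto Python's select.epoll (Linux only; socketpair() is an assumption for the demo). register() corresponds to epoll_ctl() and poll() to epoll_wait(), and only the descriptors that became ready are reported:

```python
import select
import socket

a, b = socket.socketpair()
ep = select.epoll()
ep.register(a.fileno(), select.EPOLLIN)  # level-triggered by default;
                                         # OR in select.EPOLLET for edge-triggered

b.send(b"x")
events = ep.poll(1)   # only active descriptors come back (timeout in seconds)
print(events)         # [(a's fd, EPOLLIN)]

ep.unregister(a.fileno())
ep.close()
```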
So the so-called asynchronous IO of Nginx, Tornado, and the like, which we casually call asynchronous IO, is actually IO multiplexing.
The real asynchronous IO module, added in Python 3, is called asyncio.
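A minimal asyncio sketch (task names and delays are illustrative): the event loop switches to another coroutine at each await, which is exactly the "switch on IO, come back when it's done" idea from the start of this article:

```python
import asyncio

async def task(name, delay):
    await asyncio.sleep(delay)   # stands in for a real IO wait
    return name

async def main():
    # both "IO waits" overlap, so the total time is about 0.1s, not 0.2s
    return await asyncio.gather(task("a", 0.1), task("b", 0.1))

results = asyncio.run(main())
print(results)   # ['a', 'b']
```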
When reprinting, please keep this source: http://blog.csdn.net/fgf00/article/details/52793739
5. select IO multiplexing code example
Using select to simulate a socket server; note that for IO multiplexing the sockets must be non-blocking.
Here is an example of how select handles multiple non-blocking socket connections simultaneously in a single process.
Server side
```python
import select
import socket
import queue

server = socket.socket()
server.bind(('localhost', 9000))
server.listen(100)
server.setblocking(False)   # non-blocking mode: accept and recv will not block
# If we called server.accept() directly with no pending connection, it would raise
# BlockingIOError: a non-blocking socket operation could not be completed immediately.

msg_dic = {}        # one queue per connection, holding data to send back
inputs = [server]   # hand this list to the kernel for select() to monitor;
                    # it must hold at least the server socket itself --
                    # activity on it means a new incoming connection
outputs = []        # whatever you put here is reported as writable next round

while True:
    # block until some monitored socket is readable, writable, or broken;
    # passing inputs as the third argument also detects broken connections
    readable, writeable, exceptional = select.select(inputs, outputs, inputs)
    print(readable, writeable, exceptional)

    for r in readable:
        if r is server:                 # activity on the listener: a new connection
            conn, addr = server.accept()
            print("A new connection:", addr)
            inputs.append(conn)         # monitor it; activity now means data arrived
            msg_dic[conn] = queue.Queue()  # init a queue for this client's replies
        else:
            try:
                data = r.recv(1024)     # note: r, not conn -- there may be many connections
                print("Receive data:", data)
                # do not send directly here: if the client cannot receive yet,
                # the data would be lost; queue it and send when writable
                msg_dic[r].put(data)
                outputs.append(r)       # put it in the "to be answered" list
            except ConnectionResetError:
                print("Client disconnected:", r)
                if r in outputs:
                    outputs.remove(r)   # clean up the broken connection
                inputs.remove(r)
                del msg_dic[r]

    for w in writeable:                 # connections ready to get their reply
        data_to_client = msg_dic[w].get()   # fetch the queued data
        w.send(data_to_client)              # return it to the client
        outputs.remove(w)   # make sure the next loop does not report it again

    for e in exceptional:   # the connection broke: delete everything related to it
        if e in outputs:
            outputs.remove(e)
        inputs.remove(e)
        del msg_dic[e]
```
Client
```python
import socket

client = socket.socket()
client.connect(('localhost', 9000))
while True:
    cmd = input('>>> ').strip()
    if len(cmd) == 0:
        continue
    client.send(cmd.encode('utf-8'))
    data = client.recv(1024)
    print(data.decode())
client.close()
```
6. The selectors module
The selectors module wraps epoll and select; it uses epoll by default, and if the machine does not support epoll (Windows, for example, does not), it falls back to select.
Server side
```python
import selectors
import socket

sel = selectors.DefaultSelector()

def accept(sock, mask):
    conn, addr = sock.accept()   # accept the new connection
    print('accepted', conn, 'from', addr)
    conn.setblocking(False)      # set the connection to non-blocking mode
    # register conn with the sel object: data arriving on it calls read()
    sel.register(conn, selectors.EVENT_READ, read)

def read(conn, mask):
    data = conn.recv(1024)       # receive data
    if data:
        print('echoing', repr(data), 'to', conn)
        conn.send(data)          # hope it won't block
    else:
        print('closing', conn)
        sel.unregister(conn)     # cancel the registration
        conn.close()

sock = socket.socket()
sock.bind(('localhost', 9000))
sock.listen(100)
sock.setblocking(False)
# register the listener: a new incoming connection calls accept()
sel.register(sock, selectors.EVENT_READ, accept)

while True:
    # may call epoll or select, depending on what the system supports;
    # blocks by default and returns the list of active registrations
    events = sel.select()
    for key, mask in events:
        callback = key.data            # the registered callback (accept or read)
        callback(key.fileobj, mask)    # key.fileobj is the socket being monitored
```
Python (10), part 2: the event-driven model, blocking IO, non-blocking IO, IO multiplexing, and asynchronous IO