Python - IO Model

Source: Internet
Author: User
Tags: epoll

First, the IO models:

1. blocking IO
2. non-blocking IO
3. IO multiplexing
4. signal-driven IO
5. asynchronous IO

Second, blocking IO

In Linux, all sockets are blocking by default.

Blocking IO is characterized by the fact that both phases of IO execution (waiting for the data and copying the data) block.

Blocking interface: a system call (typically an IO call) that does not return a result, keeping the current thread blocked, until the call either obtains a result or hits a timeout error.

A "thread pool" or "connection pool" may relieve some of the pressure, but it will not solve all the problems. In short, the multithreaded model can easily and efficiently handle small-scale service requests, but facing large-scale requests it hits a bottleneck; non-blocking interfaces can then be tried as a way around the problem.
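As a minimal sketch of the default blocking model (everything here is hypothetical for illustration: the port 8070 is arbitrary, and the client runs in a thread of the same process so the example is self-contained; accept() and recv() each block the calling thread until the event occurs):

```python
import socket
import threading

result = {}

def client():
    c = socket.socket()
    c.connect(('127.0.0.1', 8070))
    c.send(b'hi')
    result['reply'] = c.recv(1024)   # blocks until the server replies
    c.close()

sk = socket.socket()
sk.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sk.bind(('127.0.0.1', 8070))
sk.listen()

t = threading.Thread(target=client)
t.start()

conn, addr = sk.accept()             # blocks until the client connects
conn.send(conn.recv(1024).upper())   # recv blocks until data arrives
conn.close()
sk.close()
t.join()
print(result['reply'])
```

Each blocked call parks one whole thread, which is exactly why serving many clients this way requires one thread per connection.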

Third, non-blocking IO

Under Linux, a socket can be made non-blocking by setting a flag on it.

In non-blocking IO, the user process must in fact keep proactively asking the kernel whether the data is ready.
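A minimal sketch of what setblocking(False) changes (the local socketpair here is an illustrative assumption; any non-blocking socket behaves the same way): recv raises BlockingIOError immediately instead of waiting.

```python
import select
import socket

a, b = socket.socketpair()
a.setblocking(False)

try:
    a.recv(1024)              # no data yet: raises instead of waiting
    raised = False
except BlockingIOError:
    raised = True
print('would block:', raised)

b.send(b'x')
select.select([a], [], [])    # wait until a is actually readable
data = a.recv(1024)           # now returns immediately with the data
print(data)
```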

# Server
import socket

sk = socket.socket()
sk.bind(('127.0.0.1', 8080))
sk.listen()
sk.setblocking(False)
conn_lst = []
while True:
    try:
        conn, addr = sk.accept()        # non-blocking: only succeeds if a connection is pending
        conn_lst.append(conn)
    except BlockingIOError:
        del_lst = []
        for c in conn_lst:              # executed whenever accept() would block
            try:
                msg = c.recv(10).decode('utf-8')   # recv won't block either
                if not msg:
                    c.close()
                    del_lst.append(c)
                else:
                    print(msg)
                    c.send(msg.upper().encode('utf-8'))
            except BlockingIOError:
                pass
        if del_lst:
            for del_item in del_lst:
                conn_lst.remove(del_item)
Non-blocking IO example: server
# Client
import time
import socket
import threading

def func():
    sk = socket.socket()
    sk.connect(('127.0.0.1', 8080))
    time.sleep(1)
    sk.send(b'hi')
    print(sk.recv(10))
    sk.close()

for i in range(10):
    threading.Thread(target=func).start()
Non-blocking IO example: client

Advantages and disadvantages of non-blocking IO

Pros: other work can be done while waiting for the task to complete (including submitting other tasks; in other words, multiple tasks can make progress "simultaneously" in the "background").

Disadvantages:

1. Calling recv() in a loop drives CPU usage up significantly, which is why a time.sleep(2) is usually left in such code; otherwise it is very easy to freeze a low-end host.

2. The response latency for task completion increases: read operations are only attempted once per polling round, so a task may complete at any point between two polls and sit unnoticed. This reduces overall data throughput.

In addition, in non-blocking IO recv() mostly plays the role of checking whether an "operation is complete". Operating systems actually provide more efficient interfaces for this kind of check, such as select(), a multiplexing mechanism that can check many connections at once.

Fourth, IO multiplexing

When the user process invokes select, the whole process blocks; at the same time, the kernel "monitors" all the sockets select is responsible for, and as soon as the data in any one of them is ready, select returns. The user process then invokes the read operation to copy the data from the kernel to the user process.
This is not much different from blocking IO; in fact it is slightly worse, because two system calls (select and recvfrom) are needed where blocking IO uses only one (recvfrom). The advantage of select, however, is that it can handle multiple connections at the same time.

Two points to emphasize:

1. If the number of connections handled is not high, a web server using select/epoll does not necessarily perform better than one using multi-threading plus blocking IO, and may have higher latency. The advantage of select/epoll is not that a single connection is processed faster, but that many more connections can be handled.

2. In a multiplexing model, each individual socket is generally set to non-blocking; however, the user process is in fact blocked the whole time. It is just blocked in the select call rather than in socket IO.

Conclusion: the strength of select is that it can handle many connections, not that it is faster for a single connection.

# Server
import socket
import select

sk = socket.socket()
sk.bind(('127.0.0.1', 8099))
sk.listen()
read_lst = [sk]
while True:
    # select blocks here; rl = readable, wl = writable, xl = exceptional, e.g. [sk, conn]
    rl, wl, xl = select.select(read_lst, [], [])
    for item in rl:
        if item == sk:
            conn, addr = item.accept()   # a connection is waiting to be accepted
            read_lst.append(conn)
        else:
            ret = item.recv(1024).decode('utf-8')
            if not ret:
                item.close()
                read_lst.remove(item)
            else:
                print(ret)
                item.send(('received %s' % ret).encode('utf-8'))
IO multiplexing example: server
# Client
import time
import socket
import threading

def client_async(args):
    sk = socket.socket()
    sk.connect(('127.0.0.1', 8099))
    for i in range(10):
        time.sleep(2)
        sk.send(('%s[%s]: hello' % (args, i)).encode('utf-8'))
        print(sk.recv(1024))
    sk.close()

for i in range(10):
    threading.Thread(target=client_async, args=('*' * i,)).start()
IO multiplexing example: client

Advantages and disadvantages of IO multiplexing

Pros: compared with other models, the event-driven model using select() runs in a single thread (process), consumes fewer resources, does not consume too much CPU, and can still serve multiple clients. If you are trying to build a simple event-driven server program, this model has real reference value.

Disadvantages:

1. The select() interface is not the best choice for implementing "event driven" designs, because when the number of handles to probe is large, select() itself spends a great deal of time polling each handle.

2. Many operating systems provide more efficient interfaces: Linux provides epoll, BSD provides kqueue, Solaris provides /dev/poll, and so on. Interfaces like epoll are recommended if you need to implement a more efficient server program. Unfortunately, these epoll-style interfaces differ greatly between operating systems, so using them to implement a server with good cross-platform capability is difficult.

3. The model couples event detection with event response in one loop; once an event handler becomes large (slow), it is catastrophic for the entire model.

Fifth, asynchronous IO

After the user process initiates the read operation, it can immediately begin doing other things. From the kernel's perspective, when it receives an asynchronous read it first returns immediately, so no block is generated for the user process. The kernel then waits for the data to be ready and copies it to user memory, and when all of this is done, the kernel sends a signal to the user process telling it that the read operation is complete.
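In Python the closest standard-library counterpart to this style is asyncio. One assumption worth stating: asyncio builds cooperative coroutines on top of readiness mechanisms such as epoll/select rather than kernel-level asynchronous IO, but the programming experience matches the description above - initiate a read, do other things, get resumed when it completes. A minimal self-contained sketch (the port 8090 is arbitrary; server and client run in one event loop):

```python
import asyncio

async def handle(reader, writer):
    data = await reader.read(1024)    # suspends this coroutine, not the whole process
    writer.write(data.upper())
    await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(handle, '127.0.0.1', 8090)
    async with server:
        reader, writer = await asyncio.open_connection('127.0.0.1', 8090)
        writer.write(b'hi')
        reply = await reader.read(1024)
        writer.close()
        return reply

reply = asyncio.run(main())
print(reply)
```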

Sixth, the selectors module

IO multiplexing: to explain this term, first understand the concept of "multiplexing" (shared use). In the communications field, to make full use of the physical medium of a network link, time-division or frequency-division multiplexing is often used to carry multiple signals over the same link. So the basic meaning of multiplexing is that one shared "medium" carries as much traffic of the same kind as possible. What, then, is the "medium" in IO multiplexing?

Consider the server programming model: for each client request, the server spawns a process to serve it. But processes cannot be spawned without limit, so to handle large numbers of clients, IO multiplexing was introduced: a single process serves multiple client requests at the same time. The "medium" of IO multiplexing is therefore a process (more precisely, a process together with select and poll, since the multiplexing is done by calling select or poll): one process (via select or poll) is reused to serve many IO streams. Although client IO arrives concurrently, in most cases the data an IO needs to read or write is not yet ready, so a function (select or poll) is used to monitor the state of the data each IO requires; as soon as some IO has data ready to read or write, the process goes and services that IO.
Having understood IO multiplexing, look at the differences and connections between the three APIs that implement it: select, poll and epoll. All three are IO multiplexing mechanisms: they monitor multiple descriptors and notify the application as soon as one becomes ready (usually read-ready or write-ready) so it can perform the corresponding read or write. But select, poll and epoll are all essentially synchronous I/O, because the application itself must do the reading and writing once the event is ready, i.e. the read/write step still blocks; asynchronous I/O, by contrast, relieves the application of the read/write itself - the asynchronous I/O implementation copies the data from the kernel into user space. The three prototypes are:

int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
int poll(struct pollfd *fds, nfds_t nfds, int timeout);
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

1. select's first parameter, nfds, is the largest descriptor value in the fd_set collections plus 1. fd_set is a bit array whose size is limited to __FD_SETSIZE (1024); each bit indicates whether the corresponding descriptor should be checked. The 2nd, 3rd and 4th parameters are the descriptor bit arrays for read, write and error events; they are both input and output parameters and may be modified by the kernel to indicate which descriptors had events, so the fd_sets must be reinitialized before every call to select. The timeout parameter is a timeout; the kernel modifies the structure to hold the time remaining.
The call procedure of select is as follows:
(1) copy_from_user copies the fd_set from user space into kernel space;
(2) the callback function __pollwait is registered;
(3) all fds are traversed and each one's poll method is called (for a socket this poll method is sock_poll, which depending on the situation calls tcp_poll, udp_poll or datagram_poll);
(4) taking tcp_poll as an example, its core implementation is __pollwait, the callback registered above;
(5) __pollwait's main job is to hang current (the current process) onto the device's wait queue. Different devices have different wait queues; for tcp_poll the wait queue is sk->sk_sleep (note that hanging the process on a wait queue does not mean the process is already asleep). When the device receives a message (network device) or finishes filling in file data (disk device), it wakes the processes sleeping on its wait queue, and current is awakened;
(6) the poll method returns a mask describing whether the read/write operation is ready, and the fd_set is assigned according to that mask;
(7) if after traversing all fds no ready mask was returned, schedule_timeout is called to put the process calling select (i.e. current) to sleep. When a device driver finds its resource readable or writable, it wakes the processes sleeping on its wait queue. If nothing wakes it within the timeout (as specified to schedule_timeout), the process calling select is woken anyway, regains the CPU, and traverses the fds again to check for ready ones;
(8) the fd_set is copied from kernel space back to user space.
To summarize the major drawbacks of select:
(1) every call to select copies the fd collections from user space into kernel space, which is expensive when there are many fds;
(2) every call also requires the kernel to traverse all the fds passed in, again expensive when there are many fds;
(3) the number of file descriptors select supports is too small; the default is 1024.

2. poll, unlike select, passes the kernel a pollfd array to express the events of interest, so there is no limit on the number of descriptors. The events and revents fields of pollfd indicate, respectively, the events of interest and the events that occurred, so the pollfd array needs to be initialized only once. poll's implementation mechanism is similar to select's (it corresponds to sys_poll in the kernel), except that a pollfd array is passed to the kernel and each descriptor in it is traversed, which is more efficient than fd_set. When poll returns, the revents of each pollfd element must be checked to see whether its event occurred.

3. Not until Linux 2.6 did the kernel directly support an implementation of this idea - epoll, recognized as the best-performing multiplexed I/O readiness notification method on Linux 2.6. epoll supports both level triggering and edge triggering (edge triggering tells the process only which file descriptors have just become ready, and says it only once; if no action is taken, it will not be repeated). Edge triggering is theoretically higher-performing, but the code to implement it is considerably more complex.
epoll likewise reports only the file descriptors that are ready, and when epoll_wait() is called to fetch them, what is returned is not the actual descriptors but a value representing the number of ready descriptors; the caller then fetches that many file descriptors from an array designated by epoll. Memory mapping (mmap) is used here, which eliminates the cost of copying these file descriptors across the system call. Another essential improvement is that epoll uses event-based readiness notification. With select/poll, the kernel scans all monitored file descriptors only after the call is made; epoll instead registers a file descriptor in advance with epoll_ctl(), and once that descriptor becomes ready, the kernel uses a callback mechanism to activate it quickly, so the process is notified when it calls epoll_wait(). Since epoll is an improvement on select and poll, it should avoid the three drawbacks above. How does epoll solve them? First look at how the calling interfaces differ: select and poll each provide a single function (select or poll), whereas epoll provides three: epoll_create creates an epoll handle; epoll_ctl registers the event types to listen for; epoll_wait waits for events to occur. For the first drawback, epoll's solution lies in epoll_ctl: each time a new event is registered on the epoll handle (EPOLL_CTL_ADD in epoll_ctl), the fd is copied into the kernel there, rather than being copied repeatedly in epoll_wait. epoll guarantees each fd is copied only once over the whole lifetime of its registration.
For the second drawback, epoll's solution is not to add current to each fd's device wait queue on every call, as select or poll do, but to hang current only once, at epoll_ctl time (which is unavoidable), and to specify a callback for each fd. When the device becomes ready and wakes the waiters on its queue, this callback is invoked, and it adds the ready fd to a ready linked list. epoll_wait's job is then just to check whether that ready list contains any fds (using schedule_timeout() to sleep briefly and check, similar to step (7) of the select implementation). For the third drawback, epoll has no such limit: the maximum number of fds it supports is the maximum number of files that can be opened, generally far greater than 2048 - roughly 100,000 on a machine with 1 GB of memory, for example. The exact number can be seen with cat /proc/sys/fs/file-max and generally scales with system memory. Summary: (1) select and poll must themselves repeatedly poll the entire fd collection until a device is ready, possibly alternating between sleep and wake-up many times during the wait. epoll also calls epoll_wait repeatedly, and may also alternate between sleeping and waking, but what it polls is the ready list: when a device becomes ready, its callback is invoked, the ready fd is put on the ready list, and the process sleeping in epoll_wait is woken. Both approaches alternate between sleep and wake-up, but select and poll must traverse the entire fd collection while "awake", whereas epoll only has to check whether the ready list is empty, which saves a great deal of CPU time. This is the performance gain brought by the callback mechanism.
(2) select and poll copy the fd collection from user space to kernel space once per call, and hang current on the device wait queues once per call; epoll copies only once, and hangs current on its wait queue only once (at the start of epoll_wait - note that this is not a device wait queue but a wait queue defined inside epoll), which also saves considerable overhead.
select, poll, epoll
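The mechanics described above are visible from Python via the select module. A small sketch under stated assumptions: select.poll is available on Unix-like systems only, select.epoll on Linux only, and the local socketpair stands in for real client connections.

```python
import select
import socket

a, b = socket.socketpair()

# poll: register interest once via a pollfd-style table; no 1024-fd limit
p = select.poll()
p.register(a, select.POLLIN)   # events field: we care about readability of a
before = p.poll(0)             # nothing readable yet -> empty list
b.send(b'x')
after = p.poll(1000)           # [(fd, revents)] once a becomes readable
print(before, after)

# epoll (Linux only): the fd is handed to the kernel once at registration
# (epoll_ctl EPOLL_CTL_ADD); epoll_wait returns only the ready descriptors,
# with no full scan of everything registered
ep = select.epoll()
ep.register(a.fileno(), select.EPOLLIN)
ready = ep.poll(1)             # a still holds unread data, so it is reported ready
print(ready)
ep.close()
```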

These three IO multiplexing mechanisms are supported differently across platforms - epoll is not available under Windows - but fortunately we have the selectors module, which by default chooses the most suitable mechanism for the current platform.

# Server
from socket import *
import selectors

sel = selectors.DefaultSelector()   # create the default multiplexing model for this platform

def accept(sk):
    conn, addr = sk.accept()
    sel.register(conn, selectors.EVENT_READ, read)

def read(conn):
    try:
        data = conn.recv(1024)
        if not data:                # Win8/Win10: orderly disconnect returns empty bytes
            print('closing', conn)
            sel.unregister(conn)
            conn.close()
            return
        conn.send(data.upper() + b'_SB')
    except Exception:               # Linux: disconnect raises an exception
        print('closing', conn)
        sel.unregister(conn)
        conn.close()

sk = socket(AF_INET, SOCK_STREAM)
sk.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
sk.bind(('127.0.0.1', 8088))
sk.listen(5)
sk.setblocking(False)               # set the socket interface to non-blocking
# equivalent to appending the file handle sk to the read list of a plain select,
# with the callback accept bound to it
sel.register(sk, selectors.EVENT_READ, accept)

while True:
    events = sel.select()           # check all registered fileobjs, e.g. [sk, conn], waiting for data
    for sel_obj, mask in events:    # someone touched an object registered with sel
        callback = sel_obj.data     # the accept/read callback stored at registration
        callback(sel_obj.fileobj)   # accept(sk) / read(conn)
A chat server based on the selectors module
# Client
from socket import *

c = socket(AF_INET, SOCK_STREAM)
c.connect(('127.0.0.1', 8088))
while True:
    msg = input('>>:')
    if not msg:
        continue
    c.send(msg.encode('utf-8'))
    data = c.recv(1024)
    print(data.decode('utf-8'))
A chat client based on the selectors module
