Python coroutines and I/O multiplexing

Source: Internet
Author: User
Tags: epoll

Coroutines

A coroutine is a lightweight thread scheduled in user space, also called a micro-thread.

Benefits

No thread context-switching overhead

No locking or synchronization overhead for atomic operations

A simpler programming model in which control flow is easy to switch

High concurrency, high scalability, and low cost

Disadvantages

A coroutine cannot use multiple cores, because it is essentially single-threaded. To use multiple cores, coroutines have to be combined with multiple processes.

A blocking operation blocks the entire program

Implementing a coroutine with yield

    import time

    def consumer(name):
        print("--->starting eating baozi...")
        while True:
            new_baozi = yield  # pause here until the producer sends a value
            print("[%s] is eating baozi %s" % (name, new_baozi))
            # time.sleep(1)

    def producer():
        # Advance each generator to its first yield so it can receive values.
        r = con.__next__()
        r = con2.__next__()
        n = 0
        while n < 5:
            n += 1
            con.send(n)
            con2.send(n)
            print("\033[32;1m[producer]\033[0m is making baozi %s" % n)

    if __name__ == '__main__':
        con = consumer("c1")
        con2 = consumer("c2")
        p = producer()

A definition of a coroutine

1. Concurrency is implemented within a single thread

2. Shared data can be modified without locks

3. The user program itself keeps the context stacks of the multiple control flows

4. When a coroutine hits an I/O operation, it automatically switches to another coroutine
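The points above can be illustrated with a minimal round-robin scheduler built on plain generators. This is only a sketch under stated assumptions: the names `worker` and `run_all` are made up for illustration, and a bare `yield` stands in for the point where a real coroutine would wait on I/O.

```python
from collections import deque

def worker(name, steps, log):
    """A coroutine-like generator: does one step, then yields control."""
    for i in range(steps):
        log.append((name, i))  # shared list, modified without any lock
        yield                  # hand control back (stand-in for an I/O wait)

def run_all(tasks):
    """Run generators round-robin in a single thread until all finish."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # resume the task until its next yield
            ready.append(task)  # still alive: schedule it again
        except StopIteration:
            pass                # task finished, drop it

log = []
run_all([worker("a", 2, log), worker("b", 3, log)])
print(log)
```

The two workers interleave their steps even though only one thread is running, and each generator keeps its own local state between switches.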

A small example of switching coroutines manually

greenlet is a coroutine module implemented in C. Compared with Python's built-in yield, it lets you switch freely between arbitrary functions without first declaring them as generators.

    import greenlet

    def test1():
        print(12)
        g2.switch()  # jump into test2
        print(34)
        g2.switch()

    def test2():
        print(56)
        g1.switch()  # jump back into test1
        print(78)

    g1 = greenlet.greenlet(test1)
    g2 = greenlet.greenlet(test2)
    g1.switch()

gevent is a third-party library that makes concurrent or asynchronous programming easy. The main pattern used in gevent is the greenlet, a lightweight coroutine that plugs into Python as a C extension module. Greenlets all run inside the operating system process of the main program, but they are scheduled cooperatively.

Automatic switching when a coroutine hits an I/O operation

    import gevent

    def foo():
        print("Running in foo")
        gevent.sleep(2)
        print("Explicit context switch to foo again")

    def bar():
        print("Explicit context to bar")
        gevent.sleep(1)
        print("Implicit context switch back to bar")

    def func3():
        print("Running func3")
        gevent.sleep(0)
        print("Running func3 again")

    gevent.joinall([
        gevent.spawn(foo),  # spawn a greenlet
        gevent.spawn(bar),
        gevent.spawn(func3),
    ])

Performance differences between synchronous and asynchronous

    import gevent

    def task(pid):
        gevent.sleep(0.5)
        print("task %s done" % pid)

    def synchronous():
        # Each call finishes before the next one starts.
        for i in range(1, 10):
            task(i)

    def asynchronous():
        # All tasks are spawned first, then run concurrently.
        threads = [gevent.spawn(task, i) for i in range(10)]
        gevent.joinall(threads)

    print("Synchronous:")
    synchronous()
    print("Asynchronous:")
    asynchronous()

Crawling web pages concurrently with gevent

    from gevent import monkey
    # Mark all blocking I/O in the current program so gevent can switch on it.
    # Patching is done before the other imports so they pick up the patched modules.
    monkey.patch_all()

    from urllib import request
    import gevent

    def f(url):
        print("GET: %s" % url)
        resp = request.urlopen(url)
        data = resp.read()
        # file = open("url.html", "wb")
        # file.write(data)
        # file.close()
        print("%d bytes received from %s." % (len(data), url))

    gevent.joinall([
        gevent.spawn(f, "http://www.yitongjia.com"),
        gevent.spawn(f, "http://www.jinyuncai.cn"),
        gevent.spawn(f, "https://www.guoshipm.com"),
    ])

Single-threaded multi-socket concurrency via gevent

Server side

    import gevent
    from gevent import socket, monkey

    monkey.patch_all()

    def server(port):
        s = socket.socket()
        s.bind(('0.0.0.0', port))
        s.listen(128)  # backlog (assumed value)
        while True:
            cli, addr = s.accept()
            gevent.spawn(handle_request, cli)  # one greenlet per connection

    def handle_request(conn):
        try:
            while True:
                data = conn.recv(1024)
                if not data:  # the client closed the connection
                    conn.shutdown(socket.SHUT_WR)
                    break
                print("recv:", data)
                conn.send(data)
        except Exception as ex:
            print(ex)
        finally:
            conn.close()

    if __name__ == '__main__':
        server(8001)

Client

    import socket
    import threading

    def sock_conn():
        client = socket.socket()
        client.connect(("localhost", 8001))
        count = 0
        while True:
            # msg = input(">>:").strip()
            # if len(msg) == 0: continue
            client.send(("hello %s" % count).encode("utf-8"))
            data = client.recv(1024)
            print("[%s] recv from server:" % threading.get_ident(), data.decode())  # result
            count += 1
        client.close()

    # Start 100 concurrent socket connections.
    for i in range(100):
        t = threading.Thread(target=sock_conn)
        t.start()

Event-driven

When writing a server, we usually choose among the following processing models:

1. Each received request creates a new process to handle it

2. Each received request creates a new thread to handle it

3. Each received request is placed in an event list, and the main process handles the requests through non-blocking I/O

The first approach is simple to implement, but creating a new process is relatively expensive, so server performance is poor.

The second approach involves thread synchronization, so it may run into problems such as deadlock.

The third approach makes the application code more complex than the other two.

Weighing all these factors, the third approach is generally considered the one most web servers adopt.
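The third model can be sketched as an event list drained by a single loop. This is only an illustration of the shape of the model: the names `events`, `handlers`, and `on_request` are hypothetical, and the handler records a string instead of doing real non-blocking I/O.

```python
from collections import deque

events = deque()   # the event list the main loop drains
handled = []

def on_request(payload):
    """Hypothetical handler: record the request instead of doing real I/O."""
    handled.append("handled " + payload)

handlers = {"request": on_request}

# Requests are only queued when they arrive; nothing is processed yet.
for name in ("r1", "r2", "r3"):
    events.append(("request", name))

# A single loop takes events one at a time and dispatches each to its handler.
while events:
    kind, payload = events.popleft()
    handlers[kind](payload)

print(handled)
```

No process or thread is created per request; the cost is that every handler has to return quickly, because a slow handler stalls everything queued behind it.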

I/O multiplexing

I. Introduction

Kernel space and user space

Modern operating systems use virtual memory. On a 32-bit operating system, the virtual address space is 4 GB (2 to the 32nd power bytes). The core of the operating system is the kernel, which is independent of ordinary applications: it can access protected memory and has full permission to access the underlying hardware. To prevent user processes from manipulating the kernel directly, and so keep the kernel safe, the operating system divides the virtual address space into two parts: kernel space and user space. On 32-bit Linux, the highest 1 GB (virtual addresses 0xC0000000 to 0xFFFFFFFF) is used by the kernel and is called kernel space, while the lower 3 GB (virtual addresses 0x00000000 to 0xBFFFFFFF) is used by each process and is called user space.
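The sizes quoted above follow directly from the address ranges. A short check of the arithmetic, using only the addresses given in the text (the variable names are illustrative):

```python
GIB = 2 ** 30  # bytes in one gigabyte (binary)

total = 2 ** 32                             # size of a 32-bit address space
kernel_size = 0xFFFFFFFF - 0xC0000000 + 1   # kernel space range, inclusive
user_size = 0xBFFFFFFF - 0x00000000 + 1     # user space range, inclusive

print(total // GIB, kernel_size // GIB, user_size // GIB)  # prints: 4 1 3
```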

Process switching

To control process execution, the kernel must be able to suspend the process currently running on the CPU and resume a previously suspended process. This is called process switching. Any process therefore runs with the support of the operating system kernel and is closely tied to it.

Process blocking

When a running process cannot proceed because an expected event has not occurred (for example, a request for a system resource fails, an operation it is waiting on has not completed, new data has not arrived, or there is no new work to do), the system automatically executes the blocking primitive (block), moving the process from the running state to the blocked state. Blocking is thus an active behavior of the process itself, so only a running process, one that holds the CPU, can enter the blocked state. A blocked process does not consume CPU resources.

File descriptor (FD)

A file descriptor is a computer science term: an abstraction used to refer to an open file.

Formally, a file descriptor is a non-negative integer. In practice it is an index into the table of open files that the kernel maintains for each process. When a program opens an existing file or creates a new one, the kernel returns a file descriptor to the process. Much low-level programming revolves around file descriptors. The concept, however, generally applies only to UNIX-like operating systems such as UNIX and Linux.
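A quick way to see file descriptors from Python is the `os` module, whose low-level functions work on the integers directly. The sketch below uses a pipe rather than a file so that it leaves nothing on disk; the exact descriptor values depend on what the process already has open.

```python
import os

r, w = os.pipe()        # the kernel hands back two new descriptors
print(r, w)             # small non-negative integers; exact values vary

os.write(w, b"ping")    # low-level I/O takes the descriptor directly
data = os.read(r, 4)
print(data)

os.close(r)
os.close(w)
```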

Cached I/O

Cached I/O is also called standard I/O, and it is the default I/O mode for most file systems. In Linux's cached I/O mechanism, the operating system caches I/O data in the file system's page cache: data is first copied into the kernel buffer and then copied from the kernel buffer into the application's address space.

Disadvantage of cached I/O

During transfer, data has to be copied several times between the application address space and the kernel, and the CPU and memory cost of these copies is very high.

II. I/O models

As just described, for an I/O access (take read as an example), data is first copied into the kernel buffer and then copied from the kernel buffer into the application's address space. So a read goes through two phases:

1. Waiting for the data to be ready

2. Copying the data from the kernel to the process

Because of these two phases, Linux provides the following five network I/O models:

- Blocking I/O

- Non-blocking I/O

- I/O multiplexing

- Signal-driven I/O

- Asynchronous I/O

Blocking I/O

In Linux, all sockets are blocking by default. A typical read proceeds as follows.

When the user process calls recvfrom, the kernel begins the first phase of I/O: preparing the data. For network I/O, the data has often not arrived yet (for example, a complete UDP packet has not been received), so the kernel waits for enough data to arrive. This takes time, because the data has to be copied into the kernel buffer. On the user side, the whole process is blocked (by its own choice). When the kernel has the data ready, it copies the data from the kernel into user memory and returns the result; only then does the user process leave the blocked state and run again.

So blocking I/O is characterized by blocking in both phases of the I/O operation.
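The blocking behavior can be observed with a connected socket pair. In this sketch a helper thread (the hypothetical `delayed_send`) plays the part of a remote peer whose data arrives late; `socket.socketpair()` and the 0.2 second delay are choices made for the example, not part of the original text.

```python
import socket
import threading
import time

def delayed_send(sock, delay, payload):
    """Send payload after `delay` seconds, simulating data that arrives late."""
    time.sleep(delay)
    sock.sendall(payload)

reader, writer = socket.socketpair()   # sockets are blocking by default
sender = threading.Thread(target=delayed_send, args=(writer, 0.2, b"hello"))
sender.start()

start = time.monotonic()
data = reader.recv(1024)               # blocks until data arrives and is copied
waited = time.monotonic() - start
sender.join()

print(data, "after waiting about %.1f s" % waited)
reader.close()
writer.close()
```

The calling thread does nothing between the call to `recv` and its return, which is exactly the cost the other models try to avoid.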

Non-blocking I/O

On Linux, a socket can be made non-blocking by setting an option on it. A read on a non-blocking socket proceeds as follows.

If the data in the kernel is not ready when the user process issues a read, the kernel does not block the user process but immediately returns an error. From the user process's point of view, it issues a read and gets a result at once, without waiting. When the user process sees that the result is an error, it knows the data is not ready, so it can issue the read again. Once the data in the kernel is ready and the kernel receives the system call from the user process again, it immediately copies the data into user memory and returns.

So non-blocking I/O is characterized by the user process having to keep actively asking the kernel whether the data is ready.
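A minimal sketch of this polling pattern, again on a local socket pair. In Python the "error" returned for a not-ready socket surfaces as `BlockingIOError`; the 10 ms back-off between attempts is an arbitrary choice for the example.

```python
import socket
import time

reader, writer = socket.socketpair()
reader.setblocking(False)        # switch the socket to non-blocking mode

try:
    reader.recv(1024)            # no data yet: returns at once with an error
    first_try = "got data"
except BlockingIOError:
    first_try = "not ready"

writer.sendall(b"hello")

data = None
attempts = 0
while data is None:              # keep asking until the kernel has data
    attempts += 1
    try:
        data = reader.recv(1024)
    except BlockingIOError:
        time.sleep(0.01)         # back off briefly before polling again

print(first_try, data)
reader.close()
writer.close()
```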

I/O multiplexing

I/O multiplexing is what we usually mean by select, poll, and epoll; some sources also call it event-driven I/O. The benefit of select/epoll is that a single process can handle the I/O of many network connections at once. The basic principle is that the select, poll, or epoll call keeps polling all the sockets it is responsible for and notifies the user process when data arrives on any of them.

When the user process calls select, the whole process blocks while the kernel "monitors" all the sockets that select is responsible for; as soon as data is ready on any socket, select returns. The user process then calls read to copy the data from the kernel into the user process.

So I/O multiplexing is characterized by a mechanism that lets one process wait on multiple file descriptors at the same time; select() returns as soon as any of these file descriptors (socket descriptors) becomes ready for reading.

This flow is not very different from blocking I/O; in fact it is slightly worse, because it needs two system calls (select and recvfrom) where blocking I/O needs only one (recvfrom). The advantage of select is that it can handle many connections at the same time.

So if the number of connections is not high, a web server using select/epoll does not necessarily perform better than one using multi-threading plus blocking I/O, and its latency may even be higher. The advantage of select/epoll is not that a single connection is handled faster, but that more connections can be handled.

In practice, in the I/O multiplexing model each socket is usually set to non-blocking. Even so, as described above, the user process is blocked the whole time; it is just blocked by the select call rather than by socket I/O.
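The two-step flow described above (one call to wait on many sockets, then a second call to read) can be sketched with `select.select` on a few local socket pairs. The three pairs stand in for three client connections; that setup and the one second timeout are assumptions made for the example.

```python
import select
import socket

# Three connected pairs stand in for three client connections.
pairs = [socket.socketpair() for _ in range(3)]
readers = [r for r, _ in pairs]

pairs[1][1].sendall(b"from 1")   # only connection 1 has data pending

# One call waits on all three sockets and returns the readable ones.
readable, _, _ = select.select(readers, [], [], 1.0)
ready_indexes = [readers.index(s) for s in readable]
messages = [s.recv(1024) for s in readable]   # second system call: the read

print(ready_indexes, messages)
for r, w in pairs:
    r.close()
    w.close()
```

Only the socket with pending data is reported, so the process never blocks in `recv` on a connection that has nothing to say.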

Asynchronous I/O

After the user process initiates the read, it can immediately go on to do other things. From the kernel's point of view, when it receives an asynchronous read it returns at once, so the user process is never blocked. The kernel then waits for the data to be ready and copies it into user memory. When all of this is done, the kernel sends the user process a signal telling it that the read is complete.
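The shape of this model (start the read, carry on, get notified on completion) can be imitated in user space. The sketch below does not use kernel asynchronous I/O: a worker thread from `concurrent.futures` stands in for the kernel, and the hypothetical callback `on_complete` stands in for the completion signal.

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor

reader, writer = socket.socketpair()
done = threading.Event()
result = {}

def on_complete(future):
    """Called once the read has finished and the data is in user memory."""
    result["data"] = future.result()
    done.set()

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(reader.recv, 1024)   # start the read, return at once
    future.add_done_callback(on_complete)
    other_work = sum(range(1000))             # the caller is free to do other work
    writer.sendall(b"hello")                  # the data arrives later
    done.wait(timeout=2)

print(other_work, result.get("data"))
reader.close()
writer.close()
```

The difference from the multiplexing model is that the caller never issues the read itself after being notified; by the time it hears about the data, the copy into user memory has already happened.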

I/O multiplexing: select, poll, epoll

select example

Server side

    import select
    import socket
    import queue

    server = socket.socket()
    server.bind(("0.0.0.0", 6666))
    server.listen(128)  # backlog (assumed value)
    server.setblocking(False)  # non-blocking

    msg_dic = {}
    inputs = [server, ]
    outputs = []

    while True:
        readable, writeable, exceptional = select.select(inputs, outputs, inputs)
        print(readable, writeable, exceptional)
        for r in readable:
            if r is server:  # a new connection
                conn, addr = server.accept()
                print("A new connection", addr)
                # The new connection has not sent data yet, so receiving now would fail.
                # Let select monitor conn so we know when the client sends data.
                inputs.append(conn)
                msg_dic[conn] = queue.Queue()  # data waiting to be sent back to this client
            else:
                data = r.recv(1024)
                if not data:  # the client closed the connection
                    inputs.remove(r)
                    if r in outputs:
                        outputs.remove(r)
                    del msg_dic[r]
                    r.close()
                    continue
                print("Received data", data)
                msg_dic[r].put(data)
                if r not in outputs:
                    outputs.append(r)  # mark the connection as having data to return
                # r.send(data)
                # print("Send end...")
        for w in writeable:  # connections with data to return to the client
            if w not in msg_dic:  # closed earlier in this iteration
                continue
            try:
                data_to_client = msg_dic[w].get_nowait()
            except queue.Empty:
                outputs.remove(w)  # nothing left to send, stop watching for writability
            else:
                w.send(data_to_client)  # return the data to the client
        for e in exceptional:
            if e in outputs:
                outputs.remove(e)
            if e in inputs:
                inputs.remove(e)
            msg_dic.pop(e, None)

Client side

    import socket

    HOST = "127.0.0.1"  # the remote host
    PORT = 6666  # the same port as used by the server
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((HOST, PORT))
    while True:
        msg = bytes(input(">>:"), encoding="utf8")
        s.sendall(msg)
        data = s.recv(1024)
        # print(data)
        print('Received', repr(data))
    s.close()

selectors example

The selectors module wraps the underlying select or epoll and picks one of them automatically depending on the operating system.
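To see which mechanism was picked and how registration works, here is a minimal sketch on a local socket pair. The label `"reader-1"` attached through the `data` argument is just an illustrative value; any object, such as a callback, can be stored there.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
print(type(sel).__name__)        # for example EpollSelector on Linux; varies by OS

reader, writer = socket.socketpair()
sel.register(reader, selectors.EVENT_READ, data="reader-1")

writer.sendall(b"ping")
events = sel.select(timeout=1.0)          # list of (key, mask) pairs
labels = [key.data for key, mask in events]
received = [key.fileobj.recv(1024) for key, mask in events]

print(labels, received)
sel.unregister(reader)
sel.close()
reader.close()
writer.close()
```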

selectors server side

    import socket
    import selectors

    sel = selectors.DefaultSelector()

    def accept(sock, mask):
        conn, addr = sock.accept()
        print('accepted', conn, 'from', addr)
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, read)  # call read() when conn is readable

    def read(conn, mask):
        data = conn.recv(1024)
        if data:
            print('echoing', repr(data), 'to', conn)
            conn.send(data)
        else:
            print('closing', conn)
            sel.unregister(conn)
            conn.close()

    sock = socket.socket()
    sock.bind(("0.0.0.0", 6666))
    sock.listen(100)  # backlog (assumed value)
    sock.setblocking(False)
    sel.register(sock, selectors.EVENT_READ, accept)

    while True:
        events = sel.select()
        for key, mask in events:
            callback = key.data  # the function registered for this socket
            callback(key.fileobj, mask)

selectors client

    import socket

    HOST = "127.0.0.1"  # the remote host
    PORT = 6666  # the same port as used by the server
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((HOST, PORT))
    while True:
        msg = bytes(input(">>:"), encoding="utf8")
        s.sendall(msg)
        data = s.recv(1024)
        # print(data)
        print('Received', repr(data))
    s.close()
