Python Development Foundation --- interprocess communication, process pools, and coroutines


Inter-process communication

Processes are isolated from one another. To implement interprocess communication (IPC), the multiprocessing module supports two forms, queues and pipes, both of which exchange data by message passing.

Process queues: Queue

Unlike the thread queue (queue.Queue), a process queue is created with the multiprocessing module.

When a child process is spawned, the module's code is copied into the child and executed once, so the child process ends up with a namespace separate from the main process's.

Example 1:

import multiprocessing

def foo():
    q.put([11, 'Hello', True])
    print(q.qsize())

q = multiprocessing.Queue()  # globally define a process queue; the child gets its own copy when it is spawned; a maximum size can be given to limit the length

if __name__ == '__main__':
    p = multiprocessing.Process(target=foo, args=())  # namespaces differ: the child fills the copy of q in its own namespace, which the main process's get() cannot reach, so get() alone would block
    p.start()
    foo()           # the main process runs the function itself, so its own q receives data
    print(q.get())

Example 2:

import multiprocessing

def foo():
    q.put([11, 'Hello', True])
    print(q.qsize())

if __name__ == '__main__':
    q = multiprocessing.Queue()  # the main process creates a process queue
    p = multiprocessing.Process(target=foo, args=())  # namespaces differ: the child cannot find q, so it raises an error saying q is not defined
    p.start()
    print(q.get())

Example 3:

import multiprocessing

def foo(argument):      # define a function that operates on whatever queue is passed in
    argument.put([11, 'Hello', True])
    print(argument.qsize())

q = multiprocessing.Queue()  # a globally defined process queue (unused in this run)
print('Test')

if __name__ == '__main__':
    x = multiprocessing.Queue()   # the main process defines a process queue
    p = multiprocessing.Process(target=foo, args=(x,))  # the main process passes the queue to the child, which can then use it
    p.start()
    print(x.get())
    # foo(q)
    # print(q.get())

Common methods

q.put(obj, block=True, timeout=None): inserts data into the queue. If block is True (the default) and timeout is a positive value, the method blocks for at most timeout seconds until the queue has free space; if it times out, a queue.Full exception is raised. If block is False and the queue is full, queue.Full is raised immediately.

q.get(block=True, timeout=None): reads and removes an element from the queue. If block is True (the default) and timeout is a positive value and no element arrives within the wait time, a queue.Empty exception is raised. If block is False, either the queue has a value available and it is returned immediately, or queue.Empty is raised immediately.

q.get_nowait(): same as q.get(False).

q.put_nowait(obj): same as q.put(obj, False).

q.empty(): returns True if q is empty at the moment of the call; the result is unreliable, e.g. an item may be added to the queue while True is being returned.

q.full(): returns True if q is full at the moment of the call; the result is unreliable, e.g. items may be taken out of the queue while True is being returned.

q.qsize(): returns the number of items currently in the queue; the result is unreliable for the same reasons as q.empty() and q.full().
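A minimal sketch of the blocking and non-blocking variants described above (the maxsize of 2 is chosen only for illustration; the Full and Empty exception classes live in the standard queue module):

import multiprocessing
import queue   # multiprocessing re-uses queue.Full and queue.Empty

q = multiprocessing.Queue(maxsize=2)   # limit the queue to 2 items
q.put('a')
q.put('b', block=True, timeout=1)      # would raise queue.Full after 1s if no space appeared
try:
    q.put_nowait('c')                  # same as q.put('c', block=False)
except queue.Full:
    print('queue is full')

print(q.get())                         # 'a'
print(q.get(block=True, timeout=1))    # 'b'
try:
    q.get_nowait()                     # same as q.get(block=False)
except queue.Empty:
    print('queue is empty')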

Other methods

q.cancel_join_thread(): do not automatically join the background feeder thread when the process exits; this prevents the join_thread() method from blocking.

q.close(): closes the queue, preventing more data from being added. When called, the background thread continues writing data that has been queued but not yet flushed, then closes as soon as that completes. This method is called automatically if q is garbage collected. Closing a queue does not produce any end-of-data signal or exception for queue consumers; for example, if a consumer is blocked on a get() operation, closing the queue in the producer does not cause get() to return an error.

q.join_thread(): joins the queue's background thread. This method is used to wait for all queued items to be flushed, after q.close() has been called. By default it is called by every process that is not the original creator of q; calling q.cancel_join_thread() disables this behavior.
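A small sketch of the close()/join_thread() pattern in a producer process (the producer/consumer split here is illustrative, not from the original text):

from multiprocessing import Process, Queue

def producer(q):
    for i in range(3):
        q.put(i)
    q.close()         # no more data will be added from this process
    q.join_thread()   # wait for the background feeder thread to flush everything already queued

if __name__ == '__main__':
    q = Queue()
    p = Process(target=producer, args=(q,))
    p.start()
    for _ in range(3):
        print(q.get())
    p.join()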

Another class for creating a process queue:

Http://www.cnblogs.com/zero527/p/7211909.html

Pipes: Pipe

A pipe works like a pipe in real life: data can be put in at both ends.

A pipe is full-duplex by default. If it is created with duplex=False, the left end can only receive and the right end can only send, like a one-way street.
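A minimal sketch of such a one-way pipe (the sender name is just for illustration); with duplex=False, the first endpoint returned can only receive and the second can only send:

from multiprocessing import Process, Pipe

def sender(conn):
    conn.send('one-way message')
    conn.close()

if __name__ == '__main__':
    recv_conn, send_conn = Pipe(duplex=False)   # recv_conn: receive only; send_conn: send only
    p = Process(target=sender, args=(send_conn,))
    p.start()
    print(recv_conn.recv())
    p.join()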

The simplest example of bidirectional communication through a pipe:

import multiprocessing

def foo(sk):
    sk.send('Hello World')
    print(sk.recv())

if __name__ == '__main__':
    conn1, conn2 = multiprocessing.Pipe()  # two endpoints, both able to send and receive; pass duplex=False for one-way communication
    p = multiprocessing.Process(target=foo, args=(conn1,))  # the child uses the conn1 endpoint and calls foo
    p.start()
    print(conn2.recv())      # the main process receives on conn2
    conn2.send('Hi son')     # the main process sends on conn2

Common methods

conn1.recv(): receives the object sent by conn2.send(obj). recv() blocks if there is no message to receive. If the other end of the connection is closed, recv() raises EOFError.

conn1.send(obj): sends an object over the connection. obj may be any object compatible with serialization.
Note: the send() and recv() methods use the pickle module to serialize the object.
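For instance, any picklable object travels through the connection transparently (a single-process sketch, just to show the serialization):

from multiprocessing import Pipe

a, b = Pipe()
b.send({'id': 1, 'items': [1, 2, 3]})   # pickled behind the scenes
print(a.recv())                          # {'id': 1, 'items': [1, 2, 3]}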

Other methods

conn1.close(): closes the connection. Called automatically if conn1 is garbage collected; both sides must close their ends.

conn1.fileno(): returns the integer file descriptor used by the connection.

conn1.poll([timeout]): returns True if data is available on the connection. timeout specifies the maximum time to wait; if omitted, the method returns a result immediately; if timeout is None, the operation waits indefinitely for data to arrive.

conn1.recv_bytes([maxlength]): receives a complete byte message sent by conn2.send_bytes(). maxlength specifies the maximum number of bytes to receive; if the incoming message exceeds it, an IOError is raised and no further reads can be made on the connection. If the other end of the connection is closed and no more data exists, EOFError is raised.

conn1.send_bytes(buffer[, offset[, size]]): sends a buffer of bytes over the connection. buffer is any object supporting the buffer interface, offset is the byte offset into the buffer, and size is the number of bytes to send. The data is emitted as a single message, to be received with recv_bytes().

conn1.recv_bytes_into(buffer[, offset]): receives a complete byte message and stores it in buffer, which must be an object supporting a writable buffer interface (such as a bytearray or similar). offset specifies the byte position at which to place the message within the buffer. The return value is the number of bytes received. If the message is longer than the available buffer space, a BufferTooShort exception is raised.
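A small single-process sketch of poll() and the byte-level methods (the payloads are arbitrary examples):

from multiprocessing import Pipe

a, b = Pipe()
b.send_bytes(b'raw bytes')
if a.poll(timeout=1):          # True as soon as data is available
    print(a.recv_bytes())      # b'raw bytes'

buf = bytearray(16)
b.send_bytes(b'hello', 0, 5)   # send 5 bytes starting at offset 0
n = a.recv_bytes_into(buf)     # returns the number of bytes received
print(n, bytes(buf[:n]))       # 5 b'hello'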

Note: each of the producer and consumer should close the pipe endpoint it does not use, e.g. close the right end of the pipe in the producer and the left end in the consumer. If you forget this, the program may hang on the consumer's recv(). Pipes are reference-counted by the operating system: the same endpoint must be closed in all processes before an EOFError can be raised in the consumer. Closing the endpoint in the producer alone therefore has no effect; the consumer must also close that same endpoint.

from multiprocessing import Process, Pipe

def consumer(p, name):
    left, right = p
    left.close()                 # the consumer does not use the left end
    while True:
        try:
            baozi = right.recv()
            print('%s received bun: %s' % (name, baozi))
        except EOFError:         # raised once every process has closed the left end
            right.close()
            break

def producer(seq, p):
    left, right = p
    right.close()                # the producer does not use the right end
    for i in seq:
        left.send(i)
        # time.sleep(1)
    else:
        left.close()

if __name__ == '__main__':
    left, right = Pipe()
    c1 = Process(target=consumer, args=((left, right), 'c1'))
    c1.start()
    seq = (i for i in range(10))
    producer(seq, (left, right))
    right.close()
    left.close()
    c1.join()
    print('main process')

Shared data: Manager

Queues and pipes only move data between processes; they do not implement data sharing, i.e. having one process modify another process's data. For that, the multiprocessing module provides Manager.

Note: interprocess communication should avoid shared data wherever possible.

Shared data: List

from multiprocessing import Manager, Process

def foo(l, i):
    l.append(i ** i)

if __name__ == '__main__':
    man = Manager()
    ml = man.list([11, 22, 33])
    l = []
    for i in range(5):
        p = Process(target=foo, args=(ml, i))
        p.start()
        l.append(p)
    for i in l:   # must join every child, otherwise the Manager may shut down while children are still using it and an error occurs
        i.join()
    print(ml)

Shared data: Dictionaries

from multiprocessing import Manager, Process

def foo(d, k, v):
    d[k] = v

if __name__ == '__main__':
    man = Manager()
    md = man.dict({'name': 'Bob'})
    l = []
    for i in range(5):
        p = Process(target=foo, args=(md, i, 'a'))
        p.start()
        l.append(p)
    for i in l:   # must join every child before reading the shared dict
        i.join()
    print(md)

Process Pool

Multiple processes are opened for concurrency. Usually you open about as many processes as there are CPU cores; opening more hurts efficiency, mainly through switching overhead, so a process pool is used to throttle the number of processes.

The process pool maintains a sequence of worker processes internally. When a task is submitted, a process is fetched from the pool; if no worker is available, the program waits until one in the pool becomes free.

Example:

from multiprocessing import Pool
import time

def foo(n):
    print(n)
    time.sleep(1)

if __name__ == '__main__':
    pool_obj = Pool(5)   # a pool of 5 worker processes
    for i in range(10):
        # pool_obj.apply_async(func=foo, args=(i,))
        pool_obj.apply(func=foo, args=(i,))   # the child processes are created and managed by the pool object
        # apply: synchronous, the children execute one at a time
        # apply_async: asynchronous, multiple children execute concurrently
    pool_obj.close()
    pool_obj.join()
    print('ending')

Common methods:

pool_obj.apply(func[, args[, kwargs]]): executes func(*args, **kwargs) in a pool worker process and returns the result. Note that this does not run func in all pool workers concurrently; to execute func with different arguments concurrently, either call apply() from different threads or use apply_async().

pool_obj.apply_async(func[, args[, kwargs[, callback]]]): executes func(*args, **kwargs) in a pool worker process and returns the result asynchronously. The result of this method is an instance of the AsyncResult class. callback is a callable that takes one input argument; when func's result becomes available, it is passed to callback. callback must not perform blocking operations, otherwise it delays the receipt of results from other asynchronous operations.

pool_obj.close(): closes the process pool, preventing further submissions. Tasks already pending still complete before the worker processes terminate.

pool_obj.join(): waits for all worker processes to exit. This method may only be called after close() or terminate().
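A minimal sketch of apply_async() with a callback (the square() and on_result() helpers are made up for illustration; the callback runs in the main process and must not block):

from multiprocessing import Pool

def square(n):
    return n * n

def on_result(result):   # invoked in the main process as each result arrives
    print('got:', result)

if __name__ == '__main__':
    pool = Pool(4)
    for i in range(5):
        pool.apply_async(square, args=(i,), callback=on_result)
    pool.close()   # no further submissions
    pool.join()    # wait for the workers to finish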

Other methods:

The return value of apply_async() and map_async() is an instance of the AsyncResult class (obj below). The instance has the following methods:

obj.get([timeout]): returns the result, waiting for it to arrive if necessary. timeout is optional; if the result has not arrived within the specified time, multiprocessing.TimeoutError is raised. If the remote operation raised an exception, it is re-raised when this method is called.

obj.ready(): returns True if the call has completed.

obj.successful(): returns True if the call completed without raising an exception; raises an exception if called before the result is ready.

obj.wait([timeout]): waits for the result to become available.

pool_obj.terminate(): immediately terminates all worker processes without performing any cleanup or finishing pending work (this is a method of the pool itself, not of AsyncResult). It is called automatically if the pool is garbage collected.
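A sketch of the AsyncResult interface, reusing a hypothetical square() helper:

from multiprocessing import Pool, TimeoutError

def square(n):
    return n * n

if __name__ == '__main__':
    pool = Pool(2)
    res = pool.apply_async(square, args=(7,))
    res.wait(timeout=1)             # wait for the result to become available
    if res.ready():                 # the call has completed
        print(res.successful())     # True: it completed without raising
        print(res.get(timeout=1))   # 49; raises TimeoutError if not ready in time
    pool.close()
    pool.join()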

Coroutines

Coroutine: concurrency within a single thread, also known as a micro-thread or fiber. The English name is coroutine.

In one sentence: a coroutine is a lightweight user-space thread, i.e. the coroutine is scheduled by the user program itself.

A coroutine retains the state of its last invocation (that is, a particular combination of all its local state). Each time it is re-entered, it resumes from that state, in other words from the position in the logic flow where it last left off.
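A tiny illustration with a plain generator: it resumes exactly where it left off, with its local state intact (the counter() example is not from the original text):

def counter():
    n = 0
    while True:
        n += 1
        yield n   # suspend here; n stays alive between calls

c = counter()
print(next(c))   # 1
print(next(c))   # 2 -- the local variable n survived the suspension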

Attention:

1. Python threads are kernel-level, controlled by the operating system (for example, a thread that blocks on IO is forced to give up the CPU so another thread can be switched in).

2. Coroutines are opened within a single thread; when IO is encountered, the switch is controlled at the application level (not by the operating system).

Advantages of coroutines:

1. Switching between coroutines costs less: it is a program-level switch that the operating system is completely unaware of, and is therefore more lightweight.

2. Concurrency can be achieved within a single thread, making the most of the CPU.

Disadvantages of coroutines:

1. Coroutines are single-threaded by nature and cannot exploit multiple cores. A program can, however, open multiple processes, multiple threads per process, and multiple coroutines per thread to combine the approaches.

2. Coroutines share a single thread, so once one coroutine blocks, the entire thread is blocked.

Implementing coroutine concurrency with yield

import time

def consumer():
    r = ''
    while True:
        n = yield r
        if not n:
            return
        print('[CONSUMER] ←← consuming %s...' % n)
        time.sleep(1)
        r = '200 OK'

def produce(c):
    next(c)   # 1. start the generator
    n = 0
    while n < 5:
        n = n + 1
        print('[PRODUCER] →→ producing %s...' % n)
        cr = c.send(n)
        # 2. send n into consumer; yield receives the value, the code runs until the next yield, which hands back the value of r
        print('[PRODUCER] consumer return: %s' % cr)
    # 3. produce has no more values; close the whole coroutine
    c.close()

if __name__ == '__main__':
    c = consumer()   # create the generator object
    produce(c)       # run the producer, which drives the consumer

Implementation with the greenlet framework (a base library wrapping yield-style switching)

The main idea behind the greenlet mechanism: a yield statement in a generator or coroutine function suspends the function's execution until it is resumed later with next() or send(). A scheduler loop can then coordinate multiple tasks across a set of generator functions. greenlet is a basic Python library implementing what we call coroutines.

Example 1:

from greenlet import greenlet

def foo():
    print('ok1')
    g2.switch()   # suspend here and switch to g2
    print('ok3')
    g2.switch()

def bar():
    print('ok2')
    g1.switch()
    print('ok4')

g1 = greenlet(foo)   # greenlet object wrapping the foo function
g2 = greenlet(bar)   # greenlet object wrapping the bar function
g1.switch()   # 1. run g1, print ok1
              # 2. g2.switch() jumps to g2, prints ok2
              # 3. g1.switch() resumes g1 where it stopped, prints ok3
              # 4. g2.switch() resumes g2, prints ok4

Example 2:

from greenlet import greenlet

def eat(name):
    print('%s eat food 1' % name)
    gr2.switch('Bob')
    print('%s eat food 2' % name)
    gr2.switch()

def play_phone(name):
    print('%s play 1' % name)
    gr1.switch()
    print('%s play 2' % name)

gr1 = greenlet(eat)
gr2 = greenlet(play_phone)
gr1.switch(name='Natasha')   # arguments can be passed on the first switch; later switches do not need them

This approach does not save time here, because no IO is involved; moreover, greenlet does not switch automatically when it hits an IO operation, so the IO still blocks.

The gevent module: a higher-level library based on the greenlet framework

gevent is a third-party library that implements coroutines through greenlet. The basic idea is:

When a greenlet encounters an IO operation, such as accessing the network, it automatically switches to another greenlet, and switches back at a suitable point once the IO completes. Because IO operations are time-consuming and often leave the program waiting, having gevent switch coroutines for us automatically guarantees that some greenlet is always running instead of waiting on IO.

Since the switching happens automatically during IO operations, gevent needs to modify some of the standard libraries that ship with Python; this is done at startup via monkey patching:

Simple example:

import gevent

def foo():
    print('ok1')
    gevent.sleep(4)   # simulate an IO operation
    print('ok3')

def bar():
    print('ok2')
    gevent.sleep(2)
    print('ok4')

g1 = gevent.spawn(foo)
g2 = gevent.spawn(bar)
gevent.joinall([g1, g2])   # block on all of them at once; they can also be joined individually

The first argument in the spawn() call is the function, e.g. foo; it can be followed by multiple arguments, positional or keyword, which are passed through to foo.
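A quick sketch of passing both kinds of arguments through spawn() (greet() is an invented example):

import gevent

def greet(name, punctuation='!'):
    print('hello, %s%s' % (name, punctuation))

g = gevent.spawn(greet, 'world', punctuation='?')   # positional and keyword arguments are forwarded to greet
g.join()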

Attention:

gevent.sleep(4) simulates an IO block that gevent can recognize,

whereas time.sleep(2) and other blocking calls are not recognized by gevent directly. The following lines of code apply a patch so they can be recognized:

# patch
from gevent import monkey
monkey.patch_all()

The patch must be placed before whatever it patches, for example before importing the time and socket modules.

Or simply remember: to use gevent, put the patch at the very beginning of the file.
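A sketch of the patch-first rule, assuming three tasks that block on time.sleep():

from gevent import monkey
monkey.patch_all()   # apply before the modules it patches are used

import time
import gevent

def task(i):
    time.sleep(1)    # now recognized by gevent, so it no longer blocks the thread
    print('task %s done' % i)

gevent.joinall([gevent.spawn(task, i) for i in range(3)])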

Crawler Example:

from gevent import monkey; monkey.patch_all()
import gevent
import requests
import time

def get_page(url):
    print('GET: %s' % url)
    response = requests.get(url)
    if response.status_code == 200:
        print('%d bytes received from %s' % (len(response.text), url))

start_time = time.time()
gevent.joinall([
    gevent.spawn(get_page, 'https://www.python.org/'),
    gevent.spawn(get_page, 'https://www.yahoo.com/'),
    gevent.spawn(get_page, 'https://github.com/'),
])
stop_time = time.time()
print('run time is %s' % (stop_time - start_time))

gevent is a coroutine-based Python networking library that uses greenlet to provide a high-level concurrency API on top of the libev event loop. Its main features:

<1> A fast event loop based on libev (the epoll mechanism on Linux)
<2> Lightweight execution units based on greenlet
<3> An API that reuses concepts from the Python standard library
<4> Cooperative sockets with SSL support
<5> DNS queries performed through a thread pool or c-ares
<6> A monkey patching utility that turns third-party modules cooperative

1. About the Linux epoll mechanism: epoll is the Linux kernel's improvement of poll for handling large batches of file descriptors; it is an enhanced version of the multiplexed IO interfaces select/poll, and it significantly raises a program's CPU utilization when only a few connections are active among many concurrent ones. Advantages of epoll:

(1) It supports a process opening a large number of socket descriptors. The FDs opened via select are limited by FD_SETSIZE, whereas epoll has no such limit; the maximum number of open files it supports is far greater than 2048.
(2) IO efficiency does not decrease linearly with the number of FDs: epoll operates only on "active" sockets, so only active sockets trigger their callback functions, and idle sockets do not.
(3) It uses mmap to accelerate message passing between kernel and user space: epoll is implemented by the kernel and user space mapping the same block of memory.
(4) Kernel fine tuning.

2. The libev mechanism provides a way to invoke a callback function when a file-descriptor event occurs. libev is an event loop: you register events of interest with libev, such as a socket becoming readable, and libev manages the registered event sources and triggers the corresponding callback when an event occurs.

Example:

import gevent
from gevent import socket

urls = ['www.google.com.hk', 'www.example.com', 'www.python.org']
jobs = [gevent.spawn(socket.gethostbyname, url) for url in urls]
gevent.joinall(jobs, timeout=2)
print([job.value for job in jobs])
# e.g. ['74.125.128.199', '208.77.188.166', '82.94.164.162']

The gevent.spawn() method spawns the jobs, and gevent.joinall() adds them to the micro-thread execution queue and waits for them to complete, here with a timeout of 2 seconds. The results are then collected from the gevent.Greenlet.value attributes. gevent.socket.gethostbyname() has the same interface as the standard socket.gethostbyname(), but it does not block the whole interpreter, so the other greenlets can proceed with their requests unimpeded.

Monkey patching. Python's runtime allows us to modify most objects on the fly, including modules, classes, and even functions. This creates "implicit side effects" and problems that are hard to debug, but monkey patching comes in handy when you need to modify the underlying behavior of Python itself. Monkey patching enables gevent to turn most blocking system calls in the standard library, including the socket, ssl, threading, and select modules, into cooperative ones.

from gevent import monkey; monkey.patch_socket()
import urllib2   # a Python 2 example from the source

After monkey.patch_socket(), the urllib2 module can be used in a multi-micro-thread environment, achieving the goal of working together with gevent.

The event loop. Unlike other network libraries (but like eventlet), gevent implicitly starts the event loop in a dedicated greenlet. There is no need to call a reactor's run() or dispatch(), as is done with the reactor in twisted. When a gevent API function is about to block, it obtains the hub instance (the greenlet that runs the event loop) and switches to it; if no hub instance exists yet, one is created dynamically. The event loop provided by libev uses the system's fastest polling mechanism by default; the LIBEV_FLAGS environment variable can select one explicitly: LIBEV_FLAGS=1 for select, LIBEV_FLAGS=2 for poll, LIBEV_FLAGS=4 for epoll, and LIBEV_FLAGS=8 for kqueue. The libev API lives under gevent.core. Note that callbacks of the libev API run on the hub's greenlet, so the synchronous greenlet API cannot be used there; asynchronous APIs such as spawn() and Event.set() can be used.

