Python Learning Note 18: Standard library multi-process (multiprocessing package)

Source: Internet
Author: User
Tags: message queue, mul, semaphore

We can use the subprocess package to create child processes, but this package has two major limitations:
1) subprocess always runs an external program, rather than a function written inside a Python script.
2) Processes can only communicate with each other through (text) pipes.
These limitations keep the subprocess package from handling a wider range of multi-process tasks.
This comparison is actually unfair, because subprocess itself is designed as a shell wrapper, not a multi-process management package.

The multiprocessing package is the multi-process management package in Python's standard library.
Like threading.Thread, it provides a multiprocessing.Process object to create a process.
The process can run a function written inside the Python program.
The Process object is used in the same way as the Thread object and has start(), run(), and join() methods.
The multiprocessing package also contains Lock/Event/Semaphore/Condition classes to synchronize processes (as in multi-threading, these objects can be passed to each process through parameters); they are used in the same way as the corresponding classes in the threading package.
So a large part of multiprocessing uses the same API as threading, just applied to a multi-process situation.


But there are a few things to keep in mind when using these shared APIs:
1) On Unix platforms, when a process terminates, its parent process needs to call wait on it, or it becomes a zombie process.
It is therefore necessary to call the join() method on each Process object (join() is effectively equivalent to wait). This is not needed for multi-threading, because all threads live in a single process.
2) multiprocessing provides IPC mechanisms (such as Pipe and Queue) that the threading package does not have, and they are more efficient.
Pipe and Queue should be preferred; synchronization primitives such as Lock/Event/Semaphore/Condition should be avoided where possible.
3) Multiple processes should avoid sharing resources. In multi-threading we can share resources fairly easily, for example through global variables or by passing parameters.
In a multi-process situation those methods are not appropriate, because each process has its own independent memory space. We can still share resources through shared memory or a Manager object.
However, doing so increases the complexity of the program and reduces its efficiency because of the synchronization required. A process's PID is stored in Process.pid; pid is None if the process has not been start()ed yet.
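As a minimal sketch of point 1) and the pid attribute (the worker function name here is illustrative):

```python
import multiprocessing
import os

def show_pid():
    # runs in the child process
    print('child pid:', os.getpid())

if __name__ == '__main__':
    p = multiprocessing.Process(target=show_pid)
    print(p.pid)   # None: the process has not been started yet
    p.start()
    print(p.pid)   # now an integer PID
    p.join()       # reap the child so it does not become a zombie
```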


We can see from the following program how thread objects and process objects are similar and different, both in use and in results. Each thread and each process does one thing: print its PID.
The problem is that all tasks print to the same standard output (stdout); without coordination the output characters would get mixed together and become unreadable.
With Lock synchronization, one task is allowed to print only after another has finished, which prevents multiple tasks from writing to the terminal at the same time.


# Similarity and difference of multi-thread vs. multi-process
import os
import threading
import multiprocessing

# worker function
def worker(sign, lock):
    lock.acquire()
    print(sign, os.getpid())
    lock.release()

# Main
print('Main:', os.getpid())

# Multi-thread
record = []
lock = threading.Lock()
for i in range(5):
    thread = threading.Thread(target=worker, args=('thread', lock))
    thread.start()
    record.append(thread)

for thread in record:
    thread.join()

# Multi-process
record = []
lock = multiprocessing.Lock()
for i in range(5):
    process = multiprocessing.Process(target=worker, args=('process', lock))
    process.start()
    record.append(process)

for process in record:
    process.join()

All threads have the same PID as the main program, while each process has a different PID.

2. Pipe and Queue

For pipes and message queues, the multiprocessing package has a Pipe class and a Queue class to support each of these two IPC mechanisms. Pipe and Queue can be used to transfer common Python objects.


1) A Pipe can be unidirectional (half-duplex) or bidirectional (duplex).
We create a one-way pipe with multiprocessing.Pipe(duplex=False) (the default is bidirectional).
One process sends an object into one end of the pipe, and the process at the other end receives it; a one-way pipe only allows input at one fixed end, while a bidirectional pipe allows input from both ends.
The following program shows the use of Pipe:

# multiprocessing with pipe
import multiprocessing as mul

def proc1(pipe):
    pipe.send('hello')
    print('proc1 rec:', pipe.recv())

def proc2(pipe):
    print('proc2 rec:', pipe.recv())
    pipe.send('hello, too')

# Build a pipe
pipe = mul.Pipe()

# Pass one end of the pipe to process 1
p1 = mul.Process(target=proc1, args=(pipe[0],))
# Pass the other end of the pipe to process 2
p2 = mul.Process(target=proc2, args=(pipe[1],))
p1.start()
p2.start()
p1.join()
p2.join()

The pipe here is bidirectional.
When the Pipe object is created, it returns a tuple with two elements, each representing one end of the pipe (a Connection object).
We call the send() method on one end of the pipe to transfer an object, and receive it with recv() at the other end.
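For comparison, here is a minimal sketch of a one-way pipe: with duplex=False, the first connection returned can only receive and the second can only send.

```python
import multiprocessing as mul

def sender(conn):
    # this worker holds the send-only end of the one-way pipe
    conn.send('one-way hello')
    conn.close()

if __name__ == '__main__':
    # duplex=False: recv_end can only recv(), send_end can only send()
    recv_end, send_end = mul.Pipe(duplex=False)
    p = mul.Process(target=sender, args=(send_end,))
    p.start()
    print(recv_end.recv())   # prints 'one-way hello'
    p.join()
```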


2) A Queue is similar to a pipe in that it is also a first-in-first-out structure. But a Queue allows multiple processes to put objects in, and multiple processes to take objects out.
A Queue is created with multiprocessing.Queue(maxsize), where maxsize is the maximum number of objects the queue can hold.
The following program shows the use of Queue:
import os
import multiprocessing
import time

# ==================
# input worker
def inputQ(queue):
    info = str(os.getpid()) + '(put):' + str(time.time())
    queue.put(info)

# output worker
def outputQ(queue, lock):
    info = queue.get()
    lock.acquire()
    print(str(os.getpid()) + '(get):' + info)
    lock.release()

# ==================
# Main
record1 = []   # store input processes
record2 = []   # store output processes
lock = multiprocessing.Lock()      # to prevent messy print
queue = multiprocessing.Queue(3)

# input processes
for i in range(10):
    process = multiprocessing.Process(target=inputQ, args=(queue,))
    process.start()
    record1.append(process)

# output processes
for i in range(10):
    process = multiprocessing.Process(target=outputQ, args=(queue, lock))
    process.start()
    record2.append(process)

for p in record1:
    p.join()

queue.close()  # no more objects will come, close the queue

for p in record2:
    p.join()

Some processes use put() to place a string containing their PID and the current time into the queue.
Other processes take strings out of the queue with get() and print their own PID together with the string they received.


3. Process pool (Pool)

A process pool can create multiple processes. These processes are like soldiers on standby, ready to execute tasks. A process pool can hold multiple such standby processes.
For example, the following program:
import multiprocessing as mul

def f(x):
    return x**2

pool = mul.Pool(5)
rel = pool.map(f, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(rel)

We created a process pool that allows 5 processes, and each process run by the pool executes the f() function.
We use the map() method to apply the f() function to each element of the list. This is similar to the built-in map() function, except that the work is handled in parallel by 5 processes.
If a process finishes its element while there are still elements waiting to be processed, that process is reused to run f() again. In addition to the map() method, Pool has the following commonly used methods:
1) apply_async(func, args) takes one process from the pool to execute func, with args as the arguments passed to func.
It returns an AsyncResult object, on which you can call the get() method to retrieve the result.
2) close() stops the pool from accepting new tasks.
3) join() waits for all processes in the pool to finish. You must call close() on the pool before you can call join().
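A minimal sketch of apply_async(), close(), and join() used in that order (reusing the f() from the example above):

```python
import multiprocessing as mul

def f(x):
    return x**2

if __name__ == '__main__':
    pool = mul.Pool(5)
    result = pool.apply_async(f, (10,))  # run f(10) in one pool process
    print(result.get())                  # prints 100
    pool.close()   # no new tasks may be submitted after this
    pool.join()    # wait for all worker processes to exit
```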


4. Shared resources

Sharing resources among multiple processes inevitably leads to inter-process competition, and that competition causes race conditions: our results may be affected by the uncertainty of this competition.
But if we want to, we can still share resources, through shared memory and through Manager objects.


4.1 Shared memory

Following the principle of shared memory, here is a Python example:
import multiprocessing

def f(n, a):
    n.value = 3.14
    a[0] = 5

num = multiprocessing.Value('d', 0.0)
arr = multiprocessing.Array('i', range(10))

p = multiprocessing.Process(target=f, args=(num, arr))
p.start()
p.join()

print(num.value)
print(arr[:])

Here we actually have only two processes: the main process and the one represented by the Process object.
We create shared memory in the memory space of the main process, namely the Value and Array objects. The Value object is set to hold a double-precision float ('d') and is initialized to 0.0.
The Array is similar to an array in C: it has a fixed element type ('i', i.e. integer). In the child process we modify the Value and Array objects.
Back in the main program, printing the results shows that the main program also sees the changes to the two objects, which confirms that the resources are indeed shared between the two processes.

4.2 Manager

A Manager object works like server-client communication, similar to our activity on the Internet.
One process acts as a server and builds a Manager that actually stores the resources. Other processes can access the Manager through parameters or by its address and, after establishing a connection, manipulate the resources on the server.
Firewall permitting, we can even use a Manager across multiple computers, mimicking a real network scenario.
In the following example, our use of the Manager is similar to shared memory, but it can share a richer set of object types.
import multiprocessing

def f(x, arr, l):
    x.value = 3.14
    arr[0] = 5
    l.append('Hello')

server = multiprocessing.Manager()
x = server.Value('d', 0.0)
arr = server.Array('i', range(10))
l = server.list()

proc = multiprocessing.Process(target=f, args=(x, arr, l))
proc.start()
proc.join()

print(x.value)
print(arr)
print(l)

The Manager's list() method provides a way to share a list.
You can also use dict() to share a dictionary, Lock() to share a lock (note that what is shared here is a threading.Lock, not a multiprocessing.Lock; the latter itself already supports inter-process sharing), and so on.
In this way, the Manager lets us share a greater variety of objects.
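A minimal sketch of sharing a dictionary and a lock through a Manager (the counter key name is illustrative):

```python
import multiprocessing

def add_one(d, lock):
    # each worker increments a shared counter under the shared lock
    with lock:
        d['count'] = d.get('count', 0) + 1

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    d = manager.dict()      # a dictionary stored in the manager's server process
    lock = manager.Lock()   # a proxy to a threading.Lock on the server
    procs = [multiprocessing.Process(target=add_one, args=(d, lock))
             for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(dict(d))   # prints {'count': 4}
```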

