We were able to create child processes using the subprocess package. But this package has two major limitations:
1) subprocess always executes an external program, rather than a function written inside a Python script.
2) Processes can only communicate with each other through text pipes.
These limitations keep the subprocess package from being applied to a wider range of multi-process tasks.
This comparison is actually unfair, since subprocess itself is designed to work like a shell rather than as a multi-process management package.
The multiprocessing package is Python's multi-process management package.
Much like threading.Thread, it lets us use a multiprocessing.Process object to create a process.
That process can execute a function written inside the Python program.
A Process object is used in the same way as a Thread object: it has start(), run(), and join() methods.
The multiprocessing package also provides Lock, Event, Semaphore, and Condition classes for synchronizing processes (these objects can be passed to each process as parameters, just as in multithreading), and they are used in the same way as the classes of the same names in the threading package.
So a very large part of multiprocessing uses the same set of APIs as threading, just in a multi-process context.
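For instance, here is a minimal sketch (not part of the original text) in which an Event object is passed to a child process as a parameter, just as the classes with the same names are used in threading:

import multiprocessing

def waiter(event):
    event.wait()    # block until the main process sets the event
    print('event received in process', multiprocessing.current_process().pid)

event = multiprocessing.Event()
p = multiprocessing.Process(target=waiter, args=(event,))
p.start()
event.set()         # wake up the waiting child process
p.join()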
But when using these shared APIs, here are a few things to keep in mind:
1) On Unix platforms, when a process terminates, its parent process needs to wait on it, or the terminated process becomes a zombie process (zombie).
So it is necessary to call the join() method on each Process object (join() is effectively equivalent to wait). In multithreading there is only one process, so this is not needed.
2) multiprocessing provides IPC mechanisms (such as Pipe and Queue) that the threading package does not have, and they are more efficient.
Pipe and Queue should be preferred over synchronization primitives such as Lock/Event/Semaphore/Condition (the resources the latter occupy are not those of the user processes).
3) Multiple processes should avoid sharing resources.
In multithreading we can share resources fairly easily, for example through global variables or by passing parameters.
In the multi-process case, each process has its own independent memory space, so those methods are not appropriate; instead we can share resources through shared memory or a Manager object.
However, this increases the complexity of the program, and the synchronization it requires reduces the program's efficiency. A process's PID is stored in Process.pid; if the process has not yet been started with start(), pid is None.
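A minimal sketch of that pid behavior (the do-nothing worker here is just a placeholder):

import multiprocessing

def work():
    pass

p = multiprocessing.Process(target=work)
print(p.pid)    # None: the process has not been started yet
p.start()
print(p.pid)    # now a real operating-system PID
p.join()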
The following program shows the similarities and differences, in both usage and results, between Thread objects and Process objects. Each thread and each process does one thing: print its PID.
The problem is that all the tasks print to the same standard output (stdout), so their output characters would get mixed together and become unreadable.
With Lock synchronization, one task finishes its output before another task is allowed to print, which keeps multiple tasks from writing to the terminal at the same time.
# Similarity and difference of multi thread vs. multi process
import os
import threading
import multiprocessing

# worker function
def worker(sign, lock):
    lock.acquire()
    print(sign, os.getpid())
    lock.release()

# Main
print('Main:', os.getpid())

# Multi-thread
record = []
lock = threading.Lock()
for i in range(5):
    thread = threading.Thread(target=worker, args=('thread', lock))
    thread.start()
    record.append(thread)

for thread in record:
    thread.join()

# Multi-process
record = []
lock = multiprocessing.Lock()
for i in range(5):
    process = multiprocessing.Process(target=worker, args=('process', lock))
    process.start()
    record.append(process)

for process in record:
    process.join()
All the threads share the PID of the main program, while each process has a different PID.
2. Pipe and Queue

As with the pipe and the message queue, the multiprocessing package has a Pipe class and a Queue class to support these two IPC mechanisms respectively. Pipe and Queue can be used to transfer common Python objects.
1) A Pipe can be unidirectional (half-duplex) or bidirectional (duplex).
We create a unidirectional Pipe with multiprocessing.Pipe(duplex=False) (the default is bidirectional).
One process sends an object into one end of the pipe, and a process at the other end receives it. A unidirectional pipe only allows the process at one end to send, while a bidirectional pipe allows sending from both ends.
The following program shows the use of Pipe:
# multiprocessing with pipe
import multiprocessing as mul

def proc1(pipe):
    pipe.send('hello')
    print('proc1 rec:', pipe.recv())

def proc2(pipe):
    print('proc2 rec:', pipe.recv())
    pipe.send('hello, too')

# Build a pipe
pipe = mul.Pipe()

# Pass one end of the pipe to process 1
p1 = mul.Process(target=proc1, args=(pipe[0],))
# Pass the other end of the pipe to process 2
p2 = mul.Process(target=proc2, args=(pipe[1],))
p1.start()
p2.start()
p1.join()
p2.join()
The Pipe here is bidirectional.
When the Pipe object is created, it returns a pair of two elements, each representing one end of the pipe (a Connection object).
We call the send() method on one end of the pipe to transfer an object, and the other end uses recv() to receive it.
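For the unidirectional case, here is a minimal sketch (not part of the original text); it relies on the documented behavior of multiprocessing.Pipe(duplex=False), where the first returned connection can only receive and the second can only send:

import multiprocessing as mul

def sender(conn):
    conn.send('one-way message')
    conn.close()

# duplex=False: recv_end can only receive, send_end can only send
recv_end, send_end = mul.Pipe(duplex=False)
p = mul.Process(target=sender, args=(send_end,))
p.start()
print(recv_end.recv())
p.join()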
2) A Queue is similar to a Pipe in that it is a first-in-first-out structure, but a Queue allows multiple processes to put objects in and multiple processes to take objects out.
A Queue is created with multiprocessing.Queue(maxsize), where maxsize is the maximum number of objects the queue can hold.
The following program shows the use of the queue:
import os
import multiprocessing
import time

#==================
# input worker
def inputQ(queue):
    info = str(os.getpid()) + '(put):' + str(time.time())
    queue.put(info)

# output worker
def outputQ(queue, lock):
    info = queue.get()
    lock.acquire()
    print(str(os.getpid()) + '(get):' + info)
    lock.release()

#===================
# Main
record1 = []   # store input processes
record2 = []   # store output processes
lock = multiprocessing.Lock()    # to prevent messy print
queue = multiprocessing.Queue(3)

# input processes (10 here is an assumed count)
for i in range(10):
    process = multiprocessing.Process(target=inputQ, args=(queue,))
    process.start()
    record1.append(process)

# output processes
for i in range(10):
    process = multiprocessing.Process(target=outputQ, args=(queue, lock))
    process.start()
    record2.append(process)

for p in record1:
    p.join()

queue.close()   # no more objects will come; close the queue

for p in record2:
    p.join()
Some processes use put() to place a string into the queue; the string contains the PID and the current time.
Other processes get() strings from the queue and print their own PID along with the string they retrieved.
3. Process Pool

A process pool (Pool) can create multiple processes. These processes are like soldiers on standby, ready to execute tasks (programs). A process pool can hold a number of such standby soldiers.
For example, the following program:
import multiprocessing as mul

def f(x):
    return x**2

pool = mul.Pool(5)
rel = pool.map(f, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(rel)
We created a pool that can hold 5 processes. Each process in the pool executes the f() function.
We use the map() method to apply f() to each element of the list. This is similar to the built-in map() function, except that here the work is handled in parallel by 5 processes.
If a process has finished its element and there are still elements waiting to be processed, that process is reused to execute f() again. Besides the map() method, Pool also has the following common methods:
1) apply_async(func, args) takes one process from the pool to execute func; args are the arguments of func.
It returns an AsyncResult object, on which you can call the get() method to retrieve the result (see the sketch after this list).
2) close(): the pool will not create any new processes.
3) join(): wait for all the processes in the pool to finish.
The close() method must be called on the Pool before join() can be called.
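Here is a minimal sketch (not part of the original text) that combines apply_async(), close(), and join(), reusing the f() from the pool example above:

import multiprocessing as mul

def f(x):
    return x**2

if __name__ == '__main__':    # guard needed on platforms that spawn rather than fork
    pool = mul.Pool(5)
    results = [pool.apply_async(f, (x,)) for x in [1, 2, 3, 4, 5]]
    pool.close()    # the pool accepts no new tasks after this
    pool.join()     # wait for all worker processes to finish
    print([r.get() for r in results])    # [1, 4, 9, 16, 25]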
4. Shared Resources

Sharing resources among multiple processes inevitably leads to competition between the processes, and such competition causes race conditions: the results may be affected by the unpredictable outcome of that competition. But if we must share, we can still do so through shared memory or a Manager object.

1) Shared memory

In line with the principle of shared memory, here is an example of a Python implementation:
import multiprocessing

def f(n, a):
    n.value = 3.14
    a[0] = 5

num = multiprocessing.Value('d', 0.0)
arr = multiprocessing.Array('i', range(10))

p = multiprocessing.Process(target=f, args=(num, arr))
p.start()
p.join()

print(num.value)
print(arr[:])
Here we actually have only two processes: the main process and the one represented by the Process object.
We create the shared memory, namely the Value and Array objects, in the memory space of the main process. The Value object num is declared as a double-precision float ('d') and initialized to 0.0.
Array is similar to an array in C: it has a fixed element type ('i', i.e. integer). In the child process we modify the Value and Array objects.
Back in the main program, printing the results shows that the main program also sees the changes to the two objects, which indicates that the resources are indeed shared between the two processes.
2) Manager

A Manager object works like server-client communication (server-client), quite similar to how we interact on the Internet.
We use one process as the server to set up a Manager that actually holds the resources. Other processes access the Manager by receiving it as a parameter or by its address, and after establishing a connection they manipulate the resources on the server.
With the firewall's permission, we can even use a Manager across multiple computers, thereby imitating a real networked situation.
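As a sketch of that networked use (all host names, ports, and keys below are illustrative assumptions, not values from the original text), the multiprocessing.managers.BaseManager class can serve a resource over a socket:

from multiprocessing.managers import BaseManager

class RemoteManager(BaseManager):
    pass

# --- server side, run on one machine ---
shared = {}                                   # the resource actually stored on the server
RemoteManager.register('get_dict', callable=lambda: shared)
manager = RemoteManager(address=('', 50000), authkey=b'secret')
manager.get_server().serve_forever()          # blocks, serving remote requests

# --- client side, run on another machine (shown commented out) ---
# RemoteManager.register('get_dict')          # the client only registers the name
# client = RemoteManager(address=('server-host', 50000), authkey=b'secret')
# client.connect()
# d = client.get_dict()
# d['greeting'] = 'hello over the network'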
In the following example, our use of the Manager is similar to shared memory, but a richer set of object types can be shared.
import multiprocessing

def f(x, arr, l):
    x.value = 3.14
    arr[0] = 5
    l.append('Hello')

server = multiprocessing.Manager()
x = server.Value('d', 0.0)
arr = server.Array('i', range(10))
l = server.list()

proc = multiprocessing.Process(target=f, args=(x, arr, l))
proc.start()
proc.join()

print(x.value)
print(arr)
print(l)
The Manager uses the list() method to provide a way to share a list.
You can also use dict() to share a dictionary, Lock() to share a lock (note that what is shared is a threading.Lock, not a multiprocessing.Lock; the latter itself already implements process sharing), and so on.
In this way, the Manager allows us to share many more kinds of objects.
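A minimal sketch (not part of the original text) of sharing a dictionary and a lock through the Manager:

import multiprocessing

def f(d, lock):
    with lock:                               # serialize access to the shared dict
        d['count'] = d.get('count', 0) + 1

manager = multiprocessing.Manager()
d = manager.dict()
lock = manager.Lock()

record = []
for i in range(5):
    p = multiprocessing.Process(target=f, args=(d, lock))
    p.start()
    record.append(p)
for p in record:
    p.join()

print(d['count'])    # 5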