Python study note 18: the multiprocessing package
We can use the subprocess package to create child processes, but it has two major limitations:
1) subprocess always runs an external program, rather than a function written inside a Python script.
2) Inter-process communication happens only through pipes.
These limits keep subprocess from serving a wider range of multi-process tasks. That is not really a fault, because subprocess is designed as a shell wrapper, not as a multi-process management package.
The multiprocessing package is Python's multi-process management package. Much like threading.Thread, its multiprocessing.Process object is used to create a process, and that process can run a function written inside the Python program. The Process object is used the same way as a Thread object and likewise has start(), run(), and join() methods.
In addition, the multiprocessing package contains Lock/Event/Semaphore/Condition classes for process synchronization (these objects can be passed to processes through parameters, just as in multithreading); their usage is the same as in the threading package. A large part of multiprocessing therefore reuses the same set of APIs as threading, only transplanted to the multi-process situation.
However, when using these shared APIs, pay attention to the following points:
1) On UNIX platforms, when a process ends it needs to be waited on (wait) by its parent process; otherwise it becomes a zombie process. It is therefore necessary to call the join() method (effectively equivalent to wait) on each Process object. For multithreading this necessity does not exist, because there is only one process.
2) multiprocessing provides IPC mechanisms not found in the threading package, such as Pipe and Queue, which are more efficient. Pipe and Queue should be preferred over synchronization primitives such as Lock/Event/Semaphore/Condition (which occupy resources of the user process).
3) Multiple processes should avoid sharing resources. In multithreading we can share resources easily, for example through global variables or passed parameters. With multiple processes that approach does not work, because each process has its own memory space. Resources can instead be shared through shared memory or a Manager object, but this increases the complexity of the program and, because of the synchronization required, reduces its efficiency.

Process.pid holds the process's PID. If start() has not been called on the Process, pid is None.
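A minimal sketch of this pid behavior (the worker name idle is just a placeholder):

import multiprocessing
import time

def idle():
    time.sleep(1)

p = multiprocessing.Process(target=idle)
print(p.pid)   # None: the process has not been started yet
p.start()
print(p.pid)   # now an integer PID assigned by the operating system
p.join()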
The program below shows the similarity of the Thread and Process objects and the difference in their results. Each thread and each process does one thing: print its PID. The catch is that all tasks print to the same stdout, so the output characters would get mixed together and become unreadable. Lock synchronization is used so that one task finishes its output before the next one starts, preventing several tasks from writing to the terminal at the same time.
# Similarity and difference of multi thread vs. multi process
import os
import threading
import multiprocessing

# worker function
def worker(sign, lock):
    lock.acquire()
    print(sign, os.getpid())
    lock.release()

# Main
print('Main:', os.getpid())

# Multi-thread
record = []
lock = threading.Lock()
for i in range(5):
    thread = threading.Thread(target=worker, args=('thread', lock))
    thread.start()
    record.append(thread)

for thread in record:
    thread.join()

# Multi-process
record = []
lock = multiprocessing.Lock()
for i in range(5):
    process = multiprocessing.Process(target=worker, args=('process', lock))
    process.start()
    record.append(process)

for process in record:
    process.join()
All threads print the same PID as the main program, while each process has its own, different PID.
2. Pipe and message queue

The multiprocessing package includes the Pipe and Queue classes to support these two IPC mechanisms. Pipe and Queue can be used to transmit ordinary objects.
1) A Pipe can be one-way (half-duplex) or two-way (duplex). We use multiprocessing.Pipe(duplex=False) to create a one-way pipe (pipes are bidirectional by default). One process feeds an object into one end of the pipe, and the process at the other end receives it. A one-way pipe only allows the process at one end to send, while a two-way pipe allows sending from both ends. (A short one-way sketch follows the example below.)
The following program demonstrates the use of Pipe:
# Multiprocessing with Pipe
import multiprocessing as mul

def proc1(pipe):
    pipe.send('hello')
    print('proc1 rec:', pipe.recv())

def proc2(pipe):
    print('proc2 rec:', pipe.recv())
    pipe.send('hello, too')

# Build a pipe
pipe = mul.Pipe()

# Pass an end of the pipe to process 1
p1 = mul.Process(target=proc1, args=(pipe[0],))
# Pass the other end of the pipe to process 2
p2 = mul.Process(target=proc2, args=(pipe[1],))

p1.start()
p2.start()
p1.join()
p2.join()
The Pipe here is bidirectional. Creating a Pipe object returns a pair of two elements, each representing one end of the pipe (a Connection object). We call the send() method on one end of the pipe to send an object, and recv() on the other end to receive it.
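Since the example above is bidirectional, here is a minimal one-way sketch. With duplex=False, Pipe() returns a receive-only end first and a send-only end second:

import multiprocessing as mul

def sender(conn):
    conn.send('one-way hello')
    conn.close()

# With duplex=False the first Connection can only receive
# and the second can only send
recv_end, send_end = mul.Pipe(duplex=False)

p = mul.Process(target=sender, args=(send_end,))
p.start()
print(recv_end.recv())   # prints: one-way hello
p.join()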
2) A Queue is similar to a Pipe: it is a first-in, first-out structure. Unlike a Pipe, however, a Queue allows multiple processes to put objects in and multiple processes to take objects out. A Queue is created with multiprocessing.Queue(maxsize), where maxsize is the maximum number of objects the Queue can hold.
The following program shows the use of Queue:
import os
import multiprocessing
import time

#==================
# input worker
def inputQ(queue):
    info = str(os.getpid()) + '(put):' + str(time.time())
    queue.put(info)

# output worker
def outputQ(queue, lock):
    info = queue.get()
    lock.acquire()
    print(str(os.getpid()) + '(get):' + info)
    lock.release()

#===================
# Main
record1 = []   # store input processes
record2 = []   # store output processes
lock = multiprocessing.Lock()    # To prevent messy print
queue = multiprocessing.Queue(3)

# input processes
for i in range(10):
    process = multiprocessing.Process(target=inputQ, args=(queue,))
    process.start()
    record1.append(process)

# output processes
for i in range(10):
    process = multiprocessing.Process(target=outputQ, args=(queue, lock))
    process.start()
    record2.append(process)

for p in record1:
    p.join()

queue.close()  # No more object will come, close the queue

for p in record2:
    p.join()
Some processes use put() to place a string containing their PID and the current time into the Queue. Other processes use get() to take a string out of the Queue and print it along with their own PID.
3. Process pool

A process pool (Pool) creates multiple processes that stand by to execute tasks, like soldiers waiting for orders. A process pool can hold several such standby processes. For example, consider the following program:
import multiprocessing as mul

def f(x):
    return x**2

pool = mul.Pool(5)
rel = pool.map(f, [1,2,3,4,5,6,7,8,9,10])
print(rel)
We created a process pool that allows five processes to run at the same time. Each process running in the pool executes the function f(). We use the map() method to apply f() to each element of the list. This resembles the built-in map() function, except that here five processes do the work in parallel. After a process finishes its element, if elements remain to be processed, the process is reused to run f() again. Besides map(), the Pool has the following commonly used methods:
1) apply_async(func, args) takes one process from the pool to execute func; args are the arguments for func. It returns an AsyncResult object, on which you can call get() to obtain the result (illustrated in the sketch after this list).
2) close(): the pool stops accepting new tasks and creates no new processes.
3) join(): wait for all processes in the pool to finish. close() must be called on the Pool before join().
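A minimal sketch of these three methods, reusing f() from the example above:

import multiprocessing as mul

def f(x):
    return x**2

pool = mul.Pool(5)
# submit one task asynchronously; get() blocks until the result is ready
result = pool.apply_async(f, (10,))
print(result.get())   # 100
pool.close()          # the pool accepts no new tasks
pool.join()           # wait for all worker processes to exit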
4. Shared resources

When multiple processes share resources, they inevitably compete for them. This competition causes race conditions, and the results can be affected by the uncertainty of that competition. If sharing is really necessary, we can still do it, through shared memory or a Manager object.
1) Shared memory. An example based on the shared memory mechanism:
import multiprocessing

def f(n, a):
    n.value = 3.14
    a[0] = 5

num = multiprocessing.Value('d', 0.0)
arr = multiprocessing.Array('i', range(10))

p = multiprocessing.Process(target=f, args=(num, arr))
p.start()
p.join()

print(num.value)
print(arr[:])
Here we have only two processes: the main process and the child process represented by the Process object. We create shared memory in the main process's memory space, namely the two objects Value and Array. The Value object is declared double precision ('d') and initialized to 0.0. Array is similar to an array in C: it has a fixed element type ('i', that is, integer). In the child process we modify the Value and Array objects, and back in the main program we print the results. The main program sees the changes to both objects, which shows that the resources are indeed shared between the two processes.
2) Manager. A Manager object works like server-client communication, much like our activities on the Internet. One process acts as the server and creates a Manager that actually holds the resources. Other processes can access the Manager by passing parameters or by its address; once connected, they can operate on the resources held by the server. Firewall permitting, a Manager can even be used across multiple computers, imitating a real network situation.
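Below is a sketch of such a networked Manager using multiprocessing.managers.BaseManager; the address, port, authkey, and the registered name get_dict are all illustrative assumptions, not part of the original example.

# Server side: holds the real dictionary and serves it over the network
from multiprocessing.managers import BaseManager

shared = {}   # the resource that actually lives in the server process

class DictManager(BaseManager):
    pass

DictManager.register('get_dict', callable=lambda: shared)
manager = DictManager(address=('', 50000), authkey=b'secret')  # port and key assumed
manager.get_server().serve_forever()

The client, possibly on another computer, connects with the same address and authkey:

from multiprocessing.managers import BaseManager

class DictManager(BaseManager):
    pass

DictManager.register('get_dict')
manager = DictManager(address=('server-host', 50000), authkey=b'secret')
manager.connect()
d = manager.get_dict()                              # a proxy for the server's dict
d.update({'greeting': 'hello over the network'})    # public dict methods are proxied
print(d.get('greeting'))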
In the following example, Manager is used much like shared memory, but more types of objects can be shared.
import multiprocessing

def f(x, arr, l):
    x.value = 3.14
    arr[0] = 5
    l.append('Hello')

server = multiprocessing.Manager()
x = server.Value('d', 0.0)
arr = server.Array('i', range(10))
l = server.list()

proc = multiprocessing.Process(target=f, args=(x, arr, l))
proc.start()
proc.join()

print(x.value)
print(arr)
print(l)
Manager uses the list() method to share a list. In fact you can also use dict() to share a dictionary and Lock() to share a lock (note that what is shared is a threading.Lock, not a process-level multiprocessing.Lock; the latter already supports sharing between processes on its own). In this way, Manager lets us share a wider variety of objects.
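A short sketch of sharing a dictionary and a lock this way; the worker name add_one and the key 'count' are illustrative:

import multiprocessing

def add_one(d, lock):
    lock.acquire()
    d['count'] = d.get('count', 0) + 1
    lock.release()

manager = multiprocessing.Manager()
d = manager.dict()        # a shared dictionary
lock = manager.Lock()     # a shared lock (a threading.Lock held on the server)

record = []
for i in range(5):
    p = multiprocessing.Process(target=add_one, args=(d, lock))
    p.start()
    record.append(p)
for p in record:
    p.join()

print(d['count'])   # 5: no increment was lost, thanks to the lock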