Python's multiprocessing module is used to create and manage multiple processes; the following is a summary of its usage.
Series Articles
- Python Concurrent Programming: Threading (i)
- Python Concurrent Programming: Multiprocessing (ii)
- Python Concurrent Programming: Asyncio (iii)
- Python Concurrent Programming: Gevent (iv)
- Python Concurrent Programming: Queue for Thread, Process, and Coroutine Communication (v)
- Python Concurrent Programming: Scheduling Principles of Processes, Threads, and Coroutines (vi)
- Python Concurrent Programming: Multiprocessing on Windows vs. Linux (vii)
fork()
```python
import os

pid = os.fork()  # create a child process
if pid == 0:
    print('this is the child process')
    print(os.getpid(), os.getppid())
else:
    print('this is the parent process')
    print(os.getpid())
    os.wait()  # wait for the child to exit and release its resources
```
When fork() is called it returns twice: in the child process the return value is 0, and in the parent process the return value is the child's PID.
os.getpid() and os.getppid() return the ID of the current process and of its parent, respectively.
Disadvantages of fork():
- Poor portability: it is only available on Unix-like systems, not on Windows;
- Poor scalability: process management becomes complicated when many processes are needed;
- It produces "orphan" and "zombie" processes whose resources must be reclaimed manually.
Advantages:
It is the operating system's own low-level process-creation mechanism, so it runs efficiently.
Creating Processes with Process
```python
import os
import time
from multiprocessing import Process

def test():
    time.sleep(2)
    print('this is process {}'.format(os.getpid()))

if __name__ == '__main__':
    p = Process(target=test)
    p.start()  # start the child process
    p.join()   # wait for the child process to end
    print('the process is ended')
```
```python
import os
import time
from multiprocessing import Process

class MyProcess(Process):
    def run(self):
        time.sleep(2)
        print('this is process {}'.format(os.getpid()))

    def __del__(self):
        print('del the process {}'.format(os.getpid()))

if __name__ == '__main__':
    p = MyProcess()
    p.start()
    print('the process is ended')

# Output:
# the process is ended
# this is process 7600
# del the process 7600
# del the process 12304
```
Description
Process objects create processes, but a Process object is not itself the process, and deleting the object is not directly tied to whether the system has reclaimed the process's resources.
In the previous example __del__ was called twice: when the child process was created it received a complete copy of the main process's Process object, so the object existed in both the main process and the child; p.start() is what actually started the child, and the object in the main process remains only a static handle.
When the main process finishes, by default it waits for its child processes and reclaims their resources; no manual cleanup is needed.
join() controls the order in which processes finish: the main process blocks until the child ends, and internally join() also reaps zombie processes, reclaiming their resources (see the sketch below).
When a child process ends it becomes a zombie; it is reaped either by join() or by starting another process, since start() also reaps existing zombies, so writing join() is not strictly necessary.
On Windows the system clears the child's process object as soon as the child ends; on Linux, if neither join() nor a later start() is called, the child's entry is cleared only when the main process exits.
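As a minimal sketch of the behavior described above, join() accepts a timeout and is_alive() reports whether the child has finished yet (the function name work here is only illustrative):

```python
import os
import time
from multiprocessing import Process

def work():
    time.sleep(3)
    print('child {} done'.format(os.getpid()))

if __name__ == '__main__':
    p = Process(target=work)
    p.start()
    p.join(timeout=1)  # block for at most 1 second
    print('alive after timeout:', p.is_alive())  # True: the child is still sleeping
    p.join()           # block until the child exits and is reaped
    print('alive after join:', p.is_alive())     # False
```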
Process Object Analysis
```python
class Process(object):
    def __init__(self, group=None, target=None, name=None,
                 args=(), kwargs={}):
        pass

# Process is the class Python uses to create processes.
# Parameters
#   group:  reserved for future extension;
#   target: the target callable, usually the function the new process should run;
#   name:   the process name; one is assigned automatically if not given;
#   args:   positional arguments for the target function;
#   kwargs: keyword arguments for the target function.
# Methods
#   start():     create and start the child process, which then runs the
#                class's run() method; may be called only once per instance;
#   run():       the code the child process executes;
#   join():      the main process blocks until the child ends; an optional
#                timeout can be set;
#   terminate(): terminate a live process;
#   is_alive():  whether the child process is still alive.
```
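A small sketch exercising the parameters and methods listed above (the function greet and its arguments are only illustrative):

```python
import os
import time
from multiprocessing import Process

def greet(word, times=1):
    for _ in range(times):
        print('{} from process {}'.format(word, os.getpid()))
        time.sleep(0.5)

if __name__ == '__main__':
    p = Process(target=greet, name='greeter',
                args=('hello',), kwargs={'times': 3})
    p.start()
    print(p.name, p.pid)  # the assigned name and the child's PID
    p.join(timeout=1)     # wait at most 1 second
    if p.is_alive():
        p.terminate()     # force-stop the child if it is still running
        p.join()          # reap it
```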
Process Pool (Pool)
If you need to create a large number of processes, use a process pool (Pool).
```python
import os
import time
from multiprocessing import Pool

def test():
    time.sleep(2)
    print('this is process {}'.format(os.getpid()))

def get_pool(n=5):
    p = Pool(n)  # set the pool size
    for i in range(10):
        p.apply_async(test)
    p.close()    # close the pool to new tasks
    p.join()     # wait for the workers to finish

if __name__ == '__main__':
    get_pool()
    print('the process is ended')
```
Analysis:
As above, once the pool is created, even though the number of tasks (10) is larger than the pool size, p.apply_async(test) keeps executing without blocking; it simply submits 10 requests to the pool, which are placed in a queue.
After p = Pool(5) executes, 5 worker processes have been created but have not yet been assigned tasks; however many tasks are submitted, only 5 processes actually exist, so the machine runs at most 5 of them in parallel.
When a worker in the pool finishes a task it becomes idle, and the pool hands it the next request from the queue on a FIFO basis.
When all of the pool's tasks are done, 5 zombie worker processes remain; if the main process has not ended, the system will not reclaim them automatically, and join() must be called to do so.
join() means the main process waits for the child processes to end and reclaims their system resources; without it, if the main program exits while pool processes are still running, they are killed forcibly.
If no maximum size is given when creating the Pool, the default number of worker processes is the number of CPU cores on the system (see the sketch below).
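A short sketch of these two points, using nothing beyond the standard library: Pool() with no size defaults to os.cpu_count() workers, and apply_async() returns an AsyncResult immediately instead of blocking:

```python
import os
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    print('cpu cores:', os.cpu_count())   # the default pool size
    p = Pool()                            # no size given: one worker per core
    result = p.apply_async(square, (7,))  # returns an AsyncResult at once
    print(result.get(timeout=5))          # block until the value is ready -> 49
    p.close()
    p.join()
```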
Pool Object Analysis
```python
class Pool(object):
    def __init__(self, processes=None, initializer=None, initargs=(),
                 maxtasksperchild=None, context=None):
        pass

# Initialization parameters
#   processes:   pool size; defaults to the number of CPU cores;
#   initializer: a callable that each worker process runs when it starts;
#   initargs:    arguments for the initializer;
#   context:     the context used to start the worker processes.
# Methods
#   apply():         call func in a blocking way;
#   apply_async():   call func in a non-blocking way;
#   close():         close the pool so it accepts no new tasks;
#   terminate():     terminate immediately, whether or not tasks are done;
#   join():          the main process blocks until the workers exit;
#                    must be called after close();
#   map(func, iterable, chunksize=None):
#                    run a function over an iterable of arguments in
#                    multiple processes;
#   starmap(func, iterable, chunksize=None):
#                    like map, but each item of iterable is unpacked as
#                    the function's arguments;
#   starmap_async(func, iterable, chunksize=None, callback=None,
#                 error_callback=None):
#                    asynchronous starmap; callback handles the results;
#   map_async(func, iterable, chunksize=None, callback=None,
#             error_callback=None):
#                    asynchronous map.
```
```python
import os
import time
from multiprocessing import Pool

def test(n):
    time.sleep(1)
    print('this is process {}'.format(os.getpid()))
    return n

def test1(n, m):
    print(n, m)
    print('this is process {}'.format(os.getpid()))

def back_func(values):
    # called with the list of all results when the processes finish
    print(values)

def back_func_err(values):
    # called with the error when a process raises
    print(values)

def get_pool(n=5):
    p = Pool(n)
    # p.map(test, (i for i in range(10)))          # blocking multi-process map
    # p.starmap(test1, zip([1, 2, 3], [3, 4, 5]))  # blocking map over a multi-argument function
    # asynchronous multi-process execution
    p.map_async(test, (i for i in range(5)),
                callback=back_func, error_callback=back_func_err)
    # asynchronous multi-process execution of a multi-argument function
    p.starmap_async(test1, zip([1, 2, 3], [3, 4, 5]),
                    callback=back_func, error_callback=back_func_err)
    print('-----')
    p.close()
    p.join()

if __name__ == '__main__':
    get_pool()
    print('the process is ended')
```
Process Lock
Although processes, unlike threads, do not share memory (each process has its own address space), they do share the file system and the disk, so data can be corrupted when multiple processes write to the same file at the same time; hence processes also need a synchronization lock.
```python
from multiprocessing import Pool, Lock

muex = Lock()  # inherited by the workers via fork; on Windows, share it through a Pool initializer

def test():
    if muex.acquire():
        # 'r+' requires that test_pro.txt already exists
        f = open('./test_pro.txt', 'r+', encoding='utf-8')
        x = f.read()
        if not x:
            f.write('0')
        else:
            f.seek(0)
            f.write(str(int(x) + 1))
        f.close()
        muex.release()

if __name__ == '__main__':
    p = Pool(5)
    for i in range(10):
        p.apply_async(test)
    p.close()
    p.join()
    with open('./test_pro.txt', 'r+', encoding='utf-8') as f:
        print(f.read())
```
A process lock keeps file-system access safe, but it turns parallel execution into serial execution, lowers efficiency, and can lead to deadlock, so locking is generally avoided where possible; one lock-free alternative is sketched below.
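One common lock-free alternative, sketched here under the assumption that the number of messages is known in advance, is to funnel all writes through a single writer process fed by a multiprocessing.Queue, so no two processes ever touch the file at once:

```python
from multiprocessing import Process, Queue

def worker(q, n):
    # workers never touch the file; they only send their results to the queue
    q.put('result from worker {}'.format(n))

def writer(q):
    # the single writer owns the file, so no lock is needed
    with open('./test_pro.txt', 'w', encoding='utf-8') as f:
        for _ in range(5):  # expect exactly one message per worker
            f.write(q.get() + '\n')

if __name__ == '__main__':
    q = Queue()
    workers = [Process(target=worker, args=(q, i)) for i in range(5)]
    for w in workers:
        w.start()
    wp = Process(target=writer, args=(q,))
    wp.start()
    for w in workers:
        w.join()
    wp.join()
```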
Reference
docs.python.org/