Python multi-process usage summary
Multi-process programming in Python mainly uses the multiprocessing library. Older Python versions may have problems with multiprocessing.Manager().Queue(); upgrading Python to a later version, for example 2.7.11, is recommended. For details, refer to "python version upgrade".
For details about how to use a thread pool in Python, refer to "python thread pool implementation".
1. Multi-process usage
1. On Linux, the os.fork function can be used.
#!/bin/env python
import os

print 'Process (%s) start...' % os.getpid()
pid = os.fork()
if pid == 0:
    print 'I am child process (%s) and my parent is %s.' % (os.getpid(), os.getppid())
    os._exit(1)
else:
    print 'I (%s) just created a child process (%s).' % (os.getpid(), pid)
Output
Process (22246) start...
I (22246) just created a child process (22247).
I am child process (22247) and my parent is 22246.
2. Use multiprocessing
#!/bin/env python
from multiprocessing import Process
import os
import time

def run_proc(name):
    time.sleep(3)
    print 'Run child process %s (%s)...' % (name, os.getpid())

if __name__ == '__main__':
    print 'Parent process %s.' % os.getpid()
    processes = list()
    for i in range(5):
        p = Process(target=run_proc, args=('test',))
        print 'Process will start.'
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
    print 'Process end.'
Output
Parent process 38140.
Process will start.
Process will start.
Process will start.
Process will start.
Process will start.
Run child process test (38141)...
Run child process test (38142)...
Run child process test (38143)...
Run child process test (38145)...
Run child process test (38144)...
Process end.

real 0m3.028s
user 0m0.021s
sys 0m0.004s
2. Process pool
1. Use multiprocessing.Pool, non-blocking version
#!/bin/env python
import multiprocessing
import time

def func(msg):
    print "msg:", msg
    time.sleep(3)
    print "end"

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=3)
    for i in xrange(3):
        msg = "hello %d" % (i)
        pool.apply_async(func, (msg, ))
    print "Mark~ Mark~ Mark~~~~~~~~~~~~~~~~~~~~~~"
    pool.close()
    pool.join()    # join() must come after close() or terminate()
    print "Sub-process(es) done."
Running result
Mark~ Mark~ Mark~~~~~~~~~~~~~~~~~~~~~~
msg: hello 0
msg: hello 1
msg: hello 2
end
end
end
Sub-process(es) done.

real 0m3.493s
user 0m0.056s
sys 0m0.022s
2. Use multiprocessing.Pool, blocking version
#!/bin/env python
import multiprocessing
import time

def func(msg):
    print "msg:", msg
    time.sleep(3)
    print "end"

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=3)
    for i in xrange(3):
        msg = "hello %d" % (i)
        pool.apply(func, (msg, ))
    print "Mark~ Mark~ Mark~~~~~~~~~~~~~~~~~~~~~~"
    pool.close()
    pool.join()    # join() must come after close() or terminate()
    print "Sub-process(es) done."
Running result
msg: hello 0
end
msg: hello 1
end
msg: hello 2
end
Mark~ Mark~ Mark~~~~~~~~~~~~~~~~~~~~~~
Sub-process(es) done.

real 0m9.061s
user 0m0.036s
sys 0m0.019s
The only difference between the two examples is apply_async versus apply: the former is non-blocking, while the latter blocks until each task finishes. As the timings show, the blocking version is slower by roughly a factor of the pool size (three 3-second tasks run one after another take about 9 s, versus about 3 s when they run in parallel).
3. Use multiprocessing.Pool and collect the results
import multiprocessing
import time

def func(msg):
    print "msg:", msg
    time.sleep(3)
    print "end"
    return "done" + msg

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4)
    result = []
    for i in xrange(3):
        msg = "hello %d" % (i)
        result.append(pool.apply_async(func, (msg, )))
    pool.close()
    pool.join()
    for res in result:
        print ":::", res.get()
    print "Sub-process(es) done."
Running result
msg: hello 0
msg: hello 1
msg: hello 2
end
end
end
::: donehello 0
::: donehello 1
::: donehello 2
Sub-process(es) done.

real 0m3.526s
user 0m0.054s
sys 0m0.024s
4. Use multiprocessing.Pool inside a class
Errors may occur when you use the process pool inside a class:
PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed
This error occurs because multiprocessing.Pool communicates with its worker processes through a Queue, and everything placed on that Queue must be serializable (picklable). In Python 2, bound instance methods are not picklable, so passing self.f to the pool fails. For example:
#!/bin/env python
import multiprocessing

class SomeClass(object):
    def __init__(self):
        pass

    def f(self, x):
        return x*x

    def go(self):
        pool = multiprocessing.Pool(processes=4)
        #result = pool.apply_async(self.f, [10])
        #print result.get(timeout=1)
        print pool.map(self.f, range(10))

SomeClass().go()
Running result
Traceback (most recent call last):
  File "4.py", line 18, in <module>
    SomeClass().go()
  File "4.py", line 16, in go
    print pool.map(self.f, range(10))
  File "/usr/local/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/local/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed
Solution (1): pass an ordinary module-level function into the class:
#!/bin/env python
import multiprocessing

def func(x):
    return x*x

class SomeClass(object):
    def __init__(self, func):
        self.f = func

    def go(self):
        pool = multiprocessing.Pool(processes=4)
        #result = pool.apply_async(self.f, [10])
        #print result.get(timeout=1)
        print pool.map(self.f, range(10))

SomeClass(func).go()
Output result:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
Solution (2): in general, if the processing logic lives in the class and we want to minimize code changes, we can use a module-level wrapper function that receives the instance as an argument:
#!/bin/env python
import multiprocessing

class SomeClass(object):
    def __init__(self):
        pass

    def f(self, x):
        return x*x

    def go(self):
        result = list()
        pool = multiprocessing.Pool(processes=4)
        for i in range(10):
            result.append(pool.apply_async(func, [self, i]))
        pool.close()
        pool.join()
        for res in result:
            print res.get(timeout=1)

def func(client, x):
    return client.f(x)

SomeClass().go()
Output result:
0
1
4
9
16
25
36
49
64
81
Note the following when using solution (2): if the SomeClass instance contains any non-serializable data, an error is raised, usually when res.get() is called. In that case, check whether the class holds any unpicklable members (locks, file handles, database connections, and so on); if it does, move them to a module-level global variable so they are not pickled along with the instance. A rough sketch of this fix is shown below.
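As an illustration only (the log file and its use here are hypothetical, not taken from the code above): the unpicklable resource is kept at module level instead of on self, so pickling the SomeClass instance no longer touches it.

#!/bin/env python
import multiprocessing

# Hypothetical unpicklable resource: kept as a module-level global instead of
# an attribute on self, so pickling the SomeClass instance never touches it.
log_file = open("/tmp/worker.log", "a")

class SomeClass(object):
    def f(self, x):
        log_file.write("handling %d\n" % x)   # workers use the global resource
        log_file.flush()
        return x * x

    def go(self):
        pool = multiprocessing.Pool(processes=4)
        result = [pool.apply_async(func, [self, i]) for i in range(10)]
        pool.close()
        pool.join()
        for res in result:
            print res.get(timeout=1)

def func(client, x):
    return client.f(x)

if __name__ == "__main__":
    SomeClass().go()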
3. Use the thread pool in multiple processes
One scenario requires combining multi-process and multi-thread. The task is CPU-intensive: processing a single IP address takes about 0.04 s. Running single-threaded takes about 3m32s, with the one process at 100% CPU. A process pool (size = 10) takes about 6m50s, with only one process at around 90% CPU and the others around 30%; a thread pool (size = 10) takes about 4m39s, again with a single CPU at 100%.
So plain multi-processing brings no advantage here and is actually slower: each IP takes only 0.04 s to handle, so the cost of dispatching work to and switching between processes dominates. The thread pool, limited by the GIL to a single core, cannot be made faster by adding more threads either. The solution is to combine the two: multiple processes, each running its own thread pool, as in the code below.
def run(self):
    self.getData()
    ipNums = len(self.ipInfo)
    step = ipNums / multiprocessing.cpu_count()   # IPs handled per process
    ipList = list()
    i = 0
    j = 1
    processList = list()
    for ip in self.ipInfo:
        ipList.append(ip)
        i += 1
        if i == step * j or i == ipNums:          # one chunk is full
            j += 1

            def innerRun():
                # each process runs a thread pool over its own chunk of IPs
                wm = Pool.ThreadPool(CONF.POOL_SIZE)
                for myIp in ipList:
                    wm.addJob(self.handleOne, myIp)
                wm.waitForComplete()

            process = multiprocessing.Process(target=innerRun)
            process.start()
            processList.append(process)
            ipList = list()
    for process in processList:
        process.join()
On a machine with 8 CPUs, using 8 processes each running its own thread pool brings the time down to about 35 s, with each of the 8 CPUs at around 50% utilization and the machine's average CPU usage around 75%.
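The run() snippet above depends on the custom Pool.ThreadPool from the thread-pool article. For reference, here is a self-contained sketch of the same processes-plus-threads pattern using only the standard library (multiprocessing.dummy provides a thread-based Pool); handle_one and the IP list are placeholders, not the original processing logic.

#!/bin/env python
import multiprocessing
from multiprocessing.dummy import Pool as ThreadPool   # thread-based pool

def handle_one(ip):
    # placeholder for the real per-IP processing (~0.04 s of work each)
    return sum(int(part) for part in ip.split('.'))

def worker(chunk):
    # each process runs a thread pool over its own chunk of IPs
    tp = ThreadPool(10)
    tp.map(handle_one, chunk)
    tp.close()
    tp.join()

if __name__ == '__main__':
    ips = ['10.0.%d.%d' % (i / 256, i % 256) for i in range(1024)]
    nproc = multiprocessing.cpu_count()
    chunk_size = (len(ips) + nproc - 1) / nproc         # ceiling division
    chunks = [ips[k:k + chunk_size] for k in range(0, len(ips), chunk_size)]
    procs = [multiprocessing.Process(target=worker, args=(c,)) for c in chunks]
    for p in procs:
        p.start()
    for p in procs:
        p.join()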
4. Multi-process communication
Personally I mostly use Manager for inter-process communication. For the other mechanisms, and for distributed multi-process in particular, see Liao Xuefeng's site: http://www.liaoxuefeng.com/wiki/001374738125095c955c1e6d8bb493182103fac9270762a000/001386832973658c780d8bfa4c6406f83b2b3097aed5df6000
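As a minimal sketch of Manager-based communication (an assumed usage example, not code from the original post): a Manager().Queue() proxy can be handed to Pool workers as an argument, which a plain multiprocessing.Queue() cannot.

#!/bin/env python
import multiprocessing

def worker(queue, n):
    # the queue argument is a Manager proxy, so it pickles cleanly
    queue.put(n * n)

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    queue = manager.Queue()               # proxy shared across processes
    pool = multiprocessing.Pool(processes=4)
    for i in range(10):
        pool.apply_async(worker, (queue, i))
    pool.close()
    pool.join()
    while not queue.empty():
        print queue.get()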