Python multi-process usage summary

Multi-process programming in Python mainly uses the multiprocessing library. Older Python releases have problems with multiprocessing.Manager().Queue, so we recommend upgrading Python to a later version, for example 2.7.11. For details, refer to "python version upgrade".

For details about how to use a thread pool in python, refer to python thread pool implementation.
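The custom thread pool referenced there is used again in section 3 below through an addJob/waitForComplete interface. As a rough, self-contained sketch of what such an interface can look like (this is an illustrative stand-in, not the referenced implementation):

#!/bin/env python
# Minimal sketch of a thread pool exposing the addJob/waitForComplete
# interface used later in this article; an assumption for illustration,
# not the referenced implementation.
import threading
import Queue

class ThreadPool(object):
    def __init__(self, size):
        self.jobs = Queue.Queue()
        for _ in range(size):
            t = threading.Thread(target=self._worker)
            t.daemon = True
            t.start()

    def _worker(self):
        # Each worker thread pulls jobs off the queue forever.
        while True:
            func, args = self.jobs.get()
            try:
                func(*args)
            finally:
                self.jobs.task_done()

    def addJob(self, func, *args):
        self.jobs.put((func, args))

    def waitForComplete(self):
        # Blocks until every queued job has been processed.
        self.jobs.join()

With that, wm = ThreadPool(10); wm.addJob(handler, some_ip); wm.waitForComplete() mirrors the calls made in section 3.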

1. Multi-process usage

1. On Linux, the os.fork function can be used directly.

#!/bin/env python
import os

print 'Process (%s) start...' % os.getpid()
pid = os.fork()
if pid == 0:
    print 'I am child process (%s) and my parent is %s.' % (os.getpid(), os.getppid())
    os._exit(1)
else:
    print 'I (%s) just created a child process (%s).' % (os.getpid(), pid)

Output

Process (22246) start...
I (22246) just created a child process (22247).
I am child process (22247) and my parent is 22246.

2. Use multiprocessing

#!/bin/env python
from multiprocessing import Process
import os
import time

def run_proc(name):
    time.sleep(3)
    print 'Run child process %s (%s)...' % (name, os.getpid())

if __name__ == '__main__':
    print 'Parent process %s.' % os.getpid()
    processes = list()
    for i in range(5):
        p = Process(target=run_proc, args=('test',))
        print 'Process will start.'
        p.start()
        processes.append(p)
    # join after all children have been started, so they run in parallel
    for p in processes:
        p.join()
    print 'Process end.'

Output

Parent process 38140.
Process will start.
Process will start.
Process will start.
Process will start.
Process will start.
Run child process test (38141)...
Run child process test (38142)...
Run child process test (38143)...
Run child process test (38145)...
Run child process test (38144)...
Process end.

real    0m3.028s
user    0m0.021s
sys     0m0.004s

 

2. Process pool

1. Use multiprocessing.Pool (non-blocking)

#!/bin/env python
import multiprocessing
import time

def func(msg):
    print "msg:", msg
    time.sleep(3)
    print "end"

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=3)
    for i in xrange(3):
        msg = "hello %d" % (i)
        pool.apply_async(func, (msg, ))
    print "Mark~ Mark~ Mark~~~~~~~~~~~~~~~~~~~~~~"
    pool.close()
    pool.join()    # join() must come after close() or terminate()
    print "Sub-process(es) done."

Running result

Mark~ Mark~ Mark~~~~~~~~~~~~~~~~~~~~~~
msg: hello 0
msg: hello 1
msg: hello 2
end
end
end
Sub-process(es) done.

real    0m3.493s
user    0m0.056s
sys     0m0.022s

2. Use multiprocessing.Pool (blocking)

#!/bin/env python
import multiprocessing
import time

def func(msg):
    print "msg:", msg
    time.sleep(3)
    print "end"

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=3)
    for i in xrange(3):
        msg = "hello %d" % (i)
        pool.apply(func, (msg, ))
    print "Mark~ Mark~ Mark~~~~~~~~~~~~~~~~~~~~~~"
    pool.close()
    pool.join()    # join() must come after close() or terminate()
    print "Sub-process(es) done."

Running result

msg: hello 0
end
msg: hello 1
end
msg: hello 2
end
Mark~ Mark~ Mark~~~~~~~~~~~~~~~~~~~~~~
Sub-process(es) done.

real    0m9.061s
user    0m0.036s
sys     0m0.019s

The only difference between the two versions is apply_async versus apply: the former is non-blocking, the latter blocks until each task finishes. With three 3-second tasks and a pool of three processes, the non-blocking version finishes in about 3 seconds while the blocking version takes about 9 seconds, so the run times differ by roughly a factor of the pool size.

3. Use multiprocessing.Pool and collect the results

import multiprocessing
import time

def func(msg):
    print "msg:", msg
    time.sleep(3)
    print "end"
    return "done" + msg

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4)
    result = []
    for i in xrange(3):
        msg = "hello %d" % (i)
        result.append(pool.apply_async(func, (msg, )))
    pool.close()
    pool.join()
    for res in result:
        print ":::", res.get()
    print "Sub-process(es) done."

Running result

msg: hello 0
msg: hello 1
msg: hello 2
end
end
end
::: donehello 0
::: donehello 1
::: donehello 2
Sub-process(es) done.

real    0m3.526s
user    0m0.054s
sys     0m0.024s

4. Use multiprocessing.Pool inside a class

An error may occur when you use the process pool inside a class:

PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed

This happens because multiprocessing.Pool ships tasks to its workers through a Queue, and everything that goes into that Queue must be serializable (picklable); in Python 2, a bound instance method such as self.f is not. For example:

#!/bin/env python
import multiprocessing

class SomeClass(object):
    def __init__(self):
        pass

    def f(self, x):
        return x*x

    def go(self):
        pool = multiprocessing.Pool(processes=4)
        #result = pool.apply_async(self.f, [10])
        #print result.get(timeout=1)
        print pool.map(self.f, range(10))

SomeClass().go()

Running it raises:

Traceback (most recent call last):
  File "4.py", line 18, in <module>
    SomeClass().go()
  File "4.py", line 16, in go
    print pool.map(self.f, range(10))
  File "/usr/local/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/local/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed

Solution (1): move the worker function to module level and pass it into the class:

#!/bin/env python
import multiprocessing

def func(x):
    return x*x

class SomeClass(object):
    def __init__(self, func):
        self.f = func

    def go(self):
        pool = multiprocessing.Pool(processes=4)
        #result = pool.apply_async(self.f, [10])
        #print result.get(timeout=1)
        print pool.map(self.f, range(10))

SomeClass(func).go()

Output result:

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Solution (2): in general, if the processing logic is written inside the class and we want to minimize code changes, we can wrap the method call in a module-level function:

#!/bin/env python
import multiprocessing

class SomeClass(object):
    def __init__(self):
        pass

    def f(self, x):
        return x*x

    def go(self):
        result = list()
        pool = multiprocessing.Pool(processes=4)
        for i in range(10):
            result.append(pool.apply_async(func, [self, i]))
        pool.close()
        pool.join()
        for res in result:
            print res.get(timeout=1)

def func(client, x):
    return client.f(x)

SomeClass().go()

Output result:

0
1
4
9
16
25
36
49
64
81

Note the following when using solution (2): if the SomeClass instance holds any non-picklable data, pickling fails, and the error usually only surfaces when res.get() is called. In that case, check the class for non-picklable attributes and move them out of the instance, for example into a module-level (global) variable; a sketch of that follows.
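As a purely illustrative sketch of that advice (the log_file handle and factor attribute below are hypothetical, not from the original code): keep non-picklable resources, such as an open file handle, in a module-level variable so that only picklable attributes travel with the instance.

#!/bin/env python
import multiprocessing

# Hypothetical non-picklable resource kept at module level instead of on the
# instance; forked workers inherit it, so it never has to be pickled.
log_file = open("/tmp/worker.log", "a")

class SomeClass(object):
    def __init__(self):
        # Only picklable attributes stay on the instance.
        self.factor = 2

    def f(self, x):
        return self.factor * x * x

    def go(self):
        result = list()
        pool = multiprocessing.Pool(processes=4)
        for i in range(10):
            result.append(pool.apply_async(func, [self, i]))
        pool.close()
        pool.join()
        for res in result:
            print res.get(timeout=1)

def func(client, x):
    log_file.write("handling %d\n" % x)   # uses the global, not an instance attribute
    return client.f(x)

SomeClass().go()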

 

3. Use a thread pool inside multiple processes

One scenario calls for combining multiple processes with multiple threads. In a CPU-intensive job where handling one IP address takes roughly 0.04 seconds, a single thread finishes in about 3m32s, with that single process at 100% CPU. A process pool (size = 10) takes about 6m50s, with only one process around 90% CPU and the rest around 30%. A thread pool (size = 10) takes about 4m39s, again with a single CPU at 100%.

So the plain process pool is not just unhelpful here, it is slower: dispatching work to processes and switching between them costs more than the 0.04 seconds each IP address actually needs. The thread pool, for its part, can only use a single CPU core, so adding more threads does not make it any faster. The way out is to combine the two: split the work across processes and run a thread pool inside each process, as in the method below.

def run(self):
    self.getData()
    ipNums = len(self.ipInfo)
    step = ipNums / multiprocessing.cpu_count()
    ipList = list()
    i = 0
    j = 1
    processList = list()
    for ip in self.ipInfo:
        ipList.append(ip)
        i += 1
        # once a chunk of ~step IPs is collected, hand it to a new process
        if i == step * j or i == ipNums:
            j += 1
            def innerRun():
                # Pool.ThreadPool is the custom thread pool referenced at the
                # beginning of this article (addJob/waitForComplete interface)
                wm = Pool.ThreadPool(CONF.POOL_SIZE)
                for myIp in ipList:
                    wm.addJob(self.handleOne, myIp)
                wm.waitForComplete()
            process = multiprocessing.Process(target=innerRun)
            process.start()
            processList.append(process)
            ipList = list()
    for process in processList:
        process.join()

On a machine with 8 CPUs, using 8 processes, each running its own thread pool, brings the run time down to about 35 s; each of the 8 CPUs sits around 50% utilization and the machine's average CPU usage is around 75%.
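For readers who do not have that custom thread-pool module, here is a minimal standard-library sketch of the same pattern, one worker process per CPU, each driving its own thread pool via multiprocessing.dummy; the handle_one function, the IP list, and THREADS_PER_PROCESS are illustrative assumptions, not the article's original code:

#!/bin/env python
import multiprocessing
from multiprocessing.dummy import Pool as ThreadPool   # thread pool with the Pool API

THREADS_PER_PROCESS = 10          # illustrative value

def handle_one(ip):
    # Placeholder for the real per-IP work (~0.04s each in the scenario above).
    return ip

def process_chunk(ip_chunk):
    # Each worker process runs its own thread pool over its chunk of IPs.
    tp = ThreadPool(THREADS_PER_PROCESS)
    tp.map(handle_one, ip_chunk)
    tp.close()
    tp.join()

if __name__ == "__main__":
    ips = ["10.0.0.%d" % i for i in xrange(1, 201)]
    n = multiprocessing.cpu_count()
    step = (len(ips) + n - 1) / n                 # chunk size per process
    chunks = [ips[i:i + step] for i in xrange(0, len(ips), step)]

    procs = [multiprocessing.Process(target=process_chunk, args=(c,)) for c in chunks]
    for p in procs:
        p.start()
    for p in procs:
        p.join()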

 

4. Multi-process communication

Personally I mostly use Manager; for other mechanisms, especially distributed multi-process, see Liao Xuefeng's tutorial: http://www.liaoxuefeng.com/wiki/001374738125095c955c1e6d8bb493182103fac9270762a000/001386832973658c780d8bfa4c6406f83b2b3097aed5df6000
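A minimal sketch of the Manager approach (the worker function and message format are illustrative): a Manager().Queue() proxy can be passed to child processes and, unlike a plain multiprocessing.Queue, can also be handed to Pool workers; this is the object mentioned at the top that was unreliable before newer 2.7.x releases.

#!/bin/env python
import multiprocessing

def worker(q, n):
    # Each child process puts its result on the shared queue.
    q.put("result from process %d" % n)

if __name__ == "__main__":
    manager = multiprocessing.Manager()
    queue = manager.Queue()          # proxy object, safe to pass to children

    procs = [multiprocessing.Process(target=worker, args=(queue, i)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

    while not queue.empty():
        print queue.get()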

 
