Python parallel task tips
Two libraries support a concurrent map:
multiprocessing, and its lesser-known but powerful submodule multiprocessing.dummy.
multiprocessing.dummy is a complete clone of the multiprocessing module. The only difference is that multiprocessing works with processes, while dummy works with threads (and so comes with the usual limitations of Python threads). Whatever applies to one therefore applies to the other. This makes it very easy to switch between the two modes, which is especially helpful when you are not yet sure whether a framework call is IO-bound or CPU-bound.
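A minimal sketch of that switch, assuming a toy workload: the function names square and run_with are hypothetical and only serve the illustration. The same code runs on either pool, and only the class passed in changes.

```python
from multiprocessing import Pool                      # process-based workers
from multiprocessing.dummy import Pool as ThreadPool  # thread-based workers


def square(n):
    """Hypothetical stand-in for real work.

    It is defined at module level so that worker processes can pickle it.
    """
    return n * n


def run_with(pool_cls, numbers, workers=4):
    """Map `square` over `numbers` using whichever pool class is passed in."""
    pool = pool_cls(workers)
    try:
        return pool.map(square, numbers)
    finally:
        # Stop accepting work, then wait for the workers to exit.
        pool.close()
        pool.join()


if __name__ == "__main__":
    data = list(range(10))
    print(run_with(ThreadPool, data))  # threads: usually suited to IO-bound work
    print(run_with(Pool, data))        # processes: usually suited to CPU-bound work
```

The process-based call sits under the `__main__` guard because platforms that spawn new interpreters re-import the main module in every worker.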
Importing the relevant packages
    from multiprocessing import Pool
or
    from multiprocessing.dummy import Pool as ThreadPool
Initialization
    pool = ThreadPool()
    import urllib2  # Python 2 only; Python 3 moved this module to urllib.request
    from multiprocessing.dummy import Pool as ThreadPool

    urls = [
        'http://www.python.org',
        'http://www.python.org/about/',
        'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html',
        'http://www.python.org/doc/',
        'http://www.python.org/download/',
        'http://www.python.org/getit/',
        'http://www.python.org/community/',
        'https://wiki.python.org/moin/',
        'http://planet.python.org/',
        'https://wiki.python.org/moin/LocalUserGroups',
        'http://www.python.org/psf/',
        'http://docs.python.org/devguide/',
        'http://www.python.org/community/awards/'
        # etc..
    ]

    # Make the Pool of workers
    pool = ThreadPool(4)
    # Open the urls in their own threads
    # and return the results
    results = pool.map(urllib2.urlopen, urls)
    # Close the pool and wait for the work to finish
    pool.close()
    pool.join()
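For readers on Python 3, where urllib2 no longer exists, the same idea might look like the sketch below. The helper names fetch and fetch_all and the 10-second timeout are assumptions made for this illustration, not part of the original example.

```python
import urllib.request
from multiprocessing.dummy import Pool as ThreadPool


def fetch(url):
    """Download one URL and return (url, size of the body in bytes)."""
    # The timeout value is an arbitrary choice for this sketch.
    with urllib.request.urlopen(url, timeout=10) as response:
        return url, len(response.read())


def fetch_all(urls, workers=4):
    """Fetch every URL on a pool of `workers` threads.

    pool.map returns the results in the same order as the input list.
    """
    pool = ThreadPool(workers)
    try:
        return pool.map(fetch, urls)
    finally:
        pool.close()
        pool.join()
```

Calling fetch_all with a list of URLs returns one (url, size) pair per URL. Reading the body inside the worker keeps the slow network wait in the thread pool rather than in the main thread.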
The Pool object accepts a few optional parameters. The first one sets the number of workers in the pool. If it is omitted, the number of CPU cores on the machine is used as the default.
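One way to observe the worker limit is to record which thread handles each task. This is only a sketch: worker_name and threads_used are hypothetical helpers, and the short sleep merely simulates waiting on IO.

```python
import threading
import time
from multiprocessing.dummy import Pool as ThreadPool


def worker_name(_):
    """Sleep briefly to simulate IO, then report which thread ran the task."""
    time.sleep(0.01)
    return threading.current_thread().name


def threads_used(workers, tasks=20):
    """Return the set of distinct worker threads that handled `tasks` jobs."""
    pool = ThreadPool(workers)
    try:
        return set(pool.map(worker_name, range(tasks)))
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    # Never more than 4 distinct threads, however many tasks are queued.
    print(len(threads_used(4)))
    # No size given: the pool falls back to the CPU count of the machine.
    print(len(threads_used(None)))
```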
In general, if you use a process pool for CPU-bound work, more cores mean more speed (with plenty of caveats). For threads doing network-bound work, however, the outcome varies widely, so it is best to experiment to find a suitable pool size.
If you run too many threads, you can end up spending more time switching between them than doing actual work. So it is best to experiment until you find the sweet spot for the task at hand.
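A rough way to look for that sweet spot is to time the same batch of work at several pool sizes. The sketch below assumes a simulated request (fake_request, a 50 ms sleep) instead of real network calls, and the names time_pool and compare_sizes are hypothetical. Real workloads will behave differently, so treat the numbers as a starting point only.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool


def fake_request(_):
    """Hypothetical stand-in for a network call: it just waits 50 ms."""
    time.sleep(0.05)
    return True


def time_pool(workers, tasks=8):
    """Return the seconds needed to finish `tasks` jobs with `workers` threads."""
    pool = ThreadPool(workers)
    start = time.perf_counter()
    try:
        pool.map(fake_request, range(tasks))
    finally:
        pool.close()
        pool.join()
    return time.perf_counter() - start


def compare_sizes(sizes=(1, 2, 4, 8)):
    """Time each pool size and return a {size: seconds} mapping."""
    return {size: time_pool(size) for size in sizes}


if __name__ == "__main__":
    for size, seconds in compare_sizes().items():
        print("%d workers: %.2f s" % (size, seconds))
```

With a sleep-only task, larger pools keep getting faster until the pool size reaches the number of tasks. With real requests, the gains usually flatten or reverse earlier, which is the balance point to look for.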
The map function simplifies Python concurrency code