The following class lets you create a pool of worker processes to which various data-processing tasks can be submitted. The functionality a pool provides is somewhat similar to that of list comprehensions and functional-programming operations such as map-reduce.
Pool([numprocess[, initializer[, initargs]]])
Creates a pool of worker processes.
numprocess is the number of processes to create. If this argument is omitted, the value returned by cpu_count() is used. As a quick illustration:

from multiprocessing import cpu_count
print(cpu_count())  # get the number of CPUs on this machine
initializer is a callable object to be executed in each worker process when it starts. initargs is the tuple of arguments to pass to initializer. initializer defaults to None.
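As a brief sketch of how initializer and initargs fit together (the init_worker() function and its greeting argument are invented for this example):

import multiprocessing
import os

def init_worker(greeting):
    # Runs once in each worker process as it starts up
    print("%s from worker %d" % (greeting, os.getpid()))

if __name__ == "__main__":
    p = multiprocessing.Pool(2, initializer=init_worker, initargs=("hello",))
    p.close()
    p.join()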
An instance p of the Pool class supports the following operations:
p.apply(func[, args[, kwargs]])
Executes func(*args, **kwargs) in one of the pool's worker processes and returns the result. It is important to emphasize that this operation does not execute func in parallel in all pool workers. If you want to execute func concurrently with different arguments, you must either call p.apply() from different threads or use p.apply_async().
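For example, a minimal sketch (the add() function is invented for illustration):

import multiprocessing

def add(x, y):
    return x + y

if __name__ == "__main__":
    p = multiprocessing.Pool(2)
    result = p.apply(add, (2, 3))   # blocks until a worker returns
    print(result)                   # 5
    p.close()
    p.join()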
p.apply_async(func[, args[, kwargs[, callback]]])
Executes func(*args, **kwargs) asynchronously in a pool worker process. The result of this method is an AsyncResult instance that can be used later to obtain the final result. callback is a callable that accepts a single argument; when the result of func becomes available, it is immediately passed to callback. callback must never perform any blocking operations, or it will block the thread that receives results from other asynchronous operations.
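A minimal sketch of apply_async() with a callback (square() and on_result() are invented for illustration):

import multiprocessing

def square(x):
    return x * x

def on_result(value):
    # Must return quickly and must never block
    print("got", value)

if __name__ == "__main__":
    p = multiprocessing.Pool(2)
    r = p.apply_async(square, (7,), callback=on_result)
    p.close()
    p.join()   # "got 49" is printed before join() returns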
p.close()
Closes the process pool, preventing any further tasks from being submitted. If tasks are still pending, they will be completed before the worker processes terminate. (A combined close()/join() sketch follows the join() entry below.)
p.join()
Waits for all worker processes to exit. This method may only be called after close() or terminate().
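The usual shutdown pattern looks like this (the work() function is invented for illustration):

import multiprocessing

def work(x):
    return x + 1

if __name__ == "__main__":
    p = multiprocessing.Pool(2)
    results = [p.apply_async(work, (i,)) for i in range(5)]
    p.close()   # no more tasks may be submitted
    p.join()    # wait for all pending tasks and all workers to finish
    print([r.get() for r in results])   # [1, 2, 3, 4, 5]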
p.imap(func, iterable[, chunksize])
A version of map() that returns an iterator instead of a list of results. (The sketch after imap_unordered() below shows both variants.)
p.imap_unordered(func, iterable[, chunksize])
The same as imap(), except that results are returned in whatever order the worker processes finish them, rather than in the order of the input.
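A minimal sketch contrasting the two (double() is invented for illustration):

import multiprocessing

def double(x):
    return 2 * x

if __name__ == "__main__":
    p = multiprocessing.Pool(2)
    for value in p.imap(double, range(5)):
        print(value)   # 0, 2, 4, 6, 8 -- always in input order
    for value in p.imap_unordered(double, range(5)):
        print(value)   # same values, but in whatever order workers finish
    p.close()
    p.join()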
p.map(func, iterable[, chunksize])
Applies the callable func to all items in iterable and returns the results as a list. The work is performed in parallel by splitting iterable into chunks and farming the chunks out to the worker processes. chunksize sets the number of items in each chunk. For large amounts of data, increasing chunksize can improve performance.
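For example (negate() is invented for illustration; the chunksize of 3 is arbitrary):

import multiprocessing

def negate(x):
    return -x

if __name__ == "__main__":
    p = multiprocessing.Pool(4)
    # Each worker receives the input in batches of 3 items
    print(p.map(negate, range(10), chunksize=3))   # [0, -1, -2, ..., -9]
    p.close()
    p.join()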
p.map_async(func, iterable[, chunksize[, callback]])
The same as map(), except that the result is returned asynchronously. If callback is supplied, it is called with the list of results when they become available.
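A minimal sketch (cube() and collect() are invented for illustration):

import multiprocessing

def cube(x):
    return x ** 3

def collect(values):
    # Called once, with the complete list of results
    print(values)   # [0, 1, 8, 27, 64]

if __name__ == "__main__":
    p = multiprocessing.Pool(2)
    r = p.map_async(cube, range(5), callback=collect)
    r.wait()   # block until the results (and the callback) are done
    p.close()
    p.join()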
p.terminate()
Immediately terminates all worker processes without performing any cleanup or finishing any pending work. If p is garbage-collected, this method is called automatically.
The methods apply_async() and map_async() return an AsyncResult instance, which has the following methods:
a.get([timeout])
Returns the result, waiting for it to arrive if necessary. timeout is an optional timeout. If the result does not arrive within the given time, a multiprocessing.TimeoutError exception is raised. If an exception was raised in the remote operation, it is re-raised when this method is called.
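For example (slow() is invented for illustration):

import multiprocessing
import time

def slow(x):
    time.sleep(2)
    return x

if __name__ == "__main__":
    p = multiprocessing.Pool(1)
    r = p.apply_async(slow, (42,))
    try:
        print(r.get(timeout=0.5))
    except multiprocessing.TimeoutError:
        print("result not ready within 0.5 seconds")
    print(r.get())   # blocks until the result arrives, then prints 42
    p.close()
    p.join()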
a.ready()
Returns True if the call has completed.
a.successful()
Returns True if the call completed without raising an exception. If this method is called before the result is ready, an AssertionError exception is raised.
a.wait([timeout])
Waits for the result to become available. timeout is an optional timeout.
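A minimal sketch exercising these three methods together (fail() is invented for illustration):

import multiprocessing

def fail(x):
    return 1 // x   # raises ZeroDivisionError when x == 0

if __name__ == "__main__":
    p = multiprocessing.Pool(1)
    r = p.apply_async(fail, (0,))
    r.wait()                # wait until the call has completed
    print(r.ready())        # True
    print(r.successful())   # False -- the call raised an exception
    try:
        r.get()             # re-raises the remote ZeroDivisionError
    except ZeroDivisionError:
        print("worker raised ZeroDivisionError")
    p.close()
    p.join()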
The following example shows how to use a process pool to build a dictionary mapping the names of all files in a directory tree to their SHA512 digest values:
import multiprocessing
import os
import hashlib

# Some parameters you can tweak
BUFSIZE = 8192    # read buffer size
POOLSIZE = 4      # number of worker processes

def compute_digest(filename):
    try:
        f = open(filename, "rb")
    except IOError:
        return None
    digest = hashlib.sha512()
    while True:
        chunk = f.read(BUFSIZE)
        if not chunk:
            break
        digest.update(chunk)
    f.close()
    return filename, digest.digest()

def build_digest_map(topdir):
    digest_pool = multiprocessing.Pool(POOLSIZE)
    allfiles = (os.path.join(path, name)
                for path, dirs, files in os.walk(topdir)
                for name in files)
    # Skip the None results returned for files that could not be opened
    digest_map = dict(result
                      for result in digest_pool.imap_unordered(compute_digest, allfiles, 20)
                      if result is not None)
    digest_pool.close()
    return digest_map

if __name__ == "__main__":
    digest_map = build_digest_map(r"F:\WaterFlow")
    print(len(digest_map))
In this example, a generator expression is used to produce the sequence of path names for all files in a directory tree. This sequence is then split into chunks and handed to the process pool via the imap_unordered() function. Each pool worker process uses the compute_digest() function to compute SHA512 digest values for its files. The results are returned as they are completed and collected into a Python dictionary.
Keep in mind that using a process pool only makes sense if the work carried out by the pool workers is substantial enough to justify the extra communication overhead. In general, there is no point in using a process pool for simple calculations such as adding two numbers.