Using the Pool class (process pools) in Python's multiprocessing library

Source: Internet
Author: User

Background of the problem

Recently I needed to divide a text into several topics and build one regressor per topic. The regressors are independent of each other, and the final prediction is obtained by combining the results of all the topics' regressors. Yes, this is similar to a bagging ensemble, except that I do not resample the data. The text is not small, about 3000 lines, and there are 8 topics, so I wrote a serial program that processed one topic after another. Within each topic, however, I used GridSearchCV both to select features and to tune the regressor's parameters, which gave 1782 parameter combinations in total. I badly underestimated how long the tuning would take: the program ran for a full day and night. In the end, because I had forgotten to import a library, the final prediction accuracy was never calculated. Afterwards I thought: since each topic's prediction is independent, could the topics be processed in parallel?

Multithreading and multiprocessing in Python

Python's multithreading cannot really take advantage of multiple cores: because of the global interpreter lock (GIL) in CPython, only one thread executes Python bytecode at a time, so CPU-bound threads run concurrently on what is effectively a single core. Multiple processes, however, can make real use of multiple cores, because each process is independent, has its own memory, and can be scheduled on a different core to achieve true parallelism. In my problem the topics are independent of each other and need no inter-process communication; only the final results are combined. Multiprocessing is therefore a good choice.
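
The point that processes do not share resources can be illustrated with a minimal sketch. The names `counter` and `increment` are made up for this illustration and are not part of the original program; the sketch only assumes Python 3 and the standard library.

```python
from multiprocessing import Process

counter = 0  # module-level state; every process works on its own copy


def increment():
    """Add 1 to this process's copy of the counter and return the new value."""
    global counter
    counter += 1
    return counter


if __name__ == "__main__":
    child = Process(target=increment)
    child.start()
    child.join()
    # The child changed only its own copy, so the parent still sees 0.
    print("counter in parent after child ran:", counter)
```

Because nothing is shared implicitly, results have to be sent back to the parent explicitly, for example through return values collected by a process pool.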

Creating a child process with multiprocessing

The multiprocessing module provides a Process class for creating new processes. The following code creates one child process.

    from multiprocessing import Process

    def f(name):
        print('Hello', name)

    if __name__ == '__main__':
        p = Process(target=f, args=('Bob',))  # create a child process p; the target function is f, and args is the argument tuple for f
        p.start()  # start the child process
        p.join()   # wait for the child process to end

In the preceding code, p.join() waits for the child process to finish before any subsequent operations are performed. It is typically used when processes depend on each other. For example, suppose there is a read process pw and a write process pr. Calling pr.join() before starting pw means that the read process is not started until the write process has finished.
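
The write-then-read ordering described above could look roughly like the following sketch. It keeps the names pr (write process) and pw (read process) from the text; the helper functions and the temporary file used to pass the data are assumptions made for this illustration.

```python
import os
import tempfile
from multiprocessing import Process


def write_lines(path, lines):
    """Write process: store the lines in a file."""
    with open(path, "w") as handle:
        handle.write("\n".join(lines))


def read_lines(path):
    """Return the lines that were written to the file."""
    with open(path) as handle:
        return handle.read().split("\n")


def print_lines(path):
    """Read process: print what the write process stored."""
    print(read_lines(path))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "shared.txt")
    pr = Process(target=write_lines, args=(path, ["a", "b", "c"]))  # write process
    pw = Process(target=print_lines, args=(path,))                  # read process
    pr.start()
    pr.join()   # block until the write process has finished
    pw.start()  # only now start the read process
    pw.join()
```

Without the pr.join() call, the read process could start before the file is complete.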

Multiple child processes

If you want to create multiple child processes at the same time, you can use the multiprocessing.Pool class. It creates a pool of worker processes and runs the submitted tasks on them, so the work can be spread across multiple cores.

    import multiprocessing

    def func(msg):
        print(multiprocessing.current_process().name + '-' + msg)

    if __name__ == "__main__":
        pool = multiprocessing.Pool(processes=4)  # create a pool of 4 worker processes
        for i in range(10):
            msg = "hello %d" % i
            pool.apply_async(func, (msg,))
        pool.close()  # close the pool: no more tasks can be submitted to it
        pool.join()   # wait for all worker processes to finish; must be called after close()
        print("sub-process(es) done.")

The output is as follows:

    sub-process(es) done.
    PoolWorker-34-hello 1
    PoolWorker-33-hello 0
    PoolWorker-35-hello 2
    PoolWorker-36-hello 3
    PoolWorker-34-hello 7
    PoolWorker-33-hello 4
    PoolWorker-35-hello 5
    PoolWorker-36-hello 6
    PoolWorker-33-hello 8
    PoolWorker-36-hello 9

The code above uses pool.apply_async(), which is a variant of apply(). apply_async() is the asynchronous, non-blocking version of apply(), and apply() is the blocking version: with apply(), the main process is blocked until the function has finished executing. The Pool method apply() is equivalent to the built-in apply() function of Python 2 (the built-in was removed in Python 3). You can see that the lines are not printed in the order of the for loop in the code.
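
The difference between the two calls can be seen in a minimal sketch. The function `square` is a made-up example task; `apply()`, `apply_async()` and `get()` are the standard Pool and AsyncResult methods.

```python
import multiprocessing


def square(x):
    """Example task: return x squared."""
    return x * x


if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=2)

    # apply() blocks until the call finishes and returns the value directly.
    value = pool.apply(square, (3,))
    print(value)  # 9

    # apply_async() returns immediately with a result object;
    # get() blocks until the value is available.
    pending = pool.apply_async(square, (4,))
    print(pending.get())  # 16

    pool.close()
    pool.join()
```

Calling apply() in a loop would therefore run the tasks one after another, while apply_async() lets all of them be queued before any result is read.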

Multiple child processes and return values

apply_async() can also deliver the return value of the function called in the worker process. In the previous code that created multiple child processes, if func returns a value, then pool.apply_async(func, (msg,)) returns a result object (an AsyncResult) for that one call. Note that it is an object, not the value itself; the value is obtained by calling get() on it.

    import multiprocessing

    def func(msg):
        return multiprocessing.current_process().name + '-' + msg

    if __name__ == "__main__":
        pool = multiprocessing.Pool(processes=4)  # create a pool of 4 worker processes
        results = []
        for i in range(10):
            msg = "hello %d" % i
            results.append(pool.apply_async(func, (msg,)))
        pool.close()  # close the pool: no more tasks can be submitted; must be called before join()
        pool.join()   # wait for all worker processes to finish
        print("sub-process(es) done.")

        for res in results:
            print(res.get())

The output of the code above is as follows:

    sub-process(es) done.
    PoolWorker-37-hello 0
    PoolWorker-38-hello 1
    PoolWorker-39-hello 2
    PoolWorker-40-hello 3
    PoolWorker-37-hello 4
    PoolWorker-38-hello 5
    PoolWorker-39-hello 6
    PoolWorker-37-hello 7
    PoolWorker-40-hello 8
    PoolWorker-38-hello 9

Unlike the previous output, this time the output is ordered, because the results are read with get() in the same order in which the tasks were submitted.
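
Applied to the problem from the beginning of the article, the same pattern might look like the sketch below. The function `fit_topic` is a hypothetical stand-in: the original tuning code is not shown in this article, so a trivial computation takes the place of the grid search. The helper `collect` is likewise an assumption for this illustration.

```python
import multiprocessing


def fit_topic(topic_id):
    """Hypothetical stand-in for tuning and fitting one topic's regressor."""
    # Placeholder computation; a real version would run the grid search here.
    return topic_id, sum(i * i for i in range(topic_id + 1))


def collect(async_results):
    """Call get() on each result object, in submission order."""
    return dict(res.get() for res in async_results)


if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4)
    # One task per topic; the article's problem has 8 topics.
    pending = [pool.apply_async(fit_topic, (t,)) for t in range(8)]
    pool.close()
    pool.join()
    print(collect(pending))
```

Each task returns its topic number together with its result, so the combined dictionary does not depend on which worker finished first.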

If the computer has eight cores and the pool is created with 8 processes, run the top command on Ubuntu and press 1 on the main keyboard (not the numeric keypad). You can then see that the load is spread fairly evenly across the CPUs.

The difference in the CPU usage curves before and after the multi-process program starts is also clearly visible in System Monitor.
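
Rather than hard-coding the number of processes, the pool size can be derived from the machine. The following is a minimal sketch; the helper `pool_size` and its capping rule are assumptions for this illustration, while `multiprocessing.cpu_count()` is the standard library call that reports the number of CPUs.

```python
import multiprocessing


def pool_size(requested=None):
    """Return the requested worker count, limited to the number of CPUs.

    With no argument, use one worker per CPU.
    """
    cores = multiprocessing.cpu_count()
    if requested is None:
        return cores
    return max(1, min(requested, cores))


if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=pool_size())
    print(pool.map(abs, [-1, -2, -3]))  # [1, 2, 3]
    pool.close()
    pool.join()
```

Creating the pool without a processes argument also defaults to the number of CPUs the system reports, so the helper is mainly useful when an upper limit is wanted.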
