Using the process Pool class in Python's multiprocessing library
Background
Recently I split a text into several topics and built a separate regressor for each topic; the regressors are independent of one another, and the per-topic results are combined at the end to get the final prediction. That's right, it is similar to a bagging ensemble, except that I did no sampling. The text is not large, about 3,000 lines, and there are 8 topics, so I first wrote a serial program that processed the topics one after another. However, within each topic I used GridSearchCV for parameter tuning, selecting features and adjusting the regressor's hyperparameters at the same time, which gave 1,782 parameter combinations. I badly underestimated how long the tuning would take: the program ran for a day and a night and then failed to compute the final prediction accuracy because I had forgotten to import a module. Afterwards it occurred to me: since every topic's prediction is independent, why not run them in parallel?
Multithreading and multiprocessing in Python
However, Python's multithreading cannot really use multiple cores: because of the Global Interpreter Lock (GIL) in CPython, threads only run concurrently on a single core. Multiple processes, on the other hand, can genuinely use multiple cores. Since each process is independent and does not share the interpreter's state, different processes can run on different cores and achieve true parallelism. In my case the topics are independent of one another and need no inter-process communication; the results only have to be collected at the end. Using multiple processes is therefore a good choice.
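To make the thread-versus-process point concrete, here is a minimal sketch of my own (not code from the original article): it runs the same CPU-bound function, a made-up count_down, four times with a thread-based pool (multiprocessing.pool.ThreadPool, which has the same interface as Pool) and then with a process pool. On a multi-core machine the process pool should finish noticeably faster.

import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool   # thread-based pool with the same interface as Pool

def count_down(n):
    # purely CPU-bound busy loop used only for the comparison
    while n > 0:
        n -= 1

if __name__ == '__main__':
    jobs = [10 ** 7] * 4

    for name, pool_cls in [('threads', ThreadPool), ('processes', Pool)]:
        pool = pool_cls(4)
        start = time.time()
        pool.map(count_down, jobs)   # run the four jobs concurrently
        pool.close()
        pool.join()
        # with threads the GIL keeps this on one core; with processes it can use four
        print '%s: %.2f seconds' % (name, time.time() - start)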
Multiprocessing
A single child process
The multiprocessing module provides the Process class for creating a new process. The following code creates a child process.
from multiprocessing import Process

def f(name):
    print 'hello', name

if __name__ == '__main__':
    # create a child process p; the target function is f and args is the argument tuple for f
    p = Process(target=f, args=('bob',))
    p.start()   # start the child process
    p.join()    # wait until the child process ends
In the code above, p.join() means that the statements after it run only once the child process has finished; it is commonly used to synchronize processes. For example, if there is a write process pw and a read process pr, and reading should start only after all the data has been written, you can call pw.join() before starting pr.
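As a sketch of that pattern (my own example, not from the article; it assumes a multiprocessing.Queue holds the shared data, and the write/read functions are made up):

from multiprocessing import Process, Queue

def write(q):
    for value in ['A', 'B', 'C']:
        q.put(value)              # the write process puts data into the queue

def read(q):
    for _ in range(3):
        print 'read:', q.get()    # the read process takes the data back out

if __name__ == '__main__':
    q = Queue()
    pw = Process(target=write, args=(q,))
    pr = Process(target=read, args=(q,))
    pw.start()
    pw.join()    # wait for the write process to finish first
    pr.start()   # only then start the read process
    pr.join()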
Multiple child processes
If you want to create several child processes at once, you can use the multiprocessing.Pool class. It creates a pool of worker processes and distributes the submitted tasks among them, so they can run on multiple cores.
import multiprocessing

def func(msg):
    print multiprocessing.current_process().name + '-' + msg

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4)   # create a pool of 4 worker processes
    for i in xrange(10):
        msg = "hello %d" % i
        pool.apply_async(func, (msg,))
    pool.close()   # close the pool: no new tasks can be submitted after this
    pool.join()    # wait for all tasks in the pool to finish; join() must be called after close()
    print "Sub-process(es) done."
The output result is as follows:
Sub-process(es) done.
PoolWorker-34-hello 1
PoolWorker-33-hello 0
PoolWorker-35-hello 2
PoolWorker-36-hello 3
PoolWorker-34-hello 7
PoolWorker-33-hello 4
PoolWorker-35-hello 5
PoolWorker-36-hello 6
PoolWorker-33-hello 8
PoolWorker-36-hello 9
In the code above, pool.apply_async() is a variant of pool.apply(): apply_async() is the non-blocking (parallel) version, while apply() is the blocking version, so with apply() the main process waits until the submitted function has finished before continuing. (Pool.apply() is modeled on Python 2's built-in apply() function and uses the same calling convention, but it runs the call in a worker process.) Notice that the results are not printed in the order of the for loop in the code.
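A small self-contained sketch of the difference (the square function is my own example, not from the article):

import multiprocessing

def square(x):
    return x * x

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=2)
    blocking = pool.apply(square, (3,))          # blocks until square(3) has run; returns 9
    deferred = pool.apply_async(square, (4,))    # returns an AsyncResult object immediately
    print 'apply:      ', blocking
    print 'apply_async:', deferred.get()         # get() blocks until the result is ready
    pool.close()
    pool.join()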
Multiple child processes and return values
apply_async() also gives you access to the return value of the function it runs. In the previous example's code, if func returns a value, then pool.apply_async(func, (msg,)) returns an AsyncResult object representing that value for each submitted task (note: the object, not the value itself); its get() method retrieves the actual value.
import multiprocessing

def func(msg):
    return multiprocessing.current_process().name + '-' + msg

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4)   # create a pool of 4 worker processes
    results = []
    for i in xrange(10):
        msg = "hello %d" % i
        results.append(pool.apply_async(func, (msg,)))
    pool.close()   # close the pool: no new tasks can be submitted; must be called before join()
    pool.join()    # wait for all tasks in the pool to finish
    print "Sub-process(es) done."
    for res in results:
        print res.get()   # get() returns the value that func returned in the worker process
The output of the above code is as follows:
Sub-process(es) done.
PoolWorker-37-hello 0
PoolWorker-38-hello 1
PoolWorker-39-hello 2
PoolWorker-40-hello 3
PoolWorker-37-hello 4
PoolWorker-38-hello 5
PoolWorker-39-hello 6
PoolWorker-37-hello 7
PoolWorker-40-hello 8
PoolWorker-38-hello 9
Unlike the previous output, this output is in order, because the results are read from the results list in the order the tasks were submitted.
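As an aside (my addition, not from the original article), if all you need is the results in input order, Pool.map() does the same job more compactly:

import multiprocessing

def func(msg):
    return multiprocessing.current_process().name + '-' + msg

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    msgs = ["hello %d" % i for i in xrange(10)]
    results = pool.map(func, msgs)   # blocks until all tasks finish; results keep the input order
    pool.close()
    pool.join()
    for r in results:
        print r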
If the machine has eight cores and eight processes are created, you can run the top command on Ubuntu and press 1 to display the usage of each core individually. You can also see the difference in the CPU usage curves before and after the multi-process run in the System Monitor.
That is all for this article. I hope it helps with your study of Python multiprocessing.