Python multi-core parallel computing, with sample code
I used to write small programs without caring about parallelism at all: they ran fine on a single core, and my machine only had a dual-core hyper-threaded CPU with four hardware threads (hereafter just "cores"), so there was little point in pursuing parallelism (unless the task was I/O-intensive). Since moving to a machine with 32 cores and plenty of memory, seeing a pile of idle cores in htop naturally makes you want to parallelize. It turns out that parallel processing in Python is actually quite simple.
multiprocessing vs. threading
Python ships with a comprehensive, easy-to-use standard library, which is one of the reasons I particularly like Python. It includes both the multiprocessing and threading libraries for parallel processing. Reaching for threads first is the natural idea: intuitively they have low overhead and share memory, and threads are indeed used heavily in other languages. However, I can say with confidence that if you are on the CPython implementation, using threading means giving up on parallel computing (in fact, it can be even slower than a single thread), unless the task is I/O-intensive.
GIL
CPython is the Python implementation distributed by python.org. Yes, Python is a language with multiple implementations, such as PyPy, Jython, and IronPython. CPython is by far the most widely used, to the point that it is often treated as synonymous with Python.
CPython uses the GIL (Global Interpreter Lock) to simplify the implementation of the interpreter, so that the interpreter executes bytecode in only one thread at a time. In other words, unless threads are blocked waiting on I/O, CPython's multithreading is a complete lie!
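To see this concretely, here is a minimal sketch in the spirit of the benchmarks in Beazley's slides linked below (the function count and the constant N are made up for illustration). On CPython, the two-thread version is typically no faster than running sequentially, and often slower:

import threading
import time

def count(n):
    # pure CPU-bound busy loop; never releases the GIL voluntarily
    while n > 0:
        n -= 1

N = 10 ** 7

# sequential: run the work twice on one thread
start = time.time()
count(N)
count(N)
print('sequential: %.2fs' % (time.time() - start))

# two threads: usually as slow or slower on CPython,
# because the GIL lets only one thread execute bytecode at a time
start = time.time()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
print('two threads: %.2fs' % (time.time() - start))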
These two write-ups on the GIL are well worth reading:
- http://cenalulu.github.io/python/gil-in-python/
- http://www.dabeaz.com/python/UnderstandingGIL.pdf
multiprocessing.Pool
Since the GIL rules out threading, let us study multiprocessing properly. (Of course, if you are not on CPython and have no GIL problem, that is great too.)
First, a simple, crude, and very practical tool: multiprocessing.Pool. If your task can be expressed as ys = map(f, xs), then, as we may all know, it is inherently the easiest form to parallelize, and in Python the parallel version really is easy. For example, to square each number:
import multiprocessing

def f(x):
    return x * x

cores = multiprocessing.cpu_count()
pool = multiprocessing.Pool(processes=cores)
xs = range(5)

# method 1: map
print(pool.map(f, xs))  # prints [0, 1, 4, 9, 16]

# method 2: imap
for y in pool.imap(f, xs):
    print(y)  # 0, 1, 4, 9, 16, in order

# method 3: imap_unordered
for y in pool.imap_unordered(f, xs):
    print(y)  # may arrive in any order
map returns a list directly, while the two functions starting with i return iterators; imap_unordered additionally yields results in whatever order they complete.
When the computation takes a while, you may want a progress bar, and this is where the i-variants pay off. One more trick: printing \r returns the cursor to the beginning of the line without a line break, which is enough to build a simple progress bar.
import sys

cnt = 0
for _ in pool.imap_unordered(f, xs):
    cnt += 1
    sys.stdout.write('done %d/%d\r' % (cnt, len(xs)))
More complex operations
For more complex operations, use the multiprocessing.Process object directly. For inter-process communication, you can use:
- multiprocessing.Pipe (see the sketch after this list)
- multiprocessing.Queue
- Synchronization primitives
- Shared variables
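As a quick illustration of Pipe, here is a minimal sketch; the function child and the messages are hypothetical:

import multiprocessing

def child(conn):
    # receive one message, reply, then close our end of the pipe
    msg = conn.recv()
    conn.send('got: ' + msg)
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=child, args=(child_conn,))
    p.start()
    parent_conn.send('ping')
    print(parent_conn.recv())  # prints 'got: ping'
    p.join()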
Of these, I strongly recommend Queue, because many scenarios boil down to the producer-consumer model, and Queue solves exactly that. It is also very simple to use: the parent process creates a Queue and then passes it to the child process via the args or kwargs of Process.
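Here is a minimal producer-consumer sketch along those lines; the worker function and the None sentinel convention are my own choices for illustration:

import multiprocessing

def worker(taskq, resultq):
    # consume tasks until a None sentinel arrives
    while True:
        x = taskq.get()
        if x is None:
            break
        resultq.put(x * x)

if __name__ == '__main__':
    taskq = multiprocessing.Queue()
    resultq = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(taskq, resultq))
    p.start()
    for x in range(5):
        taskq.put(x)   # parent acts as the producer
    taskq.put(None)    # sentinel: tell the worker to exit
    for _ in range(5):
        print(resultq.get())
    p.join()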
Precautions when using tools such as Theano or TensorFlow
Note that statements that initialize CUDA, such as import theano or import tensorflow, have side effects. These side effects are copied into the child process as-is and then cause errors such as:
Could not retrieve CUDA device count: CUDA_ERROR_NOT_INITIALIZED
The fix is to make sure the parent process never imports these tools; instead, import them separately in the child process after it has been created.
If you use Process, do the import inside the target function. For example:
import multiprocessing

def hello(taskq, resultq):
    # import tensorflow only inside the child process
    import tensorflow as tf
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)
    while True:
        name = taskq.get()
        res = sess.run(tf.constant('hello ' + name))
        resultq.put(res)

if __name__ == '__main__':
    taskq = multiprocessing.Queue()
    resultq = multiprocessing.Queue()
    p = multiprocessing.Process(target=hello, args=(taskq, resultq))
    p.start()
    taskq.put('world')
    taskq.put('abcdabcd987')
    taskq.close()
    print(resultq.get())
    print(resultq.get())
    p.terminate()
    p.join()
If you use a Pool, you can write a function that does the import and pass it to the Pool constructor as the initializer. For example:
import multiprocessing

def init():
    # runs once in each worker process; the imports happen here
    global tf
    global sess
    import tensorflow as tf
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)

def hello(name):
    return sess.run(tf.constant('hello ' + name))

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=2, initializer=init)
    xs = ['world', 'abcdabcd987', 'Lequn Chen']
    print(pool.map(hello, xs))
That is all for this article. I hope it is helpful for your learning.