Python's reputation for concurrency is notoriously bad. Leaving aside the usual arguments about threads and the GIL, I don't think the root of the multithreading problem is technical at all, but conceptual. Most of the material on Python threads and multiprocessing is genuinely good, but it is exhaustive about the mechanics and then rushes through the part that matters most: how you actually put them to use day to day.
Traditional examples
Search DuckDuckGo (https://duckduckgo.com/) for "Python threading tutorial" and the results are basically all the same class-plus-queue example.
The standard producer/consumer threading example:
A big block of code would look pretty ugly in this layout, so the original post links to a text version; a reconstructed sketch of that example is below.
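This sketch is written from the standard pattern those tutorials show rather than copied from the lost listing; the producer's five-second run and the 'quit' poison pill are illustrative choices:

import time
import threading
import Queue

class Consumer(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self._queue = queue

    def run(self):
        while True:
            # get() blocks the thread until an item is available
            msg = self._queue.get()
            # a bare 'quit' string is the poison pill that stops the worker
            if isinstance(msg, str) and msg == 'quit':
                break
            print "I'm a thread, and I received %s!!" % msg
        print 'Bye byes!'

def producer():
    queue = Queue.Queue()   # shares items between the two threads
    worker = Consumer(queue)
    worker.start()          # kicks off the worker's run() method
    start_time = time.time()
    while time.time() - start_time < 5:
        queue.put('something at %s' % time.time())
        time.sleep(1)       # avoid an absurd number of messages
    queue.put('quit')       # poison pill: ask the worker to exit
    worker.join()           # wait for it to shut down

if __name__ == '__main__':
    producer()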
Mmm... it feels like Java code.
Don't get me wrong: using the producer/consumer model to handle threading or multiprocessing isn't wrong. In fact, it's the best choice for a whole class of problems. I just don't believe it's the right tool for day-to-day work.
Where the problems lie
To begin with, you need a boilerplate class to do the actual work. Then you need a queue to pass objects through, with workers listening on both ends of it the whole time to get anything done. (Quite possibly two queues, if you need two-way communication or want to collect results.)
And the more workers you need, the bigger these problems become.
Next, you might consider putting those workers into a pool of threads to squeeze more speed out of Python. Below is the example code from the well-known IBM tutorial on threading. It shows a very common scenario: fetching web pages with multiple threads.
Seriously, Medium, fix your code support. The code is here; a reconstructed sketch follows.
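Again, the listing itself was lost in translation; this reconstruction keeps the same shape, including the build_worker_pool function that the text refers back to later. The URL list is illustrative:

import time
import threading
import Queue
import urllib2

class Consumer(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self._queue = queue

    def run(self):
        while True:
            url = self._queue.get()
            if isinstance(url, str) and url == 'quit':
                break
            # the actual work: fetch the page
            response = urllib2.urlopen(url)
        print 'Bye byes!'

def build_worker_pool(queue, size):
    # create, start, and keep track of the worker threads
    workers = []
    for _ in range(size):
        worker = Consumer(queue)
        worker.start()
        workers.append(worker)
    return workers

def producer():
    urls = [
        'http://www.python.org',
        'http://www.yahoo.com',
        'http://www.scala.org',
        'http://www.google.com',
        # etc.
    ]
    queue = Queue.Queue()
    worker_threads = build_worker_pool(queue, 4)
    start_time = time.time()
    for url in urls:
        queue.put(url)
    # one poison pill per worker, or some threads never exit
    for worker in worker_threads:
        queue.put('quit')
    for worker in worker_threads:
        worker.join()
    print 'Done! Time taken: {}'.format(time.time() - start_time)

if __name__ == '__main__':
    producer()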
It probably performs well, but look at this code! Init methods, thread bookkeeping, and, worst of all, if you're as deadlock-prone as I am, a pile of join statements to get wrong. And this is only where the complexity starts!
What has all of this accomplished so far? Basically nothing. It is boilerplate and it is error prone (gosh, I even forgot to call the queue object's task_done() method, and I'm too lazy to go back and fix it). That's a lot of effort for very little payoff. Fortunately, there is a much better way.
Introducing: map
map is a cool little function and the key to trivially parallelizing Python code. For those unfamiliar with it, map comes out of functional languages like Lisp: it applies a function over a sequence and collects the results. For example:
urls = ['http://www.yahoo.com', 'http://www.reddit.com']
results = map(urllib2.urlopen, urls)
This applies the urlopen function to each element of the sequence, in order, and collects the results into a list. It's roughly equivalent to:
results = []
for url in urls:
    results.append(urllib2.urlopen(url))
map handles iterating over the sequence for us, applies the function to each element, and stores all of the results in one handy list.
Why does this matter? Because, with the right libraries, map makes parallelism almost trivial!
Two libraries provide a parallel version of map: multiprocessing, and its little-known but equally powerful submodule, multiprocessing.dummy.
Digression: what the hell is that? A threading clone of the multiprocessing library, and they named it dummy? I only learned about it recently; it gets barely a sentence in the multiprocessing documentation, and that sentence mostly exists to inform you that such a thing exists. This is a real marketing failure!
dummy is an exact clone of the multiprocessing module. The only difference is that multiprocessing works with processes, while dummy uses threads (and therefore carries the usual Python GIL limitations). So it is extremely easy to swap one for the other, which makes it great for exploratory code where you aren't sure whether a job is IO-bound or CPU-bound.
Ready to start
To use the parallel map functions, first import the modules that contain them:
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool
Then instantiate the pool:
pool = ThreadPool()
That one statement does everything the build_worker_pool function from example2.py did: it creates a group of workers, starts them, and stores them for easy access.
The pool object takes a few parameters, but the only one that matters right now is the first: processes. It sets the number of workers in the pool. If it's omitted, it defaults to the number of cores on the machine.
In general, for CPU-bound tasks in a process pool, more cores means more speed (assuming you actually have the cores). For network-bound work, though, the right number depends on so many factors that it's best to experiment with the pool size.
pool = ThreadPool(4) # Sets the pool size to 4
Run too many threads and the time spent switching between them can eat into your gains, so it's worth tinkering until you find the sweet spot for your task.
OK, now that the pool object is built, the simple parallel part can follow. Let's rewrite the URL opener from example2.py!
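The rewritten listing is also missing from this translation; here is a reconstruction, with an illustrative URL list:

import urllib2
from multiprocessing.dummy import Pool as ThreadPool

urls = [
    'http://www.python.org',
    'http://www.yahoo.com',
    'http://www.scala.org',
    'http://www.google.com',
    # etc.
]

pool = ThreadPool(4)                       # make a pool of 4 workers
results = pool.map(urllib2.urlopen, urls)  # open the urls in their own threads
pool.close()                               # no more tasks will be added
pool.join()                                # wait for all the work to finish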
Look at that! The work happens in four lines of code, and three of them are boilerplate. One call to map replaces the 40-line mess we had before! Just for fun, I also timed the same run with different pool sizes.
Results: (the timings appear as a chart in the original post)
Amazing! It really does pay to experiment with the pool size. On my machine, any pool size above 9 gave roughly the same result.
Example 2: generating thumbnails for thousands of images
Now let's look at a CPU-intensive task! One of the things I run into most often at work is manipulating huge folders of images.
One of those tasks is generating thumbnails, and it is ripe for parallelization.
The basic single-threaded version:
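The listing is missing from this translation as well; this reconstruction assumes the classic PIL/Pillow API of that era (Image.ANTIALIAS), and the folder path, the 75x75 size, and the 'jpeg' filename filter are illustrative:

import os
from PIL import Image

SIZE = (75, 75)
SAVE_DIRECTORY = 'thumbs'

def get_image_paths(folder):
    # every jpeg sitting in the folder
    return (os.path.join(folder, f)
            for f in os.listdir(folder)
            if 'jpeg' in f)

def create_thumbnail(filename):
    im = Image.open(filename)
    im.thumbnail(SIZE, Image.ANTIALIAS)
    base, fname = os.path.split(filename)
    save_path = os.path.join(base, SAVE_DIRECTORY, fname)
    im.save(save_path)

if __name__ == '__main__':
    folder = os.path.abspath('/path/to/images')  # illustrative path
    os.mkdir(os.path.join(folder, SAVE_DIRECTORY))
    images = get_image_paths(folder)
    for image in images:
        create_thumbnail(image)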
For an example it's a little over-engineered, but in essence it takes a folder path, grabs all of the images inside it, creates the thumbnails, and saves them to their own directory.
On my machine, this takes about 27.9 seconds to process roughly 6,000 images.
If we replace the for loop with a parallel map call:
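Reusing the definitions from the sketch above, the main block becomes (again a reconstruction):

from multiprocessing import Pool

if __name__ == '__main__':
    folder = os.path.abspath('/path/to/images')  # illustrative path
    os.mkdir(os.path.join(folder, SAVE_DIRECTORY))
    images = list(get_image_paths(folder))
    pool = Pool()                        # a process pool, one worker per core
    pool.map(create_thumbnail, images)   # replaces the for loop
    pool.close()
    pool.join()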
It only took 5.6 seconds!
A huge speed boost from changing only a few lines of code. The extreme version can be made faster still by dispatching the CPU-bound and IO-bound work to a process pool and a thread pool respectively. Mixing processes and threads by hand is normally a great recipe for deadlocks, but with map shouldering the plumbing, we can still end up with elegant, reliable programs that are simple to debug. A sketch of that split follows.
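Nothing like the following appears in the surviving text; it is a minimal sketch of the idea, with a trivial byte checksum standing in for real CPU-bound work and example.com as a placeholder URL:

import urllib2
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

def download(url):
    # IO-bound: threads are fine, the GIL is released while waiting on the network
    return urllib2.urlopen(url).read()

def checksum(data):
    # CPU-bound stand-in for real processing: separate processes sidestep the GIL
    return sum(bytearray(data)) % 256

if __name__ == '__main__':
    urls = ['http://www.example.com'] * 4  # illustrative

    thread_pool = ThreadPool(4)
    pages = thread_pool.map(download, urls)   # fetch concurrently in threads
    thread_pool.close()
    thread_pool.join()

    process_pool = Pool()
    sums = process_pool.map(checksum, pages)  # crunch in parallel processes
    process_pool.close()
    process_pool.join()
    print sums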
And that's really it: parallelism in (essentially) one line of code.
(1) English original: https://medium.com/p/40e9b2b36148
(2) Original code: https://github.com/chriskiehl/Blog/tree/master/40e9b2b36148
(3) Some additional notes on Python parallel-task techniques: http://liming.me/2014/01/12/python-multitask-fixed/
(4) On a single-core CPU, given Python's GIL, does multithreaded code still need locks? https://github.com/onlytiancai/codesnip/blob/master/python/sprace.py
(5) Gevent Programmer's Guide: http://xlambda.com/gevent-tutorial/#_8
(6) Understanding processes, threads, and coroutines: http://blog.leiqin.name/2012/12/02/%E8%BF%9B%E7%A8%8B%E3%80%81%E7%BA%BF%E7%A8%8B%E5%92%8C%E5%8D%8F%E7%A8%8B%E7%9A%84%E7%90%86%E8%A7%A3.html
(7) Python multiprocessing (from multiprocessing.pool import ThreadPool): http://hi.baidu.com/0xcea4/item/ddd133c187a6277089ad9e4b and http://outofmemory.cn/code-snippet/6723/Python-many-process-bingfa-multiprocessing
(8) Python's threading and multiprocessing modules: http://blog.csdn.net/zhaozhi406/article/details/8137670
(9) Concurrent programming with Python: http://python.jobbole.com/81255/