(Repost) Python's parallel task skills

Python has a poor reputation when it comes to concurrency. Setting aside the usual arguments about threading and the GIL, I don't think the root cause is technical so much as conceptual. Most of the material on Python threads and multiprocessing is decent, but it is far too detailed: it dwells on the low-level machinery and then rushes through the part you actually use in practice.

Traditional examples

Search DuckDuckGo (https://duckduckgo.com/) for "Python threading tutorial" and the results are basically all the same class-plus-queue example:
the standard threading / multiprocessing producer/consumer example.

Here is the code (embedding a big block of code any other way would look pretty ugly; a plain-text version is available here):
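The listing itself did not survive this repost, so here is a minimal Python 2 sketch, in the spirit of those tutorials, of the class-plus-Queue producer/consumer pattern (the work items and the sleep are placeholders, not the original code):

import time
import threading
from Queue import Queue

class Producer(threading.Thread):
    """Puts work items onto the shared queue."""
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        for item in range(10):            # placeholder work items
            self.queue.put(item)

class Consumer(threading.Thread):
    """Takes work items off the queue and processes them."""
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            item = self.queue.get()
            time.sleep(0.1)               # placeholder "processing"
            self.queue.task_done()

queue = Queue()
producer = Producer(queue)
producer.start()
consumer = Consumer(queue)
consumer.setDaemon(True)                  # let the program exit once the queue drains
consumer.start()
producer.join()                           # wait for the producer to finish putting items
queue.join()                              # then wait until every item is marked done
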
Hmm... it feels like Java code.
I don't mean to suggest that using the producer/consumer model for threads and processes is wrong; it's fine, and in fact it is the best choice for many problems. I just don't think it is what most day-to-day work calls for.

Where the problem lies

First, you need a boilerplate worker class that actually performs the task. Then you need a queue to pass objects through, plus workers listening on both ends of that queue to get anything done (and quite possibly a second queue, if the workers need to communicate back or store results).
The more workers you need, the bigger this problem gets.
Next, you might consider putting those workers into a thread pool to speed things up. Below is example code from IBM's (otherwise good) tutorial on threading; it shows a common scenario, fetching web pages with multiple threads.

Seriously, Medium, fix your code support. The code is here.
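Since that listing was also lost along the way, here is a rough Python 2 sketch in the same style (the URLs, the pool size, and what the workers do with each page are placeholders, not the tutorial's actual code):

import threading
import urllib2
from Queue import Queue

urls = ['http://example.com', 'http://example.org', 'http://example.net']  # placeholder URLs

class FetchWorker(threading.Thread):
    """Pulls URLs off the queue and downloads them."""
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            url = self.queue.get()
            try:
                data = urllib2.urlopen(url).read()
                print '%s: %d bytes' % (url, len(data))
            finally:
                self.queue.task_done()    # the queue bookkeeping that is easy to forget

queue = Queue()
for _ in range(4):                        # hand-rolled "pool" of worker threads
    worker = FetchWorker(queue)
    worker.setDaemon(True)
    worker.start()

for url in urls:
    queue.put(url)

queue.join()                              # block until every URL has been handled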

It works well enough, but look at all that code! Setup methods, a list of threads to keep track of, and, worst of all, if you are as deadlock-prone as I am, a pile of join statements waiting to go wrong. And this is only the beginning!
What has all of this bought us so far? Basically nothing. The code above is tedious and error prone (heck, I even forgot to call the task_done() method on the queue object, and I'm too lazy to go back and fix it), and that is a lot of work for very little payoff. Fortunately, there is a better way.

Introducing: map

map is a cool little function and the key to simplifying concurrent Python code. For those unfamiliar with it, map comes from functional languages like Lisp: it maps a function over a sequence. For example:

urls = ['…', '…']
results = map(urllib2.urlopen, urls)

This applies urllib2.urlopen to each element of the sequence and collects the results, in order, in a list. It is roughly equivalent to:

results = []
for url in urls:
    results.append(urllib2.urlopen(url))

map handles walking over the sequence for us, applies the function to each element, and stores all the results in one simple list at the end.
Why bring it up? Because with the right imports, map is what lets us strip away almost all of the complexity of concurrent code!

Two libraries provide a concurrent version of map:
multiprocessing, and its lesser-known but equally powerful submodule, multiprocessing.dummy.

Digression: what on earth is that? A thread-based clone of the multiprocessing library, called dummy? I hadn't heard of it either until recently. It gets barely a mention in the multiprocessing documentation, and that mention does little more than acknowledge that it exists. It's a real marketing blunder!

dummy is an exact clone of the multiprocessing module. The only difference is that multiprocessing works with processes, while dummy uses threads (and therefore comes with the usual Python limitations). So whatever applies to one applies to the other, and hopping back and forth between the two is trivial, which is extremely helpful when you are not sure whether a given workload is IO-bound or CPU-bound.
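To make that concrete, here is a tiny sketch (the squaring function is just a stand-in workload, not from the original article) showing that the only thing that changes when you hop between the two is which Pool you construct:

from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

def square(x):
    return x * x                               # stand-in workload

if __name__ == '__main__':
    print Pool(4).map(square, range(10))       # CPU-bound? use real processes
    print ThreadPool(4).map(square, range(10)) # IO-bound? swap in threads, same API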

Ready to start

To use map for concurrency, first import the relevant modules:

from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

Then initialize:

pool = ThreadPool()

That one statement does everything the build_worker_pool function in example2.py did: it creates a set of workers, starts them, and stores them in a variable for easy access.
The Pool object takes a few parameters, but the only one that matters right now is the first: processes. It sets the number of workers in the pool. If it is left out, it defaults to the number of cores on your machine.

In general, for CPU-bound work with the multiprocessing pool, more cores means more speed (with plenty of caveats). For network-bound work, however, many other factors come into play, so it is best to experiment to find an appropriate thread pool size.

pool = ThreadPool(4) # Sets the pool size to 4

With too many threads, the overhead of frequently switching between them can eat into your gains, so it is worth benchmarking to find the scheduling sweet spot for your task.
OK, now that the thread pool object is built, the simple concurrency can begin. Let's rewrite the URL opener from example2.py!
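The rewritten listing did not make it into this repost either; a minimal sketch of what it looks like, with placeholder URLs, is:

import urllib2
from multiprocessing.dummy import Pool as ThreadPool

urls = ['http://example.com', 'http://example.org']  # placeholder URLs

pool = ThreadPool(4)                       # build the pool of worker threads
results = pool.map(urllib2.urlopen, urls)  # fetch the URLs concurrently, results kept in order
pool.close()                               # no more tasks will be submitted
pool.join()                                # wait for the workers to finish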

Look at that! The actual work is done in just four lines of code, three of which are simple bookkeeping. The single map call handles everything the previous 40-line example did! Out of curiosity, I also timed the run with different thread pool sizes.

Results:

Impressive results! It really does pay to experiment with the pool size. On my machine, anything above a pool size of nine gives roughly the same result.

Example 2:

Generating thumbnails for thousands of images:
Now let's look at a CPU-intensive task! One of the problems I run into most often is processing folders containing huge numbers of images. One of those tasks is creating thumbnails, and it is a job that is well suited to concurrency.
The basic single-threaded version:
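The original listing is missing from this repost; a Python 2 sketch along the lines described below, assuming the Pillow/PIL imaging library plus a placeholder folder name and thumbnail size, is:

import os
from PIL import Image                      # assumes the Pillow/PIL imaging library

def get_image_paths(folder):
    """Collect the full paths of the JPEG images in a folder."""
    return [os.path.join(folder, f) for f in os.listdir(folder)
            if f.lower().endswith('.jpg')]

def create_thumbnail(filename):
    """Create a thumbnail and save it into a 'thumbnails' subdirectory."""
    im = Image.open(filename)
    im.thumbnail((128, 128))               # assumed thumbnail size
    directory, name = os.path.split(filename)
    save_dir = os.path.join(directory, 'thumbnails')
    if not os.path.isdir(save_dir):
        os.makedirs(save_dir)
    im.save(os.path.join(save_dir, name))

if __name__ == '__main__':
    for image in get_image_paths('images'):  # placeholder folder name
        create_thumbnail(image)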

This is a little more involved than the earlier examples, but it really just takes a folder path, grabs all the images inside it, creates thumbnails, and saves each one to its own directory.
On my machine it takes about 27.9 seconds to process roughly 6,000 images.
If we replace the for loop with a concurrent map:
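Again the listing was lost; a sketch reusing the hypothetical helpers above, this time with a process pool since the work is CPU-bound:

from multiprocessing import Pool

# create_thumbnail and get_image_paths are the helpers sketched above
if __name__ == '__main__':
    pool = Pool()                                         # defaults to one worker per core
    pool.map(create_thumbnail, get_image_paths('images')) # placeholder folder name
    pool.close()
    pool.join()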

It only took 5.6 seconds!

That is a huge speed-up for changing only a few lines of code. A production version of this can go even faster by splitting the CPU-intensive and IO-intensive work onto their own processes and threads, which would normally be a recipe for deadlocked code; but because map keeps everything explicit and there is no manual thread management, it stays easy to design clean, reliable programs that are simple to debug. That's all for now.
And there you have it: concurrency in (essentially) one line of code.

(1) English original: https://medium.com/p/40e9b2b36148

(2) Original code: https://github.com/chriskiehl/Blog/tree/master/40e9b2b36148

(3) A few additions to Python's parallel task skills: http://liming.me/2014/01/12/python-multitask-fixed/

(4) On a single-core CPU, with the Python GIL, do multiple threads still need locks? https://github.com/onlytiancai/codesnip/blob/master/python/sprace.py

(5) Gevent Programmer's Guide: http://xlambda.com/gevent-tutorial/#_8

(6) Understanding processes, threads, and coroutines: http://blog.leiqin.name/2012/12/02/%E8%BF%9B%E7%A8%8B%E3%80%81%E7%BA%BF%E7%A8%8B%E5%92%8C%E5%8D%8F%E7%A8%8B%E7%9A%84%E7%90%86%E8%A7%A3.html

(7) Python multiprocessing: from multiprocessing.pool import ThreadPool
http://hi.baidu.com/0xcea4/item/ddd133c187a6277089ad9e4b
http://outofmemory.cn/code-snippet/6723/Python-many-process-bingfa-multiprocessing

(8) Python's threading and multiprocessing modules: http://blog.csdn.net/zhaozhi406/article/details/8137670

(9) Concurrent programming with Python: http://python.jobbole.com/81255/
