Python thread pool/process pool for concurrent programming


Introduction

The Python standard library provides the threading and multiprocessing modules for writing multithreaded and multi-process code. However, once a project reaches a certain scale, frequently creating and destroying threads or processes becomes very resource intensive, and at that point we would have to write our own thread pool/process pool, trading space for time. Starting with Python 3.2, however, the standard library provides the concurrent.futures module, which offers two classes, ThreadPoolExecutor and ProcessPoolExecutor. They further abstract threading and multiprocessing and provide direct support for writing thread pools and process pools.

Executor and Future

The foundation of the concurrent.futures module is Executor, an abstract class that cannot be used directly. However, its two subclasses, ThreadPoolExecutor and ProcessPoolExecutor, are very useful; as the names imply, they create a thread pool and a process pool respectively. We can submit our tasks directly to the pool without maintaining our own queue or worrying about deadlocks; the thread pool/process pool schedules the tasks for us automatically.

The concept of a Future should be familiar to anyone with Java or Node.js programming experience: you can think of it as an operation that will be completed in the future, and it is the foundation of asynchronous programming. In the traditional programming model, a call such as queue.get blocks until the result is returned, and the CPU cannot do anything else in the meantime; introducing Futures lets us get other work done while we wait. For asynchronous IO in Python, you can refer to my article on coroutines/asynchronous IO after reading this one.
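To make the contrast with a blocking queue.get concrete, here is a minimal sketch (mine, not from the original article) that registers a callback on a Future so the main thread stays free while the task runs; the names slow_task and on_done are made up for illustration.

# future_callback_sketch.py -- a minimal sketch, not from the original article
from concurrent.futures import ThreadPoolExecutor
import time

def slow_task():  # hypothetical stand-in for a long-running job
    time.sleep(2)
    return "result"

def on_done(future):  # hypothetical callback name
    # Called as soon as the future finishes (here, in the worker thread).
    print("callback got:", future.result())

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_task)
    future.add_done_callback(on_done)
    # Unlike a blocking queue.get(), the main thread is free to do other
    # work here while the task runs in the background.
    print("main thread keeps working while the task runs...")
# Leaving the with-block waits for the worker to finish, so the callback's
# output appears before the program exits.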

P.S.: If you are still sticking with Python 2.x, install the futures backport module first:

pip install futures

Using submit to operate the thread pool/process pool

Let's start with the following code to understand the concept of a thread pool:

# example1.py
from concurrent.futures import ThreadPoolExecutor
import time

def return_future_result(message):
    time.sleep(2)
    return message

pool = ThreadPoolExecutor(max_workers=2)  # create a thread pool with at most 2 worker threads
future1 = pool.submit(return_future_result, ("Hello"))  # add a task to the thread pool
future2 = pool.submit(return_future_result, ("World"))  # add a task to the thread pool
print(future1.done())  # check whether task 1 has finished
time.sleep(3)
print(future2.done())  # check whether task 2 has finished
print(future1.result())  # view the result returned by task 1
print(future2.result())  # view the result returned by task 2
Let's analyze this according to the output. We use the submit method to add a task to the thread pool, and submit returns a Future object, which can simply be understood as an operation that will be completed in the future. At the first print statement, future1 has obviously not completed yet because of time.sleep(2); and because we then pause the main thread with time.sleep(3), by the time we reach the second print statement both of our thread pool tasks have already finished.

Ziwenxie:: ~»python example1.py
False
True
Hello
World
# While the above program is running, we can see three threads in the background with the ps command.
Ziwenxie:: ~»ps -eLf | grep python
Ziwenxie 8361 7557 8361 3 3 19:45 pts/0 00:00:00 python example1.py
Ziwenxie 8361 7557 8362 0 3 19:45 pts/0 00:00:00 python example1.py
Ziwenxie 8361 7557 8363 0 3 19:45 pts/0 00:00:00 python example1.py
The above code can also be rewritten to use a process pool; the API is the same as the thread pool's, so I won't belabor it.

# example2.py
from concurrent.futures import ProcessPoolExecutor
import time

def return_future_result(message):
    time.sleep(2)
    return message

pool = ProcessPoolExecutor(max_workers=2)
future1 = pool.submit(return_future_result, ("Hello"))
future2 = pool.submit(return_future_result, ("World"))
print(future1.done())
time.sleep(3)
print(future2.done())
print(future1.result())
print(future2.result())
Here is the result of the run

Ziwenxie:: ~»python example2.py
False
True
Hello
World
Ziwenxie:: ~»ps -eLf | grep python
Ziwenxie 8560 7557 8560 3 3 19:53 pts/0 00:00:00 python example2.py
Ziwenxie 8560 7557 8563 0 3 19:53 pts/0 00:00:00 python example2.py
Ziwenxie 8560 7557 8564 0 3 19:53 pts/0 00:00:00 python example2.py
Ziwenxie 8561 8560 8561 0 1 19:53 pts/0 00:00:00 python example2.py
Ziwenxie 8562 8560 8562 0 1 19:53 pts/0 00:00:00 python example2.py

Using map/wait to operate the thread pool/process pool

In addition to submit, Executor provides us with a map method whose usage is similar to the built-in map. Let's compare the two through a pair of examples.

Review using the submit operation

# example3.py
import concurrent.futures
import urllib.request

URLS = ['http://httpbin.org', 'http://example.com/', 'https://api.github.com/']

def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
As you can see from the output, as_completed does not return futures in the order of the URLS list elements:

Ziwenxie:: ~»python example3.py
'http://example.com/' page is 1270 bytes
'https://api.github.com/' page is 2039 bytes
'http://httpbin.org' page is 12150 bytes

Using map

# example4.py
import concurrent.futures
import urllib.request

URLS = ['http://httpbin.org', 'http://example.com/', 'https://api.github.com/']

def load_url(url):
    with urllib.request.urlopen(url, timeout=60) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    for url, data in zip(URLS, executor.map(load_url, URLS)):
        print('%r page is %d bytes' % (url, len(data)))
As you can see from the output, map returns results in the order of the URLS list elements, and the code is more concise and intuitive; we can choose between the two approaches based on the specific requirements.

Ziwenxie:: ~»python example4.py
'http://httpbin.org' page is 12150 bytes
'http://example.com/' page is 1270 bytes
'https://api.github.com/' page is 2039 bytes
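One practical difference to keep in mind when choosing between the two approaches (the sketch below is mine, not the article's, and the unreachable URL is made up): with submit/as_completed each future's exception can be handled individually, as example3 does, whereas with Executor.map an exception raised inside a task is re-raised in the main thread at the point where that task's result is reached during iteration, so one bad URL stops the loop unless you wrap the iteration in try/except.

# map_exception_sketch.py -- a minimal sketch, not from the original article
import concurrent.futures
import urllib.request

# 'http://invalid.invalid/' is a deliberately unreachable, made-up URL.
URLS = ['http://example.com/', 'http://invalid.invalid/', 'https://api.github.com/']

def load_url(url):
    with urllib.request.urlopen(url, timeout=60) as conn:
        return conn.read()

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    results = executor.map(load_url, URLS)
    try:
        for url, data in zip(URLS, results):
            print('%r page is %d bytes' % (url, len(data)))
    except Exception as exc:
        # map re-raises the task's exception here, when its result is reached,
        # so the remaining results are never printed.
        print('iteration stopped by: %s' % exc)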

The third option is wait

The wait method returns a named tuple containing two sets: one of completed futures and one of uncompleted futures. One advantage of using wait is that it gives you more freedom: besides the futures and a timeout, it accepts a return_when parameter that can be set to FIRST_COMPLETED, FIRST_EXCEPTION, or ALL_COMPLETED, with ALL_COMPLETED as the default.

Let's take a look at the difference between these values in the following example.

# example5.py
from concurrent.futures import ThreadPoolExecutor, wait, as_completed, FIRST_COMPLETED
from time import sleep
from random import randint

def return_after_random_secs(num):
    sleep(randint(1, 5))
    return "Return of {}".format(num)

pool = ThreadPoolExecutor(5)
futures = []
for x in range(5):
    futures.append(pool.submit(return_after_random_secs, x))

print(wait(futures))
# print(wait(futures, timeout=None, return_when=FIRST_COMPLETED))
With the default ALL_COMPLETED, the program blocks until all tasks in the thread pool are completed:

Ziwenxie:: ~»python example5.py
DoneAndNotDoneFutures(done={
<Future at 0x7f0b06c9bc88 state=finished returned str>,
<Future at 0x7f0b06cbaa90 state=finished returned str>,
<Future at 0x7f0b06373898 state=finished returned str>,
<Future at 0x7f0b06352ba8 state=finished returned str>,
<Future at 0x7f0b06373b00 state=finished returned str>}, not_done=set())
If FIRST_COMPLETED is used instead, the program does not wait for all the tasks in the thread pool to complete:

Ziwenxie:: ~»python example5.py
DoneAndNotDoneFutures(done={
<Future at 0x7f84109edb00 state=finished returned str>,
<Future at 0x7f840e2e9320 state=finished returned str>,
<Future at 0x7f840f25ccc0 state=finished returned str>},
not_done={<Future at 0x7f840e2e9ba8 state=running>,
<Future at 0x7f840e2e9940 state=running>})
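If you want to process results incrementally with wait rather than with as_completed, one common pattern (a sketch of mine building on example5, not code from the original article) is to loop with FIRST_COMPLETED and feed the not_done set back into the next call:

# wait_loop_sketch.py -- a minimal sketch, not from the original article
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
from time import sleep
from random import randint

def return_after_random_secs(num):
    sleep(randint(1, 5))
    return "Return of {}".format(num)

pool = ThreadPoolExecutor(5)
pending = {pool.submit(return_after_random_secs, x) for x in range(5)}

while pending:
    # Wake up as soon as at least one future has finished, handle whatever
    # is done, then keep waiting on the futures that are still running.
    done, pending = wait(pending, return_when=FIRST_COMPLETED)
    for future in done:
        print(future.result())

pool.shutdown()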

Study Questions

Write a small program to compare the execution efficiency of multiprocessing.Pool (and its ThreadPool) against ProcessPoolExecutor (and ThreadPoolExecutor), and, combining the ideas about futures described above, think about why you get that result.
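As a starting point for that exercise, here is one possible benchmark sketch (mine, not the article's). It times the same CPU-bound function, count_down, under the four pool implementations; the function and the numbers are arbitrary choices, and the absolute timings will vary by machine.

# benchmark_sketch.py -- one possible approach, not a definitive benchmark
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

def count_down(n):  # CPU-bound task: the GIL prevents threads from running it in parallel
    while n > 0:
        n -= 1
    return n

NUMBERS = [5000000] * 8  # arbitrary workload

def bench(label, run):
    start = time.time()
    run()
    print('%-27s %.2f s' % (label, time.time() - start))

if __name__ == '__main__':
    with ThreadPoolExecutor(4) as ex:
        bench('ThreadPoolExecutor', lambda: list(ex.map(count_down, NUMBERS)))
    with ProcessPoolExecutor(4) as ex:
        bench('ProcessPoolExecutor', lambda: list(ex.map(count_down, NUMBERS)))
    with ThreadPool(4) as p:
        bench('multiprocessing ThreadPool', lambda: p.map(count_down, NUMBERS))
    with Pool(4) as p:
        bench('multiprocessing Pool', lambda: p.map(count_down, NUMBERS))

On a CPU-bound task like this, the process-based pools should come out well ahead because of the GIL; swap in an IO-bound task such as time.sleep and the thread-based pools become competitive again.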
