I previously reposted a blog post that analyzes the source code of the process pool; it is a very thorough analysis. Python also has the concept of a thread pool. It likewise provides concurrent processing and can, to a certain extent, improve a system's efficiency. If anything below is wrong, corrections are welcome.
The life cycle of a thread can be divided into 5 states: created, ready, running, blocked, and terminated. From creation to termination, a thread is always being created, running, or being destroyed. Accordingly, the time a thread takes can be divided into 3 parts: the time to start the thread, the time to run the thread body, and the time to destroy the thread. In a multithreaded scenario, if threads cannot be reused, every task has to pay for all 3 steps: starting, running, and destroying a thread. This inevitably increases the system's response time and reduces efficiency. Recall the example from my earlier post on threads: as many threads are created as there are tasks. Because of the limitation imposed by Python's GIL, this is not truly parallel multithreading, and performance can even degrade because of the overhead of frequent task switching (see the earlier post on Python's GIL). In this situation, a thread pool can be used to improve efficiency.
The basic principle of a thread pool is to create, in advance, a number of threads that can execute tasks and put them in a pool, while the tasks to be processed are usually arranged in a queue. In general there are more tasks than threads. When a thread finishes its current task, it takes the next task from the queue, and this continues until all the tasks are complete.
[Figure: thread pool schematic]
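The principle described above can be sketched in a few lines. This is only a minimal illustration written for Python 3 (where the module is named queue rather than Queue). The names run_pool and worker, and the use of None as a "stop" marker for the workers, are assumptions made for this example and do not come from any library.

```python
import queue
import threading


def run_pool(tasks, num_threads=3):
    """Run zero-argument callables on a fixed set of pre-created threads."""
    task_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            task = task_queue.get()
            if task is None:          # sentinel: no more work for this thread
                break
            value = task()
            with results_lock:        # protect the shared results list
                results.append(value)

    # Create the threads up front; usually there are fewer threads than tasks.
    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()

    for task in tasks:
        task_queue.put(task)
    for _ in threads:                 # one sentinel per worker thread
        task_queue.put(None)

    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    squares = run_pool([lambda n=n: n * n for n in range(10)])
    print(sorted(squares))
```

Results arrive in whatever order the workers finish, which is why the usage example sorts them before printing.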
Because the threads are created in advance and placed in the pool, and because a thread is not destroyed after finishing its current task but is scheduled to process the next one, threads do not have to be created over and over. This saves the overhead of thread creation and destruction and brings better performance and system stability. To put it bluntly, Python's thread pool cannot take advantage of multiple cores or multiple CPUs, but compared with ordinary multithreading it improves performance because it saves the time spent repeatedly creating and destroying threads.
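The reuse of threads can be observed directly. The sketch below is written for Python 3 and uses the thread-backed Pool from multiprocessing.dummy in the standard library (covered later in this post); which_thread and count_threads_used are hypothetical helper names, and the short sleep merely stands in for real work.

```python
import threading
import time
from multiprocessing.dummy import Pool  # thread-backed pool from the standard library


def which_thread(_):
    """Return the name of the thread that ran this task."""
    time.sleep(0.01)  # stand-in for a short piece of real work
    return threading.current_thread().name


def count_threads_used(num_tasks=13, pool_size=2):
    """Run num_tasks tasks and report how many distinct threads handled them."""
    pool = Pool(pool_size)
    try:
        names = pool.map(which_thread, range(num_tasks))
    finally:
        pool.close()
        pool.join()
    return len(names), len(set(names))


if __name__ == "__main__":
    tasks, threads = count_threads_used()
    print("%d tasks were handled by %d thread(s)" % (tasks, threads))
```

However many tasks are submitted, the number of distinct thread names never exceeds the pool size, because the same worker threads are handed one task after another.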
Python's thread pool technique is suited to handling sudden bursts of requests, or work that would otherwise need a large number of threads while each individual task takes only a short time to process. It effectively avoids the excessive load and slow response caused by the system creating too many threads. Here are a few ways to use a thread pool.
(i) Custom thread pool
We can implement a thread pool with the Queue module and the threading module. Queue is used to create the task queue, and threading is used to create the pool of threads.
Look at the following example:
import Queue  # Python 2; in Python 3 this module is named queue
import threading


class Worker(threading.Thread):
    """
    A thread class that can process tasks. It is a custom thread class,
    and a custom thread class needs to define the run() method.
    """
    def __init__(self, workqueue, resultqueue, **kwargs):
        threading.Thread.__init__(self, **kwargs)
        self.workqueue = workqueue      # queue that stores tasks; tasks are generally functions
        self.resultqueue = resultqueue  # queue that stores results

    def run(self):
        while True:
            try:
                # Take a task from the task queue. block=False means that
                # an exception is raised if the queue is empty.
                callable, args, kwargs = self.workqueue.get(block=False)
                res = callable(*args, **kwargs)
                self.resultqueue.put(res)  # put the result of the task in the result queue
            except Queue.Empty:  # the task queue is empty, so stop this thread
                break


class WorkerManger(object):
    """
    A thread pool class.
    """
    def __init__(self, num=10):  # by default there are 10 threads in the pool
        self.workqueue = Queue.Queue()    # task queue
        self.resultqueue = Queue.Queue()  # queue that stores task results
        self.workers = []                 # all threads are stored in this list
        self._recruitthreads(num)         # create the threads

    def _recruitthreads(self, num):
        """
        Create the threads.
        """
        for i in xrange(num):
            worker = Worker(self.workqueue, self.resultqueue)
            self.workers.append(worker)

    def start(self):
        """
        Start every thread in the thread pool.
        """
        for work in self.workers:
            work.start()

    def wait_for_complete(self):
        """
        Wait until all tasks in the task queue are completed.
        """
        while len(self.workers):
            worker = self.workers.pop()
            worker.join()
            if worker.isAlive() and not self.workqueue.empty():
                self.workers.append(worker)

    def add_job(self, callable, *args, **kwargs):
        """
        Add a task to the task queue.
        """
        self.workqueue.put((callable, args, kwargs))

    def get_result(self, *args, **kwargs):
        """
        Get a result from the result queue.
        """
        return self.resultqueue.get(*args, **kwargs)

    def add_result(self, result):
        self.resultqueue.put(result)
The code above defines a thread pool whose initializer __init__() sets up attributes that hold the related data. This pattern is common in Python's own internal modules; it is worth reading their source code to learn the programming habits and ideas of the experts.
Another point worth noting is that the queues in the Queue module can hold not only data (strings, numbers, lists, dictionaries, and so on) but also functions (that is, tasks). In the code above, callable is a function. When a function is added to the queue with put(), the function object and its arguments are passed to put() as a single item, which is why the code above has self.workqueue.put((callable, args, kwargs)). Likewise, getting an item from a queue that holds functions returns the function object together with its arguments; if you are interested, print callable, args, kwargs in run() in the code above. If you are not familiar with the Queue module, refer to my earlier post on it.
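To make the point about storing functions concrete, here is a small standalone sketch for Python 3 (module name queue). The function greet and the variable names are made up for this illustration; only Queue.put(), Queue.get() and Queue.empty() are real library calls.

```python
import queue


def greet(name, punctuation="!"):
    return "hello, " + name + punctuation


task_queue = queue.Queue()
# A task is stored as ONE item: a tuple of (function object, positional args, keyword args).
task_queue.put((greet, ("world",), {}))
task_queue.put((greet, ("pool",), {"punctuation": "?"}))

outputs = []
# empty() is reliable here only because a single thread is using the queue.
while not task_queue.empty():
    func, args, kwargs = task_queue.get()   # unpack the tuple again
    outputs.append(func(*args, **kwargs))   # call the stored function

print(outputs)  # ['hello, world!', 'hello, pool?']
```

The extra pair of parentheses in put((...)) matters: put() takes one item, so the function and its arguments have to be wrapped into a single tuple.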
Here's a quick example.
import datetime
import urllib2


def open_url(url):
    try:
        res = urllib2.urlopen(url).getcode()
    except urllib2.HTTPError as e:
        res = e.code
    # print res
    res = str(res)
    # Every call writes the status code to the same file, overwriting it.
    with open('/home/liulonghua/untitled document', 'w') as f:
        f.write(res)
    return res


if __name__ == "__main__":
    urls = [
        'http://www.python.org',
        'http://www.python.org/about/',
        'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html',
        'http://www.python.org/doc/',
        'http://www.python.org/download/',
        'http://www.python.org/getit/',
        'http://www.python.org/community/',
        'https://wiki.python.org/moin/',
        'http://planet.python.org/',
        'https://wiki.python.org/moin/LocalUserGroups',
        'http://www.python.org/psf/',
        'http://docs.python.org/devguide/',
        'http://www.python.org/community/awards/'
    ]
    t1 = datetime.datetime.now()
    w = WorkerManger(2)  # a pool with 2 threads
    for url in urls:
        w.add_job(open_url, url)
    w.start()
    w.wait_for_complete()
    t2 = datetime.datetime.now()
    print t2 - t1
The final results are as follows:
[Image: screenshot of the program output]
What if you change the above code to use multithreading instead of a thread pool?
The code is as follows:
# Reuses open_url() and the imports from the examples above.
if __name__ == "__main__":
    urls = [
        'http://www.python.org',
        'http://www.python.org/about/',
        'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html',
        'http://www.python.org/doc/',
        'http://www.python.org/download/',
        'http://www.python.org/getit/',
        'http://www.python.org/community/',
        'https://wiki.python.org/moin/',
        'http://planet.python.org/',
        'https://wiki.python.org/moin/LocalUserGroups',
        'http://www.python.org/psf/',
        'http://docs.python.org/devguide/',
        'http://www.python.org/community/awards/'
    ]
    t1 = datetime.datetime.now()
    threads = []
    for url in urls:
        # One new thread per task.
        t = threading.Thread(target=open_url, args=(url,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()  # wait for every thread, not only the last one
    t2 = datetime.datetime.now()
    print t2 - t1
The results of the operation are as follows:
[Image: screenshot of the program output]
The difference in running time between the two approaches can be quite large; if you are interested, try it yourself.
(ii) Using an off-the-shelf thread pool module
Downloading and installing it is simple with the pip tool:
sudo pip install threadpool
Note: I fell into a pit here, although fortunately it did not take long to get out. My computer has Python 2.7.12, Python 3.5, and PyPy 5.4.1, and the command above unexpectedly installed the threadpool package into the PyPy directory, so import threadpool kept failing under Python 2.7.12. If your system has several Python versions and you do not use a virtual environment tool such as virtualenv, this kind of confusion happens easily. I do have virtualenv installed, but I rarely use it on my own computer. The solution is:
sudo python -m pip install threadpool
This distinguishes the target interpreter from PyPy. Similarly, to install a third-party package into the PyPy environment, use sudo pypy -m pip install PackageName. I also covered this in an earlier blog post.
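When several interpreters are installed, it can help to ask the interpreter itself where it lives before installing anything. The following is a small sketch for Python 3; pip_command is a hypothetical helper name introduced here, while sys.executable and the -m pip form are standard.

```python
import sys


def pip_command(package):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    # Print the command instead of running it, so nothing is installed by accident.
    print("install with:", " ".join(pip_command("threadpool")))
```

Running the same script with python, python3 and pypy in turn shows which executable each name resolves to, and therefore where a package installed with that interpreter's -m pip would end up.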
The main classes and functions of the module are:
1. threadpool.ThreadPool: the thread pool class, mainly used to dispatch task requests and collect the results. Its main methods are:
(1) __init__(self, num_workers, q_size=0, resq_size=0, poll_timeout=5):
Creates the thread pool and starts num_workers threads. q_size is the size of the task request queue, and resq_size is the size of the queue that holds the results.
(2) createWorkers(self, num_workers, poll_timeout=5):
Adds num_workers threads to the thread pool.
(3) dismissWorkers(self, num_workers, do_join=False):
Tells num_workers worker threads to exit after they finish their current task.
(4) joinAllDismissedWorkers(self):
Calls Thread.join() on the threads that have been told to exit.
(5) putRequest(self, request, block=True, timeout=None):
Adds a task request to the work queue.
(6) poll(self, block=False):
Processes new results in the results queue; that is, it loops over the results and calls each request's callback or error callback. When no results are pending, a NoResultsPending exception is raised to indicate that all results have been processed. This behavior is not well suited to cases where further requests are added to the queue depending on the results of earlier ones.
(7) wait(self):
Waits until all tasks are completed and their results have been processed. Once all results have been returned, the threads inside the pool are not destroyed; they wait for new tasks. Therefore, after wait() you can still call pool.putRequest() to add tasks.
2. threadpool.WorkerThread: the worker thread that processes tasks; its main methods are run() and dismiss().
3. threadpool.WorkRequest: the work request class, which holds the specific callable to be executed.
__init__(self, callable, args=None, kwds=None, requestID=None, callback=None, exc_callback=None)
Creates a work request.
4. makeRequests(callable_, args_list, callback=None, exc_callback=_handle_thread_exception):
Creates a series of work requests that share the same callable but have different arguments.
With the custom thread pool above as a foundation, this module is not hard to understand; if you are interested, read the module's source code. The steps for using it are generally as follows:
(1) Import the threadpool module
(2) Define the thread function
(3) Create the thread pool: threadpool.ThreadPool()
(4) Create the tasks for the pool to process: threadpool.makeRequests()
(5) Put the created tasks into the thread pool: pool.putRequest()
(6) Wait until all tasks have been processed: pool.wait()
The example above, modified to use the threadpool module, looks like this:
import datetime
import threadpool

# Reuses open_url() from the earlier example.
if __name__ == "__main__":
    urls = [
        'http://www.python.org',
        'http://www.python.org/about/',
        'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html',
        'http://www.python.org/doc/',
        'http://www.python.org/download/',
        'http://www.python.org/getit/',
        'http://www.python.org/community/',
        'https://wiki.python.org/moin/',
        'http://planet.python.org/',
        'https://wiki.python.org/moin/LocalUserGroups',
        'http://www.python.org/psf/',
        'http://docs.python.org/devguide/',
        'http://www.python.org/community/awards/'
    ]
    t1 = datetime.datetime.now()
    pool = threadpool.ThreadPool(2)  # a pool with 2 worker threads
    requests = threadpool.makeRequests(open_url, urls)  # one request per URL
    for req in requests:
        pool.putRequest(req)
    pool.wait()  # block until every request has been processed
    t2 = datetime.datetime.now()
    print t2 - t1
The results of the implementation are as follows:
[Image: screenshot of the program output]
If you are interested, you can try out the module's other methods yourself.
(iii) Using multiprocessing.dummy to run multithreaded tasks
The difference between the multiprocessing.dummy module and the multiprocessing module is that dummy uses threads while multiprocessing uses processes; the two share the same API.
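The shared API can be seen in a short sketch. This one is written for Python 3 and uses only the standard library; square and demo are hypothetical names chosen for the illustration.

```python
from multiprocessing.dummy import Pool as ThreadPool  # threads, same API as multiprocessing.Pool


def square(n):
    return n * n


def demo():
    pool = ThreadPool(2)
    try:
        mapped = pool.map(square, range(5))            # blocks until all results are ready
        async_result = pool.apply_async(square, (7,))  # returns immediately
        single = async_result.get(timeout=5)           # wait for that one result
    finally:
        pool.close()   # no more tasks will be submitted
        pool.join()    # wait for the worker threads to exit
    return mapped, single


if __name__ == "__main__":
    print(demo())  # ([0, 1, 4, 9, 16], 49)
```

Changing the import to from multiprocessing import Pool as ThreadPool would run the same calls in worker processes instead of threads, which is the sense in which the two modules are interchangeable.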
Sometimes you will see people write from multiprocessing.dummy import Pool as ThreadPool and use it as a thread pool. Its attributes and methods are the same as those of the process pool. The example above can be rewritten this way; the code is as follows:
import datetime
from multiprocessing.dummy import Pool as ThreadPool

# Reuses open_url() from the earlier example.
if __name__ == "__main__":
    urls = [
        'http://www.python.org',
        'http://www.python.org/about/',
        'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html',
        'http://www.python.org/doc/',
        'http://www.python.org/download/',
        'http://www.python.org/getit/',
        'http://www.python.org/community/',
        'https://wiki.python.org/moin/',
        'http://planet.python.org/',
        'https://wiki.python.org/moin/LocalUserGroups',
        'http://www.python.org/psf/',
        'http://docs.python.org/devguide/',
        'http://www.python.org/community/awards/'
    ]
    t1 = datetime.datetime.now()
    pool = ThreadPool(2)       # a pool with 2 worker threads
    pool.map(open_url, urls)   # blocks until every URL has been processed
    pool.close()               # no more tasks will be submitted
    pool.join()                # wait for the worker threads to exit
    t2 = datetime.datetime.now()
    print t2 - t1
The results of the operation are as follows:
[Image: screenshot of the program output]
I think the main idea behind the three methods above is much the same, and they are fairly easy to understand. I hope this helps; if anything is wrong, corrections are welcome!
Python: Threads, Processes, and Coroutines (7): Thread Pool