Threads and processes have many applications, mostly around concurrency and multitasking. However, when too many threads or processes are open at once, the overhead can degrade performance or even crash the system. At that point you want a way to cap how many threads or processes run at the same time, especially in situations where you cannot predict in advance how many will be needed. That is what thread pools and process pools are for. Python 3 added the concurrent.futures module, which provides a high-level interface for asynchronous execution.
Thread pool
concurrent.futures.ThreadPoolExecutor(max_workers=None, thread_name_prefix=''): a thread pool that provides threads for executing tasks asynchronously.
The max_workers argument is the maximum number of worker threads; it defaults to the number of CPU cores multiplied by 5, so on a four-core CPU at most 20 threads can be started.
The thread_name_prefix argument is a prefix for the worker threads' names, which makes the threads easier to track and debug; a rough sketch of the constructor follows.
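As a rough illustration (the whoami helper and the 'demo' prefix are made up for this sketch), the prefix shows up in the name of each worker thread:

import threading
from concurrent.futures import ThreadPoolExecutor

def whoami(n):
    # report which pool thread executed this task
    return 'task %s ran on %s' % (n, threading.current_thread().name)

# worker threads are named demo_0, demo_1, ... because of thread_name_prefix
with ThreadPoolExecutor(max_workers=2, thread_name_prefix='demo') as executor:
    futures = [executor.submit(whoami, n) for n in range(4)]
    for f in futures:
        print(f.result())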
ThreadPoolExecutor provides the submit, map, and shutdown methods:
submit(fn, *args, **kwargs) schedules fn(*args, **kwargs) and returns a Future object representing the result of the pending or running task.
map(func, *iterables, timeout=None, chunksize=1) returns an iterator; each next() call on it yields the result of one task, and up to max_workers tasks run on the pool threads concurrently (a short sketch follows this list).
shutdown(wait=True) releases all resources held by the pool once its tasks are finished; if wait is True the call blocks until the pending tasks complete. When the pool is used in a with statement, shutdown does not need to be called explicitly.
Note: whether wait is True or False, the interpreter still runs the remaining tasks to completion; the only difference is whether shutdown waits (blocks) for them or not.
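For example, here is a minimal sketch of map() with an explicit shutdown() (the square helper is made up for this sketch); map yields results in the order the inputs were given:

import time
from concurrent.futures import ThreadPoolExecutor

def square(x):
    time.sleep(0.1)  # pretend this is slow work
    return x * x

executor = ThreadPoolExecutor(max_workers=4)
results = executor.map(square, range(8))  # lazy iterator; tasks run on the pool threads
print(list(results))                      # results come back in input order
executor.shutdown(wait=True)              # needed here because no with statement is used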
Future objects
A Future object is what Executor.submit() returns; it represents the result of a pending or running task. Note that you should not create Future objects yourself by calling concurrent.futures.Future. It has the following methods:
cancel(): attempts to cancel the task. If the task is currently running and cannot be cancelled, returns False; otherwise the task is cancelled and True is returned.
cancelled(): returns True if the task was successfully cancelled.
running(): returns True if the task is currently running and cannot be cancelled.
done(): returns True if the task has finished or was successfully cancelled.
result(timeout=None): returns the result of the task, waiting up to timeout seconds if the task is not yet finished.
exception(timeout=None): returns the exception raised by the task, waiting up to timeout seconds.
add_done_callback(fn): attaches a callback that is invoked when the task finishes or is cancelled; the Future object is passed to the callback as its only argument (a short sketch follows this list).
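A small sketch of these Future methods, assuming a made-up work function and on_done callback:

from concurrent.futures import ThreadPoolExecutor

def work(x):
    return x * 2

def on_done(future):
    # the finished Future is passed as the callback's only argument
    if future.cancelled():
        print('task was cancelled')
    elif future.exception() is not None:
        print('task raised:', future.exception())
    else:
        print('task result:', future.result())

with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(work, 21)
    future.add_done_callback(on_done)  # runs when the task finishes or is cancelled
    print('done yet?', future.done())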
import requests
import time
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor, as_completed

urls = [
    'http://www.baidu.com',
    'http://www.bing.com',
    'http://wwww.sougou.com',
    'http://www.soso.com',
]

def get_page_title(url, timeout):
    """Get the title of the page."""
    # send a GET request using requests
    html = requests.get(
        url=url,
        timeout=timeout,
        headers={
            'user-agent': 'mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:51.0) gecko/20100101 firefox/51.0'
        }
    )
    # print(html.text)
    soup = BeautifulSoup(html.text, "html.parser")  # parse the document
    title = soup.find('title')                      # get the title of the page
    return title.text

# use with to get a thread pool with a maximum of 4 threads
with ThreadPoolExecutor(max_workers=4) as executor:
    start = time.time()
    # submit a task for each URL
    future_and_url = {executor.submit(get_page_title, url, 10): url for url in urls}
    # as_completed returns an iterator over the futures as they finish
    for future in as_completed(future_and_url):
        url = future_and_url[future]
        try:
            data = future.result()  # get the result of the task
        except Exception as e:
            print("Has occured exception:", e)
        else:
            cost_time = time.time() - start
            print("got title: %s" % data, 'spend %ss' % cost_time)
The output is:
got title: <page title> spend <elapsed>s    (one such line per URL; the Bing page finished after 10.456863164901733s)
Process Pool
The process pool likewise provides processes that execute tasks asynchronously; the difference is that it effectively sidesteps the limitation of the Global Interpreter Lock. Each process gets its own separate address space and runs its own interpreter, so the processes do not interfere with one another.
concurrent.futures.ProcessPoolExecutor(max_workers=None): a process pool; unlike the thread pool, max_workers defaults to the number of CPU cores.
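A minimal sketch of the process pool (the cube helper is made up for this sketch); on platforms that spawn rather than fork worker processes, the entry point should be guarded with if __name__ == '__main__':

from concurrent.futures import ProcessPoolExecutor

def cube(x):
    return x ** 3

if __name__ == '__main__':
    # max_workers defaults to the number of CPU cores when omitted
    with ProcessPoolExecutor() as executor:
        print(list(executor.map(cube, range(10))))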
Let's try it with threads first, then compare.
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor, as_completed

def is_perfect_number(number):
    """Judge whether number is a perfect number."""
    total = 0
    for i in range(1, number):
        if number % i == 0:
            total += i
    if total == number:
        return True
    return False

def find_perfect_number_t(number):
    """Use threads to find all the perfect numbers in this range."""
    perfect_number = []
    start_time = time.time()
    with ThreadPoolExecutor() as executor:
        future_dict = {executor.submit(is_perfect_number, i): i for i in range(1, number)}
        for future in as_completed(future_dict):
            if future.result():
                perfect_number.append(future_dict[future])
    print('The perfect number of %s is:' % number, perfect_number)
    print('Has spend %ss' % (time.time() - start_time))
Run it:
find_perfect_number_t(25000)
The output is:
The perfect number of 25000 is: [6, 496, 8128]
Has spend 134.63016271591187s
Now switch to the process version:
def find_perfect_number_p(number):
    """Use processes to find all the perfect numbers in this range."""
    perfect_number = []
    start_time = time.time()
    with ProcessPoolExecutor() as executor:
        future_dict = {executor.submit(is_perfect_number, i): i for i in range(1, number)}
        for future in as_completed(future_dict):
            if future.result():
                perfect_number.append(future_dict[future])
    print('The perfect number of %s is:' % number, perfect_number)
    print('Has spend %ss' % (time.time() - start_time))
Run it again:
find_perfect_number_p(25000)
The output is:
The perfect number of 25000 is: [6, 496, 8128]
Has spend 45.46505379676819s
This is the CPU load graph while running the thread version:
This is the CPU load graph while running the process version:
Conclusion:
The example above illustrates the difference between threads and processes well. My CPU has four cores; Python's multithreading used only one of them, and CPU utilization was only about 35%. The multi-process version took full advantage of all cores, with utilization at 100%. However, creating and destroying processes consumes far more resources than threads, so for small amounts of computation threads are actually faster than processes. Multiprocessing in Python is suited to large computational problems that need to make full use of the CPU.