In this article, we use a web-crawler example to compare the execution efficiency of multi-threading, multi-processing, and coroutines.
Suppose we want to download images from the web. A simple approach is to use requests plus BeautifulSoup. (Note: all examples in this article use Python 3.5.)
Single Thread
Example 1: get_photos.py
import os
import time
import uuid
import requests
from bs4 import BeautifulSoup


def out_wrapper(func):  # simple decorator to record program execution time
    def inner_wrapper():
        start_time = time.time()
        func()
        stop_time = time.time()
        print('used time {}'.format(stop_time - start_time))
    return inner_wrapper


def save_flag(img, filename):  # save a picture (the down_photos directory must already exist)
    path = os.path.join('down_photos', filename)
    with open(path, 'wb') as fp:
        fp.write(img)


def download_one(url):  # download one picture
    image = requests.get(url)
    save_flag(image.content, str(uuid.uuid4()))


def user_conf():  # return the URLs of 30 images
    url = 'https://unsplash.com/'
    ret = requests.get(url)
    soup = BeautifulSoup(ret.text, "lxml")
    zzr = soup.find_all('img')
    ret = []
    num = 0
    for item in zzr:
        # filter image URLs by suffix ('jpg' is assumed here; the original suffix was garbled)
        if item.get("src").endswith('jpg') and num < 30:
            num += 1
            ret.append(item.get("src"))
    return ret


@out_wrapper
def download_many():
    zzr = user_conf()
    for item in zzr:
        download_one(item)


if __name__ == '__main__':
    download_many()
Example 1 downloads the images sequentially; the average time to download 30 images is around 60s (results vary with the environment).
This code works, but it is not efficient. How can we improve it?
There are three common approaches: multi-processing, multi-threading, and coroutines. Let us look at each.
We all know that the GIL exists in Python (mainly CPython), but the GIL does not hurt IO-bound tasks much, so multithreading is a good fit for IO-bound work: you can start 100 or 1,000 threads, while the number of processes that can run at the same time is limited by the number of CPU cores, so starting more processes gains nothing.
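To make this concrete, here is a small standalone sketch (the cpu_task and io_task helpers and the workload sizes are invented for this demo and are not part of the crawler code) that times four threads running a CPU-bound loop versus four threads that just wait on IO:

import time
import threading


def cpu_task(n=5000000):  # CPU-bound: a pure-Python loop that holds the GIL while it runs
    while n:
        n -= 1


def io_task(seconds=1):  # IO-bound: sleeping releases the GIL, like waiting on a network response
    time.sleep(seconds)


def timed(label, target, workers=4):
    start = time.time()
    threads = [threading.Thread(target=target) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print('{}: {:.2f}s'.format(label, time.time() - start))


if __name__ == '__main__':
    timed('4 CPU-bound threads', cpu_task)  # roughly 4x one task's time: the GIL serialises them
    timed('4 IO-bound threads', io_task)    # roughly 1s in total: the threads overlap while waiting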
However, this does not prevent us from experimenting with multiple processes.
Multi-process
Example 2
from multiprocessing import Process
from get_photos import out_wrapper, download_one, user_conf


@out_wrapper
def download_many():
    zzr = user_conf()
    task_list = []
    for item in zzr:
        t = Process(target=download_one, args=(item,))
        t.start()
        task_list.append(t)
    [t.join() for t in task_list]  # wait for the processes to finish (so the time can be recorded)


if __name__ == '__main__':
    download_many()
This example reuses code from Example 1; we only need to focus on the part that uses multiple processes.
I ran it three times (on a dual-core machine with hyper-threading, so only 4 download tasks run at once); the timings were 19.5s, 17.4s, and 18.6s. The speed-up is limited, which suggests that multi-processing is not well suited to IO-bound tasks.
There is another way to use multiple processes: ProcessPoolExecutor from the standard-library module concurrent.futures.
Example 3
from concurrent import futures
from get_photos import out_wrapper, download_one, user_conf


@out_wrapper
def download_many():
    zzr = user_conf()
    with futures.ProcessPoolExecutor(len(zzr)) as executor:
        res = executor.map(download_one, zzr)
    return len(list(res))


if __name__ == '__main__':
    download_many()
With ProcessPoolExecutor the code is much simpler, and executor.map works much like the built-in map. The running time is similar to Example 2. That covers multi-processing; next, let us try multithreading.
Multithreading
Example 4
import threading
from get_photos import out_wrapper, download_one, user_conf


@out_wrapper
def download_many():
    zzr = user_conf()
    task_list = []
    for item in zzr:
        t = threading.Thread(target=download_one, args=(item,))
        t.start()
        task_list.append(t)
    [t.join() for t in task_list]


if __name__ == '__main__':
    download_many()
The threading and multiprocessing APIs are almost identical, but the threaded version takes about 9s, roughly twice as fast as the multi-process version.
Examples 5 and 6 below both use the standard-library module concurrent.futures: Example 5 uses ThreadPoolExecutor with map, and Example 6 uses submit together with as_completed.
Example 5
from concurrent import futures
from get_photos import out_wrapper, download_one, user_conf


@out_wrapper
def download_many():
    zzr = user_conf()
    with futures.ThreadPoolExecutor(len(zzr)) as executor:
        res = executor.map(download_one, zzr)
    return len(list(res))


if __name__ == '__main__':
    download_many()
Example 6:
from concurrent import futures
from get_photos import out_wrapper, download_one, user_conf


@out_wrapper
def download_many():
    zzr = user_conf()
    with futures.ThreadPoolExecutor(len(zzr)) as executor:
        to_do = [executor.submit(download_one, item) for item in zzr]
        ret = [future.result() for future in futures.as_completed(to_do)]
    return ret


if __name__ == '__main__':
    download_many()
executor.map is easier to use because it resembles the built-in map, and it has one notable property: results come back in the same order as the calls were made. Often, though, it is preferable to get each result as soon as it is ready, regardless of the order of submission.
To do this, use Executor.submit and futures.as_completed together.
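To see the difference concretely, here is a small standalone sketch (the sleep-based task function is invented for illustration and is not part of the download code):

import time
from concurrent import futures


def task(n):
    time.sleep(n)  # the task given 3 takes longest, the one given 1 finishes first
    return n


if __name__ == '__main__':
    with futures.ThreadPoolExecutor(3) as executor:
        # map: results come back in submission order (3, 2, 1), even though 1 finished first
        print(list(executor.map(task, [3, 2, 1])))

        # submit + as_completed: results come back as they finish (1, 2, 3)
        to_do = [executor.submit(task, n) for n in [3, 2, 1]]
        print([f.result() for f in futures.as_completed(to_do)])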
Finally, let us look at coroutines, using gevent and asyncio respectively.
Gevent
Example 7
from gevent import monkey
monkey.patch_all()

import gevent
from get_photos import out_wrapper, download_one, user_conf


@out_wrapper
def download_many():
    zzr = user_conf()
    jobs = [gevent.spawn(download_one, item) for item in zzr]
    gevent.joinall(jobs)


if __name__ == '__main__':
    download_many()
Asyncio
Example 8
import uuid
import asyncio
import aiohttp
from get_photos import out_wrapper, user_conf, save_flag


async def download_one(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            save_flag(await resp.read(), str(uuid.uuid4()))


@out_wrapper
def download_many():
    urls = user_conf()
    loop = asyncio.get_event_loop()
    to_do = [download_one(url) for url in urls]
    wait_coro = asyncio.wait(to_do)
    res, _ = loop.run_until_complete(wait_coro)
    loop.close()
    return len(res)


if __name__ == '__main__':
    download_many()
The coroutine versions take about as long as the multithreaded ones, except that coroutines run in a single thread. The underlying principles are beyond the scope of this article.
A few words about asyncio: it was added to the standard library in Python 3.4, and the async and await keywords were added in 3.5. You can probably pick up the multithreading and multi-processing examples above fairly quickly, but understanding asyncio takes more time and effort.
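For reference, the same coroutine can be spelled in the Python 3.4 style (generator-based, with @asyncio.coroutine and yield from) or the 3.5 style (async def and await). The following minimal sketch, separate from the download code, just shows the two spellings side by side:

import asyncio


@asyncio.coroutine          # Python 3.4 style: generator-based coroutine
def old_style():
    yield from asyncio.sleep(1)
    return 'old'


async def new_style():      # Python 3.5 style: native coroutine
    await asyncio.sleep(1)
    return 'new'


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    print(loop.run_until_complete(asyncio.gather(old_style(), new_style())))
    loop.close()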
In addition, writing programs with threads is hard because the scheduler can interrupt a thread at any time. Locks must be held to protect critical sections, so that an interruption does not leave data in an invalid state.
Coroutines, by contrast, are protected by default: we have to yield explicitly before the rest of the program can run. With coroutines there is no need to hold locks to synchronise operations across threads; the coroutines synchronise themselves, because only one of them runs at any given time. To give up control, use yield or yield from (await in 3.5) to hand control back to the scheduler.
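As a rough illustration of this point (the shared counter and the worker functions below are invented for this demo), compare a threaded update that needs a lock with a coroutine update that does not:

import asyncio
import threading

counter = 0
lock = threading.Lock()


def thread_worker():
    global counter
    for _ in range(100000):
        with lock:           # the scheduler can preempt a thread anywhere, so the update must be locked
            counter += 1


async def coro_worker():
    global counter
    for _ in range(100000):
        counter += 1         # no lock needed: a coroutine only gives up control at an await point
    await asyncio.sleep(0)   # explicitly hand control back to the event loop


if __name__ == '__main__':
    threads = [threading.Thread(target=thread_worker) for _ in range(4)]
    [t.start() for t in threads]
    [t.join() for t in threads]

    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.gather(*[coro_worker() for _ in range(4)]))
    loop.close()
    print(counter)  # 800000: both versions end up correct, but only the threaded one needed a lock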
Summary
This article mainly introduces the basic usage of Python's concurrency-related modules. Concepts such as processes, threads, asyncio internals, blocking vs non-blocking IO, synchronous vs asynchronous IO, and event-driven programming are not covered here. If you are interested, you can look them up on Google or Baidu, or leave a comment below and we can discuss them together.