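When writing a crawler, most of the performance cost lies in I/O requests. In single-process, single-thread mode, every URL request blocks until its response arrives, so the requests run one after another and the whole job slows down.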
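To make the cost of waiting concrete, the sketch below compares a sequential loop with a thread pool. It is only an illustration under stated assumptions: `fake_fetch` is a hypothetical stand-in that sleeps instead of calling `requests.get`, and the `example.com` URLs are placeholders, so the script runs without network access.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url, delay=0.2):
    """Hypothetical stand-in for requests.get: sleep to simulate network wait."""
    time.sleep(delay)
    return url


def run_sequential(urls):
    # One request at a time: total time is roughly len(urls) * delay.
    start = time.perf_counter()
    results = [fake_fetch(u) for u in urls]
    return results, time.perf_counter() - start


def run_threaded(urls, workers=4):
    # The waits overlap across worker threads, so total time is roughly one delay.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fake_fetch, urls))  # map keeps input order
    return results, time.perf_counter() - start


# Placeholder URLs, never actually requested.
urls = ['http://example.com/page%d' % i for i in range(4)]
seq_results, seq_time = run_sequential(urls)
thr_results, thr_time = run_threaded(urls)
print('sequential: %.2fs, thread pool: %.2fs' % (seq_time, thr_time))
```

Because the simulated work is pure waiting, threads help even with the GIL; the same holds for real HTTP requests, where the time is spent blocked on the socket.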
    import requests
    from concurrent.futures import ThreadPoolExecutor  # thread pool module


    def asyns_url(url):
        try:
            response = requests.get(url)
        except Exception as e:
            # response is not defined when the request fails, so report the URL and the error
            print('Request failed', url, e)
            return
        print('Result', response.url, response.content)


    url_list = {
        'http://www.baidu.com',
        'http://www.google.com',
        'http://dig.chouti.com',
        'http://www.bing.com',
    }

    pool = ThreadPoolExecutor(5)  # at most 5 worker threads
    for url in url_list:
        print('Starting request', url)
        pool.submit(asyns_url, url)
    pool.shutdown(wait=True)  # block until all submitted tasks finish, then release the threads
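The thread pool code above prints inside the worker and discards whatever the function returns. If the caller needs the results, one option is to keep the `Future` objects and read them with `as_completed`. This is a minimal sketch, not the author's method: `fake_fetch` is a hypothetical stand-in for `requests.get` so the example runs offline, and the "size" it returns is simply the length of the URL string.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_fetch(url):
    """Hypothetical stand-in for requests.get(url); returns (url, fake body size)."""
    time.sleep(0.05)
    return url, len(url)


urls = ['http://www.bing.com', 'http://www.baidu.com']

results = {}
with ThreadPoolExecutor(max_workers=5) as pool:
    # Map each Future back to the URL it was submitted for.
    future_to_url = {pool.submit(fake_fetch, u): u for u in urls}
    # as_completed yields futures in the order they finish, not submission order.
    for future in as_completed(future_to_url):
        url = future_to_url[future]
        try:
            _, size = future.result()  # re-raises any exception from the worker
        except Exception as exc:
            print('Request failed', url, exc)
        else:
            results[url] = size
            print('Result', url, size)
```

Using the executor as a context manager calls `shutdown(wait=True)` on exit, so no explicit shutdown call is needed.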
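    from concurrent.futures import ProcessPoolExecutor
    import requests


    def fetch_async(url):
        # Runs in a worker process; the return value is pickled and sent back to the parent
        response = requests.get(url)
        return response


    # The guard is required on platforms that start worker processes by
    # re-importing this module (for example Windows)
    if __name__ == '__main__':
        url_list = ['http://www.github.com', 'http://www.bing.com']
        pool = ProcessPoolExecutor(5)  # at most 5 worker processes
        for url in url_list:
            pool.submit(fetch_async, url)
        pool.shutdown(wait=True)  # block until all submitted tasks finish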
Multiprocessing
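    from concurrent.futures import ProcessPoolExecutor
    import requests


    def fetch_async(url):
        # Runs in a worker process
        response = requests.get(url)
        return response


    def callback(future):
        # Runs in the parent process once the task is done;
        # future.result() re-raises any exception raised in the worker
        print(future.result())


    # The guard is required on platforms that start worker processes by
    # re-importing this module (for example Windows)
    if __name__ == '__main__':
        url_list = ['http://www.github.com', 'http://www.bing.com']
        pool = ProcessPoolExecutor(5)  # at most 5 worker processes
        for url in url_list:
            v = pool.submit(fetch_async, url)
            v.add_done_callback(callback)  # called with the finished Future
        pool.shutdown(wait=True)  # block until all submitted tasks finish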
Multiprocessing + callback function
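A callback that calls `future.result()` directly will raise if the task failed. The sketch below shows one way to separate successes from failures inside the callback by checking `future.exception()` first. Several things here are assumptions for illustration: it uses `ThreadPoolExecutor` rather than `ProcessPoolExecutor` so it runs anywhere without a `__main__` guard (the `Future` API is the same for both), `fake_fetch` is a hypothetical stand-in for `requests.get`, and `http://bad.invalid` is a made-up URL used to trigger the failure path.

```python
from concurrent.futures import ThreadPoolExecutor

succeeded = []
failed = []


def fake_fetch(url):
    """Hypothetical stand-in for requests.get(url); fails for URLs containing 'bad'."""
    if 'bad' in url:
        raise ValueError('cannot fetch %s' % url)
    return 'body of %s' % url


def callback(future):
    # exception() returns None when the task finished without raising.
    exc = future.exception()
    if exc is not None:
        failed.append(str(exc))
    else:
        succeeded.append(future.result())


urls = ['http://www.github.com', 'http://bad.invalid']
with ThreadPoolExecutor(max_workers=2) as pool:
    for url in urls:
        pool.submit(fake_fetch, url).add_done_callback(callback)

print('succeeded:', succeeded)
print('failed:', failed)
```

With a process pool the callback still runs in the parent process, so appending to module-level lists like these works there as well.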
Python process pools and thread pools