Performance Comparison of multiprocessing, threading, and gevent in Python 3 (process pool vs. thread pool vs. coroutine pool)
Programs today generally face two kinds of I/O: hard-disk I/O and network I/O. For a network-I/O workload, I compared the efficiency of processes, threads, and coroutines in Python 3. Processes use multiprocessing.Pool, threads use a thread pool I wrapped myself, and coroutines use the gevent library. I also compare urllib.request, which ships with Python 3, against the open-source requests library. The code is as follows:
```python
import urllib.request
import requests
import time
import multiprocessing
import threading
import queue

def startTimer():
    return time.time()

def ticT(startTime):
    useTime = time.time() - startTime
    return round(useTime, 3)

def download_urllib(url):
    req = urllib.request.Request(url, headers={'user-agent': 'Mozilla/5.0'})
    res = urllib.request.urlopen(req)
    data = res.read()
    try:
        data = data.decode('gbk')
    except UnicodeDecodeError:
        data = data.decode('utf8', 'ignore')
    return res.status, data

def download_requests(url):
    req = requests.get(url, headers={'user-agent': 'Mozilla/5.0'})
    return req.status_code, req.text

class threadPoolManager:
    def __init__(self, urls, workNum=10000, threadNum=20):
        self.workQueue = queue.Queue()
        self.threadPool = []
        self.__initWorkQueue(urls)
        self.__initThreadPool(threadNum)

    def __initWorkQueue(self, urls):
        for i in urls:
            self.workQueue.put((download_requests, i))

    def __initThreadPool(self, threadNum):
        for i in range(threadNum):
            self.threadPool.append(work(self.workQueue))

    def waitAllComplete(self):
        for i in self.threadPool:
            if i.is_alive():
                i.join()

class work(threading.Thread):
    def __init__(self, workQueue):
        threading.Thread.__init__(self)
        self.workQueue = workQueue
        self.start()

    def run(self):
        while True:
            # get() may raise queue.Empty; checking qsize() first would be racy
            try:
                do, args = self.workQueue.get(block=False)
            except queue.Empty:
                break
            do(args)
            self.workQueue.task_done()

urls = ['http://www.ustchacker.com'] * 10
urllibL = []
requestsL = []
multiPool = []
threadPool = []
N = 20
PoolNum = 100

for i in range(N):
    print('start %d try' % i)

    urllibT = startTimer()
    jobs = [download_urllib(url) for url in urls]
    urllibL.append(ticT(urllibT))
    print('1')

    requestsT = startTimer()
    jobs = [download_requests(url) for url in urls]
    requestsL.append(ticT(requestsT))
    print('2')

    requestsT = startTimer()
    pool = multiprocessing.Pool(PoolNum)
    data = pool.map(download_requests, urls)
    pool.close()
    pool.join()
    multiPool.append(ticT(requestsT))
    print('3')

    requestsT = startTimer()
    pool = threadPoolManager(urls, threadNum=PoolNum)
    pool.waitAllComplete()
    threadPool.append(ticT(requestsT))
    print('4')

import matplotlib.pyplot as plt

x = list(range(1, N + 1))
plt.plot(x, urllibL, label='urllib')
plt.plot(x, requestsL, label='requests')
plt.plot(x, multiPool, label='requests MultiPool')
plt.plot(x, threadPool, label='requests threadPool')
plt.xlabel('test number')
plt.ylabel('time(s)')
plt.legend()
plt.show()
```
The running result is as follows:
As the result shows, Python 3's built-in urllib.request is less efficient than the open-source requests library. The multiprocessing process pool improves efficiency significantly, but it still falls short of the self-wrapped thread pool; one reason is that creating and scheduling processes costs more than creating threads (the test program includes the pool-creation cost in each timing).
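The self-wrapped thread pool used above boils down to a queue of (function, argument) pairs drained by worker threads. Here is a minimal, self-contained sketch of that pattern; the `run_pool` name and the dummy squaring task are illustrative stand-ins for `download_requests`, so the sketch runs without network access:

```python
import queue
import threading

def run_pool(tasks, num_threads=4):
    """Run (func, arg) pairs on a fixed number of worker threads."""
    work_q = queue.Queue()
    results = []
    lock = threading.Lock()
    for t in tasks:
        work_q.put(t)

    def worker():
        while True:
            # Non-blocking get: an empty queue means this worker is done
            try:
                func, arg = work_q.get(block=False)
            except queue.Empty:
                break
            r = func(arg)
            with lock:  # guard the shared results list
                results.append(r)
            work_q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

# Dummy CPU-free task in place of a real download
squares = run_pool([(lambda x: x * x, i) for i in range(10)])
```

Because the workers pull from one shared queue, faster tasks do not block slower ones; results arrive in completion order, not submission order.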
The following is the gevent test code:
```python
# Monkey-patch as early as possible, before other modules are imported
import gevent.monkey
gevent.monkey.patch_all()

import urllib.request
import requests
import time
import gevent.pool

def startTimer():
    return time.time()

def ticT(startTime):
    useTime = time.time() - startTime
    return round(useTime, 3)

def download_urllib(url):
    req = urllib.request.Request(url, headers={'user-agent': 'Mozilla/5.0'})
    res = urllib.request.urlopen(req)
    data = res.read()
    try:
        data = data.decode('gbk')
    except UnicodeDecodeError:
        data = data.decode('utf8', 'ignore')
    return res.status, data

def download_requests(url):
    req = requests.get(url, headers={'user-agent': 'Mozilla/5.0'})
    return req.status_code, req.text

urls = ['http://www.ustchacker.com'] * 10
urllibL = []
requestsL = []
reqPool = []
reqSpawn = []
N = 20
PoolNum = 100

for i in range(N):
    print('start %d try' % i)

    urllibT = startTimer()
    jobs = [download_urllib(url) for url in urls]
    urllibL.append(ticT(urllibT))
    print('1')

    requestsT = startTimer()
    jobs = [download_requests(url) for url in urls]
    requestsL.append(ticT(requestsT))
    print('2')

    requestsT = startTimer()
    pool = gevent.pool.Pool(PoolNum)
    data = pool.map(download_requests, urls)
    reqPool.append(ticT(requestsT))
    print('3')

    requestsT = startTimer()
    jobs = [gevent.spawn(download_requests, url) for url in urls]
    gevent.joinall(jobs)
    reqSpawn.append(ticT(requestsT))
    print('4')

import matplotlib.pyplot as plt

x = list(range(1, N + 1))
plt.plot(x, urllibL, label='urllib')
plt.plot(x, requestsL, label='requests')
plt.plot(x, reqPool, label='requests geventPool')
plt.plot(x, reqSpawn, label='requests geventSpawn')
plt.xlabel('test number')
plt.ylabel('time(s)')
plt.legend()
plt.show()
```
The running result is as follows:
As you can see, gevent greatly improves the performance of I/O-intensive tasks, because creating and scheduling a coroutine costs far less than a thread. For the same reason, the performance gap between gevent's spawn mode and its pool mode is small.
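The spawn/joinall pattern above is not specific to gevent; the standard library's asyncio expresses the same idea with explicit coroutines. The sketch below is an analogue, not the article's benchmark: `fake_download` is a hypothetical stand-in that sleeps instead of hitting the network, so it runs offline:

```python
import asyncio

async def fake_download(url):
    # Stand-in for network I/O; awaiting yields control like a real request
    await asyncio.sleep(0.01)
    return 200, url

async def main(urls):
    # Analogous to gevent.spawn + gevent.joinall: start all jobs, wait for all
    jobs = [asyncio.create_task(fake_download(u)) for u in urls]
    return await asyncio.gather(*jobs)

results = asyncio.run(main(['http://www.ustchacker.com'] * 5))
```

All five sleeps overlap, so the whole batch finishes in roughly the time of one, which is the same overlap gevent achieves transparently via monkey patching.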
gevent requires the monkey patch, which is what gives it its performance boost, but full patching interferes with multiprocessing. To use both in one program, the patch must be applied selectively:

```python
gevent.monkey.patch_all(thread=False, socket=False, select=False)
```
With the selective patch, however, gevent cannot realize its full advantage, so the multiprocessing pool, thread pool, and gevent pool cannot be compared fairly in a single program. Comparing the two figures nonetheless supports the conclusion that the thread pool and gevent perform best, followed by the process pool, and that the requests library outperforms urllib.request. :-)