Performance Comparison of multiprocessing, threading and gevent in python3: process pool, thread pool, and coroutine pool

Last Update:2014-11-11 Source: Internet

Author: User


At present, computer programs generally encounter two types of I/O: hard disk I/O and network I/O. I compared the efficiency of processes, threads, and coroutines in python3 for the network I/O case. The process test uses the multiprocessing.Pool process pool, the thread test uses a thread pool I wrote myself, and the coroutine test uses the gevent library. I also compared the urllib.request module that ships with python3 against the open-source requests library. The code is as follows:

import urllib.request
import requests
import time
import multiprocessing
import threading
import queue


def startTimer():
    return time.time()


def ticT(startTime):
    useTime = time.time() - startTime
    return round(useTime, 3)


#def tic(startTime, name):
#    useTime = time.time() - startTime
#    print('[%s] use time: %1.3f' % (name, useTime))


def download_urllib(url):
    req = urllib.request.Request(url,
            headers={'user-agent': 'Mozilla/5.0'})
    res = urllib.request.urlopen(req)
    data = res.read()
    try:
        data = data.decode('gbk')
    except UnicodeDecodeError:
        data = data.decode('utf8', 'ignore')
    return res.status, data


def download_requests(url):
    req = requests.get(url,
            headers={'user-agent': 'Mozilla/5.0'})
    return req.status_code, req.text


class threadPoolManager:
    def __init__(self, urls, workNum=10000, threadNum=20):
        self.workQueue = queue.Queue()
        self.threadPool = []
        self.__initWorkQueue(urls)
        self.__initThreadPool(threadNum)

    def __initWorkQueue(self, urls):
        # Each job is a (function, argument) pair
        for i in urls:
            self.workQueue.put((download_requests, i))

    def __initThreadPool(self, threadNum):
        # Each worker thread starts itself in its constructor
        for i in range(threadNum):
            self.threadPool.append(work(self.workQueue))

    def waitAllComplete(self):
        for i in self.threadPool:
            # isAlive() was removed in Python 3.9; is_alive() is the current name
            if i.is_alive():
                i.join()


class work(threading.Thread):
    def __init__(self, workQueue):
        threading.Thread.__init__(self)
        self.workQueue = workQueue
        self.start()

    def run(self):
        while True:
            # Checking qsize() and then calling get() can race with other
            # workers, so take a job directly and stop when the queue is empty
            try:
                do, args = self.workQueue.get(block=False)
            except queue.Empty:
                break
            do(args)
            self.workQueue.task_done()


urls = ['http://www.ustchacker.com'] * 10
urllibL = []
requestsL = []
multiPool = []
threadPool = []
N = 20
PoolNum = 100

# The guard is needed where multiprocessing starts workers with "spawn"
# (for example on Windows), because each worker re-imports this module
if __name__ == '__main__':
    for i in range(N):
        print('start %d try' % i)
        urllibT = startTimer()
        jobs = [download_urllib(url) for url in urls]
        #for status, data in jobs:
        #    print(status, data[:10])
        #tic(urllibT, 'urllib.request')
        urllibL.append(ticT(urllibT))
        print('1')

        requestsT = startTimer()
        jobs = [download_requests(url) for url in urls]
        #for status, data in jobs:
        #    print(status, data[:10])
        #tic(requestsT, 'requests')
        requestsL.append(ticT(requestsT))
        print('2')

        # Process pool: creation and teardown are part of the measured time
        requestsT = startTimer()
        pool = multiprocessing.Pool(PoolNum)
        data = pool.map(download_requests, urls)
        pool.close()
        pool.join()
        multiPool.append(ticT(requestsT))
        print('3')

        # Thread pool: creation is part of the measured time as well
        requestsT = startTimer()
        pool = threadPoolManager(urls, threadNum=PoolNum)
        pool.waitAllComplete()
        threadPool.append(ticT(requestsT))
        print('4')

    import matplotlib.pyplot as plt
    x = list(range(1, N+1))
    plt.plot(x, urllibL, label='urllib')
    plt.plot(x, requestsL, label='requests')
    plt.plot(x, multiPool, label='requests MultiPool')
    plt.plot(x, threadPool, label='requests threadPool')
    plt.xlabel('test number')
    plt.ylabel('time(s)')
    plt.legend()
    plt.show()

The running result is as follows:

The results show that python3's built-in urllib.request is less efficient than the open-source requests library. The multiprocessing process pool improves performance significantly, but it is still slower than the hand-written thread pool. One reason is that creating and scheduling processes costs more than creating and scheduling threads (the test program includes the creation cost in its timings).
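The creation-cost point can be tried without any network access. The following is only a minimal sketch, not the test program above: it swaps in the standard library's concurrent.futures executors for the pools used in the article, and fake_download and timed_map are hypothetical names in which time.sleep stands in for a request. As in the article's test, pool creation and shutdown fall inside the timed region.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fake_download(delay):
    """Hypothetical stand-in for a network request: block for `delay` seconds."""
    time.sleep(delay)
    return delay


def timed_map(executor_cls, delays, workers):
    """Map fake_download over delays and return (elapsed seconds, results).

    The pool is created and shut down inside the timed region.
    """
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(fake_download, delays))
    return time.perf_counter() - start, results


if __name__ == '__main__':
    delays = [0.05] * 10
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        elapsed, _ = timed_map(cls, delays, workers=10)
        print('%s: %.3f s' % (cls.__name__, elapsed))
```

The absolute numbers depend on the machine and operating system, so they only indicate the relative cost of starting threads versus processes.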

The following is the gevent test code:

import gevent.monkey
# Patch as early as possible, before urllib and requests are imported,
# so that they pick up gevent's cooperative sockets
gevent.monkey.patch_all()

import urllib.request
import requests
import time
import gevent.pool


def startTimer():
    return time.time()


def ticT(startTime):
    useTime = time.time() - startTime
    return round(useTime, 3)


#def tic(startTime, name):
#    useTime = time.time() - startTime
#    print('[%s] use time: %1.3f' % (name, useTime))


def download_urllib(url):
    req = urllib.request.Request(url,
            headers={'user-agent': 'Mozilla/5.0'})
    res = urllib.request.urlopen(req)
    data = res.read()
    try:
        data = data.decode('gbk')
    except UnicodeDecodeError:
        data = data.decode('utf8', 'ignore')
    return res.status, data


def download_requests(url):
    req = requests.get(url,
            headers={'user-agent': 'Mozilla/5.0'})
    return req.status_code, req.text


urls = ['http://www.ustchacker.com'] * 10
urllibL = []
requestsL = []
reqPool = []
reqSpawn = []
N = 20
PoolNum = 100

for i in range(N):
    print('start %d try' % i)
    urllibT = startTimer()
    jobs = [download_urllib(url) for url in urls]
    #for status, data in jobs:
    #    print(status, data[:10])
    #tic(urllibT, 'urllib.request')
    urllibL.append(ticT(urllibT))
    print('1')

    requestsT = startTimer()
    jobs = [download_requests(url) for url in urls]
    #for status, data in jobs:
    #    print(status, data[:10])
    #tic(requestsT, 'requests')
    requestsL.append(ticT(requestsT))
    print('2')

    # Pool mode: at most PoolNum greenlets run at the same time
    requestsT = startTimer()
    pool = gevent.pool.Pool(PoolNum)
    data = pool.map(download_requests, urls)
    #for status, text in data:
    #    print(status, text[:10])
    #tic(requestsT, 'requests with gevent.pool')
    reqPool.append(ticT(requestsT))
    print('3')

    # Spawn mode: one greenlet per URL, then wait for all of them
    requestsT = startTimer()
    jobs = [gevent.spawn(download_requests, url) for url in urls]
    gevent.joinall(jobs)
    #for i in jobs:
    #    print(i.value[0], i.value[1][:10])
    #tic(requestsT, 'requests with gevent.spawn')
    reqSpawn.append(ticT(requestsT))
    print('4')

import matplotlib.pyplot as plt
x = list(range(1, N+1))
plt.plot(x, urllibL, label='urllib')
plt.plot(x, requestsL, label='requests')
plt.plot(x, reqPool, label='requests geventPool')
plt.plot(x, reqSpawn, label='requests Spawn')
plt.xlabel('test number')
plt.ylabel('time(s)')
plt.legend()
plt.show()

The running result is as follows:

As you can see, gevent greatly improves the performance of I/O-intensive tasks. Because creating and scheduling coroutines costs much less than creating and scheduling threads, the performance gap between gevent's Spawn mode and Pool mode is small.
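The overhead difference between coroutines and threads can be explored with the standard library alone. The sketch below is an illustration under stated assumptions, not gevent: it uses asyncio coroutines as a stand-in for gevent greenlets, and fake_request, run_coroutines, run_threads and timed are hypothetical names. Each task only sleeps, so the measured time is dominated by creation and scheduling overhead.

```python
import asyncio
import threading
import time


async def fake_request(delay):
    """Hypothetical stand-in for a network request, as an asyncio coroutine."""
    await asyncio.sleep(delay)
    return delay


async def run_coroutines(n, delay):
    """Start n coroutines concurrently and collect their results."""
    return await asyncio.gather(*(fake_request(delay) for _ in range(n)))


def run_threads(n, delay):
    """Start n threads that each sleep for `delay`, then wait for all of them."""
    results = []

    def worker():
        time.sleep(delay)
        results.append(delay)

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def timed(func):
    """Call func() and return (elapsed seconds, result)."""
    start = time.perf_counter()
    result = func()
    return time.perf_counter() - start, result


if __name__ == '__main__':
    n, delay = 500, 0.05
    t_co, _ = timed(lambda: asyncio.run(run_coroutines(n, delay)))
    t_th, _ = timed(lambda: run_threads(n, delay))
    print('coroutines: %.3f s, threads: %.3f s' % (t_co, t_th))
```

Both variants wait roughly one `delay` in total; whatever remains above that is the cost of creating and switching between the tasks, which varies by machine.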

gevent needs the monkey patch to reach its full performance, but the patch interferes with multiprocessing. To use both in the same program, patch selectively with the following code:

gevent.monkey.patch_all(thread=False, socket=False, select=False)

With this selective patch, however, gevent's advantages cannot be fully exploited, so the multiprocessing pool, the thread pool, and the gevent pool cannot be compared fairly in one program. Comparing the two figures, though, we can conclude that the thread pool and gevent perform best, followed by the process pool. As a side result, the requests library performs better than the urllib.request library :-)
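One possible way to put the numbers from the two programs side by side, without mixing the monkey patch and multiprocessing in one process, is to have each script write its timings to a file and summarize them afterwards. The following is a minimal sketch under that assumption; save_timings, load_and_summarize and the file names in the usage note are hypothetical and not part of the original test code.

```python
import json
import statistics


def save_timings(path, timings):
    """Write a {label: [seconds, ...]} dict to a JSON file."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(timings, f)


def load_and_summarize(paths):
    """Merge timing files and return {label: mean seconds}, fastest first."""
    merged = {}
    for path in paths:
        with open(path, encoding='utf-8') as f:
            merged.update(json.load(f))
    means = {label: statistics.mean(values)
             for label, values in merged.items()}
    return dict(sorted(means.items(), key=lambda item: item[1]))
```

For example, the first script could end with save_timings('pool_timings.json', {'requests MultiPool': multiPool, 'requests threadPool': threadPool}) and the gevent script with save_timings('gevent_timings.json', {'requests geventPool': reqPool, 'requests Spawn': reqSpawn}); a third script would then call load_and_summarize on both files. The runs still happen at different times, so network conditions may differ between them.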


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
