Python Asynchronous IO: Easily Manage 10k+ Concurrent Connections

Source: Internet
Author: User
Tags: benchmark

Foreword

Asynchronous operation is a universal concept in computer software and hardware systems. It is rooted in the large differences in processing speed between the entities that have to cooperate. In software development the most common case is the speed mismatch between CPU and IO, so asynchronous IO appears in many programming frameworks, on the client side in browsers and on the server side in node.js, for example. This article focuses on asynchronous IO in Python.

The Python 3.4 standard library has a new module, asyncio, to support asynchronous IO. Its API status is currently provisional, which means that backward compatibility is not guaranteed and the module could even be removed from the standard library (with very low probability). Anyone who follows the PEPs and python-dev will see that the module was in preparation for a long time, and the API and implementation may still be adjusted. Even so, asyncio is undoubtedly practical and powerful, and well worth learning in depth.

The example

asyncio mainly deals with TCP/UDP socket communication, managing a large number of connections without creating a large number of threads, which improves system efficiency. Here an example from the official documentation is lightly reworked into an HTTP long-connection benchmark tool for diagnosing how well a web server handles long-lived connections.

Feature overview: create 10 connections every 10 milliseconds until the target number of connections (such as 10k) is reached; each connection periodically sends a HEAD request to the server to maintain HTTP keep-alive.

The code is as follows:


    # Targets the Python 3.4-era asyncio API: generator-based coroutines
    # (@asyncio.coroutine / yield from), asyncio.async() and explicit loop
    # arguments. Later Python versions replaced these with async def / await
    # and asyncio.ensure_future().
    import argparse
    import asyncio
    import functools
    import logging
    import random
    import urllib.parse

    loop = asyncio.get_event_loop()


    @asyncio.coroutine
    def print_http_headers(no, url, keepalive):
        url = urllib.parse.urlsplit(url)
        # Every network operation gets a 3-second timeout.
        wait_for = functools.partial(asyncio.wait_for, timeout=3, loop=loop)
        query = ('HEAD {url.path} HTTP/1.1\r\n'
                 'Host: {url.hostname}\r\n'
                 '\r\n').format(url=url).encode('utf-8')
        rd, wr = yield from wait_for(asyncio.open_connection(url.hostname, 80))
        while True:
            wr.write(query)
            while True:
                line = yield from wait_for(rd.readline())
                if not line:  # end of connection
                    wr.close()
                    return no
                line = line.decode('utf-8').rstrip()
                if not line:  # end of header
                    break
                logging.debug('(%d) HTTP header> %s' % (no, line))
            # Idle for a random interval shorter than the keepalive timeout.
            yield from asyncio.sleep(random.randint(1, keepalive // 2))


    @asyncio.coroutine
    def do_requests(args):
        conn_pool = set()
        waiter = asyncio.Future()

        def _on_complete(fut):
            conn_pool.remove(fut)
            # Call result() only when there is no exception: it would re-raise.
            exc = fut.exception()
            if exc is not None:
                logging.info('conn#{} exception'.format(exc))
            else:
                logging.info('conn#{} result'.format(fut.result()))
            if not conn_pool:
                waiter.set_result('event loop is done')

        for i in range(args.connections):
            fut = asyncio.async(print_http_headers(i, args.url, args.keepalive))
            fut.add_done_callback(_on_complete)
            conn_pool.add(fut)
            # Pause 10 ms after every 10 new connections.
            if i % 10 == 0:
                yield from asyncio.sleep(0.01)
        logging.info((yield from waiter))


    def main():
        parser = argparse.ArgumentParser(description='asyncli')
        parser.add_argument('url', help='page address')
        parser.add_argument('-c', '--connections', type=int, default=1,
                            help='number of connections simultaneously')
        parser.add_argument('-k', '--keepalive', type=int, default=60,
                            help='HTTP keepalive timeout')
        args = parser.parse_args()
        logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
        loop.run_until_complete(do_requests(args))
        loop.close()


    if __name__ == '__main__':
        main()
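The listing above is written against the Python 3.4 API. For readers on a current Python, the same idea can be sketched with async def / await. This is only an illustrative adaptation, not the author's code: the `rounds`, `pause`, `batch` and `delay` parameters are hypothetical additions so that the sketch terminates instead of looping forever, and the port is taken from the URL when one is given.

```python
import asyncio
import logging
import urllib.parse


async def print_http_headers(no, url, rounds=1, pause=0.0, timeout=3):
    """Send `rounds` HEAD requests over one connection and log the headers.

    `rounds` and `pause` are hypothetical parameters; the original tool
    loops forever and sleeps for a random interval between requests.
    """
    parts = urllib.parse.urlsplit(url)
    query = ('HEAD {path} HTTP/1.1\r\n'
             'Host: {host}\r\n'
             '\r\n').format(path=parts.path or '/',
                            host=parts.hostname).encode('utf-8')
    rd, wr = await asyncio.wait_for(
        asyncio.open_connection(parts.hostname, parts.port or 80), timeout)
    try:
        for _ in range(rounds):
            wr.write(query)
            await wr.drain()
            while True:
                line = await asyncio.wait_for(rd.readline(), timeout)
                if not line:            # peer closed the connection
                    return no
                line = line.decode('utf-8').rstrip()
                if not line:            # blank line: end of headers
                    break
                logging.debug('(%d) HTTP header> %s', no, line)
            await asyncio.sleep(pause)
        return no
    finally:
        wr.close()


async def do_requests(url, connections, rounds=1, batch=10, delay=0.01):
    """Start `connections` tasks, pausing `delay` seconds every `batch` tasks."""
    tasks = []
    for i in range(connections):
        tasks.append(asyncio.create_task(print_http_headers(i, url, rounds)))
        if i % batch == 0:
            await asyncio.sleep(delay)
    # return_exceptions=True keeps one failed connection from hiding the rest.
    return await asyncio.gather(*tasks, return_exceptions=True)
```

With these assumptions, something like asyncio.run(do_requests('http://10.211.55.8/', 10000)) would play the role of the original command line. Using gather() in place of the done-callback and waiter Future is a simplification chosen for the sketch; it changes how completion is reported, not the connection pattern.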

Test and Analysis

Hardware: CPU 2.3 GHz / 2 cores, RAM 2 GB
Software: CentOS 6.5 (kernel 2.6.32), Python 3.3 (pip install asyncio), nginx 1.4.7
Parameter settings: ulimit -n 10240; nginx worker connections set to 10240

Start the web server with just one worker process:

    # ../sbin/nginx
    # ps ax | grep nginx
    2007 ?  Ss  0:00 nginx: master process ../sbin/nginx
    2008 ?  S   0:00 nginx: worker process
Start the benchmark tool and open 10k connections; the target URL is the default nginx test page:

    $ python asyncli.py http://10.211.55.8/ -c 10000
Average number of requests per second, computed from the nginx access log:

    # tail -1000000 access.log | awk '{print $4}' | sort | uniq -c | awk '{cnt+=1; sum+=$1} END {printf "avg = %d\n", sum/cnt}'
    avg = 548
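The pipeline groups log lines by their fourth field (the timestamp, to the second), counts the lines in each group, and averages those counts. A rough Python equivalent is sketched below. It assumes nginx's default "combined" log format, and the function name is made up for illustration.

```python
from collections import Counter


def average_requests_per_second(lines):
    """Average request count over the distinct seconds seen in the log.

    Assumes the 4th whitespace-separated field is the timestamp, for
    example [10/Oct/2014:13:55:36, as in nginx's default log format.
    Seconds with no requests are not counted, just as in the awk pipeline.
    Lines with fewer than 4 fields are skipped (the pipeline would instead
    count them together as one empty key).
    """
    per_second = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 4:
            per_second[fields[3]] += 1
    if not per_second:
        return 0
    # Integer division mirrors awk's printf "%d" truncation.
    return sum(per_second.values()) // len(per_second)
```

Passing an open file works because iterating a file yields its lines: average_requests_per_second(open('access.log')).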
Partial top output:

    VIRT   RES   SHR   S  %CPU  %MEM  TIME+    COMMAND
    657m   115m  3860  R  60.2  6.2   4:30.02  python
    54208  10m   848   R  7.0   0.6   0:30.79  nginx
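As a quick check on these figures, the ratios between the two processes can be worked out directly. The values are copied from the top output above; the combined figure is simply the CPU ratio multiplied by the memory ratio, which is the rough measure used in the summary.

```python
# Figures copied from the top output: %CPU and resident memory (RES, in MB).
python_cpu, python_res_mb = 60.2, 115
nginx_cpu, nginx_res_mb = 7.0, 10

cpu_ratio = python_cpu / nginx_cpu          # about 8.6
ram_ratio = python_res_mb / nginx_res_mb    # 11.5
combined = cpu_ratio * ram_ratio            # about 99

print('CPU x%.1f, RAM x%.1f, combined x%.0f' % (cpu_ratio, ram_ratio, combined))
```

Each ratio is close to 10, and their product is close to 100.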
Summary

1. Python is simple and straightforward to implement. The tool is fewer than 80 lines of code, uses only the standard library, and its logic is intuitive. Imagine implementing the same features with the C/C++ standard library and you cannot help saying, "Life is short, I use Python."

2. Python is inefficient to run. Once the connections are established, the data transfer logic on the client and the server is similar. Yet in the top output, Python's CPU and RAM usage are each roughly 10 times those of nginx, which means an efficiency difference of roughly 100 times (CPU x RAM). This indirectly illustrates the efficiency gap between Python and C. The comparison is somewhat extreme, since nginx is not only written in C but also deeply optimized for CPU and RAM usage. Still, a difference of two orders of magnitude on similar tasks, unless it is a bug, comes down to different starting points in architecture design: Python puts readability and ease of use first and performance second, while nginx is a highly optimized web server for which developing a module is already troublesome and reusing its asynchronous framework is harder still. The tradeoff between development efficiency and runtime efficiency is always there.

3. Single-threaded asynchronous IO vs. multithreaded synchronous IO. The example above is single-threaded asynchronous IO. Even without writing a demo, it is clear that multithreaded synchronous IO is much less efficient. One thread per connection? With 10k threads, the thread stacks alone occupy 600+ MB of memory (64 KB * 10000); add thread context switching and the GIL, and it is basically a nightmare.

asyncio Core Concepts

The following four core concepts need to be understood when learning asyncio; see reference 1 for more details.

1. Event loop. The key to implementing asynchrony in a single thread is the event loop at the top level, which itself executes synchronously.
2. Future. Asynchronous IO involves many asynchronous tasks, and each asynchronous task is controlled through a Future.
3. Coroutine. The specific execution logic of each asynchronous task is expressed as a coroutine.
4. Generator (yield & yield from). These are used extensively in asyncio and are a syntactic detail that cannot be ignored.

References

1. asyncio: Asynchronous I/O, event loop, coroutines and tasks, https://docs.python.org/3/library/asyncio.html
2. PEP 3156, Asynchronous IO Support Rebooted: the "asyncio" Module, http://legacy.python.org/dev/peps/pep-3156/
3. PEP 380, Syntax for Delegating to a Subgenerator, http://legacy.python.org/dev/peps/pep-0380/
4. PEP 342, Coroutines via Enhanced Generators, http://legacy.python.org/dev/peps/pep-0342/
5. PEP 255, Simple Generators, http://legacy.python.org/dev/peps/pep-0255/
6. asyncio source code, http://hg.python.org/cpython/file/3.4/lib/asyncio/
