Asynchronous tasks and AsyncHTTPClient in the Python Tornado framework

Source: Internet
Author: User


High-performance server Tornado
Python has a wide variety of web frameworks. Just as glory belongs to Greece and greatness to Rome, Python's elegance combined with the WSGI design gives web frameworks a uniform interface. WSGI connects applications to servers: both Django and Flask applications can be deployed behind gunicorn.

Unlike Django and Flask, Tornado can act as either a WSGI application or a WSGI server. More often, though, Tornado is chosen for its network model: single-process, single-threaded asynchronous IO. High performance is attractive, but many people come away asking: Tornado claims high performance, so why does it disappear in actual use?

In fact, the high performance comes from Tornado's asynchronous network IO, which is based on epoll (kqueue on BSD and macOS). Because Tornado is single-threaded, it is easy to write blocking code by accident. Blocking code not only fails to improve performance, it causes a sharp drop in performance. It is therefore worth exploring how to use Tornado asynchronously.
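To make the cost of accidental blocking concrete, here is a minimal sketch. It uses the standard-library asyncio loop rather than Tornado itself (Tornado 5 and later run on top of asyncio), and the handler names and the 0.05-second delays are made up for illustration. Five "requests" that call time.sleep run one after another, while five that await asyncio.sleep overlap.

```python
import asyncio
import time


async def blocking_handler():
    # time.sleep() holds the only thread, so no other coroutine can run.
    time.sleep(0.05)


async def cooperative_handler():
    # asyncio.sleep() hands control back to the event loop while waiting.
    await asyncio.sleep(0.05)


async def serve(handler, n=5):
    """Run n simulated requests concurrently and return the elapsed time."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start


blocking_elapsed = asyncio.run(serve(blocking_handler))
cooperative_elapsed = asyncio.run(serve(cooperative_handler))
print("blocking:    {:.2f}s".format(blocking_elapsed))
print("cooperative: {:.2f}s".format(cooperative_elapsed))
```

The blocking run should take roughly five times as long as the cooperative one, which is the same effect a blocking call has inside a Tornado handler.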

Tornado asynchronous usage
In short, asynchronous Tornado has two sides: the asynchronous server and the asynchronous client. On either side, the asynchronous model itself comes in two forms: callbacks and coroutines. There is no hard boundary between the application scenarios; handling one request often involves asynchronous client requests to other services.

Asynchronous Server Mode
Server-side asynchrony concerns a time-consuming task inside a Tornado request. Written directly into the business logic, it may block the entire service, so the task should be handled asynchronously. There are two asynchronous methods: suspending the function with yield, and using a thread pool. First, a synchronous example:

import os
import tornado.web

class SyncHandler(tornado.web.RequestHandler):
    def get(self, *args, **kwargs):
        # time-consuming code
        os.system("ping -c 2 www.google.com")
        self.finish('It works')

Test with ab (ApacheBench):

ab -c 5 -n 5 http://127.0.0.1:5000/sync
Server Software:        TornadoServer/4.3
Server Hostname:        127.0.0.1
Server Port:            5000
Document Path:          /sync
Document Length:        5 bytes
Concurrency Level:      5
Time taken for tests:   5.076 seconds
Complete requests:      5
Failed requests:        0
Total transferred:      985 bytes
HTML transferred:       25 bytes
Requests per second:    0.99 [#/sec] (mean)
Time per request:       5076.015 [ms] (mean)
Time per request:       1015.203 [ms] (mean, across all concurrent requests)
Transfer rate:          0.19 [Kbytes/sec] received

QPS is a poor 0.99: the server handles roughly one request per second.
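For readers unfamiliar with ab, the headline figures can be reproduced from the raw counts. The short calculation below is only a sketch using the rounded numbers printed for /sync, so the last digits differ slightly from ab's own output.

```python
# Figures copied from the ab output for /sync.
complete_requests = 5
time_taken_s = 5.076
concurrency = 5

# Requests per second: completed requests divided by total wall time.
requests_per_second = complete_requests / time_taken_s
# Mean time per request across all concurrent requests, in milliseconds.
ms_per_request_overall = time_taken_s * 1000 / complete_requests
# Mean time per request as seen by a single client connection.
ms_per_request = ms_per_request_overall * concurrency

print(round(requests_per_second, 2))
print(round(ms_per_request_overall, 1))
print(round(ms_per_request, 1))
```

Because the five concurrent requests are served strictly one after another, each client waits about as long as the whole test run.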

Below is an asynchronous version:

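import datetime
import functools
import tornado.gen
import tornado.ioloop

class AsyncHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    @tornado.gen.coroutine
    def get(self, *args, **kwargs):
        # A bare number is treated as an absolute deadline, so pass a
        # timedelta to mean "1 second from now".
        tornado.ioloop.IOLoop.instance().add_timeout(
            datetime.timedelta(seconds=1),
            callback=functools.partial(self.ping, 'www.google.com'))
        # do something else
        self.finish('It works')

    @tornado.gen.coroutine
    def ping(self, url):
        os.system("ping -c 2 {}".format(url))
        return 'after'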

Although the task is scheduled to run after a 1-second timeout, the handler returns almost immediately. The ab load test shows:

Document Path:          /async
Document Length:        5 bytes
Concurrency Level:      5
Time taken for tests:   0.009 seconds
Complete requests:      5
Failed requests:        0
Total transferred:      985 bytes
HTML transferred:       25 bytes
Requests per second:    556.92 [#/sec] (mean)
Time per request:       8.978 [ms] (mean)
Time per request:       1.796 [ms] (mean, across all concurrent requests)
Transfer rate:          107.14 [Kbytes/sec] received

This approach uses Tornado's IOLoop to defer the time-consuming task so that the request can get on with other work. (The deferred callback still runs on the IOLoop thread, so the ping blocks other requests while it executes; only the response is no longer held up.) However, when we need the result of the time-consuming task, this method does not work. There is always a way forward, though: switch to another asynchronous method. The following rewrite uses a coroutine:

class AsyncTaskHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    @tornado.gen.coroutine
    def get(self, *args, **kwargs):
        # yield the result
        response = yield tornado.gen.Task(self.ping, 'www.google.com')
        print('response: {}'.format(response))
        self.finish('hello')

    @tornado.gen.coroutine
    def ping(self, url):
        os.system("ping -c 2 {}".format(url))
        return 'after'

We can see that the request is handled asynchronously and the result is returned as well.

Server Software:        TornadoServer/4.3
Server Hostname:        127.0.0.1
Server Port:            5000
Document Path:          /async/task
Document Length:        5 bytes
Concurrency Level:      5
Time taken for tests:   0.049 seconds
Complete requests:      5
Failed requests:        0
Total transferred:      985 bytes
HTML transferred:       25 bytes
Requests per second:    101.39 [#/sec] (mean)
Time per request:       49.314 [ms] (mean)
Time per request:       9.863 [ms] (mean, across all concurrent requests)
Transfer rate:          19.51 [Kbytes/sec] received

The QPS improvement is obvious. Coroutine handling like this is not always faster than synchronous code, though. When concurrency is low, the IO itself makes little difference, and coroutine and synchronous performance are similar. By analogy, you will surely lose a 100-meter race against Bolt, but over 2 meters the outcome is anyone's guess.

With yield, the coroutine suspends the function. The main thread is not blocked, but because the handler has to process the return value, each individual request still waits, suspended, until the response arrives. Another asynchronous approach is to run the task in a thread pool outside the main thread. The thread pool depends on futures (concurrent.futures), which requires a separate installation on Python 2.
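The thread-pool idea can be sketched without Tornado, using only the standard library. This is an illustration under assumptions, not the article's handler: slow_task is a hypothetical stand-in for the blocking ping, and asyncio's run_in_executor plays the role that a Tornado executor plays in the Tornado code.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=10)


def slow_task(name):
    # Hypothetical stand-in for a blocking call such as os.system("ping ...").
    time.sleep(0.05)
    return "done: {}".format(name)


async def handler(name):
    loop = asyncio.get_running_loop()
    # The blocking function runs on a worker thread; awaiting the future
    # suspends only this coroutine, not the event loop.
    return await loop.run_in_executor(executor, slow_task, name)


async def main():
    start = time.perf_counter()
    results = await asyncio.gather(*(handler(i) for i in range(5)))
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(results, "{:.2f}s".format(elapsed))
```

Because the five tasks run on separate worker threads, the total time stays close to that of a single task instead of growing with the number of requests.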

Here is the handler rewritten to use a thread pool:

import functools
import tornado.concurrent
from concurrent.futures import ThreadPoolExecutor

class FutureHandler(tornado.web.RequestHandler):
    executor = ThreadPoolExecutor(10)

    @tornado.web.asynchronous
    @tornado.gen.coroutine
    def get(self, *args, **kwargs):
        url = 'www.google.com'
        tornado.ioloop.IOLoop.instance().add_callback(functools.partial(self.ping, url))
        self.finish('It works')

    @tornado.concurrent.run_on_executor
    def ping(self, url):
        os.system("ping -c 2 {}".format(url))

Run the ab test again:

Document Path:          /future
Document Length:        5 bytes
Concurrency Level:      5
Time taken for tests:   0.003 seconds
Complete requests:      5
Failed requests:        0
Total transferred:      995 bytes
HTML transferred:       25 bytes
Requests per second:    1912.78 [#/sec] (mean)
Time per request:       2.614 [ms] (mean)
Time per request:       0.523 [ms] (mean, across all concurrent requests)
Transfer rate:          371.72 [Kbytes/sec] received

QPS instantly reaches 1912.78, and the server log keeps printing ping results in the meantime.
Returning a value is also easy. Change how the interface is called and use the with_timeout function from Tornado's gen module (this function requires a Tornado version later than 3.2).

import datetime
from concurrent.futures import ThreadPoolExecutor

class Executor(ThreadPoolExecutor):
    # Share a single thread pool across all handlers.
    _instance = None

    def __new__(cls, *args, **kwargs):
        if not getattr(cls, '_instance', None):
            cls._instance = ThreadPoolExecutor(max_workers=10)
        return cls._instance


class FutureResponseHandler(tornado.web.RequestHandler):
    executor = Executor()

    @tornado.web.asynchronous
    @tornado.gen.coroutine
    def get(self, *args, **kwargs):
        # run_on_executor already submits ping to self.executor and returns a future
        future = self.ping('www.google.com')
        # timedelta(10) would mean 10 days, so pass seconds explicitly
        response = yield tornado.gen.with_timeout(
            datetime.timedelta(seconds=10), future,
            quiet_exceptions=tornado.gen.TimeoutError)
        if response:
            # the yielded value is the result itself, not a future
            print('response: {}'.format(response))

    @tornado.concurrent.run_on_executor
    def ping(self, url):
        os.system("ping -c 1 {}".format(url))
        return 'after'

The thread pool can also be combined with Tornado's yield to suspend the function as a coroutine. The result of the time-consuming task is obtained without blocking the main thread.
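Two details of waiting on a thread-pool result with a timeout are easy to get wrong, so here is a small standard-library sketch of them. It uses asyncio.wait_for as an analogue of Tornado's with_timeout (an assumption made for the sake of a dependency-free example), and slow_task and the delays are hypothetical. First, the first positional argument of datetime.timedelta is days, not seconds. Second, a timeout abandons the wait but does not stop the worker thread.

```python
import asyncio
import datetime
import time
from concurrent.futures import ThreadPoolExecutor

# The first positional argument of timedelta is days, not seconds.
ten_days = datetime.timedelta(10)
ten_seconds = datetime.timedelta(seconds=10)
print(ten_days.total_seconds(), ten_seconds.total_seconds())

executor = ThreadPoolExecutor(max_workers=2)


def slow_task(delay):
    # Hypothetical blocking work.
    time.sleep(delay)
    return "after"


async def run_with_timeout(delay, timeout):
    loop = asyncio.get_running_loop()
    future = loop.run_in_executor(executor, slow_task, delay)
    try:
        return await asyncio.wait_for(future, timeout=timeout)
    except asyncio.TimeoutError:
        # The worker thread keeps running; only the wait is abandoned.
        return None


fast = asyncio.run(run_with_timeout(0.01, timeout=1.0))
slow = asyncio.run(run_with_timeout(0.3, timeout=0.05))
print(fast, slow)
```

The first call finishes within its timeout and returns the task's result; the second gives up waiting and returns None while the thread finishes in the background.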

Concurrency Level:      5
Time taken for tests:   0.043 seconds
Complete requests:      5
Failed requests:        0
Total transferred:      960 bytes
HTML transferred:       0 bytes
Requests per second:    116.38 [#/sec] (mean)
Time per request:       42.961 [ms] (mean)
Time per request:       8.592 [ms] (mean, across all concurrent requests)
Transfer rate:          21.82 [Kbytes/sec] received

QPS is 116, only a small fraction of the version that does not wait for a response. Performance appears to suffer badly, mainly because the coroutine has to wait for the task to finish before it can return the result.

Think of fishing with a net. In the first approach you cast the net and you are done; there is nothing more to wait for, so of course it is fast. In the second approach, after casting the net you still have to haul it in, which takes a while. Even so, it is many times faster than the synchronous approach; a net beats fishing with a single line.
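To put the comparison in numbers, the snippet below simply divides the QPS figures that ab reported for each handler in this section. The labels are informal names chosen here, and with only five requests per run the figures are a rough indication rather than a careful benchmark.

```python
# Requests per second reported by ab for each handler in this section.
qps = {
    "sync": 0.99,
    "coroutine (yield)": 101.39,
    "thread pool, no result": 1912.78,
    "thread pool + yield": 116.38,
}

baseline = qps["sync"]
for name, value in qps.items():
    print("{:<24} {:>8.2f} req/s  {:>7.1f}x sync".format(name, value, value / baseline))

# How the awaited thread-pool version compares with fire-and-forget.
awaited_vs_fire_and_forget = qps["thread pool + yield"] / qps["thread pool, no result"]
print("awaited / fire-and-forget = {:.3f}".format(awaited_vs_fire_and_forget))
```

Waiting for the result costs most of the headline throughput, yet the awaited version still serves over a hundred times as many requests per second as the synchronous handler.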

Which method to use depends on the business. If you do not need a return value, you usually end up handling callbacks, and too many callbacks quickly become confusing. If you find you need deeply nested callbacks, the first thing to optimize is the business or product logic. yield is elegant: it lets you write asynchronous code with synchronous-looking logic, at the cost of some performance.

Asynchronous diversification
Tornado's asynchronous server handling comes down to the approaches above. There are now many asynchronous frameworks and libraries; with the help of redis or celery, you can also make parts of a Tornado service asynchronous and run them in the background.

Tornado also provides asynchronous client functionality, mainly through AsyncHTTPClient. The typical scenario is a Tornado service that needs to make and process requests to other IO services. Incidentally, the ping call in the examples above is itself a kind of IO performed inside the service. Next, we explore AsyncHTTPClient, in particular using it to upload files and forward requests.

Asynchronous Client
The previous sections covered the common practices for asynchronous tasks in Tornado. Our services also need to make asynchronous requests to third-party services. For HTTP requests, the Python library Requests is the best there is; as its website says, "HTTP for Humans". Using requests directly inside Tornado is a nightmare, however: a requests call blocks the entire service process.

When God closes a door, he often opens a window. Tornado ships its own asynchronous HTTP client, AsyncHTTPClient, built on the framework itself (a synchronous client is available too).

Basic usage of AsyncHTTPClient
AsyncHTTPClient is the asynchronous HTTP client provided by tornado.httpclient. It is easy to use. As on the server side, AsyncHTTPClient can be used with either a callback or yield. The former does not return the result to the caller; the latter returns the response.
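The difference between the two styles can be shown with a dependency-free sketch. It uses asyncio instead of AsyncHTTPClient, and fake_fetch is a hypothetical stand-in that performs no network IO; the point is only where the result ends up in each style.

```python
import asyncio


async def fake_fetch(url):
    # Hypothetical stand-in for an HTTP request; no network is used.
    await asyncio.sleep(0.01)
    return "200 OK from {}".format(url)


async def callback_style(url, seen):
    # Start the request and return immediately; the result only ever
    # reaches the callback, never the caller.
    task = asyncio.ensure_future(fake_fetch(url))
    task.add_done_callback(lambda t: seen.append(t.result()))
    return task


async def await_style(url):
    # Suspend until the response is available, then use it directly.
    response = await fake_fetch(url)
    return response


async def main():
    seen = []
    task = await callback_style("http://example.invalid/", seen)
    before = list(seen)      # still empty: the fetch has not finished yet
    await task               # let the background fetch complete
    await asyncio.sleep(0)   # give the done-callback a chance to run
    direct = await await_style("http://example.invalid/")
    return before, seen, direct


before, seen, direct = asyncio.run(main())
print(before, seen, direct)
```

In the callback style the caller has nothing in hand when the function returns, which is why it suits fire-and-forget work; the suspended style hands the response back to the code that asked for it.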

If a third-party service is requested synchronously, performance is killed just the same:

class SyncHandler(tornado.web.RequestHandler):
    def get(self, *args, **kwargs):
        url = 'https://api.github.com/'
        resp = requests.get(url)
        print(resp.status_code)
        self.finish('It works')

The ab test results are roughly as follows:

Document Path:          /sync
Document Length:        5 bytes
Concurrency Level:      5
Time taken for tests:   10.255 seconds
Complete requests:      5
Failed requests:        0
Total transferred:      985 bytes
HTML transferred:       25 bytes
Requests per second:    0.49 [#/sec] (mean)
Time per request:       10255.051 [ms] (mean)
Time per request:       2051.010 [ms] (mean, across all concurrent requests)
Transfer rate:          0.09 [Kbytes/sec] received

Performance is very poor. Switch to AsyncHTTPClient and try again:

class AsyncHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self, *args, **kwargs):
        url = 'https://api.github.com/'
        http_client = tornado.httpclient.AsyncHTTPClient()
        http_client.fetch(url, self.on_response)
        self.finish('It works')

    @tornado.gen.coroutine
    def on_response(self, response):
        print(response.code)

QPS improves considerably:

Document Path:          /async
Document Length:        5 bytes
Concurrency Level:      5
Time taken for tests:   0.162 seconds
Complete requests:      5
Failed requests:        0
Total transferred:      985 bytes
HTML transferred:       25 bytes
Requests per second:    30.92 [#/sec] (mean)
Time per request:       161.714 [ms] (mean)
Time per request:       32.343 [ms] (mean, across all concurrent requests)
Transfer rate:          5.95 [Kbytes/sec] received

Similarly, to obtain the response, just yield the fetch call:

class AsyncResponseHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    @tornado.gen.coroutine
    def get(self, *args, **kwargs):
        url = 'https://api.github.com/'
        http_client = tornado.httpclient.AsyncHTTPClient()
        response = yield tornado.gen.Task(http_client.fetch, url)
        print(response.code)
        print(response.body)

AsyncHTTPClient forwarding
Tornado services often need to forward requests, which calls for AsyncHTTPClient. Forwarding cannot be limited to GET; POST, PUT, DELETE and other methods are needed too, which brings headers, request bodies, and HTTPS warnings into play.

The following is a POST example that yields the result. In general, a handler that uses yield must be decorated with tornado.gen.coroutine.

headers = self.request.headers
body = json.dumps({'name': 'rsj217'})
http_client = tornado.httpclient.AsyncHTTPClient()
resp = yield tornado.gen.Task(
    http_client.fetch,  # the client created on the previous line
    url,
    method="POST",
    headers=headers,
    body=body,
    validate_cert=False)

AsyncHTTPClient construct request
If the logic lives outside the handlers, where tornado.gen.coroutine cannot be used directly, you can construct a request and use a callback.

body = urllib.urlencode(params)
req = tornado.httpclient.HTTPRequest(
    url=url,
    method='POST',
    body=body,
    validate_cert=False)
http_client.fetch(req, self.handler_response)

def handler_response(self, response):
    print(response.code)

Usage is straightforward. The first argument of AsyncHTTPClient's fetch method can be an HTTPRequest instance, so HTTP parameters such as method and body can be set on an HTTPRequest before it is passed to fetch. In a forwarding service, leaving validate_cert enabled may produce errors such as a 599 timeout. This looks like a warning, but the Tornado maintainers consider the behavior reasonable.

AsyncHTTPClient uploads images
A more advanced use of AsyncHTTPClient is uploading images. For example, suppose one feature of the service calls a third-party image OCR service: images uploaded by users must be forwarded to that third party.

@Router.route('/api/v2/account/upload')
class ApiAccountUploadHandler(helper.BaseHandler):
    @tornado.gen.coroutine
    @helper.token_require
    def post(self, *args, **kwargs):
        upload_type = self.get_argument('type', None)
        files_body = self.request.files['file']
        # Assumed: take the first uploaded file; the original snippet
        # uses file_ without defining it.
        file_ = files_body[0]
        new_file = 'upload/new_pic.jpg'
        new_file_name = 'new_pic.jpg'
        # Write the file in binary mode
        with open(new_file, 'wb') as w:
            w.write(file_['body'])
        # user_id is not defined in this snippet; it is assumed to come
        # from the surrounding code.
        logging.info('user {} upload {}'.format(user_id, new_file_name))
        # Asynchronous request to upload the image
        with open(new_file, 'rb') as f:
            files = [('image', new_file_name, f.read())]
        fields = (('api_key', key), ('api_secret', SECRET))
        content_type, body = encode_multipart_formdata(fields, files)
        headers = {"Content-Type": content_type, 'Content-Length': str(len(body))}
        request = tornado.httpclient.HTTPRequest(
            config.OCR_HOST, method="POST", headers=headers,
            body=body, validate_cert=False)
        response = yield tornado.httpclient.AsyncHTTPClient().fetch(request)


def encode_multipart_formdata(fields, files):
    """
    fields is a sequence of (name, value) elements for regular form fields.
    files is a sequence of (name, filename, value) elements for data to be uploaded as files.
    Return (content_type, body) ready for httplib.HTTP instance
    """
    boundary = '----------ThIs_Is_tHe_bouNdaRY_$'
    crlf = '\r\n'
    l = []
    for (key, value) in fields:
        l.append('--' + boundary)
        l.append('Content-Disposition: form-data; name="%s"' % key)
        l.append('')
        l.append(value)
    for (key, filename, value) in files:
        filename = filename.encode("utf8")
        l.append('--' + boundary)
        l.append('Content-Disposition: form-data; name="%s"; filename="%s"' % (key, filename))
        l.append('Content-Type: %s' % get_content_type(filename))
        l.append('')
        l.append(value)
    l.append('--' + boundary + '--')
    l.append('')
    body = crlf.join(l)
    content_type = 'multipart/form-data; boundary=%s' % boundary
    return content_type, body


def get_content_type(filename):
    import mimetypes
    return mimetypes.guess_type(filename)[0] or 'application/octet-stream'
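The encoder above joins Python 2 byte strings. On Python 3 the text parts and the binary image data have to be combined as bytes explicitly. The following is one possible sketch of such a variant, not the author's code: it keeps the same function name and return value, but the random uuid boundary and the sample field and file values are assumptions made for illustration.

```python
import mimetypes
import uuid


def encode_multipart_formdata(fields, files):
    """Return (content_type, body) for a multipart/form-data request.

    fields: iterable of (name, value) string pairs.
    files:  iterable of (name, filename, data) with data as bytes.
    """
    # A random boundary is assumed here; it is not checked against the data.
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    parts = []
    for name, value in fields:
        parts += [
            delimiter,
            'Content-Disposition: form-data; name="{}"'.format(name).encode("utf-8"),
            b"",
            value.encode("utf-8"),
        ]
    for name, filename, data in files:
        mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        parts += [
            delimiter,
            'Content-Disposition: form-data; name="{}"; filename="{}"'.format(
                name, filename).encode("utf-8"),
            "Content-Type: {}".format(mime).encode("ascii"),
            b"",
            data,
        ]
    # Closing delimiter, followed by a final CRLF.
    parts += [delimiter + b"--", b""]
    body = b"\r\n".join(parts)
    content_type = "multipart/form-data; boundary={}".format(boundary)
    return content_type, body


content_type, body = encode_multipart_formdata(
    [("api_key", "KEY")], [("image", "new_pic.jpg", b"\xff\xd8\xff")]
)
print(content_type)
print(len(body))
```

The returned body is bytes, so it can be passed as the request body unchanged, and len(body) gives the Content-Length in bytes.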

Compared with the usage above, uploading an image only adds one step: encoding. The binary image data is encoded as multipart form data, and the encoder also has to handle the accompanying form fields. By contrast, doing the same with requests is very simple:

files = {}
# Open the image in binary mode
f = open('/Users/ghost/Desktop/id.jpg', 'rb')
files['image'] = f
data = dict(api_key='KEY', api_secret='SECRET')
resp = requests.post(url, data=data, files=files)
f.close()
print(resp.status_code)

Summary
With AsyncHTTPClient, a handler can easily make requests to third-party services. Combined with the server-side asynchronous techniques covered earlier, it comes down to two options: whether you need the result determines whether to use a callback or yield. And if every function along the way is a coroutine, yield can be passed along the whole chain. The OAuth authentication in Tornado's own tornado.auth module relies on this feature.

That covers the general usage.
