Basic differences between the web.py / flup and Tornado web process handling models (TBC)

Tornado is known for handling large numbers of concurrent connections with the help of OS event notification mechanisms such as epoll and kqueue.

web.py is a web framework for Python. It relies on other server packages to form a complete web server.

Tornado can run on its own, but a common setup puts it behind an nginx server (via proxy_pass): nginx serves static resources and handles other matters, and reverse-proxies the dynamic requests to Tornado.

In contrast, web.py usually runs as a FastCGI service through flup and is then connected to nginx via the fastcgi_pass directive.

To a new user the two setups look similar to some extent. I wrote two very simple scripts [1] / [2] and tested them on the same server behind the same nginx configuration. Each ran as two processes (web.py via spawn-fcgi, Tornado via tornado.process.fork_processes) and returned a simple string from the GET handler. On average, nginx + Tornado gave 75 - 125 ms serve time per request, while nginx + web.py took about 3 sec per request, both at 50 concurrent clients (ab -c50). With fewer concurrent clients the time difference could be as much as 10 times.

Then I added a small delay to the GET handler in both scripts (time.sleep(0.1)) to simulate some system processing time. Before looking at either solution I had been dealing with relatively time-consuming filesystem requests in my web service, so this simulation is quite close to the kind of problem I am looking at. Surprisingly, the nginx + Tornado script slowed to 5 sec+ per request, much, much slower than web.py.

I understood how Tornado works from what I know of epoll / I/O multiplexing. web.py, however, was something of a mystery, so I had to look into the source code. The web.py snippet calls into flup to create a "runwsgi" function, which in turn creates a ThreadedServer within flup. ThreadedServer has an addJob method that looked very familiar, and within a minute I could see that for each client socket returned from the select call (ThreadedServer.run) a new "job", and hence a new thread in the pool, is created. The legendary one-thread-per-client model. Even without looking at how web.py (and my code) is called back from flup, I know that:

  1. blocking calls (blocking I/O operations, or other things like the time.sleep call here) are handled by threads and the OS scheduler;
  2. with a large number of simple, non-blocking, one-off requests, this model must be slower than the epoll approach.
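
The model described above can be sketched with the standard library alone. This is not flup's code: handle_client, serve, request and demo are hypothetical names, and I use a plain thread per connection where flup uses a thread pool. It only shows the shape of the loop: select() on the listening socket, accept, then hand the client to a thread.

```python
import select
import socket
import threading


def handle_client(conn):
    """Runs in its own thread, so blocking here only stalls this one client."""
    with conn:
        data = conn.recv(1024)
        conn.sendall(b"hello " + data)


def serve(listener, max_clients):
    """Accept up to max_clients connections, starting one thread per client."""
    threads = []
    while len(threads) < max_clients:
        # select() returns once the listening socket has a pending connection.
        readable, _, _ = select.select([listener], [], [], 5.0)
        if not readable:
            break  # nobody connected within 5 seconds, stop waiting
        conn, _addr = listener.accept()
        worker = threading.Thread(target=handle_client, args=(conn,))
        worker.start()
        threads.append(worker)
    for worker in threads:
        worker.join()


def request(port, payload):
    """Tiny client used only to exercise the server."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(payload)
        return sock.recv(1024)


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(16)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve, args=(listener, 3))
    server.start()
    replies = [request(port, b"client%d" % i) for i in range(3)]
    server.join()
    listener.close()
    return replies
```

Calling demo() sends three requests to the loop and collects the replies. The point is that handle_client may block freely, because the accept loop never runs handler code itself.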

However, when blocking operations appear (my sleep call, filesystem access, DB calls, etc.), epoll does NOT help. The OS waits for the operation to finish before returning to the script, so the process cannot get back to its event loop in the meantime. Since only two Tornado processes are running, no more than two clients can be served at the same time, even if both handlers are just sleeping. With flup, threads are created and scheduled by the OS, so other requests keep being served as long as the CPU isn't completely hogged.
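
A rough way to check this reasoning without nginx or either framework is to simulate it. In the sketch below (all names are hypothetical), a thread pool limited to two workers stands in for two Tornado processes whose handlers block. A pool with one worker per request stands in for flup's thread-per-client behaviour. Under this simplified model, 50 requests sleeping 0.1 s each need at least 50 × 0.1 / 2 = 2.5 s to drain through two workers, which is the same order of magnitude as the slowdown I measured.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_handler(delay):
    """Stand-in for a GET handler that blocks (time.sleep, disk, DB)."""
    time.sleep(delay)
    return "ok"


def serve_all(num_requests, num_workers, delay):
    """Serve all requests with at most num_workers in flight; return wall time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(blocking_handler, [delay] * num_requests))
    assert results == ["ok"] * num_requests
    return time.perf_counter() - start


def compare(num_requests=20, delay=0.05):
    # Two workers: models two processes that are stuck while a handler sleeps.
    two_workers = serve_all(num_requests, 2, delay)
    # One worker per request: models one thread per client.
    per_client = serve_all(num_requests, num_requests, delay)
    return two_workers, per_client


if __name__ == "__main__":
    two_workers, per_client = compare()
    print("two blocked workers: %.2fs, thread per client: %.2fs"
          % (two_workers, per_client))
```

The exact numbers depend on the machine, but the two-worker run cannot finish faster than num_requests × delay / 2, while the thread-per-client run stays close to a single delay.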

If we look at the packages available for Tornado, apart from the server package there are HTTP client packages, async MongoDB packages, and some authentication packages built around the HTTP client package. Clearly, to make better use of Tornado, an application needs to make the epoll-based IOLoop its core. The Tornado framework handles all network waiting time (using epoll), and carefully crafted apps then respond to all events in a timely manner. It's very different from the traditional CGI style of request handling, but it's definitely a step in the right direction.
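
To illustrate what "using the IOLoop as the core" buys, here is a minimal sketch that uses the standard library's asyncio as a stand-in for Tornado's IOLoop. That substitution is an assumption made for illustration, and it is not the code from my test scripts. The handler waits without blocking, so a single process overlaps all the waits instead of serving them one by one. async_handler, serve_concurrently and timed_run are hypothetical names.

```python
import asyncio
import time


async def async_handler(delay):
    # await hands control back to the event loop while this "request" waits,
    # so other handlers can make progress in the same process.
    await asyncio.sleep(delay)
    return "ok"


async def serve_concurrently(num_requests, delay):
    handlers = (async_handler(delay) for _ in range(num_requests))
    return await asyncio.gather(*handlers)


def timed_run(num_requests=50, delay=0.1):
    start = time.perf_counter()
    results = asyncio.run(serve_concurrently(num_requests, delay))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = timed_run()
    print("%d requests served in %.2fs" % (len(results), elapsed))
```

Replacing asyncio.sleep with time.sleep in the handler brings back the serial behaviour: the loop would be stuck for the full delay on every request.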

Issues left over:

1.

Tornado didn't have an async MySQL package available, and FriendFeed (the original author) mentioned [3] that:
  We experimented with different async DB approaches, but settled on 
  synchronous at FriendFeed because generally if our DB queries were 
  backlogging our requests, our backends couldn't scale to the load 
  anyway. Things that were slow enough were abstracted to separate 
  backend services which we fetched asynchronously via the async HTTP 
  module. 
Question: how should resources be arranged so that other services handle the blocking operations? On what principles should the design decisions be made?
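
One possible arrangement, short of the separate backend services FriendFeed describes, is to push the blocking call onto a thread pool and let the event loop wait for the result. The sketch below again uses stdlib asyncio as a stand-in for the Tornado IOLoop, which is an assumption. slow_filesystem_call is a hypothetical placeholder for the real blocking work, and the pool size is an arbitrary illustration rather than a recommendation.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def slow_filesystem_call(delay):
    """Hypothetical blocking operation (disk access, synchronous DB driver)."""
    time.sleep(delay)
    return "data"


async def offloaded_handler(loop, pool, delay):
    # The blocking call runs in a pool thread; the event loop stays free
    # to accept and serve other requests while it waits for the result.
    return await loop.run_in_executor(pool, slow_filesystem_call, delay)


async def serve_offloaded(num_requests, delay, pool_size):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        handlers = (offloaded_handler(loop, pool, delay)
                    for _ in range(num_requests))
        return await asyncio.gather(*handlers)


def timed_offload(num_requests=20, delay=0.05, pool_size=20):
    start = time.perf_counter()
    results = asyncio.run(serve_offloaded(num_requests, delay, pool_size))
    return results, time.perf_counter() - start
```

The pool size becomes the knob that bounds how many blocking operations run at once, which is essentially the flup model applied only to the slow parts.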

2.

When testing the response speed of raw Tornado (without nginx) using the ab shipped with OS X ML, requests failed from time to time. I saw mentions that these failures are caused by bugs in the version of ab shipped with OS X. Should re-test with palb (a Python implementation of ab) or other implementations.

Bug: http://simon.heimlicher.com/articles/2012/07/08/fix-apache-bench-ab-on-os-x-lion

Tested with palb: with or without set_header('Connection', 'Keep-Alive'), the connection reset errors do not appear.

3.

Nginx speaks HTTP/1.0 when used as a (reverse) proxy server, which closes the connection after each request. How does this affect the performance of the Tornado server? I suppose epoll is designed for comet usage (a large number of mostly idle connections)?

Answer: Nginx actually supports HTTP/1.1 and keepalive for upstream proxy settings. See http://nginx.org/en/docs/http/ngx_http_upstream_module.html#keepalive

[4] mentions using HAProxy instead of nginx. Might be worth looking at.

4. 

With the Tornado code as written, even though two Python processes are running after the server starts, only one of them accepts requests. Possible solution: run on two ports and load balance with nginx, but that's not ideal. The fork_processes model should have got around this problem.

Solution:

fork_processes(0) / start(0) creates worker processes based on the number of CPUs in the system. The two Python processes I observed mean that only one worker process was created, so only one process was running request handlers. The test VM was a single-core system. Specifying start(2) results in 3 Python processes, two of which share the load.
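
The counting rule behind that observation can be written down in a few lines. This is a sketch of the rule as I understand it, not Tornado's code: a non-positive (or missing) process count means "one worker per CPU", and the parent stays alive to supervise its children, so it shows up as one extra Python process. worker_count and visible_python_processes are hypothetical helper names.

```python
import os


def worker_count(num_processes, cpu_count=None):
    """Number of worker processes forked for a given num_processes argument."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    if num_processes is None or num_processes <= 0:
        return cpu_count  # "auto" mode: one worker per CPU
    return num_processes


def visible_python_processes(num_processes, cpu_count=None):
    """Workers plus the supervising parent, as they would appear in ps."""
    return worker_count(num_processes, cpu_count) + 1


if __name__ == "__main__":
    # Single-core VM, as in my test setup.
    print(visible_python_processes(0, cpu_count=1))
    print(visible_python_processes(2, cpu_count=1))
```

On a single-core machine this gives two processes for start(0) and three for start(2), matching what I saw.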

Links:

[1] Web.py test script: https://gist.github.com/4371628

[2] Tornado test script: https://gist.github.com/4363542

[3] http://news.ycombinator.com/item?id=3025475

[4] "Need help on putting tornado apps on production", great info packed - https://groups.google.com/forum/?fromgroups=#!topic/python-tornado/62TLw_gmp94
