Python web server Tornado: usage summary

Source: Internet
Author: User
Recently I have been doing backend development for a website. Because I was the only developer in the early stage, I was free to choose the technology, and for the web server I chose Tornado. Although I had read its source code and written some small demos, this is the first time I have used it at work. Naturally, I want to talk about its security first; in this respect I can feel how much care went into it. Security can be divided into two main points:

1. Cross-site request forgery (CSRF or XSRF)

Simply put, CSRF means that an attacker sends forged requests in the name of a real user.

For example, assume that a bank website has a URL like this:
http://bank.example.com/withdraw?amount=1000000&for=Eve
When a logged-in user of this bank website visits this URL, 1 million yuan is transferred to Eve. Of course, users will not readily click such a URL, but an attacker can embed a fake image on another website and set the image address to this URL:
<img src="http://bank.example.com/withdraw?amount=1000000&for=Eve">
When the user visits the malicious website, the browser issues a GET request to that URL, so the 1 million is transferred without the user's knowledge.

The above attack is easy to prevent: do not use GET requests for operations that change state (such as a transfer). However, other request types are still not safe. Suppose an attacker constructs a form on another site that submits a POST request to the bank's site, with a button labeled "Forward".

If the user clicks the "Forward" button, the money is transferred...

To prevent this, non-GET requests must carry a field that the attacker cannot forge, and the server must verify that field when it processes the request.
Tornado's approach is very simple. A randomly generated _xsrf field is added to the request, and the same value is also stored in a cookie. When the request arrives, the two values are compared.
Because pages from other sites cannot read or modify this site's cookies, a third-party website cannot forge _xsrf (HTTP sniffing aside).
Of course, users can read and modify their own cookies at will, but that is outside the scope of CSRF: if users forge their own requests, they are responsible for the result.
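
The comparison described above can be sketched with the standard library alone. This is a minimal illustration of the idea, not Tornado's implementation; the helper names new_xsrf_token and xsrf_token_matches and the token format are assumptions made for the example.

```python
import hmac
import secrets


def new_xsrf_token():
    """Return a random token to store in the _xsrf cookie (hypothetical helper)."""
    return secrets.token_hex(16)


def xsrf_token_matches(cookie_token, request_token):
    """Compare the cookie copy of the token with the copy sent in the request."""
    if not cookie_token or not request_token:
        return False
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(cookie_token.encode("utf-8"),
                               request_token.encode("utf-8"))


token = new_xsrf_token()
same_site_ok = xsrf_token_matches(token, token)          # form rendered by our own site
forged_ok = xsrf_token_matches(token, "attacker-guess")  # other sites cannot read the cookie
```

A request that carries no token at all is rejected as well, which is what stops a plain cross-site form submission.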

To use this feature, pass xsrf_cookies=True when creating the tornado.web.Application object. This generates a cookie named _xsrf for the user.
In addition, you need to add xsrf_form_html() to the form of every non-GET request. If you do not use Tornado templates, you can call self.xsrf_form_html() inside a tornado.web.RequestHandler to generate it.

For AJAX requests, cross-site requests are basically not a concern, so versions before Tornado 1.1.1 did not validate requests carrying the header X-Requested-With: XMLHttpRequest.
Later, Google engineers pointed out that malicious browser plug-ins can forge cross-origin AJAX requests, so these should be verified too. I am not convinced by this, because a browser plug-in has very high privileges: it could just as well forge a cookie or submit a form directly.
Still, the fix is simple: read the _xsrf field from the cookie and add it to the AJAX request as a parameter, or put it in the X-Xsrftoken or X-Csrftoken request header. If that is too much trouble, you can handle it with jQuery's $.ajaxSetup():

The Code is as follows:


$.ajaxSetup({
    beforeSend: function(jqXHR, settings) {
        var type = settings.type;
        if (type != 'GET' && type != 'HEAD' && type != 'OPTIONS') {
            // Pull the _xsrf value out of document.cookie
            var pattern = /(.+; *)?_xsrf *= *([^; "]+)/;
            var xsrf = pattern.exec(document.cookie);
            if (xsrf) {
                jqXHR.setRequestHeader('X-Xsrftoken', xsrf[2]);
            }
        }
    }
});
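
The same idea applies to a non-browser client. The following is a sketch in Python, assuming the cookies are available as a single Cookie header string; xsrf_headers is a hypothetical helper name, not part of Tornado or jQuery.

```python
from http.cookies import SimpleCookie


def xsrf_headers(cookie_header, method):
    """Build extra request headers carrying the _xsrf token for unsafe methods."""
    if method.upper() in ("GET", "HEAD", "OPTIONS"):
        return {}
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    morsel = cookies.get("_xsrf")
    if morsel is None:
        return {}
    return {"X-Xsrftoken": morsel.value}


post_headers = xsrf_headers("lang=en; _xsrf=abc123", "POST")
get_headers = xsrf_headers("lang=en; _xsrf=abc123", "GET")
```

Here post_headers carries the token while get_headers stays empty, mirroring the method check in the jQuery snippet.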

While on the subject, a word about cross-site scripting (XSS). Unlike CSRF, XSS exploits a vulnerability in the attacked website to inject script code chosen by the attacker, so that users who browse the site end up executing that code.
It can be avoided as long as you do not let users enter arbitrary HTML (for example, escape < and >), validate the attributes of HTML elements (for example, escape quotation marks inside attribute values, and do not allow arbitrary JavaScript in src, event-handler, and similar attributes), and check expressions in CSS (including style attributes).
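
As a minimal sketch of the escaping step only, the standard library's html.escape covers the characters mentioned above. The function name render_comment and the markup are made up for the example, and this does not address src, event-handler attributes, or CSS.

```python
import html


def render_comment(user_text):
    """Return an HTML fragment with user input escaped for text and attribute use."""
    safe = html.escape(user_text, quote=True)  # escapes & < > " '
    return '<p title="{0}">{0}</p>'.format(safe)


fragment = render_comment('<script>alert("x")</script>')
```

The resulting fragment contains &lt;script&gt; as text, so the browser displays it instead of running it.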

2. Preventing cookie forgery

With both CSRF and XSS, the attacker acts through the user without the user's knowledge. Cookie forgery is different: the attacker actively impersonates another user.
For example, suppose the website's login check just looks at the user name in the cookie and treats the user as logged in if it matches. An attacker could then impersonate the administrator by setting username=admin in the cookie.

To prevent cookie forgery, two parameters used when setting a cookie need to be mentioned first: secure and httponly. They do not appear in the parameter list of tornado.web.RequestHandler.set_cookie(), but they can be passed as keyword arguments; they are defined in Cookie.Morsel._reserved.
The former means the cookie can only be transmitted over secure connections (that is, HTTPS), which prevents a sniffer from intercepting it. The latter means the cookie can only be accessed through the HTTP protocol: JavaScript cannot read it from document.cookie, although the browser still sends it to the server with each HTTP request. This makes it impossible for an attacker to simply read or forge the cookie with a script.
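
To see what these two flags look like on the wire, here is a small sketch using http.cookies, the Python 3 name of the Cookie module mentioned above. The cookie name and value are placeholders.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session"] = "abc123"          # hypothetical cookie name and value
cookie["session"]["secure"] = True    # only send over HTTPS
cookie["session"]["httponly"] = True  # hide from document.cookie
cookie["session"]["path"] = "/"

# The text that would follow "Set-Cookie: " in the response
header_value = cookie["session"].OutputString()
```

The header value contains the Secure and HttpOnly attributes alongside the name=value pair.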

However, these two parameters alone cannot stop a determined attacker from forging a cookie. The cookie therefore needs to be signed, so that the server can tell when it has been modified.
Tornado provides the set_secure_cookie() method to sign cookies. Signing requires a secret key (the cookie_secret parameter passed when creating the tornado.web.Application object). The key can be generated with the following code:
base64.b64encode(uuid.uuid4().bytes + uuid.uuid4().bytes)
The key can be generated randomly, but if several Tornado processes serve requests at the same time, or the service is restarted from time to time, it is better to share one constant value. Do not disclose this key.

The signature uses HMAC with SHA1 as the hash algorithm. Simply put, an HMAC over the cookie name, value, and timestamp is used as the signature, and "value|timestamp|signature" is stored as the new cookie value. The server then only needs to recompute the signature with the key and compare it with the one in the cookie to tell whether the value has been changed.
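
The scheme just described can be sketched as follows. This is an illustration under stated assumptions, not Tornado's code: SECRET stands in for cookie_secret, the value is base64-encoded so it cannot contain "|", the function names are invented, and the timestamp expiry check is omitted.

```python
import base64
import hashlib
import hmac
import time

SECRET = b"example-secret-do-not-use"  # stands in for cookie_secret


def _signature(name, value_b64, timestamp):
    """HMAC-SHA1 over the cookie name, encoded value, and timestamp."""
    mac = hmac.new(SECRET, digestmod=hashlib.sha1)
    for part in (name, value_b64, timestamp):
        mac.update(part.encode("utf-8"))
    return mac.hexdigest()


def sign_value(name, value, now=None):
    """Return 'value|timestamp|signature' with the value base64-encoded."""
    timestamp = str(int(now if now is not None else time.time()))
    value_b64 = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return "|".join([value_b64, timestamp, _signature(name, value_b64, timestamp)])


def verify_value(name, signed):
    """Return the original value, or None if the signature does not match."""
    parts = signed.split("|")
    if len(parts) != 3:
        return None
    value_b64, timestamp, signature = parts
    expected = _signature(name, value_b64, timestamp)
    if not hmac.compare_digest(signature.encode("ascii", "replace"),
                               expected.encode("ascii")):
        return None
    return base64.b64decode(value_b64).decode("utf-8")


signed = sign_value("user_id", "42", now=1700000000)
# Swap in a different value but keep the old timestamp and signature
tampered = base64.b64encode(b"1").decode("ascii") + signed[signed.index("|"):]
```

verify_value("user_id", signed) gives back "42", while the tampered string, or the same string presented under a different cookie name, is rejected.
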
It is worth mentioning that I also came across this function while reading the source code:
def _time_independent_equals(a, b):
    if len(a) != len(b):
        return False
    result = 0
    if type(a[0]) is int:  # python3 byte strings
        for x, y in zip(a, b):
            result |= x ^ y
    else:  # python2
        for x, y in zip(a, b):
            result |= ord(x) ^ ord(y)
    return result == 0
I stared at it for a long time without seeing what advantage it had over an ordinary string comparison. Only after reading an answer on StackOverflow did I understand: an attacker could otherwise work out how many leading characters are correct by measuring how long the comparison takes, and this function makes the comparison time constant, which rules that out. (I have nothing but admiration for that answer. Security experts really do think deeper than the rest of us...)
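
In current Python the standard library offers the same protection through hmac.compare_digest (available since Python 3.3 and 2.7.7). A minimal sketch, with signatures_equal as a made-up wrapper name:

```python
import hmac


def signatures_equal(expected, provided):
    """Constant-time comparison of two byte strings using the standard library."""
    return hmac.compare_digest(expected, provided)


match = signatures_equal(b"3f9a", b"3f9a")
mismatch = signatures_equal(b"3f9a", b"3f9b")
```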

3. Subclassing tornado.web.RequestHandler

During request handling, tornado.web.Application finds the matching RequestHandler class based on the URL and instantiates it. Its __init__() method calls initialize(), so you only need to override the latter, and there is no need to call the parent class's initialize().
Next, the handler's get(), post(), or other method is looked up according to the HTTP method, and prepare() is run before it. None of these methods call the parent class's version automatically, so call it yourself if needed.
Finally, the handler's finish() method is called. It is best not to override this method. It calls on_finish(), which can be overridden to do cleanup work (for example, closing a database connection), but you can no longer send data to the browser there (the HTTP response has already been sent, and the connection may already be closed).

While we are here, a word on how to handle error pages.
Put simply, while the RequestHandler's _execute() method is running (it calls prepare(), get(), finish(), and so on internally), any uncaught error is handled by the handler's write_error() method, so just override that method:

The Code is as follows:

class RequestHandler(tornado.web.RequestHandler):
    def write_error(self, status_code, **kwargs):
        if status_code == 404:
            self.render('404.html')
        elif status_code == 500:
            self.render('500.html')
        else:
            super(RequestHandler, self).write_error(status_code, **kwargs)


For historical reasons, you can also override the get_error_html() method, but this is not recommended.
In addition, an error may occur before _execute() is ever reached.
One case is the initialize() method raising an uncaught exception: it is caught by IOStream, which then closes the connection directly, so no error page can be shown to the user.
Another case is when no handler matches the request: tornado.web.ErrorHandler is then used to produce the 404 error. Here you can replace that class to get a custom error page:

The Code is as follows:

class PageNotFoundHandler(RequestHandler):
    def get(self):
        raise tornado.web.HTTPError(404)

tornado.web.ErrorHandler = PageNotFoundHandler


Another approach is to add a handler that matches any URL at the end of the Application's handlers parameter:

The Code is as follows:

application = tornado.web.Application([
    # ...
    ('.*', PageNotFoundHandler)
])


4. Next, let's talk about how to handle login.

Tornado provides the @tornado.web.authenticated decorator, which can be placed before the handler's get() and other methods.
It relies on three pieces of code:
First, you need to define the handler's get_current_user() method, for example:

The Code is as follows:

def get_current_user(self):
    return self.get_secure_cookie('user_id', 0)


When its return value is falsy, the user is redirected to the login page.
Second, set the login_url parameter when creating the application:

The Code is as follows:

application = tornado.web.Application(
    [
        # ...
    ],
    login_url='/login'
)


Third, define the handler's get_login_url() method.
If the default login_url parameter is not suitable (for example, ordinary users and administrators need different login addresses), you can override get_login_url():

The Code is as follows:

class AdminHandler(RequestHandler):
    def get_login_url(self):
        return '/admin/login'


Incidentally, the redirect to the login page carries a next parameter that points to the URL the user was trying to visit. For a better user experience, redirect to that URL after a successful login:

The Code is as follows:

class LoginHandler(RequestHandler):
    def get(self):
        if self.get_current_user():
            self.redirect('/')
            return
        self.render('login.html')

    def post(self):
        if self.get_current_user():
            raise tornado.web.HTTPError(403)
        # Check username and password
        if success:
            self.redirect(self.get_argument('next', '/'))
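
How the next parameter travels to the login page and back can be sketched with the standard library. The helper names login_redirect_url and next_target are invented for this example; it only shows the URL handling, not the handler itself.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def login_redirect_url(login_url, requested_uri):
    """Append the originally requested URI as a 'next' query parameter."""
    if "?" in login_url:
        return login_url  # leave login URLs that already carry a query untouched
    return login_url + "?" + urlencode({"next": requested_uri})


def next_target(url, default="/"):
    """Read 'next' back out of the login URL, falling back to a default."""
    values = parse_qs(urlsplit(url).query).get("next")
    return values[0] if values else default


redirect = login_redirect_url("/login", "/orders?page=2")
target = next_target(redirect)
```

The requested URI is percent-encoded on the way out, so its own query string survives the round trip intact.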


In addition, I use AJAX in many places and did not want the front end to have to handle 403 errors, so I modified authenticated instead:

The Code is as follows:

import functools
import json
import urllib
import urlparse  # Python 2 module names; in Python 3 these helpers live in urllib.parse

import tornado.web


def authenticated(method):
    """Decorate methods with this to require that the user be logged in."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if not self.current_user:
            # jQuery and similar libraries attach this header to AJAX requests
            if self.request.headers.get('X-Requested-With') == 'XMLHttpRequest':
                self.set_header('Content-Type', 'application/json; charset=UTF-8')
                self.write(json.dumps({'success': False, 'msg': u'Your session has expired. Please log in again!'}))
                return
            if self.request.method in ("GET", "HEAD"):
                url = self.get_login_url()
                if "?" not in url:
                    if urlparse.urlsplit(url).scheme:
                        # if login url is absolute, make next absolute too
                        next_url = self.request.full_url()
                    else:
                        next_url = self.request.uri
                    url += "?" + urllib.urlencode(dict(next=next_url))
                self.redirect(url)
                return
            raise tornado.web.HTTPError(403)
        return method(self, *args, **kwargs)
    return wrapper

5. Then let's talk about getting the user's IP address.

Put simply, self.request.remote_ip can be used inside a handler method.
However, behind a reverse proxy this returns the proxy's IP address. In that case, pass the xheaders setting when creating the HTTPServer:

The Code is as follows:

if __name__ == '__main__':
    import tornado.ioloop
    from tornado.httpserver import HTTPServer
    from tornado.netutil import bind_sockets

    sockets = bind_sockets(80)
    server = HTTPServer(application, xheaders=True)  # trust the proxy's X-* headers
    server.add_sockets(sockets)
    tornado.ioloop.IOLoop.instance().start()
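
What trusting the proxy headers amounts to can be sketched roughly as below. This is an assumption-laden illustration, not Tornado's code: the exact header precedence differs between Tornado versions, the client_ip function is invented, and a plain dict stands in for the request headers.

```python
import ipaddress


def client_ip(peer_ip, headers, trust_proxy_headers):
    """Pick the client address, optionally trusting reverse-proxy headers."""
    if not trust_proxy_headers:
        return peer_ip
    candidate = headers.get("X-Real-Ip") or headers.get("X-Forwarded-For", "")
    candidate = candidate.split(",")[-1].strip()  # last entry was added by the nearest proxy
    try:
        ipaddress.ip_address(candidate)
    except ValueError:
        return peer_ip  # header missing or malformed: fall back to the socket peer
    return candidate


headers = {"X-Real-Ip": "203.0.113.7"}
direct = client_ip("127.0.0.1", headers, trust_proxy_headers=False)
proxied = client_ip("127.0.0.1", headers, trust_proxy_headers=True)
```

Only enable this kind of trust when every request really passes through your own proxy, because a client that connects directly can set these headers to anything.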


In addition, I only need to handle IPv4 addresses, but during local testing I get the IPv6 address ::1, so I use the following setting:

The Code is as follows:

if settings.IPV4_ONLY:
    import socket
    sockets = bind_sockets(80, family=socket.AF_INET)
else:
    sockets = bind_sockets(80)
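
A quick standard-library check shows why ::1 appears and what restricting the address family does. describe_address is a made-up helper, and the getaddrinfo call is only meant to show the effect of family=socket.AF_INET.

```python
import ipaddress
import socket


def describe_address(text):
    """Return (ip_version, is_loopback) for an address string."""
    addr = ipaddress.ip_address(text)
    return addr.version, addr.is_loopback


v6_local = describe_address("::1")        # the IPv6 loopback address
v4_local = describe_address("127.0.0.1")  # the IPv4 loopback address

# Resolving with family=AF_INET returns IPv4 results only.
ipv4_only = socket.getaddrinfo("127.0.0.1", 80,
                               family=socket.AF_INET, type=socket.SOCK_STREAM)
families = {info[0] for info in ipv4_only}
```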

6. Finally, I will discuss how to improve the performance in the production environment.

Tornado can fork multiple sub-processes before the HTTPServer calls add_sockets(), which lets it use multiple CPUs to handle concurrent requests.

The Code is as follows:

if __name__ == '__main__':
    if settings.IPV4_ONLY:
        import socket
        sockets = bind_sockets(80, family=socket.AF_INET)
    else:
        sockets = bind_sockets(80)
    if not settings.DEBUG_MODE:
        import tornado.process
        tornado.process.fork_processes(0)  # 0 means one sub-process per CPU
    server = HTTPServer(application, xheaders=True)
    server.add_sockets(sockets)
    tornado.ioloop.IOLoop.instance().start()


Note that autoreload cannot be enabled in this mode (the debug parameter must not be true when the Application is created).
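
The meaning of the 0 argument can be sketched as a small helper. worker_count is a hypothetical name used only here; it mirrors the "one process per CPU" rule and does not fork anything.

```python
import os


def worker_count(requested):
    """Resolve the number of worker processes; 0 or None means one per CPU."""
    if requested is None or requested <= 0:
        return os.cpu_count() or 1  # cpu_count() can return None on some platforms
    return requested


per_cpu = worker_count(0)
fixed = worker_count(4)
```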
