Implementing an HTTP Proxy Using Tornado

Source: Internet
Author: User

0x00 HTTP Proxy

HTTP proxies are very useful. Public proxies are available, but sometimes work calls for your own, for example to analyze application-layer traffic, enforce data access control, or even do monitoring. Tornado provides convenient facilities and APIs that make it easy to build an HTTP proxy on top of it.

0x01 Implementation Principle

An HTTP proxy mainly forwards traffic between the client and the web server. Everyone is familiar with this scenario, but it only covers plain HTTP. HTTPS is different: there the proxy acts only as a TCP relay that passes the bytes through, so that case has to be handled separately.
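To make the split concrete, here is a minimal sketch, assuming the proxy sees the raw request line. A plain-HTTP request sent through a proxy carries an absolute URL as its target, while an HTTPS tunnel starts with `CONNECT host:port`. The function `classify_request_line` is hypothetical; in a Tornado handler the same distinction simply shows up as `self.request.method`.

```python
from urllib.parse import urlsplit


def classify_request_line(line: str):
    """Hypothetical helper: decide how a proxy should treat a request line.

    Returns ("forward", url) for plain HTTP, where the proxy parses the
    request and re-issues it, or ("tunnel", host, port) for CONNECT, where
    the proxy only relays TCP bytes.
    """
    method, target, _version = line.split(" ", 2)
    if method.upper() == "CONNECT":
        # CONNECT uses authority form: host:port, with no scheme or path
        host, _, port = target.rpartition(":")
        return ("tunnel", host, int(port))
    parts = urlsplit(target)
    if parts.scheme != "http":
        raise ValueError("expected an absolute http:// URL, got %r" % target)
    return ("forward", target)


print(classify_request_line("GET http://example.com/index.html HTTP/1.1"))
# -> ('forward', 'http://example.com/index.html')
print(classify_request_line("CONNECT example.com:443 HTTP/1.1"))
# -> ('tunnel', 'example.com', 443)
```

A request line with only a path (`GET /index.html HTTP/1.1`) raises `ValueError` here, since that form is meant for origin servers rather than proxies.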


0x02 Tornado Implementation

With Tornado you can build an asynchronous HTTP proxy that performs well and is simple to implement. The main classes involved are AsyncHTTPClient and IOStream.

Readers who have studied the Tornado source code will probably already know these two classes.

Still, a brief introduction: AsyncHTTPClient, as the name suggests, makes asynchronous HTTP client requests, and IOStream is a wrapper layer around a socket.


AsyncHTTPClient handles ordinary HTTP requests. When the RequestHandler receives a client request, the proxy parses it, uses this class to send the request to the server, gets the response, writes it back to the client, and finishes the call.
For the TCP relay case, you could read and write data on both ends with raw sockets, but that is too cumbersome. Tornado provides the IOStream class, a socket wrapper that is much simpler to use than a raw socket, and it is asynchronous and non-blocking.
Talk is cheap, show me the code. For certain reasons I can only post the key parts of the code, but I hope readers of this article can write their own version; it really isn't difficult.

Handling HTTP Requests
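    import tornado.httpclient
    import tornado.web
    from tornado.httpclient import AsyncHTTPClient, HTTPRequest

    # Method of the proxy's RequestHandler subclass. This uses the Tornado
    # 4.x/5.x callback API; @tornado.web.asynchronous and the callback
    # arguments were removed in Tornado 6.0.
    @tornado.web.asynchronous
    def get(self):
        # Request body; forward None when the client sent no body
        body = self.request.body
        if not body:
            body = None
        try:
            # Proxy: send the request upstream (the original called a helper,
            # render_request, which is not shown; AsyncHTTPClient is used here)
            AsyncHTTPClient().fetch(
                HTTPRequest(self.request.uri, method=self.request.method,
                            body=body, headers=self.request.headers,
                            follow_redirects=False,
                            allow_nonstandard_methods=True),
                callback=self.on_response)
        except tornado.httpclient.HTTPError as httperror:
            if hasattr(httperror, 'response') and httperror.response:
                self.on_response(httperror.response)
            else:
                self.set_status(500)
                self.write('Internal server error:\n' + str(httperror))
                self.finish()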

There's nothing special here: when the handler receives a client request, it forwards it directly to the server. The asynchronous callback on_response handles the interaction between the proxy and the client; as you'd expect, it writes the body back with self.write(response.body).
There is a pitfall when writing the headers: copying the upstream response's headers and setting them all again causes errors, and the request fails. My approach is simply to write only those headers that already exist in the RequestHandler's self._headers.
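The fix above keeps only header names already present in `self._headers`. A different approach, sketched here under stated assumptions and not taken from the original code, is to copy everything except headers that describe one particular connection or the body's framing: the hop-by-hop headers from RFC 2616 section 13.5.1, any header named in `Connection` (RFC 7230 section 6.1), the non-standard `Proxy-Connection`, and `Content-Length`. That these are what break the copy is an assumption: Tornado recomputes `Content-Length` for the body it writes, and the upstream value can be stale once the client has decompressed a gzip body. `filter_response_headers` and `copy_headers` are hypothetical helpers that work on plain `(name, value)` pairs, so they can be checked without Tornado.

```python
# Hop-by-hop headers (RFC 2616, 13.5.1) plus the non-standard Proxy-Connection:
# they apply to a single connection and should not be forwarded.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade", "proxy-connection",
}
# Assumption: let the server framework recompute the length of what it writes.
RECOMPUTED = {"content-length"}


def filter_response_headers(pairs):
    """Hypothetical helper: drop headers a proxy should not copy verbatim.

    `pairs` is an iterable of (name, value), such as HTTPHeaders.get_all().
    """
    pairs = list(pairs)
    drop = HOP_BY_HOP | RECOMPUTED  # a new set; the module constants stay intact
    # Names listed in a Connection header are hop-by-hop for this message too
    for name, value in pairs:
        if name.lower() == "connection":
            drop |= {token.strip().lower() for token in value.split(",")}
    return [(n, v) for n, v in pairs if n.lower() not in drop]


def copy_headers(pairs, set_header, add_header):
    """Hypothetical helper: apply the filtered headers via handler methods."""
    seen = set()
    for name, value in filter_response_headers(pairs):
        if name.lower() in seen:
            add_header(name, value)  # repeated header such as Set-Cookie
        else:
            set_header(name, value)  # first occurrence replaces any default
            seen.add(name.lower())
```

Inside `on_response`, this would be called as `copy_headers(response.headers.get_all(), self.set_header, self.add_header)` before writing the body. Using `set_header` for the first occurrence lets upstream values such as `Content-Type` replace Tornado's defaults, while `add_header` keeps repeated headers like `Set-Cookie` intact.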

TCP Relay Implementation

For port-443 traffic, that is, the CONNECT request the browser sends, the proxy can only work at the TCP layer and forward the entire HTTP message. This uses the CONNECT method of the HTTP protocol, which can be implemented as a method on the RequestHandler.


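Be careful here: by default Tornado does not support the HTTP CONNECT method, so you have to change the SUPPORTED_METHODS attribute.

Override SUPPORTED_METHODS in the class body of your RequestHandler subclass, replacing the parent class's value. Because RequestHandler.SUPPORTED_METHODS is a tuple, extend it rather than calling append:

    SUPPORTED_METHODS = tornado.web.RequestHandler.SUPPORTED_METHODS + ('CONNECT',)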

Once this method is called, the proxy no longer needs to care about the details of the HTTP-layer request; it forwards the bytes directly from the TCP layer to the server.

Whatever it receives from the server is likewise forwarded to the client.

A typical CONNECT request looks like this:

    CONNECT www.web-tinker.com:80 HTTP/1.1
    Host: www.web-tinker.com:80
    Proxy-Connection: Keep-Alive
    Proxy-Authorization: Basic *
    Content-Length: 0
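To show what such a request carries, here is a minimal standard-library sketch that parses a raw CONNECT request head into its target host, port and headers, together with the reply that opens the tunnel. `parse_connect` and `TUNNEL_OK` are hypothetical names; in the Tornado handler the same information comes from `self.request.uri` and `self.request.headers`, and Tornado does the parsing.

```python
# Reply that tells the client the tunnel is open (status 200)
TUNNEL_OK = b"HTTP/1.0 200 Connection established\r\n\r\n"


def parse_connect(raw: bytes):
    """Hypothetical helper: parse a raw CONNECT request head.

    Returns (host, port, headers), where headers maps lower-cased names
    to values. Everything after the first blank line (CRLF CRLF) is ignored.
    """
    head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    request_line, *header_lines = head.split("\r\n")
    method, target, _version = request_line.split(" ")
    if method != "CONNECT":
        raise ValueError("not a CONNECT request: %r" % method)
    # The target is in authority form, host:port
    host, sep, port = target.rpartition(":")
    if not sep:
        raise ValueError("CONNECT target needs host:port, got %r" % target)
    headers = {}
    for line in header_lines:
        if not line:
            continue
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return host, int(port), headers


host, port, headers = parse_connect(
    b"CONNECT example.com:443 HTTP/1.1\r\nHost: example.com:443\r\n\r\n")
print(host, port)  # -> example.com 443
```

Splitting the target with `rpartition` rather than `split(':')` keeps the port separate even when the host part itself contains colons.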

The implementation is as follows:

    import socket

    import tornado.iostream
    import tornado.web

    # Method of the same RequestHandler subclass. Like get(), it uses the
    # Tornado 4.x/5.x callback API, which was removed in Tornado 6.0.
    @tornado.web.asynchronous
    def connect(self):
        '''For HTTPS connections the proxy should act as a "TCP relay".'''
        def req_close(data):
            # Client closed: forward any final bytes, then close the server side
            if conn_stream.closed():
                return
            if data:
                conn_stream.write(data)
            conn_stream.close()

        def write_to_server(data):
            conn_stream.write(data)

        def proxy_close(data):
            # Server closed: forward any final bytes, then close the client side
            if req_stream.closed():
                return
            if data:
                req_stream.write(data)
            req_stream.close()

        def write_to_client(data):
            req_stream.write(data)

        def on_connect():
            '''Set up the callbacks for the TCP relay.'''
            req_stream.read_until_close(req_close, write_to_server)
            conn_stream.read_until_close(proxy_close, write_to_client)
            req_stream.write(b'HTTP/1.0 200 Connection established\r\n\r\n')

        print('Starting connect to %s' % self.request.uri)
        # Get the stream for the client's connection
        req_stream = self.request.connection.stream
        # Work out the host and port; usually 443
        host, port = None, 443
        netloc = self.request.uri.split(':')
        if len(netloc) == 2:
            host, port = netloc[0], int(netloc[1])
        elif len(netloc) == 1:
            host = netloc[0]
        # Create an IOStream to the upstream server
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
        conn_stream = tornado.iostream.IOStream(s)
        conn_stream.connect((host, port), on_connect)

Let me explain these two lines:

    req_stream.read_until_close(req_close, write_to_server)
    conn_stream.read_until_close(proxy_close, write_to_client)

Just these two lines of code, plus the four callback functions, take care of all the data transfer.


First, req_stream is the IOStream for the socket between the client and the proxy, obtained from the HTTP request via self.request.connection.stream; conn_stream is the IOStream for the socket between the proxy and the server.
IOStream provides the read_until_close method, which keeps reading data until the socket is closed.
The first line reads data from the socket between the client and the proxy and, as it arrives, writes it to the socket between the proxy and the server, so the proxy forwards it.
The second line does the same in the other direction, writing the server's data to the client socket. The actual writes happen in the four callback functions.
You may wonder why read_until_close takes two callbacks. My understanding is that the first callback is called when the socket is closed, and the second (the streaming callback) is called each time data is read.
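For comparison, the same two-way pump can be sketched with the standard library's asyncio streams instead of Tornado's IOStream. This is a swapped-in technique, not the author's code: each `pipe` coroutine plays the role of one `read_until_close` call, forwarding chunks as they arrive (the streaming callback) and closing the other side at end of stream (the close callback). `pipe`, `relay` and the `greeting` parameter are hypothetical names; a real CONNECT handler would take `host` and `port` from the request.

```python
import asyncio


async def pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close writer.

    Forwarding each chunk as it arrives corresponds to the streaming callback
    (write_to_server / write_to_client); closing at EOF corresponds to the
    close callback (req_close / proxy_close).
    """
    try:
        while True:
            chunk = await reader.read(65536)
            if not chunk:  # the peer closed its side
                break
            writer.write(chunk)
            await writer.drain()
    except ConnectionError:
        pass  # the other side went away; just tear down
    finally:
        writer.close()


async def relay(client_reader, client_writer, host, port, greeting=b""):
    """Hypothetical tunnel: connect to (host, port), then pump both ways.

    `greeting` is written to the client once the upstream connection is up,
    for example the "200 Connection established" status line for CONNECT.
    """
    try:
        up_reader, up_writer = await asyncio.open_connection(host, port)
    except OSError:
        client_writer.close()  # the server is unreachable
        return
    if greeting:
        client_writer.write(greeting)
    await asyncio.gather(
        pipe(client_reader, up_writer),  # client -> server
        pipe(up_reader, client_writer),  # server -> client
    )
```

As in the Tornado version, the greeting is only sent after the upstream connection succeeds, mirroring the write in `on_connect`. Closing the whole connection at end of stream, rather than half-closing it, also matches the close callbacks above.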
Once it is written, the proxy works well in practice.

