0x00 HTTP Proxy
HTTP proxies are very useful. Public proxies are available, but sometimes your work calls for your own: to analyse application-layer traffic, to enforce access control on data, or even for monitoring. Tornado provides convenient facilities and APIs that make it easy to build an HTTP proxy on top of it.
0x01 Implementation Principle
An HTTP proxy essentially forwards traffic between the client and the web server. That is the familiar picture, but it only holds for plain HTTP. HTTPS is different: there the proxy acts only as a TCP relay that passes bytes through, so it has to be handled separately.
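The two cases can already be told apart from the request line. For plain HTTP the browser sends the full URL to the proxy, while for HTTPS it sends CONNECT with only host:port. A minimal stdlib sketch of that decision (`classify` is a hypothetical helper, not a Tornado API):

```python
def classify(request_line: str) -> str:
    """Return 'tunnel' for a CONNECT request and 'forward' for anything else."""
    method, _target, _version = request_line.split(" ", 2)
    if method == "CONNECT":
        # e.g. "CONNECT example.com:443 HTTP/1.1": relay raw TCP bytes only
        return "tunnel"
    # e.g. "GET http://example.com/ HTTP/1.1": parse and re-issue the request
    return "forward"


print(classify("GET http://example.com/index.html HTTP/1.1"))  # forward
print(classify("CONNECT example.com:443 HTTP/1.1"))            # tunnel
```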
0x02 Tornado Implementation
With Tornado you can build an asynchronous HTTP proxy that performs well and is simple to implement. The main classes used are AsyncHTTPClient and IOStream.
Anyone who has read the Tornado source code will probably already know these two classes.
In brief: AsyncHTTPClient, as the name suggests, makes asynchronous HTTP client requests, and IOStream is a wrapper around a socket.
AsyncHTTPClient handles ordinary HTTP requests. When the RequestHandler receives a client request, the proxy parses it, uses this class to send the request to the server, gets the response, writes it back to the client, and calls finish().
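Because a browser sends the full URL (absolute form) to a proxy, `self.request.uri` already names the target server, which is why a proxy handler can pass it straight to AsyncHTTPClient. The sketch below uses the standard library's `urllib.parse` to show roughly how such a URI breaks down into the host, the port, and the path the upstream request goes to; `split_proxy_uri` is a hypothetical name.

```python
from urllib.parse import urlsplit


def split_proxy_uri(uri: str):
    """Split an absolute-form request URI into (host, port, path-and-query)."""
    parts = urlsplit(uri)
    # Fall back to the scheme's default port when none is given
    port = parts.port or (443 if parts.scheme == "https" else 80)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.hostname, port, target


print(split_proxy_uri("http://example.com:8080/a/b?x=1"))
# ('example.com', 8080, '/a/b?x=1')
```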
When the proxy acts as a TCP relay, you could read and write data with raw sockets at both ends, but that is cumbersome. Tornado's IOStream class is a socket wrapper that is much simpler to use than a raw socket, and it is asynchronous and non-blocking.
Talk is cheap, show me the code.
For certain reasons I can only post the key parts of the code here. I hope readers will write their own version; it really isn't difficult.
Handling HTTP Requests
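    import tornado.httpclient
    import tornado.web


    class ProxyHandler(tornado.web.RequestHandler):

        # Callback style from Tornado 4.x; tornado.web.asynchronous was removed in Tornado 6.0
        @tornado.web.asynchronous
        def get(self):
            # Forward the client's request body only if there is one
            body = self.request.body or None
            try:
                # render_request is the author's helper (not shown); it presumably
                # wraps AsyncHTTPClient().fetch() and calls on_response when done
                render_request(self.request.uri, callback=self.on_response,
                               method=self.request.method, body=body,
                               headers=self.request.headers,
                               follow_redirects=False,
                               allow_nonstandard_methods=True)
            except tornado.httpclient.HTTPError as httperror:
                # With a callback, error statuses usually reach on_response via
                # response.error instead of being raised here
                if getattr(httperror, 'response', None):
                    # The server answered with an error status: relay it as-is
                    self.on_response(httperror.response)
                else:
                    self.set_status(500)
                    self.write('Internal server error:\n' + str(httperror))
                    self.finish()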
Nothing special here: when a client request arrives, the proxy sends it straight on to the server. The asynchronous callback on_response handles the interaction between the proxy and the client; at its core it calls self.write(response.body), as you would expect.
There is a pitfall when writing the headers: copying all of the response's headers onto the proxy's response causes errors and the page fails to load. My workaround is simply to write only those headers that already exist in the RequestHandler's self._headers.
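A more general alternative (not the author's workaround above) is to copy everything except the headers that describe the connection rather than the message. The sketch assumes the headers arrive as (name, value) pairs, drops the classic hop-by-hop list from RFC 2616 section 13.5.1 plus any header named in `Connection:`, and also drops `Content-Length` so the handler can compute its own. `filter_response_headers` is a hypothetical helper.

```python
# Classic hop-by-hop headers (RFC 2616, section 13.5.1), plus the
# non-standard Proxy-Connection that browsers send to proxies.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade", "proxy-connection",
}


def filter_response_headers(headers):
    """Return the (name, value) pairs that are safe to copy to the client."""
    drop = set(HOP_BY_HOP)
    # Headers named in "Connection:" are hop-by-hop for this message too
    for name, value in headers:
        if name.lower() == "connection":
            drop.update(token.strip().lower() for token in value.split(","))
    # The handler writes the whole body itself and can set its own length
    drop.add("content-length")
    return [(name, value) for name, value in headers if name.lower() not in drop]


upstream = [
    ("Content-Type", "text/html"),
    ("Transfer-Encoding", "chunked"),
    ("Connection", "close, X-Trace"),
    ("X-Trace", "abc"),
    ("Content-Length", "10"),
    ("Set-Cookie", "a=1"),
]
print(filter_response_headers(upstream))
# [('Content-Type', 'text/html'), ('Set-Cookie', 'a=1')]
```

In a Tornado handler one would presumably iterate over `response.headers.get_all()` and call `self.add_header(name, value)` for each surviving pair, so repeated headers such as Set-Cookie are preserved.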
TCP Relay Implementation
For HTTPS (port 443) the browser sends a CONNECT request, and the proxy can only work at the TCP layer, forwarding the whole message unchanged. This relies on the HTTP CONNECT method, which can be implemented as a connect() method on the RequestHandler.
Be careful here: Tornado does not support the HTTP CONNECT method by default, so you have to change the SUPPORTED_METHODS class attribute. In Tornado, RequestHandler.SUPPORTED_METHODS is a tuple, so calling append() on it fails; instead, override it in your RequestHandler subclass:

    SUPPORTED_METHODS = tornado.web.RequestHandler.SUPPORTED_METHODS + ('CONNECT',)
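The reason `append` cannot work is that Tornado defines `RequestHandler.SUPPORTED_METHODS` as a tuple. The runnable sketch below uses a stand-in `BaseHandler` (hypothetical, mirroring Tornado's default tuple) so it runs without Tornado installed, and shows the subclass-level override:

```python
class BaseHandler:
    # Stand-in for tornado.web.RequestHandler: its SUPPORTED_METHODS is a tuple
    SUPPORTED_METHODS = ("GET", "HEAD", "POST", "DELETE", "PATCH", "PUT", "OPTIONS")


class ProxyHandler(BaseHandler):
    # Build a new tuple on the subclass instead of mutating the parent's
    SUPPORTED_METHODS = BaseHandler.SUPPORTED_METHODS + ("CONNECT",)


try:
    BaseHandler.SUPPORTED_METHODS.append("CONNECT")
except AttributeError as exc:
    print("append() fails:", exc)  # tuples are immutable and have no append()

print("CONNECT" in ProxyHandler.SUPPORTED_METHODS)  # True
print("CONNECT" in BaseHandler.SUPPORTED_METHODS)   # False
```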
When this method is called, the proxy does not need to care about the details of the HTTP-layer request; it forwards the client's bytes straight to the server at the TCP layer, and forwards whatever it receives from the server back to the client in the same way. A CONNECT request looks like this:
    CONNECT www.web-tinker.com:80 HTTP/1.1
    Host: www.web-tinker.com:80
    Proxy-Connection: Keep-Alive
    Proxy-Authorization: Basic *
    Content-Length: 0
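That message is a request line (method, target, version), header lines, and a terminating blank line. The sketch below parses such a head with the standard library only; `parse_request_head` is a hypothetical helper, and it keeps only the last value of a repeated header.

```python
def parse_request_head(raw: bytes):
    """Parse the request line and headers of an HTTP request head."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive; a repeated header keeps its last value
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


raw = (b"CONNECT www.web-tinker.com:80 HTTP/1.1\r\n"
       b"Host: www.web-tinker.com:80\r\n"
       b"Proxy-Connection: Keep-Alive\r\n"
       b"Content-Length: 0\r\n"
       b"\r\n")
method, target, version, headers = parse_request_head(raw)
print(method, target, version)      # CONNECT www.web-tinker.com:80 HTTP/1.1
print(headers["proxy-connection"])  # Keep-Alive
```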
Here is the implementation:
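    import socket

    import tornado.iostream
    import tornado.web


    class ProxyHandler(tornado.web.RequestHandler):
        # SUPPORTED_METHODS is a tuple, so extend it by concatenation
        SUPPORTED_METHODS = tornado.web.RequestHandler.SUPPORTED_METHODS + ('CONNECT',)

        @tornado.web.asynchronous
        def connect(self):
            """For HTTPS connections the proxy acts as a TCP relay."""

            def req_close(data):
                # Client closed: forward any leftover bytes, then close the server side
                if conn_stream.closed():
                    return
                if data:
                    conn_stream.write(data)
                conn_stream.close()

            def write_to_server(data):
                conn_stream.write(data)

            def proxy_close(data):
                # Server closed: forward any leftover bytes, then close the client side
                if req_stream.closed():
                    return
                if data:
                    req_stream.write(data)
                req_stream.close()

            def write_to_client(data):
                req_stream.write(data)

            def on_connect():
                """Set up the two-way TCP relay once the server connection is open."""
                req_stream.read_until_close(req_close, write_to_server)
                conn_stream.read_until_close(proxy_close, write_to_client)
                req_stream.write(b'HTTP/1.0 200 Connection established\r\n\r\n')

            print('Starting connection to %s' % self.request.uri)
            # The client-side IOStream behind this request
            # (newer Tornado versions use self.request.connection.detach())
            req_stream = self.request.connection.stream
            # Find the target host and port; the port is usually 443
            host, port = None, 443
            netloc = self.request.uri.split(':')
            if len(netloc) == 2:
                host, port = netloc[0], int(netloc[1])  # connect() needs an int port
            elif len(netloc) == 1:
                host = netloc[0]
            # Open a non-blocking IOStream to the target server
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
            conn_stream = tornado.iostream.IOStream(s)
            conn_stream.connect((host, port), on_connect)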
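One detail in the host/port lookup is easy to get wrong: `split(':')` yields the port as a string, while `socket.connect()` needs an integer, and an IPv6 literal such as `[::1]:443` contains several colons. A hypothetical helper that handles both and falls back to port 443:

```python
def split_authority(authority: str, default_port: int = 443):
    """Split a CONNECT target such as 'example.com:443' or '[::1]:8443' into (host, port)."""
    if authority.startswith("["):
        # IPv6 literal: the host sits inside brackets, the port follows "]:"
        host, _, rest = authority[1:].partition("]")
        port = rest[1:] if rest.startswith(":") else ""
    else:
        host, _, port = authority.partition(":")
    # socket.connect() needs an int port, not the string produced by splitting
    return host, (int(port) if port else default_port)


print(split_authority("www.web-tinker.com:80"))  # ('www.web-tinker.com', 80)
print(split_authority("[::1]:8443"))             # ('::1', 8443)
print(split_authority("example.com"))            # ('example.com', 443)
```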
Let me explain these two lines:

    req_stream.read_until_close(req_close, write_to_server)
    conn_stream.read_until_close(proxy_close, write_to_client)
These two lines plus the four callbacks are all it takes to relay the data.
First, req_stream is the IOStream for the socket between the client and the proxy, obtained from the request object (self.request.connection.stream); conn_stream is the IOStream for the socket between the proxy and the server.
IOStream provides read_until_close, which keeps reading data until the socket is closed.
The first line reads data from the socket between the client and the proxy and, as it arrives, writes it to the socket between the proxy and the server, so the proxy forwards the client's bytes.
The second line does the same in the other direction, writing the server's data to the client socket. The actual writes happen in the four callbacks.
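For comparison, here is the same two-way relay written with the standard library's asyncio streams instead of Tornado's IOStream (a swapped-in technique, not the author's code). `pipe` plays the role of one read_until_close line plus its callbacks; `handle_connect` is a hypothetical minimal CONNECT handler that assumes the target is written as host:port, performs no authentication, and answers anything else with 405.

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then close writer."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:              # the peer closed its side
                break
            writer.write(data)        # forward the chunk as soon as it arrives
            await writer.drain()      # wait if the other side is slow
    except ConnectionError:
        pass                          # a reset on either side just ends the relay
    finally:
        writer.close()


async def handle_connect(client_r: asyncio.StreamReader,
                         client_w: asyncio.StreamWriter) -> None:
    """Serve one CONNECT request, then relay bytes in both directions."""
    try:
        head = await client_r.readuntil(b"\r\n\r\n")
    except (asyncio.IncompleteReadError, asyncio.LimitOverrunError):
        client_w.close()
        return
    parts = head.split(b"\r\n", 1)[0].decode("latin-1").split(" ", 2)
    if len(parts) != 3 or parts[0] != "CONNECT":
        client_w.write(b"HTTP/1.1 405 Method Not Allowed\r\n\r\n")
        client_w.close()
        return
    host, _, port = parts[1].rpartition(":")  # assumes "host:port", no IPv6 literal
    try:
        up_r, up_w = await asyncio.open_connection(host, int(port))
    except (OSError, ValueError):
        client_w.write(b"HTTP/1.1 502 Bad Gateway\r\n\r\n")
        client_w.close()
        return
    client_w.write(b"HTTP/1.1 200 Connection established\r\n\r\n")
    await client_w.drain()
    # Counterparts of the two read_until_close lines: one pump per direction
    await asyncio.gather(pipe(client_r, up_w), pipe(up_r, client_w))
```

To try it as a proxy, pass `handle_connect` to `asyncio.start_server(handle_connect, host, port)` and point the browser's HTTPS proxy setting at that address. Closing the opposite writer when one side reaches EOF is what tears the tunnel down in both directions.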
Some readers may wonder why read_until_close takes two callbacks. My understanding is that the first is called when the socket is closed, and the second is called each time data is read.
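That matches the documented behaviour of `IOStream.read_until_close(callback=None, streaming_callback=None)` in Tornado 4.x: the second argument receives each chunk as it arrives, and the first receives the final result once the socket closes, which is empty when a streaming callback is used (later Tornado releases dropped these callback arguments in favour of coroutines). One consequence is that a close callback that only writes its `data` argument forwards nothing; to tear the tunnel down it has to close the other stream. The asyncio mimic below (a hypothetical function, not Tornado's implementation) makes the order of the two calls concrete:

```python
import asyncio


async def read_until_close(reader, callback, streaming_callback=None):
    """Mimic the two-callback style: chunks go to streaming_callback as they
    arrive; callback runs once, after the peer has closed the connection."""
    buffered = b""
    while True:
        chunk = await reader.read(65536)
        if not chunk:                  # EOF: the socket was closed
            break
        if streaming_callback is not None:
            streaming_callback(chunk)  # "data was read" callback
        else:
            buffered += chunk          # no streaming callback: keep everything
    callback(buffered)                 # "closed" callback; empty when streaming


async def demo():
    reader = asyncio.StreamReader()
    reader.feed_data(b"hello ")
    reader.feed_data(b"world")
    reader.feed_eof()
    await read_until_close(reader,
                           lambda rest: print("closed, final data:", rest),
                           lambda chunk: print("read:", chunk))


asyncio.run(demo())
# read: b'hello world'
# closed, final data: b''
```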
In use, the proxy works fine:
[Figure: Implementing an HTTP proxy using tornado]