HTTP Proxy Principles and Implementation (1)

Source: Internet
Author: User

A Web proxy is an entity that sits in the middle of the network and provides a variety of functions. Web proxies are everywhere in modern network systems. In my previous posts on HTTP, I repeatedly mentioned the impact of proxies on HTTP requests and responses. Today I am going to talk about some of the principles behind HTTP proxies and how to quickly implement one with Node.js.

HTTP proxies come in two forms:

The first is the common proxy described in RFC 7230 - HTTP/1.1: Message Syntax and Routing (the first part of the revised HTTP/1.1 specification that replaced RFC 2616). This proxy plays the role of a "man in the middle": to the client connected to it, it is the server; to the server it connects to, it is the client. It is responsible for passing HTTP messages back and forth between the two ends.

The second is the tunnel proxy described in Tunneling TCP based protocols through Web proxy servers. It communicates through the HTTP body and proxies any TCP-based application-layer protocol over HTTP. This proxy uses the HTTP CONNECT method to establish a connection. However, RFC 2616 - HTTP/1.1 originally only reserved the CONNECT method name; a description of CONNECT and tunnel proxies was added only when the revised HTTP/1.1 specification was released in 2014. For details, see RFC 7231 - HTTP/1.1: Semantics and Content. In practice, this kind of proxy had been widely implemented long before that.

The first type of proxy described in this article corresponds to Chapter 6, "Proxies", of HTTP: The Definitive Guide; the second corresponds to Section 8.5, "Tunnels", in Chapter 8, "Integration Points: Gateways, Tunnels, and Relays".

Common proxy

The principle of the first kind of Web proxy is particularly simple:

The HTTP client sends a request message to the proxy. The proxy server must correctly process the request and the connection (for example, correctly handle Connection: keep-alive), send the request to the origin server, and forward the response it receives back to the client.
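One part of "correctly handling the connection" is that headers such as Connection: keep-alive describe a single hop and should not be passed on unchanged. The following is a minimal sketch of that idea, not part of the original code: the function name stripHopByHopHeaders is hypothetical, and the header list is the conventional hop-by-hop set plus the non-standard Proxy-Connection.

```javascript
// Conventional hop-by-hop headers (lower-cased, as in Node's req.headers).
// 'proxy-connection' is non-standard but commonly sent by browsers.
const HOP_BY_HOP = new Set([
  'connection', 'keep-alive', 'proxy-authenticate', 'proxy-authorization',
  'proxy-connection', 'te', 'trailer', 'transfer-encoding', 'upgrade'
]);

// Return a copy of `headers` without connection-specific fields.
// (Hypothetical helper, for illustration only.)
function stripHopByHopHeaders(headers) {
  // Any header named inside the Connection field is hop-by-hop as well.
  const listed = String(headers['connection'] || '')
    .split(',')
    .map(function (name) { return name.trim().toLowerCase(); })
    .filter(Boolean);
  const drop = new Set([...HOP_BY_HOP, ...listed]);

  const out = {};
  for (const name of Object.keys(headers)) {
    if (!drop.has(name.toLowerCase())) {
      out[name] = headers[name];
    }
  }
  return out;
}
```

A proxy could apply such a filter to the client's headers before forwarding them, and let its own HTTP library decide how to manage the connection to the origin server.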

The following figure from HTTP: The Definitive Guide illustrates how such a proxy relays requests and responses:

 

If I access website A through a proxy, website A sees the proxy as the client and is completely unaware of the real client. This achieves the purpose of hiding the client's IP address. Of course, the proxy can also modify the HTTP request headers and tell the server the real client IP through a custom header such as X-Forwarded-IP. However, the server cannot verify whether this custom header was really added by the proxy or was forged by a client that modified its request headers. Therefore, you must be especially careful when taking an IP address from an HTTP header field.
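One cautious way to read such a header is to honor it only when the TCP peer is a proxy you operate, and to fall back to the socket address otherwise. The sketch below assumes a hypothetical allow-list TRUSTED_PROXIES and a hypothetical helper clientIp; the header name X-Forwarded-IP is the custom example from the text, not a standard.

```javascript
const net = require('net');

// Hypothetical allow-list: addresses of proxies this server trusts.
const TRUSTED_PROXIES = new Set(['127.0.0.1', '::1']);

// Return the best-known client address for an incoming request.
function clientIp(req) {
  const peer = req.socket.remoteAddress;
  const forwarded = req.headers['x-forwarded-ip'];

  // Believe the header only if the direct peer is a trusted proxy
  // and the value is a syntactically valid IP address.
  if (TRUSTED_PROXIES.has(peer) &&
      typeof forwarded === 'string' &&
      net.isIP(forwarded.trim()) !== 0) {
    return forwarded.trim();
  }

  // Otherwise the header may have been forged by the client itself.
  return peer;
}
```

With this approach a client that connects directly and sends its own X-Forwarded-IP header is still identified by its socket address.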

To explicitly specify a proxy for a browser, you need to manually modify the browser or operating system settings, or point them to a PAC (Proxy Auto-Configuration) file for automatic setup; some browsers also support WPAD (Web Proxy Auto-Discovery Protocol). A proxy that is explicitly specified in the browser is generally called a forward proxy. Once a forward proxy is enabled, the browser adjusts the HTTP request messages it sends to avoid some problems with old proxy servers.
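A PAC file is itself a small JavaScript file that defines a FindProxyForURL function, which the browser calls to decide how each request is routed. The example below is only an illustration: the proxy address 127.0.0.1:8888 and the routing rules are assumptions, and it deliberately avoids the browser-provided PAC helper functions so that it can also be run in plain Node.js.

```javascript
// Called by the browser for every request; returns a routing decision.
function FindProxyForURL(url, host) {
  // Requests to the local machine go direct.
  if (host === 'localhost' || host === '127.0.0.1') {
    return 'DIRECT';
  }

  // Send plain HTTP traffic through the (assumed) local proxy and
  // fall back to a direct connection if the proxy is unreachable.
  if (url.substring(0, 5) === 'http:') {
    return 'PROXY 127.0.0.1:8888; DIRECT';
  }

  // Everything else bypasses the proxy.
  return 'DIRECT';
}
```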

In the other case, when you access website A, you are actually accessing a proxy. After receiving the request message, the proxy sends a request to the server that really provides the service and forwards the response to the browser. This is generally called a reverse proxy, and it can be used to hide the server's IP address and port. Generally, deploying a reverse proxy requires changing DNS so that the domain name resolves to the proxy server's IP address. The browser cannot detect the real server behind it and, of course, needs no configuration changes. Reverse proxying is the most common deployment method for Web systems. For example, this blog uses Nginx's proxy_pass feature to forward browser requests to a Node.js service.
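To make the idea concrete, a reverse proxy can be sketched in a few lines of Node.js. This is an illustration only, not how Nginx's proxy_pass is implemented: the upstream address 127.0.0.1:3000 and the names UPSTREAM, upstreamOptions, and createReverseProxy are all hypothetical.

```javascript
const http = require('http');

// Hypothetical address of the service that really handles requests.
const UPSTREAM = { hostname: '127.0.0.1', port: 3000 };

// Map an incoming request to http.request options for the upstream.
function upstreamOptions(req) {
  return {
    hostname: UPSTREAM.hostname,
    port: UPSTREAM.port,
    path: req.url,          // origin-form path, e.g. "/post/1"
    method: req.method,
    headers: req.headers
  };
}

// Create (but do not start) a server that relays every request upstream.
function createReverseProxy() {
  return http.createServer(function (req, res) {
    const pReq = http.request(upstreamOptions(req), function (pRes) {
      res.writeHead(pRes.statusCode, pRes.headers);
      pRes.pipe(res);
    });
    pReq.on('error', function () {
      // Report a gateway error if nothing has been sent yet.
      if (!res.headersSent) {
        res.writeHead(502);
      }
      res.end();
    });
    req.pipe(pReq);
  });
}

// Usage: createReverseProxy().listen(8080);
```

Unlike a forward proxy, the target here is fixed by the proxy's own configuration, so the browser sends an ordinary request and needs no proxy settings.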

Now that we understand the basic principle of the first kind of proxy, we can implement it in Node.js. The code below contains only the core logic:

 
 
    var http = require('http');
    var url = require('url');

    function request(cReq, cRes) {
        // A request sent to a forward proxy carries the full target URL.
        var u = url.parse(cReq.url);

        var options = {
            hostname : u.hostname,
            port     : u.port || 80,
            path     : u.path,
            method   : cReq.method,
            headers  : cReq.headers
        };

        // Send the request to the origin server and relay its response.
        var pReq = http.request(options, function(pRes) {
            cRes.writeHead(pRes.statusCode, pRes.headers);
            pRes.pipe(cRes);
        }).on('error', function(e) {
            cRes.end();
        });

        // Stream the client's request body to the origin server.
        cReq.pipe(pReq);
    }

    http.createServer().on('request', request).listen(8888, '0.0.0.0');

Running the code above starts an HTTP proxy service on local port 8888. The service parses the request URL and other necessary parameters from the request message, creates a new request to the origin server, pipes the request it received into that new request, and finally returns the server's response to the browser. Set the browser's HTTP proxy to 127.0.0.1:8888, visit an HTTP website, and the proxy works normally.
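The proxy can also be exercised without a browser. A client that talks to a forward proxy connects to the proxy's address but puts the full target URL in the request line, which is what the proxy parses. The sketch below assumes the proxy is listening on 127.0.0.1:8888; the helper names proxiedOptions and getViaProxy are hypothetical.

```javascript
const http = require('http');

// Build http.request options that send `targetUrl` through an HTTP proxy.
function proxiedOptions(targetUrl, proxyHost, proxyPort) {
  const target = new URL(targetUrl);
  return {
    host: proxyHost,                 // connect to the proxy, not the origin
    port: proxyPort,
    method: 'GET',
    path: target.href,               // absolute URL in the request line
    headers: { Host: target.host }   // Host still names the origin server
  };
}

// Fetch a URL through the assumed local proxy and report the status code.
function getViaProxy(targetUrl, callback) {
  const options = proxiedOptions(targetUrl, '127.0.0.1', 8888);
  return http.get(options, function (res) {
    res.resume();                    // discard the body
    callback(null, res.statusCode);
  }).on('error', callback);
}

// Usage (with the proxy running):
// getViaProxy('http://example.com/', function (err, status) { console.log(err || status); });
```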

However, once our proxy service is in use, HTTPS websites become completely inaccessible. Why? The answer is simple: this proxy provides an HTTP service and cannot carry HTTPS at all. So can we simply turn the proxy into an HTTPS service? Obviously not, because this kind of proxy is by nature a man in the middle, and the certificate verification mechanism of HTTPS is the nemesis of man-in-the-middle hijacking. In an ordinary HTTPS service, the server does not verify the client's certificate, so the man in the middle can act as a client and complete a TLS handshake with the server. But the man in the middle does not have the certificate's private key, so it can never impersonate the server and establish a TLS connection with the client. Of course, if you do own the certificate's private key, proxying the HTTPS website that the certificate belongs to is no problem.

Fiddler, the well-known HTTP debugging tool, also works by starting an HTTP proxy service locally and having browser traffic go through it, which lets it display and modify HTTP messages. For Fiddler to decrypt HTTPS content, you must first import its self-generated root certificate into the system's list of trusted root certificates. Once this is done, the browser trusts the certificates that Fiddler subsequently forges, so one TLS connection is established between the browser and Fiddler and another between Fiddler and the server. Sitting in the middle, Fiddler can decrypt the TLS traffic on both sides.

If we do not import the root certificate, can Fiddler's HTTP proxy still proxy HTTPS traffic? Practice shows that without the root certificate Fiddler merely cannot decrypt HTTPS traffic, while HTTPS websites can still be visited normally. How is this done? And is this HTTPS traffic safe? These questions are answered in the next section.

Tunnel proxy

The principle of the second kind of Web proxy is also very simple:

The HTTP client uses the CONNECT method to ask the tunnel proxy to create a TCP connection to any target server and port, and the proxy then blindly forwards all subsequent data between the client and the server.

The following figure, also from HTTP: The Definitive Guide, illustrates this behavior:

 

If I access website A through a proxy, the browser first asks the proxy, through a CONNECT request, to create a TCP connection to website A. Once the TCP connection is established, the proxy simply forwards all subsequent traffic without inspecting it. In theory, therefore, this kind of proxy works for any TCP-based application-layer protocol, and the TLS protocol used by HTTPS websites is no exception. This is why such a proxy is called a tunnel. For HTTPS, the client performs the TLS handshake and negotiates keys directly with the server through the proxy, so it is still safe. The following packet capture shows this scenario:

 

As you can see, after completing the TCP handshake with the proxy, the browser initiates a CONNECT request. The start line of the message is as follows:

 
 
    CONNECT imququ.com:443 HTTP/1.1

A CONNECT request only asks the proxy to create a TCP connection, so it needs to provide only the server's domain name and port, not a specific resource path. After receiving such a request, the proxy establishes a TCP connection to the server and responds to the browser with an HTTP message like this:

 
 
    HTTP/1.1 200 Connection Established

When the browser receives this response, it can consider the TCP connection to the server established and start writing protocol data directly to that connection. Wireshark's Follow TCP Stream feature clearly shows the data exchanged between the browser and the proxy:

 

As you can see, the HTTP round trip that the browser uses to set up the TCP connection to the server is entirely in plain text. This is why the CONNECT request provides only the domain name and port: sending the complete URL, cookies, and other information would undermine the security of HTTPS. For HTTPS traffic carried by an HTTP proxy, application data is transmitted through the TLS Application Data protocol only after the TLS handshake succeeds. An intermediate node cannot learn the master secret used to encrypt the traffic and therefore cannot decrypt the data. As for the domain name and port exposed by CONNECT, a man in the middle can obtain them for ordinary HTTPS requests anyway (the requested domain name can be obtained from the DNS query or from the Server Name Indication in the TLS Client Hello), so this approach does not make things any less secure.
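The tunnel behavior described in this section can be sketched in Node.js roughly as follows. This is a minimal illustration under stated assumptions rather than a complete implementation: the names parseConnectTarget and createTunnelProxy are hypothetical, and the sketch has no authentication, timeouts, or access control. It relies on the 'connect' event that Node's http.Server emits for CONNECT requests.

```javascript
const http = require('http');
const net = require('net');

// Split the "host:port" target of a CONNECT request line.
function parseConnectTarget(target) {
  const idx = target.lastIndexOf(':');
  const host = target.slice(0, idx).replace(/^\[|\]$/g, ''); // unwrap IPv6 literals
  const port = Number(target.slice(idx + 1));
  return { host: host, port: port };
}

// Create (but do not start) a proxy that only handles CONNECT tunnels.
function createTunnelProxy() {
  const server = http.createServer();

  server.on('connect', function (req, cSock, head) {
    const target = parseConnectTarget(req.url);
    if (!target.host || !Number.isInteger(target.port) ||
        target.port < 1 || target.port > 65535) {
      cSock.end('HTTP/1.1 400 Bad Request\r\n\r\n');
      return;
    }

    // Open the TCP connection that the client asked for.
    const pSock = net.connect(target.port, target.host, function () {
      // Tell the client the tunnel is ready, then forward bytes blindly.
      cSock.write('HTTP/1.1 200 Connection Established\r\n\r\n');
      pSock.write(head);
      pSock.pipe(cSock);
      cSock.pipe(pSock);
    });

    pSock.on('error', function () { cSock.end(); });
    cSock.on('error', function () { pSock.destroy(); });
  });

  return server;
}

// Usage: createTunnelProxy().listen(8888, '127.0.0.1');
```

Because the proxy only copies bytes in both directions after replying with 200, it never sees the TLS keys negotiated between the browser and the server.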

