How does Mitmproxy work?


https://corte.si/posts/code/mitmproxy/howitworks/index.html

http://www.oschina.net/translate/how-mitmproxy-works?lang=chs&page=2#

I started working on mitmproxy because I was frustrated with the interception tools I was using. I had a long list of minor complaints: they weren't flexible enough, they weren't programmable enough, most of them were written in Java (a language I don't like), and so on. The most serious problem, however, was opacity. The best tools were all closed-source and commercial. SSL interception is a complicated and delicate process, and past a certain point, not understanding exactly what your proxy is doing is simply not acceptable.

The following is now part of the official mitmproxy documentation. It describes mitmproxy's interception process in detail and is more or less the overview document I wish I had had when I first started the project. It proceeds by example, starting with the simplest unencrypted explicit proxying and working up to the most complicated interaction: transparent proxying of SSL-protected traffic in the presence of SNI.
Explicit HTTP

Configuring the client to use mitmproxy as an explicit proxy is the simplest and most reliable way to intercept traffic. The proxy protocol is codified in the HTTP RFC, so the behavior of both the client and the server is well defined and usually reliable. In the simplest possible interaction with mitmproxy, a client connects directly to the proxy and makes a request that looks like this:

GET http://example.com/index.html HTTP/1.1

This is a proxy GET request: an extended form of the ordinary HTTP GET request that includes a scheme and a host specification, and so contains all the information mitmproxy needs to forward the request upstream.

1. The client connects to the proxy server to make the request.
2. Mitmproxy connects to the upstream server and simply forwards the request.
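
The request line above carries everything a proxy needs to pick the upstream host. The following is a minimal sketch of that parsing step using only the Python standard library; the function name parse_proxy_request_line is an illustrative assumption, not part of mitmproxy's API.

```python
from urllib.parse import urlsplit


def parse_proxy_request_line(line: str):
    """Split an absolute-form request line into the pieces a proxy needs."""
    method, target, version = line.split(" ", 2)
    url = urlsplit(target)
    if not url.scheme or not url.hostname:
        # An origin-form target such as "/index.html" carries no host.
        raise ValueError("not an absolute-form proxy request")
    # Fall back to the scheme's default port when none is given.
    port = url.port or (443 if url.scheme == "https" else 80)
    path = url.path or "/"
    if url.query:
        path += "?" + url.query
    return method, url.hostname, port, path, version


parsed = parse_proxy_request_line("GET http://example.com/index.html HTTP/1.1")
print(parsed)  # ('GET', 'example.com', 80, '/index.html', 'HTTP/1.1')
```

With the host and port extracted, the proxy can open the upstream connection and send the request on with the path alone.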

Explicit HTTPS

The process for an explicitly proxied HTTPS connection is quite different. The client connects to the proxy and makes a request that looks like this:

CONNECT example.com:443 HTTP/1.1

A conventional proxy can neither view nor manipulate an SSL-encrypted data stream, so a CONNECT request simply asks the proxy to open a pipe between the client and the server. The proxy here is just a facilitator: it blindly forwards data in both directions without knowing anything about the contents. The SSL negotiation happens over this pipe, and the subsequent requests and responses are completely opaque to the proxy.
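
To make the "blind pipe" idea concrete, here is a minimal sketch of one direction of such a tunnel. It is not mitmproxy code: the pump function and the in-process socket pairs that stand in for the client and server links are assumptions made for illustration.

```python
import socket


def pump(src: socket.socket, dst: socket.socket, bufsize: int = 4096) -> int:
    """Copy bytes from src to dst until src reports EOF; return the byte count."""
    total = 0
    while True:
        chunk = src.recv(bufsize)
        if not chunk:        # the peer closed its sending side
            break
        dst.sendall(chunk)   # forwarded untouched, never inspected
        total += len(chunk)
    return total


# In-process socket pairs stand in for the client link and the server link.
client, proxy_client_side = socket.socketpair()
proxy_server_side, server = socket.socketpair()

payload = b"\x16\x03\x01 opaque TLS bytes"
client.sendall(payload)
client.shutdown(socket.SHUT_WR)   # signal EOF so that pump() returns

forwarded = pump(proxy_client_side, proxy_server_side)
received = server.recv(4096)
print(forwarded, received == payload)

for s in (client, proxy_client_side, proxy_server_side, server):
    s.close()
```

A real tunnel would run two such pumps concurrently, one per direction, for example in separate threads.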

The MITM in mitmproxy
This is where mitmproxy's fundamental trick comes into play. The MITM in its name stands for Man-In-The-Middle, a reference to the process we use to intercept and interfere with these theoretically opaque data streams. The basic idea is to pretend to be the server to the client, and to pretend to be the client to the server, while we sit in the middle decoding traffic from both sides. The tricky part is that the Certificate Authority (CA) system is designed to prevent exactly this attack, by allowing a trusted third party to cryptographically sign a server's SSL certificate to verify that it is legitimate. If the signature does not match or comes from an untrusted party, a secure client simply drops the connection and refuses to proceed. Despite the many shortcomings of the CA system as it exists today, this is usually fatal to attempts to MITM an SSL connection for analysis. Our answer to this conundrum is to become a trusted Certificate Authority ourselves. Mitmproxy includes a full CA implementation that generates interception certificates on the fly. To get the client to trust these certificates, we manually register mitmproxy as a trusted CA on the device.
Difficulty One: What is the remote hostname?
To proceed with this plan, we need to know the domain name to use in the interception certificate. The client will verify that the certificate is for the domain it is connecting to, and abort if it is not. At first glance, the CONNECT request above seems to give us everything we need; in this example, both values are "example.com". But what if the client had initiated the connection as follows:

CONNECT 10.1.1.1:443 HTTP/1.1

Using an IP address is perfectly legitimate, because it gives us enough information to set up the pipe, even though it does not reveal the remote hostname. Mitmproxy has a cunning mechanism that smooths this over: upstream certificate sniffing. As soon as we see the CONNECT request, we pause the client side of the conversation and initiate a simultaneous connection to the server. We complete the SSL handshake with the server and inspect the certificate it used. We then use the Common Name (CN) in the upstream SSL certificate to generate the dummy certificate for the client. Voila, we have the correct hostname to present to the client, even if it was never specified.
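
As a small illustration of why the CONNECT line alone is not enough, the sketch below parses a CONNECT target and reports whether it is an IP literal, in which case there is no hostname to copy into the certificate. The function name parse_connect_target and the address 10.1.1.1 are illustrative assumptions.

```python
import ipaddress


def parse_connect_target(line: str):
    """Return (host, port, is_ip) for a CONNECT request line."""
    method, target, _version = line.split(" ", 2)
    if method.upper() != "CONNECT":
        raise ValueError("not a CONNECT request")
    host, _, port = target.rpartition(":")
    host = host.strip("[]")   # IPv6 literals are written as [addr]:port
    try:
        ipaddress.ip_address(host)
        is_ip = True          # no hostname available for the certificate
    except ValueError:
        is_ip = False
    return host, int(port), is_ip


print(parse_connect_target("CONNECT example.com:443 HTTP/1.1"))
print(parse_connect_target("CONNECT 10.1.1.1:443 HTTP/1.1"))
```

When is_ip is true, the hostname has to come from somewhere else, which is the gap that upstream certificate sniffing fills.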
Difficulty Two: Subject Alternative Name
Enter the next difficulty. Sometimes the Common Name in the certificate is not, in fact, the hostname that the client is connecting to. This is because of the optional Subject Alternative Name (SAN) field in the SSL certificate, which allows an arbitrary number of alternative domains to be specified. If the expected domain matches any of these, the client will proceed, even though the domain does not match the certificate's Common Name. The answer here is simple: when we extract the CN from the upstream certificate, we also extract the SANs and add them to the generated dummy certificate.
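
The extraction step can be sketched against the dictionary format that Python's ssl.SSLSocket.getpeercert() returns for a validated certificate. This is an illustration rather than mitmproxy's implementation: the helper names, the sample certificate data, and the simplified wildcard rule (a single leftmost "*" label) are all assumptions.

```python
def names_from_peercert(cert: dict):
    """Return (common_name, dns_sans) from a dict shaped like getpeercert() output."""
    common_name = None
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                common_name = value
    sans = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
    return common_name, sans


def hostname_matches(hostname: str, pattern: str) -> bool:
    """Simplified match: exact name, or one leftmost wildcard label."""
    hostname, pattern = hostname.lower(), pattern.lower()
    if pattern.startswith("*."):
        _, _, rest = hostname.partition(".")   # drop exactly one label
        return rest != "" and rest == pattern[2:]
    return hostname == pattern


def client_accepts(hostname: str, common_name, sans) -> bool:
    """True if the hostname matches the CN or any SAN, as described above."""
    candidates = ([common_name] if common_name else []) + list(sans)
    return any(hostname_matches(hostname, p) for p in candidates)


# Hypothetical upstream certificate data, for illustration only.
upstream_cert = {
    "subject": ((("commonName", "example.com"),),),
    "subjectAltName": (
        ("DNS", "example.com"),
        ("DNS", "www.example.org"),
        ("DNS", "*.example.net"),
    ),
}

cn, sans = names_from_peercert(upstream_cert)
print(cn, sans)
print(client_accepts("api.example.net", cn, sans))
```

Copying both the CN and the full SAN list into the generated certificate means that any name the real certificate would have satisfied is also satisfied by the dummy one.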
Difficulty Three: Server Name Indication

One of the big limitations of plain SSL is that each certificate requires its own IP address. This means you cannot do virtual hosting where multiple domains with independent certificates share the same IP address. In a world with a rapidly shrinking IPv4 address pool this is a problem, and the solution is the Server Name Indication (SNI) extension to the SSL and TLS protocols. It lets the client specify the remote server name at the start of the SSL handshake, which in turn lets the server select the right certificate to complete the process.

SNI breaks our upstream certificate sniffing process, because when we connect without using SNI, we are served a default certificate that may have nothing to do with the certificate expected by the client. The solution is another tricky complication in the client connection process. After the client connects, we allow the SSL handshake to continue until just after the SNI value has been passed to us. We then pause the conversation and initiate an upstream connection using the correct SNI value. The server then gives us the correct upstream certificate, from which we can extract the expected Common Name (CN) and Subject Alternative Names (SANs).
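
The SNI value travels in the clear inside the client's first handshake message, the ClientHello. The sketch below shows one way to read it from the raw bytes of a single TLS record. It is an illustration only and not how mitmproxy obtains the value; extract_sni and build_client_hello are hypothetical helpers, and the hand-built ClientHello is a minimal test fixture rather than something a real client would send.

```python
import struct


def extract_sni(record: bytes):
    """Return the SNI hostname from one TLS ClientHello record, or None."""
    try:
        # Record type 0x16 = handshake; handshake message type 0x01 = ClientHello.
        if record[0] != 0x16 or record[5] != 0x01:
            return None
        pos = 5 + 4 + 2 + 32             # record hdr, handshake hdr, version, random
        pos += 1 + record[pos]           # session id
        (suites_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + suites_len            # cipher suites
        pos += 1 + record[pos]           # compression methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:            # server_name extension
                # Layout: list length (2), name type (1), name length (2), name.
                name_type = record[pos + 2]
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                if name_type == 0:       # host_name
                    return record[pos + 5 : pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        pass                             # truncated or malformed input
    return None


def build_client_hello(hostname: str) -> bytes:
    """Assemble a minimal ClientHello record carrying only an SNI extension."""
    name = hostname.encode("ascii")
    server_name = struct.pack("!HBH", len(name) + 3, 0, len(name)) + name
    extensions = struct.pack("!HH", 0, len(server_name)) + server_name
    body = (
        b"\x03\x03" + bytes(32)          # client version, random
        + b"\x00"                        # empty session id
        + b"\x00\x02\x13\x01"            # one cipher suite
        + b"\x01\x00"                    # null compression only
        + struct.pack("!H", len(extensions)) + extensions
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


sni = extract_sni(build_client_hello("example.com"))
print(sni)
```

Once the name is known, the proxy can open the upstream connection with the same SNI value and receive the certificate the client actually expects.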

There is another wrinkle here. Due to a limitation of the SSL library mitmproxy uses, we cannot detect that a connection has not sent an SNI request until it is too late for upstream certificate sniffing. In practice, we therefore make a plain SSL connection upstream to sniff non-SNI certificates, and then discard that connection if the client sends an SNI notification. If you are watching your traffic with a packet sniffer, you will see two connections to the server when an SNI request is made, the first of which is closed immediately after the SSL handshake. Luckily, this is almost never an issue in practice.
Summary

Let's put all of this together into the complete explicitly proxied HTTPS flow.

1. The client makes a connection to mitmproxy and issues an HTTP CONNECT request.
2. Mitmproxy responds with a 200 Connection Established, as if it had set up the CONNECT pipe.
3. The client believes it is talking to the remote server and initiates the SSL connection. It uses SNI to indicate the hostname it is connecting to.
4. Mitmproxy connects to the server and establishes an SSL connection using the SNI hostname indicated by the client.
5. The server responds with the matching SSL certificate, which contains the Common Name (CN) and Subject Alternative Name (SAN) values needed to generate the interception certificate.
6. Mitmproxy generates the interception certificate and continues the client SSL handshake paused in step 3.
7. The client sends the request over the established SSL connection.
8. Mitmproxy passes the request on to the server over the SSL connection initiated in step 4.
Transparent HTTP proxy
When a transparent proxy is used, HTTP and HTTPS connections are redirected into the proxy at the network layer, without any client configuration being required. This makes transparent proxying ideal for situations where you cannot change client behavior; proxy-oblivious Android applications are a common example.

To achieve this, we need to introduce two extra components. The first is a redirection mechanism that transparently reroutes a TCP connection destined for a server on the Internet to a listening proxy server. This usually takes the form of a firewall on the same host as the proxy server: iptables on Linux or pf on OSX. Once the client has initiated the connection, it makes an ordinary HTTP request, which might look something like this:

GET /index.html HTTP/1.1

Note that this request differs from the explicit proxy variation in that it omits the scheme and hostname. How, then, do we know which upstream host to forward the request to? The routing mechanism that performed the redirection keeps track of the original destination for us. Each routing mechanism has a different way of exposing this data, which introduces the second component required for working transparent proxying: a host module that knows how to retrieve the original destination address from the router. In mitmproxy, this takes the form of a built-in set of modules that know how to talk to each platform's redirection mechanism. Once we have this information, the process is fairly straightforward.

1. The client makes a connection to the server.
2. The router redirects the connection to mitmproxy, which is typically listening on a local port of the same host. Mitmproxy then consults the routing mechanism to establish what the original destination was.
3. Now, we simply read the client's request...
4. ...and forward it upstream.
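
For step 2, one common way to recover the original destination on Linux with an iptables REDIRECT rule is the SO_ORIGINAL_DST socket option, which returns a packed IPv4 sockaddr_in structure. The sketch below shows that lookup under those assumptions; the function names are illustrative, other platforms such as pf expose the information differently, and the sample bytes are made up for the demonstration.

```python
import socket
import struct

# Value from <linux/netfilter_ipv4.h>; Python's socket module does not export it.
SO_ORIGINAL_DST = 80


def parse_sockaddr_in(data: bytes):
    """Decode a 16-byte IPv4 sockaddr_in into (ip, port)."""
    # Layout: family (2 bytes, skipped), port (2, network order), address (4), padding (8).
    port, raw_ip = struct.unpack("!2xH4s8x", data)
    return socket.inet_ntoa(raw_ip), port


def original_destination(conn: socket.socket):
    """Ask netfilter where a redirected connection was originally headed (Linux only)."""
    data = conn.getsockopt(socket.SOL_IP, SO_ORIGINAL_DST, 16)
    return parse_sockaddr_in(data)


# Decoding a hand-made sockaddr_in, since the live lookup needs a redirected socket.
sample = b"\x02\x00" + struct.pack("!H", 443) + bytes([192, 0, 2, 10]) + bytes(8)
print(parse_sockaddr_in(sample))  # ('192.0.2.10', 443)
```

The returned address and port then play the role that the scheme and host play in the explicit proxy request.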

Transparent HTTPS proxy
The first step is to determine whether we should treat an incoming connection as HTTPS. The mechanism for doing this is simple: we use the routing mechanism to find out what the original destination port is. By default, we treat all traffic destined for ports 443 and 8443 as SSL.
From here, the process is a merger of the methods we have already described for transparently proxying HTTP and explicitly proxying HTTPS. We use the routing mechanism to establish the upstream server address, and then proceed as for explicit HTTPS connections to establish the Common Name (CN) and Subject Alternative Names (SANs) and to cope with Server Name Indication (SNI).

1. The client makes a connection to the server.
2. The router redirects the connection to mitmproxy, which is typically listening on a local port of the same host. Mitmproxy then consults the routing mechanism to establish what the original destination was.
3. The client believes it is talking to the remote server and initiates the SSL connection. It uses SNI to indicate the hostname it is connecting to.
4. Mitmproxy connects to the server and establishes an SSL connection using the SNI hostname indicated by the client.
5. The server responds with the matching SSL certificate, which contains the Common Name (CN) and Subject Alternative Name (SAN) values needed to generate the interception certificate.
6. Mitmproxy generates the interception certificate and continues the client SSL handshake paused in step 3.
7. The client sends the request over the established SSL connection.
8. Mitmproxy passes the request on to the server over the SSL connection initiated in step 4.
