Two years ago, I wrote a reverse proxy server in .NET 2.0. Over the past two years I have kept fixing bugs and optimizing performance, and availability has improved greatly. Recently I ran into a functional requirement for which I cannot find an effective solution, so I have come to ask for your advice.
Let me first describe how the reverse proxy works.
1. When a client accesses the reverse proxy through a browser, it sends an HTTP request. When the reverse proxy accepts the TCP connection, it creates a new session to handle the request (BeginAccept, EndAccept);
2. The session object sets up a delegate for receiving data from the client and starts reading asynchronously (BeginRead);
3. When data arrives, the asynchronous read callback runs and starts processing the data (EndRead);
4. Check whether the connection between the reverse proxy and the server has been established. If not, first establish the connection (connectserver) and set up the asynchronous server read delegate (BeginRead);
5. Write the data to the server asynchronously (BeginWrite);
6. Set up the client asynchronous read delegate again (BeginRead) and return to 3;
7. When data returned by the server is received, process it and then write it to the client asynchronously (BeginWrite);
8. Set up the server asynchronous read delegate again (BeginRead) and return to 7;
All data transmission is asynchronous; only the data processing at steps 3 and 7 requires business-specific code.
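The loop described in the steps above can be sketched roughly as follows. This is a minimal illustration in Python's asyncio rather than the .NET code of the proxy itself; `pump`, `handle_client`, `on_request` and `on_response` are hypothetical names, and error handling and half-closed connections are ignored.

```python
import asyncio

async def pump(reader, writer, process):
    """Copy data from reader to writer, passing each fragment through process()."""
    while True:
        data = await reader.read(4096)   # roughly BeginRead/EndRead
        if not data:                     # empty read: the peer closed the connection
            break
        writer.write(process(data))      # roughly BeginWrite
        await writer.drain()
    writer.close()

async def handle_client(client_reader, client_writer, server_host, server_port,
                        on_request=lambda b: b, on_response=lambda b: b):
    # Step 4: connect to the real server (host and port are assumed parameters).
    server_reader, server_writer = await asyncio.open_connection(server_host, server_port)
    # Steps 2-6 and steps 7-8 run concurrently, one pump per direction.
    await asyncio.gather(
        pump(client_reader, server_writer, on_request),
        pump(server_reader, client_writer, on_response),
    )
```

The two `process` callbacks are the only places where business logic would go, which corresponds to steps 3 and 7.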
In fact, a reverse proxy only needs to process the data sent by the client: it replaces the value of the HTTP Host header with the actual server's host. The server's response data can simply be sent to the client as is.
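As an illustration of the Host replacement, here is a small Python sketch. The function name `rewrite_host` and the host names are made up for the example; it assumes the fragment begins with the request head and only touches bytes before the first blank line.

```python
import re

# Matches a "Host:" header line (case-insensitive) up to, but not including, CRLF.
HOST_LINE = re.compile(rb"(?im)^Host:[ \t]*[^\r\n]*")

def rewrite_host(chunk: bytes, upstream_host: str) -> bytes:
    """Replace the Host header value in the request head; the body is untouched."""
    head, sep, rest = chunk.partition(b"\r\n\r\n")
    new_line = b"Host: " + upstream_host.encode("ascii")
    new_head = HOST_LINE.sub(lambda m: new_line, head, count=1)
    return new_head + sep + rest
```

For example, `rewrite_host(b"GET / HTTP/1.1\r\nHost: proxy.example\r\n\r\n", "backend.example")` yields the same request with `Host: backend.example`.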
In step 3, we only know that some data from the client has arrived; we do not know whether it is an HTTP request header, or whether the header is complete. Fortunately, a reverse proxy does not need to care whether the header is complete. It only needs to check whether the data is an HTTP request header and, if so, modify the Host. Here I assume that the first packet of an HTTP request is always an independent packet and never "sticks" to the end of the previous data packet on the same TCP connection. Under that assumption, the format defined by the HTTP protocol can be used directly to check whether a packet is an HTTP request header. The assumption has no real basis, but it has proven very effective in practice.
The program has been running for two years without problems.
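A check of this kind might look like the sketch below (Python, with the hypothetical name `looks_like_request_head`). It only tests whether a fragment starts with something shaped like an HTTP/1.x request line, and it assumes methods are written in uppercase letters. It inherits the assumption above: if a request began in the middle of a fragment, for example on a keep-alive connection, this check would miss it.

```python
import re

# Request line: METHOD SP request-target SP HTTP/x.y CRLF
REQUEST_LINE = re.compile(rb"^[A-Z]+ \S+ HTTP/\d\.\d\r\n")

def looks_like_request_head(chunk: bytes) -> bool:
    """True if the fragment starts with an HTTP/1.x request line."""
    return REQUEST_LINE.match(chunk) is not None
```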
Now, however, a problem has come up. There is a requirement to replace a given string in the page returned by the server with another specified string. For example, if I point the reverse proxy at Blog Garden, I need to change every link on a Blog Garden page that uses an absolute path into a link that points to the reverse proxy server. This means processing the data in step 7: converting it to a string, replacing the links, and then sending it to the client.
However, the data received each time in step 7 is only a fragment, not the HTML of the whole page. Even if we assume that the first packet of the HTTP response is an independent packet, that only lets us tell the response header apart from the body. I also considered converting each piece of data to a string and processing it separately. But what if a character is split across two TCP packets by the network layer? In addition, if Blog Garden uses gzip, the page cannot be decompressed until all of its data has been received. And even if neither of these situations occurs, what should I do if the network layer splits the data in the middle of a hyperlink?
Therefore, the most conservative approach is to receive the data of the whole page before processing it. I also considered the Content-Length field in the HTTP response header, which specifies the length of the content, but in practice many responses simply do not include it.
I have looked at the HttpListener class and the HttpListenerRequest class to find out how they receive a request (or response). Unfortunately, these two classes call a large number of native APIs, so I could not find out.
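One way to cope with a link that is split across fragments, sketched here in Python as an assumption rather than as part of the proxy, is to hold back the last few bytes of each fragment until the next one arrives. `StreamReplacer` is a hypothetical name. It works on bytes, so a multi-byte character split across two packets passes through untouched, provided the pattern is ASCII and the page uses an encoding such as UTF-8 in which ASCII bytes never occur inside a multi-byte character. It does not help with gzip-compressed bodies, and because the body length changes, a Content-Length header, if present, would no longer be correct.

```python
class StreamReplacer:
    """Replace a byte pattern in a stream that arrives as arbitrary fragments."""

    def __init__(self, old: bytes, new: bytes):
        if not old:
            raise ValueError("pattern must not be empty")
        self.old = old
        self.new = new
        self.tail = b""  # unprocessed bytes that might be the start of a match

    def feed(self, fragment: bytes) -> bytes:
        buf = self.tail + fragment
        out = []
        pos = 0
        while True:
            hit = buf.find(self.old, pos)
            if hit < 0:
                break
            out.append(buf[pos:hit])
            out.append(self.new)
            pos = hit + len(self.old)
        rest = buf[pos:]
        # A match that is still incomplete can only start in the last len(old)-1 bytes.
        keep = min(len(rest), len(self.old) - 1)
        cut = len(rest) - keep
        out.append(rest[:cut])
        self.tail = rest[cut:]
        return b"".join(out)

    def flush(self) -> bytes:
        """Call once at the end of the response to emit the held-back bytes."""
        tail, self.tail = self.tail, b""
        return tail
```

In step 7, each received fragment would go through `feed()` before being written to the client, and `flush()` would be called when the response ends.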
Also, how does a browser know whether a response has been completed?
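For reference, HTTP/1.1 (RFC 7230, section 3.3.3) defines how a recipient determines where a response body ends: some responses never have a body, a chunked Transfer-Encoding ends with a zero-length chunk, otherwise Content-Length gives the size, and if neither is present the body runs until the server closes the connection. A simplified Python sketch of that decision, with the hypothetical name `body_framing`, could look like this. It ignores responses to HEAD requests and treats any Transfer-Encoding that mentions chunked as chunked.

```python
def body_framing(status: int, headers: dict) -> tuple:
    """Return ("none", 0), ("chunked", None), ("length", n) or ("close", None)."""
    h = {k.lower(): v.strip() for k, v in headers.items()}
    if 100 <= status < 200 or status in (204, 304):
        return ("none", 0)                      # these responses never carry a body
    if "chunked" in h.get("transfer-encoding", "").lower():
        return ("chunked", None)                # ends with a zero-length chunk
    if "content-length" in h:
        return ("length", int(h["content-length"]))
    return ("close", None)                      # body ends when the connection closes
```

Responses without Content-Length are therefore usually either chunked or delimited by the connection closing.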
Please kindly advise!
This proxy has been published on CodePlex. If you are interested, take a look: http://www.codeplex.com/XProxy/
Also, don't forget to give me a solution to the problem. ^_^
Thank you!
QQ: 99363590
E-mail: nnhy # vip.qq.com
QQ: 10193406