Writing a High-Anonymity HTTP Proxy

Source: Internet
Author: User

I assumed writing an HTTP proxy would be much like the port forwarder I wrote earlier, but it turned out to be more complex once I started. The reason is that you have to parse the HTTP protocol by hand.

Put simply, when you open a website directly in IE and look at the HTTP request header with a sniffer, it looks like this.

GET / HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, application/qvod, application/qvod, */*
Accept-Language: zh-cn
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)
Accept-Encoding: gzip, deflate
Host: www.microsoft.com
Connection: Keep-Alive
Cookie: xxxxxxxxx

But if you use a proxy, it becomes like this.

GET http://www.microsoft.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, application/qvod, application/qvod, */*
Accept-Language: zh-cn
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)
Accept-Encoding: gzip, deflate
Host: www.microsoft.com
Proxy-Connection: Keep-Alive
Cookie: xxxxxxxxx

The differences are these: through a proxy, the GET line carries the full URL and Connection becomes Proxy-Connection; everything else is the same. So for each connection accepted with TcpListener and TcpClient, the proxy must first rewrite the header of the submitted HTTP request, that is, turn the second form into the first.
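The rewrite described above can be sketched in a few lines. This is an illustration in Python, not the author's C# code; the function name rewrite_request_head is made up for the example, and it assumes the whole header (up to and including the blank line) is already in memory and that the request line holds a full http:// URL.

```python
from urllib.parse import urlsplit

def rewrite_request_head(head: bytes) -> tuple[str, int, bytes]:
    # Hypothetical helper: convert a proxy-form header into the direct form.
    # Returns (host, port, rewritten_head) so the caller knows where to connect.
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    url = urlsplit(target)
    host = url.hostname
    port = url.port or 80          # assume plain HTTP when no port is given
    path = url.path or "/"
    if url.query:
        path += "?" + url.query
    out = [f"{method} {path} {version}"]
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        # Proxy-Connection is what the browser sends to a proxy;
        # the origin server expects Connection instead.
        if sep and name.strip().lower() == "proxy-connection":
            line = "Connection:" + value
        out.append(line)
    return host, port, "\r\n".join(out).encode("iso-8859-1")
```

The host and port are returned alongside the rewritten header because the proxy still needs them to open its own connection to the target server.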

That is a GET request, which has only a header and no entity (body).

There is also the POST method, which does carry an entity; uploading a picture, for example, uses POST. In a POST request the entity immediately follows the header.

How do you tell where the header ends and the entity begins?

An HTTP header must end with two consecutive "\r\n", just as the Cookie line above is followed by two \r\n. So when reading a request, read until you hit \r\n\r\n: everything before it is the header and everything after it is the entity. The entity's size is given by the Content-Length field in the header, so once you have read Content-Length bytes after the \r\n\r\n, the request is complete.
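As a rough Python sketch of that rule (not the author's C# code; split_request is a name invented for this example), the header can be cut off at the first \r\n\r\n and the Content-Length read from it. A missing Content-Length is treated as zero here, which fits a plain GET.

```python
def split_request(buffer: bytes):
    """Return (head, content_length, rest), or None if the header is incomplete."""
    end = buffer.find(b"\r\n\r\n")
    if end < 0:
        return None                      # keep reading from the client
    head = buffer[:end + 4]              # header including the blank line
    rest = buffer[end + 4:]              # whatever part of the entity has arrived
    length = 0                           # assumption: no Content-Length means no entity
    for line in head.split(b"\r\n")[1:]:
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"content-length":
            length = int(value.strip())
    return head, length, rest
```

The caller would then keep reading until it has forwarded content_length bytes of entity in total, counting the bytes already in rest.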

Then there is the CONNECT method, which is used for SSL-encrypted traffic. When the proxy receives a request such as CONNECT urs.microsoft.com:443 HTTP/1.0, it returns "HTTP/1.1 200 Connection established\r\n\r\n" to the client (IE, for example), opens a TcpClient to port 443 on that server, and from then on only relays bytes between client and server, much like forwarding a POST entity, without touching anything. This is the simplest case.
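A minimal Python sketch of the two pieces involved, assuming plain blocking sockets (the names parse_connect, pump, and ESTABLISHED are invented for this example and are not from the author's program): parse the target out of the CONNECT line, then copy bytes blindly. A real proxy would run one pump per direction, for example in two threads.

```python
import socket

# Reply sent to the client once the tunnel is ready.
ESTABLISHED = b"HTTP/1.1 200 Connection established\r\n\r\n"

def parse_connect(request_line: str) -> tuple[str, int]:
    # "CONNECT host:port HTTP/1.x" -> (host, port)
    method, target, _version = request_line.split(" ", 2)
    if method.upper() != "CONNECT":
        raise ValueError("not a CONNECT request")
    host, _, port = target.rpartition(":")
    return host, int(port)

def pump(src: socket.socket, dst: socket.socket) -> None:
    # Copy bytes one way, untouched, until src reports end of stream.
    while True:
        data = src.recv(4096)
        if not data:
            break
        dst.sendall(data)
```

Because the traffic is encrypted, the proxy cannot and need not parse anything after the CONNECT line, which is why this case is the simplest.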

These three are the most commonly used.

Other request methods, such as PUT and OPTIONS, I have never seen in practice and do not know where to test them, so I handle them the same way as GET and POST.

The server's response is more trouble, because HTTP is too loose about it. If every response or request carried Content-Length or chunked encoding to indicate the entity size, it would be easy. Instead there are several ways to determine the size. The most reliable are Content-Length and chunked; the third is to wait for the server to close the connection. Some responses have neither Content-Length nor chunked, so the end can only be detected when the connection drops. I suspect this is why downloads over a bad network sometimes produce corrupted files: the client does not know how large the entity is. If it reads until the server closes the connection, there is no problem; but if the network is interrupted partway through, the client takes that for the server closing the connection.

So when reading the server's response, you have to check several things.

1. Check the status code. Under HTTP, 1xx, 204, and 304 responses never include an entity, so stop reading at \r\n\r\n.

2. Check whether there is a Content-Length.

3. Check whether the transfer encoding is chunked.

If there is a Content-Length, handle it as with the request above: after \r\n\r\n, read Content-Length bytes and send them back to the client.
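The checks above could be collected into one decision function. The following Python sketch is an illustration only (body_framing and its return labels are invented here, not taken from the author's code). One detail goes beyond the text: when a response carries both Transfer-Encoding: chunked and Content-Length, the sketch prefers chunked, following RFC 7230 section 3.3.3.

```python
def body_framing(head: bytes):
    """Decide how the entity after a response header is delimited.

    Returns ("none", 0), ("chunked", 0), ("length", n) or ("until-close", 0).
    """
    lines = head.decode("iso-8859-1").split("\r\n")
    status = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    # 1xx, 204 and 304 never carry an entity.
    if 100 <= status < 200 or status in (204, 304):
        return ("none", 0)
    # Chunked wins over Content-Length if both are present.
    if "chunked" in headers.get("transfer-encoding", "").lower():
        return ("chunked", 0)
    if "content-length" in headers:
        return ("length", int(headers["content-length"]))
    # Neither: the only end marker is the server closing the connection.
    return ("until-close", 0)
```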

Then there is chunked encoding, which usually goes together with gzip compression; the Microsoft forums use it. When you request a page, the server gzips it and sends you each piece as soon as it is compressed, so it cannot know the Content-Length at the start. Each chunk, however, is prefixed with its own size.

HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/html; charset=utf-8
Server: Microsoft-IIS/7.5
X-AspNetMvc-Version: 2.0
X-AspNet-Version: 4.0.30319
Set-Cookie:
Set-Cookie:
X-Powered-By: ASP.NET
P3P: CP=ALL IND DSP COR ADM CONo CUR CUSo IVAo IVDo PSA PSD TAI TELo OUR SAMo CNT COM INT NAV ONL PHY PRE PUR UNI
Server: CO1VB06
Date: Fri, Sep 09:33:28 GMT
ntCoent-Length: 166137
Content-Encoding: gzip
Transfer-Encoding: chunked

2d23
...........}. S.g.......j....*u......y....% ...; QO. M.[..... 3.,...
.!.. O.... H. " V.. >.............=yy

As above, in a chunked response the first line after \r\n\r\n, 2d23, is the size of the first chunk (in hexadecimal). So after "2d23\r\n" read 0x2d23 bytes, which are followed by \r\n, then by the size of the next chunk, and so on until a chunk of size 0, which marks the end of the entity, followed by a final \r\n. In other words, the last 7 bytes of a chunked body must be \r\n0\r\n\r\n. Simply reading until \r\n0\r\n\r\n should in principle work, but to be safe I read a size, then that many bytes, then the next size.
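The size-then-data loop can be sketched in Python as follows. This is an illustration, not the author's C# code: read_chunked is an invented name, and stream is assumed to be a buffered binary reader (for a socket, something like sock.makefile("rb")). It yields the raw bytes unchanged so a proxy can pass each piece on to the client as soon as it is read.

```python
import io

def read_chunked(stream):
    """Yield the raw bytes of a chunked body, piece by piece, stopping
    after the zero-size chunk and the blank line that ends the body."""
    while True:
        size_line = stream.readline()            # e.g. b"2d23\r\n"
        if not size_line:
            raise EOFError("connection closed inside chunked body")
        yield size_line
        # The size is hexadecimal; anything after ';' is a chunk extension.
        size = int(size_line.split(b";")[0].strip(), 16)
        if size == 0:
            # Optional trailer lines, then the closing blank line.
            while True:
                line = stream.readline()
                yield line
                if line in (b"\r\n", b""):
                    return
        yield stream.read(size + 2)              # chunk data plus its \r\n
```

For example, wrapping a captured body in io.BytesIO and joining everything read_chunked yields reproduces the body byte for byte and leaves the stream positioned at the start of whatever follows it.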

The most annoying case is when there is neither Content-Length nor chunked. If the response says Connection: close, it is not so bad: keep reading until the connection drops. But if it says keep-alive, NetworkStream.Read just blocks; in IE the page looks fully loaded while the progress bar keeps creeping along. The only fix is a read timeout: if nothing can be read for, say, 3 seconds, disconnect, and IE then shows "Done".

Handling keep-alive properly on the server side would be too much trouble, so I read one response from the server and then close the connection whether or not it is keep-alive. In other words, every request from IE gets its own TcpClient to the server, which is closed as soon as the response has been read.

The IE side cannot be handled that way. Over each connection to the proxy (each IP plus port), IE may send one request or several. So when TcpListener accepts a TcpClient from IE, that TcpClient must not be closed after one request the way the server connection is; otherwise IE either shows the progress bar and does nothing, or reports that the server could not be found. Instead, after handling one request, loop and read the next request from the same TcpClient, and close it only when you find that IE has disconnected. The whole process looks like this.
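A read with an idle timeout might look like this in Python (a sketch only; read_until_close_or_idle is an invented name, and the 3-second default simply mirrors the figure mentioned above). It stops either when the server closes the connection or when nothing arrives within the idle window.

```python
import socket

def read_until_close_or_idle(sock: socket.socket, idle_seconds: float = 3.0):
    """Yield data from sock until the peer closes or goes quiet."""
    sock.settimeout(idle_seconds)
    while True:
        try:
            data = sock.recv(4096)
        except socket.timeout:
            return          # nothing arrived within the idle window: give up
        if not data:
            return          # the server closed the connection
        yield data
```

The timeout is a heuristic: a slow server that pauses longer than the window will have its response cut short, which is the price of not knowing the entity length.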

1. Listen with TcpListener.

2. Call TcpListener.AcceptTcpClient() in a loop.

3. When a TcpClient comes in, start a thread to handle it while the loop above goes back to waiting.

4. The thread from step 3 handles its TcpClient: read the HTTP request header, rewrite it, then send the rewritten header and the entity that follows to the requested server. Note that you must read, rewrite, and send as you go; you cannot wait until everything has been read before sending, or the request will time out.

5. Once sending is finished, start receiving from the server.

6. Much like step 4: send to IE as you read from the server; again, do not read everything first and then send, or it will time out.

7. When reading is finished, close the connection to the server, whether or not it is keep-alive.

8. Go back to step 4 and read the next request from IE. If there is one, run steps 5 to 7 again, and so on until you find that IE has closed this TcpClient; then close it and end the thread.
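The skeleton of these steps, stripped of all HTTP handling, can be sketched in Python with plain sockets and threads (an illustration rather than the author's C# code). The names serve and handle_client are invented, and handle_request stands for a hypothetical callback that performs steps 4 to 7 for one request and returns False once the client has disconnected.

```python
import socket
import threading

def handle_client(client: socket.socket, handle_request) -> None:
    # Steps 4 to 8: keep serving requests on this one client connection
    # until handle_request reports that the client has gone away.
    with client:
        while handle_request(client):
            pass

def serve(listener: socket.socket, handle_request) -> None:
    # Steps 1 to 3: accept in a loop, one thread per client connection.
    while True:
        try:
            client, _addr = listener.accept()
        except OSError:
            break           # the listening socket was closed
        threading.Thread(
            target=handle_client,
            args=(client, handle_request),
            daemon=True,
        ).start()
```

Keeping the per-request work behind a callback separates the two lifetimes described above: the client connection lives across many requests, while each server connection lives for exactly one.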

That is basically it: express the conditions above in code and you have an HTTP proxy server. Writing all those conditions out is tedious, so the code is ugly; mine is so hard to read that I can barely follow it myself, so I won't embarrass myself by posting it. The point is that the overall process matters more than the code.

When I was looking for this process, other people's articles did not explain it clearly. I read a few pages of an HTTP protocol document that seemed to be machine-translated and was hard to follow; it ran to over a hundred pages and was not worth reading in full. One article on writing a proxy in C# 2003 turned out to be plain port forwarding that never rewrote the request header, and sure enough it did not work when I tried it. There was also one by a foreign author, very long, which I did not feel like reading; its style felt like browsing the .NET class library, a piece here and a piece there, and it did not run very well either. So I had to work it out bit by bit myself.

Fortunately, the compiled program runs well. In an afternoon of testing, CPU usage stayed under 3% and memory use was about 10 MB. Downloads and SSL both work. The only problem is that the progress bar occasionally hangs while browsing, which is the timeout wait described above for responses with no entity length information, and there is nothing to be done about that. Most sites load just as they do directly in IE, without feeling slower. Downloads are no problem either, at the same speed as downloading directly with IE.

Finally, why "high anonymity"? I had meant to look up how proxy anonymity works, but decided to try it first and see the result. I assumed this would count as a transparent proxy, yet several proxy-anonymity checking sites all reported it as "high anonymity". As far as anonymity goes, all the code does is change the Proxy-Connection that IE sends into Connection. Is that all "high anonymity" means? Quite a grand title. Oh, and PicasaWeb, which I could not open before, opens through this proxy.

Anyone interested is welcome to download it and test it.
