Writing a socks proxy server in C#: the most important parts of the protocol

Source: Internet
Author: User
Tags rfc ftp protocol

I am a plain-spoken guy and a cainiao (a novice), so I will stick to what ordinary people actually want to hear.

 

I don't know why so many authors on the net claim that a socks proxy is more complicated than an http proxy. I have written both, and the http proxy is obviously far more complicated, because an http proxy has to parse the http protocol itself. This is my http proxy: http://blog.csdn.net/laotse/archive/2010/09/24/5903651.aspx

Apart from the handshake at the start of a connection, a socks proxy simply forwards the content, whatever it is.

The http proxy I wrote was a failure: it ate far too much CPU. The cause was probably poor performance when parsing http headers, plus an endless loop or something similar. There are so many http headers that I really don't want to analyze them any more. I also sniffed the traffic of ccproxy, and it gets many things wrong as well. If ccproxy still parses incorrectly after so many years, parsing the http protocol is clearly not easy.

So I suspect those authors have never written an http proxy; otherwise they would not say that an http proxy is simpler than a socks proxy.

 

Socks proxies come in three versions: socks4, socks4a and socks5 (also written sock4, sock4a and sock5).

The tcp part of socks4, socks4a and socks5 is extremely simple.

Here is another strange thing. Of the articles I found, only one person said that udp is more complex than tcp. Everyone else said udp is simple, described only the tcp part, and skipped over udp.

To me, udp is clearly much more complicated than tcp.

 

The rfcs for socks (socks5) are rfc1928 and rfc1929. Just look at them. It is like opening a book, reading what you take to be the table of contents, getting ready for the real content, and then discovering you have already reached the end: that was not the table of contents, it was the content. That is rfc1928. What is an rfc supposed to be for? If it can be written like this and still be accepted, is there any law (wang fa) left? The Chinese translations read like machine translation, not human language. As a result, the articles on the net are mostly guesswork and none of them is fully correct. I spent the past two days reading those articles, then ran a ready-made program that supports socks and sniffed its traffic. That gave me a rough idea, and writing the program turned out to be very easy. The real obstacle is that the rfc is written ambiguously, which makes it hard to understand.

 

 

Socks4 and socks4a proxy tcp only.

Socks4

IE, for example, can use a socks proxy directly, but only sock4, even in the latest IE9 that I am using. If IE wants to reach Baidu through sock4, it first uses the local dns to resolve Baidu's domain name to an ip address such as 202.108.22.5, and then sends the following to the sock4 server:

04 01 00 50 CA 6C 16 05 41 64 6D 69 6E 69 73 74 72 61 74 6F 72 00

1. The header 04 01 is fixed: 04 is the socks version and 01 means connect.

2. 00 50 is the port of the site IE wants to connect to. Here it is port 80, which is 00 50 in hexadecimal.

3. CA 6C 16 05 is 202.108.22.5 in hexadecimal.

4. 41 64 6D 69 6E 69 73 74 72 61 74 6F 72 is the ascii code of Administrator.

5. The last 00 is also fixed.

So the fixed format of a socks4 request is 04 01 + 2-byte port + 4-byte ip + id + 00. The id (Administrator here) is optional. IE sent Administrator because that is the name of my Windows 7 account. I tried Firefox 4.0 and it sent MOZ. Some programs leave the id out.

So 04 01 00 50 CA 6C 16 05 00 can be sent as it is, with an empty id. You can put anything in the id as long as it ends with 00 and contains no 00 in the middle.
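The request layout described above can be checked with a few lines of code. This is a minimal Python sketch, not the original C# program; the function name parse_socks4_request is made up for illustration, and the bytes are the IE capture shown above.

```python
import socket
import struct

def parse_socks4_request(data: bytes):
    """Parse a socks4 connect request; return (ip, port, user_id)."""
    if len(data) < 9 or data[0] != 0x04 or data[1] != 0x01:
        raise ValueError("not a socks4 connect request")
    port = struct.unpack("!H", data[2:4])[0]   # 2-byte port, big-endian
    ip = socket.inet_ntoa(data[4:8])           # 4-byte IPv4 address
    end = data.index(b"\x00", 8)               # the id runs up to the closing 00
    user_id = data[8:end].decode("ascii", "replace")
    return ip, port, user_id

# The request captured from IE: 04 01 + port + ip + "Administrator" + 00
request = bytes.fromhex("04 01 00 50 CA 6C 16 05") + b"Administrator\x00"
print(parse_socks4_request(request))
```

A real server would read from the socket until it has seen the closing 00 before calling a parser like this, since tcp may deliver the request in more than one piece.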

The proxy server may check the id, for example allowing only the id administrator. If it accepts the request, it sends back eight bytes:

04 5A 00 50 CA 6C 16 05

Compared with the request, the second byte is 5A and there is no trailing 00; this means the request is granted. The protocol defines other values for this status byte besides 5A to report failures, but we need not bother with them: if your proxy does not want to serve a client, it can simply close the connection. (Note that the socks4 specification defines the first byte of this reply as 00 rather than 04, so a strict client may expect 00 5A.)

Once the request is granted, everything the client sends is forwarded to the remote host and everything the remote host sends is forwarded back to the client. From here on I will simply call this forwarding.
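Building the eight-byte reply is just as short. The following is a sketch in Python under stated assumptions: build_socks4_reply is a hypothetical name, the first byte follows the capture in this article, and 5B is the "rejected or failed" status from the socks4 specification, which this article does not use because it simply closes the connection.

```python
import socket
import struct

def build_socks4_reply(ip: str, port: int, granted: bool = True) -> bytes:
    """Build the 8-byte socks4 reply: first byte + status + port + ip."""
    status = 0x5A if granted else 0x5B   # 5A = granted, 5B = rejected or failed
    # The first byte follows the capture in this article (04); the socks4
    # specification itself uses 00 here.
    return bytes([0x04, status]) + struct.pack("!H", port) + socket.inet_aton(ip)

print(build_socks4_reply("202.108.22.5", 80).hex(" ").upper())
# prints "04 5A 00 50 CA 6C 16 05"
```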

 

Socks4a

Careful readers will notice a problem. With sock4, a client such as IE resolves Baidu's ip address 202.108.22.5 with its local dns, not with the dns of the machine running the sock4 proxy. What if your machine sits on a restricted network whose dns refuses to resolve domain names for you? Then you are stuck. Even knowing Baidu's ip address does not help much, because sites on the network are addressed by domain name, and you will not know the addresses of all the others. So why does IE resolve on the local dns instead of on the sock4 proxy machine? Because that is how the sock4 protocol is defined: the request carries only an ip address.

The sock4a protocol was designed to solve this. It lets the client send the domain name to the sock4a proxy, which resolves it with its own dns. This helps when the client cannot conveniently resolve domain names. It also helps with sites such as google, where the ip address obtained by resolving on the proxy server is usually the better one: the connection is made from the proxy server, so an address resolved there generally gives a faster connection.

Here is a socks4a request. I captured it while using FlashFXP to log in to SourceForge over sftp through a socks4a proxy.

04 01 00 16 00 00 00 01 6C 61 6F 74 73 65 00 77 65 62 2E 73 6F 75 72 63 65 66 6F 72 67 65 2E 6E 65 74 00

1. 04 01 is fixed.

2. 00 16 is port 22 in hexadecimal.

3. 00 00 00 01 takes the place of the ip address. The first three bytes must be 0 and the last one must not be 0; it is usually 1.

4. 6C 61 6F 74 73 65 is the ascii code of laotse, the id entered in FlashFXP's proxy settings. It plays the same role as the id in sock4.

5. 00 is fixed, as in sock4. Even if the id of item 4 is empty, this 00 cannot be omitted.

6. 77 65 62 2E 73 6F 75 72 63 65 66 6F 72 67 65 2E 6E 65 74 is the ascii code of the domain name web.sourceforge.net.

7. 00 is fixed.

In other words, sock4a replaces the ip address of a sock4 request with a fake address of the form 00 00 00 xx and appends domain name + 00 after the sock4 request.

The socks4a proxy then resolves web.sourceforge.net to 216.34.181.70 (D8 22 B5 46), connects to 216.34.181.70, and sends this back to the client:
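The rule "fake address 00 00 00 xx, then id + 00, then domain name + 00" can be sketched as follows. This is an illustrative Python sketch; parse_socks4a_request is a made-up name, and the same function also accepts a plain sock4 request.

```python
import struct

def parse_socks4a_request(data: bytes):
    """Return (host, port, user_id); host is a domain name for socks4a."""
    if len(data) < 9 or data[:2] != b"\x04\x01":
        raise ValueError("not a socks4/4a connect request")
    port = struct.unpack("!H", data[2:4])[0]
    ip = data[4:8]
    id_end = data.index(b"\x00", 8)                 # 00 that closes the id
    user_id = data[8:id_end].decode("ascii", "replace")
    if ip[:3] == b"\x00\x00\x00" and ip[3] != 0:
        # socks4a: fake address 0.0.0.x, the domain name follows the id
        name_end = data.index(b"\x00", id_end + 1)  # 00 that closes the name
        host = data[id_end + 1:name_end].decode("ascii")
    else:
        # plain socks4: the four bytes are a real IPv4 address
        host = ".".join(str(b) for b in ip)
    return host, port, user_id

request = (bytes.fromhex("04 01 00 16 00 00 00 01")
           + b"laotse\x00" + b"web.sourceforge.net\x00")
print(parse_socks4a_request(request))
```

The server would then resolve the returned host name with its own dns before connecting.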

04 5A 00 16 D8 22 B5 46

As in sock4, the reply is 04 5A + port + ip, 8 bytes in total.

After that it is plain forwarding, as in sock4. If the proxy refuses the request, it can close the connection directly; no reply is required.

Socks4a is clearly better than socks4, because it avoids the problems of resolving domain names locally. For instance, if you visit some unfriendly sites and your local dns keeps being asked to resolve their domain names ... you get the idea. So if you can use socks4a, do not use socks4. Unfortunately IE, IE9 included, can only use sock4, and the same goes for Firefox.

 

As you can see, socks4 and socks4a are about as simple as a protocol can be.

 

 

Socks5

Compared with socks4 and socks4a, socks5 adds authentication and udp proxying.

The tcp proxy of socks5 is almost as simple as socks4 and socks4a. Udp is somewhat more complicated, but still not as complicated as an http proxy.

First comes authentication, which is carried over a tcp connection whether you are going to proxy tcp or udp, and the process is identical for both. Several authentication methods exist, but don't worry about them: the others are rarely used, I have never heard of anything that uses them, and I have not tested them. If you trust me, just remember the following three cases.

A client that wants to use the socks5 proxy first sends one of these three messages:

05 01 00, 3 bytes in total: the client asks for an anonymous proxy

05 01 02, 3 bytes in total: the client asks to authenticate with user name and password

05 02 00 02, 4 bytes in total: the client accepts either anonymous access or user name and password

 

If the socks5 proxy permits anonymous access, it returns the two bytes 05 00. If it requires authentication, it returns the two bytes 05 02.

I will walk through password authentication first, because it is just one extra step compared with anonymous access and everything after that step is identical.

After the socks5 proxy has returned 05 02:
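The choice between 05 00 and 05 02 can be sketched like this. It is a minimal Python sketch with assumed names (choose_method, require_password); the 05 FF answer for "no acceptable method" comes from rfc1928 and is not covered in this article, which simply closes the connection instead.

```python
NO_AUTH = 0x00        # anonymous
USER_PASS = 0x02      # user name and password
NO_ACCEPTABLE = 0xFF  # rfc1928: none of the offered methods is acceptable

def choose_method(greeting: bytes, require_password: bool) -> bytes:
    """Answer the client's greeting (05, method count, methods...)."""
    if len(greeting) < 2 or greeting[0] != 0x05:
        raise ValueError("not a socks5 greeting")
    count = greeting[1]
    offered = greeting[2:2 + count]          # the methods the client offers
    wanted = USER_PASS if require_password else NO_AUTH
    if wanted in offered:
        return bytes([0x05, wanted])
    return bytes([0x05, NO_ACCEPTABLE])

print(choose_method(bytes.fromhex("05 02 00 02"), require_password=True).hex(" "))
# prints "05 02"
```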

The client sends 01 06 6C 61 6F 74 73 65 06 36 36 36 38 38 38

1. 01 is fixed.

2. 06 is the length of the user name: the next six bytes are the user name.

3. 6C 61 6F 74 73 65: these six bytes are the user name, ascii for laotse.

4. The next byte, another 06, is the length of the password: the next six bytes are the password.

5. 36 36 36 38 38 38: these six bytes are the password, ascii for 666888.

6. Any bytes after this are ignored.

The socks5 proxy now checks whether the user name laotse and the password 666888 are correct. If not, it closes the connection without any reply.

If the user name and password are accepted, the proxy sends 01 00 to the client. From here on everything is the same as for anonymous access, so I will not describe anonymous access separately.

At this point the client, whether anonymous or authenticated by password, sends socks5 one of the following three kinds of request:
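The user name and password step can be sketched as below. This is an illustrative Python sketch: parse_userpass, auth_reply and the accounts dictionary are hypothetical names, and returning None stands for "close the connection without a reply" as described above.

```python
def parse_userpass(data: bytes):
    """Parse 01 + ulen + user + plen + password; return (user, password)."""
    if len(data) < 2 or data[0] != 0x01:
        raise ValueError("not a username/password request")
    ulen = data[1]
    if len(data) < 3 + ulen:
        raise ValueError("truncated request")
    user = data[2:2 + ulen]
    plen = data[2 + ulen]
    password = data[3 + ulen:3 + ulen + plen]   # trailing bytes are ignored
    if len(password) != plen:
        raise ValueError("truncated request")
    return user.decode("ascii", "replace"), password.decode("ascii", "replace")

def auth_reply(data: bytes, accounts: dict):
    """Return 01 00 on success, or None to signal 'just close the connection'."""
    user, password = parse_userpass(data)
    if accounts.get(user) == password:
        return b"\x01\x00"
    return None

message = bytes.fromhex("01 06 6C 61 6F 74 73 65 06 36 36 36 38 38 38")
print(parse_userpass(message))
```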

Let's talk about tcp

First

05 01 00 03 13 77 65 62 2E 73 6F 75 72 63 65 66 6F 72 67 65 2E 6E 65 74 00 16

1. 05 is fixed.

2. 01 means a tcp connection.

3. 00 is fixed.

4. 03 means that a domain name follows instead of an ip address; the socks5 server does the dns resolution.

5. 13 is the length of the domain name: the next 0x13 (19) bytes are the domain name.

6. 77 65 62 2E 73 6F 75 72 63 65 66 6F 72 67 65 2E 6E 65 74: these 19 bytes are the ascii code of the domain name web.sourceforge.net.

7. 00 16 is the port, that is, port 22.

Second

05 01 00 01 CA 6C 16 05 00 50

1. 05 is fixed.

2. 01 means a tcp connection.

3. 00 is fixed.

4. 01 means that an ip address follows.

5. CA 6C 16 05 is 202.108.22.5, Baidu's ip address.

6. 00 50 is the port, that is, port 80.

As you can see, with tcp the client can either resolve the ip address locally and have the socks5 proxy only make the connection, or send a domain name and let socks5 resolve it with its own dns before connecting.
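Both request forms share the same four-byte header, so one parser can handle them. The following is a minimal Python sketch with a made-up function name; it handles only the two address types discussed in this article (01 for an ip address, 03 for a domain name).

```python
import socket
import struct

def parse_socks5_request(data: bytes):
    """Return (command, host, port) from a socks5 request."""
    if len(data) < 7 or data[0] != 0x05 or data[2] != 0x00:
        raise ValueError("not a socks5 request")
    command, atyp = data[1], data[3]          # 01 = tcp connect, 03 = udp
    if atyp == 0x01:                          # 4-byte IPv4 address follows
        host = socket.inet_ntoa(data[4:8])
        port_bytes = data[8:10]
    elif atyp == 0x03:                        # length byte + domain name follows
        n = data[4]
        host = data[5:5 + n].decode("ascii")
        port_bytes = data[5 + n:7 + n]
    else:
        raise ValueError("unsupported address type")
    if len(port_bytes) != 2:
        raise ValueError("truncated request")
    return command, host, struct.unpack("!H", port_bytes)[0]

by_name = bytes.fromhex("05 01 00 03 13") + b"web.sourceforge.net" + bytes.fromhex("00 16")
by_ip = bytes.fromhex("05 01 00 01 CA 6C 16 05 00 50")
print(parse_socks5_request(by_name))
print(parse_socks5_request(by_ip))
```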

 

When the proxy server receives one of these requests, it first resolves the domain name if one was given, then opens a tcp connection to the ip address and port. If the connection to the remote host succeeds, what does it send to the client?

05 00 00 01 C0 A8 00 08 16 CE, 10 bytes in total

The reply is the same whichever of the two request forms was used:

1. 05 00 00 01 is fixed.

2. The last six bytes can either be all 00, or the ip address and port the socks5 server used for its connection to the remote host. Here C0 A8 00 08 is 192.168.0.8, the ip address of my socks5 server, and 16 CE is port 5838; that is, the socks5 server connected from its port 5838 to port 80 of Baidu. Alternatively send 05 00 00 01 00 00 00 00 00 00, which only tells the client that the connection succeeded without giving any details. The zeros cannot be omitted.
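The ten-byte success reply can be built in one line. This is a small Python sketch with a hypothetical function name; the defaults produce the all-zero variant.

```python
import socket
import struct

def build_socks5_reply(bind_ip: str = "0.0.0.0", bind_port: int = 0) -> bytes:
    """Build the success reply: 05 00 00 01 + 4-byte ip + 2-byte port."""
    return (bytes.fromhex("05 00 00 01")
            + socket.inet_aton(bind_ip)
            + struct.pack("!H", bind_port))

print(build_socks5_reply("192.168.0.8", 5838).hex(" ").upper())
# prints "05 00 00 01 C0 A8 00 08 16 CE"
print(build_socks5_reply().hex(" "))
# prints "05 00 00 01 00 00 00 00 00 00"
```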

 

After that comes the forwarding between the client and the remote host. Easy, isn't it? Much simpler than an http proxy.

 

Now udp, which is much more complicated. First the principle. Udp differs from tcp in that it is connectionless. The negotiation described above is carried out over tcp whether you proxy tcp or udp, and on top of that each udp proxy session keeps one tcp connection to the socks5 proxy occupied in addition to the udp port.

Third: udp

A client such as qq sends the following (still over tcp):

05 03 00 01 00 00 00 00 E5 F0

1. 05 is fixed.

2. 03 means udp proxying.

3. 00 is fixed.

4. 01 is fixed: only an ip address may follow.

5. 00 00 00 00: the client (qq, say) can put its own ip address here, or all zeros, because this address is not used.

6. E5 F0 is the most important part: the client tells the socks5 proxy which udp port it is going to use, here port 58864.

How should socks5 answer? If it refuses, it can close the connection directly without a reply. If it agrees, socks5 first prepares an ip address and a udp port for the client's traffic. For example, I open udp port 58865 on ip address 192.168.0.8 and then return:

05 00 00 01 C0 A8 00 08 E5 F1

1. 05 00 00 01 is fixed.

2. C0 A8 00 08 is the ip address on which the udp port has been opened for the client, here 192.168.0.8. If the server has several ip addresses, return the one the udp port is bound to.

3. E5 F1 is the udp port that has been opened, here port 58865.
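Opening the udp port and reporting it to the client might look like the sketch below. It is written in Python under assumptions: open_udp_relay is a made-up name, and the port is chosen by the operating system (port 0) instead of a fixed number such as 58865.

```python
import socket
import struct

def open_udp_relay(bind_ip: str):
    """Bind a udp socket on bind_ip and build the reply that announces it."""
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp.bind((bind_ip, 0))                  # port 0: let the OS pick a free port
    ip, port = udp.getsockname()            # the address actually bound
    reply = (bytes.fromhex("05 00 00 01")
             + socket.inet_aton(ip)
             + struct.pack("!H", port))
    return udp, reply

udp_socket, reply = open_udp_relay("127.0.0.1")
print(reply.hex(" "))
udp_socket.close()
```

The reply is sent over the tcp connection, and the returned udp socket is kept for as long as that tcp connection stays open.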

 

That completes the tcp negotiation. Note that this tcp connection must not be closed. No more data will ever be sent or received on it, but it has to stay open: if it is closed, the client treats the udp session as terminated, because that is what the socks5 protocol says. So when socks5 forwards udp it occupies not only a udp port but also a tcp connection. Keep that in mind.

 

With the tcp connection kept open, the actual data exchange takes place between the client's udp port and the udp port of socks5. This is where the nature of udp shows: it is connectionless, and unlike tcp it does not guarantee reliable delivery. Suppose qq sends abcdef over tcp and the data is too big to go out in one piece. It gets split, and socks5 may receive ab first, then c, then def. I cannot know how much arrives in each read, but as long as the connection is up the bytes arrive in the order abcdef, and no data is lost without the sender knowing. Udp is different: def may arrive first and a second, while bc is lost and nobody knows.

So the integrity of udp data cannot be left to udp itself; a layer on top has to take care of it. Socks5 has a notion of udp fragmentation: a byte in the header says whether a datagram is fragment 1, fragment 2 and so on, and the socks5 program has to put the fragments in order itself. For example, if fragment 3 arrives first, it has to be stored until fragment 2 has arrived and then placed after it. This is a lot of trouble. The rfc itself recommends that applications avoid fragmentation, and says that a socks5 implementation may choose not to support it and drop fragmented datagrams without notifying the client. Since it is so troublesome, we do not need to implement it. Even if you did, udp would still be unreliable for an application such as qq. Udp-based applications handle integrity and ordering themselves: if packets are lost, qq notices and finds a way to resend or reorder them. So we do not have to worry about fragmentation and can let the application deal with it.

We only implement forwarding without fragmentation.

 

 

Since only the unfragmented form is implemented, the format is fixed.

What does the client's udp port 58864 send to udp port 58865 of socks5? Either ip + port or domain name + port, followed by the data:

00 00 00 01 70 5F F0 3C 1F 40 + Entity Data

The first four bytes 00 00 00 01 indicate that an ip address follows. 70 5F F0 3C is the ip address 112.95.240.60 and 1F 40 is port 8000. Everything after that is the entity data (the payload). The socks5 server then uses its udp port 58865 to send only the entity data, without the encapsulation header in front, to port 8000 of the qq server 112.95.240.60. The qq server replies to udp port 58865. What comes back is raw data with no header in front, because from the qq server's point of view the socks5 machine is the one running qq. Socks5 cannot pass this straight back to the client; it has to wrap it first:

00 00 00 01 70 5F F0 3C 1F 40 + data received from the remote host

and return this to the client's udp port 58864.

Is the header the same as the one in the client's datagram? Yes, because the client specified the ip address itself, so it must be the same.

 

The other form uses a domain name. Qq's udp port 58864 sends this to udp port 58865 of socks5:
00 00 00 03 12 67 72 6F 75 70 63 6C 69 65 6E 74 2E 71 71 2E 63 6F 6D 23 29 + Entity Data

00 00 00 03 at the start indicates that a domain name follows. 12 means the next 0x12 (decimal 18) bytes are the domain name, which decodes to groupclient.qq.com.

The following 23 29 is port 9001. The socks5 server first has to resolve groupclient.qq.com through dns, giving the ip address 58.251.62.164 (3A FB 3E A4). It then sends the entity data from its udp port 58865 to udp port 9001 of 58.251.62.164, and wraps the reply for the client's udp port 58864 as above:

00 00 00 01 3A FB 3E A4 23 29 + data returned by the remote host. Note that 03 has become 01: the reply uses ip address + port directly.
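Stripping the header from a client datagram and putting one in front of a reply can be sketched as follows. This is a Python sketch with made-up names (unwrap_udp, wrap_udp); it drops fragmented datagrams, in line with the decision above not to implement fragmentation.

```python
import socket
import struct

def unwrap_udp(datagram: bytes):
    """Split a client datagram into (host, port, payload)."""
    if len(datagram) < 4 or datagram[:2] != b"\x00\x00":
        raise ValueError("bad udp header")
    if datagram[2] != 0x00:
        raise ValueError("fragmented datagram, dropped")
    atyp = datagram[3]
    if atyp == 0x01:                       # 4-byte IPv4 address
        host = socket.inet_ntoa(datagram[4:8])
        pos = 8
    elif atyp == 0x03:                     # length byte + domain name
        n = datagram[4]
        host = datagram[5:5 + n].decode("ascii")
        pos = 5 + n
    else:
        raise ValueError("unsupported address type")
    port = struct.unpack("!H", datagram[pos:pos + 2])[0]
    return host, port, datagram[pos + 2:]

def wrap_udp(ip: str, port: int, payload: bytes) -> bytes:
    """Put the 00 00 00 01 + ip + port header in front of a reply."""
    return b"\x00\x00\x00\x01" + socket.inet_aton(ip) + struct.pack("!H", port) + payload

by_ip = bytes.fromhex("00 00 00 01 70 5F F0 3C 1F 40") + b"hello"
print(unwrap_udp(by_ip))
print(wrap_udp("58.251.62.164", 9001, b"reply").hex(" "))
```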

 

That is not all. After the tcp negotiation, the udp port opened by socks5 talks both to the client's udp port and to the remote hosts, which gets a bit messy because a single port serves both directions. So we have to decide datagram by datagram. If the source ip address and udp port are the ones negotiated earlier (for example udp port 58864), the datagram comes from the client; in that case we record the ip address and port of the remote host the client is sending to, such as udp port 9001 of 58.251.62.164 in the example above. If a datagram then arrives from udp port 9001 of 58.251.62.164, the remote host is answering, and the data must be forwarded to the negotiated client. The remaining case is a datagram that comes neither from the client nor from a host in the remote host list; it is not forwarded. The list can grow: the client may send data to port a of remote host A and get replies back, and then send to port b of remote host B and get replies back. The list now has two records, and only udp datagrams coming from these two records may be forwarded to the client. If the client sends to C as well, the list has three records, and it may grow to four or five.
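The bookkeeping for that list can be sketched as a small class. This is an illustrative Python sketch: the class name UdpRelayTable, its method names and the client address 192.168.0.2 are all made up, and the client address is assumed to be known from the tcp negotiation.

```python
class UdpRelayTable:
    """Decide where a datagram arriving on the shared udp port came from."""

    def __init__(self, client_addr):
        self.client_addr = client_addr   # (ip, port) negotiated over tcp
        self.remotes = set()             # remote (ip, port) pairs the client sent to

    def remember(self, remote_addr):
        """Record a remote host the client has just sent a datagram to."""
        self.remotes.add(remote_addr)

    def classify(self, source_addr):
        """Return 'client', 'remote' or 'drop' for a datagram's source address."""
        if source_addr == self.client_addr:
            return "client"              # unwrap it and send it on to the remote host
        if source_addr in self.remotes:
            return "remote"              # wrap it and send it back to the client
        return "drop"                    # unknown sender, do not forward

table = UdpRelayTable(("192.168.0.2", 58864))
table.remember(("58.251.62.164", 9001))
print(table.classify(("58.251.62.164", 9001)))
# prints "remote"
```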

 

 

As you can see, proxying udp with socks5 is a lot more trouble than tcp. It ties up a tcp connection just to keep the session alive, and you have to maintain this list by hand. Even so, it is still much easier than writing an http proxy, which has to parse http.

 

Socks5 also has a bind method for tcp. It exists for things like the active mode of the ftp protocol: the client makes a tcp connection to port 21 of the ftp server, and after negotiation a port on the server actively connects back to a port on the client. That was designed long ago and is of little use today. Client machines are now either behind a firewall or on a LAN, so how is the ftp server supposed to connect back to them? That is why ftp today almost always uses passive mode, in which the client actively connects to the ftp server. The bind method of socks5 only suits the old active mode and is rarely useful, so we will not consider it.

 

 

IE9 can only use sock4, not 4a or 5.

Firefox 4 can use sock4 but not 4a. It can use sock5, but only half-way: it does not support user name and password authentication, and although sock5 lets the client send a domain name for the sock5 proxy to resolve, as described above, Firefox 4 insists on resolving the domain name locally and only uses the ip address form of sock5. So it cannot hide you completely. If you visit some inharmonious websites and someone notices that your local dns keeps resolving those domain names ... you get the idea. This is what is called a dns leak.

 

 

How do we forward data between the client and the remote host?

See the article I wrote; it comes with code and it is very easy: http://blog.csdn.net/laotse/archive/2010/09/10/5874778.aspx
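For readers who cannot open the link, the idea of forwarding can be sketched like this. It is a minimal Python sketch using select, not the code from the linked article; the function name relay and the idle_timeout parameter (defaulting to the five minutes suggested in the timeout section below) are assumptions.

```python
import select
import socket

def relay(client: socket.socket, remote: socket.socket, idle_timeout: float = 300.0):
    """Copy bytes both ways until one side closes or nothing moves for idle_timeout seconds."""
    peers = {client: remote, remote: client}
    try:
        while True:
            readable, _, _ = select.select(list(peers), [], [], idle_timeout)
            if not readable:
                return                    # idle for too long, give up
            for sock in readable:
                data = sock.recv(65536)
                if not data:
                    return                # one side closed the connection
                peers[sock].sendall(data) # pass the bytes on unchanged
    finally:
        client.close()                    # always release both connections
        remote.close()
```

A server would typically run one such relay per accepted client, for example in its own thread, once the socks negotiation has finished.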

 

 

How can I implement timeout?

In my opinion, for a tcp proxy, whether 4, 4a or 5, if no data has passed for five minutes, just close both tcp connections, the one to the remote host and the one to the client. For udp with socks5, if the client sends nothing to the udp port for five minutes, close the udp port and also the tcp connection kept open for that client. And in every case, 4, 4a or 5, as soon as the client disconnects, release all resources opened for it. Don't worry about cutting off a live session: programs normally send keep-alive messages at regular intervals and do not hold a connection open without sending or receiving anything. So if there has been no data for five minutes, you can assume the connection is dead and close it.

This problem is serious. If you do not handle it, after a while the dead connections will use up all the ports on the server. In my experience the other end has often disconnected long ago, but the connection is not released: it still shows as ESTABLISHED days later and will not be closed automatically until 4294967295 seconds have passed. So be careful and conservative here.
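The five-minute rule can be tracked with a very small helper. This is a Python sketch with a hypothetical class name, IdleTimer; the clock is passed in as a parameter only so that the behaviour can be checked without waiting five minutes.

```python
import time

class IdleTimer:
    """Track how long a session has gone without any traffic."""

    def __init__(self, limit: float = 300.0, clock=time.monotonic):
        self.limit = limit               # seconds of silence allowed (five minutes)
        self.clock = clock
        self.last_activity = clock()

    def touch(self):
        """Call whenever data is sent or received for this session."""
        self.last_activity = self.clock()

    def expired(self) -> bool:
        """True once nothing has happened for at least `limit` seconds."""
        return self.clock() - self.last_activity >= self.limit

timer = IdleTimer()
timer.touch()
print(timer.expired())
# prints "False"
```

A udp session would call touch() for every datagram from the client and, once expired() is true, close both the udp port and the tcp connection held for that client.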

 

 

The socks protocol is plain text: after the first few negotiation bytes the data is forwarded as is, so nothing is really hidden, and socks has no support for encrypted connections. So this is what I did. A zombie reads my ip address from a fixed web page, connects back to me, and the link is encrypted with rsa + aes. This gives an encrypted tunnel between a port on my machine and a port on my zombie. Programs such as IE use the tunnel to reach the socks proxy server running on the zombie, and go out to the network through that socks proxy. The whole path between me and the zombie is therefore encrypted; the aes key size, key and initialization vector are set in my reverse-connection server, and I chose the largest and most extreme values. I used to run the http proxy on the zombie and go online through the same encrypted tunnel. But an http proxy can only be used for browsing web pages over http, my http proxy was the weaker program anyway, and it often fell into endless loops for reasons I never found. Now everything works: an application that supports sock4 uses sock4 and one that supports sock5 uses sock5, connections in different threads do not affect each other, and all of them go through the reverse-connection encrypted tunnel. There is also no need to analyze any protocol above tcp; the proxy only forwards, so it is fast and cannot loop endlessly because of a logic error in protocol parsing. I have tried it for two days and it works well: cpu usage is about 0 and memory use is about 16 MB.

 

 

One important thing I forgot to mention. I only used qq as an example to illustrate udp proxying. I do not use qq. After reading my article some comrades emailed me saying "let's exchange qq numbers: xxxxxx". I open qq maybe two or three times a year, and I only started it this time to test the socks program. I chat on forums. If you want to get in touch, send an email, or better, post on my website.

 

http://blog.csdn.net/laotse/article/details/6296573
