TCP Hole Punching Technology (P2P)

Source: Internet
Author: User


Establishing a P2P connection over TCP through a NAT device is only slightly more complex than doing so over UDP. At the protocol level, TCP hole punching is very similar to UDP hole punching. However, TCP hole punching has not been well understood so far, and as a result few NAT devices support it. Where NAT devices do support it, TCP hole punching is as fast and reliable as UDP hole punching. In fact, as long as the NAT device supports it, TCP-based P2P can be more robust than UDP-based P2P, because the TCP state machine provides a standard way to determine the exact lifetime of a TCP session, which UDP cannot offer.

 

1. Reuse of sockets and TCP ports

When implementing TCP hole punching, the main problem comes not from the TCP protocol itself but from the application programming interface. The standard Berkeley socket API was designed around client/server programs. It lets a TCP stream socket initiate an outgoing connection by calling connect(), or accept incoming connections by calling listen() and accept(). Unlike a UDP socket, however, a single TCP socket cannot be used both to listen for incoming connections and to initiate outgoing ones on the same port. Worse, TCP sockets usually map one-to-one to local ports: once an application has bound a socket to a local port, any attempt to bind a second socket to that port fails.

For TCP hole punching to work, we need to use a single local TCP port both to listen for incoming TCP connections and to initiate multiple outgoing TCP connections at the same time. Fortunately, all mainstream operating systems support a special TCP socket option, usually called SO_REUSEADDR, which allows an application to bind multiple sockets to the same local endpoint (as long as SO_REUSEADDR is set on every socket to be bound). BSD systems introduced a second option, SO_REUSEPORT, to distinguish port reuse from address reuse. On such systems both options must be set.
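As a rough illustration of the socket options described above, the following Python sketch binds a listening socket and a second, outbound socket to the same local endpoint. The helper name reusable_tcp_socket and the loopback address are assumptions made for this example only. The exact behaviour of SO_REUSEADDR and SO_REUSEPORT differs between operating systems, so treat it as a starting point rather than a portable recipe.

```python
import socket

def reusable_tcp_socket(local_addr):
    """Create a TCP socket bound to local_addr with address/port reuse enabled."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # SO_REUSEPORT is not defined on every platform (for example Windows),
    # so it is set only where the constant exists.
    if hasattr(socket, "SO_REUSEPORT"):
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(local_addr)
    return s

# First socket: listens for incoming connections on a port chosen by the OS.
listener = reusable_tcp_socket(("127.0.0.1", 0))
listener.listen(5)
local_addr = listener.getsockname()

# Second socket: bound to the very same local endpoint, ready for connect().
outbound = reusable_tcp_socket(local_addr)
shared_port = outbound.getsockname()[1] == local_addr[1]
print("both sockets share local port", local_addr[1], ":", shared_port)

outbound.close()
listener.close()
```

Without the reuse options, the second bind() would normally fail with an "address already in use" error.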

 

2. Opening a P2P TCP stream

Assume that client A wants to establish a TCP connection with client B. As usual, we assume that A and B have each already established a TCP connection with a well-known server S on the Internet. The server records the public and private endpoints of each connected client, just as in the UDP case. At the protocol level, the TCP procedure is almost identical to the UDP one:

1. Client A uses its connection with server S to send a request asking S to help it connect to client B.
2. S returns B's public and private TCP endpoints to A. At the same time, S sends A's public and private endpoints to B.
3. Using the same local port they used to connect to S, A and B each asynchronously initiate TCP connections to the other's public and private endpoints, while also listening for incoming connections on that local TCP port.
4. A and B wait for an outgoing connection to succeed or for an incoming connection to arrive. If an outgoing connection fails because of a network error such as "connection reset" or "host unreachable", the client simply waits a short time (for example, one second) and tries again. The delay and the maximum number of retries are left to the application.
5. Once a TCP connection has been established, the clients should authenticate each other to make sure that the connection is the one they wanted. If authentication fails, the client closes that connection and keeps waiting for another. The clients generally follow a "first come, first served" policy: each accepts only the first connection that passes authentication, then enters P2P communication and stops waiting for further connections.
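The retry behaviour in step 4 could be sketched roughly as follows. The function name connect_with_retry, its parameters and the default values are assumptions for illustration. The demo at the bottom talks to an ordinary listening socket on the loopback interface instead of a real peer behind a NAT, and the authentication of step 5 is left out.

```python
import socket
import time

def connect_with_retry(local_addr, peer_addr, attempts=3, delay=1.0, timeout=2.0):
    """Try to connect from local_addr to peer_addr, retrying after a short delay.

    Returns a connected socket, or None if every attempt failed.
    """
    for attempt in range(attempts):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        if hasattr(socket, "SO_REUSEPORT"):
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.settimeout(timeout)
        try:
            s.bind(local_addr)      # in real use: the port already used towards S
            s.connect(peer_addr)
            return s
        except OSError:
            # Covers errors such as "connection reset", "host unreachable"
            # and timeouts; wait briefly, then try again.
            s.close()
            if attempt + 1 < attempts:
                time.sleep(delay)
    return None

# Loopback demo: an ordinary listening socket stands in for the remote peer.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

conn = connect_with_retry(("127.0.0.1", 0), server.getsockname(), delay=0.1)
connected = conn is not None
print("connected:", connected)

if conn is not None:
    conn.close()
server.close()
```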

 

Unlike UDP, where each client needs only one socket to communicate with server S and with any number of P2P peers at the same time, a TCP client has to deal with binding multiple sockets to the same local TCP port.

Now let's look at a more practical scenario. A and B are located behind different NAT devices, and the port numbers involved are TCP port numbers rather than UDP port numbers. When each client initiates a connection to the other's public endpoint, its own NAT device opens a new "hole" that allows TCP traffic between A and B to pass. If the NAT devices support TCP hole punching, a TCP channel between the clients is established automatically. If A's first SYN packet reaches B's NAT device before B has sent its own SYN packet to A, B's NAT device discards the packet, and A's attempt may fail with a connection error. However, because A has already sent a SYN packet to B, A's NAT device treats the SYN packet that B then sends to A as part of the session A opened towards B. B's SYN packet therefore passes through A's NAT device and reaches A, and a P2P connection between A and B is established.
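The filtering behaviour described above can be mimicked with a toy model. This is only a sketch under a strong simplifying assumption: each NAT admits an inbound packet only if the host behind it has already sent a packet to that same remote endpoint. Real NAT devices differ widely. The class name ToyNat and the addresses (taken from the documentation address ranges) are made up for the example.

```python
class ToyNat:
    """Toy NAT: admits inbound packets only from endpoints it has already sent to."""

    def __init__(self):
        self.holes = set()  # remote endpoints towards which a hole has been opened

    def send_out(self, remote):
        # An outbound packet opens a hole towards the remote endpoint.
        self.holes.add(remote)

    def admits(self, remote):
        # An inbound packet passes only if a matching hole already exists.
        return remote in self.holes

A_PUBLIC = ("203.0.113.10", 62000)   # hypothetical public endpoint of A
B_PUBLIC = ("198.51.100.20", 31000)  # hypothetical public endpoint of B
nat_a, nat_b = ToyNat(), ToyNat()

# A's SYN leaves first: it opens a hole in A's NAT but is dropped by B's NAT.
nat_a.send_out(B_PUBLIC)
a_syn_delivered = nat_b.admits(A_PUBLIC)

# B's SYN leaves next: it opens a hole in B's NAT and passes through A's hole.
nat_b.send_out(A_PUBLIC)
b_syn_delivered = nat_a.admits(B_PUBLIC)

print(a_syn_delivered, b_syn_delivered)  # prints: False True
```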

3. TCP hole punching from the application's perspective

What happens during TCP hole punching from the application's point of view? Assume that A first sends a SYN packet to B's public endpoint and that B's NAT device discards it, while the SYN packet that B then sends to A's public endpoint passes through A's NAT and reaches A. One of the following two outcomes then occurs, depending on how the operating system implements TCP:

(1) A's TCP implementation notices that the SYN packet it has just received comes from B, the very peer it is trying to connect to. In other words, "speak of the devil": A was about to go looking for B, and B turned up at the door. A's TCP stack therefore treats B's SYN as part of A's own connection to B and considers the connection successful. The asynchronous connect() call made by A's program returns successfully, and nothing happens on the socket on which A's program is listening for incoming connections. In this case, B's connection to A is treated inside A's program as A's own successful connection to B, and A starts using it for P2P communication with B.

Since the SYN packet that A received does not carry the ACK that A's own SYN was waiting for, A's TCP replies to B's public endpoint with a SYN-ACK packet whose SYN part reuses the sequence number of the SYN that A originally sent to B. Once B's TCP receives this SYN-ACK, it sends its own ACK back to A, and the TCP connection is established at both ends. Put simply, in this first outcome A's SYN was discarded by B's NAT, but B's SYN got through. As a result, A believes that its connection succeeded and B believes that its connection succeeded. Whichever side "succeeded", the connection is established.

(2) The other outcome is that A's TCP implementation is not as "intelligent" as in (1) and does not notice that the incoming SYN is from the peer it is trying to reach. It is like going to the airport to pick someone up: you meet the person you came for but do not recognize them, take them for someone else, and let somebody else look after them. Only later do you realize that you missed your chance, but either way the person has been picked up and the job is done. In this case, A receives B's connection through the regular listen() and accept() calls, and the connection that A initiated to B's public endpoint eventually fails. Even so, A still obtains the connection from B to A, which is just as good as a connection from A to B. Regardless of what happened in between, A and B have established a TCP-based P2P connection.

The first outcome is typical of the TCP implementations in BSD-based operating systems, while the second is more common: most Linux and Windows systems behave the second way.
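Since an application cannot know in advance which of the two outcomes its operating system will produce, it has to be prepared for both. The sketch below is one simple way to do this, not the only one: it keeps a listening socket open on the local port, tries connect() first, and falls back to accept() if the outgoing attempt fails. The names open_p2p and _reusable are hypothetical, and the demo uses a plain loopback listener as the peer, so it only exercises the connect() path.

```python
import socket

def _reusable(local_addr):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    if hasattr(socket, "SO_REUSEPORT"):
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(local_addr)
    return s

def open_p2p(local_addr, peer_addr, timeout=3.0):
    """Return (how, sock): how is "connect", "accept", or None on failure."""
    listener = _reusable(local_addr)
    listener.listen(1)              # incoming SYNs are queued from now on
    listener.settimeout(timeout)
    outbound = _reusable(listener.getsockname())
    outbound.settimeout(timeout)
    try:
        try:
            outbound.connect(peer_addr)
            return "connect", outbound      # outcome (1): connect() succeeded
        except OSError:
            outbound.close()
        try:
            conn, _ = listener.accept()
            return "accept", conn           # outcome (2): peer arrived via accept()
        except OSError:
            return None, None
    finally:
        listener.close()

# Loopback demo: the "peer" is simply a listening socket.
peer = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
peer.bind(("127.0.0.1", 0))
peer.listen(1)

how, sock = open_p2p(("127.0.0.1", 0), peer.getsockname())
print("connection obtained via:", how)

if sock is not None:
    sock.close()
peer.close()
```

A production implementation would more likely wait on both sockets at once (for example with select) rather than trying them one after the other.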

 

 

The rest of this post is not reprinted. These are my own thoughts (unless stated otherwise, everything refers to Windows):

 

The so-called "hole" is really just a socket.

 

Port reuse means that both listen() and connect() can be used on the same local port, through separate sockets bound to it.

 

In my view, a port comes in two forms: one used for active connections and one used for passive connections. That is, when I connect to a port on a server, my side also opens a port for the communication. We usually know the passive port in advance, while the active port is allocated randomly by the system. If you don't believe it, open a web page and type "netstat -an" in a CMD window: you will find one or more entries showing that local port XXX is connected to remote port 80 (the web service). The real situation is more complex, though. The local port cannot be reached directly from outside, because data sent from it has to go through the router. After inspecting the packets, the router knows that a connection is being established, so it opens a port of its own to receive packets from outside. Your own port (the private port) is thereby mapped to the port the router opened (the public port).
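The randomly allocated port on the active side can also be observed from code. The following is a small sketch that uses only the loopback interface, so no router or NAT is involved and the port the server sees is the client's own port. Behind a NAT, the server would see the router's mapped public port instead.

```python
import socket

# Passive side: a listening socket on a port chosen by the OS.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

# Active side: connect without choosing a local port ourselves.
client = socket.create_connection(server.getsockname())
conn, seen_by_server = server.accept()

local_ip, local_port = client.getsockname()
print("client's own view of its port:", local_port)
print("port as seen by the server:   ", seen_by_server[1])

conn.close()
client.close()
server.close()
```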

 

"Punch holes", that is:

Two computers, A and B, each connect to server S. At this point there are two sockets: the one between A and S (s_as) and the one between B and S (s_bs). Both have port reuse enabled.

A and B each create a new socket (with port reuse enabled) and call listen(). That is, A creates a new socket, binds it to the local (private) port used by s_as, and listens on it. B does the same with the port used by s_bs.

A and B each obtain the other party's public IP address and port number from the server (the server knows them because the clients connected to it actively, which made their routers open those ports).

A and B tell the server that they are ready, and the server then sends the command to start P2P.

A and B now each connect to the other party: A connects to the public port of s_bs, and B connects to the public port of s_as. At least one of them will get through and establish the connection.

(In other words: listen on your own private port and connect to the other party's public port.)
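The server's part in these steps (remembering each client's endpoints and handing them to the other party) could be modelled roughly like this. It is a data-structure sketch only, with no networking. The names RendezvousServer, ClientRecord, register and introduce, and all addresses, are invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClientRecord:
    public: tuple   # (ip, port) as seen by the server, i.e. the router's mapping
    private: tuple  # (ip, port) reported by the client itself

class RendezvousServer:
    def __init__(self):
        self.clients = {}

    def register(self, name, public, private):
        # Called when a client connects to S and reports its private endpoint.
        self.clients[name] = ClientRecord(public, private)

    def introduce(self, a, b):
        # Each side is told the OTHER side's endpoints.
        return {a: self.clients[b], b: self.clients[a]}

server = RendezvousServer()
server.register("A", public=("203.0.113.10", 62000), private=("10.0.0.1", 4321))
server.register("B", public=("198.51.100.20", 31000), private=("10.1.1.3", 4321))

told = server.introduce("A", "B")
print("A should connect to:", told["A"].public)
print("B should connect to:", told["B"].public)
```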

 

For details of how the connection is established, see the passage marked in red in the article above.

 

Anyone who has done network programming knows that TCP is reliable partly because it goes through a "three-way handshake" when establishing a connection (there are other factors too). To say it up front: the three packets are SYN, SYN-ACK, and ACK (don't blame me if I've got this wrong). Because A and B are both on private networks, two routers sit between them, one on each side. When a connection is being established, my own SYN packet passes through my own router but cannot pass through the other party's router, so it is discarded. The other party sends a SYN packet at about the same time. Because I have already sent a SYN packet to them, my router treats their SYN as a response to mine and does not discard it. My side then handles their packet as a response to its own (not a normal three-way handshake, of course, since it is SYN meeting SYN): it sends a SYN-ACK to the other party, the other party replies with an ACK, and the connection is established.
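The exchange can be traced with a tiny state table. This sketch covers only the handful of transitions that matter here, using the state names from the TCP specification. It ignores sequence numbers, retransmissions and everything else a real TCP stack does, and the table and function names are invented for the example.

```python
# (current state, segment received) -> (next state, segment sent in reply)
TRANSITIONS = {
    ("SYN_SENT", "SYN"): ("SYN_RCVD", "SYN-ACK"),    # SYN meets SYN
    ("SYN_SENT", "SYN-ACK"): ("ESTABLISHED", "ACK"), # normal reply to our SYN
    ("SYN_RCVD", "ACK"): ("ESTABLISHED", None),      # handshake completed
}

def receive(state, segment):
    return TRANSITIONS[(state, segment)]

# Both sides have called connect(), so both start in SYN_SENT.
# A's SYN was dropped by B's router, so B never sees it.
a_state, b_state = "SYN_SENT", "SYN_SENT"

# B's SYN gets through to A: A answers with SYN-ACK.
a_state, reply = receive(a_state, "SYN")
# A's SYN-ACK reaches B through the hole B's SYN opened: B answers with ACK.
b_state, reply = receive(b_state, reply)
# B's ACK reaches A.
a_state, reply = receive(a_state, reply)

print(a_state, b_state)  # prints: ESTABLISHED ESTABLISHED
```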

 

Thinking it over, it seems to me that the router is simply being fooled into letting the connection through.

 

There is only one final result:

Either A connects to s_bs or B connects to s_as. In short, A and B end up connected.

There does not seem to be a case in which neither of them gets connected.

 

I have only read about all this and don't know whether it is correct. When I have time, I'll try it out in code.

 
