I. What is P2P
Peer-to-peer (P2P), also called peering, is a communication model in which each participant has the same ability to initiate a communication session.
This definition is a bit abstract, so here is a simpler explanation. Roughly speaking, an application can be designed around a client/server architecture or a peer-to-peer (P2P) architecture. Many applications we use every day, including the web, email, and DNS, use the client/server architecture, while file distribution tools such as the familiar Thunder downloader use the P2P architecture.
The client/server architecture requires an always-on infrastructure server. The P2P architecture, by contrast, has minimal (or no) dependence on always-on infrastructure servers. Instead, arbitrary pairs of intermittently connected hosts, called peers, communicate directly with each other. The peers are not owned by the service provider; they are devices controlled by users.
II. P2P file distribution
Next we study P2P through a specific application: distributing a large file from a single server to a large number of hosts (peers).
In client/server file distribution, the server must send a copy of the file to every client, which places a heavy burden on the server and consumes a great deal of its bandwidth. In P2P file distribution, each peer (the counterpart of a client in the client/server architecture) can redistribute any part of the file it has received, helping the server distribute the file.
1. Client/server architecture vs. P2P architecture
First, assume that the file length is F bits, the server's upload rate is U and its download rate is D, and there are N clients; the upload rate of client i is ui and its download rate is di (i = 1, 2, ..., N).
Every file distribution involves uploads by the server and downloads by the clients (or peers). In the following discussion, we assume that F, U, D, ui, and di remain fixed, while N, the number of peers, varies.
First, consider the client/server architecture. The server must upload N copies of the file (there are N clients, and each needs its own copy), which takes at least NF/U. The peer with the lowest download rate (denoted dmin) cannot obtain all F bits of the file in less than F/dmin seconds. Therefore, the time required to distribute the file using the client/server architecture is
Dcs = max {NF/U, F/dmin}
That is, the minimum distribution time is the larger of two quantities: the time the server needs to upload N copies of the file, and the time the slowest peer needs to download it. This is natural, because distribution is complete only when the server has finished uploading all N copies and every peer has finished downloading. Note that NF/U grows linearly with N, while F/dmin is a constant. Once N is large enough, NF/U exceeds F/dmin and determines Dcs, that is, Dcs = NF/U.
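The bound above is easy to evaluate numerically. The following is a minimal sketch, not part of the original text; the function name and the sample numbers are assumptions chosen only for illustration.

```python
def client_server_time(F, U, d):
    """Distribution time max{N*F/U, F/dmin} for the client/server case.

    F: file size in bits, U: server upload rate in bits/s,
    d: list of client download rates in bits/s (one entry per client).
    """
    N = len(d)
    return max(N * F / U, F / min(d))


# Hypothetical values: F = 1000 bits, U = 100 bits/s,
# and every client downloads at 50 bits/s.
times = [client_server_time(1000, 100, [50] * n) for n in (1, 2, 4, 8)]
print(times)
```

With these numbers the result is F/dmin = 20 s for N = 1 and N = 2; from N = 4 on, NF/U dominates and the time doubles each time N doubles.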
Then, for the P2P architecture, each peer can help the server to distribute files. That is to say, when a peer receives file data, it can use its upload capability to re-distribute the data to other peers.
At the beginning of the distribution, only the server has the file. For the peers to obtain it, the server must send each bit of the file at least once over its access link. Therefore, the minimum distribution time is at least F/U. Unlike in the client/server architecture, the server may not need to send a given bit more than once, because peers can obtain it from other peers that already have it.
As in the client/server architecture, the peer with the lowest download rate cannot obtain all F bits of the file in less than F/dmin seconds. Therefore, the minimum distribution time is at least F/dmin.
Finally, the total upload capacity of the system equals the server's upload rate plus the upload rate of every peer, that is, Utotal = U + u1 + u2 + ... + uN. The system must deliver (upload) F bits to each of the N peers, NF bits in total. Therefore, the minimum distribution time is at least NF/(U + u1 + u2 + ... + uN).
To sum up, the time required to distribute the file using the P2P architecture is
Dp2p = max {F/U, F/dmin, NF/(U + u1 + u2 + ... + uN)}
That is, the minimum distribution time is determined by the server's upload time, the download time of the slowest peer, and the time the whole system (the server and all peers together) needs to upload NF bits. Because F/U and F/dmin are constants, once N is large enough NF/(U + u1 + u2 + ... + uN) exceeds the other two terms and becomes the distribution time, that is, Dp2p = NF/(U + u1 + u2 + ... + uN). The expression shows that as N grows, the denominator U + u1 + u2 + ... + uN grows as well, so the distribution time does not increase linearly as it does in the client/server architecture. Its curve flattens out as N grows, more like a logarithmic function (such as log2 N) than a straight line. Therefore, when N is large, the P2P architecture distributes the file in much less time than the client/server architecture.
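To make the comparison concrete, the two bounds can be evaluated side by side. This is a minimal sketch under assumed values: the function names are made up for illustration, and every peer is given the same hypothetical upload rate u = 10 and download rate 50.

```python
def client_server_time(F, U, d):
    """max{N*F/U, F/dmin}: client/server distribution time."""
    return max(len(d) * F / U, F / min(d))


def p2p_time(F, U, u, d):
    """max{F/U, F/dmin, N*F/(U + u1 + ... + uN)}: P2P distribution time."""
    N = len(d)
    return max(F / U, F / min(d), N * F / (U + sum(u)))


F, U = 1000.0, 100.0  # hypothetical file size (bits) and server upload rate (bits/s)
for N in (1, 10, 100, 1000):
    u = [10.0] * N    # assumed: every peer uploads at 10 bits/s
    d = [50.0] * N    # assumed: every peer downloads at 50 bits/s
    print(N, client_server_time(F, U, d), round(p2p_time(F, U, u, d), 1))
```

In this setup the client/server time grows without limit (10000 s at N = 1000), while the P2P time stays below F/u = 100 s, because NF/(U + Nu) is always smaller than NF/(Nu).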
2. BitTorrent: a popular P2P protocol for file distribution
The analysis above shows the difference in distribution time between the client/server architecture and the P2P architecture. Next we look at how P2P file distribution is actually implemented, using the BitTorrent protocol as an example.
In BitTorrent, the set of all peers taking part in the distribution of a particular file is called a torrent. Peers in a torrent download equal-length blocks of the file from one another; the block size is usually given in KB. When a peer first joins a torrent, it has no blocks, but it accumulates more and more over time. While it downloads blocks, it also uploads blocks to other peers. Once a peer has obtained the entire file, it can leave the torrent or stay and keep uploading blocks to other peers. Any peer can also leave the torrent at any time and rejoin later.
This raises two questions: 1) When our host joins a torrent, how does it know which peers are in it, that is, which hosts it should request the file from? 2) The file consists of many blocks, and they are not downloaded in their original order, so how does a peer decide which blocks to request in order to complete the file?
First, the first question. Each torrent has an infrastructure node called a tracker. When a peer joins a torrent, it registers with the tracker and periodically notifies the tracker that it is still in the torrent. A given torrent may have hundreds or thousands of peers at any time. When a new peer A joins the torrent, the tracker randomly selects a subset of the participating peers and sends their IP addresses to A. A keeps this list and tries to establish TCP connections in parallel with the peers on it. The peers with which A successfully establishes a TCP connection are called its "neighboring peers". Over time some of them may leave, while other peers may try to establish TCP connections with A, just as A did before. In this way, A learns which peers in the torrent hold the file it wants to download.
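The tracker's bookkeeping can be sketched in a few lines. This is a toy in-memory model, not the real BitTorrent tracker protocol: the class name, method names, and the timeout value are all assumptions made for illustration.

```python
import random
import time


class Tracker:
    """Toy tracker: remembers which peers announced recently."""

    def __init__(self, timeout=60.0):
        self.timeout = timeout  # assumed: seconds before a silent peer is dropped
        self.last_seen = {}     # peer IP -> time of its last announcement

    def announce(self, peer, now=None):
        """Register a peer, or refresh it if it is already registered."""
        self.last_seen[peer] = time.time() if now is None else now

    def get_peers(self, requester, k, now=None):
        """Return up to k randomly chosen live peers other than the requester."""
        now = time.time() if now is None else now
        # Forget peers that have stopped sending periodic announcements.
        self.last_seen = {p: t for p, t in self.last_seen.items()
                          if now - t <= self.timeout}
        candidates = [p for p in self.last_seen if p != requester]
        return random.sample(candidates, min(k, len(candidates)))


tracker = Tracker()
for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"):
    tracker.announce(ip)
tracker.announce("10.0.0.9")               # new peer A joins
print(tracker.get_peers("10.0.0.9", k=3))  # A would now try TCP connections to these
```

The optional `now` argument exists only so the expiry logic can be exercised without waiting for real time to pass.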
Now the second question. At any time, each peer holds a subset of the file's blocks, and different peers hold different subsets. A periodically asks each of its neighboring peers for the list of blocks it holds, and then sends requests for blocks that A does not yet have. Because every peer in the torrent is both downloading and uploading, A must also decide which neighbors' requests it will serve. When requesting blocks, a technique called rarest first is usually used: among the blocks that A does not have, A determines which are rarest among its neighbors (that is, which have the fewest copies among them) and requests those first. The aim is to roughly equalize the number of copies of each block in the torrent, which can also raise the overall download rate, because downloads are less likely to stall waiting for one scarce block.
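The rarest-first rule described above can be sketched as follows. The function name and the sample data are hypothetical, and breaking ties by block number is an assumption made here to keep the output deterministic; it is not something the text specifies.

```python
from collections import Counter


def rarest_first(my_blocks, neighbor_blocks):
    """Order the blocks A is missing by how few neighbors hold them.

    my_blocks: set of block numbers A already has.
    neighbor_blocks: dict mapping neighbor name -> set of block numbers it holds.
    """
    counts = Counter()
    for blocks in neighbor_blocks.values():
        counts.update(blocks - my_blocks)  # count only blocks A still needs
    # Fewest copies first; ties broken by block number (an assumption).
    return sorted(counts, key=lambda b: (counts[b], b))


neighbors = {"B": {0, 1, 2}, "C": {1, 2, 3}, "D": {2, 3}}
order = rarest_first({2}, neighbors)
print(order)  # block 0 has one copy; blocks 1 and 3 have two each
```

Blocks that no neighbor holds never appear in the result, since there is nobody to request them from.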
III. Searching for information in a P2P community
Another important application of P2P is the information index, that is, the mapping of information to host locations.
To illustrate what an index is, consider a P2P file sharing system. It contains an index that dynamically tracks the files its peers can share. The index maintains records, each of which maps information about a copy of a file to the IP address of the peer holding that copy. When a peer joins the system, it notifies the index of the files it holds. When a user wants to obtain a file, he searches the index to locate copies of the file.
Note: P2P file distribution and P2P file sharing differ somewhat. In P2P file sharing, receiving and providing may happen at different times (for example, a file received now may be uploaded to others only an hour later) and may involve different files (for example, you download file A while providing file B to other users). P2P file distribution focuses on a single file, and each peer uploads to others while it is still downloading. It is a collaborative process.
1. Centralized index
The index service is provided by a large server (or server farm). When a user starts the P2P file sharing application, the application notifies the index server of its IP address and the names of the files it can share. The index server collects this information and builds a centralized, dynamic database that maps object names to IP addresses.
It has the following disadvantages:
1. Single point of failure
2. Performance bottleneck
3. Poor reliability
In this design, file transfer is decentralized (P2P), but locating content is highly centralized (client/server).
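The centralized index described in this subsection amounts to one table held by the index server. A minimal sketch, with a hypothetical class, method names, and addresses:

```python
from collections import defaultdict


class CentralIndex:
    """Toy index server: maps object names to the IPs of peers sharing them."""

    def __init__(self):
        self.locations = defaultdict(set)  # file name -> set of peer IPs

    def register(self, ip, file_names):
        """Called when a peer starts up and reports what it can share."""
        for name in file_names:
            self.locations[name].add(ip)

    def unregister(self, ip):
        """Called when a peer leaves, which keeps the database dynamic."""
        for ips in self.locations.values():
            ips.discard(ip)

    def lookup(self, name):
        """Return the IPs of peers holding a copy (empty list if none)."""
        return sorted(self.locations.get(name, ()))


index = CentralIndex()
index.register("10.0.0.1", ["song.mp3", "notes.txt"])
index.register("10.0.0.2", ["song.mp3"])
print(index.lookup("song.mp3"))
```

Every lookup goes through this one object, which is exactly why the design is a single point of failure and a bottleneck; the file itself would still be transferred directly between peers.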
2. Query flooding
Query flooding takes a completely distributed approach: the index is fully distributed across the community of peers. The peers form an abstract, logical network called an overlay network. When A wants to locate an item (say abc), it sends a query message containing the keyword abc to all of its neighbors. Each neighbor of A forwards the message to all of its own neighbors, which in turn forward it to all of theirs. If a peer has an index entry matching abc, it returns a query-hit message.
However, this simple method has a serious drawback: it generates a large amount of traffic.
One solution is limited-scope query flooding. The query message carries a count field set to some initial value. Before a peer forwards the query to its neighbors, it decrements the count field by 1. When the count field reaches 0, the peer stops forwarding the query.
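Limited-scope query flooding can be simulated on a small overlay graph. This is a sketch under several assumptions: the function name and sample overlay are hypothetical, a peer that has already seen the query drops duplicates (the text does not say how duplicates are handled), and a peer forwards to all of its neighbors, including the one the query came from.

```python
def flood_query(overlay, files, start, keyword, count):
    """Flood a query from `start`, letting it travel at most `count` hops.

    overlay: dict peer -> list of neighboring peers.
    files: dict peer -> set of keywords that peer can answer.
    Returns (set of peers that report a hit, number of messages sent).
    """
    hits, messages = set(), 0
    seen = {start}
    frontier = [start]
    while frontier and count > 0:
        count -= 1  # decremented once per forwarding round (one hop)
        next_frontier = []
        for peer in frontier:
            for neighbor in overlay[peer]:
                messages += 1          # every transmission costs traffic
                if neighbor in seen:
                    continue           # duplicate query: dropped by the receiver
                seen.add(neighbor)
                if keyword in files.get(neighbor, ()):
                    hits.add(neighbor)
                next_frontier.append(neighbor)
        frontier = next_frontier
    return hits, messages


overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
           "D": ["B", "C", "E"], "E": ["D"]}
files = {"C": {"abc"}, "E": {"abc"}}
print(flood_query(overlay, files, "A", "abc", count=1))
print(flood_query(overlay, files, "A", "abc", count=3))
```

With count = 1 only A's direct neighbors are asked, so the copy held by E is missed; with count = 3 the query reaches E, at the cost of more messages. That is the trade-off of limiting the scope.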
3. Hierarchical overlay
This method combines the advantages of the centralized index and query flooding. Like query flooding, the hierarchical overlay design does not use dedicated servers to track and index files. The difference is that in a hierarchical overlay not all peers are equal.
It works as follows:
A super peer (group leader peer) maintains an index covering its child peers (ordinary peers): the identifiers of all files they are sharing, metadata about those files, and the IP addresses of the child peers that hold them. A super peer is usually itself just an ordinary peer rather than a dedicated server. Super peers establish TCP connections with one another, forming an overlay network among themselves. A super peer can forward queries to neighboring super peers, but limited-scope query flooding is used only among super peers.
When a peer searches, it sends a query with a keyword to its super peer. The super peer responds with the IP addresses of its child peers that hold matching files. The super peer may also forward the query to one or more neighboring super peers. A neighboring super peer that receives such a query likewise responds with the IP addresses of its child peers that hold matching files.
Compared with limited-scope query flooding, the hierarchical overlay design lets far more peers be checked for a match without generating excessive query traffic.
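The two-level lookup described above can be sketched as a small index per super peer plus a limited flood among super peers only. The class, function, and addresses below are hypothetical, and the hop-limited breadth-first forwarding is one simple way to model "limited-scope flooding among super peers", not a description of any particular system.

```python
class SuperPeer:
    """Toy super peer: indexes the files of its child peers."""

    def __init__(self, name):
        self.name = name
        self.index = {}      # keyword -> set of child peer IPs holding a match
        self.neighbors = []  # adjacent super peers in the overlay

    def register_child(self, ip, keywords):
        for kw in keywords:
            self.index.setdefault(kw, set()).add(ip)


def search(start, keyword, count):
    """Query `start`, forwarding among super peers for at most `count` hops."""
    hits = set()
    seen = {start}
    frontier = [start]
    while frontier:
        next_frontier = []
        for sp in frontier:
            hits |= sp.index.get(keyword, set())  # answered from the local index
            if count > 0:
                for other in sp.neighbors:
                    if other not in seen:
                        seen.add(other)
                        next_frontier.append(other)
        frontier = next_frontier
        count -= 1
    return hits


s1, s2, s3 = SuperPeer("s1"), SuperPeer("s2"), SuperPeer("s3")
s1.neighbors, s2.neighbors, s3.neighbors = [s2], [s1, s3], [s2]
s1.register_child("10.0.0.1", ["abc"])
s2.register_child("10.0.0.2", ["xyz"])
s3.register_child("10.0.0.3", ["abc"])
print(sorted(search(s1, "abc", count=2)))
```

Ordinary peers never receive the flooded query in this model; only the handful of super peers exchange messages, and each one answers on behalf of all of its children.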
After this introduction to two common P2P applications, you should now have a basic understanding of P2P!