BitTorrent protocol specification
Original BitTorrent protocol specification (English)
BitTorrent is a protocol for distributing files. It identifies content by URL and is designed to integrate seamlessly with the Web. Its advantage over plain HTTP is that when multiple downloaders fetch the same file concurrently, each downloader also uploads to the others. The file source can therefore support a very large number of downloaders with only a modest increase in its own load, because most of the load is spread across the whole system.
A BitTorrent file distribution system consists of the following entities:
An ordinary web server
A static metainfo file
A tracker
The end users' web browsers
The end users' downloaders
Ideally, there are many end users for a single file.
To start serving a file, a host goes through the following steps:
Start running a tracker (or, more likely, have one running already).
Start running an ordinary web server, such as Apache (or have one already).
Associate the extension .torrent with the MIME type application/x-bittorrent on the web server (or have done so already).
Generate a metainfo (.torrent) file from the complete file to be served and the URL of the tracker.
Put the metainfo file on the web server.
Link to the metainfo (.torrent) file from some other web page.
Start a downloader which already has the complete file (the 'origin', also called the 'seed').
To start downloading, an end user does the following:
Install BitTorrent (or have done so already).
Browse to the web server that provides the .torrent file.
Click the link to the .torrent file.
Choose where to save the file locally, or select a partial download to resume.
Wait for the download to complete.
Tell the downloader to exit (it keeps uploading to others until this happens).
The parts connect to each other as follows:
The web site serves a static file as usual, and launches the BitTorrent helper application (the client) on the user's machine.
The tracker receives information from all downloaders and returns a random list of peers to each of them. This interaction takes place over HTTP or HTTPS.
Downloaders check in with the tracker periodically so that it knows their progress. Downloaders upload to and download from each other over direct connections, using the BitTorrent peer protocol, which runs over TCP.
The origin only uploads and never downloads, because it already has the complete file. An origin is required to get the complete file into the network.
Metainfo files and tracker responses are both sent in a simple, efficient, and extensible format called bencoding, which can hold strings and integers as well as lists and dictionaries of them. The format is extensible because unneeded dictionary keys can be ignored, so further options can be added later without breaking existing readers.
Bencoding is defined as follows:
Strings are length-prefixed: the length in base ten, then a colon, then the string itself. For example, 4:spam corresponds to 'spam'.
Integers are represented by an 'i', followed by the number in base ten, followed by an 'e'. For example, i3e corresponds to 3 and i-3e corresponds to -3. Integers have no size limit. i-0e is invalid. All encodings with a leading zero, such as i03e, are invalid, other than i0e, which of course corresponds to 0.
Lists are encoded as an 'l', followed by their elements (each bencoded in turn), followed by an 'e'. For example, l4:spam4:eggse corresponds to ['spam', 'eggs'].
Dictionaries are encoded as a 'd', followed by alternating keys and their corresponding values, followed by an 'e'. For example, d3:cow3:moo4:spam4:eggse corresponds to {'cow': 'moo', 'spam': 'eggs'}, and d4:spaml1:a1:bee corresponds to {'spam': ['a', 'b']}. Keys must be strings and must appear in sorted order (sorted as raw strings, not alphanumerically).
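The four rules above can be sketched as a small encoder and decoder. This is a minimal illustration, not the reference implementation: the function names bencode and bdecode are chosen here for the example, and the decoder assumes the input is a complete bytes object and returns strings as raw bytes.

```python
def bencode(value):
    """Encode ints, strings, lists and dicts as bencoded bytes."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are strings and are written in sorted raw-byte order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode " + type(value).__name__)


def bdecode(data, pos=0):
    """Decode one value starting at data[pos]; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        text = data[pos + 1:end]
        digits = text[1:] if text.startswith(b"-") else text
        # Reject empty numbers, "-0", and leading zeros such as "03".
        if (not digits.isdigit() or text == b"-0"
                or (len(digits) > 1 and digits.startswith(b"0"))):
            raise ValueError("invalid integer")
        return int(text), end + 1
    if lead == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError("invalid bencoded data")
```

For example, bencode({"spam": ["a", "b"]}) produces the bytes d4:spaml1:a1:bee, and bdecode applied to those bytes returns the dictionary with byte-string keys and values.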
Metainfo files are bencoded dictionaries with the following keys:
announce: the URL of the tracker.
info: a dictionary with the following keys:
name:
A string suggesting a name to save the file (or directory) as. It is purely advisory; the file may be saved under another name.
piece length:
For the purposes of transfer, files are split into fixed-size pieces that are all the same length, except possibly the last one, which may be shorter. This value is the number of bytes in each piece. It is almost always a power of two, most commonly 2^18 = 256 K (BitTorrent prior to version 3.2 used 2^20 = 1 M as the default).
pieces:
A string whose length is a multiple of 20. It is subdivided into 20-byte strings, each of which is the SHA1 hash of the piece at the corresponding index.
In addition, there is either a key length or a key files, but never both and never neither. If length is present, the download is a single file; otherwise it is a set of files in a directory structure.
In the single-file case, length is the length of the file in bytes.
For the purposes of the other keys, the multi-file case is treated as a single file formed by concatenating the files in the order they appear in the files list. The files list is a list of dictionaries with the following keys:
length: the length of the file in bytes.
path: a list of strings giving subdirectory names, the last of which is the actual file name (the list may not be empty).
In the single-file case, name is the name of the file; in the multi-file case, name is the name of the directory.
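How the pieces string relates to piece length can be shown with a short sketch. It assumes the whole content fits in memory as one bytes object (in the multi-file case, the concatenation of the files); the function names are invented for this example.

```python
import hashlib


def piece_hashes(data, piece_length):
    """Return the concatenated 20-byte SHA1 digests of every piece of data."""
    digests = []
    for offset in range(0, len(data), piece_length):
        # The last slice is simply shorter when the data does not divide evenly.
        piece = data[offset:offset + piece_length]
        digests.append(hashlib.sha1(piece).digest())
    return b"".join(digests)


def split_pieces_field(pieces):
    """Split the 'pieces' string back into one 20-byte hash per piece."""
    if len(pieces) % 20 != 0:
        raise ValueError("pieces length must be a multiple of 20")
    return [pieces[i:i + 20] for i in range(0, len(pieces), 20)]
```

A downloader can check a received piece by hashing it and comparing the digest with the entry at the same index in the list returned by split_pieces_field.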
Tracker queries. The tracker receives information through HTTP GET parameters and responds to the downloader with a bencoded message. Note that although the current tracker implementation has its own web server, the tracker could run just as well as, for example, an Apache module.
Tracker GET requests have the following keys:
info_hash:
The 20-byte SHA1 hash of the bencoded form of the info value from the metainfo file. This value will almost certainly have to be escaped (bytes that cannot appear literally in a URL must be percent-encoded).
peer_id:
A 20-byte string used as the unique ID of this downloader. Each downloader generates its own ID at random at the start of a new download. This value will also almost certainly have to be escaped.
ip:
An optional parameter giving the IP address (or DNS name) at which this peer can be reached. It is generally used for the origin when it is on the same machine as the tracker.
port:
The port number this peer is listening on. The downloader usually tries to listen on port 6881 and, if that port is taken, tries 6882, then 6883, and so on, giving up after 6889.
uploaded:
The total amount uploaded so far, encoded in base ten.
downloaded:
The total amount downloaded so far, encoded in base ten.
left:
The number of bytes this peer still has to download, encoded in base ten. Note that this cannot be computed from downloaded and the file length, because the download may be a resume, and some of the downloaded data may have failed an integrity check and had to be downloaded again.
event:
An optional key whose value is started, completed, or stopped (or empty, which is the same as not being present). If it is not present, the request is one of the announcements made at regular intervals. An announcement with started is sent when a download first begins, and one with completed is sent when the download is complete. Downloaders send an announcement with stopped when they cease downloading.
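The GET parameters above can be assembled as in the following sketch. The function name, the tracker URL, and the peer ID used in the usage note are hypothetical; the escaping relies on urllib.parse.urlencode with quote_via=quote, which percent-encodes raw bytes.

```python
from urllib.parse import quote, urlencode


def build_announce_url(announce, info_hash, peer_id, port,
                       uploaded, downloaded, left, event=None, ip=None):
    """Build the tracker GET URL from the keys described above."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be 20 bytes each")
    params = {
        "info_hash": info_hash,    # raw bytes, percent-encoded by urlencode
        "peer_id": peer_id,        # raw bytes, percent-encoded by urlencode
        "port": port,
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": left,
    }
    if ip is not None:
        params["ip"] = ip          # optional
    if event:
        params["event"] = event    # started, completed, or stopped
    separator = "&" if "?" in announce else "?"
    return announce + separator + urlencode(params, quote_via=quote)
```

A first announcement for a fresh download might be built with build_announce_url("http://tracker.example/announce", info_hash, peer_id, 6881, 0, 0, total_length, event="started"), where every argument value is a placeholder.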
Tracker responses are bencoded dictionaries. If a tracker response has a key failure reason, it maps to a human-readable string explaining why the query failed, and no other keys are required. Otherwise, it must have two keys:
interval: the number of seconds the downloader should wait between regular requests.
peers: a list of dictionaries, one per peer, each with the keys peer id, ip, and port, which map to the peer's self-selected ID, its IP address or DNS name as a string, and its port number.
Note that a downloader may send additional requests at unscheduled times if an event occurs or it needs more peers.
(In short, the downloader queries the tracker with an HTTP GET request, and the tracker answers with a list of peers.)
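Handling a tracker response might look like the sketch below. It assumes the response has already been bdecoded into a Python dict whose keys and strings are raw bytes; the TrackerFailure exception and the function name are invented for this example.

```python
class TrackerFailure(Exception):
    """Raised when the tracker reports a 'failure reason' (hypothetical name)."""


def read_tracker_response(response):
    """Return (interval_seconds, [(ip, port, peer_id), ...]) from a decoded response."""
    if b"failure reason" in response:
        # When this key is present, no other keys are required.
        raise TrackerFailure(response[b"failure reason"].decode("utf-8", "replace"))
    interval = response[b"interval"]
    peers = []
    for peer in response[b"peers"]:
        host = peer[b"ip"].decode("ascii")   # IP address or DNS name
        peers.append((host, peer[b"port"], peer[b"peer id"]))
    return interval, peers
```

The returned interval tells the downloader how long to wait before its next regular request; the peer list gives the addresses to connect to with the peer protocol.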
If you want to make any extensions to metainfo files or tracker queries, please coordinate with Bram Cohen to make sure that all extensions are compatible.
The BitTorrent peer protocol operates over TCP. It performs efficiently without setting any socket options. (The peer protocol is the protocol peers use to exchange information with one another.)
Peer connections are symmetrical. Messages sent in both directions look the same, and data can flow in either direction.
When a peer finishes downloading a piece and checks that its hash matches, it announces to all of its peers that it has that piece.
Each end of a connection holds two bits of state: choked or not, and interested or not. Choking is a notification that no data will be sent until unchoking happens. The reasoning and common techniques behind choking are explained later.
Data transfer takes place whenever one side is interested and the other side is not choking. (That is, a peer that wants data from another peer first declares interest by sending a message; the other peer then decides whether to serve it. If it unchokes the requester, it may send data; otherwise it may not.) Interest state must be kept up to date at all times: whenever a downloader has nothing it would request from a peer if unchoked, it must express lack of interest, even while it is choked. Implementing this properly is tricky, but it lets downloaders know which peers will start downloading immediately once unchoked.
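The two bits of state per side can be modelled as four flags per connection, as in this sketch. The class and attribute names are invented for the example, and the initial values (choked and not interested) are an assumption taken from the published BitTorrent specification rather than from the text above.

```python
from dataclasses import dataclass


@dataclass
class ConnectionState:
    """State kept by one end of a peer connection (illustrative names)."""
    am_choking: bool = True        # we are choking the remote peer
    am_interested: bool = False    # we want something the remote peer has
    peer_choking: bool = True      # the remote peer is choking us
    peer_interested: bool = False  # the remote peer wants something we have

    def can_download(self):
        # Data flows to us when we are interested and the peer is not choking.
        return self.am_interested and not self.peer_choking

    def can_upload(self):
        # Data flows from us when the peer is interested and we are not choking.
        return self.peer_interested and not self.am_choking
```

Keeping am_interested accurate even while peer_choking is set is what lets the other side predict whether unchoking this connection would start a transfer.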
The peer wire protocol consists of a handshake followed by a never-ending stream of length-prefixed messages. The handshake starts with the single byte 19 (decimal), followed by the string 'BitTorrent protocol'. The leading 19 is the length of that string.
All later integers sent in the protocol are encoded as four bytes, big-endian.
After the fixed header come eight reserved bytes, which are all zero in current implementations.
Next comes the 20-byte SHA1 hash of the bencoded form of the info value from the metainfo file. This is the same value that is announced to the tracker as info_hash, only here it is sent raw rather than escaped. If the two sides do not send the same value, the requested download is not one the other side is serving, and the connection is severed.
Next comes the 20-byte peer ID.
That completes the handshake.
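The handshake layout described above adds up to 68 bytes: 1 + 19 + 8 + 20 + 20. A minimal sketch of building and checking it follows; the function names are illustrative.

```python
PROTOCOL_NAME = b"BitTorrent protocol"   # 19 bytes
HANDSHAKE_LENGTH = 1 + len(PROTOCOL_NAME) + 8 + 20 + 20


def build_handshake(info_hash, peer_id):
    """Return the 68-byte handshake for the given info hash and peer ID."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be 20 bytes each")
    reserved = bytes(8)                  # eight zero bytes
    return bytes([len(PROTOCOL_NAME)]) + PROTOCOL_NAME + reserved + info_hash + peer_id


def parse_handshake(data, expected_info_hash):
    """Validate a received handshake and return the remote peer ID."""
    if len(data) < HANDSHAKE_LENGTH:
        raise ValueError("handshake too short")
    if data[0] != len(PROTOCOL_NAME) or data[1:20] != PROTOCOL_NAME:
        raise ValueError("not a BitTorrent handshake")
    info_hash = data[28:48]              # after the 8 reserved bytes
    if info_hash != expected_info_hash:
        raise ValueError("info hash mismatch; the connection should be closed")
    return data[48:68]
```

Both sides send the same structure, so one pair of functions serves the connecting and the accepting peer.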
Next comes an alternating stream of length prefixes and messages. Messages of length zero are keepalives and are ignored. Keepalives are generally sent once every two minutes.
All other messages start with a single byte giving the message type. The possible values are:
0 - choke
1 - unchoke
2 - interested
3 - not interested
4 - have
5 - bitfield
6 - request
7 - piece
8 - cancel
'choke', 'unchoke', 'interested', and 'not interested' messages have no payload.
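The framing rule (a four-byte big-endian length, then the message, whose first byte is the type) can be sketched as a function that pulls complete messages out of a receive buffer. The function name is illustrative, and the type IDs mentioned in the comments follow the published BitTorrent specification.

```python
import struct


def split_messages(buffer):
    """Return ([(type_id, payload), ...], leftover_bytes) from a receive buffer."""
    messages = []
    pos = 0
    while len(buffer) - pos >= 4:
        (length,) = struct.unpack_from(">I", buffer, pos)
        if len(buffer) - pos - 4 < length:
            break                        # message not fully received yet
        body = buffer[pos + 4:pos + 4 + length]
        pos += 4 + length
        if length == 0:
            continue                     # keepalive: ignore
        # First byte is the type (for example 0 = choke, 1 = unchoke).
        messages.append((body[0], body[1:]))
    return messages, buffer[pos:]
```

The leftover bytes are kept and prepended to whatever arrives next on the socket, since TCP gives no guarantee that reads line up with message boundaries.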
'bitfield' is only ever sent as the first message. Its payload is a bitfield in which each bit whose index corresponds to a piece the downloader has is set to one, and the rest are set to zero. Downloaders that do not have any pieces yet may skip the 'bitfield' message. (This message tells the other side which pieces the sender already has.)
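Converting between a bitfield payload and a list of piece indices might look like this sketch. The bit ordering used here (the first byte covers indices 0 to 7, from high bit to low bit) is taken from the published BitTorrent specification and is an assumption as far as the text above is concerned; the function names are illustrative.

```python
def indices_to_bitfield(indices, num_pieces):
    """Build a bitfield payload with one bit per piece, high bit first."""
    field = bytearray((num_pieces + 7) // 8)   # spare trailing bits stay zero
    for index in indices:
        field[index // 8] |= 0x80 >> (index % 8)
    return bytes(field)


def bitfield_to_indices(bitfield, num_pieces):
    """Return the sorted list of piece indices whose bit is set."""
    return [index for index in range(num_pieces)
            if bitfield[index // 8] & (0x80 >> (index % 8))]
```

For a download with 10 pieces of which the sender has pieces 0 and 9, the payload is the two bytes 0x80 0x40.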
The payload of a 'have' message is a single number, the index of a piece the downloader has just completed and checked the hash of. (Through these messages, peers quickly learn which pieces each of them has.)
A 'request' message contains an index, a begin offset, and a length. The length is generally a power of two. Current implementations use 2^15, and close connections that request more than 2^17. (A peer sends this message when it wants another peer to send it part of a piece.)
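A 'request' message can be packed and checked as in this sketch. The type ID 6 follows the published BitTorrent specification; the constant and function names are illustrative.

```python
import struct

REQUEST = 6                  # message type ID for 'request'
BLOCK_SIZE = 2 ** 15         # length used by current implementations
MAX_REQUEST = 2 ** 17        # larger requests cause the connection to be closed


def build_request(index, begin, length=BLOCK_SIZE):
    """Return a framed 'request' message: length prefix, type, index, begin, length."""
    body = struct.pack(">BIII", REQUEST, index, begin, length)
    return struct.pack(">I", len(body)) + body


def parse_request(payload):
    """Unpack a 'request' payload (without the type byte) and enforce the size limit."""
    index, begin, length = struct.unpack(">III", payload)
    if length > MAX_REQUEST:
        raise ValueError("request too large; close the connection")
    return index, begin, length
```

Because 'cancel' carries the same three fields, the same layout works for it with a different type byte.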
'cancel' messages have the same payload as 'request' messages. They are generally sent only towards the end of a download, during what is called 'endgame mode'. When a download is almost complete, the last few pieces tend to arrive slowly. To get them quickly, the downloader sends requests for everything it is still missing to all of the peers it is downloading from. To keep this from becoming horribly inefficient, it sends a 'cancel' to everyone else each time a piece arrives. (In other words: I no longer need this piece, so there is no need to send it. If a peer sends the data anyway, the duplicate is simply discarded.)
'piece' messages contain an index, a begin offset, and the actual data. Note that they are correlated with 'request' messages implicitly: a 'piece' message is sent only in response to a 'request'. An unexpected piece can arrive if choke and unchoke messages are sent in quick succession or if the transfer is going very slowly. (That is, a block may occasionally arrive that the downloader is no longer waiting for.)
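One way to handle the implicit link between 'request' and 'piece' is to keep a set of outstanding requests and discard anything that does not match, as in this sketch. The bookkeeping scheme and the function name are assumptions made for the example, not something the protocol prescribes.

```python
import struct


def handle_piece(payload, outstanding):
    """Match a 'piece' payload against outstanding (index, begin, length) requests.

    Returns (index, begin, block) for an expected block, or None when the
    block was not requested (or was already received) and should be dropped.
    """
    index, begin = struct.unpack_from(">II", payload)
    block = payload[8:]
    key = (index, begin, len(block))
    if key not in outstanding:
        return None                  # unexpected or duplicate block
    outstanding.remove(key)
    return index, begin, block
```

The same check also covers duplicates that arrive after a 'cancel' was sent too late to stop them.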