Using DHT Network Principles to Build a BT Collection Spider

Source: Internet
Author: User
Tags: sha1
The result can be seen on the 51 search site at http://www.51bt.cc, which combines this approach with xunsearch full-text search to return results in milliseconds. The data structure in a torrent file is divided into the following parts:


announce: The primary tracker server


announce-list: The list of tracker servers


comment: A comment on the torrent file


comment.utf-8: The UTF-8 encoding of the torrent file comment


creation date: The time the torrent file was created, as the number of seconds since January 1, 1970 00:00:00 (UTC).


encoding: The default encoding used in the torrent file, such as GB2312, Big5, or UTF-8


info: All information about the files to download is in this field. It contains multiple child fields, which differ depending on whether the torrent describes a single file or multiple files.


When the torrent contains more than one file, the info field includes the following child fields:


files: Describes the name and size of each file; it contains the following three child fields:


length: The size of the file, in bytes


path: The name of the file, which cannot be changed at download time


path.utf-8: The UTF-8 encoding of the file name; as above, it cannot be changed.


Each file has its own set of values for the three fields above.


name: The recommended folder name, which can be changed at download time.


name.utf-8: The UTF-8 encoding of the recommended folder name; as above, it can be changed.


piece length: The size of each piece, in bytes


pieces: The piece checksums. This field is large: it is the concatenation of the SHA-1 checksums of every piece of the torrent's content. All the files are split into blocks of piece length bytes, a SHA-1 value is computed for each block, and these values are concatenated to form the pieces field. Its length is therefore always a multiple of 20, because a SHA-1 checksum is 20 bytes. This field is the largest part of the torrent file, so splitting large files into small pieces produces a large torrent file.


publisher: The name of the file publisher


publisher.utf-8: The UTF-8 encoding of the publisher's name


publisher-url: The publisher's website


publisher-url.utf-8: The UTF-8 encoding of the publisher's URL
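

The construction of the pieces field described above can be sketched as follows. This is a minimal illustration rather than a full torrent builder: the helper names `build_pieces` and `split_pieces` and the sample payload are assumptions made for this example, and the content is treated as one contiguous byte string.

```python
import hashlib

SHA1_SIZE = 20  # a SHA-1 digest is always 20 bytes


def build_pieces(data: bytes, piece_length: int) -> bytes:
    """Concatenate the SHA-1 digest of every piece_length-sized block of data."""
    return b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )


def split_pieces(pieces: bytes) -> list:
    """Split a pieces field back into its individual 20-byte digests."""
    if len(pieces) % SHA1_SIZE != 0:
        raise ValueError("pieces length must be a multiple of 20")
    return [pieces[i:i + SHA1_SIZE] for i in range(0, len(pieces), SHA1_SIZE)]


data = b"x" * 100_000                 # hypothetical payload
pieces = build_pieces(data, 32_768)   # hypothetical piece length
print(len(pieces))                    # 80: four blocks (the last one partial) x 20 bytes
```

Halving the piece length roughly doubles the number of digests, which is why small pieces on large content lead to a large torrent file.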


Also, when a single file is published, the files field is absent, and the following three fields take its place:


length
name
name.utf-8


These three fields describe the attributes of the single file: its size, its name, and the UTF-8 encoding of its name. The other items are the same as for multiple files.
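

The difference between the multi-file and single-file layouts can be handled in one place. The sketch below assumes the info field has already been decoded into a Python dict with text keys, and that path is a list of path components, as in the BitTorrent specification. The function name `list_files` and the sample values are hypothetical.

```python
def list_files(info: dict) -> list:
    """Return (path, size_in_bytes) pairs for a decoded info dictionary."""
    name = info["name"]
    if "files" in info:
        # Multi-file torrent: name is the folder, each entry has its own path and length.
        return [
            ("/".join([name] + list(entry["path"])), entry["length"])
            for entry in info["files"]
        ]
    # Single-file torrent: name is the file name, length is its size.
    return [(name, info["length"])]


multi = {
    "name": "album",
    "files": [
        {"length": 10, "path": ["cd1", "a.mp3"]},
        {"length": 20, "path": ["b.mp3"]},
    ],
}
single = {"name": "a.iso", "length": 700}

print(list_files(multi))   # [('album/cd1/a.mp3', 10), ('album/b.mp3', 20)]
print(list_files(single))  # [('a.iso', 700)]
```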


The items above make up the whole info field.
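

To picture how these fields nest, here is what a decoded multi-file torrent might look like as a Python dict. Every value is invented for illustration, and text keys are used for readability (a real decoder typically yields byte strings). The announce-list is shown as a list of tracker groups, which is how the multitracker extension defines it.

```python
from datetime import datetime, timezone

# Hypothetical decoded torrent; all values are made up for this example.
torrent = {
    "announce": "http://tracker.example.com/announce",
    "announce-list": [
        ["http://tracker.example.com/announce"],
        ["http://backup.example.org/announce"],
    ],
    "comment": "sample torrent",
    "creation date": 1400000000,           # seconds since 1970-01-01 00:00:00 UTC
    "encoding": "UTF-8",
    "info": {
        "name": "album",                   # recommended folder name
        "piece length": 262144,
        "pieces": b"\x00" * 40,            # two placeholder 20-byte SHA-1 digests
        "files": [
            {"length": 300000, "path": ["cd1", "track01.mp3"]},
            {"length": 200000, "path": ["cover.jpg"]},
        ],
    },
    "nodes": [["192.0.2.1", 6881]],
}

created = datetime.fromtimestamp(torrent["creation date"], tz=timezone.utc)
total = sum(f["length"] for f in torrent["info"]["files"])
print(created.year, total)                 # 2014 500000
```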


Any discussion of info has to mention info_hash. This value is the hash of the info field; it is 20 bytes long and is computed with SHA-1. Because the info field is built from the information about the published files, the BT protocol uses info_hash to identify a torrent. In practice every torrent has a different info_hash (accidental SHA-1 collisions are not expected), so both BT servers and clients use this value to tell torrents apart.


The hash is computed over the bencoded value of the info field. It starts immediately after the key "4:info" (the key itself is not included) and, in a file where the nodes field directly follows info, ends just before "5:nodes" (neither the 5 bytes of "nodes" nor the 2 bytes of the "5:" length prefix are included). In addition, the info_hash value is computed on the fly and is not stored in the torrent file.
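

The calculation can be sketched with a small bencode reader that finds the exact byte range of the info value and hashes it unchanged. This is a minimal sketch, assuming a well-formed file. The names `decode` and `info_hash` and the hand-built sample torrent are hypothetical, and error handling is kept to the bare minimum.

```python
import hashlib


def decode(data: bytes, i: int = 0):
    """Decode one bencoded value starting at data[i]; return (value, next_index)."""
    c = data[i:i + 1]
    if c == b"i":                                   # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                                   # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = decode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                                   # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = decode(data, i)
            value, i = decode(data, i)
            result[key] = value
        return result, i + 1
    colon = data.index(b":", i)                     # byte string: <length>:<bytes>
    length = int(data[i:colon])
    start = colon + 1
    return data[start:start + length], start + length


def info_hash(torrent: bytes) -> str:
    """Return the hex SHA-1 of the raw bytes of the top-level info value."""
    if torrent[:1] != b"d":
        raise ValueError("a torrent file must be a bencoded dictionary")
    i = 1
    while torrent[i:i + 1] != b"e":
        key, i = decode(torrent, i)
        value_start = i                             # first byte after the key
        _, i = decode(torrent, i)                   # i is now one past the value
        if key == b"info":
            return hashlib.sha1(torrent[value_start:i]).hexdigest()
    raise ValueError("no info field found")


# Hand-built sample: a single-file torrent followed by a nodes field.
info = (b"d6:lengthi5e4:name5:a.txt12:piece lengthi16384e6:pieces20:"
        + hashlib.sha1(b"hello").digest() + b"e")
torrent = (b"d8:announce18:http://example.com4:info" + info
           + b"5:nodesll9:127.0.0.1i6881eee" + b"e")

print(info_hash(torrent) == hashlib.sha1(info).hexdigest())  # True
```

Hashing the original bytes, rather than re-encoding the decoded dictionary, avoids any mismatch if the file was not written in canonical form.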


nodes: The last field is the nodes field, which contains a list of IP addresses and their corresponding ports, used as the initial nodes for connecting to the DHT.
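
A spider would typically turn this field into a list of addresses to contact first. The sketch below assumes the decoded nodes field is a list of [host, port] pairs, as the DHT extension describes. The function name `bootstrap_nodes` and the sample addresses (taken from documentation-only IP ranges) are hypothetical.

```python
def bootstrap_nodes(nodes: list) -> list:
    """Normalize a decoded nodes field into (host, port) tuples, skipping bad entries."""
    result = []
    for entry in nodes:
        if not isinstance(entry, (list, tuple)) or len(entry) != 2:
            continue                                  # not a [host, port] pair
        host, port = entry
        if isinstance(host, bytes):                   # decoders usually return bytes
            host = host.decode("ascii", errors="replace")
        if isinstance(port, int) and 0 < port < 65536:
            result.append((host, port))
    return result


sample = [[b"192.0.2.1", 6881], ["198.51.100.7", 51413], [b"bad-entry"], ["192.0.2.9", 0]]
print(bootstrap_nodes(sample))
# [('192.0.2.1', 6881), ('198.51.100.7', 51413)]
```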
