Calculating the hash value of BT seeds (.torrent files). Recently I suddenly became interested in BT seeds (do not ask why).
1. BT seeds (concept)
BT (BitTorrent) is a distributed file distribution protocol. Each downloader keeps uploading the data it has already downloaded to other downloaders while its own download is still in progress. As a result, the faster you download, the faster you upload, which is how high-speed downloads are achieved.
2. How does BT download and upload files simultaneously?
Start with the file itself. A BT file contains two parts: tracker information and file information. The tracker information mainly records the addresses of the tracker servers needed during the download, plus the settings for those servers. The file information is computed from the target files, and the result is encoded according to the bencode ("B encoding") rules (my English is not good, and this part comes from Baidu Encyclopedia). In the file information, the files to be downloaded are split into blocks, and the index information for each block is written to the torrent file. The task details page of a Thunder (Xunlei) download task, for example, shows the specific chunk information for that task.
Each downloader can upload the chunks it has already downloaded. How does a downloader find out which chunks other downloaders have? One way is through the tracker server, which keeps a record of every downloader. This is why BT sharing sites on a LAN often offer upload and download traffic statistics: each person's uploads and downloads are recorded to determine their contribution.
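As a rough illustration of the chunking described above, the sketch below splits some data into fixed-size pieces and computes one SHA-1 digest per piece, which is how the per-piece check data in a torrent is generally built. The function name hash_pieces and the tiny piece length are assumptions made for this demo; real torrents use much larger pieces.

```python
import hashlib

def hash_pieces(data, piece_length):
    """Split data into fixed-size pieces and return the concatenated SHA-1 digests."""
    digests = []
    for offset in range(0, len(data), piece_length):
        piece = data[offset:offset + piece_length]    # the last piece may be shorter
        digests.append(hashlib.sha1(piece).digest())  # 20 raw bytes per piece
    return b"".join(digests)

if __name__ == "__main__":
    sample = b"x" * 10                            # hypothetical 10-byte "file"
    pieces = hash_pieces(sample, piece_length=4)  # pieces of 4, 4 and 2 bytes
    print(len(pieces) // 20, "pieces")
```

A downloader that receives a piece can recompute its digest and compare it with the stored one before passing the piece on to anyone else.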
This is also why The Pirate Bay was taken to court in Sweden: the tracker server gives every downloader the possibility and opportunity to download pirated files. Of course, many magnet links now use DHT technology, which makes the tracker server unnecessary. That is a topic for another post!
3. BT hash value calculation (feature value calculation)
As shown above, every BT seed contains the file information for each block, which keeps the seed uniquely identifiable even if the tracker servers change (as an aside: I used to think the MD5 of the seed file would be enough to identify it. Too naive). How is the hash value calculated? That requires a closer look at how a BT file is put together. There are quite a few articles on this; here is one:
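To make the MD5 remark concrete, here is a minimal sketch (Python 3) that builds two seeds with the same file information but different trackers. The small bencode encoder, the helper names file_md5 and info_hash, and all field values are assumptions invented for the demo, not taken from a real torrent.

```python
import hashlib

def bencode(value):
    """Minimal bencode encoder for ints, bytes, lists and dicts with bytes keys."""
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are emitted in sorted order.
        items = sorted(value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("unsupported type: %r" % type(value))

def file_md5(torrent):
    """MD5 of the whole encoded seed, tracker information included."""
    return hashlib.md5(bencode(torrent)).hexdigest()

def info_hash(torrent):
    """SHA-1 of the encoded info dictionary only."""
    return hashlib.sha1(bencode(torrent[b"info"])).hexdigest()

# Hypothetical info dictionary; the values are made up for the demo.
info = {b"name": b"demo.txt", b"length": 943,
        b"piece length": 16384, b"pieces": b"\x00" * 20}
seed_a = {b"announce": b"http://tracker-a.example/announce", b"info": info}
seed_b = {b"announce": b"http://tracker-b.example/announce", b"info": info}

print(file_md5(seed_a) == file_md5(seed_b))    # False: the seed files differ
print(info_hash(seed_a) == info_hash(seed_b))  # True: same file information
```

Only the second comparison survives a tracker change, which is why the hash is taken over the info part rather than over the whole seed.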
http://www.cnblogs.com/DxSoft/archive/2012/02/11/2346314.html
Based on the description in that article, I tried two approaches:
Method 1: Work from the data that follows the info key, that is, everything after 4:info. Here is an excerpt from the start of a BT seed.
For example:
d8:announce27:http://tk3.5qzone.net:8080/13:announce-listll27:8080/el36:http://tk3.5qzone.net:8000/announceel36:http://btfans.3322.org:8080/announceel36:http://btfans.3322.org:6969/announceel42:http://btfans.3322.org:6969/announceel40:http://torrent-download.to:5869/announceel45:http://tracker.openbittorrent.com:80/announceel39:http://tracker.publicbt.com:80/announceel40:http://tracker.bittorrent.am:80/announceel30:http://tracker.prq.to/announceel34:http://tracker.prq.to/announce.phpel43:http://tracker.torrentbox.com:2710/announceel34:http://tpb.tracker.prq.to/announceel30:http://tr.wjl.cn:8080/announceel37:http://219.152.120.234:6969/announceel34:http://mdbt.3322.org:6969/announceee7:comment19:Yyets Renren video resources site13:comment.utf-826:maid? 0:created by13:bitcomet/0.7013:creation datei1261532244e8:encoding3:gbk4:infod5:filesld6:lengthi943e4:path
The first byte, d, stands for dict: the whole file is one bencoded dictionary. 4:info means a string of length 4, namely the key info, and what follows it is the info dictionary itself. With that, we can write code that extracts this part and computes the hash:
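To show how the prefixes in the excerpt are read (d for dict, l for list, i for integer, and length:bytes for strings), here is a minimal decoder sketch in Python 3. The function name bdecode and the sample bytes are assumptions for illustration; the sample reuses a few keys from the excerpt with a made-up path.

```python
def bdecode(data, pos=0):
    """Decode one bencoded value starting at data[pos]; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                       # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                       # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                       # dict: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    colon = data.index(b":", pos)          # byte string: <length>:<bytes>
    length = int(data[pos:colon])
    start = colon + 1
    return data[start:start + length], start + length

# Hypothetical sample: "demo.txt" is invented, the keys come from the excerpt.
sample = b"d8:encoding3:gbk4:infod5:filesld6:lengthi943e4:pathl8:demo.txteeeee"
decoded, next_pos = bdecode(sample)
print(decoded[b"info"][b"files"][0][b"length"])
```

The decoder raises ValueError on truncated input instead of guessing, which keeps the sketch short.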
    #!/usr/bin/python
    import hashlib
    import sys

    def sha1sum(src):
        # Return the SHA-1 hex digest of src, or "" for empty input
        if not len(src):
            return ""
        m = hashlib.sha1(src)
        return m.hexdigest()

    # filename is the torrent file name
    filename = sys.argv[1]
    with open(filename, "rb") as f:  # binary mode: a torrent is not a text file
        torrent_data = f.read()

    start = torrent_data.find(b"4:info") + len(b"4:info")
    end = torrent_data.find(b"5:nodes", start)
    if end == -1:
        # No nodes field: info runs up to the final "e" that closes the outer dict
        end = len(torrent_data) - 1
    info_data = torrent_data[start:end]
    sha1_data = sha1sum(info_data)
    print("the hash data of torrent is: " + sha1_data.upper())
In actual tests I found two cases: torrents that contain a nodes field and torrents that do not, and each needs separate handling. This approach is cumbersome, and it is unclear how many more pitfalls are waiting, since every new variation would need its own special case. So I searched Google and found a Stack Overflow answer that mentions the bencode library. That is the second approach.
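One way to avoid the special cases, sketched here in Python 3 without any external library, is to walk the top-level dictionary and hash the exact bytes of the info value, wherever it sits. This is a swapped-in technique, not the code from the post, and the names skip_value and raw_info_hash are hypothetical.

```python
import hashlib

def skip_value(data, pos):
    """Return the index just past the bencoded value that starts at data[pos]."""
    lead = data[pos:pos + 1]
    if lead == b"i":                       # integer: ends at the next "e"
        return data.index(b"e", pos) + 1
    if lead in (b"l", b"d"):               # list or dict: skip members until "e"
        pos += 1
        while data[pos:pos + 1] != b"e":
            pos = skip_value(data, pos)
        return pos + 1
    colon = data.index(b":", pos)          # byte string: <length>:<bytes>
    return colon + 1 + int(data[pos:colon])

def raw_info_hash(torrent_data):
    """SHA-1 hex digest of the raw top-level info value, or "" if there is none."""
    if torrent_data[:1] != b"d":
        raise ValueError("not a bencoded dictionary")
    pos = 1
    while torrent_data[pos:pos + 1] != b"e":
        key_end = skip_value(torrent_data, pos)
        key = torrent_data[pos:key_end]    # still carries its length prefix
        value_end = skip_value(torrent_data, key_end)
        if key == b"4:info":
            return hashlib.sha1(torrent_data[key_end:value_end]).hexdigest()
        pos = value_end
    return ""

if __name__ == "__main__":
    # Made-up seeds: the tracker URL contains "info", and one seed has a nodes field.
    info_bytes = b"d6:lengthi943e4:name1:xe"
    head = b"d8:announce21:http://info.example/a4:info" + info_bytes
    plain = head + b"e"
    with_nodes = head + b"5:nodesll9:127.0.0.1i6881eeee"
    print(raw_info_hash(plain) == raw_info_hash(with_nodes))
```

Because the key is matched only at the top level of the dictionary, the word info inside a tracker URL or comment cannot be mistaken for the key, and fields after info need no special handling.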
Method 2:
Use the bencode library to calculate the hash value (a detailed description and explanation of the bencode library is left out here). The code is as follows; the bencode library needs to be installed first. Address:
https://pypi.python.org/pypi/BitTorrent-bencode/5.0.8.1
    #!/usr/bin/python
    import hashlib
    import sys

    import bencode

    def main():
        # Open the torrent file in binary mode and decode the bencoded metainfo
        with open(sys.argv[1], "rb") as torrent_file:
            metainfo = bencode.bdecode(torrent_file.read())
        # Re-encode only the info dictionary and hash it
        info = metainfo['info']
        print(hashlib.sha1(bencode.bencode(info)).hexdigest())

    if __name__ == "__main__":
        main()
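The magnet links mentioned earlier carry this same value: the 40-character hex digest goes into the urn:btih field. The helper below is a small sketch of that step; the name magnet_link is an assumption, and it builds only the bare link without a display name or tracker parameters.

```python
import string

def magnet_link(info_hash_hex):
    """Build a bare magnet URI from a 40-character hexadecimal info hash."""
    if len(info_hash_hex) != 40:
        raise ValueError("expected a 40-character SHA-1 hex digest")
    if not all(c in string.hexdigits for c in info_hash_hex):
        raise ValueError("expected hexadecimal characters only")
    return "magnet:?xt=urn:btih:" + info_hash_hex.lower()

if __name__ == "__main__":
    # Placeholder digest, not the hash of a real torrent.
    print(magnet_link("0123456789ABCDEF0123456789ABCDEF01234567"))
```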