An introduction to BT and computing the hash (fingerprint) of a BT torrent

Last updated: 2014-07-01 Source: Internet



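Computing the hash of a BT torrent: I have recently, and rather suddenly, become interested in BT torrents (don't ask why).

1. BT torrents (concept)

BT is a distributed file distribution protocol. While downloading, each downloader keeps uploading the data it already has to other downloaders. The faster you download, the faster you upload, which is how BT achieves high-speed downloads.

2. How BT uploads files while downloading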

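This starts with the torrent file itself. A BT torrent file contains two parts: tracker information and file information. The tracker information records the tracker server addresses needed during the download, along with settings for those trackers. The file information is computed from the target files, and the result is encoded with the B-encoding (bencode) rules (my English is not great, so this part comes from Baidu Baike). Within the file information, the files to be downloaded are split into pieces, and the index information for each piece (its SHA-1 hash) is written into the torrent file. (The original post included a screenshot of a Xunlei task details page here.)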
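The piece splitting described above can be sketched as follows. This is only an illustration, not the author's code: in the BitTorrent metainfo format, the `pieces` value is the concatenation of the 20-byte SHA-1 digest of every piece. The helper name `piece_hashes` and the tiny 4-byte piece length are assumptions chosen for the example; real torrents use much larger pieces, commonly a power of two such as 256 KiB.

```python
import hashlib

def piece_hashes(data, piece_length):
    """Split data (bytes) into fixed-size pieces and SHA-1 each one.

    Returns a list of 20-byte digests; the last piece may be shorter.
    """
    digests = []
    for offset in range(0, len(data), piece_length):
        piece = data[offset:offset + piece_length]
        digests.append(hashlib.sha1(piece).digest())
    return digests

# Hypothetical 10-byte "file" with a 4-byte piece length: pieces of 4, 4 and 2 bytes.
content = b"0123456789"
digests = piece_hashes(content, 4)
pieces_field = b"".join(digests)  # the kind of value a torrent stores under "pieces"
print(len(digests), len(pieces_field))
```

A client that receives a piece from another peer can hash it and compare the digest with the matching 20-byte slice of `pieces`, which is what makes accepting data from strangers safe.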

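Such a task view lists the individual pieces of the task. Each downloader can upload the pieces it has already downloaded. How does a client find out what other downloaders already have? One way is through the tracker server, which keeps a record of every downloader so that clients can find each other. This is also why the BT sharing sites often seen on local networks can show upload and download statistics: how much each person has downloaded and uploaded, which determines each person's contribution score.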
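The traffic statistics mentioned above are possible because each client reports its totals whenever it contacts the tracker. Below is a rough sketch of how such an announce URL might be put together, assuming the standard HTTP tracker query parameters (`info_hash`, `peer_id`, `port`, `uploaded`, `downloaded`, `left`). The function name `build_announce_url`, the tracker URL, the peer id and the byte counts are all made-up values for illustration.

```python
import hashlib
from urllib.parse import urlencode

def build_announce_url(tracker, info_hash, peer_id, port,
                       uploaded, downloaded, left):
    """Build a tracker announce URL from raw byte and integer fields."""
    params = {
        "info_hash": info_hash,    # 20 raw bytes, percent-encoded by urlencode
        "peer_id": peer_id,        # 20 bytes identifying this client
        "port": port,              # port this client listens on
        "uploaded": uploaded,      # total bytes uploaded so far
        "downloaded": downloaded,  # total bytes downloaded so far
        "left": left,              # bytes still missing
    }
    return tracker + "?" + urlencode(params)

# Placeholder values, not taken from any real torrent.
info_hash = hashlib.sha1(b"example info dictionary").digest()
peer_id = b"-XX0001-123456789012"
url = build_announce_url("http://tracker.example.com:6969/announce",
                         info_hash, peer_id, 6881,
                         uploaded=1024, downloaded=4096, left=0)
print(url)
```

Because `uploaded` and `downloaded` are reported by the client on every announce, a tracker can keep per-user totals and derive a contribution score from them.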

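This is also why The Pirate Bay was prosecuted in Sweden: the tracker server gave every downloader the possibility and the opportunity to download pirated content. Of course, with the magnet scheme now widespread and built on DHT, a tracker server is no longer strictly necessary. That is a topic for another time.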
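As a small illustration of the magnet scheme mentioned above: a magnet link carries the torrent's hash directly, in the form `magnet:?xt=urn:btih:<40 hex characters>`, so a client can look for peers via DHT without contacting a tracker. The function name `make_magnet` and the sample hash below are placeholders, and the optional `dn` (display name) parameter is included only for readability.

```python
import hashlib
import re
from urllib.parse import quote

def make_magnet(info_hash_hex, name=None):
    """Build a magnet link from a 40-character hex info hash."""
    if not re.fullmatch(r"[0-9a-fA-F]{40}", info_hash_hex):
        raise ValueError("expected 40 hexadecimal characters")
    link = "magnet:?xt=urn:btih:" + info_hash_hex.lower()
    if name:
        link += "&dn=" + quote(name)  # percent-encode the display name
    return link

# Placeholder hash; a real one comes from the torrent's info dictionary.
sample_hash = hashlib.sha1(b"placeholder info dictionary").hexdigest()
print(make_magnet(sample_hash, "example file.txt"))
```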

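3. Computing the BT hash (fingerprint)

As described above, every BT torrent contains file information covering each piece, so a torrent can still be identified uniquely even when its tracker servers change. (As an aside, I once simply assumed that taking the MD5 of the torrent file would be enough to identify it. Too naive!) So how is this hash computed? That requires a closer look at how a torrent file is put together. There are already plenty of articles on the subject; here is one: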

http://www.cnblogs.com/DxSoft/archive/2012/02/11/2346314.html
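To see why the MD5 of the whole torrent file is not a stable identifier, here is a minimal sketch. It builds two tiny torrents that differ only in their tracker URL and compares the MD5 of each file with the SHA-1 of the bencoded info dictionary. The small `bencode` encoder, the tracker URLs and the info values are all assumptions made up for this example.

```python
import hashlib

def bencode(value):
    """Encode ints, bytes, lists and dicts (with bytes keys) as bencode."""
    if isinstance(value, int):
        return b"i" + str(value).encode() + b"e"
    if isinstance(value, bytes):
        return str(len(value)).encode() + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = sorted(value.items())  # dictionary keys are written in sorted order
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("unsupported type: %r" % type(value))

# Hypothetical info dictionary shared by both torrents.
info = {b"length": 943, b"name": b"example.txt",
        b"piece length": 262144, b"pieces": hashlib.sha1(b"x").digest()}

torrent_a = bencode({b"announce": b"http://tracker-a.example/announce", b"info": info})
torrent_b = bencode({b"announce": b"http://tracker-b.example/announce", b"info": info})

md5_a = hashlib.md5(torrent_a).hexdigest()
md5_b = hashlib.md5(torrent_b).hexdigest()
info_hash = hashlib.sha1(bencode(info)).hexdigest()

print("whole-file MD5 equal:", md5_a == md5_b)
print("info dictionary present in both:",
      bencode(info) in torrent_a and bencode(info) in torrent_b)
print("info hash:", info_hash)
```

The two files hash differently as a whole because the tracker URL differs, while the bencoded info dictionary, and therefore its SHA-1, is byte-for-byte the same in both.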

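Following the article linked above, I wrote two test programs:

Method 1: work directly on the data that follows the info key, i.e. everything after 4:info. Here is an excerpt from a torrent file.

For example: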

d8:announce27:http://tk3.5qzone.net:8080/13:announce-listll27:http://tk3.5qzone.net:8080/el36:http://btfans.3322.org:8000/announceel36:http://btfans.3322.org:8080/announceel36:http://btfans.3322.org:6969/announceel42:http://denis.stalker.h3q.com:6969/announceel40:http://torrent-download.to:5869/announceel45:http://tracker.openbittorrent.com:80/announceel39:http://tracker.publicbt.com:80/announceel40:http://tracker.bittorrent.am:80/announceel30:http://tracker.prq.to/announceel34:http://tracker.prq.to/announce.phpel43:http://tracker.torrentbox.com:2710/announceel34:http://tpb.tracker.prq.to/announceel30:http://tr.wjl.cn:8080/announceel37:http://219.152.120.234:6969/announce el34:http://mdbt.3322.org:6969/announceee7:comment19:YYeTs人人影視資來源站點13:comment.utf-826:YYeTs浜轟漢褰辮璧勬簮絝?0:created by13:BitComet/0.7013:creation datei1261532244e8:encoding3:GBK4:infod5:filesld6:lengthi943e4:path

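The first byte, d, stands for dict, meaning the content is a dictionary. 4:info is a string of length 4 whose value is info, the key of the info dictionary. With that, we can write code that computes the hash: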

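#!python
import hashlib
import sys

def sha1sum(src):
    # Return the hex SHA-1 digest of src (bytes), or "" for empty input
    if not len(src):
        return ""
    m = hashlib.sha1(src)
    return m.hexdigest()

# filename is the torrent file name, taken from the command line
filename = sys.argv[1]
with open(filename, "rb") as f:  # binary mode: a torrent is not text
    torrent_data = f.read()

# The info dictionary starts right after the key "4:info"
start = torrent_data.find(b"4:info") + 6
if -1 != torrent_data.find(b"5:nodes"):
    # A nodes key follows info: stop just before its "5:" length prefix
    info_data = torrent_data[start:torrent_data.find(b"5:nodes")]
else:
    # info is the last key: drop the final "e" that closes the outer dict
    info_data = torrent_data[start:len(torrent_data) - 1]
sha1_data = sha1sum(info_data)
print("the hash data of torrent is: " + sha1_data.upper())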

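When testing Method 1, I found that torrents come in two kinds: those that contain a nodes key and those that do not, and each needs separate handling. Handling them this way is tedious, and I had no idea how many more pitfalls were waiting in other cases. So I searched Google and found a Stack Overflow answer that mentions a library, bencode. That leads to the second approach.

Method 2:

Use the bencode library to compute the hash (a detailed walkthrough of the library will come later). The code is below. It requires the bencode library, which is available at: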

https://pypi.python.org/pypi/BitTorrent-bencode/5.0.8.1

#!/usr/bin/python
import sys
import hashlib
import bencode

def main():
    # Open the torrent file in binary mode and decode the whole metainfo
    with open(sys.argv[1], "rb") as torrent_file:
        metainfo = bencode.bdecode(torrent_file.read())
    # Re-encode only the info dictionary and hash it
    info = metainfo['info']
    print(hashlib.sha1(bencode.bencode(info)).hexdigest())

if __name__ == "__main__":
    main()
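For comparison, here is a dependency-free sketch that avoids both the fragile string search of Method 1 and the external library of Method 2. It walks the top-level dictionary, finds the exact byte range of the info value, and hashes those raw bytes. The function names `skip_value` and `info_hash` and the sample torrents are assumptions for illustration only. The sketch does no validation beyond what is needed to find the range, so treat it as a starting point rather than a complete parser.

```python
import hashlib

def skip_value(data, pos):
    """Return the offset just past the bencoded value that starts at data[pos]."""
    lead = data[pos:pos + 1]
    if lead == b"i":                   # integer: i<digits>e
        return data.index(b"e", pos) + 1
    if lead in (b"l", b"d"):           # list or dict: nested values until "e"
        pos += 1
        while data[pos:pos + 1] != b"e":
            pos = skip_value(data, pos)
        return pos + 1
    colon = data.index(b":", pos)      # byte string: <length>:<bytes>
    return colon + 1 + int(data[pos:colon])

def info_hash(data):
    """SHA-1 (hex) of the raw bencoded info dictionary inside a torrent."""
    if data[:1] != b"d":
        raise ValueError("not a bencoded dictionary")
    pos = 1
    while data[pos:pos + 1] != b"e":
        key_end = skip_value(data, pos)
        key = data[data.index(b":", pos) + 1:key_end]
        value_end = skip_value(data, key_end)
        if key == b"info":
            return hashlib.sha1(data[key_end:value_end]).hexdigest()
        pos = value_end
    raise KeyError("no info key found")

# Two made-up torrents: one where info is the last key, one followed by nodes.
sample = (b"d8:announce26:http://tracker.example/ann"
          b"4:infod6:lengthi943e4:name5:a.txtee")
sample_with_nodes = sample[:-1] + b"5:nodesll9:127.0.0.1i6881eeee"

print(info_hash(sample))
print(info_hash(sample) == info_hash(sample_with_nodes))
```

Because the info value is located by structure rather than by searching for the text nodes, the same code covers both cases that Method 1 had to handle separately.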
