BitTorrent Learning
In a recent project, a server embedded in an Android application was built with the open source nanohttpd library. Part of the ongoing work is a WiFi download feature based on the BitTorrent protocol.
Protocol Introduction
Ordinary HTTP/FTP downloads run over TCP/IP between one client and one server. BitTorrent is a peer-to-peer file transfer protocol built on top of TCP/IP; it sits in the application layer of the TCP/IP architecture. The BitTorrent protocol itself comprises a number of specific sub-protocols and extension protocols, and it keeps growing.
According to the BitTorrent protocol, a file publisher generates a .torrent file, also called a seed file or simply a "seed", from the file to be published.
A seed file is essentially a bencoded text file with two parts: tracker information and file information.
The tracker information consists mainly of the address of the tracker server used during the BT download and the settings for that server. The file information is computed from the target file.
The main idea behind the file information is to divide the file to be downloaded into virtual blocks of equal size. The block size must be an integer power of two (2^k). Because the blocks are virtual, no individual block files are created on disk. The index information and hash verification code of each block are written into the seed file, so the seed file is the "index" of the downloaded file. To download the file's contents, a downloader must first obtain the corresponding seed file.
When downloading, the BT client first parses the seed file to get the tracker address and then connects to the tracker server. The tracker server responds to the downloader's request with the IP addresses of the other downloaders (including the publisher). The downloader then connects to those other downloaders. Based on the seed file, the peers tell each other which blocks they already have and then exchange the data they are missing.
Each time a block is downloaded, the client computes the hash verification code of the downloaded block and compares it with the one in the seed file. If they match, the block is correct; if not, the block must be downloaded again. This rule ensures the accuracy of the downloaded content.
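The block check described above can be sketched in a few lines of Java. This is only an illustration: the class PieceVerifier and its method names are hypothetical and do not come from any BitTorrent library, and the sketch assumes the hash is SHA-1 (20 bytes per block), as described later for the pieces field.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper, not part of any library.
final class PieceVerifier {

    /** Returns the 20-byte SHA-1 digest of one block. */
    static byte[] sha1(byte[] block) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(block);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /**
     * Compares a downloaded block with the expected hash stored at
     * position index in the concatenated 20-byte hashes of the seed file.
     */
    static boolean isValid(byte[] pieces, int index, byte[] downloaded) {
        byte[] expected = Arrays.copyOfRange(pieces, index * 20, index * 20 + 20);
        return MessageDigest.isEqual(expected, sha1(downloaded));
    }

    public static void main(String[] args) {
        byte[] block = "hello".getBytes(StandardCharsets.US_ASCII);
        byte[] pieces = sha1(block); // a "pieces" value holding a single hash
        byte[] corrupted = "hellp".getBytes(StandardCharsets.US_ASCII);
        System.out.println(isValid(pieces, 0, block));     // block accepted
        System.out.println(isValid(pieces, 0, corrupted)); // block must be re-downloaded
    }
}
```

A client would call isValid after each block arrives and request the block again whenever it returns false.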
Tracker Server
The tracker server is a required role in a BT download. At the start of a download and throughout it, a BT client must keep communicating with the tracker server to report its own information and to obtain information about the other downloading clients.
How the Tracker Server Works
- The client sends an HTTP GET request to the tracker and puts its own information in the GET parameters. The request roughly means: I am XXX (a unique ID), I want to download file YYY, my IP is AAA, the port I use is BBB, and so on.
- The tracker maintains information about all downloaders. When it receives a request, it first records the requester's information (or, if a record already exists, checks whether it needs updating). It then returns to the requester the information of some of the other downloaders of the same file (not all of them; how many depends on the parameters set for the download). A single tracker server may track the downloads of several files at the same time.
- After receiving the tracker's response, the client has the information of other downloaders and can establish connections with them to download file fragments from them.
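The first step above, building the GET request, might look like the following sketch. The parameter names (info_hash, peer_id, port, uploaded, downloaded, left, event) are those of the tracker HTTP protocol; the class AnnounceUrl, the tracker host and the peer ID are made-up values for illustration only.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper that assembles a tracker announce URL.
final class AnnounceUrl {

    /** Percent-encodes raw bytes, keeping only unreserved characters as-is. */
    static String percentEncode(byte[] raw) {
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            int c = b & 0xff;
            boolean unreserved = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9')
                || c == '-' || c == '_' || c == '.' || c == '~';
            if (unreserved) {
                sb.append((char) c);
            } else {
                sb.append(String.format("%%%02X", c));
            }
        }
        return sb.toString();
    }

    /** Builds the announce request: who I am, what I download, where I listen. */
    static String build(String announce, byte[] infoHash, byte[] peerId,
            int port, long uploaded, long downloaded, long left, String event) {
        return announce
            + "?info_hash=" + percentEncode(infoHash)
            + "&peer_id=" + percentEncode(peerId)
            + "&port=" + port
            + "&uploaded=" + uploaded
            + "&downloaded=" + downloaded
            + "&left=" + left
            + "&event=" + event;
    }

    public static void main(String[] args) {
        byte[] infoHash = new byte[20]; // placeholder: 20 zero bytes
        byte[] peerId = "-XX0001-000000000000".getBytes(StandardCharsets.US_ASCII);
        System.out.println(build("http://tracker.example/announce",
            infoHash, peerId, 6881, 0, 0, 678301696L, "started"));
    }
}
```

The info hash and peer ID are raw bytes rather than text, which is why they are percent-encoded byte by byte instead of being passed through a string encoder.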
BitTorrent Protocol Execution Process
To summarize the execution process: the file publisher provides a seed file generated from the file to be published. To download the file's contents, a user obtains the corresponding seed file and opens it with BT client software.
The BT client first parses the seed file to get the tracker address and then connects to the tracker server.
Each time a block is downloaded, the client computes its hash verification code and compares it with the one in the seed file; a block that does not match is downloaded again.
Bencode
Bencode (pronounced "bee-encode") is the encoding BitTorrent uses to transmit data structures. It supports four data types:
- Byte string
- Integer
- List
- Dictionary
Bencode Coding Rules
- An integer is encoded in decimal and enclosed between "i" and "e". Leading zeros are not allowed (although 0 itself is still the integer 0), negative numbers take a leading minus sign, and negative zero is not allowed. For example, the integer 42 is encoded as "i42e", 0 as "i0e", and -42 as "i-42e".
- A byte string (just a sequence of bytes, not necessarily characters) is encoded as (length):(content). The length is written in decimal like an integer, except that it cannot be negative, and the content is the raw bytes of the string. For example, the string "spam" is encoded as "4:spam".
- A list is enclosed between "l" and "e", and its elements may be any of the four bencode types, each in its own encoding. For example, a list containing the string "spam" and the integer 42 is encoded as "l4:spami42ee". Note that the delimiters come in matching pairs.
Analysis: the string "spam" encodes to 4:spam and 42 encodes to i42e; the list wraps both between l and e, that is, l 4:spam i42e e.
- A dictionary is enclosed between "d" and "e". Each key is immediately followed by its value, all keys are byte strings, and the keys must appear in sorted order. For example, a dictionary that maps the key "bar" to the string "spam" and the key "foo" to the integer 42 is encoded as "d3:bar4:spam3:fooi42ee".
Analysis: d 3:bar 4:spam 3:foo i42e e, that is, d key value key value e.
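The four rules above are small enough to try out directly. The following is a minimal encoder sketch, not ttorrent's BEncoder: the class MiniBencode is hypothetical, it maps Java Integer/Long, String/byte[], List and Map onto the four bencode types, and it sorts dictionary keys with String ordering, which matches raw byte ordering only for ASCII keys.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical minimal bencode encoder for illustration.
final class MiniBencode {

    static byte[] encode(Object value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(value, out);
        return out.toByteArray();
    }

    private static void write(Object value, ByteArrayOutputStream out) {
        if (value instanceof Integer || value instanceof Long) {
            ascii(out, "i" + value + "e");                 // integer: i<decimal>e
        } else if (value instanceof String) {
            write(((String) value).getBytes(StandardCharsets.UTF_8), out);
        } else if (value instanceof byte[]) {
            byte[] bytes = (byte[]) value;
            ascii(out, bytes.length + ":");                // byte string: <length>:<content>
            out.write(bytes, 0, bytes.length);
        } else if (value instanceof List) {
            out.write('l');                                // list: l<items>e
            for (Object item : (List<?>) value) {
                write(item, out);
            }
            out.write('e');
        } else if (value instanceof Map) {
            // Copy into a TreeMap so that keys are written in sorted order.
            TreeMap<String, Object> sorted = new TreeMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                sorted.put((String) e.getKey(), e.getValue());
            }
            out.write('d');                                // dictionary: d<key><value>...e
            for (Map.Entry<String, Object> e : sorted.entrySet()) {
                write(e.getKey(), out);
                write(e.getValue(), out);
            }
            out.write('e');
        } else {
            throw new IllegalArgumentException("Unsupported type: " + value);
        }
    }

    private static void ascii(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        out.write(b, 0, b.length);
    }

    public static void main(String[] args) {
        Map<String, Object> dict = new TreeMap<>();
        dict.put("foo", 42);
        dict.put("bar", "spam");
        System.out.println(new String(encode(Arrays.asList("spam", 42)),
            StandardCharsets.US_ASCII)); // prints l4:spami42ee
        System.out.println(new String(encode(dict),
            StandardCharsets.US_ASCII)); // prints d3:bar4:spam3:fooi42ee
    }
}
```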
Seed file
As described earlier, the torrent seed file contains two parts: tracker information and file information. The tracker information consists mainly of the tracker server address and the settings for that server. The file information is computed from the target file, and the result is encoded according to the bencode rules of the BitTorrent protocol. The file is divided into virtual blocks of equal size (a power of two; no individual block files are created on disk), and the index information and hash of each block are written to the seed file. The seed file is therefore the "index" of the downloaded file.
The seed file contains the following data:
- announce: the URL of the tracker
- info: maps to a dictionary whose keys depend on whether one file or several files are shared:
  - name: the suggested file or directory name to save to
  - piece length: the number of bytes per file block, commonly 2^18 = 256 KiB = 262144 B
  - pieces: the concatenation of the SHA-1 hashes of all file blocks. Because SHA-1 returns a 160-bit hash, pieces is a string whose length is a multiple of 160 bits (20 bytes)
  - either length (when a single file is shared) or files (when multiple files are shared):
    - length: the size of the file, in bytes
    - files: a list of dictionaries (one per file) with the following keys:
      - path: a list of strings corresponding to subdirectory names, the last of which is the actual file name
      - length: the size of the file, in bytes
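The relationship between length, piece length and pieces is simple arithmetic, sketched below. The class PieceMath is hypothetical; the numbers are example values for a single file of 678301696 bytes split into 262144-byte blocks.

```java
// Hypothetical helper showing how the size of "pieces" follows from
// "length" and "piece length". Assumes length > 0.
final class PieceMath {
    static final int SHA1_BYTES = 20; // 160 bits

    /** Number of blocks, rounding up for a shorter final block. */
    static long pieceCount(long length, long pieceLength) {
        return (length + pieceLength - 1) / pieceLength;
    }

    /** Size in bytes of the final block. */
    static long lastPieceSize(long length, long pieceLength) {
        long remainder = length % pieceLength;
        return remainder == 0 ? pieceLength : remainder;
    }

    /** Length in bytes of the "pieces" string: one SHA-1 hash per block. */
    static long piecesFieldLength(long length, long pieceLength) {
        return pieceCount(length, pieceLength) * SHA1_BYTES;
    }

    public static void main(String[] args) {
        long length = 678301696L;
        long pieceLength = 262144L;
        System.out.println(pieceCount(length, pieceLength));        // prints 2588
        System.out.println(lastPieceSize(length, pieceLength));     // prints 135168
        System.out.println(piecesFieldLength(length, pieceLength)); // prints 51760
    }
}
```

All blocks have the same size except possibly the last one, which only covers whatever remains of the file.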
Open Source Project Learning
To learn more about how the BitTorrent protocol is implemented, this study starts from the open source project ttorrent on GitHub: https://github.com/mpetazzoni/ttorrent
The project is divided into the following packages:
- bcodec: implements the bencode encoding and decoding used by the seed file, as described above. It is used to parse the .torrent file and extract the required information.
- common.protocol: the implementation of the protocol.
- client: the client side of the BT protocol.
- tracker: the implementation of the tracker server.
bcodec Encoding and Decoding Package
This package implements the encoding and decoding rules described above and is not studied in depth here.
Protocol Package
The Peer class is a bean-like data class that stores information about a node in the BT network. It is simple enough to skip here.
The Torrent class implements the seed file handling of the BitTorrent protocol. It falls into several parts; a few methods from each are analyzed below.
.torrent File Parsing Section
public static Torrent load(File torrent, boolean seeder)
    throws IOException, NoSuchAlgorithmException {
    // Read the whole .torrent file into memory (Apache Commons IO).
    byte[] data = FileUtils.readFileToByteArray(torrent);
    return new Torrent(data, seeder);
}
The method above reads the .torrent file into a byte array, which is then passed to the Torrent constructor for processing.
The constructor is long, so only the key parts are analyzed here.
The binary data is first bdecoded into a dictionary. The info entry is then extracted and bencoded again, and the result is hashed to produce the info hash, which identifies the torrent and is used for later verification:
this.decoded = BDecoder.bdecode(
    new ByteArrayInputStream(this.encoded)).getMap();
this.decoded_info = this.decoded.get("info").getMap();
// Re-encode the info dictionary so that it can be hashed.
ByteArrayOutputStream baos = new ByteArrayOutputStream();
BEncoder.bencode(this.decoded_info, baos);
this.encoded_info = baos.toByteArray();
this.info_hash = Torrent.hash(this.encoded_info);
this.hex_info_hash = Torrent.byteArrayToHexString(this.info_hash);
The torrent file may use the BEP 12 multitracker metadata extension, in which case multiple trackers are parsed:
if (this.decoded.containsKey("announce-list")) {
    // ... parse the multiple trackers ...
}
If there is a single tracker, the announce field is parsed instead. For example, consider a torrent file with the following contents:
{
    'announce': 'http://bttracker.debian.org:6969/announce',
    'info':
    {
        'name': 'debian-503-amd64-CD-1.iso',
        'piece length': 262144,
        'length': 678301696,
        'pieces': '841ae846bc5b6d7bd6e9aa3dd9e551559c82abc1...d14f1631d776008f83772ee170c42411618190a4'
    }
}
The code shown earlier extracts and decodes info, which describes the file to download. The tracker address is then read from announce and added to the tracker lists:
else if (this.decoded.containsKey("announce")) {
    URI tracker = new URI(this.decoded.get("announce").getString());
    this.allTrackers.add(tracker);

    // Build a single-tier announce list.
    List<URI> tier = new ArrayList<URI>();
    tier.add(tracker);
    this.trackers.add(tier);
}
The remaining information is then extracted and stored in the fields of the Torrent class.
Next comes the list of files to download. The torrent above describes a single-file download, so it has no files field. For a single-file download, name is the file name to save to and length is the size of the file.
public static class TorrentFile {
    public final File file;
    public final long size;

    public TorrentFile(File file, long size) {
        this.file = file;
        this.size = size;
    }
}
The inner class TorrentFile stores the information about a file described in the seed file.
this.files.add(new TorrentFile(
    new File(this.name),
    this.decoded_info.get("length").getLong()));
The file's information is stored in the files list so that it can be retrieved later during the download.
Seed File Creation Section
The create method has several overloads; one of them is analyzed here:
public static Torrent create(File source, URI announce, String createdBy)
    throws InterruptedException, IOException, NoSuchAlgorithmException {
    return Torrent.create(source, null, DEFAULT_PIECE_LENGTH,
        announce, null, createdBy);
}
Here source is the file to be shared, announce is the address of the tracker to use, and createdBy is the name of the seed's creator. This overload delegates to the following private create method:
private static Torrent create(File parent, List<File> files, int pieceLength,
    URI announce, List<List<URI>> announceList, String createdBy)
    throws InterruptedException, IOException, NoSuchAlgorithmException {
    // ...
}
Based on the given parameters, this method fills in the fields of the seed file, that is, it builds the data structure of the .torrent file. For example:
torrent.put("creation date", new BEValue(new Date().getTime() / 1000));
torrent.put("created by", new BEValue(createdBy));

Map<String, BEValue> info = new TreeMap<String, BEValue>();
info.put("name", new BEValue(parent.getName()));
info.put("piece length", new BEValue(pieceLength));
Fields such as name and piece length are filled in this way. Finally, the structure is bencoded and a new Torrent object is built from the result:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
BEncoder.bencode(new BEValue(torrent), baos);
return new Torrent(baos.toByteArray(), true);
The object is then serialized to a .torrent file by calling the save method:
public void save(OutputStream output) throws IOException {
    output.write(this.getEncoded());
}
Encoding Section
Besides the two sections above, the Torrent class also has an encoding section that covers hashing, bencode encoding and so on. For example, the following class hashes one data block:
private static class CallableChunkHasher implements Callable<String> {

    private final MessageDigest md;
    private final ByteBuffer data;

    CallableChunkHasher(ByteBuffer buffer) throws NoSuchAlgorithmException {
        this.md = MessageDigest.getInstance("SHA-1");

        // Copy the buffer so that the caller can reuse it for the next block.
        this.data = ByteBuffer.allocate(buffer.remaining());
        buffer.mark();
        this.data.put(buffer);
        this.data.clear();
        buffer.reset();
    }

    @Override
    public String call() throws UnsupportedEncodingException {
        this.md.reset();
        this.md.update(this.data.array());
        return new String(md.digest(), Torrent.BYTE_ENCODING);
    }
}
This uses Java's Callable and Future: each block of data is hashed on a separate thread so that the caller is not blocked, and the result is returned once the computation finishes. In general, such a task is submitted through an ExecutorService:
ExecutorService executor = Executors.newCachedThreadPool();
Task task = new Task();
Future<Integer> result = executor.submit(task);
executor.shutdown();
In ttorrent, the role of the Task class above is played by the hashing task class CallableChunkHasher. When the file is processed, a 160-bit SHA-1 hash is computed for each block and appended to the pieces field. The method is long, so only the key parts are analyzed:
int threads = getHashingThreadsCount();
ExecutorService executor = Executors.newFixedThreadPool(threads);
This first gets the number of threads to use for hashing and then creates an ExecutorService to manage and dispatch the hashing tasks. Some variables are then initialized:
ByteBuffer buffer = ByteBuffer.allocate(pieceLenght);
List<Future<String>> results = new LinkedList<Future<String>>();
StringBuilder hashes = new StringBuilder();
Here buffer holds the file data read for one block. results is a list of Future objects: when a task is submitted through the ExecutorService, the returned Future can be used to track the task and retrieve its result. The StringBuilder concatenates the computed hashes into the final pieces string.
FileInputStream fis = new FileInputStream(file);
FileChannel channel = fis.getChannel();
int step = 10;
The file is then read:
while (channel.read(buffer) > 0) {
    if (buffer.remaining() == 0) {
        buffer.clear();
        results.add(executor.submit(new CallableChunkHasher(buffer)));
    }
}
The file is read into the buffer. Whenever the buffer is full, a hashing task is submitted and the returned Future is added to results. The details are not analyzed here; it is enough to know that this part uses multithreading to compute the hashes efficiently and finally returns the pieces value, that is, the SHA-1 results of all blocks.
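To make the pattern concrete, here is a simplified, self-contained sketch of the same idea. It is not ttorrent's code: the class ParallelPieceHasher is hypothetical, it hashes an in-memory byte array instead of reading a FileChannel, and it returns the raw concatenated digests rather than a String.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: hash every block on a thread pool, then join the
// digests in block order to form the "pieces" value.
final class ParallelPieceHasher {

    static byte[] sha1(byte[] block) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(block);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    static byte[] hashPieces(byte[] content, int pieceLength, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> results = new ArrayList<>();
            for (int start = 0; start < content.length; start += pieceLength) {
                int end = Math.min(start + pieceLength, content.length);
                // Each task gets its own copy, like CallableChunkHasher does.
                final byte[] block = Arrays.copyOfRange(content, start, end);
                Callable<byte[]> task = () -> sha1(block);
                results.add(executor.submit(task));
            }
            // Futures are collected in submission order, so the hashes stay
            // in block order even if the tasks finish out of order.
            ByteArrayOutputStream pieces = new ByteArrayOutputStream();
            for (Future<byte[]> result : results) {
                byte[] digest = result.get();
                pieces.write(digest, 0, digest.length);
            }
            return pieces.toByteArray();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Hashing interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Hashing failed", e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        byte[] content = new byte[1000];
        // 1000 bytes in 256-byte blocks: 4 blocks, so 80 bytes of hashes.
        System.out.println(hashPieces(content, 256, 2).length); // prints 80
    }
}
```

Keeping the Future objects in a list is what preserves ordering: the order of the hashes in pieces must match the order of the blocks in the file.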
This completes the analysis of the protocol part. The client package and the server-side tracker package remain.
Tracker Package Learning
Tracker class
The Tracker class is the implementation of the tracker server in the BitTorrent protocol. The tracker server is the server that helps BitTorrent nodes find and connect to each other.
A BitTorrent client starts a download by connecting to the tracker, obtains the IP addresses of other clients from it, and then connects to those clients to download. During the transfer it keeps communicating with the tracker to report its own information and to obtain information about other clients.
In this implementation the default port is 6969, a port commonly used by BitTorrent trackers.
The Tracker class mainly has two threads: a tracker thread and a collector thread.
When the start method is executed, the two threads are started in turn.
The main call in the tracker thread is:
connection.connect(address);
Here connection is an object from the Simple framework; the call makes it listen on the specified address.
The collector thread runs the following loop while the tracker is running:
for (TrackedTorrent torrent : torrents.values()) {
    torrent.collectUnfreshPeers();
}

public void collectUnfreshPeers() {
    for (TrackedPeer peer : this.peers.values()) {
        if (!peer.isFresh()) {
            this.peers.remove(peer.getHexPeerId());
        }
    }
}
As shown, it traverses the torrents tracked by the tracker, finds the peers that are no longer active and removes them from the list.
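The clean-up idea can be tried in isolation with the sketch below. It is an illustration under assumptions, not ttorrent's implementation: the class PeerCollector, the touch method and the 30-second freshness window are all made up, and a peer is reduced to the timestamp of its last announce.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of dropping peers that stopped announcing.
final class PeerCollector {
    // Assumed freshness window; the real value is an implementation choice.
    static final long FRESH_TIME_MS = 30_000;

    // Maps a hex peer ID to the time of the peer's last announce.
    private final ConcurrentMap<String, Long> lastAnnounce = new ConcurrentHashMap<>();

    /** Records that a peer announced itself at the given time. */
    void touch(String hexPeerId, long nowMs) {
        lastAnnounce.put(hexPeerId, nowMs);
    }

    /** Removes peers not seen within the window; returns how many were removed. */
    int collectUnfreshPeers(long nowMs) {
        int before = lastAnnounce.size();
        lastAnnounce.values().removeIf(seen -> nowMs - seen > FRESH_TIME_MS);
        return before - lastAnnounce.size();
    }

    int size() {
        return lastAnnounce.size();
    }

    public static void main(String[] args) {
        PeerCollector collector = new PeerCollector();
        collector.touch("aa", 0);
        collector.touch("bb", 25_000);
        System.out.println(collector.collectUnfreshPeers(40_000)); // prints 1
        System.out.println(collector.size());                      // prints 1
    }
}
```

A concurrent map is used here because the collector thread and the request-handling threads touch the peer list at the same time.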
public synchronized TrackedTorrent announce(TrackedTorrent torrent) {
    TrackedTorrent existing = this.torrents.get(torrent.getHexInfoHash());

    if (existing != null) {
        logger.warn("Tracker already announced torrent for '{}' " +
            "with hash {}.", existing.getName(), existing.getHexInfoHash());
        return existing;
    }

    this.torrents.put(torrent.getHexInfoHash(), torrent);
    logger.info("Registered new torrent for '{}' with hash {}.",
        torrent.getName(), torrent.getHexInfoHash());
    return torrent;
}
The announce method takes a torrent and looks it up by its info hash. If the torrent is not yet registered with the tracker, it is registered and returned; if it is already registered, the existing torrent is returned.
public synchronized void remove(Torrent torrent) {
    if (torrent == null) {
        return;
    }

    this.torrents.remove(torrent.getHexInfoHash());
}
The remove method removes a torrent that has been announced on the tracker.
TrackedTorrent class
TrackedTorrent primarily maintains the download information of a torrent that is already registered with the tracker, that is, it manages the number, state and so on of the clients taking part in downloading this torrent.
update method:
public TrackedPeer update(RequestEvent event, ByteBuffer peerId,
    String hexPeerId, String ip, int port, long uploaded,
    long downloaded, long left) throws UnsupportedEncodingException {
    TrackedPeer peer;
    TrackedPeer.PeerState state = TrackedPeer.PeerState.UNKNOWN;

    if (RequestEvent.STARTED.equals(event)) {
        peer = new TrackedPeer(this, ip, port, peerId);
        state = TrackedPeer.PeerState.STARTED;
        this.addPeer(peer);
    } else if (RequestEvent.STOPPED.equals(event)) {
        peer = this.removePeer(hexPeerId);
        state = TrackedPeer.PeerState.STOPPED;
    } else if (RequestEvent.COMPLETED.equals(event)) {
        peer = this.getPeer(hexPeerId);
        state = TrackedPeer.PeerState.COMPLETED;
    } else if (RequestEvent.NONE.equals(event)) {
        peer = this.getPeer(hexPeerId);
        state = TrackedPeer.PeerState.STARTED;
    } else {
        throw new IllegalArgumentException("Unexpected announce event type!");
    }

    peer.update(state, uploaded, downloaded, left);
    return peer;
}
The event argument is an enumeration value that determines the new state of the peer. For STARTED, a new peer is created and added to the peers maintained by the current torrent. For STOPPED, the peer is removed from them. In the remaining cases the peer is looked up among the existing peers, and in every case its state is then updated.
getSomePeers method:
Selects suitable peers from the candidate peers and returns them as the answer peers.
TrackedPeer class
Represents an active client peer that takes part in the data exchange; the tracker uses it to maintain the state of each node of a TrackedTorrent.
TrackerService class
Reference: https://wiki.theory.org/BitTorrentSpecification
This class handles client requests, that is, requests that use the tracker HTTP protocol, and builds the responses. The request sent by the client tells the tracker about the client's state for the torrent, and the response tells the client about the other users taking part in the download.
The client sends its peer ID, its uploaded and downloaded byte counts and other information, and the server uses them to update the information of the announced torrent:
peer = torrent.update(event,
    ByteBuffer.wrap(announceRequest.getPeerId()),
    announceRequest.getHexPeerId(),
    announceRequest.getIp(),
    announceRequest.getPort(),
    announceRequest.getUploaded(),
    announceRequest.getDownloaded(),
    announceRequest.getLeft());
The server returns the tracker ID, the peers and other information to help the client download:
announceResponse = HTTPAnnounceResponseMessage.craft(
    torrent.getAnnounceInterval(),
    TrackedTorrent.MIN_ANNOUNCE_INTERVAL_SECONDS,
    this.version,
    torrent.seeders(),
    torrent.leechers(),
    torrent.getSomePeers(peer));
WritableByteChannel channel = Channels.newChannel(body);
channel.write(announceResponse.getData());
That is the general idea. For specifics, such as how the URL is parsed into the corresponding format, refer to the project code.
Summary
The server side of the BitTorrent protocol works mainly as described above. The server and the clients communicate over HTTP; the server mainly maintains the torrent information and helps each client choose peers to download files from.
Clients download the file from each other over TCP; that part of the implementation is in the client package, whose code is analyzed last.