BT Source Code Study Notes (6): Code Analysis of the Tracking Server (Tracker), Initialization
Author: Wolfenstein
The tracker is an important part of BT. I noticed that earlier articles used the term directly without translating it. After some thought, I decided to translate it as "tracking server".
In a BT download, the torrent file (the "seed file") describes the files to be downloaded and carries the message digests used to verify them. However, for a peer to obtain information about other peers, it still has to contact the tracking server. (From now on I will translate "peer" as "peer client", to distinguish it from "client".) The tracking server does not store any of the content that the torrent file represents. It only records the IP address, port, and other information of every machine downloading that torrent, and returns a list of this information when a peer client requests it. The actual content is exchanged between the peer clients themselves.
The tracking server is implemented in BitTorrent/track.py. bttrack.py itself contains just one simple line:
track(argv[1:])
This passes the arguments to the track function in track.py. The track function is also fairly simple: it processes the arguments and the related configuration, creates a RawServer, uses create_serversocket to create a server socket, and starts the service. The previous article described in detail how network services are used in BT, so I will not repeat it here and will only look at what is specific to the tracker once execution reaches listen_forever. First, a Tracker object is created and a listening network service is opened on a port (config['port']). The handler for this service is an HTTPHandler. So, to analyze how the program works, we only need to look at the Tracker initialization function to see what it does on creation, and then look at how HTTPHandler parses the network protocol.
The Tracker initialization function first initializes various variables. It then restores the state variable from a state file. The contents of this variable are very important, so we need to learn its structure from somewhere. The code that reads and saves the state file tells us nothing, because both are implemented with bencode and bdecode; all we learn is that whatever structure state has will be saved and restored correctly (which also shows how well designed the bencode encoding is). There is, however, a function that is very helpful for working out the internal structure of state: statefiletemplate. It checks whether the values in state are valid, so we can read some of the structure off it.
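To see why a bencode round trip preserves the structure of state without knowing anything about it, here is a minimal sketch of the encoding. It is written for this article and is not the code in BitTorrent's own bencode module: the names bencode_sketch and bdecode_sketch are hypothetical, it is written for Python 3 (so strings are byte strings), and it handles only integers, byte strings, lists, and dictionaries.

```python
def bencode_sketch(value):
    """Encode ints, bytes, lists and dicts following the bencode rules."""
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode_sketch(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and are written in sorted order.
        out = b"d"
        for key in sorted(value):
            out += bencode_sketch(key) + bencode_sketch(value[key])
        return out + b"e"
    raise TypeError("unsupported type: %r" % type(value))


def _decode(data, pos):
    """Decode one value starting at data[pos]; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end].decode("ascii")), end + 1
    if lead == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = _decode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = _decode(data, pos)
            item, pos = _decode(data, pos)
            result[key] = item
        return result, pos + 1
    # Anything else is a byte string written as <length>:<bytes>.
    colon = data.index(b":", pos)
    length = int(data[pos:colon].decode("ascii"))
    start = colon + 1
    return data[start:start + length], start + length


def bdecode_sketch(data):
    """Decode a complete bencoded byte string."""
    value, pos = _decode(data, 0)
    if pos != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


# A nested dictionary survives the round trip unchanged.
state = {b"completed": {b"h" * 20: 3},
         b"peers": {b"h" * 20: {b"p" * 20: {b"ip": b"10.0.0.1",
                                             b"port": 6881,
                                             b"left": 0}}}}
restored = bdecode_sketch(bencode_sketch(state))
```

Because the encoding is defined recursively over these four types, any nesting of dictionaries, lists, integers, and strings can be saved and restored, which is all the state file code relies on.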
First, state must be a dictionary, and each of its items is checked in turn. If the key 'peers' is found, its value must also be a dictionary, keyed by the message digest of the info part of a torrent file. The SHA digest algorithm has the property a digest needs here: the probability that two different torrent files produce the same digest is very small. Moreover, because the digest is computed from the content of the torrent file, the torrent can still be identified even if the file is renamed. The value of 'peers' can therefore be seen as the information recorded for each torrent.
What is recorded for each torrent? Again a dictionary, this time keyed by the ID of each peer client. Each peer client generates an ID for itself when it connects to the tracking server; how the ID is generated can be seen later in the client code. All we know for now is that its length must be 20. The value stored under each ID is, well, yet another dictionary. This one carries the real information: the peer's IP address, its port number, and the amount it still has left to download.
The content of state can therefore be pictured as {'peers': {...}, ...}, where peers has this structure: {hash1: {id1: {'ip': 'xxx.xxx.xxx.xxx', 'port': xxxx, 'left': xxxx}, id2: {'ip': 'yyy.yyy.yyy.yyy', 'port': yyyy, 'left': yyyy}, ...}, hash2: {...}, ...}.
That is the 'peers' item in state. The 'completed' item has a simple structure. It records download completions per torrent: it is a dictionary keyed by the message digest of each torrent's info part, and each value is an integer giving the number of peers that have finished downloading that torrent.
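The checks described above can be sketched roughly as follows. This is an illustration in Python 3, not the original statefiletemplate: the function name check_peers_and_completed is hypothetical, and the exact type checks on ip, port, and left are assumptions made for the example.

```python
def check_peers_and_completed(state):
    """Raise ValueError unless 'peers' and 'completed' have the expected shape."""
    if not isinstance(state, dict):
        raise ValueError("state must be a dictionary")
    for name, section in state.items():
        if name == "peers":
            if not isinstance(section, dict):
                raise ValueError("'peers' must be a dictionary")
            for peers_of_torrent in section.values():
                if not isinstance(peers_of_torrent, dict):
                    raise ValueError("each torrent entry must be a dictionary")
                for peer_id, info in peers_of_torrent.items():
                    if len(peer_id) != 20:
                        raise ValueError("peer ID must have length 20")
                    if not isinstance(info, dict):
                        raise ValueError("peer info must be a dictionary")
                    # Assumed field checks: ip is text, port and left are
                    # non-negative integers.
                    if not isinstance(info.get("ip", ""), str):
                        raise ValueError("ip must be a string")
                    for field in ("port", "left"):
                        value = info.get(field)
                        if not isinstance(value, int) or value < 0:
                            raise ValueError(field + " must be a non-negative integer")
        elif name == "completed":
            if not isinstance(section, dict):
                raise ValueError("'completed' must be a dictionary")
            for count in section.values():
                if not isinstance(count, int):
                    raise ValueError("completion counts must be integers")


example_state = {
    "peers": {
        "A" * 20: {  # stand-in for the 20-byte digest of a torrent's info part
            "p" * 20: {"ip": "10.0.0.1", "port": 6881, "left": 0},
            "q" * 20: {"ip": "10.0.0.2", "port": 6882, "left": 4096},
        },
    },
    "completed": {"A" * 20: 1},
}
check_peers_and_completed(example_state)  # a valid state passes silently
```

Reading a validator like this one from the outside in gives exactly the nesting shown above: digest, then peer ID, then the per-peer record.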
Next is the 'allowed' item, which records all the torrents the tracking server is concerned with. It is again keyed by the message digest of the info part, and each value is the actual information from the torrent file. A later analysis of BitTorrent/parsedir.py will show exactly what that information is; we can also guess some of it, since we already know the internal structure of a torrent file.
state also contains an 'allowed_dir_files' item. It is likewise a dictionary recording file information, but keyed by file name rather than by message digest. Each value is a list with this structure: [(file modification time, file size), message digest]. That is, the list has two elements: the first is a 2-tuple of the file's modification time and size, and the second is the message digest.
Finally, statefiletemplate performs some extra checks on 'allowed' and 'allowed_dir_files': every message digest that appears in 'allowed' must also appear in 'allowed_dir_files'; the message digest in every value of 'allowed_dir_files' must appear in 'allowed'; and no message digest may appear twice in 'allowed_dir_files'. ('allowed' itself is keyed by message digest, and dictionary keys are already guaranteed to be unique.)
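The three cross-checks between 'allowed' and 'allowed_dir_files' can be sketched as below. This is a simplified illustration, not the original code: the function name check_allowed_consistency is hypothetical, and the {'name': ...} values stand in for the real torrent information, whose exact content is not covered here.

```python
def check_allowed_consistency(state):
    """Raise ValueError if 'allowed' and 'allowed_dir_files' disagree."""
    allowed = state.get("allowed", {})
    dir_files = state.get("allowed_dir_files", {})
    seen = set()
    for filename, entry in dir_files.items():
        # Each entry is [(modification time, size), message digest].
        _mtime_and_size, digest = entry
        if digest not in allowed:
            raise ValueError("digest of %s is missing from 'allowed'" % filename)
        if digest in seen:
            raise ValueError("duplicate digest in 'allowed_dir_files'")
        seen.add(digest)
    for digest in allowed:
        if digest not in seen:
            raise ValueError("digest in 'allowed' has no file entry")


consistent = {
    "allowed": {
        "A" * 20: {"name": "a.iso"},  # placeholder torrent information
        "B" * 20: {"name": "b.iso"},
    },
    "allowed_dir_files": {
        "/torrents/a.torrent": [(1100000000, 512), "A" * 20],
        "/torrents/b.torrent": [(1100000500, 768), "B" * 20],
    },
}
check_allowed_consistency(consistent)  # passes silently
```

Taken together, the checks mean the two items describe the same set of torrents, once by digest and once by file name.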
So now we know the structure of the parts of state that matter to us. Note the following two statements:
self.downloads = self.state.setdefault('peers', {})
self.completed = self.state.setdefault('completed', {})
In this way, the values of 'peers' and 'completed' in state are bound to downloads and completed. More importantly, because these are references to the same dictionaries, whenever downloads or completed changes while the tracking server is running (and they certainly will), the corresponding values in state change too. That way, state is already up to date whenever dfile is saved. We will come back to them when analyzing the tracking server at run time. For now, just remember that downloads holds the information of all downloading clients, and completed holds the completion statistics for all torrents.
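The aliasing behaviour of setdefault is easy to confirm in isolation. The following small sketch uses plain variables instead of the Tracker attributes, and the digest and peer ID values are placeholders:

```python
state = {}

# setdefault stores the default in state and returns that same object.
downloads = state.setdefault("peers", {})
completed = state.setdefault("completed", {})

# Changes made through downloads and completed...
downloads["A" * 20] = {"p" * 20: {"ip": "10.0.0.1", "port": 6881, "left": 100}}
completed["A" * 20] = 1

# ...are visible through state, because both names refer to one dictionary.
same_object = state["peers"] is downloads
```

If the key already exists, as it does after a state file has been restored, setdefault returns the existing dictionary instead, so the restored data is what downloads and completed end up referring to.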
The for loop that follows handles NAT according to the configuration and counts the seeds. completed only records how many peers have ever completed the download, whereas a seed is a client that still appears in downloads with nothing left to download (left == 0), that is, one that has finished downloading but has not yet shut down. It is easy to see that seedcount is a dictionary that counts seeds, keyed by info digest with integer values.
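The seed-counting part of that loop can be sketched as follows. The NAT handling is left out, and the function name count_seeds and the sample data are made up for the illustration:

```python
def count_seeds(downloads):
    """Return {info digest: number of peers with nothing left to download}."""
    seedcount = {}
    for infohash, peers in downloads.items():
        seedcount[infohash] = 0
        for info in peers.values():
            if info["left"] == 0:
                seedcount[infohash] += 1
    return seedcount


sample_downloads = {
    "A" * 20: {
        "p" * 20: {"ip": "10.0.0.1", "port": 6881, "left": 0},
        "q" * 20: {"ip": "10.0.0.2", "port": 6882, "left": 4096},
        "r" * 20: {"ip": "10.0.0.3", "port": 6883, "left": 0},
    },
    "B" * 20: {},
}
sample_seedcount = count_seeds(sample_downloads)
```

Note that a torrent with no connected peers still gets an entry with a count of zero.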
Next, the times variable is computed. times records the last active time of each peer client (keyed by peer ID) within each torrent (keyed by info digest). Then two tasks are added: saving dfile at regular intervals, and periodically checking whether any downloading client has been unresponsive for a long time.
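As a rough sketch of how such a times table could be built and used: the function names below are hypothetical, starting every entry at 0 is an assumption of this sketch, and the expiry rule (drop any peer last seen before a cutoff time) is only one plausible way to implement the timeout check, not necessarily what the tracker does.

```python
def init_times(downloads):
    """Build {info digest: {peer ID: last active time}} for all known peers."""
    times = {}
    for infohash, peers in downloads.items():
        times[infohash] = {}
        for peer_id in peers:
            times[infohash][peer_id] = 0  # assumed starting value
    return times


def expire_inactive(downloads, times, cutoff):
    """Remove peers whose last active time is older than cutoff."""
    removed = []
    for infohash in list(times):
        for peer_id, last_seen in list(times[infohash].items()):
            if last_seen < cutoff:
                del times[infohash][peer_id]
                downloads[infohash].pop(peer_id, None)
                removed.append((infohash, peer_id))
    return removed


demo_downloads = {"t1": {"a": {"left": 0}, "b": {"left": 10}}}
demo_times = init_times(demo_downloads)
demo_times["t1"]["a"] = 100  # peer "a" was active at time 100
demo_removed = expire_inactive(demo_downloads, demo_times, cutoff=50)
```

In the tracker itself, a check like this is registered as a recurring task on the server rather than called by hand.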
Next, a log file is prepared, and the code tries to redirect standard output to it.
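Redirecting standard output to a log file can be sketched like this. The function name redirect_stdout_to_log and the warning text are made up for the illustration; the point is only that a failure to open the file leaves standard output untouched.

```python
import sys


def redirect_stdout_to_log(path):
    """Try to append everything printed from now on to the file at path.

    Returns the open log file, or None if it could not be opened, in which
    case standard output is left as it was.
    """
    try:
        log = open(path, "a")
    except OSError as error:
        print("**warning** could not open log file %s: %s" % (path, error))
        return None
    sys.stdout = log
    return log
```

After a successful call, ordinary print statements elsewhere in the program end up in the log file without any further changes.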
Finally, the tracking server has to find all the torrents it is interested in, which is the job of parsedir. You can read this function yourself; given the encoding format of torrent files and the requirements on the state items described above, I believe it is not hard to follow. In short, the function finds all .torrent files in a directory, reads the information in them, filters out errors, duplicates, and so on, and produces results that satisfy those requirements. The results are stored in allowed and allowed_dir_files, and so they affect state.
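The bookkeeping side of such a directory scan can be sketched as follows. This is not parsedir: the function name scan_torrent_dir is hypothetical, and, importantly, the digest here is a SHA-1 over the whole file, used only as a stand-in for the digest of the torrent's info part (computing the real one requires decoding the file first). The value stored in allowed is likewise a placeholder.

```python
import hashlib
import os


def scan_torrent_dir(directory):
    """Return (allowed, allowed_dir_files) for the .torrent files in directory."""
    allowed = {}
    allowed_dir_files = {}
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".torrent"):
            continue
        path = os.path.join(directory, name)
        try:
            info = os.stat(path)
            with open(path, "rb") as handle:
                data = handle.read()
        except OSError:
            continue  # unreadable file: skip it
        # STAND-IN: hash of the whole file, not of the decoded info part.
        digest = hashlib.sha1(data).digest()
        if digest in allowed:
            continue  # duplicate torrent: keep only the first file
        allowed[digest] = {"path": path}  # placeholder torrent information
        allowed_dir_files[path] = [(int(info.st_mtime), info.st_size), digest]
    return allowed, allowed_dir_files
```

Skipping duplicates while scanning is what keeps the result consistent with the rule that no digest may appear twice in allowed_dir_files.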
The Tracker object is now fully set up: it holds the information of all the torrents it tracks and is ready to maintain the list of all connected peer clients, so it can officially start providing the tracking service. Next time, we will see the tracker in action.