BT source code study notes (7): Tracker code analysis (the HTTP protocol handler object)
Author: Wolfenstein (neversaynever)
Last time we analyzed the initialization of the tracker class. This time we start looking at how the tracking server actually provides its service.
First, the object that handles requests for the tracker is httphandler, defined in BitTorrent/httphandler.py. Its initialization function is very simple: it just stores the tracker.get function in an internal variable for later use.
As we know, when an external network connection is made, httphandler.external_connection_made is called. It maintains a dictionary, connections, keyed by the connection parameter passed in (which is of the singlesocket type); the value is a newly created httpconnection. Creating an httpconnection mainly initializes a few values. Note this one in particular:
self.next_func = self.read_type
This variable points to one of the object's own methods. As we will see, it is reassigned as processing proceeds, so that different parts of the incoming data can be handled flexibly.
Now we can analyze the network protocol between the client and the tracking server. When data arrives on a connection, httphandler.data_came_in is called. From its code, its main job is to call the data_came_in function of the httpconnection that corresponds to the network connection. That function first checks the donereading flag and next_func: if reading has already finished, or there is no next_func to handle the next step, it returns immediately. Otherwise it appends data (the bytes just read from the network) to its internal buf and enters a while loop. Each pass looks for a '\n' in the buffered data and splits there: the part before the line break is assigned to val, and the part after it is kept in buf for the next pass. val is then passed to next_func, and the return value is assigned back to next_func. In other words, once next_func has processed a line, it knows which function should handle the following part and returns it, so next_func is simply redirected there. The loop then checks whether it should continue.
This function is well designed: it automatically dispatches the different parts of a protocol to different functions. Even if the network is congested and only part of the data arrives, the next batch is simply appended to buf, and next_func always points to the function that handles the next part of the data.
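The buffering-and-dispatch idea described above can be sketched in a few lines of modern Python. This is a simplified illustration, not the BitTorrent source: the class name LineDispatcher, the handlers read_first and read_rest, and the seen list are all hypothetical names introduced here; only buf, val, next_func, donereading and data_came_in follow the text.

```python
class LineDispatcher:
    """Hypothetical stand-in for the per-connection object described above."""

    def __init__(self):
        self.buf = ""
        self.donereading = False
        self.seen = []                    # records what each handler received
        self.next_func = self.read_first  # handler for the next complete line

    def data_came_in(self, data):
        if self.donereading or self.next_func is None:
            return
        self.buf += data
        while True:
            i = self.buf.find("\n")
            if i == -1:
                return  # incomplete line: keep it in buf until more data arrives
            val, self.buf = self.buf[:i], self.buf[i + 1:]
            # The handler returns whichever function should process the next line.
            self.next_func = self.next_func(val)
            if self.donereading or self.next_func is None:
                return

    def read_first(self, line):
        self.seen.append(("first", line.strip()))
        return self.read_rest

    def read_rest(self, line):
        line = line.strip()
        if line == "":
            self.donereading = True  # an empty line ends the input
            return None
        self.seen.append(("rest", line))
        return self.read_rest


# Feed the data in arbitrary fragments, as a congested network might deliver it.
d = LineDispatcher()
for chunk in ["GET /ann", "ounce HTTP/1.0\r\nHost: exa", "mple\r\n\r\n"]:
    d.data_came_in(chunk)
print(d.seen)
```

However the input is fragmented, each handler only ever sees whole lines, which is what lets the protocol logic ignore how the network delivers the data.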
From the initialization of httpconnection we know that the first part of the data is handled by read_type. It strips surrounding whitespace and splits the line on whitespace. If there are three words, the format is "command path garbage"; otherwise it is "command path". It then checks that the command is GET or HEAD. From this we can guess that path is a URL path, and so the protocol between the client and the tracking server is in fact HTTP.
The next step is read_header, which reads the HTTP headers. It first checks whether the line contains any data. If it does, handling is very simple: it maintains a dictionary, headers, where the text before the ':' is the key and the text after it is the value, and next_func remains read_header; that is, each of the remaining lines is one header. If the line is empty, all the headers have been read, and it checks whether headers contains an Accept-Encoding entry, which specifies how the returned data should be encoded. Only two encodings are used: normal ('identity') and compressed ('gzip'). It then calls getfunc, which is actually tracker.get, to do the real processing of the user's HTTP request, which by now has been converted into more convenient parameters: path (the URL the user requested) and headers. If the result returned is not None, answer is called to send it back to the user.
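The parsing rules described above are easy to try out in isolation. The following is a minimal sketch under stated assumptions, not the original code: the function names parse_request_line, parse_header_line and choose_encoding are hypothetical, rejecting lines with fewer than two or more than three words is an assumption, and so is lower-casing the header names before storing them.

```python
def parse_request_line(line):
    """Return (command, path), or None if the line is not acceptable."""
    words = line.strip().split()
    if len(words) == 3:
        command, path, _garbage = words  # e.g. "GET /path HTTP/1.0"
    elif len(words) == 2:
        command, path = words            # e.g. "GET /path"
    else:
        return None                      # assumption: anything else is rejected
    if command not in ("GET", "HEAD"):
        return None
    return command, path


def parse_header_line(line, headers):
    """Store one 'Key: value' line in headers; return False if there is no ':'."""
    key, sep, value = line.partition(":")
    if not sep:
        return False
    # Assumption: keys are lower-cased so that lookups are case-insensitive.
    headers[key.strip().lower()] = value.strip()
    return True


def choose_encoding(headers):
    """Pick 'gzip' only when the client advertises it, otherwise 'identity'."""
    if "gzip" in headers.get("accept-encoding", ""):
        return "gzip"
    return "identity"


headers = {}
print(parse_request_line("GET /index.html HTTP/1.0\r"))
parse_header_line("Accept-Encoding: gzip, deflate\r", headers)
print(headers, choose_encoding(headers))
```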
Let's look at answer first. Its parameters show that its job is to convert the returned result into an HTTP response. The parameter is a tuple with four parts: response code, response string, header data, and body data. It first checks whether compression is required. If so, it compresses the data, then compares the compressed length with the original; if the compressed data is longer, the uncompressed data is sent instead. Next it writes a log entry: at such-and-such a date and time, someone requested something and some amount of data was returned. Recall that standard output was redirected to the log file during tracker initialization, so the print here actually writes to the log file. It then uses a StringIO as a string buffer that can be written to repeatedly, building the response in the standard HTTP response format ("HTTP/1.0 XXX responsestring ...\n"). Once everything has been assembled, it is written to the connection in a single call and sent to the network; rawserver takes care of problems such as network congestion for us. Finally, if all the data has been written out, the connection is closed. This matches the HTTP protocol: one request, one response, and the exchange is complete.
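As a rough illustration of the steps above (optional compression, then assembling the whole response in a buffer before writing it once), here is a sketch in modern Python. It is not the original answer function: build_response is a hypothetical name, io.BytesIO and gzip.compress stand in for whatever the original uses, logging and the actual socket write are omitted, and keeping the compressed body only when it is strictly smaller is an assumption.

```python
import gzip
import io


def build_response(result, encoding="identity"):
    """Serialize (code, reason, headers, body) into raw HTTP/1.0 bytes."""
    code, reason, headers, body = result
    headers = dict(headers)  # copy so the caller's dictionary is not modified
    if encoding == "gzip":
        compressed = gzip.compress(body)
        if len(compressed) < len(body):  # keep compression only if it helps
            body = compressed
            headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = len(body)

    out = io.BytesIO()  # in-memory buffer, written to the connection in one go
    out.write(("HTTP/1.0 %d %s\r\n" % (code, reason)).encode("ascii"))
    for key, value in headers.items():
        out.write(("%s: %s\r\n" % (key, value)).encode("ascii"))
    out.write(b"\r\n")  # blank line separates the headers from the body
    out.write(body)
    return out.getvalue()


small = build_response((200, "OK", {"Content-Type": "text/plain"}, b"hi"), "gzip")
large = build_response((200, "OK", {}, b"a" * 1000), "gzip")
print(b"Content-Encoding" in small, b"Content-Encoding" in large)
```

A very short body grows under gzip because of the format's fixed overhead, which is why the length comparison matters.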
We can now see that in BT the communication protocol between the client and the tracking server is HTTP, and that httphandler and httpconnection take care of all the HTTP details. This means tracker.get receives a connection object, the address the user requested, and a dictionary of HTTP request headers. All it has to do is process the request and return the result as four parts: the HTTP response code (200, 404, 500, etc.), the response string (such as Not Found, which is combined with the code to form "HTTP/1.0 404 Not Found"), and the header data and body data of the HTTP response.
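To make that contract concrete, here is a toy handler with the same shape: it takes a connection, a path and a headers dictionary, and returns the four-part result. Everything in it is hypothetical (the names example_get and status_line, the "/hello" path, the bodies); it says nothing about what the real tracker.get does internally.

```python
def example_get(connection, path, headers):
    """Hypothetical handler: returns (code, reason, headers, body)."""
    if path != "/hello":
        return (404, "Not Found", {"Content-Type": "text/plain"}, b"no such page")
    return (200, "OK", {"Content-Type": "text/plain"}, b"hello")


def status_line(result):
    """Combine the response code and response string into the first line."""
    code, reason = result[0], result[1]
    return "HTTP/1.0 %d %s" % (code, reason)


print(status_line(example_get(None, "/missing", {})))
print(status_line(example_get(None, "/hello", {})))
```

Because the handler only deals in plain Python values, it can be exercised without any network code at all.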
Next time we will take a closer look at how the tracker processes user requests.