Having reached this point in the source code analysis, we need to go back to the beginning and sort out the complex relationships between connections and circuits (links) inside the program, so that everyone can follow the logic of the source code. Otherwise the later sections will be confusing. When I started analyzing the source code I did not have these high-level concepts, so I had to chew through the code, guess at its meaning, and then check the code again to verify my guesses before I understood the main framework of the program. Explaining the source code in that guess-and-verify fashion would only become more and more confusing. In this section, therefore, we try to describe all the connection types and circuit types, the relationships between them, and how they are tied together in the code. If any part is ambiguous, refer to the code.
Before the detailed analysis, here again is the layer diagram of connections and circuits. This simple hierarchy helps us understand where the different connections and circuits sit relative to each other. It was already given when we analyzed the OR connection source code, but we did not go into much detail at that time.
Dir connection and listener connection                         |
-----------------                                              |
AP connection, exit connection ...... Tor protocol upper layer |
--------------------------                                     |  Application Layer
Circuit (link) ...... Tor protocol middle layer                |
----------------------------                                   |
OR connection ...... Tor protocol lower layer                  |
------------------------------------
TLS connection                                                    Transport Layer
1. Connection
The system clearly contains many connection types, each with a different function. We first list all the connection types in the system and then describe the ones we care about most.
// OR listener connection: used locally to listen for OR requests from remote hosts; it creates an OR connection for each new request. There is only one OR listener connection locally;
#define CONN_TYPE_OR_LISTENER 3
// OR connection: built on a TLS connection and mainly responsible for communication between hosts in the Tor system. The number of OR connections indicates how many hosts the local host is connected to;
#define CONN_TYPE_OR 4
// exit connection;
#define CONN_TYPE_EXIT 5
// AP listener connection: a listener connection established locally to listen for service requests from local applications. An AP connection is created for each service request; there is only one AP listener connection locally;
#define CONN_TYPE_AP_LISTENER 6
// AP connection: built on a circuit and mainly responsible for finding a suitable circuit for a client request and transmitting its data. The number of AP connections equals the number of connection requests sent by local applications, because they correspond one to one;
#define CONN_TYPE_AP 7
// dir listener connection: used by a Tor directory server to listen for directory-related requests sent by hosts in the Tor system. In other words, this type of connection exists only on directory servers;
#define CONN_TYPE_DIR_LISTENER 8
// dir connection: created when the Tor client sends a directory request to a Tor directory server. The connection forwards its request through an AP connection, so it behaves like a request connection from an ordinary application;
#define CONN_TYPE_DIR 9
// cpuworker connection: provides inter-process communication when the program starts its multi-process decryption service. For details, see the cpuworker article analyzed earlier; normally not used;
#define CONN_TYPE_CPUWORKER 10
// control listener connection: used locally to listen for control commands or requests sent by local applications;
#define CONN_TYPE_CONTROL_LISTENER 11
// control connection: the local connection used to process control messages (described later);
#define CONN_TYPE_CONTROL 12
// The following three connection types are rare, so we skip them for now;
#define CONN_TYPE_AP_TRANS_LISTENER 13
#define CONN_TYPE_AP_NATD_LISTENER 14
#define CONN_TYPE_AP_DNS_LISTENER 15
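As a small illustration of how such type constants can be used, the sketch below tells listener types apart from ordinary connection types and maps a listener to the kind of connection it creates, as described in the comments above. The constant values are the ones listed above; the helper names conn_type_is_listener and conn_type_spawned_by are hypothetical and are not claimed to be Tor's own functions. The dir listener mapping is an assumption drawn from the description, and the three rare listener types are deliberately left unmapped.

```c
#include <assert.h>

/* Values copied from the list above. */
#define CONN_TYPE_OR_LISTENER 3
#define CONN_TYPE_OR 4
#define CONN_TYPE_EXIT 5
#define CONN_TYPE_AP_LISTENER 6
#define CONN_TYPE_AP 7
#define CONN_TYPE_DIR_LISTENER 8
#define CONN_TYPE_DIR 9
#define CONN_TYPE_CPUWORKER 10
#define CONN_TYPE_CONTROL_LISTENER 11
#define CONN_TYPE_CONTROL 12
#define CONN_TYPE_AP_TRANS_LISTENER 13
#define CONN_TYPE_AP_NATD_LISTENER 14
#define CONN_TYPE_AP_DNS_LISTENER 15

/* Hypothetical helper: returns 1 if the type is a listening connection. */
static int conn_type_is_listener(int type)
{
  switch (type) {
    case CONN_TYPE_OR_LISTENER:
    case CONN_TYPE_AP_LISTENER:
    case CONN_TYPE_DIR_LISTENER:
    case CONN_TYPE_CONTROL_LISTENER:
    case CONN_TYPE_AP_TRANS_LISTENER:
    case CONN_TYPE_AP_NATD_LISTENER:
    case CONN_TYPE_AP_DNS_LISTENER:
      return 1;
    default:
      return 0;
  }
}

/* Hypothetical helper: which connection type does a listener create for
 * each accepted request? Returns -1 for listeners not covered in the text. */
static int conn_type_spawned_by(int listener_type)
{
  assert(conn_type_is_listener(listener_type));
  switch (listener_type) {
    case CONN_TYPE_OR_LISTENER:      return CONN_TYPE_OR;
    case CONN_TYPE_AP_LISTENER:      return CONN_TYPE_AP;
    case CONN_TYPE_DIR_LISTENER:     return CONN_TYPE_DIR;   /* assumption */
    case CONN_TYPE_CONTROL_LISTENER: return CONN_TYPE_CONTROL;
    default:                         return -1;              /* skipped above */
  }
}
```

For example, conn_type_spawned_by(CONN_TYPE_AP_LISTENER) yields CONN_TYPE_AP, which matches the statement that an AP connection is created for each service request.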
From the connection types above, we can see that for a client the most important connection types are dir, AP, and OR. Together, these connections let a client application send its request into the Tor system, which then encapsulates the data and delivers it to the remote destination. Next we briefly describe how the whole system operates.
1. Send application requests
When an application wants to use the anonymity service provided by Tor, it must send its data through the Tor client program acting as a proxy. Take the simplest case, a browser. To browse the web anonymously, the browser must change its proxy configuration and send its requests through the local proxy. A proxy is normally configured as IP:port. The IP address and port chosen here are the address and port on which Tor listens for application requests, as written in the Tor configuration file, where they are set with the parameters SocksListenAddress and SocksPort.
This IP address and port mean that the Tor program opens a listener connection that listens continuously on that port. The type of this listener is the AP listener connection.
If you look carefully, you will also notice the parameter ControlPort in the default configuration file. It specifies the control port on which the Tor program listens. In the same way, the Tor program opens a listener connection for requests arriving on the control port. The type of this listener is the control listener connection.
From this description we can see why there is only one AP listener connection and only one control listener connection globally: they are nothing more than listening sockets bound to local addresses, so one of each is enough. We can also see that, just as a server obtains a new socket each time it accepts a connection, the Tor program obtains a socket for each request that arrives at a listener. Wrapping these sockets produces AP connections and control connections.
2. Process application requests
The application request is sent over a local socket. If you are familiar with socket programming, the descriptions above should raise no questions. The problem now is how data is forwarded to and from the AP connection. Once the AP connection has received data from the application, it has to send that data anonymously through the Tor system. So how does the Tor system transmit messages? Recalling the Tor design paper, Tor's technical documents, and introductions to Tor, we remember that Tor multiplexes streams over circuits and circuits over connections. The framework looks like this:
AP stream 1 <-->
AP stream 2 <--> Circuit 1 <->
AP stream 3 <-->
                 Circuit 2 <-> Local OR connection <====> Remote OR connection
                 Circuit 3 <->
Multiple AP connections, through their AP streams, share the same circuit, and multiple circuits share the same OR connection. That is how the system is designed. What we want to know, however, is what this design means for the code and for the programmer. Let us start with the high-level meaning of each part.
AP stream: the data stream formed after the AP connection receives an application request. The data can flow in both directions. In other words, it is one data request of an application, such as a request made by a web page in a browser. For the programmer, this means the AP connection needs read and write buffers to hold data temporarily and must be ready to send data at any time. Importantly, a stream is established on top of a circuit.
Circuit: a Tor circuit (link). A circuit normally consists of three onion routers that form a private channel through the Tor network. Only the client knows the complete path of the channel, because the client selects every node in the circuit. Each node in the circuit knows only its predecessor and its successor and no other nodes, which is what provides the system's anonymity. For the programmer, establishing a circuit means selecting three nodes, sending a circuit creation request to the first node, and then building the rest of the circuit by sending circuit extension requests through the first node. Circuit requests are transmitted over an established OR connection.
OR connection: a secure connection between two hosts in the Tor system, built on a TLS connection.
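The containment relationship between these three objects can be sketched in a few lines of C. This is only a minimal model under my own assumptions: the struct and function names (or_conn_sketch, circuit_sketch, ap_stream_sketch, and the attach helpers) are hypothetical, and Tor's real structures carry far more state. The sketch shows only that many streams point at one circuit and many circuits point at one OR connection.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for Tor's real structures. */
typedef struct or_conn_sketch {
  int n_circuits;                /* circuits multiplexed on this connection */
} or_conn_sketch;

typedef struct circuit_sketch {
  or_conn_sketch *conn;          /* OR connection that carries this circuit */
  int n_streams;                 /* AP streams multiplexed on this circuit */
} circuit_sketch;

typedef struct ap_stream_sketch {
  circuit_sketch *circ;          /* circuit that carries this stream */
} ap_stream_sketch;

/* Bind a circuit to the OR connection it will travel over. */
static void circuit_attach_to_conn(circuit_sketch *circ, or_conn_sketch *conn)
{
  assert(circ != NULL && conn != NULL && circ->conn == NULL);
  circ->conn = conn;
  conn->n_circuits++;
}

/* Bind an AP stream to the circuit it will travel over. */
static void stream_attach_to_circuit(ap_stream_sketch *s, circuit_sketch *circ)
{
  assert(s != NULL && circ != NULL && s->circ == NULL);
  s->circ = circ;
  circ->n_streams++;
}

/* Follow stream -> circuit -> OR connection; NULL if not attached yet. */
static or_conn_sketch *stream_get_or_conn(const ap_stream_sketch *s)
{
  assert(s != NULL);
  return (s->circ != NULL) ? s->circ->conn : NULL;
}
```

Attaching three streams to circuit 1 and three circuits to one OR connection reproduces the diagram above: every stream resolves to the same local OR connection.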
After these three descriptions, we would like to know which code corresponds to each of them, how they relate to one another, and how data moves between them. For that we need the system diagram that appeared in an earlier section; looking back at it now, everything should fall into place. We do not expand on the system block diagram here: there is too much code detail, and the content is too complicated for me to write down with confidence. So I hope you will read some of the code yourself.
Our question has not been fully answered yet, however, so we describe the request handling process once more. After an application request reaches the Tor program, it is stored in the buffer of the AP connection. The AP connection then needs a suitable circuit on which to send the data, so it either selects the best of the existing circuits or creates a new one. This is why code with this logic appeared in the earlier analysis. Here we care more about how a circuit is created when none exists. Until the circuit is established, the application's request data stays in the buffer of the AP connection, and the circuit is built during this time. The first task in building a circuit is to select three nodes; the client then requests an OR connection to the first node and builds the entire circuit at the layer above it. With this basic idea in mind, we can start analyzing the code for the basic circuit establishment process.
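The behavior just described, where data waits in the AP connection's buffer until a circuit is usable, can be sketched as follows. This is a toy model under my own assumptions, not Tor's implementation: the names ap_conn_sketch, circ_sketch, ap_buffer_write, ap_pick_circuit, and ap_flush are hypothetical, "best circuit" is reduced to "first open circuit", and sending is reduced to counting bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CIRC_BUILDING 0   /* nodes chosen, circuit not yet usable */
#define CIRC_OPEN     1   /* circuit fully built */
#define BUF_CAP       256

typedef struct { int state; size_t bytes_sent; } circ_sketch;

typedef struct {
  char buf[BUF_CAP];      /* application data waiting for a circuit */
  size_t len;
  circ_sketch *circ;      /* circuit chosen for this AP connection */
} ap_conn_sketch;

/* Store application data in the AP connection's buffer.
 * Returns 0 on success, -1 if the buffer cannot hold the data. */
static int ap_buffer_write(ap_conn_sketch *ap, const char *data, size_t n)
{
  assert(ap != NULL && data != NULL);
  if (n > BUF_CAP - ap->len)
    return -1;
  memcpy(ap->buf + ap->len, data, n);
  ap->len += n;
  return 0;
}

/* Reuse the first open circuit in the pool; if there is none, start
 * building a new one (represented by the caller-supplied 'fresh'). */
static circ_sketch *ap_pick_circuit(circ_sketch *pool, size_t n,
                                    circ_sketch *fresh)
{
  size_t i;
  assert(fresh != NULL);
  for (i = 0; i < n; i++)
    if (pool[i].state == CIRC_OPEN)
      return &pool[i];
  fresh->state = CIRC_BUILDING;
  fresh->bytes_sent = 0;
  return fresh;
}

/* Hand buffered data to the circuit, but only once the circuit is open.
 * Returns the number of bytes flushed. */
static size_t ap_flush(ap_conn_sketch *ap)
{
  size_t n;
  assert(ap != NULL);
  if (ap->circ == NULL || ap->circ->state != CIRC_OPEN)
    return 0;                     /* data stays in the buffer */
  n = ap->len;
  ap->circ->bytes_sent += n;
  ap->len = 0;
  return n;
}
```

With no open circuit available, ap_flush returns 0 and the request stays buffered; once the new circuit's state becomes CIRC_OPEN, the same call drains the buffer.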
2. Circuits (links)
Before analyzing the code, we want to describe circuits in more detail, because the circuit is the core of the whole Tor system. So far we have not discussed in detail how the code implements node selection, circuit encryption, or the circuit establishment protocol. Here we describe only the circuit encryption process and do not go deeper into node selection or the protocol: the node selection policy does not play a key role in the program's execution flow, and the protocol was already explained in detail in the earlier section on the handshake protocol.
So how does a circuit implement onion encryption? Put another way, how does a circuit make layer-by-layer encryption and layer-by-layer decryption easy to carry out? We discuss the cases separately. First comes the case where the circuit is an origin circuit, that is, the circuit structure origin_circuit held by the client.
In the origin circuit structure we find the member variable crypt_path_t *cpath. Analyzing this member shows that it forms a doubly linked list. For clarity, we reproduce the code of this struct here:
/** Holds accounting information for a single step in the layered encryption
 * performed by a circuit. Used only at the client edge of a circuit. */
// As the comment shows, this struct is used only by the circuit struct on the client, that is, only by origin_circuit. From this we can already guess that it maintains a doubly linked list of the circuit's nodes and stores the encryption/decryption, digest, flow control, and brief identity information for every node;
// Each piece of information mentioned above has a corresponding member in the struct:
typedef struct crypt_path_t {
  uint32_t magic; // unique identification code;

  /* crypto environments */
  // Moving away from the client is the forward direction; moving toward the client is the backward direction.
  /** Encryption key and counter for cells heading towards the OR at this
   * step. */
  // forward encryption data, including key and IV information; (forward: the end node has no backward encryption data)
  crypto_cipher_t *f_crypto;
  /** Encryption key and counter for cells heading back from the OR at this
   * step. */
  // backward encryption data, including key and IV information; (backward: the first node has no forward encryption data)
  crypto_cipher_t *b_crypto;

  /** Digest state for cells heading towards the OR at this step. */
  // forward digest: hash algorithm and state;
  crypto_digest_t *f_digest; /* for integrity checking */
  /** Digest state for cells heading away from the OR at this step. */
  // backward digest: hash algorithm and state;
  crypto_digest_t *b_digest;

  /** Current state of Diffie-Hellman key negotiation with the OR at this
   * step. */
  // the DH key negotiation state between the client and this node;
  crypto_dh_t *dh_handshake_state;
  ......
  /** Information to extend to the OR at this step. */
  // information about this node;
  extend_info_t *extend_info;

  /** Is the circuit built to this step? Must be one of:
   *  - CPATH_STATE_CLOSED (The circuit has not been extended to this step)
   *  - CPATH_STATE_AWAITING_KEYS (We have sent an EXTEND/CREATE to this step
   *    and not received an EXTENDED/CREATED)
   *  - CPATH_STATE_OPEN (The circuit has been extended to this step) */
  // the state of the current node in the circuit;
  uint8_t state;
#define CPATH_STATE_CLOSED 0
#define CPATH_STATE_AWAITING_KEYS 1
#define CPATH_STATE_OPEN 2
  // the struct of this node's next node in the circuit;
  struct crypt_path_t *next; /**< Link to next crypt_path_t in the circuit.
                              * (The list is circular, so the last node
                              * links to the first.) */
  // the struct of this node's previous node in the circuit;
  struct crypt_path_t *prev; /**< Link to previous crypt_path_t in the
                              * circuit. */
  // used by the client for flow control and write control toward this node;
  int package_window; /** ......
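The circular doubly linked list formed by the next and prev pointers can be illustrated with a minimal sketch. The names hop_sketch, hop_list_append, and hop_list_len are hypothetical and the struct keeps only the two link pointers plus an id; this is an illustration of the list shape described in the comments above, not a copy of Tor's own list code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced hop: only the list links of crypt_path_t are kept. */
typedef struct hop_sketch {
  int id;
  struct hop_sketch *next;   /* next hop; the last hop links to the first */
  struct hop_sketch *prev;   /* previous hop; the first links to the last */
} hop_sketch;

/* Append a hop at the end of the circular list whose first hop is *head. */
static void hop_list_append(hop_sketch **head, hop_sketch *hop)
{
  assert(head != NULL && hop != NULL);
  if (*head == NULL) {
    /* A single hop is its own successor and predecessor. */
    hop->next = hop;
    hop->prev = hop;
    *head = hop;
    return;
  }
  hop->prev = (*head)->prev;   /* old last hop */
  hop->next = *head;           /* wrap around to the first hop */
  (*head)->prev->next = hop;
  (*head)->prev = hop;         /* the new hop is now the last hop */
}

/* Count hops by walking next pointers until we are back at the head. */
static int hop_list_len(const hop_sketch *head)
{
  int n = 0;
  const hop_sketch *h = head;
  if (head == NULL)
    return 0;
  do {
    n++;
    h = h->next;
  } while (h != head);
  return n;
}
```

Because the list is circular, the last hop is always reachable as head->prev, which makes it cheap to walk the hops either from the nearest node outward or from the farthest node inward.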
With the crypt_path_t structure above, the client's Tor program can easily organize a circuit and control encryption, decryption, and digests, and the circuit code that follows becomes easier to understand. That is the origin circuit. Next we briefly describe the intermediate circuit, whose structure is or_circuit. The intermediate circuit structure is the one used on the middle and end nodes of a circuit. It does not need complete information about the circuit; it only needs the cipher and digest environments for the forward and backward directions at the current node. In the end, the intermediate circuit structure only forwards data and does not need much information. Everything mentioned here can be seen in the code; if you are interested, read the code and try it yourself.
At this point you should be able to answer the question of how onion encryption and decryption work on a circuit. Once the circuit is established, all crypt_path_t structures on the client have been filled in, and the forward and backward keys shared with each node are available. When a message is sent out, the client encrypts it layer by layer with the forward key of each node, working from the farthest node back to the nearest, and then sends it. Each node along the circuit removes one layer with its own forward key, so the message is fully decrypted at the last node and sent out of the circuit. When a message is sent back, each node it passes encrypts it with its backward key. When the message reaches the client, the client decrypts it layer by layer with the backward keys, from the nearest node to the farthest, and obtains the returned message. This process is known as onion encryption and decryption.
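The layering can be made concrete with a toy example. The sketch below is an illustration under loud assumptions: toy_crypt is an insecure XOR stand-in for a real stream cipher, the per-hop keys are single bytes, and all names (hop_keys, client_encrypt_forward, relay_forward, relay_backward, client_decrypt_backward) are hypothetical. It only shows which key is applied where, and in which order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N_HOPS 3

/* Hypothetical per-hop key pair shared between the client and one node. */
typedef struct {
  unsigned char f_key;   /* forward key: cells moving away from the client */
  unsigned char b_key;   /* backward key: cells moving toward the client */
} hop_keys;

/* Toy cipher: XOR with a key-dependent byte stream. Applying it twice with
 * the same key restores the input. NOT secure; for illustration only. */
static void toy_crypt(unsigned char key, unsigned char *buf, size_t len)
{
  size_t i;
  for (i = 0; i < len; i++)
    buf[i] ^= (unsigned char)(key + i);
}

/* Client, outbound: add one forward layer per hop, farthest hop first. */
static void client_encrypt_forward(const hop_keys *hops,
                                   unsigned char *buf, size_t len)
{
  int i;
  assert(hops != NULL && buf != NULL);
  for (i = N_HOPS - 1; i >= 0; i--)
    toy_crypt(hops[i].f_key, buf, len);
}

/* Node, outbound: strip this node's forward layer. */
static void relay_forward(const hop_keys *hop, unsigned char *buf, size_t len)
{
  toy_crypt(hop->f_key, buf, len);
}

/* Node, inbound: add this node's backward layer. */
static void relay_backward(const hop_keys *hop, unsigned char *buf, size_t len)
{
  toy_crypt(hop->b_key, buf, len);
}

/* Client, inbound: strip the backward layers, nearest hop first. */
static void client_decrypt_backward(const hop_keys *hops,
                                    unsigned char *buf, size_t len)
{
  int i;
  assert(hops != NULL && buf != NULL);
  for (i = 0; i < N_HOPS; i++)
    toy_crypt(hops[i].b_key, buf, len);
}
```

On the way out, the cell passes relay_forward at hops 0, 1, and 2 in turn and is plaintext only after the last hop. On the way back, it passes relay_backward at hops 2, 1, and 0, and only the client, which holds all three backward keys, can recover it.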
After reading the code with this in mind, encryption and decryption should no longer be confusing. In the following chapter we continue the code analysis and then turn to the client's execution flow.