eMule source code analysis (5)

Last Update:2018-12-04 Source: Internet

Author: User

Tags network function


Description of the Kademlia code in eMule

When eMule uses the Kademlia network, the failure of a central server is no longer a concern, because the network has no central server. Put differently, every user is a server and every user is also a client, which makes the network fully peer-to-peer. The following sections analyze the Kademlia network in eMule. This section covers the principles, and the other sections are organized around the classes that implement Kademlia in eMule:

CKademlia is the master class of the whole Kademlia network. It starts and stops the Kademlia network directly and contains the Process method that handles routine work.

CPrefs handles the client's own Kademlia-related information, such as its own ID.

CRoutingZone, CRoutingBin, and CContact together hold the contact information for each node and the data structure built from those contacts.

CKademliaUDPListener processes network messages.

CIndexed handles the index information stored locally.

CSearch and CSearchManager handle search-related operations. The former represents a single search task, and the latter manages all search tasks.

CUInt128 represents a 128-bit integer with the common operations built in, as mentioned in earlier parts of this series.

Basic principles of Kademlia in eMule

Kademlia is a structured overlay network. An overlay network is a virtual network built on top of the physical Internet: every participating node knows the IP addresses of some other nodes, which are called its neighbors. When a node needs to find something, it first looks locally; if that fails, it forwards the query to its neighbors, which may be able to produce the result. Overlay networks are either structured or unstructured, and the difference is whether there are specific rules about which other nodes each node knows. In an unstructured overlay network, a node's set of neighbors follows no particular rule. A query in an unstructured network is therefore carried out by flooding: each node that does not find the desired result locally forwards the search request to its neighbors, which search in turn, step by step. If this is not handled carefully, the message load on the whole network becomes too large. Many articles have already explored query optimization in unstructured overlay networks in depth.

In a structured overlay network, each node follows definite rules when choosing its neighbors, so when it forwards a search request it can also choose, by rule, which neighbors receive the request. This reduces the cost of a search. A structured overlay network usually requires every node to generate a random ID that identifies it. This ID must be independent of the physical network in which the node is located.

In the Kademlia network, this ID is a 128-bit value. Every node uses it to measure its logical distance to other nodes, and the logical distance is computed as the exclusive or (XOR) of the two nodes' IDs. As the Kademlia network forms, each node selects neighbors on the principle that the closer a node is in logical distance, the more likely it is to be added to the neighbor list. Whether a node is added is decided by its distance; the specific code is described later.
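The XOR distance is easy to try out. The following is a minimal sketch in Python rather than eMule's C++; the function names and the sample IDs are made up for illustration and do not come from the eMule source.

```python
ID_BITS = 128  # Kademlia IDs in eMule are 128-bit values


def xor_distance(a: int, b: int) -> int:
    """Logical distance between two node IDs: their bitwise XOR."""
    return a ^ b


def common_prefix_len(a: int, b: int) -> int:
    """Number of leading bits two IDs share; more shared bits means closer."""
    return ID_BITS - xor_distance(a, b).bit_length()


# Hypothetical IDs, chosen only to show the effect of high and low bits.
me = 0x0123456789ABCDEF0123456789ABCDEF
near = me ^ 0x1         # differs from `me` only in the lowest bit
far = me ^ (1 << 127)   # differs from `me` in the highest bit

print(xor_distance(me, near) < xor_distance(me, far))
```

A difference in a high bit always outweighs any difference in the lower bits, which is why "close" in this metric is the same as "shares a long prefix".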

The advantage of a structured network is that finding a node logically close enough to a given ID is guaranteed to take O(log n) hops. A node first picks, among the nodes it knows, one that is logically close to the target ID, asks it whether it knows any node that is closer still, and repeats. Searching works the same way. To publish a resource, a node hashes the file, or hashes a keyword, to obtain a 128-bit ID, then finds the nodes logically closest to that ID and sends them the file or keyword information to store. When someone later searches for the same thing, the same hash algorithm yields the same ID, so the searcher looks for nodes logically close to that ID, knowing that if the resource exists in the network these are the nodes most likely to hold the information. Resource lookup in a structured network is therefore very efficient. Its disadvantage is that it cannot handle complex queries: only simple keywords or file hash values can be searched. In an unstructured network, searches are forwarded arbitrarily, and each node that receives a query knows its own local resources well, so complex queries are supported naturally. Clearly, though, an unstructured network is unlikely to mobilize every node for such a query. I do not know of any method that combines the advantages of both kinds of overlay network, and would like to learn of one.
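The publish and search steps meet at the same node because both sides hash the same key. Here is a small Python sketch of that idea. It makes two loud assumptions: MD5 is used only because it is a 128-bit hash in the standard library (the article does not say which hash eMule uses), and the hypothetical `nodes` dictionary gives every participant a global view of the network, which a real client does not have.

```python
import hashlib


def key_id(data: bytes) -> int:
    # Assumption: MD5 is a stand-in 128-bit hash, not necessarily eMule's.
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def closest_node(node_ids, target: int) -> int:
    """Pick the node whose ID has the smallest XOR distance to the target."""
    return min(node_ids, key=lambda n: n ^ target)


# Hypothetical network: node ID -> that node's local store.
nodes = {key_id(f"node-{i}".encode()): {} for i in range(20)}


def publish(keyword: str, file_name: str) -> None:
    target = key_id(keyword.encode())
    store = nodes[closest_node(nodes, target)]
    store.setdefault(target, []).append(file_name)


def search(keyword: str):
    target = key_id(keyword.encode())
    return nodes[closest_node(nodes, target)].get(target, [])


publish("ubuntu", "ubuntu-18.04.iso")
print(search("ubuntu"))
```

The searcher never needs to know who published the entry; it only needs the same hash function and a way to reach the closest node.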

Basic classes of Kademlia in eMule

The main control class of Kademlia is CKademlia, which is responsible for starting and shutting down the code of the whole Kademlia network. Its Process function handles routine work: for example, it checks whether some region has too few nodes and, if so, looks for new ones, and it regularly checks its neighbors. The routine processing of all search tasks is also scheduled from here. CKademlia also acts as the representative of the Kademlia network, returning statistics about it to other parts of the eMule code.

CPrefs is another basic class, similar to CPreferences in the general eMule code, except that CPrefs keeps only the local information that relates to the Kademlia network and must be stored long term. In this version, that means the local ID.

Another important piece of infrastructure is CUInt128, which implements the various operations on 128-bit IDs, as mentioned in the previous sections.

Contact list management for Kademlia in eMule

CRoutingZone, CRoutingBin, and CContact make up the contact list data structure. It has to meet the needs of searching: both the time to find a target and the space it occupies must be acceptable.

First, the CContact class holds the information for one contact: mainly the peer's IP address, ID, TCP port, UDP port, Kad version number, and health level (m_byType). The health level ranges from 0 to 4. A newly added contact, whose health is still unknown, starts at 3. The system checks each contact's health regularly; a contact that can be reached consistently moves down to 0, while one that cannot be reached has its value raised step by step. If a contact has reached 4 and still cannot be reached, it is deleted from the contact list.
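The health level behaves like a small state machine. The Python sketch below follows the description above; the step size on success (one level per successful check) is an assumption, since the article only says that reachable contacts end up at 0, and the names `Contact` and `record_check` are hypothetical.

```python
from dataclasses import dataclass

NEW_CONTACT_TYPE = 3  # health of a contact that has just been added
WORST_TYPE = 4        # a failed check at this level removes the contact


@dataclass
class Contact:
    node_id: int
    contact_type: int = NEW_CONTACT_TYPE  # plays the role of m_byType


def record_check(contacts: dict, node_id: int, reachable: bool) -> None:
    """Update one contact after a health check, deleting it if it is dead."""
    contact = contacts[node_id]
    if reachable:
        # Assumption: each successful check improves the level by one.
        contact.contact_type = max(0, contact.contact_type - 1)
    elif contact.contact_type >= WORST_TYPE:
        del contacts[node_id]
    else:
        contact.contact_type += 1


contacts = {1: Contact(1), 2: Contact(2)}
record_check(contacts, 1, reachable=False)  # 3 -> 4
record_check(contacts, 1, reachable=False)  # still unreachable at 4: removed
record_check(contacts, 2, reachable=True)   # 3 -> 2
```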

The CRoutingBin class holds a list of CContact objects (typedef std::list<CContact*> ContactList;). Note that contact information can only be reached through a CRoutingBin; a CRoutingZone does not contain contact information directly. New contacts can be added to a particular CRoutingBin, and contacts can be looked up in it. It also provides a method that returns a list of the contacts closest to a given ID, which is very important. Finally, the number of CContact objects a CRoutingBin can hold is limited (#define K 10 is defined in the Kademlia namespace).
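A bin of this kind can be sketched in a few lines of Python. This is an illustration of the behaviour described above, not a port of CRoutingBin: contacts are reduced to bare integer IDs, and the method names are invented.

```python
K = 10  # maximum contacts per bin, mirroring "#define K 10"


class RoutingBin:
    def __init__(self):
        self.contacts = []  # contact IDs; stands in for the CContact list

    def add(self, contact_id: int) -> bool:
        """Add a contact unless it is already present or the bin is full."""
        if contact_id in self.contacts or len(self.contacts) >= K:
            return False
        self.contacts.append(contact_id)
        return True

    def closest_to(self, target: int, max_count: int):
        """Return up to max_count contacts ordered by XOR distance to target."""
        return sorted(self.contacts, key=lambda c: c ^ target)[:max_count]


bin_ = RoutingBin()
results = [bin_.add(i) for i in range(1, 13)]  # the last two do not fit
print(results.count(True), bin_.closest_to(5, 3))
```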

The CRoutingZone class sits at the top of the contact data structure and provides the operation interface to the rest of the Kademlia network. It is structured as a binary tree: each CRoutingZone has two CRoutingZone pointers to its left and right subtrees, and it also contains a CRoutingBin pointer, which is meaningful only when the CRoutingZone is a leaf of the tree. (CRoutingZone *m_pSubZones[2]; CRoutingZone *m_pSuperZone;)
The defining property of this tree is that the IDs of all contacts under a node share a common prefix, and the deeper the node, the longer the prefix. For example, every ID in the left subtree of the root has the prefix "0" and every ID in the right subtree has the prefix "1"; likewise, every ID under the right subtree of the root's left subtree has the prefix "01", and so on. Consider what happens as contacts are added to this tree one after another. At the start there is only the root, which is also a leaf, so its CRoutingBin is meaningful. As contacts keep arriving, the CRoutingBin fills up and a split is needed: a left and a right child are created, the contacts in the CRoutingBin are moved to the left or right child according to their prefix, and the old CRoutingBin is discarded. Once the split is complete, the contact is added again, this time into the subtree that matches its ID. Not every node is split when it fills up, however, because if arbitrary splits were allowed the amount of contact information stored locally would grow dramatically. This is where the local ID comes in: a node is split only if the local ID shares the prefix that the node represents. If a node cannot be split and its CRoutingBin is full, the new contact is rejected.
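The split rule can be sketched as follows. This Python version uses short IDs (8 bits) so the result is easy to inspect, stores bare integer IDs instead of CContact objects, and uses an invented `Zone` class; it illustrates the rule described above rather than reproducing CRoutingZone.

```python
K = 10  # bin capacity, mirroring "#define K 10"


class Zone:
    def __init__(self, my_id: int, bits: int, prefix: str = ""):
        self.my_id = my_id      # the local node's own ID
        self.bits = bits        # ID length in bits (128 in eMule)
        self.prefix = prefix    # bit prefix shared by every ID in this zone
        self.bin = []           # contact IDs; only used while this is a leaf
        self.children = None    # [zone for bit 0, zone for bit 1] after a split

    def _bit(self, ident: int, level: int) -> int:
        return (ident >> (self.bits - 1 - level)) & 1

    def _can_split(self) -> bool:
        level = len(self.prefix)
        if level >= self.bits:
            return False
        # Split only if the local ID lies inside this zone.
        return f"{self.my_id:0{self.bits}b}"[:level] == self.prefix

    def _split(self) -> None:
        level = len(self.prefix)
        self.children = [Zone(self.my_id, self.bits, self.prefix + b) for b in "01"]
        for contact in self.bin:
            self.children[self._bit(contact, level)].bin.append(contact)
        self.bin = []

    def add(self, contact_id: int) -> bool:
        if self.children is not None:
            child = self.children[self._bit(contact_id, len(self.prefix))]
            return child.add(contact_id)
        if contact_id in self.bin:
            return False
        if len(self.bin) < K:
            self.bin.append(contact_id)
            return True
        if not self._can_split():
            return False  # full and not splittable: reject the contact
        self._split()
        return self.add(contact_id)


zone = Zone(my_id=0, bits=8)
accepted = [i for i in range(1, 256) if zone.add(i)]
print(len(accepted))
```

With the local ID set to 0, every contact with an ID below 16 is kept, while the entire far half of the ID space (128 to 255) is represented by only K contacts.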

Under this policy, the closer a contact's ID is to the local ID in logical distance (that is, the longer their common prefix), the more likely the contact is to be added, because the corresponding node is more likely to have been split into further child nodes and so has more capacity. As a result, every participant in the Kademlia network knows a higher proportion of the participants that are logically close to it. A search only has to keep asking for IDs that are closer to the target, and every step makes progress, so the time needed to find a target ID is O(log n). The binary tree structure also shows that, because only some nodes are split, the storage required is O(log n) as well.

In practice, CRoutingZone differs somewhat from the Kademlia theory. For example, there is a minimum split depth measured from the root: a node whose depth is small enough is always allowed to split, so that the client learns about a few more contacts in other regions of the ID space.

Kademlia network message processing in eMule

CKademliaUDPListener handles all messages that belong to the Kademlia network. The basic eMule communication protocol was outlined earlier in this series; the messages that reach CKademliaUDPListener are only those related to the Kademlia network, because the sorting has already been done by eMule's general UDP client handling code. The message format was also described earlier, so what follows covers the individual message types.

The first is the health check message, an ordinary ping-pong mechanism. The corresponding messages are KADEMLIA_HELLO_REQ and KADEMLIA_HELLO_RES. When the local contact list is checked, the client sends KADEMLIA_HELLO_REQ to the contacts and processes the KADEMLIA_HELLO_RES messages it receives.

The most common message is the node lookup. In the Kademlia network, node lookups make up most of the traffic in everyday use. The lookup is iterative: to search for an ID, the client finds the closest contacts in its local contact list and sends them a lookup request; the replies contain contacts that are closer still, and the client then sends the request to those. By repeating such queries, it eventually obtains the contacts closest to the target ID. The message codes are KADEMLIA_REQ and KADEMLIA_RES. (These two message codes are used to update the routing table.)
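The iterative lookup can be simulated without any networking. In the Python sketch below, the hypothetical `network` dictionary stands in for sending KADEMLIA_REQ and receiving KADEMLIA_RES: it maps each node ID to the contacts that node would return. The parallelism value `alpha = 3` and the 6-bit toy network are assumptions made for the example.

```python
def closest(contacts, target: int, count: int):
    """The `count` contacts with the smallest XOR distance to the target."""
    return sorted(contacts, key=lambda c: c ^ target)[:count]


def iterative_lookup(network, start_contacts, target: int, alpha: int = 3) -> int:
    known = set(start_contacts)   # every contact learned so far
    queried = set()               # contacts already asked
    while True:
        # Ask the closest known contacts that have not been asked yet.
        pending = [c for c in closest(known, target, alpha) if c not in queried]
        if not pending:
            break  # the closest known contacts have all answered
        for contact in pending:
            queried.add(contact)
            # Stand-in for a request/response round trip with `contact`.
            known.update(closest(network.get(contact, []), target, alpha))
    return closest(known, target, 1)[0]


# Toy network of 64 nodes (6-bit IDs): each node knows the six nodes whose
# IDs differ from its own in exactly one bit.
network = {n: [n ^ (1 << i) for i in range(6)] for n in range(64)}

found = iterative_lookup(network, start_contacts=[18], target=45)
print(found)
```

Each round either reaches the target or learns a strictly closer contact, so the loop ends after a small number of rounds even though no node knows the whole network.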

Next come publishing and searching for content, which is easier to follow alongside the analysis of the CIndexed class. eMule stores three kinds of information in the Kademlia network: file sources, keyword information, and file comments. A file source belongs to one specific file, and each file uses the hash of its content as its unique identifier. A file source entry records the fact that someone has a particular file; a keyword entry records the fact that a keyword corresponds to a file. Obviously, one keyword may correspond to several files, and one file may have more than one source. All of these are indexed with a fixed hash algorithm, which makes searching and publishing very simple.

Now the publishing process. Every eMule client knows its own shared files in detail. In the traditional setup with a central index server, it uploads all of its file information to that server. In the Kademlia network the information has to be spread out. The client first splits each file name into words, that is, extracts the keywords from the file name one by one. The splitting method is very simple: find the delimiter characters in the file name, such as underscores, and cut the name there. After hashing each keyword, the client publishes the keyword information to the contacts whose IDs are close to the keyword hash, and publishes the file source information to the contacts whose IDs are close to the hash of the file content. The corresponding messages are KADEMLIA_PUBLISH_REQ and KADEMLIA_PUBLISH_RES (these two message codes are used to publish shared files). In addition, eMule lets users comment on a file. Comment information is stored separately, but the principle is the same.
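Keyword extraction of this kind might look like the Python sketch below. The delimiter set and the lowercasing step are assumptions made for the example; the article only says that the file name is cut at delimiter characters such as underscores.

```python
import re

# Assumption: an illustrative delimiter set, not eMule's actual list.
DELIMITERS = r"[ _\-.,()\[\]]+"


def split_keywords(file_name: str):
    """Cut a file name at delimiter characters and return the unique words."""
    keywords = []
    for word in re.split(DELIMITERS, file_name.lower()):
        if word and word not in keywords:
            keywords.append(word)
    return keywords


print(split_keywords("Ubuntu_18.04-desktop_amd64.iso"))
```

Each resulting word would then be hashed to a 128-bit ID and published to the contacts closest to that ID.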

When a user searches and downloads through the Kademlia network, the first step is a keyword search. Because the same hash algorithm is used, the client only needs to find contacts whose IDs are close to the computed hash and can then send them a request for that keyword directly. From the reply, the searcher learns which files the keyword corresponds to and lists all of their information. When the user decides to download one of the files, a search for that specific file begins; if it succeeds, what comes back is the file's source information. eMule then connects to the addresses given and negotiates the download with those clients over the traditional eMule protocol. The corresponding messages here are KADEMLIA_SEARCH_REQ and KADEMLIA_SEARCH_RES (these two message codes are used to search for files).

The Kademlia2 protocol works on the same principle; only the protocol codes and the exact message formats differ. For example, KADEMLIA_HELLO_REQ and KADEMLIA_HELLO_RES correspond to KADEMLIA2_HELLO_REQ and KADEMLIA2_HELLO_RES, but the latter carry richer information. In the implementation, 0.47c prefers Kademlia2 while 0.47a prefers Kademlia, although both protocols can be handled. In addition, 0.47c adds a feature that tracks the requests it has sent: a list of TrackPackets_Struct entries records when a request with a given opcode was sent to a given IP address. Why? To defend against routing table poisoning attacks on the DHT. When a client looks up a contact and receives contact information in reply, it tries to add that information to its local contact list. An attacker therefore only has to keep sending KADEMLIA_RES messages stuffed with fake contacts to the eMule client it wants to attack, and the victim's contact list fills with garbage. Without correct, valid contacts, the victim's Kademlia functionality is essentially useless. The feature added in 0.47c simply ignores any response for which no request was sent, so the client is not fooled.
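The tracking idea can be sketched in Python as follows. The class name, the method names, and the 180-second window are assumptions for illustration; the article states only that the IP address, the opcode, and the send time are recorded.

```python
class RequestTracker:
    """Remembers which requests were sent so unsolicited replies can be dropped."""

    def __init__(self, max_age: float = 180.0):  # assumption: window in seconds
        self.max_age = max_age
        self.sent = []  # (ip, opcode, time_sent)

    def note_sent(self, ip: str, opcode: str, now: float) -> None:
        self.sent.append((ip, opcode, now))

    def was_requested(self, ip: str, opcode: str, now: float) -> bool:
        # Forget entries that are too old, then look for a matching request.
        self.sent = [e for e in self.sent if now - e[2] <= self.max_age]
        return any(e[0] == ip and e[1] == opcode for e in self.sent)


tracker = RequestTracker()
tracker.note_sent("10.0.0.1", "KADEMLIA_REQ", now=0.0)

# A reply from the peer we asked is accepted; one from a stranger is ignored.
print(tracker.was_requested("10.0.0.1", "KADEMLIA_REQ", now=10.0))
print(tracker.was_requested("10.0.0.2", "KADEMLIA_REQ", now=10.0))
```

Passing the current time in explicitly keeps the sketch deterministic; a real client would read a clock instead.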

Kademlia distributed index management in eMule

The biggest benefit of the Kademlia network is that the information formerly held on a central index server is distributed across the clients. More precisely, it is distributed across the CIndexed instances of the individual eMule clients. Before looking at how CIndexed is designed and how it does its work, it helps to take a closer look at the kinds of information eMule publishes to the Kademlia network.

A file source entry maps the hash of a file's content to the IP address, the various port numbers, and other details of a client that has the file. A keyword entry maps a keyword to the files it corresponds to. In a keyword entry, the file information is more detailed: it usually includes the file name, file size, and content hash, and for MP3 or other media files also tag information such as the artist, production time, length (the playing time of the media file), and genre. The content hash is used to tell apart the different files that correspond to the same keyword.

CIndexed stores this information in a series of maps. CMap is the MFC template class that plays the role of the STL map. CIndexed contains four such maps, which store file source information, keyword information, file comment information, and load information. File comment information is not kept long term; the other information is written to files on exit and loaded again the next time eMule starts. The load information is not something other contacts publish. Instead, it is adjusted dynamically according to how file source and keyword information is published: each time a publish request is received, the load for the corresponding ID increases, and the load is reported in the response message (KADEMLIA_PUBLISH_RES).

The information in CIndexed is checked regularly: every thirty minutes it clears out the entries that have expired. File source information is kept for five hours, keyword information for twenty-four hours, and file comment information also for twenty-four hours. File sources and keywords therefore have to be republished periodically. This is also good for the stability of the whole Kademlia network, because each such exchange is a chance for a client to add the other party to its contact list, or to refresh the time at which an existing contact was last seen.
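The expiry rules can be written down as a small Python sketch. The lifetimes and the thirty-minute interval come from the text above; the `Index` class, its flat list of entries, and the explicit `now` parameter are simplifications invented for the example.

```python
HOUR = 3600
LIFETIME = {              # how long each kind of entry is kept, in seconds
    "source": 5 * HOUR,
    "keyword": 24 * HOUR,
    "note": 24 * HOUR,    # file comments
}
CLEAN_INTERVAL = 30 * 60  # expired entries are cleared every thirty minutes


class Index:
    def __init__(self):
        self.entries = []     # (kind, key_id, value, published_at)
        self.last_clean = 0.0

    def add(self, kind: str, key_id: int, value: str, now: float) -> None:
        self.entries.append((kind, key_id, value, now))

    def clean(self, now: float) -> None:
        """Drop expired entries, at most once per cleaning interval."""
        if now - self.last_clean < CLEAN_INTERVAL:
            return
        self.last_clean = now
        self.entries = [e for e in self.entries if now - e[3] < LIFETIME[e[0]]]


index = Index()
index.add("source", 1, "peer 10.0.0.1", now=0)
index.add("keyword", 2, "ubuntu-18.04.iso", now=0)
index.clean(now=6 * HOUR)  # the source entry is older than five hours
print([e[0] for e in index.entries])
```

A publisher that wants its entries to stay visible has to republish each one before its lifetime runs out.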

CIndexed provides the interfaces for adding and searching information that other code needs. When a search or publish request arrives from the network, CKademliaUDPListener interprets the message and then hands it to CIndexed for processing.

Kademlia search task management in eMule

CSearch and CSearchManager carry out the actual search tasks. A CSearch corresponds to one specific search task and covers the whole process from its start to its end. Note that a search task is not only a search for file sources or keywords: publishing also requires creating a CSearch object and running it. CSearchManager is in charge of all search tasks. It contains a CMap that holds pointers to all CSearch objects. A CMap is used because every CSearch corresponds to a specific ID, which is its target: whether the task is to find a node, search for information, or publish information, it must first find contacts whose IDs are close to the target ID. CSearchManager can therefore use a CMap keyed by target ID to represent all search tasks.

A CSearch adds itself to CSearchManager when it is created. It must also be given a type at creation: for example, whether it only looks up nodes, searches for keywords or file sources, or publishes file source or keyword information. A look at several CSearch methods gives a rough picture of how a CSearch proceeds. Go is the startup step: it picks the first candidate contacts from the local contact list and starts the search. SendFindValue sends a contact a request for the contact information near a specific ID. JumpStart runs once the search has made some progress and intermediate results are available; it decides the next step, which may be another SendFindValue or, if the contacts found so far are judged close enough to the target, the start of the substantive request. StorePacket performs such a substantive request: for example, in a CSearch whose task is to publish a file source, StorePacket sends KADEMLIA2_PUBLISH_SOURCE_REQ to the target contact (or KADEMLIA_PUBLISH_REQ if Kademlia2 is not supported). Finally, CSearch processes the various search results and returns them to the code that called it.

CSearchManager is the point of contact with the other parts of the Kademlia code. For example, when CKademliaUDPListener receives some results, it passes them to CSearchManager, which finds the matching search task and hands the results over. CSearchManager also provides the interface for creating all kinds of new search tasks, much like a factory in design pattern terms: other code only has to state what kind of search to start, and CSearchManager creates the CSearch task.
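The relationship between the two classes can be sketched in Python. The names below are hypothetical, and two details are assumptions: the manager registers the search itself (the article says CSearch adds itself to the manager), and only one search per target ID is allowed at a time.

```python
class Search:
    """One search task, identified by its 128-bit target ID."""

    def __init__(self, kind: str, target: int):
        self.kind = kind      # e.g. "node", "keyword", "source", "store_source"
        self.target = target
        self.results = []

    def process_result(self, result) -> None:
        self.results.append(result)


class SearchManager:
    def __init__(self):
        self.searches = {}  # target ID -> Search, like the CMap in the text

    def start(self, kind: str, target: int):
        """Factory method: create and register a search for a target ID."""
        if target in self.searches:
            return None  # assumption: one active search per target
        search = Search(kind, target)
        self.searches[target] = search
        return search

    def process_result(self, target: int, result) -> bool:
        """Route a result from the network to the search it belongs to."""
        search = self.searches.get(target)
        if search is None:
            return False
        search.process_result(result)
        return True

    def stop(self, target: int) -> None:
        self.searches.pop(target, None)


manager = SearchManager()
task = manager.start("keyword", target=0xABCDEF)
manager.process_result(0xABCDEF, "ubuntu-18.04.iso")
print(task.results)
```

Keying the map by target ID is what lets the UDP listener deliver a reply without knowing anything about the search that asked for it.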

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
