eMule Source Code Analysis (V)

Last Update:2018-07-26 Source: Internet

Author: User

Tags data structures network function

General description of the Kademlia code in eMule

Once you use the Kademlia network in eMule, the failure of a central server is no longer a concern, because this network has no central server. Put another way, every user is both a server and a client, which makes the network fully peer-to-peer. In this analysis of the Kademlia network in eMule, one section covers the underlying principle; the other sections are organized around the classes that implement Kademlia in eMule, which are:

CKademlia is the main control class for the entire Kademlia network. It starts and stops the Kademlia network directly, and its Process method handles routine housekeeping.

CPrefs manages the local node's own Kademlia-related information, such as its own ID.

CRoutingZone, CRoutingBin and CContact together hold the contact information each node knows about, and form the data structure built from those contacts.

CKademliaUDPListener is responsible for processing network messages.

CIndexed is responsible for the index information stored locally.

CSearch and CSearchManager handle search-related operations: the former represents a single search task, and the latter manages all search tasks.

CUInt128 implements a 128-bit integer with its various operations built in. It was mentioned earlier in this series.

The basic principle of Kademlia in eMule

Kademlia is a structured overlay network. An overlay network is a virtual network built on top of the physical Internet: every participating node knows the IP addresses of some other nodes, which are called its neighbors. When a node needs to find something, it first searches locally; if nothing is found, it forwards the query to its neighbors in the hope that they can return a result. Overlay networks are either structured or unstructured, and the difference is whether a specific rule governs which other nodes each node knows about. In an unstructured overlay network, the neighbor relationships follow no particular rule. A query in such a network is therefore flooded: each node that cannot answer locally forwards the request to its neighbors, which forward it to their neighbors in turn, step by step. If flooding is not handled carefully, the message load on the whole network becomes too high. Many papers have been written on optimizing queries in unstructured overlay networks.

In a structured overlay network, each node chooses its neighbors according to a definite rule, so when a node forwards a search request it can use that rule to decide which neighbor to forward it to. This reduces the cost of a search. A structured overlay network usually requires each node to generate a random ID, which determines the relationships between nodes. This ID must be independent of the physical network the node is in.

In Kademlia, this ID is a 128-bit value, and every node uses it to measure its logical distance to other nodes. The logical distance between two nodes is the XOR of their IDs. When the Kademlia network forms, each node chooses its neighbors on the principle that nodes logically closer to it are more likely to be added to its neighbor list. Concretely, each time the node learns about a new node, whether to add it to the neighbor list is decided by its distance. The code that implements this is analyzed later.
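The XOR distance can be illustrated with a minimal sketch. The `Id128` type and the function names here are hypothetical stand-ins and are not the CUInt128 interface; the sketch only assumes that an ID is stored as four 32-bit words, most significant first.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 128-bit ID: four 32-bit words, most significant first.
using Id128 = std::array<std::uint32_t, 4>;

// Logical distance between two IDs: the bitwise XOR of the two values.
Id128 xorDistance(const Id128& a, const Id128& b) {
    Id128 d{};
    for (std::size_t i = 0; i < d.size(); ++i) d[i] = a[i] ^ b[i];
    return d;
}

// True if `a` is logically closer to `target` than `b` is. std::array compares
// lexicographically, which equals numeric order for most-significant-first words.
bool closerTo(const Id128& target, const Id128& a, const Id128& b) {
    return xorDistance(target, a) < xorDistance(target, b);
}

// Number of leading bits two IDs share (0..128). A longer shared prefix
// means a smaller XOR distance.
int commonPrefixBits(const Id128& a, const Id128& b) {
    const Id128 d = xorDistance(a, b);
    int bits = 0;
    for (std::uint32_t word : d) {
        if (word == 0) { bits += 32; continue; }
        for (std::uint32_t mask = 0x80000000u; (word & mask) == 0; mask >>= 1) ++bits;
        break;
    }
    assert(bits <= 128);
    return bits;
}
```

Comparing distances rather than raw IDs is what lets a node rank any set of contacts by closeness to an arbitrary target.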

The advantage of a structured network is that, when looking for a node that is logically close enough to a given ID, the node is guaranteed to be found within O(log n) hops. The searcher simply picks, among the nodes it knows, one that is logically close to the target ID, asks it for nodes it knows that are even closer, and repeats. To publish a resource, the client hashes the file, or a keyword, to obtain a 128-bit ID. It then finds the nodes logically closest to that ID and sends them the file or keyword information to store. When someone searches for the same thing, the same hash algorithm yields the same ID, so the searcher looks for nodes logically close to that ID, knowing that these are the nodes most likely to hold the information if the resource exists in the network at all. Resource lookup in a structured network is therefore very efficient. Its disadvantage compared with an unstructured overlay network is that it cannot support complex queries: lookups are possible only by simple keyword or by file hash. In an unstructured network the query is forwarded more or less randomly, and every node that receives it knows its own local resources well, so complex queries are supported naturally. On the other hand, a complex query in an unstructured network is unlikely to reach every node. So far there seems to be no way to combine the advantages of the two kinds of overlay network; I would like to know of one.

Kademlia infrastructure classes in eMule

Kademlia's main control class is CKademlia, which starts and shuts down all the code related to the Kademlia network. Its Process function handles the network's routine work. For example, it periodically checks whether some zone of the ID space holds too few nodes and, if so, looks for new ones. Regular health checks on neighbors are part of the same routine work, and the ongoing processing of all search tasks is scheduled from here as well. CKademlia also acts as the representative of the Kademlia network, returning statistics about it to other parts of the eMule code.

Another infrastructure class is CPrefs. It plays a role similar to CPreferences in the regular eMule code, but CPrefs keeps only local information that relates to the Kademlia network and needs to be stored long term. In this version that is mainly the local ID.

Another important infrastructure class is CUInt128, which implements the various operations on 128-bit IDs, as mentioned in the previous section.

Contact list management for Kademlia in eMule

The three classes CRoutingZone, CRoutingBin and CContact make up the contact list data structure. Its purpose is to meet the needs of searching: the time taken to find a target must be acceptable, and so must the space used.

First, the CContact class holds the information about one contact: mainly the contact's IP address, ID, TCP port, UDP port, Kad version number and health level (m_byType). The health level has five values, 0 to 4. A contact that has just been added, whose health is still unknown, is set to 3. The system regularly communicates with each contact to check its health; for a contact that can be reached reliably, the value gradually decreases to 0. If the contact cannot be reached, the value gradually increases, and if it reaches 4 and the contact still cannot be reached after a further period, the contact is removed from the contact list.
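The health level described above behaves like a small bounded counter. The following is a deliberately simplified sketch: the `ContactHealth` struct and its method names are hypothetical, and the one-step-per-check rule is an assumption, since the text does not say exactly how fast the value changes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simplification of a contact's health bookkeeping.
struct ContactHealth {
    std::uint8_t type = 3;   // 0 = most reliable ... 4 = about to be dropped; new contacts start at 3

    // A health check was answered: move one step toward 0 (assumed step size).
    void onAnswered() {
        assert(type <= 4);
        if (type > 0) --type;
    }

    // A health check went unanswered: move one step toward 4 (assumed step size).
    void onTimedOut() {
        assert(type <= 4);
        if (type < 4) ++type;
    }

    // A contact at level 4 that fails again is a candidate for removal.
    bool shouldRemoveAfterTimeout() const { return type == 4; }
};
```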

The CRoutingBin class contains a list of CContact pointers (typedef std::list<CContact*> ContactList;). Note that contact information is always accessed through a CRoutingBin; a CRoutingZone does not hold contacts directly. You can add a new contact to a particular CRoutingBin and, of course, look up contacts in it. It also provides a way to obtain a list of the contacts closest to a given ID, which is quite important. Finally, the number of CContact entries a CRoutingBin can hold is limited (the Kademlia namespace defines #define K 10).
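A bounded bin of this kind might look roughly as follows. This is a sketch under stated assumptions, not the CRoutingBin code: the `RoutingBin` and `Contact` types are hypothetical, IDs are shortened to 32 bits, and contacts are stored by value rather than by pointer to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <vector>

constexpr std::size_t kBinCapacity = 10;   // mirrors "#define K 10" from the text

struct Contact {
    std::uint32_t id;    // shortened ID, for illustration only
    std::uint32_t ip;
};

class RoutingBin {
public:
    // Adds a contact unless the bin is full or the ID is already present.
    bool add(const Contact& c) {
        assert(contacts_.size() <= kBinCapacity);
        if (contacts_.size() >= kBinCapacity || find(c.id) != nullptr) return false;
        contacts_.push_back(c);
        return true;
    }

    // Returns the stored contact with this ID, or nullptr if there is none.
    const Contact* find(std::uint32_t id) const {
        for (const Contact& c : contacts_)
            if (c.id == id) return &c;
        return nullptr;
    }

    // Returns up to `maxCount` contacts, ordered by XOR distance to `target`.
    std::vector<Contact> closestTo(std::uint32_t target, std::size_t maxCount) const {
        std::vector<Contact> out(contacts_.begin(), contacts_.end());
        std::sort(out.begin(), out.end(), [target](const Contact& a, const Contact& b) {
            return (a.id ^ target) < (b.id ^ target);
        });
        if (out.size() > maxCount) out.resize(maxCount);
        return out;
    }

    std::size_t size() const { return contacts_.size(); }

private:
    std::list<Contact> contacts_;
};
```

The `closestTo` query is the operation that answers a node search request: the bin returns whatever it knows that is nearest to the requested ID.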

The CRoutingZone class sits at the top of the contact data structure and provides the interface that the rest of the Kademlia code uses directly. It is structured as a binary tree: each CRoutingZone contains two CRoutingZone pointers to its left and right subtrees, plus a pointer to a CRoutingBin. The CRoutingBin pointer is meaningful only when the CRoutingZone is a leaf of the binary tree. (CRoutingZone *m_pSubZones[2]; CRoutingZone *m_pSuperZone) The tree has the property that the IDs of all contacts below a given node share a common prefix, and the deeper the node, the longer that prefix. For example, the IDs of all contacts in the left subtree of the root have the prefix "0", and those in the right subtree have the prefix "1". Likewise, the IDs of all contacts under the right subtree of the root's left subtree have the prefix "01", and so on.

Consider how contacts are added to this tree. At first there is only the root, which is a leaf, so its CRoutingBin is in use. As contacts keep being added, the CRoutingBin fills up, and a split takes place: a left and a right child node are created, the contacts in the node's own CRoutingBin are distributed to the two children according to their prefixes, and the node's own CRoutingBin is discarded. Once the split is complete, the new contact is added again, this time to the subtree that matches its ID.

Not every node is allowed to split, however, because unrestricted splitting would make the amount of contact information stored locally grow dramatically. This is where the local node's own ID comes in. A node in the tree splits only if the local ID shares the prefix of that node. If a node cannot split and its CRoutingBin is full, the new contact is rejected.
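The split rule can be sketched in a few dozen lines. The `RoutingZone` class here is a hypothetical illustration, not the CRoutingZone code: it assumes 8-bit IDs, a bin capacity of 2 so that splits are easy to trigger, and it stores bare IDs instead of full contacts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

constexpr std::size_t kBinSize = 2;  // tiny capacity so splits are easy to trigger
constexpr int kIdBits = 8;           // shortened IDs, for illustration only

class RoutingZone {
public:
    explicit RoutingZone(std::uint8_t selfId, int depth = 0, std::uint8_t prefix = 0)
        : selfId_(selfId), depth_(depth), prefix_(prefix) {}

    // Tries to store a contact ID; returns false for a duplicate or a rejected contact.
    bool add(std::uint8_t id) {
        if (!isLeaf()) return sub_[bitAt(id)]->add(id);
        for (std::uint8_t known : bin_)
            if (known == id) return false;
        if (bin_.size() < kBinSize) { bin_.push_back(id); return true; }
        if (!canSplit()) return false;          // full and not allowed to split: reject
        split();
        return sub_[bitAt(id)]->add(id);        // retry in the matching subtree
    }

    bool isLeaf() const { return sub_[0] == nullptr; }
    const RoutingZone* child(int bit) const { return sub_[bit].get(); }
    std::size_t count() const {
        return isLeaf() ? bin_.size() : sub_[0]->count() + sub_[1]->count();
    }

private:
    // The bit of `id` that decides between the two children of this zone.
    int bitAt(std::uint8_t id) const { return (id >> (kIdBits - 1 - depth_)) & 1; }

    // A zone may split only while the local ID shares the zone's prefix.
    bool canSplit() const {
        return depth_ < kIdBits && (selfId_ >> (kIdBits - depth_)) == prefix_;
    }

    void split() {
        assert(isLeaf());
        for (int bit = 0; bit < 2; ++bit)
            sub_[bit] = std::make_unique<RoutingZone>(
                selfId_, depth_ + 1, static_cast<std::uint8_t>((prefix_ << 1) | bit));
        for (std::uint8_t id : bin_) sub_[bitAt(id)]->add(id);
        bin_.clear();
    }

    std::uint8_t selfId_;
    int depth_;
    std::uint8_t prefix_;                       // the top depth_ bits shared by this zone
    std::unique_ptr<RoutingZone> sub_[2];       // both null while this zone is a leaf
    std::vector<std::uint8_t> bin_;             // only used while this zone is a leaf
};
```

With a local ID of 0x00, contacts whose IDs start with bit 1 all compete for one bin of fixed size, while contacts starting with bit 0 keep gaining room as the zones containing the local ID split.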

Under this policy, the closer a contact is to the local ID in logical distance (that is, the longer the common prefix), the more likely it is to be accepted, because the tree node it falls into is more likely to have gained child nodes, and therefore capacity, through splitting. As a result, each participant in the Kademlia network knows a larger proportion of the participants that are logically close to it. A search only has to keep finding IDs closer to the target and makes progress at every step, so the time needed to reach the target ID is O(log n). And because only a subset of the tree nodes ever split, the storage cost is essentially O(log n) as well.

The implementation differs somewhat from theoretical Kademlia. For example, there is a minimum split depth counted from the root: a CRoutingZone at too shallow a level is always allowed to split, so that the node knows somewhat more contacts in other regions of the ID space.

Kademlia network message processing in eMule

CKademliaUDPListener handles all messages related to the Kademlia network. The general layout of the eMule communication protocol was described earlier, so we know that every message CKademliaUDPListener processes belongs to the Kademlia network; the dispatching has already been done by eMule's regular UDP client-handling code. The concrete message format was also introduced earlier, so what follows describes the individual categories of messages.

The first category is health-check messages, which follow the usual ping-pong pattern. The corresponding messages are KADEMLIA_HELLO_REQ and KADEMLIA_HELLO_RES. When the local contact list is checked, KADEMLIA_HELLO_REQ messages are sent to the contacts, and the KADEMLIA_HELLO_RES messages that come back are processed.

The most frequently used message is the node search message. In the Kademlia network, node search is the workhorse of everyday operation, and it is implemented as an iterative search. When a search for an ID starts, the client picks the nearest contacts from its local contact list and sends them a search request. They usually return contacts that are closer still, to which the client sends the next request; by repeating this, the client arrives at the contacts closest to the target ID. The corresponding message codes are KADEMLIA_REQ and KADEMLIA_RES (these two message codes are also used to update the routing table).
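The iterative search can be simulated without any networking. In this sketch, which is an illustration rather than eMule's implementation, the `Network` map stands in for the real nodes, `answerRequest` plays the role of a remote node replying to a search request, IDs are shortened to 32 bits, and each reply carries at most two contacts; all of these names and numbers are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

using NodeId  = std::uint32_t;                          // shortened ID, for illustration only
using Network = std::map<NodeId, std::vector<NodeId>>;  // node -> contacts that node knows

// Simulates a node answering a search request: it returns up to `count`
// of the contacts it knows, closest to `target` first.
std::vector<NodeId> answerRequest(const Network& net, NodeId node, NodeId target,
                                  std::size_t count) {
    assert(count > 0);
    auto it = net.find(node);
    if (it == net.end()) return {};                     // unreachable node: no answer
    std::vector<NodeId> known = it->second;
    std::sort(known.begin(), known.end(),
              [target](NodeId a, NodeId b) { return (a ^ target) < (b ^ target); });
    if (known.size() > count) known.resize(count);
    return known;
}

// Iterative search: repeatedly ask the closest known contact for closer ones,
// and stop once the closest known contact has already been asked.
NodeId iterativeLookup(const Network& net, NodeId self, NodeId target) {
    const std::size_t answerSize = 2;
    auto closer = [target](NodeId a, NodeId b) { return (a ^ target) < (b ^ target); };
    std::set<NodeId, decltype(closer)> candidates(closer);
    std::set<NodeId> asked;

    for (NodeId c : answerRequest(net, self, target, answerSize)) candidates.insert(c);
    while (!candidates.empty()) {
        const NodeId best = *candidates.begin();
        if (!asked.insert(best).second) return best;    // already asked: nothing closer
        for (NodeId c : answerRequest(net, best, target, answerSize))
            if (c != self) candidates.insert(c);
    }
    return self;                                        // no contacts known at all
}
```

A real client queries several contacts in parallel and tolerates timeouts; the sketch asks one contact at a time to keep the control flow visible.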

Next comes publishing and searching for content. This becomes clearer when read together with the analysis of the CIndexed class below. The Kademlia network in eMule stores three main types of information: file sources, keyword information and file comments. A file source belongs to one specific file, and each file is uniquely identified by the hash of its contents; a file source record states the fact that someone owns a particular file. A keyword record states the fact that a keyword corresponds to a file. Obviously one keyword may correspond to many files, and one file may have more than one source. Both are indexed by a fixed hash algorithm, which makes searching and publishing straightforward.

Let's look at the publishing process. Each eMule client already knows the details of its own shared files. In the traditional scenario with a central index server, it uploads the information about all of these files to that server. In the Kademlia network the information has to be spread out instead. The first step is to split file names into words, that is, to break each file name into individual keywords. The splitting method is very simple: find the characters in the file name that act as separators, such as underscores, and cut the name at those points. After computing the hash of each keyword, the client publishes the keyword information to the contacts closest to that hash. The file source information is likewise published to the contacts whose IDs are close to the hash of the file contents. The corresponding messages are KADEMLIA_PUBLISH_REQ and KADEMLIA_PUBLISH_RES (these two message codes are used to publish shared files). In addition, eMule lets users comment on a file; comments are stored separately, but the principle is the same.
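The word-splitting step might be sketched like this. The `splitKeywords` function is hypothetical, and the separator set, the lower-casing and the minimum keyword length of three characters are all assumptions made for the example; the text only says that the name is cut at separator characters such as underscores.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Splits a file name into lower-case keywords at separator characters.
// The separator set and the minimum length are assumptions for illustration.
std::vector<std::string> splitKeywords(const std::string& fileName) {
    const std::string separators = " ()[]{}<>,._-!?:;\\/\"";
    std::vector<std::string> words;
    std::string current;
    auto flush = [&]() {
        if (current.size() >= 3) words.push_back(current);   // drop very short fragments
        current.clear();
    };
    for (unsigned char ch : fileName) {
        if (separators.find(static_cast<char>(ch)) != std::string::npos) flush();
        else current.push_back(static_cast<char>(std::tolower(ch)));
    }
    flush();
    return words;
}
```

Each keyword returned would then be hashed to a 128-bit ID and published to the contacts closest to that ID.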

When a user searches for and downloads files through the Kademlia network, the first step is a keyword search. Because the same hash algorithm is used, the client only has to find the contacts whose IDs are close to the computed hash and can then send them a request for that specific keyword directly. If a reply comes back, the searcher knows which files the keyword corresponds to and lists their information. When the user decides to download one of the files, a search for that particular file begins, and if it succeeds, the file's source information is returned. eMule then only needs to connect to the addresses given and negotiate the download with them using the traditional eMule protocol. The corresponding messages are KADEMLIA_SEARCH_REQ and KADEMLIA_SEARCH_RES (these two message codes are used to search for files).

The actual implementation also contains a second protocol, KADEMLIA2. The principle is the same; only the protocol codes and the concrete message formats differ. For example, KADEMLIA_HELLO_REQ and KADEMLIA_HELLO_RES correspond to KADEMLIA2_HELLO_REQ and KADEMLIA2_HELLO_RES, but the latter carry richer information than the former. Version 0.47c prefers KADEMLIA2, while 0.47a prefers the original Kademlia protocol; both versions can handle either. In addition, 0.47c adds a member for tracking outgoing requests: a list of TrackPackets_Struct entries that records which opcode was sent to which IP address and when. Why do this? It guards against routing pollution attacks on the DHT. When a client searches for contacts, it first tries to add any contacts it learns about to its local contact list. A malicious party could therefore send KADEMLIA_RES messages stuffed with false contact information to the eMule client it wants to attack, filling the victim's contact list with garbage. Lacking correct and usable contacts, the victim's Kademlia functionality would be practically crippled. With this feature, 0.47c simply ignores responses for which it never sent a request, and so avoids being fooled.
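The request-tracking idea can be sketched as follows. The `RequestTracker` class and the field layout of `TrackedRequest` are assumptions for illustration; the text names TrackPackets_Struct but does not give its members, and the expiry window is a made-up parameter.

```cpp
#include <cassert>
#include <cstdint>
#include <list>

// Hypothetical record of one outgoing request.
struct TrackedRequest {
    std::uint32_t ip;
    std::uint8_t  opcode;
    std::uint32_t sentAt;   // time the request was sent, in seconds
};

class RequestTracker {
public:
    // Remembers that `opcode` was sent to `ip` at time `now`.
    void noteSent(std::uint32_t ip, std::uint8_t opcode, std::uint32_t now) {
        sent_.push_front(TrackedRequest{ip, opcode, now});
    }

    // True if a matching request was sent to `ip` within the last `maxAge` seconds.
    // Entries older than that are forgotten first. `now` must not be earlier
    // than any recorded send time.
    bool wasRequested(std::uint32_t ip, std::uint8_t opcode, std::uint32_t now,
                      std::uint32_t maxAge) {
        sent_.remove_if([now, maxAge](const TrackedRequest& r) {
            return now - r.sentAt > maxAge;
        });
        for (const TrackedRequest& r : sent_)
            if (r.ip == ip && r.opcode == opcode) return true;
        return false;
    }

private:
    std::list<TrackedRequest> sent_;
};
```

On receiving a response, a listener would call `wasRequested` with the sender's IP and the opcode of the matching request, and drop the response if the answer is false.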

Distributed index management of Kademlia in eMule

The great benefit of the Kademlia network is that the information that would otherwise be stored on a central index server is distributed across the clients; more precisely, it is distributed across the CIndexed instances of the individual eMule clients. We can begin with the design of CIndexed and see how it does its job. Before that, a little more needs to be said about the types of information eMule publishes to the Kademlia network.

A file source record maps the hash of a file's contents to the IP address, the various port numbers and other details of a client that owns the file. A keyword record is the relationship between a keyword and a file that matches it. In a keyword record the file is described in more detail: usually the file name, the file size and the hash of the file, and for MP3 or other media files also tags such as author, production date, length (the playing time of the media file) and genre. The hash of the file contents is what distinguishes the different files listed under one keyword.

CIndexed stores these mappings in a series of maps. CMap is MFC's counterpart to the standard STL map template class, and CIndexed contains four of them, storing file source information, keyword information, file comment information and load information respectively. File comment information is not kept long term; the other information is written to files on exit and read back the next time eMule starts. The load information is not published by other contacts but is adjusted dynamically from the file source and keyword information: each time a publication is received, the load of the corresponding ID increases, and this is reported in the response message (KADEMLIA_PUBLISH_RES).

The information in CIndexed is checked regularly: every 30 minutes, stale entries are cleared out of everything it stores. File source information is kept for 5 hours, keyword information for 24 hours, and file comment information for 24 hours. Files and keywords therefore have to be republished periodically. This also benefits the stability of the Kademlia network as a whole, because with every such exchange each side tries to add the other to its contact list, or updates the time at which it last saw the other.
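The expiry scheme can be illustrated with a small in-memory index. The `MiniIndex` class is a hypothetical sketch, not the CIndexed code: it uses `std::multimap` instead of CMap, 32-bit IDs instead of 128-bit ones, and covers only sources and keywords. The lifetimes and the cleanup interval are the ones stated in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

constexpr std::uint32_t kCleanupInterval = 30 * 60;        // clean up every 30 minutes
constexpr std::uint32_t kSourceLifetime  = 5 * 60 * 60;    // file sources: 5 hours
constexpr std::uint32_t kKeywordLifetime = 24 * 60 * 60;   // keywords: 24 hours

struct Entry {
    std::string value;
    std::uint32_t expiresAt;   // seconds
};

using Index = std::multimap<std::uint32_t, Entry>;   // shortened ID -> published entries

class MiniIndex {
public:
    void addSource(std::uint32_t fileId, const std::string& address, std::uint32_t now) {
        sources_.emplace(fileId, Entry{address, now + kSourceLifetime});
    }
    void addKeyword(std::uint32_t keywordId, const std::string& fileName, std::uint32_t now) {
        keywords_.emplace(keywordId, Entry{fileName, now + kKeywordLifetime});
    }

    // Removes every entry whose lifetime has passed; returns how many were dropped.
    // Intended to be called once per kCleanupInterval.
    std::size_t cleanUp(std::uint32_t now) {
        return purge(sources_, now) + purge(keywords_, now);
    }

    std::size_t sourceCount(std::uint32_t fileId) const { return sources_.count(fileId); }
    std::size_t keywordCount(std::uint32_t keywordId) const { return keywords_.count(keywordId); }

private:
    static std::size_t purge(Index& index, std::uint32_t now) {
        std::size_t dropped = 0;
        for (auto it = index.begin(); it != index.end();) {
            if (it->second.expiresAt <= now) { it = index.erase(it); ++dropped; }
            else ++it;
        }
        return dropped;
    }

    Index sources_;
    Index keywords_;
};
```

Because entries simply expire, a publisher that goes offline disappears from the index without any explicit unpublish message.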

CIndexed provides the interfaces for adding and searching information that other parts of the code need. Once a search or publish request has been received from the network and decoded by CKademliaUDPListener, it can be handed over to CIndexed for processing.

Kademlia search task management in eMule

CSearch and CSearchManager are what actually carry out search tasks. A CSearch corresponds to one specific search task and covers its whole course from start to finish. Note that a search task is not limited to searching for file sources or keywords: a publishing task also has to create a CSearch object and start it. CSearchManager manages the search tasks. It holds a CMap containing pointers to all CSearch objects, because every CSearch must correspond to an ID, which is the target of that CSearch. Whether the task is looking up a node, searching for information or publishing it, it always has to find the contacts whose IDs are close to the target ID. CSearchManager can therefore use a CMap keyed by target ID to represent all search tasks.

Note that a CSearch adds itself to the CSearchManager when it is created. A CSearch also has to state its type on creation: a plain node search, a search for keyword information or file source information, or the publication of file source or keyword information. Describing the role of a few CSearch methods gives a rough picture of how it works. Go is the startup routine: it picks candidate contacts from the local contact list for the first time and starts the search. SendFindValue sends a contact a request to search for an ID. JumpStart is called once the search has reached a certain point, for example after some intermediate results have arrived, to begin the next step. That step may be another SendFindValue, or the contacts found may be judged close enough to the target for the substantive request to be sent. StorePacket is such a substantive request: in a CSearch whose task is publishing a file source, StorePacket sends KADEMLIA2_PUBLISH_SOURCE_REQ to the target contact (or KADEMLIA_PUBLISH_REQ if KADEMLIA2 is not supported). Finally, CSearch processes the various kinds of search results and returns the processed results to the code that requested the search.

CSearchManager deals directly with other parts of the Kademlia code. For example, when CKademliaUDPListener receives search results, it passes them to CSearchManager, which finds the search task the results belong to and forwards them to it. CSearchManager also provides an interface for creating new search tasks, similar to a factory in design-pattern terms: other parts of the code only need to say what kind of search task to start, and CSearchManager takes care of creating the corresponding CSearch.
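The manager's two roles, factory and result router, can be sketched with a map keyed by target ID. The `SearchManager`, `Search` and `SearchType` names are hypothetical, `std::map` stands in for CMap, IDs are shortened to 32 bits, and refusing a second search for the same target is an assumption made so that the map key stays unique.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class SearchType { Node, Keyword, FileSource, StoreKeyword, StoreFileSource };

struct Search {
    SearchType type;
    std::uint32_t target;                 // shortened ID, for illustration only
    std::vector<std::string> results;
};

class SearchManager {
public:
    // Factory-style entry point: creates and registers a search for `target`.
    // Returns nullptr if a search for that target is already running (assumed rule).
    Search* startSearch(SearchType type, std::uint32_t target) {
        if (searches_.count(target) != 0) return nullptr;
        auto search = std::make_unique<Search>(Search{type, target, {}});
        Search* raw = search.get();
        searches_[target] = std::move(search);
        return raw;
    }

    // Routes a result from the network to the search with this target.
    // Returns false if no such search exists, in which case the result is dropped.
    bool processResult(std::uint32_t target, const std::string& result) {
        auto it = searches_.find(target);
        if (it == searches_.end()) return false;
        it->second->results.push_back(result);
        return true;
    }

    void stopSearch(std::uint32_t target) { searches_.erase(target); }
    std::size_t activeCount() const { return searches_.size(); }

private:
    std::map<std::uint32_t, std::unique_ptr<Search>> searches_;
};
```

Callers only name the kind of search and its target; ownership and lookup of the search objects stay inside the manager.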

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
