Kademlia of P2P (1)

Source: Internet
Author: User
http://en.wikipedia.org/wiki/Kademlia
References: http://blog.csdn.net/tsingmei/archive/2008/09/13/2924368.aspx

Kademlia

 

Kademlia is a distributed hash table (DHT) protocol designed by Petar Maymounkov and David Mazières for decentralized P2P computer networks. It specifies the structure of the network and how information is exchanged through node lookups. Kademlia nodes communicate with each other over UDP. The participating nodes form a virtual network, or overlay network. Each node is identified by a number, its node ID. The node ID serves not only as an identity but also as a means of locating values (usually file hashes or keywords). In fact, the node ID provides a direct map to file hashes, and the node stores information on where to obtain the file or resource.

 

When searching the network for a value (typically the node that stores a file hash or keyword), the Kademlia algorithm needs to know the key associated with that value and then explores the network in several steps. Each step finds nodes whose IDs are closer to the key. The search stops when a contacted node returns the value or when no node with a closer ID can be found. This is very efficient: like other distributed hash tables, Kademlia contacts only O(log(N)) nodes during a search in a system of N nodes.
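The two stopping conditions can be illustrated with a deliberately simplified sketch. The `Node` class, the `search` function and the six-node network below are hypothetical and chosen only for illustration; a real lookup is driven by the initiator and queries several nodes per step, whereas this sketch simply walks from one node to the closest contact it knows.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    contacts: list[int] = field(default_factory=list)    # IDs this node knows
    store: dict[int, str] = field(default_factory=dict)  # key -> value


def search(network: dict[int, Node], start: int, key: int):
    """Walk toward `key`; return (value or None, IDs of the nodes visited)."""
    current = start
    visited = []
    while True:
        node = network[current]
        visited.append(current)
        if key in node.store:
            return node.store[key], visited   # stop: the value was returned
        # Keep only contacts that are strictly closer to the key (XOR distance).
        closer = [c for c in node.contacts if (c ^ key) < (current ^ key)]
        if not closer:
            return None, visited              # stop: no closer node is known
        current = min(closer, key=lambda c: c ^ key)


# Hypothetical 3-bit network; node 5 stores a value under key 5.
network = {
    0: Node(0, [4, 2, 1]),
    1: Node(1, [4, 2, 0]),
    2: Node(2, [4, 0]),
    4: Node(4, [0, 6, 5]),
    5: Node(5, [0, 6, 4], {5: "where to obtain the file"}),
    6: Node(6, [0, 4]),
}

print(search(network, start=0, key=5))  # ('where to obtain the file', [0, 4, 5])
print(search(network, start=0, key=3))  # (None, [0, 2])
```

Because every step strictly reduces the distance to the key, the walk always terminates.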

 

 

Another advantage of the decentralized structure is that it significantly increases resistance to denial-of-service (DoS) attacks. Even if a whole set of nodes is flooded, the availability of the network is only slightly affected: the network recovers by routing around these holes (the attacked nodes).

 

 

 

Content

1. System details

1.1 routing table
1.2 protocol messages
1.3 locate a node
1.4 locate resources
1.5 join the kademlia Network
1.6 query Acceleration

2. academic significance

3. Application in the file sharing network

1. System details

 

The first generation of P2P file sharing networks, such as Napster, relied on a central database to coordinate lookups in the network. The second generation, such as Gnutella, used flooding to locate files, searching every node in the network. The third generation uses distributed hash tables to look up files. Distributed hash tables store the locations of resources throughout the network, and a major goal of these protocols is to locate the desired node quickly.

 

Kademlia bases its lookups on a distance computed between two nodes. The distance is the exclusive or (XOR) of the two node IDs, with the result interpreted as an integer. Keys and node IDs have the same format and length, so the distance between a key and a node ID is computed in exactly the same way. A node ID is typically a large random number, chosen with the goal of being unique across the whole network. The XOR distance has nothing to do with geographical location; it depends only on the IDs. Nodes in Germany and Australia, for example, can be neighbors if their random IDs happen to be similar.
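As a minimal sketch, the distance is a single XOR on integers. The 128-bit ID length and the helper names `random_node_id` and `xor_distance` are assumptions made for this example, not part of any particular implementation.

```python
import secrets

ID_BITS = 128  # assumed ID length, matching the 128-bit example used in this article


def random_node_id(bits: int = ID_BITS) -> int:
    """Pick a large random number to serve as a node ID."""
    return secrets.randbits(bits)


def xor_distance(a: int, b: int) -> int:
    """XOR the two IDs (or an ID and a key) and read the result as an integer."""
    return a ^ b


# Small 3-bit IDs, used only to keep the arithmetic readable.
print(xor_distance(0b110, 0b011))  # 0b101 -> 5
```

The same function works for a node ID and a key, since both have the same format and length.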

 

XOR was chosen because the distance it yields shares some properties with the geometric distance formula, in particular:

 

The XOR distance between a node and itself is 0.

 

The XOR distance is symmetric: the distance from A to B is the same as the distance from B to A.

 

The XOR distance satisfies the triangle inequality: given three points A, B and C, the XOR distance between A and C is less than or equal to the sum of the XOR distance between A and B and the XOR distance between B and C.
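The three properties above can be checked exhaustively for short IDs. The following is only a brute-force sanity check over a small ID space, not a proof; the function name `check_xor_metric` is made up for this example.

```python
from itertools import product


def check_xor_metric(bits: int) -> bool:
    """Check the three properties for every triple of `bits`-bit IDs."""
    ids = range(2 ** bits)
    for a, b, c in product(ids, repeat=3):
        if a ^ a != 0:                    # distance to itself is 0
            return False
        if a ^ b != b ^ a:                # symmetry
            return False
        if (a ^ c) > (a ^ b) + (b ^ c):   # triangle inequality
            return False
    return True


print(check_xor_metric(3))  # True
```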

 

Thanks to these properties, XOR captures the essential features of a real distance function while remaining cheap and simple to compute. Each iteration of a Kademlia search comes at least one bit closer to the target. In a basic Kademlia network with 2^n nodes, a search takes at most n steps in the worst case to find the target node or value.
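The n-step bound can be illustrated with a small simulation. It rests on strong simplifying assumptions that are not from the article: all 2^n IDs are present, every node knows exactly one randomly chosen contact per bit position, and the query is forwarded hop by hop. The function names are hypothetical.

```python
import random


def build_contacts(n_bits: int, rng: random.Random) -> dict[int, list[int]]:
    """For every ID, pick one contact per bit position."""
    contacts = {}
    for node in range(2 ** n_bits):
        row = []
        for bit in range(n_bits):
            # The contact differs from `node` in `bit`, agrees on all higher
            # bits, and has arbitrary lower bits.
            flipped = ((node >> bit) ^ 1) << bit
            row.append(flipped | rng.randrange(2 ** bit))
        contacts[node] = row
    return contacts


def lookup_hops(start: int, target: int, contacts: dict[int, list[int]]) -> int:
    """Count hops when each step fixes the highest bit that still differs."""
    hops = 0
    current = start
    while current != target:
        top_bit = (current ^ target).bit_length() - 1
        current = contacts[current][top_bit]
        hops += 1
    return hops


def worst_case_hops(n_bits: int, seed: int = 0) -> int:
    contacts = build_contacts(n_bits, random.Random(seed))
    ids = range(2 ** n_bits)
    return max(lookup_hops(s, t, contacts) for s in ids for t in ids)


print(worst_case_hops(4))  # at most 4 in a network of 2^4 nodes
```

Each hop clears the highest differing bit, so the remaining XOR distance shrinks by at least one bit per step.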

 

 

1.1 routing table

To simplify the description, this section builds the routing table one bit at a time. For details of the actual routing table, see "query acceleration".

 

The Kademlia routing table consists of multiple lists, one for each bit of the node ID (for example, if node IDs are 128 bits long, a node keeps 128 such lists). Each list contains multiple entries, and each entry holds the data needed to locate another node: typically its IP address, port, and node ID. Each list corresponds to a specific distance from the node. A node placed in the nth list must differ from the local node's ID in the nth bit, while the first n-1 bits must match. This means it is easy to fill the first list, because half of the nodes in the network (those farthest away) are candidates for it. The second list can draw on only 1/4 of the nodes in the network (one bit closer than those in the first list), and so on.
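A minimal sketch of this structure, assuming IDs are held as Python integers with the first bit being the most significant one. The `Contact` and `RoutingTable` names, and the address used in the usage lines, are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    node_id: int
    ip: str
    port: int


class RoutingTable:
    def __init__(self, own_id: int, id_bits: int = 128):
        self.own_id = own_id
        self.id_bits = id_bits
        self.lists: list[list[Contact]] = [[] for _ in range(id_bits)]

    def list_number(self, other_id: int) -> int:
        """Return n (1-based): first n-1 bits match, the nth bit differs."""
        distance = self.own_id ^ other_id
        if distance == 0:
            raise ValueError("a node does not list itself")
        return self.id_bits - distance.bit_length() + 1

    def add(self, contact: Contact) -> None:
        entries = self.lists[self.list_number(contact.node_id) - 1]
        if contact not in entries:
            entries.append(contact)


# Usage with 3-bit IDs and a documentation-range IP address.
table = RoutingTable(own_id=0b110, id_bits=3)
table.add(Contact(node_id=0b100, ip="192.0.2.10", port=4000))
print(table.list_number(0b100))  # 2: the first bit matches, the second differs
```

This sketch does not limit the size of a list.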

 

If IDs are 128 bits long, every node classifies all other nodes in the network into 128 classes according to their XOR distance, one class for each bit of the ID.

 

As a node encounters other nodes in the network, it adds them to the corresponding lists. This happens while storing and retrieving information, and even while helping other nodes look up keys. Every node encountered is considered for inclusion in the lists, so a node's knowledge of the network is highly dynamic. This keeps the network constantly up to date and adds resilience to failures and attacks.

 

In the Kademlia literature, these lists are called k-buckets. k is a system-wide number, for example 20. Every k-bucket is a list of up to k entries; that is, for each bit (each specific distance from the node), a node keeps a list of up to 20 nodes.

 

As the corresponding bit gets lower (that is, as the corresponding XOR distance gets shorter), the number of possible nodes for a k-bucket decreases rapidly, because very few nodes can be that close. The k-buckets for the low bits can therefore hold every node in their part of the network. Because the number of nodes actually in the network is far smaller than the number of possible IDs, some of the k-buckets for very short distances remain empty. For example, at XOR distance 1 there is at most one possible node; if that node has not been found, the k-bucket for XOR distance 1 stays empty.

 

 

 

 

 

Let's look at the simple network above. The network can hold at most 2^3, that is, 8 keys and nodes. Seven nodes have currently joined; each is drawn as a small circle at the bottom of the tree. Consider node 6 (binary 110), marked with a black circle. It has three k-buckets. Nodes 0, 1 and 2 (binary 000, 001 and 010) are candidates for the first k-bucket; node 3 (binary 011) has not joined the network. Nodes 4 and 5 (binary 100 and 101) are candidates for the second k-bucket, and only node 7 (binary 111) is a candidate for the third k-bucket. In the figure, the three k-buckets are drawn as gray circles. If the size of a k-bucket (the value of k) is 2, the first k-bucket can hold only two of its three candidates.
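The grouping in this example can be reproduced with a few lines of code. This is a sketch of the bucket assignment only; the function name `bucket_candidates` is hypothetical, and the limit k is ignored so that all candidates are shown.

```python
def bucket_candidates(own_id: int, online: list[int], id_bits: int) -> dict[int, list[str]]:
    """Group the online nodes by the k-bucket of `own_id` they would fall into."""
    buckets: dict[int, list[str]] = {n: [] for n in range(1, id_bits + 1)}
    for other in online:
        if other == own_id:
            continue
        # Bucket n: the first n-1 bits match own_id and the nth bit differs.
        n = id_bits - (own_id ^ other).bit_length() + 1
        buckets[n].append(format(other, f"0{id_bits}b"))
    return buckets


# The seven nodes of the example; node 3 (011) has not joined.
print(bucket_candidates(own_id=0b110, online=[0, 1, 2, 4, 5, 6, 7], id_bits=3))
# {1: ['000', '001', '010'], 2: ['100', '101'], 3: ['111']}
```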

It is known that nodes that have been connected to a network for a long time are likely to remain connected for a long time in the future. Based on this statistical distribution, Kademlia prefers to keep long-connected nodes in its k-buckets. This increases the number of valid nodes known at any future point in time and makes the network more stable.

When a k-bucket is full and a new node is discovered for that bucket, the least recently seen node in the k-bucket is checked first. If that node is still alive, the new node is placed in a secondary list (a replacement cache). The replacement cache is used only when a node in the k-bucket stops responding. In other words, newly discovered nodes are used only when older nodes disappear.
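A possible sketch of this update policy follows. The `KBucket` class is hypothetical, node IDs stand in for full contact entries, and the liveness check is passed in as a callback (`is_alive`) rather than sent as a real PING over UDP.

```python
from collections import deque
from typing import Callable


class KBucket:
    def __init__(self, k: int = 20):
        self.k = k
        self.nodes: deque[int] = deque()         # least recently seen on the left
        self.replacements: deque[int] = deque()  # replacement cache

    def seen(self, node_id: int, is_alive: Callable[[int], bool]) -> None:
        """Record contact with `node_id`, applying the policy described above."""
        if node_id in self.nodes:
            self.nodes.remove(node_id)
            self.nodes.append(node_id)           # now the most recently seen
        elif len(self.nodes) < self.k:
            self.nodes.append(node_id)
        elif is_alive(self.nodes[0]):
            self.nodes.rotate(-1)                # old node answered: keep it
            if node_id not in self.replacements:
                self.replacements.append(node_id)
        else:
            self.nodes.popleft()                 # old node is gone: use the new one
            self.nodes.append(node_id)

    def failed(self, node_id: int) -> None:
        """Drop a node that stopped responding; promote a cached node if any."""
        if node_id in self.nodes:
            self.nodes.remove(node_id)
            if self.replacements:
                self.nodes.append(self.replacements.pop())


bucket = KBucket(k=2)
for new_id in (1, 2, 3):
    bucket.seen(new_id, is_alive=lambda _: True)
print(list(bucket.nodes), list(bucket.replacements))  # [2, 1] [3]
```

Keeping the old, responsive node and parking the newcomer reflects the preference for long-lived nodes described earlier.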

 

 

1.2 protocol messages

 

The Kademlia protocol has four types of messages.

 

 

PING message: used to verify that a node is still online.

STORE message: stores a (key, value) pair in a node.

FIND_NODE message: the recipient returns the k nodes from its own buckets that are closest to the requested key.

FIND_VALUE message: the same as FIND_NODE, except that if the recipient holds the requested key, it returns the corresponding value instead.
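How a recipient might answer the two lookup messages can be sketched as follows. The function names and the tagged-tuple reply format are assumptions for illustration; the article does not specify a wire format.

```python
import heapq


def k_closest(known_ids: list[int], key: int, k: int = 20) -> list[int]:
    """Return up to k known node IDs, ordered by XOR distance to `key`."""
    return heapq.nsmallest(k, known_ids, key=lambda node_id: node_id ^ key)


def handle_find_node(known_ids: list[int], key: int, k: int = 20):
    return ("nodes", k_closest(known_ids, key, k))


def handle_find_value(store: dict[int, str], known_ids: list[int], key: int, k: int = 20):
    """Like FIND_NODE, but return the value if this node holds the key."""
    if key in store:
        return ("value", store[key])
    return ("nodes", k_closest(known_ids, key, k))


known = [0, 1, 2, 4, 5, 7]
print(handle_find_node(known, key=0b110, k=2))                # ('nodes', [7, 4])
print(handle_find_value({6: "a value"}, known, key=6, k=2))   # ('value', 'a value')
```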

 

 

Each RPC message includes a random value chosen by the initiator, which ensures that a reply can be matched to the request it answers.
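One way to use such a random value is to remember it for every outstanding request and accept only replies that echo it. The sketch below assumes a 20-byte value and a dictionary message layout; neither is specified in this article, and the `PendingRequests` class is hypothetical.

```python
import secrets


class PendingRequests:
    """Track outstanding requests by the random value attached to each one."""

    def __init__(self):
        self._pending: dict[bytes, str] = {}

    def new_request(self, message_type: str, payload=None) -> dict:
        rpc_id = secrets.token_bytes(20)  # assumed length, for illustration only
        self._pending[rpc_id] = message_type
        return {"rpc_id": rpc_id, "type": message_type, "payload": payload}

    def accept_reply(self, reply: dict) -> bool:
        """Accept a reply only if it echoes the value of a pending request."""
        return self._pending.pop(reply.get("rpc_id"), None) is not None


pending = PendingRequests()
request = pending.new_request("PING")
print(pending.accept_reply({"rpc_id": request["rpc_id"]}))  # True
print(pending.accept_reply({"rpc_id": b"forged"}))          # False
```

Removing the entry on the first match means a replayed reply with the same value is rejected.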
 

 
