Chord algorithm (principle)

Source: Internet
Author: User

The Chord algorithm is one of the four classic P2P (DHT) algorithms. It was proposed by MIT in 2001. The other three are:

  • CAN
  • Pastry
  • Tapestry

Chord aims to provide an algorithm that quickly locates resources in P2P networks. It does not care how resources are stored; it studies only how resources are found at the algorithm level. Therefore, the Chord API boils down to just a set and a get.

1. What is Chord?

Chord is both an algorithm and a protocol. As an algorithm, Chord's correctness and convergence can be proved rigorously from a mathematical perspective. As a protocol, Chord defines the message types exchanged at each step. Another major reason Chord is popular is that it is simple: about 3000 lines of code are enough to implement a complete Chord.

Chord can also be viewed as an implementation of consistent hashing and of a distributed hash table (DHT).

2. Overlay network

An overlay network is a network built on top of another network, in which nodes are connected by virtual or logical links. For example, cloud computing platforms and distributed systems are overlay networks, because both are built on top of TCP/IP and maintain connections between nodes. Chord is likewise built as an overlay network.

3. Structured and unstructured networks

In unstructured P2P networks, there is no organizational relationship between nodes, and all nodes are fully equivalent. For example, the first-generation P2P network Napster has a clear and simple network structure, but it leaves little room for optimizing search. Search typically relies on global or partitioned flooding, which takes a long time and gives no guarantee of results (a query may time out before anything is found).

 

Structured P2P networks are the opposite of unstructured ones: the network is assumed to have a logically designed structure. For example, Chord assumes that the network is a ring, while Kademlia assumes that it is a binary tree whose leaves are all the nodes. These logical structures make it possible to bring many other algorithms and ideas to bear on resource search.

4. Distributed Hash table (DHT)

The main idea of a DHT is to make access to resources on the network as simple and fast as a hash table's put and get. The idea arose largely in response to the first-generation P2P network (Napster). Compared with consistent hashing, a DHT emphasizes resource access and does not concern itself with whether resources are placed consistently. Like consistent hashing, a DHT is just a concept, and the details are left to each implementation.

Currently, the following P2P systems can serve as concrete implementations of a DHT; some representative ones are listed again here:

  • Chord
  • CAN
  • Tapestry
  • Pastry
  • Apache Cassandra
  • Kademlia
  • P-Grid
  • BitTorrent DHT

5. Chord implementation principle

Chord maps nodes and keys into the same identifier space to achieve consistent hashing. To make hash collisions practically impossible, Chord uses SHA-1 as its hash function. SHA-1 produces a space of $2^{160}$ values, each a 20-byte (160-bit) big integer. We can imagine these integers joined end to end to form a ring, called the Chord ring, with the integers arranged clockwise. Nodes (machine IP address and port) and keys (resource IDs) are all hashed onto the Chord ring. In this way, the whole P2P network can be viewed as a virtual ring, which is why Chord is called a structured P2P network.
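To make the mapping concrete, here is a minimal Python sketch of hashing a node address or a resource key onto the ring with SHA-1 from the standard `hashlib` module. The function name `chord_id`, the example address and the example key are hypothetical. Reducing the value modulo $2^m$ for m < 160 (keeping the low m bits) is an assumption made only so that small demonstration rings are possible; with m = 160 the SHA-1 value is used as is.

```python
import hashlib

def chord_id(name: str, m: int = 160) -> int:
    """Map a node address ("ip:port") or a resource key onto a ring of 2**m identifiers.

    SHA-1 yields a 20-byte (160-bit) digest. For m < 160 this sketch keeps the
    low m bits, an assumption used only to build small demo rings.
    """
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

node_id = chord_id("192.168.1.10:8000")  # hypothetical node address and port
key_id = chord_id("movie.mp4")           # hypothetical resource ID
print(hex(chord_id("abc")))  # 0xa9993e364706816aba3e25717850c26c9cd0d89d (SHA-1 test vector)
```

Nodes and keys pass through the same function, which is what puts them in one comparable space.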

 

The following are definitions:

  • Every point on the Chord ring is called an identifier.
  • An identifier to which a node is mapped is still called a node.
  • Going clockwise, the nodes after a given node are its successors and the nodes before it are its predecessors. The first successor is called the direct successor, and the first predecessor is called the direct predecessor.

In the figure, red points are nodes and blue points are identifiers; only some of the nodes and identifiers are shown. Taking node N1 as an example, the successors in its finger table are:

 

i    Start (N1 + 2^(i-1))    Successor
1    N1 + 2^0                N18
2    N1 + 2^1                N18
3    N1 + 2^2                N18
4    N1 + 2^3                N18
5    N1 + 2^4                N18
6    N1 + 2^5                N45
7    N1 + 2^6                N1
8    N1 + 2^7                N1
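A minimal Python sketch of how such a table can be computed. The figure's full node set is not given, so the node set {1, 18, 45, 53} and m = 8 used here are hypothetical, chosen only so that the output reproduces N1's table above. The helper names `successor` and `finger_table` are likewise illustrative.

```python
from bisect import bisect_left

def successor(ident, nodes, m):
    """First node at or clockwise after ident on a ring of 2**m identifiers.

    nodes must be a sorted list of node identifiers.
    """
    ident %= 2 ** m
    i = bisect_left(nodes, ident)
    return nodes[i % len(nodes)]  # past the largest node, wrap to the smallest

def finger_table(n, nodes, m):
    """Entry i (1-based) holds successor((n + 2**(i-1)) mod 2**m)."""
    return [successor(n + 2 ** (i - 1), nodes, m) for i in range(1, m + 1)]

# Hypothetical node set on an m = 8 ring, chosen to reproduce N1's table above.
nodes = sorted([1, 18, 45, 53])
print(finger_table(1, nodes, 8))  # [18, 18, 18, 18, 18, 45, 1, 1]
```

Entries 7 and 8 wrap around to N1 because no node lies at or after identifiers 65 and 129 in this hypothetical ring.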

 

By mapping nodes and keys into the same value domain, we can measure them against each other, a bit like putting a dog and a cat on the same scale. Although this may seem strange, it is what ensures consistent hashing, as discussed in detail earlier.

 

Clearly, the number of nodes on the Chord ring is far smaller than the number of identifiers ($2^{160}$ is an astronomically large number), so the nodes are sparsely scattered around the ring, in theory at random. However, as discussed earlier for consistent hashing, when there are few nodes the distribution is likely to be uneven, and virtual nodes can be added to improve balance. When there are many nodes (for example, millions of machines in a large P2P network), virtual nodes are unnecessary.
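The virtual-node idea can be sketched as follows: each physical host is hashed onto the ring several times, and a key belongs to the host that owns the first virtual node clockwise from it. This is a minimal illustration under stated assumptions: the `"host#k"` naming scheme, the host addresses, the key names and the helper names are all hypothetical. Because SHA-1 is deterministic, the exact loads depend on these names; with more virtual nodes per host the per-host key counts typically become more even.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

def chord_id(name, m=160):
    """SHA-1 of name as an integer on a ring of 2**m identifiers."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big") % (2 ** m)

def build_ring(hosts, vnodes):
    """Place `vnodes` virtual nodes per physical host; returns sorted (id, host) pairs.

    The "host#k" naming is a hypothetical convention for this sketch.
    """
    return sorted((chord_id(f"{h}#{k}"), h) for h in hosts for k in range(vnodes))

def owner(key_id, ring):
    """Physical host of the first virtual node at or clockwise after key_id."""
    i = bisect_left(ring, (key_id,))  # (x,) sorts before any (x, host)
    return ring[i % len(ring)][1]

hosts = ["10.0.0.1:4000", "10.0.0.2:4000", "10.0.0.3:4000", "10.0.0.4:4000"]
keys = [chord_id(f"key-{j}") for j in range(10_000)]
for v in (1, 100):
    ring = build_ring(hosts, v)
    load = Counter(owner(k, ring) for k in keys)
    print(v, sorted(load.values()))  # keys per host with v virtual nodes each
```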

 

Clearly, any search can always find its result simply by walking around the Chord ring node by node. This takes O(n) time, where n is the number of nodes in the network, which is intolerable for a P2P network with millions of nodes that frequently join and leave. Therefore, Chord proposes the following sublinear search algorithm:
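To make the O(n) baseline concrete, here is a minimal Python sketch of a lookup that follows only direct-successor pointers. The function name `linear_lookup` and the evenly spaced example ring are hypothetical, introduced for illustration.

```python
def linear_lookup(start, key, nodes, size):
    """Follow direct-successor pointers only; returns (node holding key, hops).

    nodes: node identifiers on a ring of `size` identifiers.
    """
    ring = sorted(nodes)
    i, hops = ring.index(start), 0
    while True:
        n, s = ring[i], ring[(i + 1) % len(ring)]
        # The key belongs to s if it lies in (n, s] clockwise; a lone node owns everything.
        if n == s or 0 < (key - n) % size <= (s - n) % size:
            return s, hops
        i, hops = (i + 1) % len(ring), hops + 1

# 16 evenly spaced nodes on a 256-identifier ring: the worst case needs n - 1 hops.
print(linear_lookup(0, 0, range(0, 256, 16), 256))  # (0, 15)
```

Starting at node 0 and looking for key 0, the walk has to pass every other node before it reaches the last one, whose successor owns the key.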

  1. Each node maintains a finger table with m entries, where m is the number of bits in an identifier (160 in Chord). Entry i of node n's table stores $\mathrm{successor}((n + 2^{i-1}) \bmod 2^m)$, for $1 \le i \le m$.
  2. Each node maintains a list of predecessors and successors. This list is used to quickly locate its predecessor and successor and to periodically check their health.
  3. In other words, the targets of the stored entries grow geometrically, by powers of 2, away from n. The modulo is needed because the ring wraps around: the successor of the largest node is defined to be the first (smallest) node.
  4. A resource key is stored on the first node, going clockwise along the Chord ring, with hash(node) >= hash(key) (wrapping around to the first node if there is none). We call this node the successor of the key.
  5. To find the node holding the resource for a given key, that is, the successor of the key, a search starting at node N proceeds as follows:
     • If hash(key) falls between node N (exclusive) and its direct successor (inclusive), the query ends: the key's successor is N's direct successor.
     • Otherwise, find the entry in N's finger table that precedes hash(key) and is closest to it, i.e., the closest preceding node of the key in the finger table, and forward the search request to that node.
     • Repeat the above process until the node responsible for the key is found.
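The search steps under item 5 can be sketched iteratively in Python as follows. This is a minimal sketch on a static ring (no joins, leaves or failures), not the protocol's message-passing form; the helper names `in_half_open`, `in_open`, `build` and `lookup` are hypothetical, and the four-node ring with m = 8 is the same illustrative node set used for N1's finger table.

```python
from bisect import bisect_left

def successor(ident, nodes, size):
    """First node at or clockwise after ident; nodes is a sorted list."""
    i = bisect_left(nodes, ident % size)
    return nodes[i % len(nodes)]

def in_half_open(x, a, b, size):
    """x in (a, b] going clockwise; (a, a] is taken to be the whole ring."""
    return a == b or 0 < (x - a) % size <= (b - a) % size

def in_open(x, a, b, size):
    """x in (a, b) going clockwise; (a, a) is the whole ring except a."""
    return x != a if a == b else 0 < (x - a) % size < (b - a) % size

def build(nodes, m):
    """Finger tables and direct successors for every node (static ring)."""
    size = 2 ** m
    tables = {n: [successor(n + 2 ** (i - 1), nodes, size) for i in range(1, m + 1)]
              for n in nodes}
    succ = {n: tables[n][0] for n in nodes}  # entry 1 is the direct successor
    return tables, succ

def lookup(start, key, tables, succ, size):
    """Returns (node holding key, number of forwarding hops)."""
    n, hops = start, 0
    while not in_half_open(key, n, succ[n], size):
        # Closest preceding finger: scan from the farthest entry back toward n.
        # Entry 1 always qualifies here, so a match is guaranteed.
        n = next(f for f in reversed(tables[n]) if in_open(f, n, key, size))
        hops += 1
    return succ[n], hops

ring = sorted([1, 18, 45, 53])  # hypothetical nodes, m = 8
tables, succ = build(ring, 8)
print(lookup(1, 50, tables, succ, 256))  # (53, 1)
```

Here N1 jumps straight to N45 through its sixth finger, and N45's direct successor N53 holds key 50.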

Intuitively, this search process converges exponentially, much like binary search, so convergence should be very fast. In other words, the search time (routing complexity) should be logarithmic, which we prove here.

 

The figure shows that node N1 still finds node N53 very quickly:

 

6. Proof of Chord convergence

For an algorithm, convergence is crucial: without a convergence guarantee, any further effort spent on the program is futile. Before the proof, we restate three points:

  • A key is stored on its successor node, i.e., the first node satisfying hash(node) >= hash(key).
  • Entry i of node n's finger table stores $\mathrm{successor}(n + 2^{i-1})$.
  • The search follows the closest-node principle: if the current node does not store the key, it forwards the search to the node in its finger table that is closest to hash(key), and the process continues from there.

The figure shows the relationship between the key's successor and the successors in node N's finger table.

 

Suppose that entry i is the entry in node N's finger table that is closest to the key while still preceding it; then the key lies between entries i and i + 1. (All distances are measured clockwise, i.e., modulo $2^m$.)

Denote entry i by J and entry i + 1 by P:

  • $J < \mathrm{hash}(key)$
  • $P > \mathrm{hash}(key)$

And:

$J = N + 2^{i-1}$

$P = N + 2^{i}$

Since the key lies between J and P, its distance from N lies between theirs: $J - N < \mathrm{hash}(key) - N < P - N$, that is:

(1) $2^{i-1} < \mathrm{hash}(key) - N < 2^{i}$

(2) The distance from J to the key is less than the gap between J and P: $\mathrm{hash}(key) - J < P - J = 2^{i-1}$.

That is to say, the distance between J and the key is less than the distance between N and the key. Moreover, (1) gives $2^{i-1} > (\mathrm{hash}(key) - N)/2$, so $\mathrm{hash}(key) - J = (\mathrm{hash}(key) - N) - 2^{i-1} < (\mathrm{hash}(key) - N)/2$: the distance between J and the key is less than half the distance between N and the key. This guarantees that the distance to the key shrinks in every iteration, at least by a factor of 2, just as in binary search.
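Iterating this halving bound gives an explicit step count. A short worked derivation, writing $d_t$ for the clockwise distance from the current node to hash(key) after t forwarding steps and K for the number of nodes (both symbols introduced here for illustration); the last line assumes the K nodes are placed uniformly at random:

```latex
\begin{aligned}
d_{t+1} < \tfrac{1}{2}\, d_t
  &\;\Longrightarrow\; d_t < \frac{d_0}{2^{t}} \le \frac{2^{m}}{2^{t}} = 2^{m-t} \\
t = m - 1
  &\;\Longrightarrow\; d_t < 2
  \;\Longrightarrow\; \text{no identifier lies strictly between the current node and } \mathrm{hash}(key) \\
  &\;\Longrightarrow\; \text{the search ends within at most } m \text{ forwarding steps} \\
t = \log_2 K
  &\;\Longrightarrow\; d_t < \frac{2^{m}}{K},
  \qquad \mathbb{E}\left[\#\text{nodes in an arc of length } \tfrac{2^{m}}{K}\right]
  = K \cdot \frac{2^{m}/K}{2^{m}} = 1
\end{aligned}
```

So after about $\log_2 K$ steps only about one node is expected to remain between the current node and the key, which is the intuition behind Chord's logarithmic lookup cost.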

 

This completes the proof of Chord's convergence.

 

7. A deeper look at the Chord algorithm

In fact, the Chord algorithm can be completely transformed into a mathematical problem:

Randomly mark a set of nodes on the Chord ring and pick a random target node T; starting from a random node N, find T using the Chord search algorithm.

 

Why is this transformation valid? Because finding a key only requires finding the key's direct predecessor (whose successor holds the key), every lookup reduces to finding one node from another node on the Chord ring. Viewed this way, the question immediately becomes interesting: if we record the search steps as a path, a lookup becomes a shortest-path problem between two random nodes, and the Chord algorithm is in effect constructing such a shortest path. Could such a path fail to exist? No: because Chord is a ring, in the worst case a linear search along successors still guarantees convergence.

 

From the shortest-path perspective, Chord merely improves on the existing linear path, and based on this idea other shortest-path algorithms could be designed. From the algorithm's perspective, the prerequisite for convergence and correctness is that every node correctly maintains its successor. However, in a large P2P network nodes join and leave frequently, and without additional work it is very difficult to guarantee that every node has the correct successor.

 

Chord redundancy:

Redundancy refers to useless entries in a Chord finger table. Entries whose targets fall between node N and its direct successor are meaningless, because no node exists at those identifiers and they all resolve to the direct successor. For example, in N1's finger table, none of the identifiers targeted by entries 1 to 5 is a node, so all five point to N18, and at least entries 1 to 4 are redundant.

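In general, suppose the Chord ring has $2^m$ identifiers and $2^n$ nodes spread evenly around it, so that adjacent nodes are $2^m/2^n = 2^{m-n}$ apart. Item i in the finger table of any node N collapses onto N's direct successor (the same node as item 1), and is counted as redundant, exactly when its target does not pass that successor: $N + 2^{i-1} \le N + 2^m/2^n \iff 2^{i-1} \le 2^{m-n} \iff i \le m-n+1$. That is, items 1 through m-n+1 are redundant.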

The redundancy is therefore $(m-n+1)/m = 1-(n-1)/m$, and since usually $m \gg n$, Chord carries a lot of redundant information. If there are 1024 nodes on the network, i.e., n = 10, the redundancy is $1-(10-1)/160 \approx 94\%$. Many papers have pointed this out, arguing that it causes redundant queries and reduces performance. In practice, however, the redundant information is spread across the finger tables of many nodes, and as long as the routing algorithm is applied correctly, routing is not affected.
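The redundancy formula can be checked numerically. This minimal Python sketch builds the idealized layout assumed above ($2^n$ evenly spaced nodes on a ring of $2^m$ identifiers, which real SHA-1 placement only approximates), computes node 0's finger table, and counts the entries that equal its direct successor. The function names are hypothetical.

```python
from bisect import bisect_left

def successor(ident, nodes, m):
    """First node at or clockwise after ident on a ring of 2**m identifiers (sorted nodes)."""
    i = bisect_left(nodes, ident % (2 ** m))
    return nodes[i % len(nodes)]

def redundancy(m, n):
    """Fraction of node 0's finger entries that equal its direct successor,
    assuming 2**n nodes evenly spaced on a 2**m ring."""
    spacing = 2 ** (m - n)
    nodes = [k * spacing for k in range(2 ** n)]
    direct = successor(1, nodes, m)
    fingers = [successor(2 ** (i - 1), nodes, m) for i in range(1, m + 1)]
    return sum(f == direct for f in fingers) / m

print(redundancy(16, 4))    # 0.8125, i.e. (16 - 4 + 1) / 16
print(redundancy(160, 10))  # 0.94375, i.e. 1 - (10 - 1) / 160
```

Both results match $(m-n+1)/m$, including the roughly 94% figure for 1024 nodes.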

 

So far, we have fully discussed the Chord algorithm and its core ideas. Next we will discuss the detailed implementation of Chord.

 

 
