Chord algorithm (principle)

Source: Internet
Author: User

The Chord algorithm is one of the four best-known peer-to-peer (P2P) lookup algorithms. It was proposed by MIT in 2001. The other three are:

    • CAN
    • Pastry
    • Tapestry

The purpose of Chord is to provide an algorithm that can locate resources quickly in a peer-to-peer network. Chord does not care how the resources are stored; it only addresses, at the algorithmic level, how a resource is located. As a result the Chord API is very simple: it has only set and get.

1. What is Chord?

Chord is both an algorithm and a protocol. As an algorithm, its correctness and convergence can be proved rigorously in mathematical terms. As a protocol, it defines in detail the message types exchanged at each step. One of the main reasons Chord is popular is its simplicity: about 3,000 lines of code are enough for a complete implementation.

Chord can also be regarded as an implementation of consistent hashing and of a distributed hash table (DHT).

2. Overlay network

An overlay network is a network built on top of another network, whose nodes are joined by virtual or logical links. Cloud computing platforms and distributed systems are overlay networks, for example, because they are built on TCP/IP and their nodes are logically connected to one another. Chord is likewise built as an overlay network.

3. Structured and unstructured networks

In an unstructured P2P network there is no particular relationship among the nodes; all nodes are completely equivalent. The first generation of P2P networks, such as Napster, is an example. Such a network has a clear, simple structure, but lookups are barely optimized: they often rely on global or regional flooding, take a long time, and give no guarantee of a result (a query may time out before the resource is found).

A structured P2P network is the exact opposite: the network is treated as having a deliberately designed logical structure. For example, Chord assumes the network is a ring, while Kademlia assumes a binary tree in which every node is a leaf. These logical structures make it possible to bring many more algorithms and ideas to bear on resource lookup.

4. Distributed hash table (DHT)

The main idea of a DHT is to make access to resources on the network work like a hash table, with simple and fast put and get operations. The idea was strongly influenced by first-generation P2P networks such as Napster. Compared with consistent hashing, a DHT emphasizes access to resources, regardless of whether the resources are consistent. Like consistent hashing, a DHT is only a concept; the details are left to each implementation.
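To make the hash-table analogy concrete, here is a minimal in-process sketch. It is only an illustration under stated assumptions: the class name ToyDHT, the node names and the stored value are all made up, each "node" is just a local dict rather than a remote machine, and the placement rule (first node identifier at or after the key's hash) is one possible choice among many.

```python
import hashlib
from bisect import bisect_left


class ToyDHT:
    """Hypothetical stand-in for a DHT: every 'node' is a local dict."""

    def __init__(self, node_names):
        # Hash each node name to an integer identifier and keep them sorted.
        self._ids = sorted(self._hash(name) for name in node_names)
        self._stores = {node_id: {} for node_id in self._ids}

    @staticmethod
    def _hash(text):
        digest = hashlib.sha1(text.encode("utf-8")).digest()
        return int.from_bytes(digest, "big")

    def _owner(self, key):
        # Assumed rule: first node id >= hash(key), wrapping to the smallest.
        idx = bisect_left(self._ids, self._hash(key))
        return self._ids[idx % len(self._ids)]

    def put(self, key, value):
        self._stores[self._owner(key)][key] = value

    def get(self, key):
        return self._stores[self._owner(key)].get(key)


dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.put("song.mp3", "held by peer 198.51.100.7")
print(dht.get("song.mp3"))   # the value stored above
print(dht.get("missing"))    # None
```

The caller only ever sees put and get; which node ends up holding a key is decided entirely by hashing.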

Current P2P implementations can all be viewed as concrete implementations of a DHT. Some representative ones are:

    • Chord
    • CAN
    • Tapestry
    • Pastry
    • Apache Cassandra
    • Kademlia
    • P-Grid
    • BitTorrent DHT

5. How Chord is implemented

Chord guarantees consistent hashing by mapping nodes and keys into the same space. To keep hash values from repeating, Chord chooses SHA-1 as its hash function. SHA-1 produces a space of $2^{160}$ values, each a large 20-byte (160-bit) integer. We can think of these integers as joined end to end to form a ring, called the Chord ring. The integers are arranged clockwise on the ring, and both nodes (the IP address and port of a machine) and keys (resource identifiers) are hashed onto it. The whole P2P network can therefore be viewed as a virtual ring, which is why Chord is called a structured P2P network.
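As a rough illustration of this mapping, the sketch below hashes a node address and a resource key into the same 160-bit space using Python's hashlib. The helper name chord_id, the address and the file name are invented for the example; Chord does not prescribe them.

```python
import hashlib

M = 160  # SHA-1 output size in bits, so the ring has 2**160 identifiers


def chord_id(text: str) -> int:
    """Map a string to an integer identifier in [0, 2**160)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()  # 20 bytes
    return int.from_bytes(digest, "big")


# Hypothetical node address and resource key, hashed into the same space.
node_id = chord_id("192.0.2.10:4000")
key_id = chord_id("some-file.txt")
print(f"node id: {node_id:040x}")
print(f"key  id: {key_id:040x}")
```

Because both values are plain integers in the same range, they can be compared and ordered on the ring directly.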

Here are a few definitions:

    • Each point on the Chord ring is called an identifier.
    • If a node is mapped to an identifier, that identifier is simply referred to as a node.
    • Going clockwise, the nodes before a given node are its predecessors and the nodes after it are its successors. The first predecessor is called the direct predecessor and the first successor is called the direct successor.

In the figure, the red dots are nodes and the blue dots are identifiers. The figure shows only some of the nodes and identifiers. Taking node N1 as an example, the successors in its finger table are:

No.   Start of entry i   Successor
1     $N1 + 2^0$         N18
2     $N1 + 2^1$         N18
3     $N1 + 2^2$         N18
4     $N1 + 2^3$         N18
5     $N1 + 2^4$         N18
6     $N1 + 2^5$         N45
7     $N1 + 2^6$         N1
8     $N1 + 2^7$         N1
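The table can be reproduced with a few lines of code. This is a sketch under assumptions that the text does not state: an 8-bit identifier space (the table has 8 entries) and a ring containing exactly the nodes 1, 18, 45 and 53. The figure may show other nodes as well, so treat the node list as illustrative.

```python
from bisect import bisect_left

M = 8                      # assumed identifier bits, so the ring has 2**8 slots
NODES = [1, 18, 45, 53]    # assumed node identifiers, sorted clockwise


def successor(ident: int) -> int:
    """First node at or after ident, wrapping past the largest node."""
    idx = bisect_left(NODES, ident % 2 ** M)
    return NODES[idx % len(NODES)]


def finger_table(n: int) -> list[tuple[int, int, int]]:
    """Rows of (i, start, successor(start)) for i = 1..M."""
    rows = []
    for i in range(1, M + 1):
        start = (n + 2 ** (i - 1)) % 2 ** M
        rows.append((i, start, successor(start)))
    return rows


for i, start, succ in finger_table(1):
    print(f"{i}  N1+2^{i - 1} = {start:3d}  ->  N{succ}")
```

Entries 7 and 8 start past the largest node, so their successor wraps around to N1 itself.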

Mapping nodes and keys into the same space feels a bit like measuring dogs and cats on the same scale. It may seem odd, but it is what guarantees consistent hashing; see the previous article for details.

Clearly, the number of nodes on the Chord ring is far smaller than the number of identifiers ($2^{160}$ is an astronomically large number), so nodes are spread very sparsely around the ring. In theory they should be randomly distributed, but as discussed in the earlier article on consistent hashing, when the number of nodes is small the distribution will almost certainly be uneven, and virtual nodes can be added to improve the balance. When there are many nodes (for example, a large network with millions of machines), virtual nodes are unnecessary.

Obviously, any lookup that simply walks around the Chord ring will eventually find its result. Its time complexity is O(N), where N is the number of nodes in the network. For a P2P network with millions of nodes, in which nodes frequently join and leave, O(N) is intolerable. Chord therefore proposes the following scheme for non-linear lookup:

    1. Each node maintains a finger table of length m (m is the number of bits in an identifier, 160 in Chord). Entry i of node n's table holds the successor of $(n + 2^{i-1}) \bmod 2^m$, for $1 \le i \le m$.
    2. Each node maintains a predecessor list and a successor list, which let it locate its predecessors and successors quickly and check their health periodically.
    3. In other words, the stored successors lie at distances that grow by powers of 2. Because the identifiers form a ring, the successors of the last nodes wrap around to the first nodes; for example, the node after the largest node is defined to be the first node.
    4. A resource key is stored on the first node along the Chord ring for which hash(node) >= hash(key). This node is called the successor of the key.
    5. Given a key, the node that holds the corresponding resource, that is, the successor of the key, is found with the following steps (assuming the lookup starts on node n):
        • Check whether hash(key) falls between node n and its direct successor. If it does, the lookup ends: the direct successor of n is the node being sought.
        • Otherwise, in n's finger table, find the successor that is closest to hash(key) while still preceding it, and forward the lookup to that node.
        • Repeat this process until the node responsible for the key is found.
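The lookup steps can be sketched as follows. This is a simplified, single-process model rather than a real Chord implementation: it assumes an 8-bit identifier space, a fixed ring with nodes 1, 18, 45 and 53, and it computes finger tables on demand instead of maintaining them over the network. All function names are chosen for the example.

```python
from bisect import bisect_left

M = 8                      # assumed identifier bits (Chord itself uses 160)
SIZE = 2 ** M
NODES = [1, 18, 45, 53]    # assumed node identifiers, sorted clockwise


def successor(ident):
    """First node at or after ident, wrapping past the largest node."""
    idx = bisect_left(NODES, ident % SIZE)
    return NODES[idx % len(NODES)]


def fingers(n):
    """Finger nodes of n: successor((n + 2**(i-1)) mod 2**M) for i = 1..M."""
    return [successor(n + 2 ** (i - 1)) for i in range(1, M + 1)]


def between(x, a, b):
    """True if x lies strictly inside the clockwise ring interval (a, b)."""
    if a == b:              # the interval covers the whole ring except a
        return x != a
    return 0 < (x - a) % SIZE < (b - a) % SIZE


def find_successor(n, key_id):
    """Return (node responsible for key_id, list of nodes visited)."""
    path = [n]
    while True:
        succ = successor(n + 1)          # direct successor of n
        # Step 1: key_id in (n, succ] means succ is the answer.
        if key_id == succ or between(key_id, n, succ):
            return succ, path
        # Step 2: pick the finger closest to key_id that still precedes it.
        nxt = succ                       # fallback; fingers(n)[0] is succ
        for f in reversed(fingers(n)):
            if between(f, n, key_id):
                nxt = f
                break
        n = nxt                          # Step 3: repeat from that node
        path.append(n)


print(find_successor(1, 53))   # (53, [1, 45])
print(find_successor(1, 60))   # (1, [1, 45, 53])
```

In the first call N1 jumps straight to N45, whose direct successor N53 holds the key. In the second call the key lies past the largest node, so the answer wraps around to N1.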

Intuitively, this lookup process converges exponentially, much like binary search, so it should be fast. Conversely, the lookup time, or routing complexity, should be logarithmic; this is proved below.

The figure shows node N1 looking up node N53, which is again quick:

6. Proof of Chord convergence

Convergence is essential for an algorithm: without it, all the thought that went into the design is futile. Before the proof, three points are worth emphasizing:

    • A key is stored on the successor node of the key (which satisfies hash(node) >= hash(key)).
    • Entry i of node n's finger table stores the successor of $n + 2^{i-1}$.
    • Lookup follows the closest-match principle: if the current node does not hold the key, the lookup continues from the node in its finger table that is closest to hash(key).

Note the distinction between the successor of a key and the successor of node n, and keep the closest-match principle in mind.

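Suppose the successor in node n's finger table that is closest to the key has been found. Then the key lies between entry i and entry i+1.

Let entry i be J and entry i+1 be P (each identified here with the start of its entry). Then:

    • J < hash(key)
    • P > hash(key)

and

$J = n + 2^{i-1}$

$P = n + 2^{i}$

The distance from node n to the key therefore lies between the distance from n to J and the distance from n to P, i.e. $J - n < \mathrm{hash}(key) - n < P - n$, which gives:

(1) $2^{i-1} < \mathrm{hash}(key) - n < 2^{i}$

(2) The distance between J and the key is less than the distance between J and P: $\mathrm{hash}(key) - J < P - J = 2^{i-1}$

By (2) the distance between J and the key is less than $2^{i-1}$, and by (1) the distance between n and the key is greater than $2^{i-1}$, so J is closer to the key than n is. Moreover, $\mathrm{hash}(key) - J = (\mathrm{hash}(key) - n) - 2^{i-1}$, and by (1) $2^{i-1}$ is more than half of $\mathrm{hash}(key) - n$, so the distance from J to the key is less than half the distance from n to the key. Each iteration therefore brings the lookup closer to the key, shrinking the distance by at least a factor of 2, just as binary search does.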

Thus the convergence of Chord is proved in theory.
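The halving argument can be turned into a simple bound on the number of hops. The notation $d_k$ is introduced here only for illustration: it stands for the clockwise distance from the current node to hash(key) after k hops, under the same simplification as the proof (each finger entry is identified with the start of its entry).

$$
d_{k+1} < \tfrac{1}{2}\, d_k
\quad\Longrightarrow\quad
d_k < \frac{d_0}{2^{k}} < \frac{2^{m}}{2^{k}} = 2^{\,m-k}
$$

Since $d_0 < 2^m$ and distances are integers, $d_m < 1$ forces $d_m = 0$, so under this simplification a lookup needs at most m hops, which is logarithmic in the size of the identifier space.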

7. A closer look at the Chord algorithm

In fact, the Chord algorithm can be restated entirely as a mathematical problem:

Take random points on the Chord ring as the set of nodes and pick an arbitrary node T. Starting from a random node n, the Chord lookup algorithm can find node T.

Why recast it this way? Because finding the direct predecessor of a key amounts to finding the key itself, so every lookup reduces to finding one node from another node on the Chord ring. Seen this way, the problem becomes quite elegant. If we record the steps of a lookup as a path, then there is a shortest path between any two nodes, and what the Chord algorithm actually does is construct such a path. Could this path fail to exist? No: because Chord is a ring, in the worst case a linear walk still guarantees convergence.

From the shortest-path point of view, Chord is just an improvement on the existing linear path, and following this idea one could design other shortest-path algorithms. By the nature of the algorithm, its convergence and correctness depend on every node correctly maintaining its successors. But in a large P2P network nodes join and leave frequently, and without additional work it is very hard to ensure that every node has the correct successors.

Chord redundancy:

Redundancy refers to useless entries in a Chord finger table. Entries that fall between node N and its direct successor are meaningless, because no node exists at the identifiers they represent. For example, entries 1 to 5 of N1's finger table have no node of their own and all point to N18, so at least entries 1 to 4 are redundant information.

In general, suppose the Chord ring has size $2^m$ and there are $2^n$ nodes, evenly distributed on the ring. Then entry i of any node N's finger table is redundant when $N + 2^{i-1} < N + 2^m / 2^n \Rightarrow 2^{i-1} < 2^{m-n} \Rightarrow i < m - n + 1$. That is, entry i is redundant whenever $i < m - n + 1$.

This condition holds for $i = 1, \dots, m - n$, so the redundancy is $(m - n)/m = 1 - n/m$. In general $m \gg n$, so a Chord finger table holds a great deal of redundant information. If there are 1024 nodes on the network, that is n = 10, the redundancy is $1 - 10/160 \approx 94\%$. For this reason many papers point out this redundancy and argue that it leads to redundant queries that reduce performance. In fact, because the redundant information is spread across the finger tables of many nodes, it has no impact on routing as long as a suitable routing algorithm is used.
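The arithmetic can be checked by counting directly. The sketch below assumes the idealized setting of the text ($2^n$ nodes evenly spaced on a ring of $2^m$ identifiers) and counts the entries that satisfy the strict inequality $2^{i-1} < 2^{m-n}$; the function name is chosen for the example.

```python
def redundant_entries(m: int, n: int) -> int:
    """Count finger entries whose start lies strictly before the direct successor.

    Assumes 2**n nodes evenly spaced on a ring of 2**m identifiers, so the
    direct successor of any node is 2**(m - n) identifiers away.
    """
    gap = 2 ** (m - n)
    return sum(1 for i in range(1, m + 1) if 2 ** (i - 1) < gap)


m, n = 160, 10
count = redundant_entries(m, n)
print(count, "of", m, "entries, ratio", count / m)
```

For m = 160 and n = 10 the count is 150, a ratio of 0.9375, which rounds to the 94% quoted in the text.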

At this point we have fully discussed the Chord algorithm and its core ideas. The next topic is the detailed implementation of Chord.
