The Chord algorithm is one of the four classic peer-to-peer (P2P) lookup algorithms. It was introduced by researchers at MIT in 2001; the other three are CAN, Tapestry, and Pastry.
The purpose of Chord is to provide an algorithm that quickly locates resources in a P2P network. Chord does not care how resources are stored; it addresses resource retrieval purely at the algorithmic level, so the Chord API is simply set and get.
1. What is Chord?
Chord is both an algorithm and a protocol. As an algorithm, its correctness and convergence can be proved rigorously from a mathematical point of view. As a protocol, Chord defines in detail the message type used at each step. One of the main reasons for Chord's popularity is its simplicity: about 3,000 lines of code are enough to implement a complete Chord.
Chord can also be viewed as an implementation of consistent hashing and of a distributed hash table (DHT).
2. Overlay network
An overlay network is a network built on top of another network, whose nodes are connected by virtual or logical links. Cloud computing platforms and distributed systems, for example, are overlay networks: they are built on TCP/IP, and their nodes are connected over it. Chord is also built as an overlay network.
3. Structured and unstructured networks
In an unstructured network there is no relationship among the nodes; all nodes are completely equivalent. An example is Napster, the first-generation P2P network. Such a network has a clear and simple structure, but its lookups are poorly optimized: they usually rely on global or regional flooding, take a long time, and give no guarantee of a result (the query may time out before anything is found).
A structured P2P network is exactly the opposite: the network is assumed to have a logically designed structure. Chord, for example, assumes the network is a ring, while Kademlia assumes it is a binary tree with all nodes as leaves. These logical structures let us bring more algorithms and ideas to bear on resource lookup.
4. Distributed Hash Table (DHT)
The main idea of a DHT is that resources on the network can be accessed as easily and quickly as in a hash table, with put and get. The idea was largely inspired by the first-generation P2P network (Napster). Compared with consistent hashing, a DHT emphasizes access to resources rather than how consistently they are distributed. Like consistent hashing, a DHT is only a concept; the details are left to the implementation.
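To make the put/get idea concrete, here is a minimal single-process sketch in Python. It is not a real DHT (there is no network and no routing); it only shows the interface and the placement rule Chord uses, storing each key on the first node whose ring identifier is at or after the key's. The names `ToyDHT` and `ring_id`, the 16-bit ring, and the `ip:port` node names are hypothetical choices for illustration.

```python
import bisect
import hashlib

M = 16  # hypothetical small identifier space of 2**M points, for readability


def ring_id(name: str, m: int = M) -> int:
    """Map a string onto the ring using SHA-1, truncated to m bits."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


class ToyDHT:
    """Single-process stand-in for a DHT: every key lives on its successor node."""

    def __init__(self, node_names):
        # Sorted node identifiers, plus one dict of stored items per node.
        self.ids = sorted(ring_id(n) for n in node_names)
        self.store = {i: {} for i in self.ids}

    def successor(self, ident: int) -> int:
        # First node id >= ident, wrapping around to the smallest id.
        pos = bisect.bisect_left(self.ids, ident)
        return self.ids[pos % len(self.ids)]

    def put(self, key: str, value) -> None:
        self.store[self.successor(ring_id(key))][key] = value

    def get(self, key: str):
        return self.store[self.successor(ring_id(key))].get(key)
```

A real DHT spreads `store` across machines and must route each `put` and `get` to the responsible node; the rest of this article is about how Chord does that routing efficiently.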
Many current P2P systems can be regarded as concrete DHT implementations. Some representative ones are:
- Chord
- CAN
- Tapestry
- Pastry
- Apache Cassandra
- Kademlia
- P-Grid
- BitTorrent DHT
5. How Chord works
Chord maps nodes and keys into the same space and relies on consistent hashing. To keep hash collisions negligible, Chord uses SHA-1 as its hash function. SHA-1 produces a space of $2^{160}$ values, each a 160-bit (20-byte) integer. Imagine these integers joined end to end to form a ring, called the Chord ring, with the integers arranged clockwise. Nodes (the IP address and port of a machine) and keys (resource identifiers) are both hashed onto the Chord ring, so the whole P2P network can be viewed as a virtual ring; this is why Chord is called a structured P2P network.
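A minimal sketch of the mapping just described, assuming node identifiers are derived from an `ip:port` string and keys from a resource name (both example strings are hypothetical). Python's `hashlib.sha1` returns a 20-byte digest, which is read as a 160-bit integer, i.e. a point on the $2^{160}$ ring; clockwise distance is computed modulo $2^{160}$.

```python
import hashlib

M = 160  # SHA-1 digests are 160 bits, so the ring holds 2**160 identifiers


def chord_id(text: str) -> int:
    """Hash a node address ("ip:port") or a resource key onto the Chord ring."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()  # 20 bytes
    return int.from_bytes(digest, "big")


def clockwise_distance(a: int, b: int, m: int = M) -> int:
    """Distance travelled going clockwise from identifier a to identifier b."""
    return (b - a) % 2 ** m


node = chord_id("192.168.1.10:8000")  # hypothetical node address
key = chord_id("movie.mkv")           # hypothetical resource key
print(f"node={node:040x}\nkey ={key:040x}")
print("clockwise distance node -> key:", clockwise_distance(node, key))
```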
Here are a few definitions:
- Each point on the Chord ring is called an identifier.
- If a node is mapped to an identifier, that identifier is also called a node.
- Going clockwise, the nodes before a node are its predecessors and the nodes after it are its successors; the first predecessor is called the immediate predecessor, and the first successor the immediate successor.
In the original figure, red dots are nodes and blue dots are identifiers; only some of them are shown. Taking node N1 as an example, the successors in its finger table are:
No | Entry start | Successor
--- | --- | ---
1 | N1+2^0 | N18
2 | N1+2^1 | N18
3 | N1+2^2 | N18
4 | N1+2^3 | N18
5 | N1+2^4 | N18
6 | N1+2^5 | N45
7 | N1+2^6 | N1
8 | N1+2^7 | N1
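The table can be reproduced with a short sketch. The figure is not available here, so the ring size ($m = 8$, i.e. 256 identifiers) and the node set {N1, N18, N45, N53} are assumptions, chosen only because they are consistent with the table above; `successor` and `finger_table` are hypothetical helper names.

```python
import bisect


def successor(ident: int, nodes: list[int], m: int) -> int:
    """First node clockwise from ident (inclusive), wrapping past 2**m - 1 to 0."""
    ident %= 2 ** m
    pos = bisect.bisect_left(nodes, ident)  # nodes must be sorted
    return nodes[pos % len(nodes)]


def finger_table(n: int, nodes: list[int], m: int) -> list[tuple[int, int]]:
    """Entry i (1..m) pairs start = (n + 2**(i-1)) mod 2**m with successor(start)."""
    table = []
    for i in range(1, m + 1):
        start = (n + 2 ** (i - 1)) % (2 ** m)
        table.append((start, successor(start, nodes, m)))
    return table


# Hypothetical ring: m = 8 and nodes N1, N18, N45, N53 (sorted).
NODES = [1, 18, 45, 53]
for i, (start, succ) in enumerate(finger_table(1, NODES, 8), start=1):
    print(f"{i} | N1+2^{i - 1} = {start} | N{succ}")
```

Entries 7 and 8 start at 65 and 129; no node lies at or after those points, so the search wraps around the ring back to N1.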
Mapping nodes and keys into the same range can feel like measuring cats and dogs with the same ruler. Strange as it seems, this is what guarantees consistent hashing; see the previous article for details.
The number of nodes on the Chord ring is obviously far smaller than the number of identifiers ($2^{160}$ is an astronomically large number), so nodes are distributed very sparsely on the ring. In theory their distribution should be random, but, as discussed in the previous article on consistent hashing, when there are few nodes the distribution is bound to be uneven, and virtual nodes can be added to improve balance. With many nodes (a large P2P network may have millions of machines), virtual nodes are unnecessary.
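The effect of virtual nodes can be sketched as follows: each physical machine is hashed onto the ring several times under different labels, and the machine owns the union of the arcs ending at its virtual nodes. The `machine#k` labelling scheme, the 32-bit ring and the helper names are illustrative assumptions, not part of Chord itself.

```python
import hashlib
from collections import Counter


def ring_id(name: str, m: int = 32) -> int:
    # SHA-1 truncated to m bits; a small m keeps the numbers readable here.
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big") % (2 ** m)


def virtual_ring(physical: list[str], replicas: int) -> dict[int, str]:
    """Place `replicas` virtual nodes per physical machine; map ring id -> machine."""
    ring = {}
    for machine in physical:
        for k in range(replicas):
            ring[ring_id(f"{machine}#{k}")] = machine  # hypothetical labelling scheme
    return ring


def share_of_ring(ring: dict[int, str], m: int = 32) -> Counter:
    """Fraction of the identifier space each machine is responsible for."""
    ids = sorted(ring)
    share = Counter()
    for prev, cur in zip([ids[-1]] + ids[:-1], ids):
        # cur owns the arc (prev, cur]; the modulo handles the wrap-around arc,
        # and a lone virtual node (prev == cur) owns the whole ring.
        arc = (cur - prev) % (2 ** m) or 2 ** m
        share[ring[cur]] += arc / 2 ** m
    return share
```

Comparing `share_of_ring(virtual_ring(machines, 1))` with `share_of_ring(virtual_ring(machines, 100))` for a handful of machines typically shows the shares moving much closer to an even split as replicas increase; the exact numbers depend on the hash values.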
Clearly, any lookup will succeed if it simply walks along the Chord ring, but this takes $O(N)$ time, where $N$ is the number of nodes. For a P2P network with millions of nodes that frequently join and leave, $O(N)$ is intolerable, so Chord proposes the following non-linear lookup algorithm:
- Each node maintains a finger table of length $m$ ($m$ is the number of identifier bits, 160 in Chord). Entry $i$ of node $n$'s table holds the successor of $(n + 2^{i-1}) \bmod 2^m$, for $1 \le i \le m$.
- Each node also keeps its predecessor and a list of successors, so it can reach its neighbours quickly and periodically check that they are still healthy.
- In other words, the finger entries record successors at power-of-two distances. The modulo is needed because the ring wraps around: past the largest identifier the search continues from the beginning of the ring, so the node after the largest node is defined as the first node.
- A key is stored on the first node, going clockwise along the Chord ring, for which hash(node) >= hash(key) (wrapping around past the top of the ring); we call this node the successor of the key.
- Given a key, the following steps, run at node N, find the node holding the corresponding resource, i.e. the successor of the key:
  - Check whether the hash of the key falls between N and its immediate successor (in the interval (N, successor(N)]); if it does, N's successor is the answer.
  - Otherwise, find the entry in N's finger table that is closest to hash(key) while still preceding it, and forward the query to that node.
  - Repeat the process until the node responsible for the key is found.
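The lookup steps above can be simulated in a single process. This is a sketch under simplifying assumptions: a tiny ring of $2^8$ identifiers, a hypothetical node set, and finger tables computed once from the full node list (a real deployment would build and repair them through the protocol). `lookup` checks whether the key falls in (n, successor(n)] and otherwise jumps to the closest preceding finger, counting one hop per forward; the class and method names are illustrative.

```python
import bisect

M = 8  # hypothetical ring of 2**8 = 256 identifiers


def in_half_open(x: int, a: int, b: int) -> bool:
    """True if x lies on the clockwise arc (a, b]; a == b means the whole ring."""
    if a == b:
        return True
    return 0 < (x - a) % 2 ** M <= (b - a) % 2 ** M


def in_open(x: int, a: int, b: int) -> bool:
    """True if x lies strictly inside the clockwise arc (a, b)."""
    if a == b:
        return x != a
    return 0 < (x - a) % 2 ** M < (b - a) % 2 ** M


class Ring:
    """Static simulation: every node's finger table is built from the full node list."""

    def __init__(self, node_ids):
        self.nodes = sorted(node_ids)
        # fingers[n][i - 1] = successor((n + 2**(i - 1)) mod 2**M), for i = 1..M
        self.fingers = {
            n: [self.successor(n + 2 ** (i - 1)) for i in range(1, M + 1)]
            for n in self.nodes
        }

    def successor(self, ident: int) -> int:
        """First node at or clockwise after ident."""
        pos = bisect.bisect_left(self.nodes, ident % 2 ** M)
        return self.nodes[pos % len(self.nodes)]

    def closest_preceding_finger(self, n: int, key: int) -> int:
        # Scan from the farthest finger back; take the first one strictly inside (n, key).
        for f in reversed(self.fingers[n]):
            if in_open(f, n, key):
                return f
        return n

    def lookup(self, start: int, key: int) -> tuple[int, int]:
        """Route from node `start` to the successor of `key`; return (node, hops)."""
        n, hops = start, 0
        # fingers[n][0] is n's immediate successor.
        while not in_half_open(key, n, self.fingers[n][0]):
            n = self.closest_preceding_finger(n, key)
            hops += 1
        return self.fingers[n][0], hops
```

For instance, `Ring([1, 18, 45, 53, 90, 130, 200]).lookup(1, 53)` returns `(53, 1)`: N1 forwards the query to N45 through its finger table, and 53 lies between N45 and its successor N53.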
Intuitively, lookups should converge exponentially, much like binary search, so convergence should be very fast; equivalently, the lookup time, or routing complexity, should be logarithmic. We prove this below.
The original figure illustrates node N1 looking up node N53, which takes only a few hops.
6. Proof of Chord convergence
Convergence is essential for any algorithm: without a convergence guarantee, much of the reasoning about its behaviour is futile. Before the proof, we recall three points:
- A key is stored on its successor node, which satisfies hash(node) >= hash(key).
- Entry $i$ of node $n$'s finger table stores the successor of $n + 2^{i-1}$.
- Lookup follows the closest-match principle: if the current node does not hold the key, it searches its finger table for the node closest to hash(key) without passing it, and continues the process from there.
Note the distinction between the successor of a key and the successor of node $n$, and keep the closest-match principle in mind.
Measure distances clockwise, $d(a, b) = (b - a) \bmod 2^m$. Suppose the key lies between the starts of entries $i$ and $i+1$ of node $n$'s finger table, and call these two identifiers $j$ and $p$:

$$j = n + 2^{i-1}, \qquad p = n + 2^{i}$$

Since the key lies between $j$ and $p$, its distance from $n$ lies between their distances from $n$:

(1) $2^{i-1} = d(n, j) \le d(n, \mathrm{key}) < d(n, p) = 2^{i}$

(2) The distance from $j$ to the key is less than the distance from $j$ to $p$: $d(j, \mathrm{key}) < d(j, p) = 2^{i-1}$

Combining (1) and (2), $d(j, \mathrm{key}) = d(n, \mathrm{key}) - 2^{i-1} < d(n, \mathrm{key}) / 2$, because $d(n, \mathrm{key}) < 2^{i}$. The node actually stored in entry $i$ is the successor of $j$; when it precedes the key, it lies between $j$ and the key and is therefore even closer. So each hop more than halves the remaining distance to the key: convergence is at least exponential with base 2, just like binary search.

This proves that Chord lookups converge.
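The key inequality can also be checked exhaustively on a small ring. The sketch below, assuming a hypothetical ring of $2^8$ identifiers, takes every pair (n, key), picks the $i$ with $2^{i-1} \le d(n, \mathrm{key}) < 2^{i}$, and confirms that jumping to $j = n + 2^{i-1}$ leaves less than half of the original distance. It checks the identifier $j$; the node stored in entry $i$ is the successor of $j$, so when it precedes the key it is at least as close.

```python
M = 8  # hypothetical small ring of 2**M identifiers, small enough to check exhaustively


def dist(a: int, b: int) -> int:
    """Clockwise distance from a to b on a ring of 2**M identifiers."""
    return (b - a) % 2 ** M


def check_halving() -> bool:
    """Verify d(j, key) < d(n, key) / 2 for every node id n and key with d(n, key) > 0."""
    for n in range(2 ** M):
        for key in range(2 ** M):
            d = dist(n, key)
            if d == 0:
                continue  # the key sits exactly on n; nothing to route
            i = d.bit_length()  # the unique i with 2**(i-1) <= d < 2**i
            j = (n + 2 ** (i - 1)) % 2 ** M
            if not dist(j, key) < d / 2:
                return False
    return True


print(check_halving())  # prints True
```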
7. A deeper look at the Chord algorithm
In fact, the Chord algorithm can be turned entirely into a mathematical problem:
Given the set of nodes on the Chord ring and an arbitrary target node T, show that the Chord lookup algorithm, starting from any node n, finds T.
Why this reformulation? Finding the key's immediate predecessor is as good as finding the key (the predecessor's successor holds it), so every lookup reduces to finding a node on the Chord ring. Seen this way, the problem becomes quite elegant: if we record the hops of a lookup as a path, the question becomes whether a short path exists between any two nodes, and the Chord algorithm in effect constructs such a path. Could such a path fail to exist? No: Chord is a ring, so in the worst case a linear walk along successors still reaches the target, which guarantees convergence.
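The worst-case fallback mentioned above, walking successor pointers only, is easy to sketch. Assumptions: a static, hypothetical node list and a ring of $2^m$ identifiers; `linear_lookup` is an illustrative name. It always terminates because each step moves one node clockwise and the ring is finite, but it may take up to $N-1$ hops, which is exactly the $O(N)$ cost the finger table is meant to avoid.

```python
def linear_lookup(nodes: list[int], start: int, key: int, m: int) -> tuple[int, int]:
    """Walk successor pointers only; return (node responsible for key, hops)."""
    ring = sorted(nodes)
    size = 2 ** m
    idx = ring.index(start)
    hops = 0
    while True:
        cur = ring[idx]
        nxt = ring[(idx + 1) % len(ring)]
        # key in (cur, nxt] means nxt is the key's successor; a lone node owns everything.
        if cur == nxt or 0 < (key - cur) % size <= (nxt - cur) % size:
            return nxt, hops
        idx = (idx + 1) % len(ring)
        hops += 1
```

On the node list [1, 18, 45, 53, 90, 130, 200] with $m = 8$, finding key 53 from N1 takes two hops (N1 to N18 to N45), one more than the finger-table route through N45.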
From this path point of view, Chord merely improves on the existing linear path, and following the same idea one could design other short-path algorithms. Algorithmically, the premise for convergence and correctness is that every node maintains its successor correctly. In a large P2P network, however, nodes join and leave frequently, and without additional work it is hard to guarantee that every node has the correct successor.
Redundancy in Chord:
Redundancy refers to useless entries in Chord's finger tables: entries whose start falls between node N and its immediate successor carry no new information, because no nodes exist at the identifiers they designate. For example, entries 1 through 5 of N1's finger table designate identifiers where no node exists, so all five point to N18; only one of them is needed, and at least four are redundant.
In general, suppose the Chord ring has $2^m$ identifiers and $2^n$ nodes distributed evenly, so neighbouring nodes are $2^m / 2^n = 2^{m-n}$ apart. Entry $i$ of any node $N$'s finger table points to the immediate successor, and is therefore redundant, when

$$N + 2^{i-1} \le N + \frac{2^m}{2^n} \;\Longrightarrow\; 2^{i-1} \le 2^{m-n} \;\Longrightarrow\; i \le m-n+1.$$
The redundancy is therefore $(m-n+1)/m = 1-(n-1)/m$, and in general $m \gg n$, so Chord carries a great deal of redundant information. With 1,024 nodes on the network, i.e. $n = 10$, the redundancy is $1-(10-1)/160 \approx 94\%$. Many papers point this out and argue that it causes redundant queries that degrade performance. In fact, because this redundant information is spread across the finger tables of many nodes, it has no effect on routing if a suitable routing algorithm is used.
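The redundancy formula can be checked numerically on the idealised ring it assumes. This sketch, with a hypothetical function name, places $2^n$ nodes exactly $2^{m-n}$ apart, computes node 0's finger table, and returns the fraction of entries that point to the immediate successor; for $1 \le n < m$ the result should equal $(m-n+1)/m$.

```python
def redundancy(m: int, n: int) -> float:
    """Fraction of node 0's finger entries that point to its immediate successor,
    on an idealised ring of 2**m ids with 2**n evenly spaced nodes (1 <= n < m)."""
    gap = 2 ** (m - n)  # spacing between neighbouring nodes
    succ = gap          # immediate successor of node 0
    same = 0
    for i in range(1, m + 1):
        start = 2 ** (i - 1)
        # successor(start) = first multiple of gap that is >= start (ceiling division)
        finger = (-(-start // gap) * gap) % 2 ** m
        same += finger == succ
    return same / m
```

`redundancy(160, 10)` gives 151/160, about 94%, matching the figure in the text.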
This completes our discussion of the Chord algorithm and its core ideas; the next step is to discuss how Chord is implemented.
http://blog.csdn.net/chen77716/article/details/6059575
Chord algorithm (principle)