Existing peer-to-peer implementations can be divided into three types: peer-to-peer based on directory servers, unstructured peer-to-peer, and structured peer-to-peer.

Systems based on a directory server set up a central server that stores the address information of user nodes and descriptions of the files they share. The files themselves are stored, dispersed, on the individual nodes, and actual file transfers take place directly between peers; the directory server acts only as an intermediary, providing a publish-and-query index service for the nodes. Since a centralized directory server can become a bottleneck of the system and may also give rise to legal disputes, unstructured peer-to-peer systems, represented by Gnutella, emerged. In these systems the file index information is no longer stored and managed by a centralized directory server but is distributed across the network and kept by the nodes themselves. Such systems use a distributed index-lookup strategy: in order to find files in the network, each node randomly maintains some other nodes as its neighbors and broadcasts query messages through them. Because an unstructured peer-to-peer system has no directory server, it has neither a single bottleneck nor a single point of failure. Its disadvantages, however, are obvious: broadcasting query messages adds a heavy load to network communication, and the query mechanism does not scale as the system grows. In addition, because a query message is limited to a certain range, there is no guarantee that data present in the network will actually be found.

The two types of peer-to-peer systems described above lack an effective and extensible index-lookup mechanism. In recent years, therefore, many research groups have worked on the design of scalable lookup mechanisms and have proposed the distributed hash table (DHT), which is used to build structured peer-to-peer systems such as Chord, Pastry, CAN and Tapestry. The main idea of a DHT is as follows. First, each file index entry is represented as a (K, V) pair, where K, called the keyword, can be a hash of the file name (or of some other description of the file), and V is the IP address of the node that actually stores the file (or other descriptive information about that node). All file index entries (that is, all (K, V) pairs) together form one large file-index hash table; given the K value of a target file, this table yields the addresses of all nodes storing that file. The large table is then divided into many small local blocks, and these blocks are distributed, according to specific rules, to all participating nodes, so that each node is responsible for maintaining one of them. When a node queries a file, it only needs to route the query message to the appropriate node (the one whose hash-table block contains the (K, V) pairs for that file). A very important problem here is the rule by which nodes split the whole hash table and determine which neighbor nodes to maintain, so that routing can proceed smoothly. This rule varies from system to system; CAN, Chord, Pastry and Tapestry each have their own rules and correspondingly different characteristics.
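To make the (K, V) idea concrete, the following is a minimal, hedged sketch (not the scheme of any particular system): K is a hash of the file description, V is the address of the node holding the file, and the "one big table" is cut into per-node blocks by some partition rule. The file names, IP addresses and the toy partition rule are invented for illustration.

```python
# Illustration of the DHT idea: the file index is a set of (K, V) pairs,
# where K = hash(file description) and V = address of the node storing the file.
import hashlib

def make_index_entry(file_name: str, holder_ip: str) -> tuple:
    """Build one (K, V) pair of the global file index."""
    k = int.from_bytes(hashlib.sha1(file_name.encode()).digest(), "big")
    return (k, holder_ip)

# The conceptual "one big hash table" ...
global_index = dict([make_index_entry("song.mp3", "10.0.0.7"),
                     make_index_entry("movie.avi", "10.0.0.9")])

# ... is then cut into local blocks, each maintained by one participating node.
# Which node gets which block is exactly the rule that distinguishes Chord,
# Pastry, CAN and Tapestry from one another.
def block_for(node_count: int, k: int) -> int:
    return k % node_count        # toy partition rule, for illustration only

print({k: block_for(4, k) for k in global_index})
```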
Distributed lookup and routing algorithms based on the distributed hash table (DHT) have become a hot topic in the research and application of structured peer-to-peer networks because of their deterministic lookup, simplicity, and distributed nature. Since 2002, the United States National Science Foundation (NSF) has provided 12 million US dollars to launch a five-year research project, IRIS, which brings together the research strength of five prestigious universities, including MIT and UC Berkeley, to develop a new DHT-based infrastructure for the next generation of large-scale distributed applications. A distributed hash table shows good robustness in the presence of node failures, attacks, and sudden high load. It scales well, achieving large system sizes at low cost; it configures itself, automatically merging newly joining nodes into the system without manual intervention; and it provides a simple and flexible interface that can serve multiple peer-to-peer applications at the same time.

1 Chord
Chord is a distributed lookup algorithm jointly proposed by UC Berkeley and MIT for finding data in peer-to-peer networks. Given a keyword, Chord efficiently maps that keyword to a node in the network. Thus, in a peer-to-peer network, as long as each data item V is given a keyword K, Chord can be used to store or retrieve the corresponding (K, V) pair on the node to which the keyword maps. A salient feature of Chord is its simplicity and scalability: the communication cost of a query and the state each node maintains grow only logarithmically with the total number of nodes in the system. Chord's routing performance is better than CAN's, and its node-join procedure and maintenance cost are lower than those of Tapestry and Pastry.
Chord design: Every keyword and node in Chord has a unique m-bit identifier. The keyword identifier K is obtained by hashing the keyword itself, and the node identifier N is obtained by hashing the node's IP address; SHA-1 can be used as the hash function. All nodes are arranged clockwise on a logical identifier ring (called the Chord ring) in increasing order of their node identifiers (taken modulo 2^m). Chord's mapping rule is that a (K, V) pair with keyword identifier K is stored on the node whose identifier equals K or follows K most closely on the Chord ring; this node is called the successor node of K, written successor(K). Since identifiers are m-bit binary numbers arranged on a circle from 0 to 2^m - 1, successor(K) is the first node encountered going clockwise from K.
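A minimal sketch of the successor(K) rule follows, using an m = 6 ring; the ten node identifiers are assumed from the figure and examples discussed below.

```python
# Sketch of the Chord mapping rule successor(K) on an m = 6 ring
# (node identifiers assumed from the example figure below).
M = 6
RING = 2 ** M
nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]    # sorted node identifiers

def successor(k: int) -> int:
    """First node met going clockwise from k (identifiers taken mod 2**m)."""
    k %= RING
    for n in nodes:
        if n >= k:
            return n
    return nodes[0]                                # wrap past 2**m - 1 back to the start

assert successor(10) == 14        # keyword 10 is stored on node 14
assert successor(24) == 32        # keywords 24 and 30 go to node 32
assert successor(54) == 56        # keyword 54 goes to node 56
```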
The figure above shows a Chord ring with m = 6, on which 10 nodes are placed and 5 keywords are stored; node identifiers are prefixed with N and keyword identifiers with K. Because successor(10) = 14, keyword 10 is stored on node 14. Similarly, keywords 24 and 30 are stored on node 32, keyword 38 on node 38, and keyword 54 on node 56. The mapping rule above continues to hold as nodes join and leave the network. To this end, when a node n joins the network, some of the keywords previously assigned to n's successor are reassigned to n; when node n leaves the network, all keywords assigned to it are reassigned to its successor. No other changes occur in the network. For example, when a node with identifier 26 joins, the keyword 24, previously stored on node 32, is transferred to the new node. Clearly, in order to forward query messages through the system, each node must know and maintain the identifiers and IP addresses of its neighboring nodes on the Chord ring, and it uses this information to form its own routing table. With this table, Chord can find a route between any two points on the ring.
Chord routing: Each node in Chord can complete a simple query by maintaining only the identifier and IP address of its successor on the ring. A query message for a particular keyword is passed around the ring via successor pointers until it reaches a node such that the keyword's identifier falls between that node's identifier and its successor's identifier; that successor is the node storing the target (K, V) pair.
The figure above shows node 8 looking up keyword 54: the request is passed from successor to successor until it reaches node 56, which stores keyword 54. With this simple scheme each node maintains very little state, but the query is slow: if there are N nodes in the network, the cost of a query is O(N). When the network is large, such speed is not acceptable.
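As a hedged sketch of this simple scheme, the following walks the ring via successor pointers only; the successor pointers are assumed from the same example ring as above.

```python
# Simple Chord lookup that only follows successor pointers (O(N) hops).
M = 6
RING = 2 ** M

def in_interval(x: int, a: int, b: int) -> bool:
    """True if x lies in the half-open ring interval (a, b]."""
    a, b, x = a % RING, b % RING, x % RING
    return (a < x <= b) if a < b else (x > a or x <= b)

successor_of = {1: 8, 8: 14, 14: 21, 21: 32, 32: 38,
                38: 42, 42: 48, 48: 51, 51: 56, 56: 1}

def simple_lookup(start: int, key: int) -> int:
    node = start
    while not in_interval(key, node, successor_of[node]):
        node = successor_of[node]         # just walk clockwise around the ring
    return successor_of[node]             # this successor stores the (K, V) pair

print(simple_lookup(8, 54))               # -> 56, after walking most of the ring
```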
To speed up queries, Chord uses an extended lookup algorithm. For this, each node maintains a routing table called a finger table (pointer table). If keyword and node identifiers are m bits long, the finger table contains at most m entries. The i-th entry in the finger table of node n is the first node on the ring whose identifier is greater than or equal to n + 2^(i-1) (the comparison is taken modulo 2^m). That is, if s = successor(n + 2^(i-1)), 1 ≤ i ≤ m, then node s is the i-th finger of node n, written n.finger[i]; n.finger[1] is the successor of node n. Each entry in the finger table contains both the identifier of the corresponding node and that node's IP address (and port number).
The figure above shows the finger table of node 8. For example, node 14 is the first node on the ring that follows (8 + 2^0) mod 2^6 = 9, so the first finger of node 8 is node 14; likewise, node 42 is the first node that follows (8 + 2^5) mod 2^6 = 40, so the sixth finger of node 8 is node 42. Maintaining a finger table means each node only needs to know about a small subset of the nodes in the network, and it knows more about the nodes close to it on the ring. However, for an arbitrary keyword K, a node usually cannot determine the successor of K from its own finger table alone. For example, node 8 in the figure below cannot determine the successor of keyword 34, because the successor of 34 on the ring is node 38, which does not appear in node 8's finger table. The extended lookup process works as follows: when a node receives a query for keyword K, it first checks whether K falls between its own identifier and that of its successor; if so, the successor is the node storing the target (K, V) pair. Otherwise, the node searches its finger table for the entry whose node identifier is largest while still not exceeding K, and forwards the query to that node. Repeating this process eventually reaches the successor of K, the node that stores the target (K, V) pair.
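The extended lookup can be sketched as follows, again assuming the example ring from the figures; this is an illustrative reconstruction of the finger-table rule, not the full Chord protocol.

```python
# Extended Chord lookup with a finger table (m = 6, example ring assumed).
M = 6
RING = 2 ** M
nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]

def successor(k):
    k %= RING
    return next((n for n in nodes if n >= k), nodes[0])

def finger_table(n):
    """finger[i] = successor(n + 2**(i-1)), for i = 1 .. m."""
    return [successor(n + 2 ** (i - 1)) for i in range(1, M + 1)]

def in_interval(x, a, b):                  # half-open ring interval (a, b]
    a, b, x = a % RING, b % RING, x % RING
    return (a < x <= b) if a < b else (x > a or x <= b)

def lookup(start, key):
    node = start
    while not in_interval(key, node, finger_table(node)[0]):
        fingers = finger_table(node)
        # pick the farthest finger that still precedes the key on the ring
        node = next((f for f in reversed(fingers)
                     if in_interval(f, node, key - 1)), fingers[0])
    return finger_table(node)[0]           # the successor stores the target (K, V)

print(finger_table(8))                     # [14, 14, 14, 21, 32, 42]
print(lookup(8, 54))                       # -> 56, reached in a few hops
```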
Node join and exit: To respond to changes in the system, each node periodically runs a stabilization protocol to detect newly joined or failed nodes and to update its own finger table and successor pointer. When a new node n joins, it initializes its finger table through a node already in the system; that is, the new node asks a known node to look up each entry of its finger table on its behalf. After the other nodes have run the stabilization protocol, the new node n is reflected in the finger tables and successor pointers of the relevant nodes. At this point, node n becomes the successor of some of the keywords in the system, so the previous successor transfers those keywords to the new node. When node n fails, every node whose finger table contains n must replace that entry with n's successor. To ensure that the failure of node n does not interrupt queries in progress, each Chord node maintains a successor list containing its r nearest successors on the ring. If a node notices that its successor has failed, it replaces it with the first live node in its successor list.

2 Pastry

Pastry is a distributed lookup and routing system for wide-area peer-to-peer applications, proposed jointly by Microsoft Research and Rice University. Each node in a Pastry system has a unique node number (NodeId), and each message carries a keyword. Pastry routes a message to the node whose NodeId is numerically closest to the keyword. Each Pastry node maintains information about the nodes adjacent to it in the NodeId space. When a new node joins, an existing node fails, or a node recovers, Pastry notifies the upper-layer application. Pastry is fully distributed, scalable, and self-organizing, and it automatically adapts to node joins, departures, and failures.
Pastry design: Pastry is a self-organizing overlay network in which each node is assigned a 128-bit NodeId. The NodeId identifies the node's position in a circular NodeId space (from 0 to 2^128 - 1). It is assigned randomly when the node joins the system, with the result that NodeIds are uniformly distributed over the 128-bit NodeId space. A NodeId can be computed as the hash of the node's public key or IP address. Assuming the network contains N nodes, Pastry can route a given keyword to the node whose NodeId is numerically closest to it. Even when node failures occur concurrently, Pastry guarantees delivery of the keyword to the target node unless |L|/2 nodes with NodeIds adjacent to the keyword fail simultaneously (|L| is a configuration parameter, typically 16 or 32). For routing purposes, Pastry treats NodeIds and keywords as sequences of digits in base 2^b, and a query message is routed to the node whose NodeId is numerically closest to the keyword. At each step, a node forwards the query to a node whose NodeId shares a prefix with the keyword that is at least one digit (i.e., b bits) longer than the prefix shared by the current node's NodeId. If no such node is known, the message is forwarded to a node whose shared prefix is of the same length but whose NodeId is numerically closer to the keyword. To support this, each Pastry node maintains three data structures: a routing table, a neighbor node set, and a leaf node set.
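The digit-wise view of NodeIds can be sketched as follows, assuming b = 2 (so base-4 digits, matching the figure below); the key used in the example is hypothetical.

```python
# Sketch of how Pastry treats NodeIds and keys as digit strings in base 2**b.
B = 2                                   # bits per digit
BASE = 2 ** B

def to_digits(value: int, num_digits: int = 8) -> list:
    """Write an identifier as num_digits base-2**b digits, most significant first."""
    digits = []
    for _ in range(num_digits):
        digits.append(value % BASE)
        value //= BASE
    return digits[::-1]

def shared_prefix_len(a: list, b: list) -> int:
    """Number of leading digits the two identifiers have in common."""
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n

node = to_digits(int("10233102", 4))    # the NodeId 10233102 from the figure below
key  = to_digits(int("10323302", 4))    # a hypothetical key
print(shared_prefix_len(node, key))     # -> 2: the next hop must share >= 3 digits
```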
Sketch of the state maintained by a Pastry node with NodeId 10233102
The figure above sketches the data maintained by one node, with b = 2, so all identifiers are written in base 4. The top row of the routing table is row 0. The shaded entry in each row of the routing table corresponds to the matching digit of the current node's NodeId. Each NodeId in the routing table is shown in the format "common prefix + next digit + remaining digits of the NodeId"; the IP addresses of the listed nodes are omitted from the figure. The routing table contains 2^b - 1 entries per row. Each of the 2^b - 1 entries in row n holds a NodeId whose first n digits match the first n digits of the current node's NodeId but whose (n+1)-th digit differs from the current node's. The value of b is a trade-off between the size of the routing table and the maximum number of routing hops needed between any two nodes. For example, with b = 4 and 10^6 nodes in the network, a node's routing table holds about 75 entries on average and the expected number of routing hops is 5; with 10^9 nodes, the routing table grows to about 105 entries on average and the expected hop count rises to 7. The leaf node set holds the nodes whose NodeIds are numerically closest to this node, half of them with NodeIds larger than the current node's and the other half with NodeIds smaller; the leaf node set is used during routing. The neighbor node set holds the nodes closest to this node according to a given proximity metric; it is not used in normal routing, and its main role is to preserve locality in routing. Typically, each of these two sets has 2^b or 2 × 2^b entries.

The routing procedure of Pastry is as follows. When a node receives a query message, it first checks whether the keyword in the message falls within the range of its leaf node set. If so, the message is forwarded directly to the corresponding node, namely the node in the leaf node set whose NodeId is closest to the keyword. If the keyword does not fall within the leaf node set, the node forwards the message to a node in its routing table whose NodeId shares with the keyword a prefix at least one digit longer than the current node's. If the corresponding routing-table entry is empty, or the node in that entry is unreachable, the query is forwarded to a node whose shared prefix is just as long but whose NodeId is numerically closer to the keyword. Such a node must exist in the leaf node set unless the message has already arrived at the destination node; moreover, unless half of the nodes in the leaf node set fail simultaneously, a node satisfying the requirement can always be found. Each routing step brings the message numerically closer to the target node than the previous step, so the routing process always converges.
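The decision at one routing step can be sketched as below; the leaf set, routing-table entry and key are toy values invented for illustration, and liveness checks are omitted.

```python
# One Pastry routing step: check the leaf set first, otherwise use the routing
# table row selected by the shared prefix length. NodeIds are base-4 digit strings.
BASE = 4

def prefix_len(a: str, b: str) -> int:
    """Length of the common prefix of two identifiers, in digits."""
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n

def route_next_hop(current, key, leaf_set, routing_table):
    """Return the next hop for `key`, or None if the current node is the destination."""
    ids = sorted(leaf_set + [current], key=lambda n: int(n, BASE))
    # 1) Key within the leaf-set range: deliver to the numerically closest node.
    if int(ids[0], BASE) <= int(key, BASE) <= int(ids[-1], BASE):
        best = min(ids, key=lambda n: abs(int(n, BASE) - int(key, BASE)))
        return None if best == current else best
    # 2) Otherwise use the routing-table entry that extends the shared prefix by one digit.
    p = prefix_len(current, key)
    entry = routing_table.get((p, key[p]))
    if entry is not None:
        return entry
    # 3) Fall back to any known node with an equally long prefix but numerically closer.
    known = leaf_set + list(routing_table.values())
    return min((n for n in known if prefix_len(n, key) >= p),
               key=lambda n: abs(int(n, BASE) - int(key, BASE)), default=None)

# Hypothetical state for node 10233102: routing-table row 1, digit '2'.
table = {(1, "2"): "12300231"}
print(route_next_hop("10233102", "12301123",
                     leaf_set=["10233033", "10233120"], routing_table=table))
# -> "12300231": its NodeId shares a longer prefix with the key than the current node
```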
Node join and exit: When a new node joins, it must initialize its own state tables and inform other nodes that it has joined the system. Suppose the NodeId of the new node is X, and that before joining, X knows a node A already in the Pastry that is close to it according to the proximity metric. Node X first asks A to route a "join" message whose keyword is X. This message eventually arrives at the node Z whose NodeId is numerically closest to X. In response, node A, node Z, and all nodes on the path from A to Z send their state tables to node X. Node X uses this information to initialize its own state tables and then notifies other nodes that it has joined the system. Measured by the number of messages exchanged, the complexity of the join operation is O(log_(2^b) N). Nodes in Pastry may fail or leave the system abruptly. A node is considered failed when the nodes adjacent to it in the NodeId space can no longer communicate with it. Once a node detects that a node in its leaf node set L has failed, it asks the live node with the largest or smallest NodeId in the set to send over its own leaf node set L' (if the failed node's NodeId is larger than the current node's NodeId, it asks the node with the largest NodeId in the leaf set; otherwise, the one with the smallest). The current node then selects from L' a live node not already in L to replace the failed one. If a node detects that a node in its routing table has failed, it asks another node listed in the same row of the routing table to send the entry at the corresponding position of its own routing table. If no live node remains in that row, the current node turns to a node in the next row of the routing table, and the process continues until the current node obtains a replacement or has traversed the whole routing table. Nodes also periodically exchange messages with the nodes in their neighbor node set to check whether they are still in the Pastry system. If a node detects that a member of its neighbor node set has failed, it asks the other members to send their neighbor node sets and selects a new node to replace the failed one.

3 CAN

CAN (Content Addressable Network), proposed at UC Berkeley, provides an efficient mapping between file indexes and storage locations without any form of central control point. Each node maintains only a small amount of control state, and the amount of state is independent of the number of nodes in the system. CAN has a completely self-organizing, distributed structure, and it offers good scalability and fault tolerance.

CAN design: CAN is built on a virtual d-dimensional Cartesian coordinate space. This coordinate space is entirely logical and bears no relation to any physical coordinate system. At any time, the entire coordinate space is dynamically partitioned among all nodes in the system, with each node responsible for maintaining a separate, disjoint zone. The nodes of a CAN self-organize into an overlay network that represents the virtual coordinate space. Each node learns and maintains the IP addresses of the nodes responsible for adjacent zones and uses this neighbor information to form its own coordinate routing table. With this table, CAN can route between any two points in the coordinate space.
The following figure shows a two-dimensional [0, 1] x [0, 1] Cartesian coordinate space divided into five node zones. The virtual coordinate space stores (K, V) pairs as follows. To store (K1, V1), a uniform hash function maps the keyword K1 to a point P in the coordinate space, and the value is then stored at the node responsible for the zone containing P. To look up the value for keyword K1, any node can apply the same hash function to find the point P corresponding to K1 and then retrieve the value V1 from the node responsible for that point. If that node is not the node that issued the query, CAN forwards the query to the node in whose zone P lies. An efficient routing mechanism is therefore a key issue in CAN.
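A hedged sketch of the key-to-point mapping follows, with d = 2; the choice of one SHA-1 hash per coordinate, the four-zone layout and the node names are assumptions for illustration (the figure's five-node layout is not reproduced).

```python
# Sketch of how CAN maps a keyword to a point in the 2-d space and finds its owner.
import hashlib

def to_unit(text: str, salt: str) -> float:
    """Hash text into [0, 1) for one coordinate axis."""
    h = hashlib.sha1((salt + text).encode()).digest()
    return int.from_bytes(h[:8], "big") / 2 ** 64

def can_point(key: str):
    return (to_unit(key, "x"), to_unit(key, "y"))

# Zones are axis-aligned rectangles: ((x_lo, x_hi), (y_lo, y_hi)) -> owner node.
zones = {((0.0, 0.5), (0.0, 0.5)): "A", ((0.5, 1.0), (0.0, 0.5)): "B",
         ((0.0, 0.5), (0.5, 1.0)): "C", ((0.5, 1.0), (0.5, 1.0)): "D"}

def owner(point):
    x, y = point
    for ((xl, xh), (yl, yh)), node in zones.items():
        if xl <= x < xh and yl <= y < yh:
            return node

p = can_point("movie.avi")
print(p, owner(p))            # the (K1, V1) pair is stored on this node's zone
```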
CAN routing in a virtual coordinate plane maintained by five nodes
Routing in CAN is simple: a message follows a path through the coordinate space from the requesting point to the destination point. To this end, each CAN node keeps a coordinate routing table containing the IP addresses of its neighbor nodes and the virtual coordinate zones they maintain. Two nodes are neighbors when, in the d-dimensional coordinate space, their zones overlap along d-1 of the dimensions and abut along the remaining one. For example, in the figure above, D and E are neighbors, while D and B are not, because the zones of D and B merely abut on both the x-axis and the y-axis instead of overlapping along one of them. Each CAN message carries the coordinates of its destination point, and routing consists of forwarding the message to the neighbor node that lies closer to the destination. The following figure gives a simple example of the lookup process.
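Alongside the figure, the greedy forwarding rule can be sketched as follows; the zone boundaries and node names are illustrative, and the "distance to zone centre" criterion is one simple way to realise "the neighbor closer to the destination".

```python
# CAN greedy forwarding: each hop picks the neighbor whose zone centre
# is closest to the destination point (zones and names are toy values).
def centre(zone):
    (xl, xh), (yl, yh) = zone
    return ((xl + xh) / 2, (yl + yh) / 2)

def dist2(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def contains(zone, p):
    (xl, xh), (yl, yh) = zone
    return xl <= p[0] < xh and yl <= p[1] < yh

def next_hop(my_zone, neighbours, target):
    """neighbours: dict mapping node name -> its zone. Returns the next node name."""
    if contains(my_zone, target):
        return None                            # this node owns the target point
    return min(neighbours, key=lambda n: dist2(centre(neighbours[n]), target))

me = ((0.0, 0.5), (0.0, 0.5))
nbrs = {"B": ((0.5, 1.0), (0.0, 0.5)), "C": ((0.0, 0.5), (0.5, 1.0))}
print(next_hop(me, nbrs, (0.9, 0.1)))          # -> "B", the neighbour nearer the target
```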
If a d-dimensional space is partitioned into n equal zones, the average routing path length is (d/4)(n^(1/d)) hops, and each node needs to maintain information about only 2d neighbor nodes. This shows that CAN scales well: the state maintained by each node stays constant as the number of nodes grows, while the path length grows only as O(n^(1/d)). Because many different paths exist between any two points in the coordinate space, the failure of a single node has little effect on CAN; when a failed node is encountered, CAN automatically routes around it along another path.
Node join and exit: Because the entire CAN space is allocated among the nodes currently in the system, a new node joining the network must obtain its own coordinate zone. CAN does this by splitting the zone of an existing node: the existing node's zone is divided into two halves of equal size, one of which it keeps while the other is handed to the newly joined node (a sketch of this split follows the steps below). The whole process consists of three steps:
1. The new node first finds a node that is already in the CAN.
2. Using the CAN routing mechanism, the new node locates the node whose zone will be split.
3. The split is performed, and the neighbors of the original zone are notified of the split so that routing can reach the new node.
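The split in step 3 can be sketched as below for the two-dimensional case; alternating the split axis by depth is an assumption made here for illustration, not a rule stated above.

```python
# Splitting an existing zone in half: the existing node keeps one half and
# hands the other half to the newly joined node (2-d case, toy convention).
def split_zone(zone, depth):
    """zone = ((x_lo, x_hi), (y_lo, y_hi)); split along x on even depths, y on odd."""
    (xl, xh), (yl, yh) = zone
    if depth % 2 == 0:
        mid = (xl + xh) / 2
        return ((xl, mid), (yl, yh)), ((mid, xh), (yl, yh))
    mid = (yl + yh) / 2
    return ((xl, xh), (yl, mid)), ((xl, xh), (mid, yh))

old_zone = ((0.0, 0.5), (0.0, 1.0))
kept, handed_over = split_zone(old_zone, depth=0)
print(kept, handed_over)   # the (K, V) pairs hashed into handed_over move to the new node
```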
When a node leaves the CAN, it must ensure that its zone is taken over by the remaining nodes, that is, reassigned to nodes still in the system. Normally a neighbor node takes over the zone together with all its index data ((K, V) pairs). If the zone of some neighbor can be merged with the departing node's zone into a single larger zone, that neighbor performs the merge; otherwise, the zone is handed to the neighbor whose own zone is smallest, and that node temporarily manages two zones. Under normal operation, each node periodically sends update messages to all its neighbors, containing its own zone boundaries, its list of neighbors, and the zone boundaries of those neighbors. If no update is received from a neighbor for an extended period, the node considers that neighbor failed. It then starts the takeover mechanism and sets a timer. Each neighbor of the failed node runs this procedure independently, with its timer proportional to the size of the zone it is responsible for. When the timer expires, the node sends a takeover message, containing the size of its own zone, to all neighbors of the failed node. When a node receives a takeover message, it cancels its own takeover if its zone is larger than that of the sender; otherwise it replies with its own takeover message. This mechanism effectively selects the neighbor with the smallest zone to take over the failed node's zone. In special cases several adjacent nodes may fail at the same time; for example, a node may detect a failure while being unable to reach more than half of the failed node's neighbors. Letting that node take over the failed zone immediately could leave the CAN in an inconsistent state, so in this situation CAN first searches the nodes surrounding the failed zone, widening the search until enough neighbor state has been gathered to begin the takeover safely.

4 Tapestry

Tapestry is a peer-to-peer network location and routing algorithm proposed at UC Berkeley. It routes messages independently of physical location and delivers a query message to the closest node that stores a copy of the target object. Tapestry is self-organizing, fault-tolerant, and load-balancing. Each Tapestry node needs to maintain only O(log N) routing-table entries, and routing completes in at most O(log N) hops.

Tapestry design: Tapestry randomly assigns each node a node identifier NodeId from an identifier space, and each object is assigned a globally unique identifier (GUID) from the same space. Tapestry uses SHA-1 to produce identifiers, so NodeIds and GUIDs are evenly distributed over the identifier space. For convenience of discussion, N_id denotes the identifier of node N and O_G denotes the identifier of object O. Tapestry currently uses a 160-bit identifier space; identifiers are written in a globally agreed base (for example, in base 16 an identifier is a string of 40 hexadecimal digits), and all nodes self-organize into an overlay network based on these identifiers.
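The identifier scheme just described can be sketched in a few lines; the node address and object name below are hypothetical.

```python
# Tapestry-style identifiers: SHA-1 gives 160 bits, written here as 40 hexadecimal
# digits, so NodeIds and object GUIDs share one identifier space.
import hashlib

def tapestry_id(name: str) -> str:
    return hashlib.sha1(name.encode()).hexdigest()      # 160 bits = 40 hex digits

node_id = tapestry_id("node:192.0.2.7")                 # hypothetical node address
object_guid = tapestry_id("object:report.pdf")          # hypothetical object name
print(len(node_id), node_id[:8], object_guid[:8])
```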
Tapestry dynamically maps each identifier G to a node in the current system, called the root node of G and written G_R. If a node's identifier N_id equals G, that node is the root node of G. To forward query messages, each node maintains a neighbor map whose entries each contain the identifier and IP address of one neighbor node. When a message is routed toward G_R, it is forwarded along neighbor pointers to nodes whose identifiers are progressively closer to G in the identifier space (that is, that match G in a longer prefix).
Every Tapestry node keeps such a neighbor map, which is used to pass a message forward digit by digit toward its destination, for example 4*** => 42** => 42A* => destination node 42AD (where * denotes a wildcard). The approach resembles longest-prefix matching in IP packet forwarding. The neighbor map of node N is organized into levels, each level containing a number of entries equal to the base of the identifier notation; the neighbor identifiers at each level share one more prefix digit with this node's identifier than those at the previous level. In other words, the i-th entry at level j is the neighbor node closest to the current node whose identifier begins with prefix(N, j-1) + "i". For example, the 9th entry at level 4 of the neighbor map of node 325AE is the node in the system closest to 325AE whose identifier begins with 325 + "9" = 3259.
The figure above shows a neighbor-pointer example for one node: the level-1 neighbor identifiers share no common prefix with this node's identifier, while the level-2 neighbor identifiers begin with 4, matching the first digit of this node's identifier.
The basic lookup and routing mechanism of Tapestry works as follows: when a lookup message reaches the n-th node along its path, the length of the prefix that node shares with the destination identifier is at least n. To forward the message, the node consults level n+1 of its neighbor map and picks the neighbor matching the next digit of the destination identifier. This is repeated at each node until the destination is reached. The method guarantees that a message reaches its destination within at most log_B N hops, where N is the size of the identifier namespace and B is the base of the identifiers. Likewise, since each level of a node's neighbor map needs only B entries, the neighbor map occupies space B·log_B N.
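The per-hop rule can be sketched as below; the neighbor-map entries for node 5230 are invented for illustration, and only the lookup for the next hop is shown.

```python
# Tapestry forwarding rule: a node sharing n prefix digits with the target uses
# level n+1 of its neighbour map, indexed by the target's next digit.
def next_hop(current_id, target_id, neighbour_map):
    """neighbour_map[(level, digit)] -> NodeId of the closest matching neighbour."""
    n = 0
    while n < len(current_id) and current_id[n] == target_id[n]:
        n += 1                                   # length of the shared prefix
    if n == len(target_id):
        return None                              # this node is the destination
    return neighbour_map.get((n + 1, target_id[n]))

# Partial neighbour map for node 5230 (hexadecimal digits, entries hypothetical):
nmap_5230 = {(1, "4"): "400F", (1, "6"): "6208", (2, "1"): "5102"}
print(next_hop("5230", "42AD", nmap_5230))       # -> "400F", the first hop toward 42AD
```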
The figure above gives an example of query-message forwarding in Tapestry. Node identifiers in the figure are four hexadecimal digits long; the query originates at node 5230 and its destination is node 42AD.
A node in Tapestry acts as a server when it shares data, as a client when it requests data, and as a router when it forwards messages; that is, every node can play the roles of client, server, and router at the same time.
A server S announces that it stores an object O by sending a publish message toward the root node of O (identified by its GUID O_G), and it repeats this publication periodically. Each node on the publish path stores a location pointer <O_G, S> for the object; this location information is only a pointer to S, not a copy of object O. When several servers holding copies of the same object publish to the root node, each node on the paths keeps a list of location pointers sorted in increasing order of its network distance to each server.
To locate an object O, a client sends a query message toward the root node of O. Each node on the forwarding path checks whether it holds a location pointer for O; if so, it forwards the query directly to server S, and otherwise the message travels on to the root node of O, which then forwards the query to the server. An important property of Tapestry is that a query is forwarded to the server holding a copy of the object that is closest to the client. When two messages are sent from two nearby nodes to the same root node O_R, their forwarding paths soon cross: every routing step makes the next node's identifier share a longer prefix with O_R, the path to the root depends only on the destination identifier and not on the identifier of the node that originated the message, and the next-hop neighbor at each step is chosen by network distance. Hence the closer the client is to a server, the sooner its query path meets that server's publish path; in other words, the query message is forwarded to the nearest server.
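The interplay of publish and locate can be sketched as below; the paths, identifiers and server name are hypothetical, and the routing itself (how each path is computed) is omitted.

```python
# Publish and locate: every node on the publish path keeps a location pointer
# <O_G, S>, so a later query can be redirected to S before reaching the root.
pointers = {}                          # node_id -> {object_guid: server}

def publish(path_to_root, object_guid, server):
    """path_to_root: NodeIds from the server's node toward the root of object_guid."""
    for node in path_to_root:
        pointers.setdefault(node, {})[object_guid] = server

def locate(query_path, object_guid):
    for node in query_path:                        # the query walks toward the root
        hit = pointers.get(node, {}).get(object_guid)
        if hit is not None:
            return hit                             # redirect to server S early
    return None                                    # (would be answered by the root)

# Hypothetical identifiers: server S1 publishes object 42AE, whose root is node 42AD.
publish(["42A2", "42AD"], object_guid="42AE", server="S1")
print(locate(["4227", "42A2", "42AD"], "42AE"))    # -> "S1", found before the root
```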
Node join and exit: The Tapestry node-join algorithm is similar to Pastry's. Before joining the Tapestry network, a node N must know some node G already in the network. N then asks G to route a message keyed by N's own node ID, and builds its own neighbor map from the neighbor maps of the nodes along the path; some optimization is needed during this construction. After constructing its own data structures, node N notifies other nodes in the network that it has joined; only the primary and secondary neighbor nodes in N's neighbor map are notified. Tapestry handles node departure with two mechanisms. In one case the node simply disappears from the network (mainly because of failure); its neighbors detect that it has left and adjust their routing tables accordingly. In the other, the node notifies all of its neighbors before leaving the system, and those nodes adjust their routing tables and inform the object servers that the node has left the network. Link and server failures during normal operation can be detected through the TCP connection timeout mechanism. In addition, each Tapestry node periodically sends UDP "heartbeat" packets to the nodes in its neighbor map, and each node uses the heartbeats it receives to decide whether a node in its neighbor map has failed. Besides the primary neighbor (the nearest one), each routing entry in the neighbor map stores two backup neighbors; when the primary neighbor fails, the backup neighbors are selected in order.
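The failover rule for a routing entry can be sketched as follows; the timeout value, class layout and NodeIds are assumptions made for illustration, not part of Tapestry's specification.

```python
# One neighbour-map routing entry with a primary neighbour and two backups;
# heartbeats mark candidates alive, and lookups fall back in order on failure.
import time

class RouteEntry:
    def __init__(self, primary, backups):
        self.candidates = [primary] + list(backups)    # primary first, then 2 backups
        self.last_heartbeat = {n: time.time() for n in self.candidates}

    def heard_from(self, node):
        """Record a received heartbeat from one of the candidates."""
        self.last_heartbeat[node] = time.time()

    def next_hop(self, timeout=30.0):
        now = time.time()
        for node in self.candidates:                   # fall back in order on failure
            if now - self.last_heartbeat[node] < timeout:
                return node
        return None                                    # all three considered failed

entry = RouteEntry("42A2", ["42F1", "4301"])           # hypothetical NodeIds
entry.last_heartbeat["42A2"] -= 60                     # simulate a missed heartbeat
print(entry.next_hop())                                # -> "42F1", the first backup
```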