Data Sharding: Consistent Hashing

Last Update:2018-09-30 Source: Internet

Author: User

Tags cassandra


Consistent hashing maps data onto a hash ring whose end joins its beginning, and maps the nodes (by hashing their IP address or machine name) onto the same ring. Starting from a data item's position on the ring, the first node found clockwise is the node that stores it. As an example, assume that the range of IDs is [0, 1000] and that N0, N1, and N2 sit at positions 100, 400, and 800 on the ring.
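The clockwise lookup can be sketched with a sorted map. This is a minimal illustration rather than any particular system's implementation: it assumes that an ID is used directly as its ring position, and the class name HashRingDemo and the sample IDs are made up for the example.

```java
import java.util.Map;
import java.util.TreeMap;

class HashRingDemo {
    // Ring position -> node name; the TreeMap keeps the positions sorted.
    static final TreeMap<Integer, String> RING = new TreeMap<>();

    static {
        RING.put(100, "N0");
        RING.put(400, "N1");
        RING.put(800, "N2");
    }

    // First node found clockwise from the given position, wrapping past the end of the ring.
    static String nodeFor(int position) {
        Map.Entry<Integer, String> entry = RING.ceilingEntry(position);
        return (entry != null ? entry : RING.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        // Hypothetical IDs, used directly as ring positions.
        for (int id : new int[] {50, 250, 500, 900}) {
            System.out.println("id " + id + " -> " + nodeFor(id));
        }
    }
}
```

An ID of 900 lies past the last node at 800, so the lookup wraps around to N0 at position 100.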

Compared with a plain hash method, consistent hashing has to maintain metadata that records each node's position on the ring, but the amount of this data is very small.

When nodes are added or removed, consistent hashing affects only a limited amount of data. For example, add a node N3 at position 600 on the ring: the range (400, 800] that N2 was originally responsible for is now split between N2 and N3, so only records R2 (id:759) and R3 (id:607) have to be migrated from N2 to N3.

It is easy to see that adding or removing a node with consistent hashing affects only the neighbouring node on the hash ring, and no large-scale data migration occurs.

However, with plain consistent hashing a newly added node relieves the load of only one existing node, and when a node dies, its entire load is transferred to the next node. What we want is "when one is in trouble, help comes from all sides": when nodes are added or removed, all existing nodes should take part in the response so that a new balance is reached.

Therefore, real projects generally introduce the concept of virtual nodes. Instead of mapping physical nodes onto the hash ring, virtual nodes are mapped onto it. There are many more virtual nodes than physical nodes, so each physical node is responsible for the actual storage of several virtual nodes. When data is accessed, the hash ring is used to find the virtual node, and the mapping between virtual nodes and physical nodes is then used to find the physical node.

Introducing virtual nodes also increases the metadata that has to be maintained: first, the positions of the virtual nodes on the hash ring, of which there are now many more; second, the mapping between virtual nodes and physical nodes. But the benefit is obvious: when a physical node fails, several virtual nodes on the hash ring fail with it, and their load is spread over the remaining virtual nodes, which in practice means over several of the remaining physical nodes. The same is true when physical nodes are added.
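The effect of virtual nodes can be sketched as follows. This is an illustrative sketch under several assumptions: the class VirtualNodeRing, the "#VN" naming of virtual nodes, the FNV-1a-style hash, and the count of 100 virtual nodes per physical node are all choices made for the example, not taken from Dynamo or Cassandra.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class VirtualNodeRing {
    // Virtual node position on the ring -> physical node that stores its data.
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualPerNode;

    VirtualNodeRing(int virtualPerNode) {
        this.virtualPerNode = virtualPerNode;
    }

    // FNV-1a-style hash over the chars of the string (an assumption of this sketch).
    static int hash(String s) {
        int h = 0x811C9DC5;
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 16777619;
        }
        return h & 0x7fffffff; // keep ring positions non-negative
    }

    void addNode(String physical) {
        for (int i = 0; i < virtualPerNode; i++) {
            ring.put(hash(physical + "#VN" + i), physical);
        }
    }

    // A physical node failure removes all of its virtual nodes at once.
    void removeNode(String physical) {
        ring.values().removeIf(physical::equals);
    }

    // Hash ring -> virtual node -> physical node.
    String nodeFor(String key) {
        Map.Entry<Integer, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        VirtualNodeRing r = new VirtualNodeRing(100);
        for (String node : new String[] {"N0", "N1", "N2", "N3"}) {
            r.addNode(node);
        }
        // Collect the keys stored on N1, fail N1, then see which nodes inherit them.
        List<String> orphaned = new ArrayList<>();
        for (int k = 0; k < 10000; k++) {
            if (r.nodeFor("key" + k).equals("N1")) {
                orphaned.add("key" + k);
            }
        }
        r.removeNode("N1");
        Map<String, Integer> inherited = new TreeMap<>();
        for (String key : orphaned) {
            inherited.merge(r.nodeFor(key), 1, Integer::sum);
        }
        System.out.println("Keys taken over from N1: " + inherited);
    }
}
```

Keys that were not stored on the failed node keep their owner, because the virtual node each of them resolved to is still on the ring.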

In practice, Dynamo and Cassandra both use the consistent hashing algorithm, and their later versions use the concept of virtual nodes. These systems have to consider data distribution and data replicas together; once replicas are introduced, the consistent hashing method also needs to be adjusted. For details, see the Cassandra documentation.

A concrete Java implementation: put the hash codes of the real nodes and the virtual nodes into a map as keys, sorted by hash code. For the hash code of a given key, get the sub-map of entries whose hash codes are larger; the first node of this sub-map is the node that is hit. If the sub-map is empty, take the first node of the full map.

Https://www.cnblogs.com/xybaby/p/7076731.html
Http://www.jb51.net/article/124819.htm
Take String s = "Java" as an example: the computer first calculates the hash code and then places the value into an array, where the array index is computed from the hash code; each array slot holds a container, such as a list. This is equivalent to dividing the data you want to store into a few large parts and keeping many values in each part. A query first finds the large part and then searches within it, which is much faster than searching everything from the start.

Mongodb

Hashing algorithm:
A hashing algorithm maps a binary value of any length to a shorter, fixed-length binary value. This shorter binary value is called the hash value.

Characteristics of the hash value:
* The hash value is a binary value;
* The hash value is unique to a certain degree (different inputs can still collide);
* The hash value is extremely compact;
* For a well-designed hash function, it is not feasible to find two different inputs that generate the same hash value within a practical time frame.
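The fixed-length property can be checked with the JDK's MessageDigest class. The sketch below uses SHA-256 purely as an example of a hashing algorithm; the class name FixedLengthHashDemo and the sample inputs are made up for the illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class FixedLengthHashDemo {
    // Returns the SHA-256 hash value of the input's UTF-8 bytes.
    static byte[] sha256(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return digest.digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Not expected: SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String shortInput = "a";
        String longInput = new String(new char[10000]).replace('\0', 'a');
        // Both hash values are 32 bytes long, whatever the input length.
        System.out.println(sha256(shortInput).length);
        System.out.println(sha256(longInput).length);
    }
}
```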

Hash table:
A hash table is a way of organizing data. Based on the key, the hash table generates the key's hash value and then uses that hash value to map to the value for the key. The hash table stores the extra key (we could have stored only the value), which is a way of trading space for time. This "space-for-time" approach is worthwhile when there is enough memory. The hash table was inspired by the array. We know that the array is considered the most efficient data structure because, regardless of its size, the time complexity of a lookup by index is O(1). If all keys are non-repeating integers, an array is perfect and there is no need for a hash table to map keys to values. However, if the key is a string, the situation is different, and we have to build a hash table to do the mapping.
The principle of database indexing is similar to that of a hash table: a database index is also a way of trading space for time.

Application of the string hash algorithm in the String class
// JUnit test method; needs import org.junit.Test
@Test
public void test1() {
    String str = "Qaz";
    char[] value = str.toCharArray();
    int h = 0;
    // Same recurrence as String.hashCode(): h = 31 * h + c for each character
    for (int i = 0; i < value.length; i++) {
        h = 31 * h + value[i];
    }
    System.out.println(h);
}
Arithmetic on char values works because char is a numeric type: each character has a corresponding numeric code, which for these letters is its value in the ASCII table.
System.out.println('a' + " " + (0 + 'q') + " " + (0 + 'a') + ":" + ('a' + 'q'));
a 113 97:210
Take the hashCode method of the String class in the JDK as an example: the string "gdejicbegh" and the string "hgebcijedg" have the same hashCode() return value, -801038016, and each is the reverse of the other. This is a counterexample showing that the default hashCode method in the JDK cannot be used to decide whether two strings are equal or whether a string is a palindrome, because different objects may generate the same hashCode value.
If the hashCode values of two objects are not equal, however, they must be two different objects.
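A shorter collision is easy to check by hand. The pair "Aa" and "BB" is a commonly cited example rather than one from the original text: 65 * 31 + 97 and 66 * 31 + 66 both equal 2112. The class name HashCollisionDemo is made up for the sketch.

```java
class HashCollisionDemo {
    // True when two different strings share the same hash code.
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode());      // 2112
        System.out.println("BB".hashCode());      // 2112
        System.out.println("Aa".equals("BB"));    // false
        System.out.println(collide("Aa", "BB"));  // true
    }
}
```

Equal hash codes therefore only say that two objects might be equal; equals still has to confirm it.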

The choice of the multiplier in the hash algorithm:
As we all know, multiplication in a computer can be carried out with shifts. Multiplying a number by 2 only requires shifting it left by one bit. 31 was chosen because it is a prime number.
Prime numbers:
A prime number is a natural number greater than 1 that is not divisible by any natural number other than 1 and itself.
When computing the hash address for stored data, we want as few identical hash addresses, the so-called "conflicts", as possible. If too much data shares the same hash address, the hash chain formed by that data becomes longer, which reduces query efficiency. So the multiplier should be as large as possible (31 = 11111 in binary) without making the multiplication overflow too easily (a multiplier larger than 11111 in binary overflows more readily), because the larger the range of computed hash addresses, the fewer conflicts there are and the more efficient lookups become.
Multiplication by 31 can be written as i * 31 == (i << 5) - i, an optimization that many virtual machines now perform. Another reason for using 31 may be that it distributes hash addresses well while occupying only 5 bits.
In Java, if the multiplier is too large, the multiplication overflows more easily, and the high-order bits of the result are lost.
31 is prime and not very long, and that is ultimately why it was chosen as the multiplier.
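The shift identity can be checked directly. The sketch below recomputes the string hash with (h << 5) - h in place of 31 * h and compares it with String.hashCode(); the class and method names are made up for the illustration.

```java
class Times31Demo {
    // Same recurrence as String.hashCode(), with 31 * h written as (h << 5) - h.
    static int hash31(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = (h << 5) - h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        int i = 123456789;
        // int arithmetic wraps around on overflow, and the identity still holds.
        System.out.println(i * 31 == (i << 5) - i);                    // true
        System.out.println(hash31("Java") == "Java".hashCode());      // true
    }
}
```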

The role of the hashCode method
Any programming language that has container types will generally involve hash codes. In Java, the hashCode method mainly exists to work with hash-based collections, which include HashSet, HashMap, and Hashtable.
Why? Consider how to determine whether an object already exists in a collection when it is inserted. (Note: the collection does not allow duplicate elements.)
Most people would probably think of calling the equals method to compare against each element one by one. However, if the collection already holds 10,000 or more items, comparing with equals one at a time is bound to be inefficient. This is where hashCode comes in. When a new object is added to the collection, its hashCode method is called first to obtain its hash code. The HashMap implementation uses a table to record the hash codes of the objects already stored. If the table does not contain that hash code, the object can be stored directly without any further comparison. If the hash code is already present, the equals method is called to compare the new element with the existing one: if they are equal, the element is not stored again; otherwise it is hashed to another address, which is where conflict resolution comes in. In this way, the number of actual calls to equals is greatly reduced. The hashCode method in Java maps information about an object (such as its storage address or its fields) to a numeric value, called the hash code, according to certain rules. The put method of java.util.HashMap is a concrete example.
The put method adds a new element to the HashMap. In its implementation, hashCode is called first to get the element's hash code, and the table is checked for that hash code. If it is present, equals is called to determine whether the element already exists: if it does, the value is updated; otherwise the new element is added to the HashMap. As this shows, the hashCode method exists to reduce the number of calls to equals and so improve program efficiency.
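The put logic described above can be sketched with a toy map. This is not the java.util.HashMap source: the class TinyHashMap, its fixed bucket count, and the equalsCalls counter are assumptions made for the illustration, and null keys and resizing are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Toy hash map with separate chaining; a sketch, not the JDK implementation.
class TinyHashMap<K, V> {
    private static final class Entry<K, V> {
        final int hash;
        final K key;
        V value;

        Entry(int hash, K key, V value) {
            this.hash = hash;
            this.key = key;
            this.value = value;
        }
    }

    private final List<List<Entry<K, V>>> buckets = new ArrayList<>();
    int equalsCalls = 0; // counts how often equals() was actually needed

    TinyHashMap(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    private List<Entry<K, V>> bucketFor(int hash) {
        return buckets.get(Math.floorMod(hash, buckets.size()));
    }

    // Returns the previous value for the key, or null if the key is new.
    V put(K key, V value) {
        int hash = key.hashCode(); // null keys are not handled in this sketch
        List<Entry<K, V>> bucket = bucketFor(hash);
        for (Entry<K, V> e : bucket) {
            if (e.hash == hash) {          // cheap int comparison first
                equalsCalls++;
                if (e.key.equals(key)) {   // equals() only when the hash codes match
                    V old = e.value;
                    e.value = value;
                    return old;
                }
            }
        }
        bucket.add(new Entry<>(hash, key, value));
        return null;
    }

    V get(K key) {
        int hash = key.hashCode();
        for (Entry<K, V> e : bucketFor(hash)) {
            if (e.hash == hash && e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TinyHashMap<String, Integer> map = new TinyHashMap<>(16);
        for (int i = 0; i < 1000; i++) {
            map.put("key" + i, i);
        }
        map.put("key7", -7); // existing key: the value is replaced
        System.out.println(map.get("key7"));
        System.out.println("equals() calls: " + map.equalsCalls);
    }
}
```

Running main shows that far fewer equals calls are made than the number of elements already stored, because entries whose hash codes differ are skipped with an int comparison.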

When designing a class you sometimes need to override the equals method, as the String class does. Be aware that whenever you override equals, you must also override hashCode.
For example, design a People class whose equals method is return this.name.equals(((People) obj).name) && this.age == ((People) obj).age;. If you put a People instance into a HashMap as a key and then fetch it with a new object that has the same name and age, the lookup fails because the two instances have different hashCode values (see the HashMap get method for details). Overriding hashCode, for example as return name.hashCode() * 37 + age, fixes this. However, if the value of a field changes often, do not rely on that field in the equals and hashCode methods.
public static void main(String[] args) {
    People p1 = new People("Jack", 12);
    System.out.println(p1.hashCode());
    // needs import java.util.HashMap
    HashMap<People, Integer> hashMap = new HashMap<People, Integer>();
    hashMap.put(p1, 1);
    // Changing a field used by hashCode() after the key has been inserted
    p1.setAge(13);
    System.out.println(hashMap.get(p1));
}
The last line of this code prints "null": changing the age after the put changes the key's hashCode, so the lookup no longer matches the stored entry.
Therefore, when designing the hashCode and equals methods, it is best not to rely on fields whose values can change.
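The People class itself is not shown in the text. A minimal version that matches the description might look like the sketch below; the field layout, the constructor, and the instanceof check in equals are assumptions, while the hashCode formula is the one given above.

```java
import java.util.HashMap;
import java.util.Map;

class People {
    private final String name;
    private int age;

    People(String name, int age) {
        this.name = name;
        this.age = age;
    }

    void setAge(int age) {
        this.age = age;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof People)) {
            return false;
        }
        People other = (People) obj;
        return name.equals(other.name) && age == other.age;
    }

    // Overridden together with equals, using the formula from the text.
    @Override
    public int hashCode() {
        return name.hashCode() * 37 + age;
    }

    public static void main(String[] args) {
        Map<People, Integer> map = new HashMap<>();
        map.put(new People("Jack", 12), 1);
        // An equal key built from the same name and age finds the entry.
        System.out.println(map.get(new People("Jack", 12))); // 1

        People p1 = new People("Jack", 12);
        map.put(p1, 2);
        p1.setAge(13); // the key's hash code changes after insertion
        System.out.println(map.get(p1)); // null
    }
}
```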

package cn.com.gome.gcoin.util;

import java.util.SortedMap;
import java.util.TreeMap;

/**
 * @author CYQ
 * Uses consistent hashing to get the table that corresponds to a key.
 */
public class ConsistentHashingWithTable {

    // Custom number of tables. This originally referenced a constant in the gcoin-commons package,
    // but that affected the spa transfer system; keep the two consistent in later maintenance.
    private static int TRANSACTION_TABLE_NUM = 20;

    // The list of transaction tables to be added to the hash ring
    private static String[] transactionTable = new String[TRANSACTION_TABLE_NUM];

    static {
        for (int ci = 0; ci < TRANSACTION_TABLE_NUM; ci++) {
            transactionTable[ci] = "Tbl_account_transaction" + ci;
        }
    }

    // key is the hash value of a transaction table, value is the transaction table
    private static SortedMap<Integer, String> sortedMap = new TreeMap<Integer, String>();

    // Number of virtual nodes, hard-coded here for demonstration: one real node corresponds to 10 virtual nodes
    private static final int VIRTUAL_NODES = 10;

    // Program initialization: put all the transaction tables into sortedMap
    static {
        for (int i = 0; i < transactionTable.length; i++) {
            int hash = getHash(transactionTable[i]);
            System.out.println("[" + transactionTable[i] + "] added to the set, its hash value is " + hash);
            sortedMap.put(hash, transactionTable[i]);
            // Add the virtual nodes
            for (int j = 0; j < VIRTUAL_NODES; j++) {
                String virtualNodeName = transactionTable[i] + "&&VN" + String.valueOf(j);
                int hashVn = getHash(virtualNodeName);
                System.out.println("Virtual node [" + virtualNodeName + "] is added, the hash value is " + hashVn);
                sortedMap.put(hashVn, transactionTable[i]);
            }
        }
    }

    // Gets the node that the key should be routed to
    public static String getServer(String key) {
        // Get the hash value of the key
        int hash = getHash(key);
        // Get all map entries whose hash is greater than or equal to this hash value
        SortedMap<Integer, String> subMap = sortedMap.tailMap(hash);
        if (subMap.isEmpty()) {
            // No hash value is larger than the key's, so start again from the first node
            Integer i = sortedMap.firstKey();
            // Return the corresponding transaction table
            return sortedMap.get(i);
        } else {
            // The first key is the nearest node in the clockwise direction
            Integer i = subMap.firstKey();
            // Return the corresponding transaction table
            return subMap.get(i);
        }
    }

    // Uses the FNV1_32_HASH algorithm to calculate the hash value of a transaction table;
    // the hashCode method is not used here, and the final effect is no different
    private static int getHash(String str) {
        final int p = 16777619;
        int hash = (int) 2166136261L;
        for (int i = 0; i < str.length(); i++)
            hash = (hash ^ str.charAt(i)) * p;
        hash += hash << 13;
        hash ^= hash >> 7;
        hash += hash << 3;
        hash ^= hash >> 17;
        hash += hash << 5;
        // If the calculated value is negative, take its absolute value
        if (hash < 0)
            hash = Math.abs(hash);
        return hash;
    }

    public static void main(String[] args) {
        String[] keys = {"73968928317", "73099946651", "72563328728", "73967405000", "73968349990",
                "72112754519", "72088646347", "74728589363", "73955634071", "73099946613", "72563228728",
                "73967477000", "73968649990", "72112769519", "72088796347", "74728333363", "73955688071"};
        for (int i = 0; i < keys.length; i++)
            System.out.println("[" + keys[i] + "] hash value is " + getHash(keys[i])
                    + ", routed to node [" + getServer(keys[i]) + "]");
    }
}


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
