Can a massive routing table be stored in a hash table? Hash lookups and trie lookups

Last Update:2015-06-27 Source: Internet

Author: User

Please don't! Many people say so, including me.
The Linux kernel has already removed the hash routing table and now uses only the trie, but I would still like to have a somewhat metaphysical discussion of these two data structures.
1. Hash and trie/radix
Hash and trie can be unified. Multiple items with the same hash value share a common feature. How is that feature extracted? That is undoubtedly the job of the hash function. The nodes of a subtree in a trie (or radix tree, whatever you call it) also share a common feature. How is that feature extracted? By the parent node of the subtree, which indicates that certain bits have the same value for every node in the subtree.
In fact, a trie is a special form of hash whose hash function is "take some bits":

/* Pseudocode: level.bits is the mask of the bits this trie level inspects. */
trie_hash(value, level) { return value & level.bits; }

Seen this way, all the nodes of a subtree sit in one "conflict list" ... and the trie simply "hashes again", with the hash function changed to take some bits lower than level.bits. So it seems that when a hash routing table has to cope with a large number of routing entries and its conflict lists grow longer, the solution is to hash again. But what should the hash function become? We will come back to that later.
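The idea of a trie level as a "take some bits" hash can be sketched in C as follows. This is only an illustration under assumptions: the fixed 8-bit stride and the names `trie_hash` and `trie_path` are chosen here for the example and do not come from any kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* "Hash" used by one trie level: extract `bits` bits of `key` starting at
 * bit position `pos` (0 = least significant). Both parameters are
 * illustrative names, not taken from any kernel structure. */
static uint32_t trie_hash(uint32_t key, unsigned pos, unsigned bits)
{
    assert(bits >= 1 && bits < 32 && pos + bits <= 32);
    return (key >> pos) & ((1u << bits) - 1u);
}

/* Walk a key from the high bits to the low bits in fixed 8-bit strides,
 * recording the child index chosen at each of the 4 levels. */
static void trie_path(uint32_t key, uint32_t path[4])
{
    for (unsigned level = 0; level < 4; level++)
        path[level] = trie_hash(key, 24 - 8 * level, 8);
}
```

For the address 192.168.1.1 (0xC0A80101), the four "hashes" are simply the four octets: each level hashes again on the next lower group of bits.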
2. TCAM and hash
TCAM is used in many places. It looks up an index by content and is often used for routing lookups and CPU cache lookups. Taking the CPU cache as an example, the content fed into the TCAM is a memory address and the output is an index. Cache matching takes the cache line indicated by that index and compares the input address with the address recorded for that cache line; if they are the same, it is a hit.
The core of the TCAM process is therefore deriving an index from an address. The usual approach is to hash. Because it is implemented in hard-wired logic, the hash function cannot afford much computation, so the usual approach is "take some bits of the address". For example, taking bits 4 to 7 (4 bits in total) maps a 32-bit slow physical memory address (assuming a 32-bit system with a physically indexed cache) to a 4-bit fast cache index, which forms a pyramid-shaped storage hierarchy. In a 32-bit to 4-bit mapping, the 28 discarded bits create a high probability of conflicts, and temporal and spatial locality are what compensate for this. Anyone who understands Lévy flight will know how much locality matters; it underlies our entire human civilization.
The simplest hash function is taking a modulo. When the modulus is a power of two, this is just a special case of "take some bits", namely "take the lowest n bits".
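A small C sketch of the two "take some bits" hashes mentioned above. The function names are made up for this illustration, and the bit positions 4 to 7 simply follow the example in the text.

```c
#include <assert.h>
#include <stdint.h>

/* x mod 2^n, computed as "take the lowest n bits" (0 < n < 32). */
static uint32_t low_bits(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32);
    return x & ((1u << n) - 1u);
}

/* Cache index from bits 4..7 of a 32-bit address: 16 possible slots. */
static uint32_t cache_index(uint32_t addr)
{
    return (addr >> 4) & 0xFu;
}
```

Two addresses that differ only in the 28 discarded bits, such as 0x12345678 and 0xABCD0078, land on the same index, which is exactly the conflict the text describes.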
3. Unifying hash and trie
A trie is actually built from the high bits to the low bits, and its hash function is "take some bits".
4. A dictionary example: looking up English words and Chinese characters
When we learned to use a dictionary in primary school, lookup was generally divided into lookup by alphabetical order and lookup by radical. The two vividly reflect the difference between a trie and a hash. For simplicity, I take the English word lookup and the Chinese radical lookup as examples.
English words are strictly one-dimensional sequences over only 26 letters, so they can be looked up the way a trie is. Take what, who, and where: their first two characters are all "wh", so that is their common feature. If this common feature is used as the hash function, then looking for what, who, and where among aaa, cc, sahidad, fwfwew, what, qwert, azsx, who, eee, ooo, where produces a conflict list, but this single step greatly reduces the number of candidates to match, from 11 to 3. Hashing again on the remaining characters, alphabetical order tells us the remainders are ordered at, ere, o, so to find "who" we go directly to the third child node. Looking up a word in an English dictionary is therefore very simple: it is a process of repeated hash positioning, and the hash function is "take some consecutive characters".
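The first "hash" step of the English lookup, narrowing 11 candidates down to the 3 that share the prefix "wh", can be sketched as below. The helper name `count_prefix` is invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the words that fall into the same "bucket" as `prefix`, i.e. the
 * words whose leading characters equal `prefix`. */
static size_t count_prefix(const char *const *words, size_t n, const char *prefix)
{
    assert(words != NULL && prefix != NULL);
    size_t hits = 0, plen = strlen(prefix);
    for (size_t i = 0; i < n; i++)
        if (strncmp(words[i], prefix, plen) == 0)
            hits++;
    return hits;
}
```

Calling it again with a longer prefix such as "who" is the "hash again" step: each extra character shrinks the conflict list further.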
Now look at the Chinese radical lookup. It is a typical process of repeatedly computing hash functions. Suppose we want to find the character "lin" (forest) among yang, lin, tree, horse, cow, pig, over. A Chinese character is not a one-dimensional structure but a two-dimensional one; it is composed of strokes with no inherent order, so the "take some consecutive characters" approach fails completely (from which direction would you start taking? what counts as one character?). We therefore need to construct a different hash function. Over their long history Chinese characters have retained some pictographic meaning, and by observation we find that "wood" is a feature. This computation, that is, the execution of the hash function, is done by our brains. If "take some characters" is better suited to a hardware implementation, then finding the radical is better suited to a software implementation, and from this one could even analyze the differences between Chinese and Western thinking. To continue: having found the "wood" radical, yang, lin, and tree form a conflict list, which greatly reduces the number of candidates. If we do not want to traverse them, we need to hash again. The Xinhua dictionary uses the stroke count as this second hash function: apart from its radical, "lin" has 4 strokes left, so we are positioned at "lin". If there is still a conflict, we have to traverse, because the Commercial Press presumably could not think of any further hash function (I do not know who invented the radical lookup method; perhaps it is a masterpiece of the publishing house). The English lookup, by contrast, can always position definitively, because its repeated hash function is "take consecutive characters"; since word length is finite and the letters are arranged in one dimension, it can always proceed to the last character.
Do you see the difference? See the difference between trie tree query and hash query?
5. Hash routing table and trie routing table
For hash routing table lookups, the longest-prefix-match logic is not part of the hashing itself. The lookup is a gamble that pays off only if the hash function is good enough. It starts at the hash table for 32-bit prefixes and falls back step by step toward the hash table for 0-bit prefixes, hoping to get a hit quickly along the way; the first match is the final result.
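The fallback from the 32-bit prefix table down to the 0-bit prefix table can be sketched as follows. This is a simplified illustration, not the kernel's implementation: the bucket count, the `struct route` layout, and the names `zone_hash`, `route_add`, and `hash_lookup` are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUCKETS 16  /* illustrative, must be a power of two */

struct route {
    uint32_t prefix;     /* network address, host byte order */
    unsigned len;        /* prefix length, 0..32 */
    int nexthop;         /* opaque next-hop id */
    struct route *next;  /* conflict list within one bucket */
};

/* One hash table ("zone") per prefix length. */
static struct route *zones[33][BUCKETS];

static uint32_t mask_of(unsigned len)
{
    return len ? ~0u << (32 - len) : 0u;
}

static unsigned zone_hash(uint32_t prefix, unsigned len)
{
    uint32_t h = len ? prefix >> (32 - len) : 0u;
    h ^= h >> 20;
    h ^= h >> 10;
    h ^= h >> 5;
    return h & (BUCKETS - 1);
}

/* The caller owns the storage of `r`. */
static void route_add(struct route *r)
{
    assert(r->len <= 32);
    unsigned b = zone_hash(r->prefix, r->len);
    r->next = zones[r->len][b];
    zones[r->len][b] = r;
}

/* Try /32 first, then /31, ... down to /0; the first hit is the answer. */
static const struct route *hash_lookup(uint32_t dst)
{
    for (int len = 32; len >= 0; len--) {
        uint32_t key = dst & mask_of((unsigned)len);
        const struct route *r = zones[len][zone_hash(key, (unsigned)len)];
        for (; r != NULL; r = r->next)   /* walk the conflict list */
            if (r->prefix == key)
                return r;
    }
    return NULL;
}
```

The inner loop over the conflict list is where the gamble lies: its length depends entirely on how well the hash spreads the prefixes of one length across the buckets.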
For trie routing table lookups, the longest-prefix-match logic is built into the repeated hashing. The lookup takes the last match rather than the first, because "taking bits in sequence" keeps hashing, and the final match is clearly the most specific. This is the essential difference from a hash routing lookup. A trie lookup involves no gamble: it never risks traversing a very long conflict list, because taking bits in sequence always leads the lookup to its destination.
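For contrast, here is a minimal sketch of a trie lookup in which the last match wins. It uses a plain one-bit-per-level binary trie without path or level compression, which is an assumption made to keep the example short; the structure and function names are invented for this illustration, and nodes are never freed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct tnode {
    struct tnode *child[2];
    int has_route;
    int nexthop;
};

static struct tnode *tnode_new(void)
{
    struct tnode *n = calloc(1, sizeof *n);
    assert(n != NULL);
    return n;
}

/* Insert prefix/len by consuming bits from the most significant end. */
static void trie_add(struct tnode *root, uint32_t prefix, unsigned len, int nexthop)
{
    struct tnode *n = root;
    for (unsigned i = 0; i < len; i++) {
        unsigned bit = (prefix >> (31 - i)) & 1u;
        if (n->child[bit] == NULL)
            n->child[bit] = tnode_new();
        n = n->child[bit];
    }
    n->has_route = 1;
    n->nexthop = nexthop;
}

/* Keep taking bits; remember every route passed on the way down.
 * The last one remembered is the longest match. Returns -1 if none. */
static int trie_lookup(const struct tnode *root, uint32_t dst)
{
    int best = -1;
    const struct tnode *n = root;
    for (unsigned i = 0; n != NULL; i++) {
        if (n->has_route)
            best = n->nexthop;
        if (i == 32)
            break;
        n = n->child[(dst >> (31 - i)) & 1u];
    }
    return best;
}
```

The loop runs at most 33 times regardless of how many routes are stored, which is the bounded behavior the text attributes to the trie.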
6. The case of massive numbers of routing entries
Linux used the hash-organized routing table for so long because it was good enough. Most of the time the number of routing entries is not large. Even a traversal does not cost much, and the hash computation greatly reduces the cost of traversal. The so-called gamble has a worst case of traversing all the routes, which is not a problem at that scale. But once traversing the entire routing table, or even half of it, becomes a serious cost, using a hash is unwise. This is like the lion chasing the antelope: for one side the stake is a meal, for the other it is a life. The stakes are strictly asymmetric, so we usually see the antelope win (you really cannot treat this as a zero-sum game, because sometimes the lion really does not care).
The question now is how to use the hash routing table and reduce the risk. Let's take a look at Linux's own hash function:

static inline u32 fn_hash(__be32 key, struct fn_zone *fz)
{
	u32 h = ntohl(key) >> (32 - fz->fz_order);
	h ^= (h >> 20);
	h ^= (h >> 10);
	h ^= (h >> 5);
	h &= FZ_HASHMASK(fz);
	return h;
}

It can be seen that the input bits are scattered well enough, but a hash is by nature a mapping from a large space to a small one, so conflicts are unavoidable. It has been suggested (by me, for example) that with massive numbers of routing entries the long conflict lists be organized as tries, but does that make sense? With a pure trie routing table, the result is found in at most 32 steps (allowing for compression and backtracking). With hash+trie, each step has a worst case of 32 steps, and there are 32 steps in total ... There is no point in doing so.
With massive numbers of routing entries, the small hash space is strictly bounded and can be considered fixed. The average case is easily computed from the sizes of the address space and the hash space, and the worst case is a complete traversal. If the average case is already unacceptable, is it worth gambling on the best case? Therefore, do not use a hash table to store massive numbers of routing entries.
But it's not over yet.
7. Exploiting locality, and DoS
On a 32-bit system the CPU cache is tiny compared with memory, so how can it bring such a large gain? All the addresses that map to the same cache line conflict with one another ... The answer is that the CPU cache exploits the temporal and spatial locality of programs, whereas routing has no spatial locality. Temporal locality can be exploited by a route cache, but hardly by the routing table itself. Unlike a CPU cache, a routing table is complete: its entries are never replaced or aged out. So a good hash function can be used for a separate route cache, with the routing table consulted only on route cache misses.
That was the analysis of the ideal case; what remains is only sorrow.
Is the temporal locality of network traffic really usable? A flow identified by a 5-tuple will generally keep passing through the router for some time, but if another flow whose hash collides with it passes through as well, the cache thrashes. In a CPU cache this can be mitigated by controlling task switching or by adding a unique key to the cache line. For network traffic, however, you cannot prevent any packet from arriving, and every arriving packet triggers a routing lookup and may cause cache thrashing. More seriously, a route cache is vulnerable to attacks using carefully constructed packets, which cause frequent replacement or unboundedly growing lists and increase the lookup cost.
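The thrashing effect can be illustrated with a toy direct-mapped route cache. Everything here is hypothetical: the slot count, the "lowest 8 bits" hash, and `slow_lookup`, which merely stands in for a full routing table lookup.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_SLOTS 256  /* illustrative, must be a power of two */

struct cache_entry {
    uint32_t dst;
    int nexthop;
    int valid;
};

static struct cache_entry cache[CACHE_SLOTS];
static unsigned long misses;

/* Hash: take the lowest 8 bits of the destination address. */
static unsigned slot_of(uint32_t dst)
{
    assert((CACHE_SLOTS & (CACHE_SLOTS - 1)) == 0);
    return dst & (CACHE_SLOTS - 1);
}

/* Stand-in for the full routing table lookup (hypothetical). */
static int slow_lookup(uint32_t dst)
{
    return (int)(dst >> 24);
}

/* Direct-mapped cache: a colliding destination evicts the previous one. */
static int cached_lookup(uint32_t dst)
{
    struct cache_entry *e = &cache[slot_of(dst)];
    if (e->valid && e->dst == dst)
        return e->nexthop;
    misses++;
    e->dst = dst;
    e->nexthop = slow_lookup(dst);
    e->valid = 1;
    return e->nexthop;
}
```

Two interleaved flows whose destinations share the same low 8 bits evict each other on every packet, so every lookup misses, while a single flow misses only once. An attacker who knows the hash can produce the first pattern on purpose.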
Therefore, it is more efficient to design a complete forwarding table than to use a route cache. Once again, this is an advertisement for my DXR pro structure.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
