Can massive route tables be stored in hash tables? - hash lookup and trie lookup

Source: Internet
Author: User

Never! Many people say this, including me.
The Linux kernel removed the hash route table long ago; only the trie remains. Even so, I still want to discuss these two data structures in a somewhat philosophical way.
1. hash and trie/radix
hash and trie can actually be unified. Multiple items with the same hash value share a common feature, and extracting that feature is, without doubt, the job of the hash function. A subtree of a trie (or radix tree) also has a common feature: the bits selected by the parent of the subtree's root have the same value in every node of that subtree.
In fact, the trie is a special form of hash whose hash function is "take some bits":

trie_hash(value, level)
{
    return value & level.bits;
}

Seen this way, all nodes of a subtree sit in one "conflict linked list", and descending the trie is "hashing again": the hash function changes with the level, each time taking some lower bits. Viewed this way, what a hash route table must solve, in the case of massive route entries, is the growing length of its conflict linked lists. What should the hash function become? We will come back to that.
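The "hash again at each level" idea can be sketched in C. This is illustrative only: the fixed 4-bit stride and the names STRIDE and trie_level_hash are my assumptions, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define STRIDE 4 /* bits consumed per trie level (assumed, for illustration) */

/* A trie level's "hash function" just extracts the bits that this
 * level discriminates on, starting from the high end of the key.
 * Valid for level 0..7 on a 32-bit key with a 4-bit stride. */
static inline uint32_t trie_level_hash(uint32_t key, int level)
{
    int shift = 32 - STRIDE * (level + 1);
    return (key >> shift) & ((1u << STRIDE) - 1);
}
```

Each level applies the same family of hash functions, only shifted toward lower bits, which is exactly the sense in which a trie is "repeated hashing".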
2. hash in TCAM
TCAM is used in many places. It looks up an index by content and is often used for route lookup and CPU cache lookup. Taking the CPU cache as an example, the content fed into the TCAM is a memory address and the output is an index. Cache matching fetches the cache line indicated by that index and then compares the input address with the address recorded in the line to decide whether it is a hit.
The core step in a TCAM is deriving an index from the address, and the usual approach is a hash. Because it is implemented in hard-wired logic, the hash function cannot involve much computation, so the usual choice is to "take some bits of the address": for example, take bits 4 through 7, 4 bits in total, mapping a 32-bit physical memory address (take a 32-bit system with a physically indexed cache as the example) in slow memory down to a 4-bit fast cache index, forming a pyramid-shaped storage hierarchy. In a mapping from 32 bits to 4 bits, the 28 missing bits produce highly probable conflicts, and it is temporal and spatial locality that makes up for this. What we should take away is the great significance of locality, upon which our entire civilization is built.
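The "take bits 4 to 7" example above can be written down in a couple of lines. The function name cache_index is made up for illustration; the point is that every address differing only in the other 28 bits collides.

```c
#include <assert.h>
#include <stdint.h>

/* Map a 32-bit address to a 4-bit cache index by taking bits 4..7,
 * as in the example in the text: 16 possible cache lines. */
static inline uint32_t cache_index(uint32_t addr)
{
    return (addr >> 4) & 0xFu;
}
```

Two addresses that agree only in bits 4..7 map to the same line; locality is what keeps such collisions from hurting in practice.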
The simplest hash function is the modulo operation, which is itself a form of "taking some bits"; it is the special case of "taking the lowest N bits" (when the modulus is a power of two).
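For a power-of-two modulus this equivalence is a one-liner (mod_pow2 is an illustrative name):

```c
#include <assert.h>
#include <stdint.h>

/* For size a power of two, h % size == h & (size - 1):
 * the modulo keeps exactly the lowest log2(size) bits. */
static inline uint32_t mod_pow2(uint32_t h, uint32_t size)
{
    return h & (size - 1);
}
```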
3. The unified view of hash and trie
A trie is in fact built by hashing from the high bits down to the low bits; its hash function is "take some bits".
4. A dictionary lookup example
When we looked up English words and Chinese characters in elementary school, the dictionary generally offered a phonetic-order method and a radical method, and the two nicely reflect the difference between hash and trie. For convenience, I use English word lookup and Chinese radical lookup as the examples.
English words are strictly ordered along one dimension and use only 26 letters, so they can be looked up with a trie. For example, what, who, and where all begin with the same two characters, wh, so they share that common feature. If this common feature is used as a hash function, then in the set aaa, cc, sahidad, fwfwew, what, qwert, azsx, who, eee, ooo, where, looking up who makes what, who, and where form one conflict linked list. That single step greatly reduces the number of candidates, from 11 to 3; hashing again, from the third character onward, the remaining suffixes at, ere, and o separate directly. The English dictionary method is therefore very simple: a process of repeated hash positioning, where the hash function is "take some consecutive characters".
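The "take some consecutive characters" hash can be sketched as a bucket function over the first two letters. prefix_bucket is a hypothetical name, and the sketch assumes lowercase words of at least two letters:

```c
#include <assert.h>

/* Words sharing a 2-letter prefix land in the same bucket
 * (i.e. the same conflict linked list): 26*26 buckets keyed
 * by the first two lowercase letters. */
static int prefix_bucket(const char *word)
{
    return (word[0] - 'a') * 26 + (word[1] - 'a');
}
```

what, who, and where all fall into the "wh" bucket; everything else in the example set falls elsewhere, which is the 11-to-3 reduction described above.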
Now look at the Chinese radical method. It is a typical repeated-hash process in which the hash function itself must be computed. Take the characters Yang, Lin (forest), tree, horse, ox, pig, Guo, and look up "Lin" among them. A Chinese character is not a one-dimensional sequence but a two-dimensional structure built from strokes with no inherent ordering, so the method of "taking some characters" fails completely (start in which direction? what counts as one unit?). We therefore have to construct a different hash function. Chinese characters, with their long history, retain some pictographic meaning, and by observation we find that the "wood" radical is a shared feature; this computation, that is, the execution of the hash function, is carried out by our brains. If "take some characters" is better suited to hardware, radical extraction is better suited to software, and from this one could even analyze the differences between Chinese and Western thinking. Continuing: once the "wood" radical is found, Yang, Lin, and tree form a conflict linked list, but the number of candidate characters has been greatly reduced. If you do not want to traverse this list, you must hash again: the Xinhua Dictionary designed a second hash function, the count of remaining strokes. With the four strokes that remain in "Lin" after the radical, it locates "Lin". If there are still conflicts, traversal is required, because the publisher apparently could not think of any further hash function (I do not know who invented this character lookup method; presumably it is the masterpiece of the publisher...). Looking back at the English method, it can always reach a final, deterministic position, because its repeated hash function is "take consecutive characters", the word length is finite, and the one-dimensional order advances character by character, so it always reaches the last character.
Do you see the difference? Is there really any difference between trie lookup and hash lookup?
5. hash route table and trie route table
For hash route table lookup, the longest-prefix-match logic is not part of the hashing itself. It comes from a gamble, taken on the premise that the hash function can be trusted: the lookup starts directly at the 32-bit-prefix hash table and falls back step by step toward the 0-bit-prefix hash table, hoping to hit as early as possible in this process. The first match found is the final result.
For trie route table lookup, the longest-prefix-match logic is built into the repeated re-hashing. It keeps the last match rather than the first, because in the repeated "sequentially take some bits" hashing, the final match is obviously the most precise one. This is the essential difference from hash route lookup: a trie lookup takes no gamble, and it never needs to traverse an overlong conflict linked list, because extracting successive bits of the key always steers the lookup toward its destination.
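A minimal sketch of the hash-route-table walk just described, from the /32 zone down to /0, where the first hit wins. The per-prefix-length "zones" are modeled by one tiny linear table standing in for real hash buckets; struct rt and lpm_lookup are illustrative names, not kernel APIs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rt { int len; uint32_t net; int id; };

static const struct rt table[] = {
    { 24, 0xC0A80100u, 1 },   /* 192.168.1.0/24 */
    { 16, 0xC0A80000u, 2 },   /* 192.168.0.0/16 */
    {  0, 0x00000000u, 3 },   /* default route  */
};

/* Walk prefix lengths from longest to shortest; the first zone
 * that matches the masked destination is the final answer. */
static int lpm_lookup(uint32_t dst)
{
    for (int len = 32; len >= 0; len--) {
        uint32_t mask = len ? ~0u << (32 - len) : 0;
        for (size_t i = 0; i < sizeof table / sizeof *table; i++)
            if (table[i].len == len && table[i].net == (dst & mask))
                return table[i].id;   /* first match wins */
    }
    return -1;
}
```

Stopping at the first hit is exactly the gamble described above; a trie lookup would instead keep refining and report the last match.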
6. The case of massive route entries
Linux used a hash-organized route table for so long because it was sufficient. Most of the time the number of route entries is small, so even traversal costs little, while hashing greatly reduces what must be traversed. The gamble is the worst case of traversing an entire conflict chain, and hash remains a wise choice only as long as traversing the whole route table, or even half of it, almost never happens. This resembles the game of a lion chasing an antelope: one risks a meal, the other risks its life. The stakes are strictly asymmetric, which is why the antelope so often wins. (You really cannot treat this game as zero-sum, because sometimes the lion genuinely does not care.)
The question, then, is how to use a hash route table while reducing the risk. Look at Linux's own hash function:
static inline u32 fn_hash(__be32 key, struct fn_zone *fz)
{
    u32 h = ntohl(key) >> (32 - fz->fz_order);
    h ^= (h >> 20);
    h ^= (h >> 10);
    h ^= (h >> 5);
    h &= FZ_HASHMASK(fz);
    return h;
}

It can be seen that the input is scrambled thoroughly enough, but the essence of hashing is mapping a large space into a small one, and conflicts are unavoidable. Some people (me, for example) have suggested organizing a long conflict linked list into a trie when route entries become massive, but does this make sense? A complete trie route table finds the result in at most 32 steps (allowing for compression and backtracking). With hash + trie, the worst case of each step is itself up to 32 steps, 32 steps in total... so it gains nothing.
When route entries are massive, the small hash space has a strict, effectively fixed range. The average case is easily derived from the sizes of the address space and the hash space, and the worst case is a full traversal. If the average case is unacceptable, is it worth gambling for the best case? Therefore: do not use hash tables to store massive numbers of route entries.
However...
7. Locality and DoS
On a 32-bit system the CPU cache is tiny compared with memory. How can it bring such a large optimization, when all addresses mapped to the same cache line conflict with one another? Because the CPU cache exploits the temporal and spatial locality of programs. Routing has no spatial locality; temporal locality can be exploited by a route cache, but hardly by the route table itself. The difference between the route table and a CPU cache is that the table is complete: there is no replacement or aging problem. One can therefore use a hash function for a separate route cache, with the route table consulted only when the route cache misses.
Having analyzed the ideal situation, the rest is the sad part.
Can the temporal locality of network traffic be exploited? A 5-tuple flow does generally pass through a router clustered in time, but if another flow that collides in the hash also passes through, it causes cache thrashing. For a CPU cache this kind of problem can be mitigated by controlling task switching or adding a unique key to the cache line, but for network traffic you cannot prevent the arrival of arbitrary packets: when one arrives, the route table must be queried, and the cache may thrash. More seriously, a route cache is vulnerable to specially crafted packets that render it useless, force constant replacement, or grow the conflict lists without bound, driving up lookup cost.
Therefore, designing a complete forwarding table is more efficient than relying on a route cache. Which is, once again, an advertisement for my DxR Pro structure.
