People often ask on the forums whether a routing table should be stored with the trie algorithm or the hash algorithm. The first thing to establish is how many routes you need to hold. The simple answer is as follows:
Small number of routes: hash algorithm
Massive number of routes: trie algorithm
However, this answer is very amateurish, really very amateurish. A longer answer is not necessarily better, though: it depends on who is asking and what they are trying to do. So no answer can be both simple and complete, and I am only writing this post as notes for later recall.
Take the Linux kernel as an example. With a small routing table, the hash algorithm performs very well. There are 32 hash buckets, one per prefix length (33 if the zero-length default route gets its own), and every route entry hangs on the collision list of the bucket for its prefix length. Because there are not many entries, each collision list is statistically short. The lookup itself is simple; the only variable cost is walking the collision list, and since the list is very short, the lookup is very efficient.
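The per-prefix-length organization described above can be sketched roughly as follows. This is only an illustration, not the kernel's code: the class name `HashRouteTable`, its methods, and the use of a plain Python list as the "collision list" are all assumptions made for the example, and it keeps 33 buckets so that the /0 default route has a place.

```python
import ipaddress


class HashRouteTable:
    """Hypothetical sketch: one bucket per prefix length, each with a chain."""

    def __init__(self):
        # zones[plen] plays the role of the collision list for length plen.
        self.zones = [[] for _ in range(33)]

    def add(self, cidr, next_hop):
        net = ipaddress.ip_network(cidr)
        self.zones[net.prefixlen].append((int(net.network_address), next_hop))

    def lookup(self, addr):
        ip = int(ipaddress.ip_address(addr))
        # Try the longest prefix length first and fall back to shorter ones.
        for plen in range(32, -1, -1):
            mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
            key = ip & mask
            # Walking this chain is the only cost that grows with table size.
            for network, next_hop in self.zones[plen]:
                if network == key:
                    return next_hop
        return None


table = HashRouteTable()
table.add("0.0.0.0/0", "gateway")
table.add("10.0.0.0/8", "A")
table.add("10.1.0.0/16", "B")
print(table.lookup("10.1.2.3"))
```

With only a handful of routes every chain holds one or two entries, which is why this layout is fast for small tables and slows down as the chains grow.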
In the hash organization of the routing table the number of buckets is fixed, so as route entries are added the collision lists inevitably grow, and with a massive number of entries the performance of the hash algorithm degrades. This is where the advantage of the trie algorithm becomes obvious. I do not want to explain the reason in detail: it is early morning, I have to go to work soon, and I would rather not start the day with painful details. An intuitive view is still possible, though. If you see the core of the hash algorithm as "it sets up 32 hash buckets", then the core of the trie algorithm is "it compares the 32 bits of the IP address". Ignoring space consumption, it walks a 32-level tree and has the answer after at most 32 comparisons. As I said in an earlier post, a real tree is bound to be compressed and optimized, so the cost may include a little backtracking and a few extra traversal steps, but the essence is the same.
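To make the "compare the 32 bits" intuition concrete, here is a minimal sketch of an uncompressed binary trie doing longest-prefix match. It is deliberately naive: the names `TrieNode` and `BinaryTrie` are made up for the example, and there is no path or level compression of the kind a real implementation would use.

```python
import ipaddress


class TrieNode:
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]  # child for bit 0 and for bit 1
        self.next_hop = None          # set only where a prefix ends


class BinaryTrie:
    """Hypothetical sketch: one tree level per address bit, no compression."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, cidr, next_hop):
        net = ipaddress.ip_network(cidr)
        ip = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (ip >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, addr):
        ip = int(ipaddress.ip_address(addr))
        node = self.root
        best = node.next_hop  # a /0 route, if any, lives at the root
        # At most 32 steps, no matter how many routes are stored.
        for i in range(32):
            node = node.children[(ip >> (31 - i)) & 1]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop  # remember the longest match so far
        return best


trie = BinaryTrie()
trie.add("10.0.0.0/8", "A")
trie.add("10.1.0.0/16", "B")
print(trie.lookup("10.1.2.3"))
```

The lookup cost is bounded by the address width rather than by the number of entries, which is the point of the comparison with fixed hash buckets.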
Therefore, do not use hash to store a massive routing table. How can the advantages of the two be combined? An obvious idea is to replace each hash bucket's collision list with a trie. A match or mismatch can then be determined early, and on a mismatch the lookup moves on to the next hash bucket; organizing the collision list as a trie gives "fast failure". Unless all of your route entries are 32-bit host routes, a lookup will sooner or later fall back to the 24-bit prefix bucket, the 16-bit prefix bucket, and so on. Either way, it is better than simply traversing a hash list. Or go the other way and bring hashing into the trie algorithm; anything is possible.
That ends the agonizing over algorithms. If your answer stops here, you clearly understand the system well, but you are still confined by the question itself. Suppose someone asks me what data structure should store the routing table, and whether the hash algorithm or the trie algorithm is faster. What if I answer: neither, fast switching is the fastest! The questioner may feel that my answer does not address the question, but I have offered a better solution. Next, let's talk about that solution.
On high-end routers (if you have never touched a Cisco or Huawei router, feel free to skip this), the routing subsystem generally stands alone as the control plane, and the core of the data plane is fast switching. The routing table is built from static configuration, dynamic routing protocols, route redirects, link-layer discovery, and so on. Its entries are then injected into the switching board through a series of hardware interfaces, and the switching board as a whole holds a hardware switching table. Any operation on the switching board is independent of the routing subsystem, and you can think of them as two separate systems. The routing subsystem is driven by its own CPU and is the slow system. The switching subsystem may have no CPU at all and forward entirely on-chip, although it may still have a CPU to handle exceptional packets, control messages, and management messages.
As for the internals of the switching board, any university textbook will cover them. What must be understood is that the switching board is hardware, and its design philosophy is completely different from that of software: many algorithms that are efficient in software are rejected outright because of cost or space overhead. Moreover, most switching techniques are index-positioning techniques rather than search techniques, because a search algorithm depends on a lot of intermediate state, and in hardware a stateful system is hard to maintain or needs a lot of space for its state. The CPU cache is a helpful analogy, and the switching table can be seen as a cache of the routing table. Both depend on content-addressable lookup: a CPU cache matches address tags associatively, and a switching table commonly uses TCAM, a memory that can locate an index from a key at high speed. Entries sit in a structured hardware table and are reached by index, and the index is either computed by some "algorithm" or obtained from the TCAM. The most common algorithm is not a tree but a hash, because it is simple enough. Of course, my other post, "A schematic of routing item positioning based on the idea of the DXR algorithm," shows another approach.
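The difference between index positioning and searching can be shown with a small software sketch. It is not a model of any real switching chip and it is not the DXR algorithm: it only expands prefixes into a flat array indexed by the top 16 bits of the address, so that a lookup is a single array access with no intermediate state. `DirectIndexTable`, `INDEX_BITS`, and the restriction to prefixes of /16 or shorter are assumptions made to keep the example short.

```python
import ipaddress

INDEX_BITS = 16  # assumed index width for the sketch


class DirectIndexTable:
    """Hypothetical sketch: prefixes expanded into a directly indexed array."""

    def __init__(self):
        size = 1 << INDEX_BITS
        self.slots = [None] * size    # next hop for each 16-bit index
        self.slot_len = [-1] * size   # prefix length that owns each slot

    def add(self, cidr, next_hop):
        net = ipaddress.ip_network(cidr)
        if net.prefixlen > INDEX_BITS:
            raise ValueError("this sketch only handles prefixes up to /16")
        first = int(net.network_address) >> (32 - INDEX_BITS)
        count = 1 << (INDEX_BITS - net.prefixlen)
        for i in range(first, first + count):
            # A longer prefix always wins, whatever the insertion order.
            if net.prefixlen >= self.slot_len[i]:
                self.slots[i] = next_hop
                self.slot_len[i] = net.prefixlen

    def lookup(self, addr):
        ip = int(ipaddress.ip_address(addr))
        # No loop and no comparison: the address itself is the index.
        return self.slots[ip >> (32 - INDEX_BITS)]


fib = DirectIndexTable()
fib.add("10.1.0.0/16", "B")
fib.add("10.0.0.0/8", "A")
print(fib.lookup("10.1.2.3"))
```

All the work is moved to insertion time, which fits the picture of a slow routing subsystem preparing a table that the forwarding side only has to index.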
OK, the little one is up. This is not finished, but I will stop writing here; a new day has begun.
Fast lookup in massive routing tables: hash, trie, and fast switching