This is an essay on an assigned topic. I had been wanting to write something for a while but could not find a subject. Then I received an e-mail asking about the layout of the Linux routing table and about the routing cache, and a few days earlier I had helped someone build a routing table, so this seemed like a good topic. I simply spent most of the weekend on it, and here is my essay.
Enough preamble. It has been 11 years since I last wrote an assigned-topic essay; the last time was in the examination room of the entrance exam. When the e-mail asked me to write this one, I actually wanted to refuse: you cannot ask me to write and expect me to start immediately. First I have to understand the subject. I cannot just paste in a pile of pictures found on Baidu and recite clichés, so that readers can tell at a glance that it was reposted or translated. Besides, nobody grades it and there is no reward, so better not to write at all ... Then I went through the ideas and found that these things are close to my heart. After thinking about it for a morning, I went to Jiading New World for a small hotpot and a little wine, then came back to prepare the figures. At least I know this material very well. These days I tinker with it whenever I have time, and I have also tinkered with it together with friends, much as other people are drawn to studying history ...
In fact, once these words and figures were sorted out, I felt gratified. The last time I drew diagrams like these was in 2011, when my little one had just been born. I stayed at the hospital at night, went home to rest during the day, and then wrote essays and the like, many of which were never published on the blog because they had nothing to do with IT. Anyway, I have made some progress in the past few years, although in some areas I have regressed a lot.
Recently everyone is busy changing jobs and job hunting. Please remember that the first interview question is very important, really. The vision must be big and the wording professional ... But those are superficial; to answer such a question with ease you still need some accumulated knowledge. I dare not call myself an expert in networking. I am closer to a rookie, or at best a veteran with a little accumulated experience, so I wrote this essay hoping it brings some help to those who make a living on the lower layers of IP networking. According to the predictions of many experts and my own feeling, IP will see big changes in the coming years, not IPv6 but something else, related to the Internet of Things, the cloud, opportunistic networks and self-organizing networks. So get your horse stance solid first; only then can you aim the slingshot accurately.
1. IPv4 address space tree
The entire IPv4 address space can form a perfect binary tree, because it completely fills the 4G (2^32) address space. This tree is shown below:
Note that it is impossible to draw this picture completely. Even if each node were only 1 mm in diameter (so small that you would need a magnifying glass to read anything stored in the circle; I do not write anything in the circles anyway, for fear of lossy compression, which turns a digital picture under a magnifying glass into nothing but blocks, formally known as mosaic, a Greek/Roman heritage), the lowest level would be about 4000 km long (2^32 nodes at 1 mm each is roughly 4295 km), about the distance from Mohe in Heilongjiang in the northeast to Shigatse in Tibet. Reading the whole picture would amount to a trip across China ... So I can only draw part of it, and since the whole picture cannot be shown anyway, I do not need to shrink the nodes to 1 mm.
With this diagram in mind, you will find that given an IP address you can start from the root, walk down one bit at a time, and finally arrive at its place at the bottom. The process is fast, only 32 steps, so fast that you would never guess the bottom level spans the distance from Mohe to Shigatse. But if you have heard of folding a sheet of paper until it reaches the moon, you will understand at once: this is the charm and the danger of exponentials. Keep this picture in your head, add some information to the intermediate nodes, and you will thoroughly understand the IP routing table lookup process. Let's start. Everything below considers IPv4 only.
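The 32-step walk can be sketched in a few lines of C. This is only an illustration under an assumption of mine: the nodes of the perfect tree are numbered heap-style (root is 1, the children of node k are 2k and 2k+1), and `walk_full_tree` is a hypothetical name, not something from any kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Walk the perfect binary address tree with heap numbering:
 * the root is node 1, the children of node k are 2k (bit 0, left)
 * and 2k+1 (bit 1, right). Returns the number of the leaf reached
 * after consuming all 32 bits, most significant bit first. */
uint64_t walk_full_tree(uint32_t addr)
{
    uint64_t node = 1;
    for (int level = 0; level < 32; level++) {
        unsigned bit = (addr >> (31 - level)) & 1u;
        node = 2 * node + bit;
    }
    return node;
}
```

With this numbering the leaf reached is always 2^32 + addr, which is another way of saying that the position of a leaf on the bottom level is the address itself.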
2. Route lookup
Route lookup means finding a next hop for a destination IP address, that is, finding a route entry. The key of a route entry is a bit prefix, such as 192.168.0.0/16, 172.16.20.0/24, 1.2.3.0/28 or 3.4.5.6/32. These keys are written like IP addresses; the difference is that they only consider some leading contiguous bits of the 32, not necessarily all of them. A route key with a 32-bit prefix corresponds to a leaf node at the lowest level, and a route key with a prefix shorter than 32 bits corresponds to an intermediate node. So, starting from the IPv4 address tree, we get a new tree, the IP address tree with routing nodes:
The lookup process is now clear: as the destination IP address walks down the tree, find the most precise route node on its path. The closer a route node is to the leaves, the more precise it is, because it is "closer to the destination".
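Before any tree is involved, "the most precise matching route" can be pinned down with a brute-force scan. The following is a minimal sketch, not an implementation from any system; `struct route`, `plen_to_mask` and `lpm_linear` are hypothetical names, and addresses are assumed to be in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical route entry: key/plen form the prefix. */
struct route {
    uint32_t key;     /* prefix bits, bits beyond plen are 0 */
    uint8_t  plen;    /* prefix length, 0..32 */
    int      nexthop; /* opaque next hop id */
};

/* Netmask for a prefix length. plen == 0 is special-cased because
 * shifting a 32-bit value by 32 is undefined in C. */
uint32_t plen_to_mask(uint8_t plen)
{
    return plen == 0 ? 0 : 0xFFFFFFFFu << (32 - plen);
}

/* Return the matching route with the longest prefix, or NULL. */
const struct route *lpm_linear(const struct route *tbl, size_t n,
                               uint32_t dst)
{
    const struct route *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if ((dst & plen_to_mask(tbl[i].plen)) != tbl[i].key)
            continue;                 /* prefix does not cover dst */
        if (best == NULL || tbl[i].plen > best->plen)
            best = &tbl[i];           /* keep the most precise match */
    }
    return best;
}
```

With the routes 0.0.0.0/0, 192.168.0.0/16 and 192.168.1.0/24 in the table, 192.168.1.5 selects the /24, 192.168.2.5 the /16, and 8.8.8.8 the default route. Every algorithm below has to return the same answers, only faster.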
There are two ways to reach that route node, which may be an intermediate node or a leaf.
2.1. Looking up from the leaves
If we already knew where the input IP address sits at the lowest level, then walking toward the root, the first route node encountered would definitely be the most precise. However, an assumption is not reality; this is reasoning backward from the result. We do not know the actual position of the input IP, and we do not want to travel from Mohe to Shigatse, so the problem to solve is how to shorten the path. The questions are:
1). Is there any route with a 32-bit prefix? If so, is one of them equal to the input IP address? If yes, that is the result; if not ...
2). Is there any route with a 31-bit prefix? If so, for which one (or which ones: load balancing? metric? ...) do the first 31 bits equal the first 31 bits of the input IP? If there is one, that is the result; if not ...
3). Is there any route with a 30-bit prefix? If so, for which one (or which ones: load balancing? metric? ...) do the first 30 bits equal the first 30 bits of the input IP? If there is one, that is the result; if not ...
4). Is there any route with a 29-bit prefix? If so, for which one (or which ones: load balancing? metric? ...) do the first 29 bits equal the first 29 bits of the input IP? If there is one, that is the result; if not ...
...
The logic is simple; the question is how to know whether such a route exists, that is, how to implement the algorithm. The answer is also obvious: sort all route entries by prefix length from long to short, with one linked list per prefix length, each list node being a route entry, as follows:
/32 prefix list: node32-1, node32-2, node32-3, node32-4
/31 prefix list: node31-1, node31-2, node31-3, node31-4 ...
/30 prefix list: null
/29 prefix list: node29-1, node29-2
...
The simplest approach is to take the input IP address and traverse the structure above, lists from top to bottom and each list from left to right. But with things this clear, anyone would think of using a hash to speed it up. The structure then becomes:
/32 prefix hash table: hash1 (node32-1, node32-4), hash2 (node32-3), hash3 (node32-2)
/31 prefix hash table: hash1 (node31-8, node31-5, node31-3), hash2 (node31-1), hash3 (...)
/30 prefix hash table: hash1 (null), hash2 (null), hash3 (null)
/29 prefix hash table: hash1 (node29-2), hash2 (null), hash3 (node29-1)
...
This removes the need to traverse every per-prefix list in full. For each prefix hash table we only compute
bucket = hashcalc(ip_input) (with ip_input masked to the prefix length of that table), and then walk the short collision chain hash[bucket] of that table.
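As a rough sketch of this structure: one small hash table per prefix length, tried from /32 down to /0. Everything here is an assumption for illustration; `NBUCKETS`, `struct rt_table`, `rt_insert` and `rt_lookup` are hypothetical names, and this `hashcalc` is a stand-in that also takes the prefix length, not the hash function of any real kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 16              /* assumed bucket count per prefix length */

struct rt_entry {
    uint32_t key;                /* prefix, host byte order */
    int nexthop;
    struct rt_entry *next;       /* collision chain */
};

/* zone[plen] is the hash table of all routes with that prefix length. */
struct rt_table {
    struct rt_entry *zone[33][NBUCKETS];
};

uint32_t mask_of(int plen)
{
    return plen == 0 ? 0 : 0xFFFFFFFFu << (32 - plen);
}

/* Stand-in hash: any function of the masked key would do. */
unsigned hashcalc(uint32_t key, int plen)
{
    uint32_t h = plen == 0 ? 0 : key >> (32 - plen); /* drop the zero tail */
    h ^= h >> 16;
    h ^= h >> 8;
    return h % NBUCKETS;
}

/* The caller provides the storage for the entry. */
void rt_insert(struct rt_table *t, struct rt_entry *e,
               uint32_t key, int plen, int nexthop)
{
    unsigned b;
    e->key = key & mask_of(plen);
    e->nexthop = nexthop;
    b = hashcalc(e->key, plen);
    e->next = t->zone[plen][b];
    t->zone[plen][b] = e;
}

/* Try /32 first, then /31, ... down to /0; the first hit is the
 * longest matching prefix. */
const struct rt_entry *rt_lookup(const struct rt_table *t, uint32_t dst)
{
    for (int plen = 32; plen >= 0; plen--) {
        uint32_t key = dst & mask_of(plen);
        const struct rt_entry *e = t->zone[plen][hashcalc(key, plen)];
        for (; e != NULL; e = e->next)
            if (e->key == key)
                return e;
    }
    return NULL;
}
```

The sketch probes all 33 prefix lengths for clarity. An obvious refinement is to keep a list of only the prefix lengths that actually contain routes, so that empty tables are never visited.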
Bottom-up lookup always starts with the most precise prefixes and aims to find the first matching route entry. Top-down lookup, described in detail below, starts with the least precise prefix and aims to find the last matching route. At first glance, bottom-up looks like the better choice.
The bottom-up algorithm is so obvious that more text would be superfluous, so I want to leave the remaining space for another, diametrically opposed algorithm.
Note: isn't this bottom-up algorithm exactly the Linux routing table hash lookup algorithm?
2.2. Looking up from the root
Start from the IPv4 address space tree (the address tree for short). Because it is a binary tree whose levels match the 32 bits of an IPv4 address one by one, walking down from the root always ends at the destination IP address, and the last route node (black) passed on the way is the route we are looking for. This is easy to see on the IPv4 address tree with routing nodes.
A top-down search along the IPv4 address tree does not take much time, at most 32 comparisons, but building the full IPv4 address tree takes far too much space. That must be avoided: such a large structure cannot make good use of the CPU cache, and it is unnecessary anyway, because there are better ways to solve route lookup. To get there, we need to transform this IPv4 address tree with routing nodes.
The huge IPv4 address tree is only a way to understand the principle of route lookup. What we should really focus on is how a specific IPv4 address, the destination address, can quickly reach a route node, or quickly learn that there is none. To achieve this we have to skip the many "hollow nodes" that carry no route entry, so we compress the IP address tree until only a minimal number of hollow nodes remain, which exist only to guide us quickly to a solid route node.
Looking at the IP address tree with routing nodes again, a destination IP address ends up using the last route node on the path from the root to its leaf (that is, to itself). In other words, the closer a route entry is to the leaf, the more it is preferred, because it is more precise. After the IP address tree is compressed, all route nodes are preserved, and we may be left with a subtree like this:
How should we transform this subtree to make the walk from root to leaf easier? I give two ways:
Method 1: merge route nodes with the same key into one new node that carries a mask list
We will see later which method is more effective. Both BSD and Linux use this one. With it, all route nodes are moved down to the leaves, which is very convenient for level compression.
Method 2: add meta information describing the path to each route node
As shown, each route node carries several pieces of path meta information:
left more: whether the left subtree contains further route entries. If it does, the current node matches, and the next step goes left, continue into the left subtree to look for a more precise match.
right more: the same, with left replaced by right.
pre node idx: the index of the previous route node on the path (all route entries must be indexed for this). If the current node does not match, the previous node is used directly; if the current node matches and the relevant left/right more is 0, the current node is used.
Because the IP address tree is compressed (as much as possible), this method is not as widely applicable as Method 1. With compression, even if the current route node does not match, the previous one does not necessarily match either. Moreover, with dynamic path compression and level compression the relationships between nodes change dynamically, and so does which nodes are leaves, so Method 2 has to keep tracking the relationships between nodes, which is costly.
Compression of the IPv4 address tree
Compression of the address tree has been mentioned repeatedly above; now it can be described on its own. Compression not only saves space but also reduces lookup time (though not always and not absolutely: with pathological backtracking it can make lookup time worse). Compression has a price: a lookup may need to backtrack, whereas a lookup in the full, uncompressed IP address tree never does. Sometimes backtracking is avoided even with compression, but that depends on the input IP address, not on the algorithm. So it is a tradeoff, and the risk is worth taking.
My description of compression assumes that everything builds on the transformations of the previous section. Compression of the address tree is really just standard binary tree compression, and there are two schemes: path compression and level compression.
1. Path compression
Path compression is simple; one diagram makes it clear. I am not going to operate on the huge IP address tree, it is too big; I just use the part above to illustrate the principle:
As you can see, path compression is fairly simple. Its purpose is to drop nodes unrelated to route entries and keep only route entry nodes and the nodes that guide toward them. When looking up a route, an input IP address is not compared bit by bit; only the check bits of the nodes are compared, and following one path it finally arrives at a leaf node, which is the candidate route entry.
Because route entries are merged using Method 1, the mask list of the leaf must be checked against the input IP one by one, selecting the route entry with the longest mask. However, because of path compression this step may fail. Let me explain why. Look at the X bits on the path to a route node. For the key of a route entry (for example 192.168.1.0/24, whose key is 192.168.1.0), the walk assumes that all skipped bits are 0, or all 1, or some fixed pattern such as 11010; in other words, they are not checked, they are ignored. The corresponding bits of the input IP address may differ from the key, in which case the final exact match at the leaf fails. At that point we need to backtrack, which is the subject of the next section.
The walk itself is very simple. Each node looks like this:
struct node {
    u8 pos;
    struct node *child[2];
};
Take the node's pos and look at bit pos of the input IP address: if it is 0, go left via child[0]; if it is 1, go right via child[1].
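The walk can be sketched as follows. The structure in the text only has pos and child[2]; to make the example self-contained I added a key and a prefix length to the leaves, which is my assumption, as are the names `struct pc_node`, `bit_at` and `pc_lookup`. Bit 0 is taken to be the most significant bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pc_node {
    uint8_t pos;                 /* bit to test, 0 = most significant */
    struct pc_node *child[2];    /* both NULL for a leaf */
    uint32_t key;                /* leaf only: route key (assumed field) */
    uint8_t plen;                /* leaf only: prefix length (assumed field) */
};

/* Bit `pos` of addr, counting from the most significant bit. */
unsigned bit_at(uint32_t addr, uint8_t pos)
{
    return (addr >> (31 - pos)) & 1u;
}

/* Follow only the tested bits down to a leaf, then verify the skipped
 * bits with one full comparison. NULL means the walk reached a hole or
 * the skipped bits did not match; a real lookup would backtrack here. */
const struct pc_node *pc_lookup(const struct pc_node *n, uint32_t dst)
{
    while (n != NULL && (n->child[0] != NULL || n->child[1] != NULL))
        n = n->child[bit_at(dst, n->pos)];
    if (n == NULL)
        return NULL;
    uint32_t mask = n->plen == 0 ? 0 : 0xFFFFFFFFu << (32 - n->plen);
    return (dst & mask) == n->key ? n : NULL;
}
```

For a root that tests only bit 0, with 10.0.0.0/8 on the left and 192.168.1.0/24 on the right, 192.168.1.7 reaches the right leaf and matches, while 192.168.2.7 reaches the same leaf and fails the final comparison. That failure is exactly the ignored-bit problem described above.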
2. Level compression
Level compression is not difficult either. Basically it squashes a tall tree into a fat one, turning the binary tree into an N-ary tree, as follows:
Level compression needs little explanation; it is a standard operation. Since the children are stored as an array, it may be better not to compress when the nodes are sparse. Whether to do it follows a specific rule: level compression is worthwhile only when, starting at some pos, nearly all of the 2^bits child slots addressed by the next bits bits are actually occupied.
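The rule can be read as a fill-ratio test on the child array. A minimal sketch, assuming the children are stored as an array of pointers with NULL for empty slots; `used_slots`, `dense_enough` and the percentage threshold are hypothetical and not taken from any particular implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Count the occupied slots among the 1 << bits children of a node. */
size_t used_slots(void *const *child, unsigned bits)
{
    size_t used = 0;
    for (size_t i = 0; i < ((size_t)1 << bits); i++)
        if (child[i] != NULL)
            used++;
    return used;
}

/* Nonzero if at least min_fill_pct percent of the slots are occupied,
 * i.e. compressing `bits` levels into one node would not waste much. */
int dense_enough(void *const *child, unsigned bits, unsigned min_fill_pct)
{
    size_t slots = (size_t)1 << bits;
    return used_slots(child, bits) * 100 >= slots * min_fill_pct;
}
```

What threshold counts as "nearly full" is a tuning decision; the sketch leaves it to the caller.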
Level compression faces the same ignored-bit problem as path compression, as shown in the figure:
The walk after level compression is similar to the binary tree case and differs only by the compression depth. Each node looks like this:
struct node {
    u8 pos;
    u8 bits;
    struct node *child[0];
};
Take the node's pos and bits, extract idx = ip_input[pos ... pos+bits] from the input IP address, take child[idx] as the next node, and continue downward. To better support dynamic level compression, a purely path-compressed binary node is covered by the same structure, with bits equal to 1.
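The index extraction and the descent can be sketched like this. The names `extract_bits`, `struct lc_node` and `lc_descend` are hypothetical; I use a plain pointer to the child array instead of the trailing array from the text so that the example is easy to initialize, and I mark a leaf with bits == 0, which is also my own convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extract `bits` bits of addr starting at bit `pos` (0 = most
 * significant). Requires bits >= 1 and pos + bits <= 32. */
uint32_t extract_bits(uint32_t addr, unsigned pos, unsigned bits)
{
    return (uint32_t)(addr << pos) >> (32 - bits);
}

struct lc_node {
    uint8_t pos;
    uint8_t bits;               /* 0 marks a leaf in this sketch */
    int nexthop;                /* leaf payload (assumed field) */
    struct lc_node **child;     /* 1 << bits slots, entries may be NULL */
};

/* Descend using bits [pos, pos+bits) at every internal node. Returns
 * the leaf reached, or NULL if an empty slot is hit. The key of the
 * leaf is not verified here. */
const struct lc_node *lc_descend(const struct lc_node *n, uint32_t dst)
{
    while (n != NULL && n->bits != 0)
        n = n->child[extract_bits(dst, n->pos, n->bits)];
    return n;
}
```

For example, `extract_bits(0xC0A80101, 16, 8)` yields 0x01, the third octet of 192.168.1.1. With bits fixed at 1 the descent degenerates into the binary walk of the path-compressed tree.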
One more thing about level compression: we always want the final tree to be regular and good-looking, so the tree needs some transformation before compression, for example turning a lopsided tree into a complete one. Fortunately, classless route lookup is longest prefix matching. In the IP address tree, the route entry used by an IP address is the nearest route entry on the path from that address to the root. Seen the other way, each route entry covers a range of leaf nodes at level 32 of the address tree, and by the longest prefix rule, where ranges overlap, the range of the deeper route entry automatically overrides that of the shallower one. So if we want better level compression, the method is simple:
1). Move the intermediate route nodes down to the leaf nodes.
2). Apply dynamic compression or direct compression.
To illustrate the idea, let's look at a diagram first:
2.1). Dynamic level compression
The Linux trie algorithm uses dynamic compression: the fan-out of the subtree rooted at each intermediate node is not fixed but depends on the distribution of route entries. The principle is to save as much space as possible. The sparser the routes at a level, the smaller the fan-out, for example staying a binary tree; if a level is densely and continuously populated, the larger the fan-out the better. Compare the design of the MMU: whether to use a two-level or three-level page table, and why not a single-level one, depends on the layout of the virtual address space. Most programs do not occupy the whole address space, so globally the distribution is sparse, but locally it is contiguous, which is why two- or three-level page tables work better.
2.2). Direct level compression
There is not much to say here. Direct compression is more intuitive, easier to understand and easier to hand over to hardware, and you can readily associate it with interval lookup. In fact, the two really are related.
From here on, "compression of the routing tree" means the combined transform of the IPv4 address tree with routing nodes: path compression + level compression + merging route entries into prefix lists.
Note: Linux and BSD
It is a little early to say this here, but it can be said. Do not get hung up on names such as radix tree or trie (as the saying goes, "the Way that can be spoken of is not the constant Way", however you choose to punctuate it). In fact the Linux trie is a hybrid LC-trie algorithm: path compression + level compression + merged prefix lists. BSD's was the standard pure binary tree algorithm with path compression + merged prefix lists, although it is also called the radix tree algorithm.
3). Problems caused by compression
As I said earlier, compression can cause problems. Compression is a gamble, and the risk is a tradeoff worth making. Since the gamble can be lost, finding a route entry in a compressed routing tree is a two-step process:
A. Exact matching of route entries
The destination IP address, ip1, is the input to the routing tree lookup and eventually arrives at a leaf by the standard descent:
1). Take root's pos and bits, compute the value of ip1[pos ... pos+bits] as the child index, and look at root->child[index];
2). If it is not a leaf, assign root->child[index] to root and go back to 1);
3). If it is a leaf, go through the leaf's mask list, sorted from the longest prefix to the shortest, and for each prefix compare ip1 & prefix with the key of the leaf node. If they are equal, return that result; if none is equal, go to B.
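Step 3 can be sketched as a loop over the merged prefix list of the leaf. The types and names (`struct leaf`, `struct leaf_info`, `check_leaf`) are hypothetical, and the list is assumed to be sorted by prefix length, longest first.

```c
#include <assert.h>
#include <stdint.h>

struct leaf_info {
    uint8_t plen;                 /* prefix length of one merged route */
    int nexthop;
};

struct leaf {
    uint32_t key;                 /* key shared by all merged routes */
    int n;                        /* number of entries in info[] */
    const struct leaf_info *info; /* sorted by plen, longest first */
};

/* Return the next hop of the longest prefix in the leaf that matches
 * ip1, or -1 if none matches and the caller has to go to step B. */
int check_leaf(const struct leaf *l, uint32_t ip1)
{
    for (int i = 0; i < l->n; i++) {
        uint8_t plen = l->info[i].plen;
        uint32_t mask = plen == 0 ? 0 : 0xFFFFFFFFu << (32 - plen);
        if ((ip1 & mask) == l->key)
            return l->info[i].nexthop;
    }
    return -1;
}
```

Take a leaf with key 192.168.0.0 that merges a /24 and a /16. For 192.168.0.5 the /24 matches first; for 192.168.3.5 the /24 fails and the /16 matches; for 192.169.3.5 nothing matches and the lookup has to backtrack.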
The question now is: if step 3 matches, is that route's prefix really the longest among all matching routes? The answer is yes. Without compression, the last route node we reach is definitely the most precise. With compression we take a risk: we assume that every skipped bit of the route entry equals the corresponding bit of our input ip1. Only if they really are equal will the final comparison at the leaf succeed, and when it succeeds we know that the gamble paid off and the bits really were equal. The final leaf may combine many route nodes from higher levels, but its prefixes are sorted so that the longest one is matched first. What if the gamble fails? Then we backtrack, which is described next.
B. Backtracking match with decreasing prefix
This is also called longest prefix matching, but to avoid confusion with the longest prefix match of route lookup itself, I call it the prefix decrement process. It is only used when step A fails. The penalty for a failed gamble is backtracking, which pays extra time for the space saved by compression. But think about it: without compression, wouldn't the bit-by-bit comparison cost a lot of time too? And pathological backtracking can be avoided by balancing: by the theory above, if the tree is squashed reasonably fat, pathological backtracking is avoided.
Backtracking in the compressed routing tree
On this topic I again describe only a local subtree, because the journey from Mohe to Shigatse is too long.
I have always liked illustrating a problem with an example. For backtracking, the first example shows the two processes, exact match and prefix decrement match:
Well, that is a complete example. I like to follow an example with a general explanation, but summarizing a generic algorithm turned out to be too hard, so I took the Linux code out again, dropped some details, and present a combined algorithm:
Routing tree lookup algorithm
{
    pn = root;
    chopped_off = 0;
    ip_prefix = 32;
    while (pn) {
        pos = pn.pos;
        bits = pn.bits;
        if (!chopped_off)
            cindex = (ip_input masked to ip_prefix bits)[pos ... pos+bits];
        n = child number cindex of pn;
        if (n does not exist)
            goto backtrack;
        if (n is a leaf route node) {
            // walk the prefix list of n, longest prefix first
            for each prefix in the prefix list of n:
                if ((ip_input & prefix) == n.key) match;
            if (no match)
                goto backtrack;
            found, exit;
        }
        cn = n;
        // obviously, when backtracking runs into a non-leaf node,
        // the walk should keep to the leftmost path
        if (ip_prefix < pos+bits) {
            if (cn.key[ip_prefix ... cn.pos] != 0 || !cn.child[0])
                goto backtrack;
        }
        pn = n;
        chopped_off = 0;
        continue;
backtrack:
        chopped_off++;
        // skip the 0 bits (they do not affect the result), then turn
        // the next 1 into 0
        // e.g. with 7 index bits and cindex = 0101100, cindex is updated as:
        //   0101000 - chopped_off = 3; ip_prefix = pos+7-3
        //   0100000 - chopped_off = 4; ip_prefix = pos+7-4
        //   0000000 - chopped_off = 6; ip_prefix = pos+7-6
        while ((chopped_off <= pn.bits) && !(cindex & (1 << (chopped_off-1))))
            chopped_off++;
        // update ip_prefix
        if (ip_prefix > pn.pos + pn.bits - chopped_off)
            ip_prefix = pn.pos + pn.bits - chopped_off;
        if (chopped_off <= pn.bits) {
            // clear the rightmost 1
            cindex &= ~(1 << (chopped_off-1));
        } else {
            parent = parent of pn;
            if (parent does not exist)
                lookup fails, exit;
            cindex = pn.key[parent.pos ... parent.pos + parent.bits];
            pn = parent;
            chopped_off = 0;
            goto backtrack;
        }
    }
}
On the subject of backtracking, I finally want to present a scheme that needs no backtracking.
Backtracking exists to correct the error caused by compression. The error is that the skipped bits are matched fuzzily rather than exactly; in effect the walk took a wrong turn, as shown in the figure:
To make up for this error we need to backtrack, as mentioned earlier. Below are two ways to make up for it, one of which does not backtrack. To isolate the problem, in this example I did not move the route entries down to the leaf nodes:
The way on the right is easier to understand. It does not backtrack; instead, every final node holds a list of all route entries that cover its position, ordered from lower level to upper level, that is, in descending order of precision. So even after a wrong turn we only need to take the "next element" of the list. It is a bit like a hash collision chain. Why is such an obvious approach not widely used? Because the maintenance cost is too high: all route entries become interrelated, and inserting, deleting or updating one route entry can, in the worst case, touch all the others, since the algorithm is based on coverage relationships. So it is still better to keep route entries independent.
Next comes interval matching, which is also based on coverage but is quite different: interval matching splits and remaps route entries rather than simply linking them as a whole.
2.3. IP address interval lookup
Look at the IP address tree with routing nodes once more. Going down from the root, the route node closest to the leaves hides all other route nodes on its path to the root; that is, it alone is responsible for all the leaf IP addresses that can be reached downward from it. Obviously this is an interval. It is easy to show that each IP address belongs to exactly one such interval, and that exactly one route entry is responsible for each interval.
The problem of finding a route for an IP address thus becomes the problem of finding the interval that the IP address belongs to. As shown below, I again use a subtree to illustrate, reusing the figure from the level compression section:
You can see that whenever a lower route node exists, it overrides the route entries found above it, and that the default route entry is clearly the root.
Didn't we abandon the full address tree long ago? Yes! Interval lookup does conceptually need the complete address tree, but there are better ways to build an efficient lookup structure. To begin with, we can construct a static array whose elements are:
struct element {
    u32 start;
    u32 end;
    struct rt_node *node;
};
Each element says that one interval corresponds to one route entry. The table is built statically when routes are inserted. We can then take the destination IP address as input, find which interval it belongs to, and the node of that element is the result. If the array is organized by route entry, its size is the number of route entries, generally no more than 256 (the IP addresses reachable in one hop); if it is organized by interval, its size is the number of intervals into which the route entries divide the whole IP address space. That number depends on the number of route entries and on how they aggregate and how they are distributed; with a sensible configuration it will not be large. What remains is the interval lookup algorithm itself.
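As a minimal sketch of the lookup over such an array, assuming it is sorted by start, the intervals do not overlap, and end is inclusive. `range_lookup` is a hypothetical name, and `struct rt_node` is reduced to a placeholder with a single field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rt_node { int nexthop; };   /* placeholder payload */

struct element {
    uint32_t start;                /* first address of the interval */
    uint32_t end;                  /* last address, inclusive */
    struct rt_node *node;
};

/* Binary search over intervals sorted by start. Returns the route
 * node responsible for dst, or NULL if no interval contains it. */
struct rt_node *range_lookup(const struct element *tbl, size_t n,
                             uint32_t dst)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (dst < tbl[mid].start)
            hi = mid;              /* dst lies left of this interval */
        else if (dst > tbl[mid].end)
            lo = mid + 1;          /* dst lies right of this interval */
        else
            return tbl[mid].node;
    }
    return NULL;
}
```

As an example, the three routes 0.0.0.0/0, 192.168.0.0/16 and 192.168.1.0/24 split the address space into five intervals: everything below 192.168.0.0 (default), 192.168.0.0 to 192.168.0.255 (/16), 192.168.1.0 to 192.168.1.255 (/24), 192.168.2.0 to 192.168.255.255 (/16 again), and everything from 192.169.0.0 upward (default again). Note that one route entry can own several intervals.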
In fact there is a better, more optimized scheme, the DXR algorithm. I have written several articles on this topic, such as "from the analog MMU design a routing table failure to DXR regression" and "simulation MMU design a IPV4 address index of the routing table, different from the DXR", so I will not repeat it here; those articles describe the idea of DXR in detail from scratch. In a word, dividing the address space into sub-ranges makes efficient use of both time and space.
To close this section on interval lookup, I do not want to describe how it is used in route lookup; it is enough to know that an input IP address corresponds to one interval and that one interval corresponds to one next hop. What I want to describe is how, given an IP address, to locate its interval, or more generally: a contiguous domain is divided into several intervals; given a number in that domain, how do we find the interval it belongs to? Let's look at the intervals first:
Assume the intervals are half-open, closed on the left and open on the right, so each boundary point belongs to the interval that follows it. The problem becomes how to construct a single entry point from which the search proceeds step by step. Admittedly I am biased here: with a hash algorithm there is no need to construct a single entry point, because the hash function already defines the boundaries and only collisions need resolving, and when collisions are far fewer than keys, walking the collision chain is not a bad idea. But this article is about binary trees, so my bias is to build a binary tree, map the intervals onto it, and start the search from the root, in other words a binary search. First I make the boundary points of the intervals into leaves, and then build the tree in reverse, from the leaves up to the root:
In fact there are a great many optimized schemes for efficient interval lookup, such as the abstract Huffman interval tree; what I give here is a simple principle that is easy to understand. That is all for interval matching. I originally wanted to extend this section to HiPAC lookup, but I think that deserves another article.
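The interval search of this section can be sketched without building an explicit tree: a binary search over the sorted boundary points visits the same nodes a balanced tree built over those boundaries would. This is my simplification, not the tree from the figure, and `interval_index` is a hypothetical name. The intervals are half-open, so a boundary point belongs to the interval that starts at it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* bounds[] holds the left ends of the intervals in ascending order,
 * with bounds[0] == 0. Interval i is [bounds[i], bounds[i+1]); the
 * last interval runs to the end of the domain. Returns the index of
 * the interval that contains x. Requires n >= 1. */
size_t interval_index(const uint32_t *bounds, size_t n, uint32_t x)
{
    size_t lo = 0, hi = n;         /* the answer lies in [lo, hi) */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (x >= bounds[mid])
            lo = mid;              /* go right, boundary included */
        else
            hi = mid;              /* go left */
    }
    return lo;
}
```

With boundaries 0, 100, 200 and 1000, the value 99 falls into interval 0, the value 100 into interval 1, and 1000 into interval 3. Mapping the interval index to a next hop is then a plain array access.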
2.4. Compressed routing tree lookup and interval lookup in this section, let's talk about the things behind compressed routing tree lookups (such as Lc-trie, etc.) and interval lookups (such as the DXR). The two are diametrically opposed, and we can analyze them from both parameters:
1). fuzziness exists where the compression of the route tree is due to compression, because the compression of the nodes outside the route is as compressed as possible, which brings the input IP address from the root and downstream of the ambiguity, this travel process if the non-compressed IP address tree, will eventually be in the 32nd level exactly one position, However, due to the existence of compression, the path may have errors before reaching a leaf route, i.e. there is a possibility of "going wrong", as shown in:
This error must be corrected by backtracking retry, but since the node that failed the match is already "as accurate as possible" node, it is necessary to continuously expand the matching range from right to left when backtracking 1 to 0.
Interval matching algorithm because of the separation of the key prefix of the route entry and the next hop, can be more to a shared next hop, it does not need to compress, because the ambiguity to the interval, that is, do not need to enter the IP address to the 32nd layer of the leaf node of the exact position, corresponding to a range can be, And this interval corresponds to a next hop that has been separated from the routing item, which is the routing-actually finding the next hop, and in fact, there is no complete route entry, the route item has long been incarnated for an interval and a next hop.
In any case, fuzziness is sure to exist, because you cannot save a complete IP address tree, so you need to sacrifice its accuracy, bring a bit of ambiguity, where this ambiguity exists, bringing the essence of the algorithm is different.
For the bottom-up lookup, the ambiguity is that you don't know in advance where the input IP address will fall, or even which interval is unknown, so you have to take all the route items according to the prefix long and short one by one, and the hash algorithm, just to help the process of this attempt to end faster, It is not necessary to find algorithms, you can completely replace the hash algorithm with any kind of search algorithm that you think is faster than hash. The essence of bottom-up lookup is that the route items are tried in descending order of prefix length, rather than the hash algorithm.
2). In order to find a complete and most accurate route item (including prefix and next hop) and match it with the input IP address, the lookup of the route entry as the primary or the primary compressed routing tree with the input IP addresses, and it matches, because the number of route items is determined, so in backtracking, the POS of the route item is constantly referenced. The bits bits of the input IP address are continuously changed from right to left by 1 to 0, expecting to match to a route item, and the distribution of the route item is guaranteed to be the most accurate once it is matched.
The interval lookup algorithm splits the route entries into intervals, each of which corresponds to exactly one next hop. As soon as the input IP address has been mapped to an interval, the lookup is over, because building the intervals from the complete, uncompressed and untransformed IP address tree guarantees that each interval corresponds uniquely to the most specific next hop, that is, the next hop of the route entry closest to the leaf layer.
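The two lookup styles contrasted above can be sketched side by side. This is only a minimal illustration, not the code of any real kernel: the sample routes, the next-hop names and all function names are assumptions made up for the example. The bottom-up version tries prefix lengths in descending order (a dict stands in for the hash), while the interval version flattens the same routes into sorted ranges and maps an address to a range with a binary search.

```python
import bisect
import ipaddress

# Hypothetical sample routes (prefix -> next hop); not taken from the article.
ROUTES = {
    "0.0.0.0/0": "gw-default",
    "10.0.0.0/8": "gw-a",
    "10.1.0.0/16": "gw-b",
    "10.1.2.0/24": "gw-c",
}

def build_hash_tables(routes):
    """One dict per prefix length: {length: {network_as_int: next_hop}}."""
    tables = {}
    for prefix, hop in routes.items():
        net = ipaddress.ip_network(prefix)
        tables.setdefault(net.prefixlen, {})[int(net.network_address)] = hop
    return tables

def lookup_descending(tables, addr):
    """Bottom-up lookup: try the longest prefix length first; first hit wins."""
    ip = int(ipaddress.ip_address(addr))
    for length in sorted(tables, reverse=True):
        mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
        hop = tables[length].get(ip & mask)
        if hop is not None:
            return hop
    return None

def build_intervals(routes):
    """Flatten the prefixes into sorted interval starts plus one hop per interval."""
    tables = build_hash_tables(routes)
    points = {0}
    for prefix in routes:
        net = ipaddress.ip_network(prefix)
        points.add(int(net.network_address))      # an interval starts here
        end = int(net.broadcast_address) + 1      # and another one just past the end
        if end <= 0xFFFFFFFF:
            points.add(end)
    starts = sorted(points)
    # No prefix boundary lies inside an interval, so the longest match of its
    # first address is the next hop of the whole interval.
    hops = [lookup_descending(tables, start) for start in starts]
    return starts, hops

def lookup_interval(starts, hops, addr):
    """Interval lookup: find the range the address falls into, return its hop."""
    ip = int(ipaddress.ip_address(addr))
    return hops[bisect.bisect_right(starts, ip) - 1]
```

Both functions return the same next hop for any address; the difference is where the work goes. The first repeats a probe per prefix length at lookup time, the second pays once at build time and leaves only a range search.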
2.5. Follow-up of routing lookups
We have now heard about the routing lookup algorithms of several operating systems. Perhaps you think the BSD radix lookup and the Linux hash/trie lookup are unfinished work that implements only the logic and pays no attention to encoding, and that only DXR is an efficient and perfect encoding scheme for the routing table. Is that right? I don't think so!
Both the BSD and the Linux implementations are general-purpose. They do not assume that any cache is available to you; they are purely algorithmic, so you can analyse their time and space complexity. DXR is different: it is more a way of implementing than an algorithm in itself, because it relies on so many things that happen to be there for you to exploit. Perhaps it would be better to treat DXR as a forwarding-table scheme.
Also, I refuse to argue about names. If an interviewer asks, just say that you know the algorithm but not what it is called. Long ago I heard the BSD algorithm called the radix tree algorithm; later someone called it PC-trie, and the Linux algorithm is called LC-trie. These names made me dizzy, so I simply stopped caring what they are called. Explosion!
3. Numerous lookup examples
3.1. Classification
Hash lookup
Path-compressed trie
Level-compressed trie
oh,no!!! Explosion
3.2. Summary of lookup time complexity
I think the time complexity of the lookup matters most. Space complexity matters comparatively little in itself: less is of course better, but mainly because the less space a structure occupies, the more easily it fits in the cache, which turns into a gain in time. I also think the time complexity of building the data structure is unimportant: unless the router is poisoned, insertions, deletions and updates will always be rare events compared with lookups. So it is worth sacrificing some build and update cost in order to make the lookup efficient.
For the specific time complexities, please search for yourself. In short, for binary trees it is basically at the N*logN or logN level, without considering the worst case, which operating systems have long since dealt with by rebalancing once it is detected. For a level-compressed tree you have to take the index bit length at each level into account. For a hash algorithm it depends on how well you choose the hash function and how you resolve collisions.
This article is almost entirely about lookup; it does not say how to insert or build, which is not its topic. Mainstream software routers care about lookup performance and hardly ever run complex dynamic routing protocols. The routers that do run them are almost always core routers, or at least access-layer ones, and those care about hardware design rather than software algorithms. On a software router, lookup performance has to be optimised until it at least approaches a million lookups per second, and the algorithms above are sufficient for that. Again, IP routes are regular; they are not a random set of numbers handed to you to sort.
And if you really have fallen in love with DXR, think of addressing within the cache as an internal sort and of operations that go to memory as an external sort. In effect memory is demoted to a "Greek citizen": as far as forwarding is concerned, memory is disk ...
Of course, you can also mix several algorithms. DXR, for example, in effect first builds a two-level 65536-way tree (one slot per value of the top 16 bits) and then hangs an interval tree on the nodes of that tree. Such combined schemes can often achieve amazing results.
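A rough sketch of that combination, under loud assumptions: this is not the real DXR encoding, only the same shape, namely a direct table with one slot per value of the top 16 bits, where a slot holds either a plain next hop or a small sorted range table over the low 16 bits. The routes and all names (`build_two_stage`, `lookup_two_stage`, `lpm`) are hypothetical.

```python
import bisect
import ipaddress

# Hypothetical sample routes (prefix -> next hop); not taken from the article.
ROUTES = {
    "0.0.0.0/0": "gw-default",
    "10.0.0.0/8": "gw-a",
    "10.1.0.0/16": "gw-b",
    "10.1.2.0/24": "gw-c",
}

def lpm(routes, ip):
    """Reference longest-prefix match by linear scan; used only while building."""
    best_len, best_hop = -1, None
    for mask, base, plen, hop in routes:
        if (ip & mask) == base and plen > best_len:
            best_len, best_hop = plen, hop
    return best_hop

def build_two_stage(route_dict):
    routes = []
    for prefix, hop in route_dict.items():
        net = ipaddress.ip_network(prefix)
        routes.append((int(net.netmask), int(net.network_address), net.prefixlen, hop))
    # Chunks containing prefixes longer than /16 need a range table;
    # collect the range boundaries (low 16 bits) for each such chunk.
    fine = {}
    for _, base, plen, _ in routes:
        if plen > 16:
            low = base & 0xFFFF
            fine.setdefault(base >> 16, set()).update((low, low + (1 << (32 - plen))))
    direct = []
    for chunk in range(1 << 16):                 # stage 1: one slot per top-16-bit value
        if chunk not in fine:
            direct.append(("hop", lpm(routes, chunk << 16)))
        else:                                    # stage 2: sorted ranges inside the chunk
            starts = sorted(p for p in fine[chunk] | {0} if p < (1 << 16))
            hops = [lpm(routes, (chunk << 16) | s) for s in starts]
            direct.append(("ranges", starts, hops))
    return direct

def lookup_two_stage(direct, addr):
    ip = int(ipaddress.ip_address(addr))
    entry = direct[ip >> 16]                     # one array index
    if entry[0] == "hop":
        return entry[1]
    _, starts, hops = entry
    return hops[bisect.bisect_right(starts, ip & 0xFFFF) - 1]  # short binary search
```

Most lookups end at the array index; only addresses in chunks that contain long prefixes go on to the range search, and that search runs over a handful of entries.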
1). Step decomposition
In this article I only give the lookup structures of the routing table and do not explain how they are built. Building them really is troublesome, but it is beside the point. Why? Because these things can be "done slowly", which means you have plenty of time to think them through.
2). Fuzziness/efficiency trade-off
Time and space are not a binary opposition. In data statistics, for example, if you do not need the result to be exact, you can gain in both time and space at once; the price is the risk of a statistical error. You can first use fuzzy matching to filter out most of the mismatches and then correct the errors with a small number of exact matches. In fact, even leaving Bloom filters aside, the path-compressed tree itself adopts this idea.
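As an illustration of "fuzzy filter first, exact match afterwards", here is a minimal sketch that puts a small Bloom filter in front of an exact table. Everything in it is an assumption made for the example: the filter size, the number of hash functions, the use of SHA-256 to derive bit positions, the sample routes and the names `BloomFilter`, `build` and `lookup`.

```python
import hashlib
import ipaddress

# Hypothetical sample routes (prefix -> next hop); not taken from the article.
ROUTES = {
    "0.0.0.0/0": "gw-default",
    "10.0.0.0/8": "gw-a",
    "10.1.0.0/16": "gw-b",
    "10.1.2.0/24": "gw-c",
}

class BloomFilter:
    """Fuzzy set: may answer 'maybe present' wrongly, never 'absent' wrongly."""
    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes, self.array = bits, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, key):
        for pos in self._positions(key):
            self.array |= 1 << pos

    def might_contain(self, key):
        return all((self.array >> pos) & 1 for pos in self._positions(key))

def build(routes):
    bloom, exact = BloomFilter(), {}
    for prefix, hop in routes.items():
        net = ipaddress.ip_network(prefix)
        key = (net.prefixlen, int(net.network_address))
        bloom.add(key)
        exact[key] = hop
    return bloom, exact

def lookup(bloom, exact, addr):
    ip = int(ipaddress.ip_address(addr))
    for length in range(32, -1, -1):             # longest prefix first
        mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
        key = (length, ip & mask)
        if not bloom.might_contain(key):
            continue                             # cheap fuzzy test: certainly no such route
        hop = exact.get(key)                     # exact test corrects false positives
        if hop is not None:
            return hop
    return None
```

The fuzzy test can only err towards "maybe", so the exact table is consulted rarely and the final answer is still exact; the filter just buys back most of the probes that would have missed.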
4. Routing tables, forwarding tables, and the route cache
I have to drag OpenVPN in again!
I changed OpenVPN to be multi-threaded, and it occasionally hits a segmentation fault. The whole multi_instance table, however, is still global. OpenVPN relies entirely on the multi_instance table to route packets; this table is a routing table! Each MI holds a client's virtual IP address, real IP address, virtual MAC address (in tap mode) ... When a packet comes in from the tun character device, its destination MAC address or destination virtual IP address (or iroute address) is used as the key to search the multi_instance table and obtain the real IP address and port of the corresponding client; the reverse direction works the same way. As you can see, the multi_instance table plays the role of a routing table in OpenVPN.
In the multi-threaded version of OpenVPN the threads share this table. Since it is rarely written, I protect it with a read-write lock, which is a good idea, or at least I thought so. But wouldn't it be better to keep one global multi_instance table in the system, refine it into a structure better suited to lookup, and copy that structure to every thread? Then no locking is needed at all. This lock-free thinking is also the idea behind ZeroMQ: don't expect a bunch of drunks to share one bottle of wine safely. The replicated local table is called the forwarding table, and the global table is called the routing table; the forwarding table is generated from the routing table. In the OpenVPN refactoring I plan to launch, I will design a global multi_instance table, build a forwarding table from it in each thread, and use ZeroMQ throughout ... That is a little off topic ... Explosion
This brings the code into a world without locks: since you have a bunch of drunks, why not give each of them a bottle of his own? In fact, this is exactly the design adopted in core routers.
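The split between one global routing table and per-thread forwarding tables might look roughly like the sketch below. It is not OpenVPN code and does not use ZeroMQ: the class names, the version counter used to detect stale copies, and the sample addresses are all assumptions for illustration. Only the rare write path takes a lock; the lookup path reads a private copy.

```python
import threading

class RoutingTable:
    """Global table: written rarely and under a lock, never read on the fast path."""
    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}
        self.version = 0

    def update(self, virtual_ip, real_endpoint):
        with self._lock:
            self._entries[virtual_ip] = real_endpoint
            self.version += 1

    def snapshot(self):
        """Return (version, private copy) used to build a forwarding table."""
        with self._lock:
            return self.version, dict(self._entries)

class Worker:
    """Each worker thread owns its forwarding table, so lookups need no lock."""
    def __init__(self, routing_table):
        self._routing_table = routing_table
        self._version, self._forwarding = routing_table.snapshot()

    def refresh_if_stale(self):
        # Assumption: a version check stands in for whatever notification
        # mechanism (for example message passing) a real design would use.
        if self._version != self._routing_table.version:
            self._version, self._forwarding = self._routing_table.snapshot()

    def forward(self, virtual_ip):
        return self._forwarding.get(virtual_ip)   # lock-free read of the local copy

# Hypothetical usage: virtual IP -> (real IP, port)
table = RoutingTable()
table.update("10.8.0.2", ("203.0.113.5", 1194))
worker = Worker(table)
endpoint = worker.forward("10.8.0.2")
```

The cost is that a worker may briefly forward with a stale copy until it refreshes, which is the usual price of trading sharing for replication.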
Does the route cache mean much? That depends on the type of application. In the past, in the days of fixed-node networks and long-lived TCP connections such as TELNET and FTP, the temporal locality of traffic was very pronounced; even protocols that use many short connections, such as HTTP (better still with persistent connections), show temporal locality. But the present and the future are different: in peer-to-peer networks, opportunistic networks, random networks, self-organizing networks and mobile networks, the communication paths between nodes are unpredictable. Even with a long-connection protocol, the route entries cached on the router that served a node's previous location will never get another chance to hit, because the nodes move so fast. And even if we leave mobility aside, consider some signaling protocols, or the countless complex applications ...
Linux has now removed support for the route cache. I had removed it myself before Linux did, though for a different reason. There are two main reasons for removing the route cache. The first is that route entries do not need to be cached: they would not be used again in eight lifetimes. The second is the need to maintain cache consistency. Well, that second reason does not really hold, because the IP protocol itself keeps no state, so there is no cache consistency to maintain. Yet it was precisely this second reason that made me remove the route cache, not by deliberate choice, but because I was using the Linux netfilter conntrack mechanism, and conntrack never cooperates well with IP. So, I confess, I paid the price.
...
PostScript
I should have gone on writing this essay for longer. As I showed off to my family over a small hot pot and a little wine yesterday at noon, I have now written 8000 words ... Back then, whenever Wednesday's Chinese class came round, the thought of writing a 600-word composition left me utterly depressed; it felt impossible to get through ... I believe many people were the same. But no, really, I have written 8000 words. Nobody forces us to write now, yet I miss my high-school days. Thinking about it now, what are 600 words? More or less just an experience ... Without ideas, it does not work. These days we seldom get the chance to write; really, what is left is probably the year-end summary, the project summary, meeting minutes, the debriefing report, the Party application, the résumé, the confession record ... and sometimes a blog like this one. But will you really write? Bricklayers think writing is unrealistic, but they have not seen that the Roman legion in "Days will Lion" can build a castle without moving bricks. Businessmen think numbers are what matter, but that is their prejudice. Programmers think what I write is useless; soldiers think being able to fight is enough; and some people think being able to shout is enough ... But isn't all of this passed on in words? Words can express any idea; the rest is just a matter of encoding, which is exactly the issue of routing tables and forwarding tables. The Dao that can be told is not the eternal Dao ... This may also be what I took from the book about the Greeks that I finally finished over the last two months.
As for myself: I cannot program, though not absolutely; I can a little. I can program, but not ruthlessly, and not all that well. I know world history well, but not all of it, and I do not fully understand it. I understand the thick-and-black art of getting on in life, but I usually say nothing and do none of it, though in my heart I understand it all. I love drinking and once drank every day, but when I drink too much I still throw up. When I am gloomy I do not talk to people; I do not gather just to drink, and at gatherings I do not cling to the wine. Great virtue, broad learning, constant striving, always cultivating the fundamentals, benefiting the people: I dare not call myself a paragon, but at least I understand Hello World, really understand it, and am very proficient at it. I even know how to start it without main and _start, or start it directly from the BIOS, or even take a piece of chalk and start it right on the ground of the square in my residential community. The first two I reckon you can do; the last one I reckon you would not dare.
Assigned-topic essay: a thorough understanding of the various lookup processes for IP routing tables, based on the IPv4 address tree