Simulating the MMU: designing a routing table that indexes IPv4 addresses, unlike DXR

I don't know whether anyone has played with routing tables this way before. Maybe someone has, maybe not.
Time and space are traded against each other only because our facilities are incomplete; in an age of scarce resources you have to choose one or the other. But when resources are plentiful, the fish and the bear's paw can both be had! A compact data structure for route lookup occupies little space; must it pay a price in time for that? If we bring the MMU into the picture, we find that a compact data structure can save space and improve speed at the same time.
We have long been educated that righteousness demands sacrifice, as if whoever refuses to sacrifice cannot be righteous. But even if resources are not abundant, even if they are scarce, must a compact data structure necessarily waste time? Must a fast, efficient algorithm necessarily waste memory? If so, how could the MMU ever have been designed? It would have been ruthlessly discarded for the time and space it consumes merely to achieve its goals of protection and translation. Yet, as we all know, it was eventually designed. Moreover, thanks to efficient use of the CPU cache, the MMU has degenerated into a slow path that is taken only when the cache misses, and by locality we know that most of the time the execution flow stays fast... It seems the usual thinking needs to change a little.
I know the DXR algorithm already works this way, so this is not my personal nonsense. But I play with it a little differently than DXR does...
Advance the longest-mask logic to insertion time. For a general-purpose operating system such as Linux, the cost of the routing table is concentrated in the "longest mask match". During IP routing the only input is an IPv4 address, namely the destination address extracted from the IP header; at that point the IP routing module does not yet know which routing-table entries relate to this address and must run a search to find out. The "longest mask match" logic lives inside that search: a hash-based organization keeps one hash table per mask length and matches downward from the 32-bit mask; for a trie-based organization the situation is similar. See the overview "Internet routing table lookup algorithms: hash / LC-trie / 256-way mtrie".
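A minimal sketch of that per-mask-length hash search, assuming hypothetical fib_table[] and hash_lookup() helpers (this is not the actual Linux fib_hash code):

/* Longest-prefix match over 33 per-mask-length hash tables,
 * probing downward from /32; the first hit is the longest match. */
#include <stdint.h>
#include <stddef.h>

struct fib_res;                             /* opaque route result */
extern void *fib_table[33];                 /* one hash table per mask length */
extern struct fib_res *hash_lookup(void *table, uint32_t key);  /* hypothetical */

static inline uint32_t netmask(int len)
{
    return len ? ~0u << (32 - len) : 0;
}

struct fib_res *lpm_hash_lookup(uint32_t dip)
{
    for (int len = 32; len >= 0; len--) {
        struct fib_res *res = hash_lookup(fib_table[len], dip & netmask(len));
        if (res)
            return res;                     /* longest match found */
    }
    return NULL;                            /* no route */
}

Every iteration is a hash computation plus at least one comparison; that repeated comparing is exactly what the rest of this article tries to eliminate.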
As for the search itself, hashing in particular inevitably means comparing, and each comparison yields either a hit or a miss; if the cost of comparing could be removed, in theory half the time would be saved, and for a trie, once backtracking is counted, even more. Of course, both hash and trie implementations make many optimizations around the data structure and the characteristics of addresses, but unfortunately those optimizations treat the symptoms rather than the root. Only eradicating the "find" operation itself treats the root: eliminating find/compare is the fundamental goal!
Index the IPv4 address: that is the answer! The IPv4 space has 4G addresses in total; treating every address as an index would consume a huge amount of memory, but that is not the essence of the problem, since we can learn from the page table and organize the index hierarchically. The essence of the problem is how to establish a many-to-one correspondence between route entries and indexes! That is, the index is an IPv4 address, and it points directly at the longest-mask result among all route results. This is not difficult. I will not introduce the multilevel index yet; for now, use the flat 32-bit IPv4 address space as a single-level index, as shown below:

[Figure: route prefixes partition the flat 32-bit IPv4 address space into intervals; each interval points to a route result]

As the figure shows, the crucial step is using route prefixes to divide the IPv4 address space into intervals. Holding a destination IP address, index into the space and move to the right; the first route encountered is the result. The longest-mask logic is expressed entirely in the insert/delete path: prefixes shorten from left to right, and a route entry with a longer prefix overlays the entry with a shorter prefix behind it. That is the core idea. The HIPAC firewall uses the same idea, namely interval priority. By cleverly arranging the data structure and moving the longest-mask logic ahead to insertion/deletion time, indexing by IP address turns the matching process into a single step.
Note that a longer-prefix route must not permanently destroy what lies behind it: when that route is removed, the shorter-prefix route behind it must be exposed again.
To summarize: the insert/delete operations must guarantee that, at the front of each address interval, the visible route entry is the one with the longest matching mask. The sketch below illustrates the covering rule.
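A minimal sketch of interval-covering insertion, under assumptions of my own (a conceptual flat table indexed by the top 16 bits, prefixes no longer than /16, illustrative names):

/* Each slot remembers which prefix length currently owns it, so a longer
 * prefix overlays a shorter one, and deletion can later re-expose it. */
#include <stdint.h>

#define SLOTS (1u << 16)            /* demo: index by the top 16 bits only */

static uint8_t  slot_plen[SLOTS];   /* prefix length owning the slot, 0 = free */
static uint32_t slot_nh[SLOTS];     /* next-hop id visible at the slot */

void route_insert(uint32_t prefix, int plen, uint32_t nh)
{
    uint32_t first = prefix >> 16;            /* first covered slot */
    uint32_t count = 1u << (16 - plen);       /* slots the prefix covers */

    for (uint32_t i = first; i < first + count; i++) {
        if (plen >= slot_plen[i]) {           /* longer prefix wins the slot */
            slot_plen[i] = plen;
            slot_nh[i]   = nh;
        }
    }
}

Deletion walks the same interval and restores each slot from the next-longest remaining route, which is why the real structure must keep every inserted route somewhere, not only the currently visible ones.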
There is a philosophical rationale behind routing versus switching. The IP Internet was designed to be routing-based rather than switching-based. Nowadays, however, people have gradually grafted switching features onto IP routing, building plenty of hardware-based fast-forwarding devices, or devices that derive a switching/forwarding table from the routing table, such as the layer-3 switch. But what exactly is switching, and what is routing? In short, routing is the "soft" term: its execution is "extract a field from the protocol header, then run a 'longest prefix match' against the routing table's contents, with many memory accesses and comparisons along the way". Switching is the "hard" term: its execution is "extract an indexable field from the protocol header, index the switching table directly, and forward based on the result the index points at". So what my scheme and DXR both do is nothing more than turn routing into switching. Perhaps you think this is a mere trick, but shouldn't life let us be happy about such little things...
Conceived implementation: turning "find" into "access"

It is well known that modern operating systems are built on virtual memory, which isolates processes and implements access control; but this article is not about that. This article is about putting the principle to "another use".
In fact, on a machine running a modern operating system, every access to an address undergoes a "find". The process is so fast that most users, and even most programmers (system programmers excepted), are blind to it; many do not even know such a lookup happens. That lookup is the MMU's virtual-to-physical address translation.
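For concreteness, here is how the classic two-level 32-bit x86 walk decomposes a virtual address; this is only an illustration of what the hardware does on every access:

/* A 32-bit virtual address on classic (non-PAE) x86 splits into:
 * 10 bits page-directory index | 10 bits page-table index | 12 bits offset */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t vaddr   = 0xc0a80001;            /* arbitrary example address */
    uint32_t pde_idx = vaddr >> 22;           /* top 10 bits */
    uint32_t pte_idx = (vaddr >> 12) & 0x3ff; /* middle 10 bits */
    uint32_t offset  = vaddr & 0xfff;         /* low 12 bits */

    /* conceptually: phys = pagedir[pde_idx] -> pagetable[pte_idx] + offset */
    printf("pde=%u pte=%u off=%u\n", pde_idx, pte_idx, offset);
    return 0;
}

The routing table below borrows precisely this shape.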
If I use IPv4 route prefixes as virtual memory addresses, use the corresponding next hop and the rest of the route result as the contents of the physical page, and establish the page-table mapping accordingly, then I only need to "access" the destination IPv4 address extracted from the IP header to obtain the contents of the corresponding physical page. And what are those contents? A suit? No! The contents are the route result. Simplify the figure from the first section, change it a little, and it looks like this:

[Figure: the flat index redrawn as a page table: IPv4 addresses as virtual addresses, route results as the contents of physical pages]

Do you see it? Isn't that exactly a page table? Yes: the IPv4 address serves as the index, the route result serves as the physical page, and the longest-mask match is expressed while the mapping is built. But there is a problem: the space cost is too high! The MMU's answer is to construct a multilevel mapping, and the routing table can use it too. Bend the figure above into the routing match table of an MMU-like facility:

[Figure: the flat table bent into a multilevel, MMU-style routing match table]

Now the routing match table of our MMU-like facility is built, and the IPv4 address is fully indexed. "Access" an IPv4 address exactly like a memory address: for example, if the destination IPv4 address is 0x01020304, then to obtain its route result only the following access is needed:
struct fib_res **addr = (struct fib_res **)0x01020304UL;  /* the destination IP, used as a pointer */
struct fib_res *res = *addr;                               /* one dereference yields the route result */
If a page fault occurs, there is no matching route, i.e., the network is unreachable; and if a default route exists, every virtual address without an explicit mapping falls on the default route's page.
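Since, as discussed later, the hardware page table cannot literally be repurposed, here is a minimal software sketch of the same two-level idea, with a default entry standing in for the default route (the 16/16 split and all names are my assumptions):

/* A two-level "soft page table" for route results.
 * Level 1: top 16 address bits select a level-2 page (or NULL).
 * Level 2: low 16 address bits select the route result.
 * An unmapped leaf falls back to the default route, like a page fault. */
#include <stdint.h>
#include <stddef.h>

struct fib_res { uint32_t next_hop; /* ... */ };

static struct fib_res  *default_route;      /* NULL means unreachable */
static struct fib_res **l1[1u << 16];       /* level-1 directory */

struct fib_res *lookup(uint32_t dip)
{
    struct fib_res **l2 = l1[dip >> 16];
    if (l2 && l2[dip & 0xffff])
        return l2[dip & 0xffff];             /* mapped: two memory accesses */
    return default_route;                    /* "page fault" path */
}

No comparisons against route entries occur anywhere: the address itself does all the work.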

The difference from the MMU, and an episode

Although the figure above looks very much like an MMU, did you notice the difference?
The physical pages an MMU maps have a fixed size, while the address range covered by each route in a routing table does not. What does that matter? I spent most of a day writing a simulation and felt quite excited, then went off to take a bath. I like cold water, but the house was too cold; perhaps a hot bath would bring some ideas. It brought none, but it did surface a serious problem: a route cannot be fully analogized to a physical page, precisely because its size is not fixed. If the IPv4 address space were carved into page-like chunks of 4096 addresses, with the level-two routing page table finally indexing a 4096-address range, would all those addresses be forced to share one route entry?

I felt stupid at the time, because pushing the thought one step further dissolves the problem, and the last figure above already draws it clearly: I use all 32 bits of the IPv4 address as the index, rather than discarding the low 12 bits as a 4096-byte page would. What I am building is an address table, not an address-block table.

The complexity all lives in insertion and in encoding the next hop. The final routing "page" absolutely must not store pointers: on 32-bit systems a pointer is 4 bytes, on 64-bit systems 8 bytes, and to cope with the extreme case in which every single IPv4 address is its own route, the "entry" each destination address finally locates can afford only one byte!
How can one byte be enough? What if there are 10,000 table entries? Ha! Turn it around: what do we ultimately want? A next hop! How many distinct next hops can there be in total? Would 256 suffice? I think so. You may have 10,000 routing table entries, but they reuse a far smaller set of next hops. Have you ever seen a router blossoming with 200-odd cables? That would be a switch! So the encoding might be: lay all the next hops contiguously in one block of memory, each entry of a fixed size, and let the one byte in the final routing page be an offset indexing that next-hop array. (If the number of next hops exceeds 256 there are ways out, such as borrowing the bits within each entry that alignment would otherwise waste; fixed-size entries help not only fast memory addressing but also cacheline mapping.) A sketch, then the figure:
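A minimal sketch of that encoding, with illustrative sizes and names of my own choosing:

/* Leaves store a 1-byte index into a contiguous next-hop array instead of
 * a 4- or 8-byte pointer; fixed-size entries keep addressing trivial. */
#include <stdint.h>

struct next_hop {
    uint32_t gateway;               /* gateway address */
    uint32_t ifindex;               /* output interface */
};                                  /* fixed size, stored contiguously */

static struct next_hop nh_table[256];  /* few routers need more next hops */
static uint8_t leaf[1u << 16];         /* demo leaf page: 1 byte per address */

static inline struct next_hop *resolve(uint32_t dip)
{
    return &nh_table[leaf[dip & 0xffff]];  /* byte offset -> next hop */
}

A 64-byte cacheline now holds 64 leaf entries, which is where the compactness pays off.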

[Figure: the final routing page of one-byte offsets indexing a contiguous array of fixed-size next hops]

The above diagram is what I drew after the bath. I did not pursue this line further, because while pondering the rationality of D16R (a DXR instance using 16 bits as the direct index), I realized my thinking was being led straight back to DXR. That made me both excited and depressed: excited because I had essentially re-derived DXR's design on my own, depressed because I really did not want to copy it. I want a fully indexed multilevel table with no additional "algorithm" bolted on, so I want to avoid trees of every kind, binary search, even hashing and traversal. Before I use the scheme, let me record why I want to avoid those algorithms. (The following passage ought to be encrypted in case someone sees it; if you dislike it, please don't flame me. This is not a rant, it is a hobby.)
Reasons to avoid trees, hashes, and other subtle algorithms: must O(1) be fast? What about O(n) and O(lg n)? Big O, know your own limits!
First of all, when designing and implementing a real system, do not be bound by the theory in algorithms books. Big O speaks to scalability: roughly, if an algorithm's latency does not grow with the number of elements, it is O(1); if latency grows with the logarithm of the element count, it is O(lg n). But how large is n concretely, and how much does each "step" cost? Perhaps you will say this MMU-style routing table is unsuitable for IPv6, where it would occupy most of memory, and therefore does not scale. But I am not proposing it for IPv6. For IPv4 routing, is the address not exactly like a 32-bit virtual address? Did the MMU's designers fail to consider scalability? No: on 64-bit systems the MMU gains other options, such as the inverted hashed page table, while on 32-bit systems the fully indexed MMU is definitely better than trees and hashes of every kind; moreover it is better suited to hardware implementation, because it contains "no logic". It is simple. To give a deliberately inappropriate example: suppose an O(1) algorithm takes 100 years per run, so that even at n = 10000000000 every run is still 100 years, a perfectly good O(1) algorithm; and suppose an O(n^2) algorithm takes 1 ns at n = 100, and you know for certain that in your environment n will never exceed 500, where it would still finish in roughly 25 ns. Which do you choose?
In an IPv4 environment, or in an IPv6 environment where you are not short of money for memory, or in any controlled, bounded environment (do not drag in the unbounded: there is no infinity inside a computer; look how laboriously OpenSSL grinds through big-number arithmetic), the multilevel index table is without doubt the fastest data structure. The next best is of course the hash, but it is simply not as fast as a direct index. Indexing guarantees the speed; multiple levels guarantee the space does not explode. That is what matters in how many operations an algorithm performs; everything else is clouds.
Big-O analysis suits the study of algorithms, but applying it to a real system requires many other constraints. Big O ignores the cost of address translation on every data access, flattens the order-of-magnitude differences between cache levels, flattens the differing costs of individual instructions; it ignores the cache and ignores the MMU, none of which can be ignored in a real implementation. Algorithm analysis is not software performance analysis; that is not a shortcoming, it simply is not its job. The same algorithm can be improved by hardware/software co-design, and different hardware layouts change the real cost: swap a bus, move a component... The final performance is therefore a function of three things, the algorithm itself, the software implementation, and the hardware implementation, with unequal weights. People usually care about the algorithm first and the software implementation second; as for the hardware, mostly they can only look on, since without money nothing can be done there, whereas the first two only require changing an algorithm or changing an implementation.
Realistic implementation: with such a wonderful analogy, it is time to use this perfect search structure
Simply put, you just create an "address space" and populate the MMU with the contents of the routing table. But it is not that simple; on Linux the following problems arise:
1. You cannot use the C library or any other library, because an address space contains instructions as well as data. Every instruction of the process itself occupies a virtual address, and such an address can no longer serve as an IPv4 address. A library drags in a large body of instructions and is therefore unusable.
2. You cannot even use the kernel. This is no fun: the kernel is shared by all address spaces, and as the managing authority its code is mapped into every one of them; for example, on 32-bit x86 the addresses above 0xc0000000 are largely a one-to-one mapping of physical memory. Enough said.
Because code must be mapped in, the entire virtual address space cannot be handed over to IPv4 addresses. So what is the way out?
Having learned the idea, why copy it literally? Use the hardware MMU directly? The idea is so crazy that it proves the thinker is too lazy! Admittedly, on hardware with virtualization support you could dedicate a set of virtualized MMU facilities to this, but that only proves you are handy with the hardware itself. Why not build a soft MMU yourself?
The DXR routing table is compact and sophisticated; it never expected to use the ready-made MMU, and instead bolts a binary search onto its structure, which is a fine compromise simulation, and I can do the same. I am not counting on the raw speed of this software-simulated MMU matching; the point is to learn from DXR's idea of using a compact data structure to raise CPU cache utilization, keeping results in the CPU cache as much as possible instead of sending requests to the bus! And if the system's hardware MMU were used in full, could its TLB be exploited? If not, what would be the point? Do you know what a TLB hit means? Do you know that most MMU translations never reach the page table at all but hit in the TLB? The TLB is a kind of cache! So simulating the MMU is not the fundamental purpose; exploiting the cache is the kingly way. The sketch below adds exactly such a "soft TLB".
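A minimal sketch, assuming the two-level lookup() sketched earlier and a direct-mapped cache of my own invention in front of it:

/* A tiny direct-mapped "soft TLB": hot flows hit here and never walk
 * the two-level table at all. Sizes and hashing are illustrative. */
#include <stdint.h>

struct fib_res;
extern struct fib_res *lookup(uint32_t dip);   /* the two-level walk above */

#define TLB_BITS 10
static struct { uint32_t dip; struct fib_res *res; } tlb[1 << TLB_BITS];

struct fib_res *lookup_cached(uint32_t dip)
{
    uint32_t h = (dip ^ (dip >> 16)) & ((1 << TLB_BITS) - 1);
    if (tlb[h].res && tlb[h].dip == dip)
        return tlb[h].res;         /* "TLB hit": one memory touch */
    tlb[h].res = lookup(dip);      /* miss: slow-path table walk */
    tlb[h].dip = dip;
    return tlb[h].res;
}

Whether this pays off depends entirely on whether IP traffic exhibits locality, which is the next question.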
We know the CPU caches (including the TLB) hit at a considerable rate because memory addressing exhibits locality. Does such locality exist for IP addresses? Consider that many packets belonging to one data stream pass through back to back, and that the bursts of different streams are staggered, and you will see that locality is a universal principle. Traffic engineering on core paths is path-based, QoS is application-based, and such classification reinforces locality rather than cancelling it. What, after all, is classification? That is a philosophical question; two thousand years after Plato, people still debate whether classifying means aggregating or hashing...
These were my harvests of the Year of the Goat: the analogy with the MMU and the simulation of the MMU. The other harvest was reading a good deal of history and watching several films, one of which can still pass for a horror film, "Hatred Spirit", and talking history at the Orchid Pavilion in Shaoxing...
