Simulating the MMU to design a route table that indexes IPv4 addresses, different from DxR

I don't know whether anyone has played with this before. Maybe, maybe not.
Time and space usually sit on opposite sides: when resources are scarce, you have to choose one. But when resources are abundant, you can have both the fish and the bear's paw! For route lookup, a compact data structure occupies little space; does that make it slow? If we look at the MMU, we find that a compact data structure not only saves space but also increases speed.
Our long education has trained us to accept such trade-offs; we are told that refusing them leads to nothing good, and we don't blame ourselves for being misled, since we survived it. But think about it: even when resources are not abundant, or are outright scarce, must a compact, small data structure cost time? Must a fast, efficient algorithm waste memory? If so, how could the MMU ever have been designed? It would have been ruthlessly rejected, because maintaining it would cost more time and space than the benefit it brings. Yet we all know it was, in the end, designed. Moreover, thanks to the efficient use of the CPU cache, the MMU degrades into a slow path taken only when the cache misses; locality tells us that most of the time, the traffic takes the expressway... it seems the usual way of thinking should be changed.
As far as I know, the DxR algorithm plays exactly this game, so this is not just my personal nonsense. But my gameplay is a little different from DxR's...
Move the longest-mask logic forward to insertion time. In Linux and other general-purpose operating systems, the cost of a route table lookup is concentrated in "longest mask matching": the input is only an IPv4 address, the destination address extracted from the IP header, and at that moment the IP routing module does not know which entries in the routing table relate to it. A search must be performed to find out, and during the search the longest-mask logic is executed. In the hash organization, hash tables are arranged by mask length and matched in descending order starting from the /32 mask; in the trie organization the situation is similar. For more information, see the overview of route table search algorithms for Internet routing (Hash / LC-Trie / 256-way mtrie).
Searching, especially with the hash algorithm, inevitably means comparison, and each comparison yields either zero or non-zero. Eliminating the cost of comparison would in theory save half the time; for the trie, backtracking adds even more cost. Of course, both hash and trie implementations carry many optimizations built around the data structure itself and the characteristics of addresses, but unfortunately these treat the symptoms, not the disease: they cannot simply eradicate the "search" operation. So, eliminating search and comparison is the fundamental goal!
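To make the descending-mask search concrete, here is a toy sketch (all names are hypothetical; a real implementation keeps one hash table per mask length instead of scanning an array) showing how the longest match falls out of simply trying mask lengths from /32 downward:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical toy route entry: prefix/len -> next-hop id. */
struct toy_route { uint32_t prefix; int len; int nexthop; };

/* Scan mask lengths from /32 down to /0, as the per-mask-length
 * hash organization does: the first length that yields a matching
 * entry is, by construction, the longest match. */
static int toy_lookup(const struct toy_route *tbl, size_t n, uint32_t dst)
{
    for (int len = 32; len >= 0; len--) {
        uint32_t mask = len ? 0xFFFFFFFFu << (32 - len) : 0;
        for (size_t i = 0; i < n; i++)
            if (tbl[i].len == len && tbl[i].prefix == (dst & mask))
                return tbl[i].nexthop;
    }
    return -1; /* no matching route */
}
```

This is exactly the cost the article wants to eliminate: every lookup repeats comparisons that could have been resolved once, at insertion time.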
Indexing IPv4 addresses is the answer! The IPv4 address space holds 4G addresses in total. Using every address as an index would consume a huge amount of memory, but that is not the essential problem; you can organize a hierarchical index the way a page table does. The essential problem is how to map route entries and indexes one-to-one: an IPv4 address used as an index must point directly to the longest-mask result among all matching routes. That is not difficult. For now I will not introduce a multi-level index, but stay with a flat 32-bit IPv4 address space as the first-level index, as shown in the figure below:
[Figure: the IPv4 address space partitioned into intervals by route prefixes]
As you can see, the key is to use the routing prefixes to divide the IPv4 address space into multiple intervals. Taking the destination IP address as the index and walking to the right, the first route entry encountered is the result. The longest-mask logic is expressed entirely in the insert/delete process: from left to right the prefixes shorten in turn, and a long-prefix route entry is placed in front of a short-prefix one. This is the core idea. In fact, the HiPac firewall uses the same idea, namely interval priority. It is just a rational, clever arrangement of the data structure: the longest-mask logic is moved forward to insertion/deletion time, and the IP address becomes an index, so matching completes in a single step.
Note that a long-prefix route cannot simply and completely overwrite the shorter-prefix route beneath it, because when the long one is deleted, the shorter one must be exposed again.
Now, to sum up: when an insert or delete is executed, ensure that the route entry with the longest mask ends up on top throughout its IP address range.
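This rule can be sketched in a toy model. Assuming an 8-bit "address space" instead of the full 32 bits (all names here are hypothetical), repainting prefixes from short to long at insert/delete time leaves every slot holding its longest match, and deleting a long prefix re-exposes the shorter one beneath it:

```c
#include <stdint.h>
#include <string.h>

/* Toy 8-bit address space: a flat 256-entry index, one slot per
 * address, holding a next-hop id (0 = no route). */
#define TOY_BITS  8
#define TOY_SLOTS (1u << TOY_BITS)

struct toy8 { uint8_t prefix; int len; uint8_t nexthop; int used; };

/* Repaint shorter prefixes first and longer prefixes last, so that
 * after rebuilding, each slot holds the result of its longest-mask
 * route: the longest-mask work happens at insert/delete time, and
 * lookup becomes a single array index. */
static void toy_rebuild(uint8_t idx[TOY_SLOTS],
                        const struct toy8 *routes, int n)
{
    memset(idx, 0, TOY_SLOTS);
    for (int len = 0; len <= TOY_BITS; len++)   /* short first */
        for (int i = 0; i < n; i++) {
            if (!routes[i].used || routes[i].len != len)
                continue;
            uint32_t span = 1u << (TOY_BITS - len);
            for (uint32_t a = 0; a < span; a++)
                idx[routes[i].prefix + a] = routes[i].nexthop;
        }
}
```

A full rebuild is of course wasteful; it is only meant to show the invariant. An incremental insert/delete that touches just the affected interval preserves the same property.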
Routing or switching? The Internet was designed to be routing-based rather than switching-based, and there is a philosophical reason behind that. Today, however, people have gradually grafted switching features onto IP routing, designing plenty of hardware-based fast forwarding devices, or deriving a switching/forwarding table from the route table for fast forwarding, as a layer-3 switch does. But what is switching, and what is routing? Simply put, routing is the softer notion: its execution is "retrieve a field from the protocol header, then longest-prefix-match it against the contents of the route table", which involves a great deal of memory access and comparison. Switching is the harder notion: its execution is "retrieve an index field from the protocol header, index the switching table directly, and forward according to the result the index points to". So observe that both my gameplay and DxR turn routing into switching. Maybe you think this is nothing more than a small trick, but shouldn't the small things in life make you happy...
As we all know, modern operating systems are built on virtual memory, which provides isolation and access control between processes. This article is about "one kind of exploitation" of that mechanism.
In fact, while a computer running a modern operating system is running, every memory access goes through a "lookup". This lookup is so fast that most users, and even most programmers (system programmers excepted), turn a blind eye to it; many do not even know it exists. It is the MMU's virtual-to-physical address translation.
If I use an IPv4 route prefix as a virtual memory address, use its next hop and other route-result information as the contents of a physical page, and build a page table mapping according to that correspondence, then I only need to "access" the destination IPv4 address extracted from the IP header to obtain the contents of the corresponding physical page. And what is that content? It is the route result! Simplifying the figure from the first section, it becomes the following:
[Figure: IPv4 addresses as a flat page table, with route results as the physical pages]
Do you see it? Isn't that exactly a page table? Yes: the IPv4 address is the index, the route-entry result is the physical page, and the longest-mask matching is absorbed into the construction of the mapping. But there is a problem: the space consumed is far too large! Indeed, and the MMU's solution is a multi-level mapping. The same principle applies to route tables. Bending the figure above, it becomes a route matching table built on an MMU-like facility:
[Figure: the flat mapping bent into a multi-level, MMU-like route matching table]
Now the route matching table is completely embedded in an MMU-like facility, and the IPv4 address is fully indexed. For example, for the IPv4 address 0x01020304, a single access obtains its route entry:
struct fib_res *res = (struct fib_res *)0x01020304UL;
/* dereferencing res walks the MMU mapping; *res is the route result */
If this access page-faults, there is no matching route, that is, the network is unreachable. If a default route exists, all virtual addresses without an explicit mapping fall on the "default route page".
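A minimal soft sketch of such a two-level "route page table" might look like the following. All names are hypothetical; it assumes a 16/16-bit split, reserves next-hop id 0 to mean "no route", and relies on short prefixes being inserted before long ones so that longer prefixes overwrite:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical two-level route page table: the top 16 bits of the
 * destination select a directory slot; a populated slot points to a
 * 65536-entry leaf indexed by the low 16 bits.  A NULL slot (or a 0
 * entry) is the "page fault": unreachable, or the default route. */
struct rt_pagetable {
    uint8_t *dir[1 << 16];   /* leaf pages, allocated on demand */
    int has_default;
    uint8_t default_nh;
};

static int rt_insert(struct rt_pagetable *pt, uint32_t prefix, int len,
                     uint8_t nh)
{
    uint64_t span = 1ull << (32 - len);
    for (uint64_t a = 0; a < span; a++) {
        uint32_t addr = prefix + (uint32_t)a;
        uint8_t **leaf = &pt->dir[addr >> 16];
        if (!*leaf && !(*leaf = calloc(1 << 16, 1)))
            return -1;           /* out of memory */
        (*leaf)[addr & 0xFFFF] = nh;
    }
    return 0;
}

static int rt_lookup(const struct rt_pagetable *pt, uint32_t dst)
{
    const uint8_t *leaf = pt->dir[dst >> 16];
    if (leaf && leaf[dst & 0xFFFF])
        return leaf[dst & 0xFFFF];
    return pt->has_default ? pt->default_nh : -1; /* unreachable */
}
```

The lookup performs no comparison against prefixes at all: two indexed loads, exactly like a software page-table walk.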

Although the figure above really does look like an MMU facility, have you noticed the differences from a real MMU?
The physical pages an MMU maps are of fixed size, whereas the address range covered by each route in a routing table is not fixed. So how do the two correspond? After mulling it over for a long time I was ready to write a simulation, feeling quite excited, and then I took a shower; I like cold water, but it was freezing at home, so perhaps a hot bath would bring some ideas. It brought more than ideas: it exposed a serious problem. Route entries and physical pages cannot be equated, because a route entry's size is not fixed. If the IPv4 address space is split into 4096-sized pages, each second-level "routing page table" entry covers a range of 4096 IPv4 addresses; must they all use the same route entry? I felt so stupid at the time! Then I pushed my idea through and the problem dissolved. It is no problem at all; I had already drawn it clearly in the last figure above! I index with all 32 bits of the IPv4 address, rather than reserving the low 12 bits as a 4096-entry page offset. What I built is really a table of addresses, not a table of address blocks. The complexity lies in insertion and in encoding the next hop. Storing pointers in the final routing "page" is out of the question: on a 32-bit system a pointer takes 4 bytes, and on a 64-bit system more. To cope with the extreme case of one route per IPv4 address, the "entry" finally located by the index for each destination IPv4 address can be a single byte!
How can one byte suffice? What if I have ten thousand table entries? Ha! Turn the question around: what do we ultimately want? A next hop! How many distinct next hops can there be? Is 256 enough? I think so. You may have ten thousand route entries, but they reuse a much smaller set of next hops. Have you ever seen a router with more than 200 cables plugged in at once? That would be a switch! So I encode it like this: place all the next hops in a fixed-size contiguous memory region, then let the byte found at the offset in the final routing page table index into those next hops. (If the number of next hops exceeds 256, there is still a way out: borrow the bits freed up by alignment. Alignment not only speeds memory addressing but also exploits cacheline mapping.)
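The encoding described above might be sketched like this (hypothetical names; a sketch, not the author's actual code): a fixed, contiguous array of at most 256 next hops, with route pages referring to them by a 1-byte code:

```c
#include <stdint.h>

/* Hypothetical next-hop table: at most 256 distinct next hops live
 * in one fixed contiguous array; the final route "pages" then store
 * only a 1-byte index into it.  Ten thousand routes can share a
 * handful of next hops, so one byte is enough. */
struct nexthop { uint32_t gw; int ifindex; };

struct nh_table {
    struct nexthop nh[256];
    int count;
};

/* Return the 1-byte code for a next hop, reusing an existing slot
 * when the same gateway/interface pair was seen before. */
static int nh_encode(struct nh_table *t, uint32_t gw, int ifindex)
{
    for (int i = 0; i < t->count; i++)
        if (t->nh[i].gw == gw && t->nh[i].ifindex == ifindex)
            return i;
    if (t->count == 256)
        return -1;               /* would need to borrow spare bits */
    t->nh[t->count].gw = gw;
    t->nh[t->count].ifindex = ifindex;
    return t->count++;
}
```

Decoding is just `t->nh[code]`: one more indexed load, still no comparison on the lookup path.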
[Figure: the final route page table, with 1-byte entries indexing a contiguous next-hop array]
I drew the figure above only later; in the bath I was not following this idea, but pondering the rationality of D16R (a DxR instance directly indexed with 16 bits), and whether I was being drawn into the DxR mindset. The thought made me both excited and frustrated: excited because I had independently designed something DxR-like, frustrated because I really did not want to study it. I want to design a fully indexed multi-level index table without bolting on any so-called "algorithms": avoiding all the trees, binary search, even hashing and traversal. So before adopting any such algorithm, I want to record why I want to avoid them. The following section ought to be encrypted; if you happen to read it, please don't take it amiss. This is a hobby.
Why does O(1) beat the various trees, hashes, and exquisite algorithms? What about O(n) and O(lg n)? Big O should step aside!
First, when designing and implementing a real system, do not be shackled by the theory in algorithm books. Big O is meant to capture scalability: roughly, if computation time does not grow as the number of elements grows, the algorithm is O(1); if time grows logarithmically with the number of elements, it is O(lg n). But what is n, and where does the curve actually sit? You might say this MMU-based route table is unsuitable for IPv6 because it occupies too much space and therefore does not scale, but I never said it was for IPv6. For IPv4 routing, isn't the problem identical to translating a 32-bit virtual address? Did the MMU's designers fail to consider scalability? The answer is that on 64-bit systems the MMU has more options, such as inverted hashed page tables, while on 32-bit systems the fully indexed MMU beats trees and hashes of every kind. Moreover, being "logic-free" and simple, it suits hardware implementation better. Here is a crude example: an O(1) algorithm may take 100 ns per run, and it still takes 100 ns when n reaches 10,000,000,000; it is certainly O(1). Meanwhile an O(n^2) algorithm may take 1 ns when n is 100, and you know for a fact that in your environment n never exceeds 500. Which do you choose?
In an IPv4 environment, or an IPv6 environment where money for memory is no object, or any controlled, bounded environment (and don't say "nothing is bounded in a computer": look how hard OpenSSL has to work to handle big numbers), a multi-level index table is undoubtedly the fastest data structure. Hashing may be the best compromise, but it is definitely not the fastest. Indexing guarantees the speed, and the multiple levels keep the space usage from blowing up; the number of levels is the number of operations the algorithm performs, and everything else is up in the clouds.
An algorithm's big-O analysis suits algorithm theory, but a real system must weigh many other constraints. On the data side, big O ignores the cost of memory addressing and flattens the efficiency differences between cache levels (differences of orders of magnitude); on the instruction side, it flattens the timing differences between operations and ignores the cache and the MMU. None of these can be ignored in a real implementation. Algorithm analysis cannot even be regarded as software performance analysis, and that is not a defect, because it was never meant to be one. Both software and hardware changes can improve the same algorithm; different hardware wiring can yield different real costs, for example changing a bus or moving a component. The final performance is therefore a function of the algorithm, the software implementation, and the hardware implementation, with different weights. People tend to care intensely about the algorithm itself, somewhat about the software implementation, and, for the hardware, they mostly just look up at it: unlike the first two, you cannot simply swap it out.
Implementation in reality: time to use this wonderful analogy to build a perfect lookup structure!
Simply put, you just create an "address space" and fill its MMU mapping with the route table contents. But it is not that simple. On Linux, for example, the following problems arise:
1. You cannot use the C library or any other library, because the address space contains both data and instructions. Every instruction of the process itself occupies a virtual address, and such an address can no longer serve as an IPv4 address... libraries pull in a large number of instructions and therefore cannot be used.
2. You cannot even use the kernel. The kernel is shared with every address space; as the managing authority, its code is mapped into all of them. For example, the addresses above 0xC0000000 are mapped to physical memory, so many "IP addresses" up there are already taken.
Because of these code and instruction mappings, the whole virtual address space cannot be given over to IPv4 addresses. What is the way out?
Since I have already absorbed the idea, why copy it wholesale? Use the hardware MMU facility directly? That idea is crazy, and it also proves the thinker lazy. True, with virtualization support you could commandeer a set of virtual MMU facilities, but that only shows off your mastery of the hardware itself. Why not build a soft MMU instead?
The construction of DxR route tables is undoubtedly compact and subtle. It does not count on using the real MMU; instead it adds a binary search, which is a fine compromise simulation. I can do the same. I do not expect an MMU-simulating matching algorithm to be fast in itself; rather, I want to borrow DxR's idea of using a compact data structure to raise CPU cache utilization, keeping results in the CPU cache instead of sending requests out on the bus. What would I gain even by fully using the system's hardware MMU? Could I use its TLB? If not, what would be the point? Do you know what a TLB hit means? Most MMU address translations do not walk the page tables at all; they hit in the TLB, and the TLB is a cache! So simulating the MMU is not the fundamental goal; using the cache is king.
We know that the CPU cache (including the TLB) hits at a considerable rate because of locality of memory access. Does such locality exist for IP addresses? Consider that the many packets of one flow keep passing through back to back, while the packets of different flows arrive with staggered peaks; locality, we know, is a universal principle. On core paths, traffic engineering is path-based, while QoS is application-based; such classification reinforces locality rather than cancelling it. And what is classification, after all? That is a philosophical question: for more than two thousand years since Plato, people have been debating whether classification is aggregation or hashing...
These were my gains over the Spring Festival of the Year of the Goat: something resembling an MMU, simulating an MMU. Besides that, I read many history books and watched several movies, among them a still-watchable horror film, "The Grudge", and retold history at the Lanting Pavilion in Shaoxing...
