Analysis of the Principle of Hardware Routing and Forwarding (Repost)

Source: Internet
Author: User

Someone asked me: for route forwarding, is hardware forwarding really that much faster than a software table lookup, such as the one in a Linux system?
How should I answer that? Give data? I have none, because this is not what I do for a living. Give theory? I'm afraid I lack the eloquence. Draw a picture? I found that my system had no tool installed for drawing logic circuits. So what then? I do know the answer in my heart; it is just hard to put into words. So I planned to publish phone photos of hand drawings to show the power of hardware. Since I have no actual data, I use a relative comparison to show why software forwarding is so poor and, along the way, how a professional router or switch forwards a packet.
However, my hand drawings were really not flattering, so I redrew them with Visio.

I'm a courier and a road builder

How many decades are there in a lifetime?
I want to be a courier and a road builder: I deliver IP datagrams and build the all-optical network.
This is an impetuous era. Everyone is scrambling to climb to the top of the food chain, with Android apps, iOS and so on, while I have stayed in my own area of interest for 10 years. And this area is the bottom of the food chain.

A note before we start

I assume you already know the basic principles of gate circuits and related concepts: AND gates, NOT gates, OR gates, latches, capacitor storage cells, storage arrays and so on. Next I use these gates to show what the actual route lookup process looks like for a packet. Before you continue reading, be aware of the following.
For experts: this article is not for you. It reflects my own thinking; for the sake of beginners I do not follow the conventions of CAM or TCAM, and my use of space is poor. In short, compared with what you have in mind, my approach is very crude.
For those who do not know the topic: this article suits you well. It can let you grasp the essence in a moment, and that is my wish.

Step by step

First, a picture:

Given a destination IPv4 address, passing through a few gates is enough to find the next-hop index. Isn't that wonderful? In fact, that really is how it works.
If you look closely at the picture, you may object: what is so great about it? Your decoder decodes a full 32-bit address, that is, 4G addresses, so each prefix needs space for 4G entries. Who can afford that? What I want to say is that this is only an example. I do know that the table could be organized in an N-way set-associative manner, but that would add comparator gates and be less intuitive, so I use the full 32-bit address as the key directly. You may also argue that software can do the same: build a hash table with 4G entries and use the IPv4 address itself as the hash value. Isn't that simpler, and better optimized than a hardware implementation? My answer is no. Why? Take a look at the normal hit path of a CPU cache, as shown below:

Count how many more gates this involves than the IPv4 route lookup above. Note that this is only the cache matching circuit that sits in front of memory, which is the efficient case; on a miss, far more circuitry gets involved. And the CPU is merely the unit that drives these circuits. If you understand microarchitecture, you know that executing a single instruction is an extremely complex process: the CPU has general-purpose execution units, pipelines and so on, and every step costs clock cycles. So ask yourself how many gates the following code passes through:
entry = bucket[destip];
nexthop = entry->nexthop;
Translate it into assembly first and then look again. Moreover, you cannot really create a hash table that large. Generally speaking, trie-based lookup is efficient, and my own DXR pro++ structure is more efficient still, but even DXR pro++ is far weaker than the hardware forwarding above.
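To make the trie approach mentioned above concrete, here is a minimal sketch of a plain binary trie doing longest-prefix lookup in C. It is not DXR pro++ and not any real kernel structure; every name in it (trie_node, trie_new_node, trie_insert, trie_lookup, NO_ROUTE) is made up for illustration, and addresses are assumed to be in host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NO_ROUTE (-1)   /* hypothetical marker: no prefix ends at this node */

struct trie_node {
    struct trie_node *child[2]; /* child[0]: next bit is 0, child[1]: next bit is 1 */
    int nexthop;                /* next-hop index, or NO_ROUTE */
};

static struct trie_node *trie_new_node(void)
{
    struct trie_node *n = malloc(sizeof(*n));
    assert(n != NULL);
    n->child[0] = NULL;
    n->child[1] = NULL;
    n->nexthop = NO_ROUTE;
    return n;
}

/* Insert prefix/len -> nexthop, consuming address bits from the most
 * significant bit downwards. */
static void trie_insert(struct trie_node *root, uint32_t prefix, int len, int nexthop)
{
    assert(len >= 0 && len <= 32);
    struct trie_node *n = root;
    for (int i = 0; i < len; i++) {
        int bit = (prefix >> (31 - i)) & 1;
        if (n->child[bit] == NULL)
            n->child[bit] = trie_new_node();
        n = n->child[bit];
    }
    n->nexthop = nexthop;
}

/* Walk down at most 32 levels, remembering the deepest next hop seen.
 * Each level is a dependent memory access, executed one after another. */
static int trie_lookup(const struct trie_node *root, uint32_t addr)
{
    int best = NO_ROUTE;
    const struct trie_node *n = root;
    for (int i = 0; n != NULL; i++) {
        if (n->nexthop != NO_ROUTE)
            best = n->nexthop;
        if (i == 32)
            break;
        n = n->child[(addr >> (31 - i)) & 1];
    }
    return best;
}
```

With 10.0.0.0/8, 10.1.0.0/16 and 10.1.2.0/24 inserted, looking up 10.1.2.3 (0x0A010203) walks 24 levels before it settles on the /24 entry. That serial walk is exactly what the hardware circuit avoids.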
Back to the hardware route forwarding diagram above. The crossbar grid behind the decoder would have been cleaner drawn as a black box, but I could not help drawing it out, awkward as it looks. Each intersection holds 1 bit of data, either 0 or 1. The decoder output selects one word line; once that line is pulled high, the 1-bit values at every intersection along it are read out, which is in fact how memory access works. After the first decoder comes a more important operation: the "longest prefix" logic. I do not know how a standard TCAM does this, but I think my approach illustrates the point. It introduces two concepts, the inverse mask and the elimination bit. The inverse mask is the bitwise inversion of the mask, and the elimination bit ensures that only the longest matching prefix survives. Finally we obtain the next-hop index address for the longest matching prefix, and one more decoding step through that address yields the next hop. Done.
Note that all of these operations happen simultaneously: while the 28-bit prefix is being matched, the 24-bit, 16-bit, 10-bit and 8-bit prefixes are matched as well, and they all pass through the same set of gates at the same time. That is the advantage!
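The match-and-eliminate idea can be emulated in software, which also shows what gets lost in the translation. The sketch below is only one possible reading of the circuit, not the author's actual gate layout and not how a real TCAM is built. It assumes each entry stores a prefix and a mask, treats the inverse mask as forcing "don't care" bits to 1, and stands in for the elimination bit by keeping only the longest match. The names cam_entry, mask_from_len, match_line and cam_lookup are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_ROUTE (-1)   /* hypothetical marker: nothing matched */

struct cam_entry {
    uint32_t prefix;   /* network prefix, host byte order */
    uint32_t mask;     /* e.g. 0xFFFFFF00 for a /24 */
    int      len;      /* prefix length, 0..32 */
    int      nexthop;  /* next-hop index */
};

/* Build a network mask from a prefix length. */
static uint32_t mask_from_len(int len)
{
    assert(len >= 0 && len <= 32);
    return len == 0 ? 0u : (uint32_t)(0xFFFFFFFFu << (32 - len));
}

/* One match line: every address bit must either equal the stored bit
 * (bitwise XNOR) or be forced to 1 by the inverse mask. */
static int match_line(const struct cam_entry *e, uint32_t addr)
{
    uint32_t inv_mask   = (uint32_t)~e->mask;
    uint32_t equal_bits = (uint32_t)~(addr ^ e->prefix);
    return (equal_bits | inv_mask) == 0xFFFFFFFFu;
}

/* Elimination: among all matching entries only the longest prefix wins.
 * In hardware every match line settles at the same time; this loop
 * serializes them, which is the cost of doing it on a CPU. */
static int cam_lookup(const struct cam_entry *table, size_t n, uint32_t addr)
{
    int best_len = -1;
    int best = NO_ROUTE;
    for (size_t i = 0; i < n; i++) {
        if (match_line(&table[i], addr) && table[i].len > best_len) {
            best_len = table[i].len;
            best = table[i].nexthop;
        }
    }
    return best;
}
```

The result is the same as the circuit's, but the CPU needs one loop iteration per entry, each made of several instructions, where the circuit evaluates all entries in a single pass through the gates.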

Hardware forwarding and CPU forwarding

We find that the circuit described above can do almost nothing else: it can only look up the next hop for an IPv4 address (there is of course also logic for writing, deleting and so on, which I did not draw). Yet this circuit is far more efficient than something as complex as a CPU. It is specialized, while the CPU is merely generic.
As a general-purpose execution unit, the CPU is completely dumb. What it can do, and what exactly it does, is dictated by "instructions" fetched from memory; that is, an instruction is itself a form of data. What, then, is the essence of programming? It is programming the CPU's internal circuitry with instructions fetched from memory.
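The idea that an instruction is just data can be illustrated with a toy model. The sketch below is purely hypothetical: the four-opcode instruction set, the insn layout and the functions run and hardwired are invented for this illustration and correspond to no real processor. The generic run loop does whatever the program array tells it, one step at a time, while hardwired computes the same result with the "program" baked into the function itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Invented toy instruction set, for illustration only. */
enum opcode { OP_LOAD_IMM, OP_AND, OP_XOR, OP_HALT };

struct insn {
    enum opcode op;
    int dst;        /* destination register, 0..3 */
    int src;        /* source register, 0..3 */
    uint32_t imm;   /* immediate value, used by OP_LOAD_IMM */
};

/* Generic execution unit: its behaviour is decided entirely by the
 * instructions (data) it fetches. Register 0 starts as the input and
 * is returned as the result. */
static uint32_t run(const struct insn *program, size_t len, uint32_t input)
{
    uint32_t reg[4] = { input, 0, 0, 0 };
    for (size_t pc = 0; pc < len; pc++) {
        const struct insn *in = &program[pc];            /* fetch */
        assert(in->dst >= 0 && in->dst < 4);
        assert(in->src >= 0 && in->src < 4);
        switch (in->op) {                                /* decode and execute */
        case OP_LOAD_IMM: reg[in->dst] = in->imm;        break;
        case OP_AND:      reg[in->dst] &= reg[in->src];  break;
        case OP_XOR:      reg[in->dst] ^= reg[in->src];  break;
        case OP_HALT:     return reg[0];
        }
    }
    return reg[0];
}

/* Fixed-function counterpart: returns 0 exactly when addr lies in
 * 10.1.2.0/24. There is nothing to fetch or decode. */
static uint32_t hardwired(uint32_t addr)
{
    return (addr & 0xFFFFFF00u) ^ 0x0A010200u;
}
```

A five-instruction program (load the mask, AND, load the prefix, XOR, halt) makes run produce the same value as hardwired, but only after five rounds of fetch, decode and execute.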
For hardware forwarding, however, the input is an IPv4 address, purely data. There is no instruction; the instruction is the circuit itself. That is why it can do everything at once, as if this "instruction" had been fed in from outside. A CPU cannot execute such an "instruction" because it is too specialized. Modern processors are moving toward RISC, where programmers and compilers decide exactly what to do, rather than processor designers guessing which functionality programmers will use. Taking it to the extreme: if the designers of a CISC processor, to live up to its classification, added a "route lookup" instruction, the circuitry behind it inside that processor would likely resemble the one above.
That should open your eyes a little ....
