Implementation principle of TCP Segment Offload (TSO), offloadtso
It was so hot in the morning that I suddenly remembered that someone had talked to me about TSO three weeks ago. I also described the principle of TSO. This principle is also very simple. It is nothing more than segmentation by network card hardware, calculate the checksum to free up the CPU cycle. In fact, one piece is enough. Since hardware is used for segmentation, the hardware can only calculate the checksum, because you do not know the segmentation details of the hardware, so you cannot calculate the checksum of each segment before segmentation ....
Almost everyone knows the principle of TSO. In fact, how it is implemented is not difficult. What is difficult is the details. After finishing my work, I want to demonstrate this principle. Of course, it may be very different from the actual implementation. In any case, it is a schematic diagram and I will observe it carefully, you can also implement a TSO better than me.
This design is a digital logic, the range of time series circuits, and this field is very high, not ordinary software programmers can hold, like me half a bottle. As a result, I still try to give a result directly, instead of asking people who listen to books to prepare for it first. People often get tired of giving up when doing these preparations.
Basic knowledge is not difficult, that is, some door circuits, and the door, non-door, comparator, decoder, trigger, and so on. These things constitute a very complete principle of computer composition. The key is how to combine them. This is another field of programming. At this time, I think of the phrase Liu danqing, a senior physics veteran, said 15 years ago when talking about the circuit: let the current flow. This sentence seems totally non-compliant with the basic principle of circuit design by the team. They may prefer modeling first, analyzing it, then writing code using the Description Language VHDL, and finally providing the circuit, I think this is suitable for design, but it is not suitable for telling a layman about its highlights. For a layman, the only thing he knows is to let the current flow, and then rush through the door and the pipe. Well, the High Level turns to a low level .... in my opinion, this is the case.
On a piece of white paper, I drew a pile of door circuits and then randomly combined them. Slowly, I suddenly found that this circuit is the framework of TSO. I remember that the route forwarding table was solidified by some people last week. However, the solidified behavior may be dropped because of high costs. After all, today's soft implementation is enough. Therefore, only the core transmission network needs this fixed forwarding table. However, TSO is the first push in the server field. There are too many servers, far more than the core forwarding devices, and their CPU needs to be reduced. Indeed, the CPU is a waste of computing some fixed-mode things. It should spend more energy on some uncontrollable things. Therefore, TCP segmentation is naturally done by the network adapter. You, me, him, we all met TSO, but we will only turn it on and off. If you want to know how it Offload, please see to make the current flow first:
TCP segmentation is very different from IP segmentation. You must understand this. Then you can understand the figure above.
The preceding Parsing is only a special case. In fact, all hardware acceleration mechanisms are the same. When I was reading the Intel Gigabit/10g Nic manual, I thought that in the chip, the components of this circuit were almost massive, implementing RSS and hardware hash classification. This is what I call the flood of rivers. It's not the purpose of this Article. This article only describes this possibility. This is also the essential difference between this dedicated circuit and general-purpose CPU. The CPU has an instruction set, which means that it focuses on external calls. The focus of a dedicated circuit is the internal execution logic, which provides almost no external interfaces, the only difference is to set the values of several registers, such as MTU, data packet length, and data packet header length. Other execution logic does not allow external access. This is the essential difference between serial programming and parallel execution.
There are also some things to say about the command system. In terms of internal control logic, there is a unified command distribution system, which is actually a series of combinations of 0 and 1. The 0 and 1 in this combination act on various door circuits, these gate circuits, after receiving these different inputs, generate different outputs and then use them as inputs to other gate circuits, resulting in different outputs, so repeatedly... isn't that true? It's hard for you to decide otherwise.
Let the current flow be first-class. If you think it is abstract, observe the flood process. The location of the river dike is different and the disaster caused is also different, the key lies in the terrain at the dike and the terrain connected to it. All these occur at the same time. Like the current, when the water flows through a curve or an arch bridge, there will also be some latency and shunt, which can be analogous to various circuits.
It's annoying to have a meal!
.....
After dinner, I think a slight transformation of this circuit is a scattered/clustered IO circuit. It is just an unfinished thing, or a reserve. Explosion
Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger.