Layer-3 Switch CPU Packet Handling

Source: Internet
Author: User

Layer-3 switches are still quite common, so I studied the issues related to CPU packet sending and receiving on these devices and am sharing the results here in the hope that they are useful. In current layer-3 Ethernet switching devices, layer-2 switching and layer-3 routing of packets are handled mainly by the switching chip and the network processor. The CPU is essentially uninvolved in the switching and routing process itself; its role is to manage and control the switching chip.

In this case, the CPU load comes mainly from three sources: protocol timers, user configuration, and external events. Of these, external events are the most random and unpredictable. Typical external events include port link state changes (up/down), Media Access Control (MAC) address messages (learning, aging, and migration), packets received by the CPU through direct memory access (DMA), and packets sent by the CPU through DMA.

Among the external events listed above, processing after the CPU receives a packet through DMA is the most complex. When packets are passed from the lower layer to the upper-layer software, the processing actions of each protocol vary widely and may involve sending packets, port operations, batch table operations, and so on. Therefore, only when CPU packet sending and receiving are handled well can the upper-layer protocols interact normally, allowing the layer-3 switch to run stably and efficiently.

Possible problems

The following sections describe various aspects of CPU packet sending and receiving. The analysis is based on a typical mechanism: the CPU port is divided into queues, packets are received through DMA, and the DMA memory space is organized as a ring queue.
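As a minimal sketch of the receive side of such a mechanism, the structure below models a DMA ring queue shared between the switching chip and the CPU. All names (`rx_desc`, `rx_ring`, `ring_next`, the `ready` ownership flag) are illustrative assumptions, not a real chip's API; a real descriptor would also carry DMA addresses, error bits, and cache-coherency handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8  /* power of two so the index wraps with a cheap mask */

/* One receive descriptor: the DMA engine fills `buf` and sets `ready`. */
struct rx_desc {
    uint8_t *buf;    /* packet buffer owned by this slot */
    uint16_t len;    /* bytes written by the DMA engine */
    uint8_t  ready;  /* 1 = packet pending, 0 = slot free for hardware */
};

struct rx_ring {
    struct rx_desc desc[RING_SIZE];
    unsigned head;   /* next slot the CPU will consume */
};

/* Consume one pending packet; returns its descriptor, or NULL if the
   ring is empty. Clearing `ready` hands the slot back to hardware. */
static struct rx_desc *ring_next(struct rx_ring *r)
{
    struct rx_desc *d = &r->desc[r->head & (RING_SIZE - 1)];
    if (!d->ready)
        return NULL;
    d->ready = 0;
    r->head++;
    return d;
}
```

The CPU and the DMA engine each advance through the ring in order, so ownership of every slot is unambiguous: the `ready` flag says which side may touch it.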

CPU Load and Packet-Receiving Pace Control

The packet-processing capability of the layer-3 switch's CPU determines an upper limit on the number of packets that should be delivered to the CPU per unit of time. Once that limit is determined, the next question is the pace at which packets are delivered. Assume the upper limit has been established through evaluation as, say, x packets per second.

(1) Reporting to the CPU at a constant rate

When packets are reported to the CPU at a constant rate, the impact on the CPU queue is small and little buffering capacity is required, so the CPU queue has little work to do.

(2) Reporting to the CPU in bursts

The hardware receive queue on the switching chip (ASIC) side and the ring queue in DMA memory give the layer-3 switch a certain buffering capacity for packets sent to the CPU. With this buffering, we can lengthen the control period appropriately and set a control granularity (an upper limit on the number of packets delivered to the CPU per control period), dynamically enabling and disabling the CPU's packet-receiving function with a mechanism similar to negative feedback in a circuit. In this way, the rate at which packets reach the CPU is controlled at a macro level. In addition, if the switching chip supports policing or shaping of CPU-bound traffic based on the token bucket algorithm [2-3], and the minimum policing or shaping threshold can satisfy the CPU rate limit, this hardware function can be used to control the delivery rate and reduce the CPU load, which simplifies the software.
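The token-bucket idea mentioned above can be sketched in a few lines. This is an illustrative software model, not any chip's policing API: `rate` tokens are added per control tick up to a depth of `burst`, one token admits one packet to the CPU, and an empty bucket is the signal to pause receive DMA (the negative-feedback step). The names `tbucket`, `tb_tick`, and `tb_admit` are assumptions for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Token bucket gating packet delivery to the CPU. */
struct tbucket {
    uint32_t tokens;  /* tokens currently available */
    uint32_t rate;    /* tokens refilled per control tick */
    uint32_t burst;   /* bucket depth: bounds the burst size */
};

/* Called once per control period (e.g. by a timer). */
static void tb_tick(struct tbucket *tb)
{
    tb->tokens += tb->rate;
    if (tb->tokens > tb->burst)
        tb->tokens = tb->burst;
}

/* Returns 1 if a packet may go to the CPU, 0 if RX should be paused. */
static int tb_admit(struct tbucket *tb)
{
    if (tb->tokens == 0)
        return 0;
    tb->tokens--;
    return 1;
}
```

With `rate` set to x packets per control tick, the long-run delivery rate to the CPU cannot exceed x per tick, while `burst` decides how much short-term clustering the buffers must absorb.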

CPU Port Queue Length Planning

If you consider only the buffering capacity of the CPU port, then the longer the CPU port queue, the better. However, the impact on other functions and performance must also be considered, and the trade-off has to be analyzed case by case for each ASIC.

Zero copy

Zero copy means that throughout packet processing only pointers are passed as parameters and the packet itself is never copied, which greatly improves CPU efficiency. The trade-off is that software processing becomes somewhat less flexible. If the protocol stack needs to change the content of a packet, it can modify the receive buffer directly; but what if it needs to delete or add fields (for example, adding or stripping a layer of tag)? In other words, what happens when the length of the packet must change?

Adding or deleting fields inevitably moves either the header end or the tail end of the packet. If the tail is moved, the problem is simple as long as the total length does not exceed the buffer boundary. In practice, however, such operations usually happen near the header, so moving the header end is more efficient, and the protocol stack therefore tends to move the header end during processing. In that case, the driver must handle buffer allocation accordingly:

(1) When receiving a packet, the header pointer must not point at the buffer boundary; it must be offset inward by a certain margin (headroom). Accordingly, the size of a single buffer must be at least the maximum transmission unit (MTU) plus that margin.

(2) When a packet buffer is released, its header pointer must be normalized (reset to the standard offset) so the buffer can be reused.
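The two rules above can be sketched as follows. This is a simplified model under stated assumptions: `pkt`, `pkt_rx_init`, `pkt_push`, and `pkt_free` are hypothetical names, the headroom and MTU values are examples, and `memcpy` stands in for the hardware DMA write; the point is only that growing the header moves a pointer instead of copying the packet (the same idea as `skb_push` in the Linux kernel).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MTU      1518
#define HEADROOM 64   /* rule (1): slack so headers can grow in place */

/* Zero-copy packet buffer: `data` floats inside `base[]`, so tags can
   be pushed or stripped by moving the pointer, never by memmove. */
struct pkt {
    uint8_t  base[HEADROOM + MTU];  /* buffer = MTU plus headroom */
    uint8_t *data;                  /* current start of the packet */
    size_t   len;
};

/* Receive path: place the frame after the headroom. */
static void pkt_rx_init(struct pkt *p, const void *frame, size_t len)
{
    p->data = p->base + HEADROOM;
    memcpy(p->data, frame, len);    /* stands in for the DMA write */
    p->len = len;
}

/* Prepend `n` bytes (e.g. a VLAN tag) by moving the header pointer
   back; fails if the headroom would be underrun. */
static uint8_t *pkt_push(struct pkt *p, size_t n)
{
    if ((size_t)(p->data - p->base) < n)
        return NULL;
    p->data -= n;
    p->len  += n;
    return p->data;
}

/* Rule (2): normalize the header pointer before the buffer is reused. */
static void pkt_free(struct pkt *p)
{
    p->data = p->base + HEADROOM;
    p->len  = 0;
}
```

Without the normalization in `pkt_free`, a recycled buffer would start with whatever offset the previous packet left behind, and the headroom guarantee of rule (1) would silently erode.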

Interrupt and Polling

In current layer-3 switches, external interrupts come mainly from the switching chip. The chip's main interrupt sources include DMA operations (such as packet reception, completion of packet transmission, and new address messages) and various error reports. If interrupt requests arrive too frequently, the constant context switching between the interrupt service routine (ISR) and other processes consumes a great deal of CPU time. If a large number of interrupt requests is sustained, the CPU stays busy and the protocol tasks cannot get enough scheduling time, leading to serious faults such as protocol state machine timeouts. To avoid an uncontrollable event-trigger frequency, a polling mechanism can be used instead: a CPU timer triggers the routine that would otherwise run as an interrupt-driven ISR. Because the timer interval is fixed, the routine's execution frequency is bounded, avoiding the problems above.

Compared with interrupts, polling's one advantage is its controllable pace (the pace of external interrupts depends on the frequency of external events and is beyond the CPU's control). However, polling has an unavoidable disadvantage: slow response, so features with strict real-time requirements cannot be satisfied. For example, when the ping command is used to test a layer-3 switch interface, the latency of a switch using polling is noticeably greater than that of one using interrupts. If some mechanism can prevent sustained floods of interrupt requests, the CPU can be kept from overload while the real-time advantage of interrupt processing is retained.

Typical events that generate large numbers of interrupts are packet reception by the CPU and MAC address message reporting. Taking packet receiving as an example, the burst-mode approach described earlier controls the receive DMA switch based on real-time traffic, thereby controlling the interrupt source itself. This negative-feedback-like mechanism effectively prevents a continuous stream of interrupt events from reaching the CPU.
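A common way to combine both methods is to run interrupt-driven while traffic is light, then mask the RX interrupt and fall back to budgeted polling during a burst, re-enabling the interrupt only once the queue is drained (the pattern used by NAPI in the Linux kernel). The sketch below is a toy model of that control logic; `rx_ctl`, `rx_isr`, and `rx_poll` are assumed names, and the packet counts are stand-ins for real queue state.

```c
#include <assert.h>

/* Hybrid interrupt/polling control state. */
struct rx_ctl {
    int irq_enabled;  /* 1 = interrupt mode, 0 = polling mode */
    int budget;       /* max packets handled per poll pass */
};

/* Called from the RX ISR: mask further RX interrupts and defer the
   actual packet work to the polling loop. */
static void rx_isr(struct rx_ctl *c)
{
    c->irq_enabled = 0;
}

/* One poll pass: drain up to `budget` of the `pending` packets.
   Re-enable the interrupt only when the queue is fully drained,
   which is the negative-feedback step. Returns packets processed. */
static int rx_poll(struct rx_ctl *c, int pending)
{
    int done = (pending < c->budget) ? pending : c->budget;
    if (done == pending)
        c->irq_enabled = 1;  /* queue empty: back to interrupt mode */
    return done;
}
```

Under sustained load the ISR fires once and the system stays in polling mode at a bounded cost per pass; under light load the interrupt path gives fast response, so the design keeps the strengths of both methods.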

In short, polling is simple to control but has poor real-time performance; interrupts have good real-time performance but make the interrupt sources hard to control. In the initial system design phase, both the real-time requirements and the way the chip reports external events must be considered when deciding whether to use interrupts, polling, or a combination of the two.

With the development of Ethernet technology, the processing capabilities of switching chips and network processors keep improving, while the CPU performance of data switching equipment lags far behind. At the same time, the number of service types supported by switching devices keeps growing, and so does the volume of traffic the CPU must carry. The conflict between growing switching capacity, rapidly increasing service types, and limited CPU resources will therefore become increasingly prominent. Properly managing the buffers, queue scheduling, and traffic policing at the interfaces between the CPU and the switching chip or network processor is thus a prerequisite for the secure and stable operation of switching equipment, and it remains an important topic for switch development now and in the future.
