Network card Interrupt Load Balancing

Last Update:2018-07-23 Source: Internet

Author: User

The problem in practice

As hardware costs fall, our servers are configured with more and more resources, yet the packet-processing capacity of a single server is still quite limited, and we often see high CPU load. The Linux kernel introduced the Netfilter framework in the 2.4 release, and network packet-processing capacity is largely measured by the number of concurrent connections, each of which involves a lot of processing in the kernel. The cost is especially high when there is a large volume of UDP traffic: these UDP packets belong to short-lived connections with high concurrency, so the CPU overhead is very large. Output of mpstat -P ALL 1:

We can see from the figure that cpu0 is relatively busy, at 60%-70% utilization, which is already a high load. In fact, the Linux kernel has supported balancing interrupts across CPUs since 2.4, so clearly we are not making full use of the multiple CPUs we already have.

The solution

We know that any peripheral (disk, NIC, and so on) that needs service from the CPU raises an interrupt. The interrupt tells the CPU what has happened, and the CPU stops its current work to handle it. For example, if the network card receives a packet while the CPU is running an application process, that process is interrupted and the network card's interrupt handler runs instead. The interrupt handlers for different peripherals naturally differ, so to tell devices apart and to keep multiple devices from issuing the same interrupt request, each device in the system is assigned a unique IRQ (Interrupt Request) number.

IRQ information for each device can be read from /proc/interrupts:

The first column is the IRQ number of each device, which may differ from machine to machine. Here the IRQ for the eth1 NIC is 16.

The second through fifth columns are the number of interrupts for that IRQ (that is, for that device) handled by CPU0 through CPU3. This screenshot was taken after the optimization, and you can see that each CPU has handled roughly the same number of eth1 interrupts.

The sixth column corresponds to the interrupt controller.

The seventh column is the name of the device.
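The column layout described above can be read with a short script instead of by eye. The following is a minimal sketch, assuming the simple layout shown in this article, where the device name is the last field of its row. The function names irq_for_device and irq_cpu_counts are made up for this illustration, and the optional second argument (an alternative file to read) exists only to make the helpers easy to try on saved output.

```shell
# irq_for_device and irq_cpu_counts are hypothetical helpers, not standard tools.
# Both assume the device name is the last field of its /proc/interrupts row.

# Print the IRQ number(s) registered for a device.
irq_for_device() {
    local dev="$1" file="${2:-/proc/interrupts}"
    awk -v dev="$dev" '$NF == dev { sub(":", "", $1); print $1 }' "$file"
}

# Print one "CPUn count" line per CPU column for a device.
irq_cpu_counts() {
    local dev="$1" file="${2:-/proc/interrupts}"
    awk -v dev="$dev" '
        NR == 1    { for (i = 1; i <= NF; i++) cpu[i] = $i; ncpu = NF; next }
        $NF == dev { for (i = 1; i <= ncpu; i++) print cpu[i], $(i + 1) }
    ' "$file"
}

# Example: look up eth1 on the running system, if /proc/interrupts is readable.
if [ -r /proc/interrupts ]; then
    irq_for_device eth1
    irq_cpu_counts eth1
fi
```

Rows where several devices share one IRQ, or where the NIC registers per-queue names, would need a looser match than the exact comparison used here.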

Our optimization is simply to distribute the eth1 interrupt requests evenly across the CPUs. Since Linux 2.4, an IRQ can be bound to one CPU or to a set of CPUs; this is called multi-CPU interrupt affinity (SMP IRQ affinity). Several conditions must be met for it to work:

1. The interrupt controller for eth1 must be an IO-APIC chip. Sometimes the hardware supports it but IO-APIC is not enabled, in which case the system boot parameters must be adjusted to enable it. Be sure to check this.

2. Only certain CPUs support it. CPUs that merely offer hyper-threading will not work. In my experiments some Intel Xeon processors supported it, for example the Intel(R) Xeon(R) CPU 5130 @ 2.00GHz; the several AMD models I tried did not.
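The first condition can be checked roughly from the interrupt controller column of /proc/interrupts. This is a minimal sketch, assuming the device name is the last field of its row and that the controller text contains the string IO-APIC, as in the layout described earlier. The function name irq_uses_ioapic is invented for this illustration.

```shell
# irq_uses_ioapic: hypothetical helper, not a standard tool.
# Succeeds (exit status 0) if the row for the device mentions IO-APIC.
irq_uses_ioapic() {
    local dev="$1" file="${2:-/proc/interrupts}"
    awk -v dev="$dev" '
        $NF == dev && /IO-APIC/ { found = 1 }
        END { exit !found }
    ' "$file"
}

# Example: report on eth1 if /proc/interrupts is readable on this system.
if [ -r /proc/interrupts ]; then
    if irq_uses_ioapic eth1; then
        echo "eth1 is handled by an IO-APIC controller"
    else
        echo "no IO-APIC row found for eth1"
    fi
fi
```

A failed check only says that no matching row was found; it does not distinguish a missing device from a different controller type.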

Setting the interrupt distribution is simple: modify /proc/irq/<IRQ>/smp_affinity, where <IRQ> is the IRQ number, which as described above can be looked up in /proc/interrupts. In the example above the IRQ for eth1 is 16. To have the eth1 interrupts handled evenly by all CPUs, just run the following command:

echo FF >/proc/irq/16/smp_affinity

If you want to bind the handling of eth1 interrupts to CPU2 only, run:

echo 4 >/proc/irq/16/smp_affinity

The value written by echo is hexadecimal, and each bit corresponds to one CPU: 1 corresponds to cpu0, 2 to cpu1, 4 to cpu2, 8 to cpu3, and so on.
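The bit arithmetic can be done by the shell rather than by hand. The following is a minimal sketch, assuming fewer than 32 CPUs so that the mask fits in a single hexadecimal word. The function names cpus_to_mask and set_irq_affinity are made up for this illustration, and the optional third argument of set_irq_affinity (a base directory other than /proc/irq) is there only so the helper can be tried against a scratch directory.

```shell
# cpus_to_mask and set_irq_affinity are hypothetical helpers, not standard tools.

# Print the hexadecimal smp_affinity mask for the CPU indices given as arguments.
cpus_to_mask() {
    local mask=0 cpu
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))   # set the bit that belongs to this CPU
    done
    printf '%x\n' "$mask"
}

# Write a mask for an IRQ, then read the file back to show what is now stored.
# Writing under /proc/irq requires root.
set_irq_affinity() {
    local irq="$1" mask="$2" base="${3:-/proc/irq}"
    echo "$mask" > "$base/$irq/smp_affinity" && cat "$base/$irq/smp_affinity"
}

cpus_to_mask 2          # cpu2 only: prints 4
cpus_to_mask 0 1 2 3    # cpu0 through cpu3: prints f

# As root, binding IRQ 16 to cpu2 would then be:
#   set_irq_affinity 16 "$(cpus_to_mask 2)"
```

Reading the file back after the write is a cheap way to confirm that the kernel accepted the value.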

Let's take a look at our optimized results:

You can see that the load is now spread evenly across the CPUs, with each CPU at only 20-30% utilization, so processing capacity has improved greatly.

The screenshot above comes from one of several chat-history roaming machines that were optimized this way. After the optimization their CPU load dropped substantially, so no capacity expansion was needed, which effectively reduced operating costs.

Appendix:

1. How to see which CPU a process is running on:

#top

Then press f to enter the Current Fields settings page of top:

Select: j: P = Last used CPU (SMP)

The display then has one more column, P, which shows the CPU each process last ran on.
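The same information is available without the interactive top screen. This is a minimal sketch, assuming a Linux /proc filesystem, where field 39 of /proc/<pid>/stat holds the number of the CPU the process last ran on. The function name last_cpu_of_pid is invented for this illustration.

```shell
# last_cpu_of_pid: hypothetical helper, not a standard tool.
# Prints the CPU number a process last ran on (field 39 of /proc/<pid>/stat).
last_cpu_of_pid() {
    local pid="$1" stat rest
    stat=$(cat "/proc/$pid/stat") || return 1
    # Drop "pid (comm) "; the command name may itself contain spaces or
    # parentheses, so cut at the last ") " in the line.
    rest=${stat##*) }
    # After the cut, $1 is field 3 (state), so field 39 is ${37}.
    set -- $rest
    echo "${37}"
}

# Example: the CPU the current shell last ran on, if /proc is available.
if [ -r "/proc/$$/stat" ]; then
    last_cpu_of_pid "$$"
fi
```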

2. How to see the utilization of each CPU on Linux:

#top

Then press 1 to display each CPU separately:

Cpu0:1.0%us, 3.0%sy, 0.0%ni, 96.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1:0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

In addition, Shift+P sorts by CPU usage and Shift+M sorts by memory usage.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
