A method for handling shared interrupts in Linux

Source: Internet
Author: User

Some time ago, while debugging a chip, I ran into a strange problem: whenever a PS/2 keyboard was plugged into the board, the system could enter the serial port's interrupt service routine during kernel boot and, after a while, panic and stop. After analyzing the panic stack, I used an EJTAG tool to read the serial port's interrupt status bits and found that the serial port had not actually raised an interrupt. So if the serial port did not interrupt, how did the kernel end up in the serial port's interrupt service routine?

We know that Linux interrupts can be divided into I/O interrupts, clock interrupts, and inter-processor interrupts. I/O interrupts are an important way for Linux to respond to external I/O events. Although the implementation differs across platforms and architectures, different devices can share the same interrupt vector. For example, on the PCI bus, several devices can share one IRQ line; that is, they share one interrupt number, and their interrupt service routines are all attached to that number. When the CPU responds to this IRQ, it calls the routines attached to the vector one by one, and each routine does real work only if its own device actually raised the interrupt. But the serial port is clearly not a PCI device and cannot use the PCI interrupt line and interrupt pin registers. So what caused the kernel panic?
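The shared-line behavior described above can be illustrated with a small user-space model. This is only a sketch: `struct shared_dev`, `shared_dev_handler`, `dispatch_shared_line`, and the `SKETCH_IRQ_*` values are hypothetical names invented for illustration, not kernel code. In the real kernel, drivers on a shared line register with `request_irq()` using the `IRQF_SHARED` flag, and each handler returns `IRQ_NONE` or `IRQ_HANDLED`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one device attached to a shared IRQ line. */
struct shared_dev {
    const char *name;
    bool irq_pending;      /* stands in for the device's interrupt status bit */
    unsigned int handled;  /* how many times this device was really serviced */
};

/* Return convention modelled on the kernel's IRQ_NONE / IRQ_HANDLED. */
enum sketch_irqreturn { SKETCH_IRQ_NONE = 0, SKETCH_IRQ_HANDLED = 1 };

/* A shared handler first checks its own status bit and declines the
 * interrupt if its device did not raise it. */
enum sketch_irqreturn shared_dev_handler(struct shared_dev *dev)
{
    if (!dev->irq_pending)
        return SKETCH_IRQ_NONE;

    dev->irq_pending = false;  /* service the device and clear its status */
    dev->handled++;
    return SKETCH_IRQ_HANDLED;
}

/* Call every handler attached to the line, one by one, and return how
 * many of them claimed the interrupt. */
unsigned int dispatch_shared_line(struct shared_dev *devs, size_t n)
{
    unsigned int claimed = 0;

    for (size_t i = 0; i < n; i++) {
        if (shared_dev_handler(&devs[i]) == SKETCH_IRQ_HANDLED)
            claimed++;
    }
    return claimed;
}
```

With two devices on the line and only the second one pending, `dispatch_shared_line` calls both handlers, but only the second one does any work; the first returns `SKETCH_IRQ_NONE` and leaves its device untouched.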

With these questions in mind, I discussed the possibilities with hardware engineers and chip design colleagues. Finally, with the help of the board schematic, I found that the low-speed devices on the board are integrated into the chip: serial ports, an LPC controller, an SPI controller, an I2C controller, and a NAND controller. The interrupt signals of the CPU UART and the CPU LPC controller are routed to the same CPU interrupt pin. The connection diagram is as follows:

The leftmost part of the diagram shows that the interrupt request signals of the CPU UART and the CPU LPC controller are logically ORed and output to Interrupt Pending bit 2 (IP2) in the CPU. Once the CPU detects that any Interrupt Pending bit is set, it checks the bits one by one in an agreed order and executes the interrupt service routine of the device driver whose interrupt is routed to that bit.

In the old board design, no device was connected to the CPU LPC, so no problem appeared. Later, to meet a customer requirement, the CPU LPC port on the board was made available for a PS/2 keyboard and mouse, and the problem was exposed. I therefore suspected that the root cause was that interrupts from devices attached to the LPC, which shares IP2, were not handled. Following this idea, I checked the existing interrupt dispatch function and found that it only called do_IRQ(58) when IP2 was pending. According to the serial port driver code in the kernel, the default interrupt number of the first CPU UART is 58. So whenever the attached PS/2 keyboard or mouse raised an interrupt, the existing code executed do_IRQ(58) indiscriminately. This is obviously not the expected behavior: the code needs to read each device's interrupt status bits to determine which device actually raised the interrupt. In the mainline Linux 4.2 kernel, this code, still open to improvement, can be found in arch/mips/loongson64/loongson-3/irq.c:

    void mach_irq_dispatch(unsigned int pending)
    {
        if (pending & CAUSEF_IP7)
            do_IRQ(LOONGSON_TIMER_IRQ);
    #if defined(CONFIG_SMP)
        else if (pending & CAUSEF_IP6)
            loongson3_ipi_interrupt(NULL);
    #endif
        else if (pending & CAUSEF_IP3)
            ht_irqdispatch();
        else if (pending & CAUSEF_IP2)
            do_IRQ(LOONGSON_UART_IRQ);
        else {
            pr_err("%s: spurious interrupt\n", __func__);
            spurious_interrupt();
        }
    }
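The first idea, checking each device's status before dispatching, might look roughly like the following user-space sketch. It is not the patch that was actually applied. Every name here (`struct ip2_sources`, `sketch_do_irq`, `sketch_dispatch_ip2`) is hypothetical, and `SKETCH_LPC_IRQ` is an arbitrary number chosen for illustration. Only the UART number 58 comes from the text above. Real code would read the UART and LPC status registers instead of taking a snapshot structure.

```c
#include <stdbool.h>

#define SKETCH_UART_IRQ 58   /* UART IRQ number quoted in the article */
#define SKETCH_LPC_IRQ  59   /* hypothetical number, for illustration only */
#define SKETCH_MAX_IRQ  64

/* Hypothetical snapshot of the two device status bits that are ORed into IP2. */
struct ip2_sources {
    bool uart_pending;
    bool lpc_pending;
};

/* Per-IRQ dispatch counters; sketch_do_irq() stands in for the kernel's do_IRQ(). */
unsigned int sketch_irq_count[SKETCH_MAX_IRQ];
unsigned int sketch_spurious_count;

void sketch_do_irq(unsigned int irq)
{
    if (irq < SKETCH_MAX_IRQ)
        sketch_irq_count[irq]++;
}

/* Dispatch IP2 by looking at each source instead of assuming the UART. */
void sketch_dispatch_ip2(const struct ip2_sources *src)
{
    bool handled = false;

    if (src->lpc_pending) {
        sketch_do_irq(SKETCH_LPC_IRQ);
        handled = true;
    }
    if (src->uart_pending) {
        sketch_do_irq(SKETCH_UART_IRQ);
        handled = true;
    }
    if (!handled)
        sketch_spurious_count++;  /* IP2 was set but no source claims it */
}
```

In this model a keyboard-only event (`lpc_pending` set, `uart_pending` clear) dispatches only the LPC number, so the UART handler is never entered for an interrupt it did not raise.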

In addition, the LPC interrupt signal in the chip is level-triggered. According to the definition of the LPC configuration registers, the LPC SIRQ must be cleared after the interrupt has been handled, so the corresponding interrupt ack and eoi functions must be added. In contrast, interrupts on the NS16550A-compatible serial port are edge-triggered and do not need to be cleared explicitly by software: when the transmitter holding register is empty, bit 1 of the ISR is set, and it is cleared once data is written to the transmitter holding register. When the number of characters in the receive FIFO reaches the trigger level, bit 2 of the ISR is set and remains set until the program reads the receive FIFO.
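Why a level-triggered source needs an explicit clear can be shown with a toy model. This is a sketch under simplifying assumptions: `struct level_source`, `level_ack`, and `service_level_irq` are hypothetical names, and the single `status` flag stands in for the LPC SIRQ status, whose real register layout is not described in this article.

```c
#include <stdbool.h>

/* Hypothetical model of a level-triggered source such as the LPC SIRQ. */
struct level_source {
    bool status;  /* latched interrupt status; the line stays asserted while set */
};

bool level_line_asserted(const struct level_source *s)
{
    return s->status;
}

/* Stands in for the ack/eoi step: clear the latched status. */
void level_ack(struct level_source *s)
{
    s->status = false;
}

/* Service the source until the line drops or max_rounds is reached.
 * Returns how many times the handler was entered. */
unsigned int service_level_irq(struct level_source *s, bool do_ack,
                               unsigned int max_rounds)
{
    unsigned int entries = 0;

    while (level_line_asserted(s) && entries < max_rounds) {
        entries++;        /* the handler body would run here */
        if (do_ack)
            level_ack(s); /* without this, the line never drops */
    }
    return entries;
}
```

With `do_ack` set, the handler is entered once and the line drops. Without it, the handler is re-entered until the round limit is hit, which models the interrupt firing again and again because nothing cleared the source.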

Based on these two ideas, I modified the corresponding code, rebuilt the kernel, and ran many tests; the problem did not recur. This example shows that, besides shared-vector mechanisms such as PCI interrupts, there is another case: interrupt signals from different physical devices are routed to the same interrupt pin but use different interrupt numbers. In this case, engineers need to consider the chip structure, hardware connections, interrupt routing, and the kernel's interrupt handling flow together to provide a complete and robust solution. Comparing the two cases, however, it is not hard to see that they are the same in essence: whether or not the interrupt number is shared, every device routed to the CPU's internal interrupt logic or to an external interrupt controller should get the chance to be serviced, and none should be left out.
