http://www.ibm.com/developerworks/cn/linux/l-cn-linuxkernelint/index.html
This article analyzes and discusses the interrupt system under SMP, covering interrupt controllers, interrupt classification, interrupt affinity, interrupt threading, and interrupt migration. It first briefly analyzes how interrupts work, then discusses in detail how interrupt affinity is implemented, and finally compares the mechanisms behind threaded and non-threaded interrupts.
SMP (symmetric multi-processing) refers to an architecture in which a group of processors (multiple CPUs) is aggregated in a single computer, with the CPUs sharing the memory subsystem and the bus. With this technology, a server can run multiple processors at the same time while sharing memory and other host resources.
What is an interrupt
The Linux kernel must manage every hardware device connected to the computer; this is part of its core job. To manage these devices it first has to communicate with them, and there are generally two ways to do so:
- Polling: the kernel periodically queries the status of the device and handles it accordingly;
- Interrupts: the hardware signals the kernel when it needs attention (instead of the kernel actively querying the hardware, the hardware actively notifies the kernel).
The first approach makes the kernel do a lot of wasted work: polling repeats periodically and consumes CPU time, so it is inefficient. The second approach is therefore generally used. (See Note 1.)
Physically, an interrupt is an electrical signal generated by a hardware device and fed directly into an input pin of an interrupt controller (such as the 8259A), which then forwards it to the processor. As soon as the processor detects the signal, it suspends its current work and handles the interrupt, notifying the OS that an interrupt has occurred so that the OS can handle it appropriately. Different devices correspond to different interrupts, and each interrupt is identified by a unique number, usually called the interrupt request (IRQ) line.
APIC vs 8259A
The x86 CPU provides only two external pins for interrupts: NMI and INTR. NMI is the non-maskable interrupt, typically used for events such as power failure and physical memory parity errors. INTR is the maskable interrupt, which can be masked by clearing the CPU's interrupt flag (for example with the cli instruction); it is mainly used to receive interrupt signals from external hardware, which the interrupt controller passes on to the CPU.
There are two common types of interrupt controllers:
1. Programmable Interrupt Controller 8259A
The traditional PIC (Programmable Interrupt Controller) consists of two 8259A-style external chips connected in a cascade. Each chip can handle up to 8 different IRQ inputs. Because the INT output line of the slave PIC is connected to the IRQ2 pin of the master PIC, 15 IRQ lines are available, as shown in Figure 1.
Figure 1: 8259A cascade schematic
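The arithmetic behind "15 IRQ lines" can be made concrete with a small model. This is a hypothetical sketch, not kernel code: it assumes the classic PC wiring in which IRQ 0-7 arrive on the master, IRQ 8-15 on the slave, and the master's pin 2 is consumed by the cascade.

```c
#include <assert.h>

/* Hypothetical model of the cascaded 8259A pair described above. */
enum pic_chip { PIC_MASTER, PIC_SLAVE, PIC_INVALID };

struct pic_route {
    enum pic_chip chip; /* which 8259A receives the line */
    int pin;            /* input pin (0-7) on that chip */
};

#define CASCADE_PIN 2 /* master pin wired to the slave's INT output */

static struct pic_route route_irq(int irq)
{
    struct pic_route r = { PIC_INVALID, -1 };

    if (irq < 0 || irq > 15 || irq == CASCADE_PIN)
        return r;                      /* out of range, or used by the cascade */
    r.chip = irq < 8 ? PIC_MASTER : PIC_SLAVE;
    r.pin = irq % 8;
    return r;
}

/* Count the IRQ lines that a device can actually use. */
static int usable_irq_lines(void)
{
    int n = 0;

    for (int irq = 0; irq < 16; irq++)
        if (route_irq(irq).chip != PIC_INVALID)
            n++;
    return n;
}
```

Two chips give 16 inputs, and one of them is spent on the cascade, which is where the count of 15 comes from.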
2. Advanced Programmable Interrupt Controller (APIC)
The 8259A is suitable only for single-CPU systems. To fully exploit the parallelism of the SMP architecture, it is critical to be able to deliver interrupts to every CPU in the system. For this reason Intel introduced a new component, the I/O Advanced Programmable Interrupt Controller, to replace the old 8259A programmable interrupt controller. The design has two main parts. The first is the local APIC, which delivers interrupt signals to a specific processor; every processor has its own, so a machine with three processors has three local APICs. The second is the I/O APIC, which collects interrupt signals from I/O devices and, when a device needs service, sends a signal to the local APICs. A system can contain up to 8 I/O APICs.
Each local APIC has 32-bit registers, an internal clock, a local timer device, and two additional IRQ lines, LINT0 and LINT1, reserved for local interrupts. All local APICs are connected to the I/O APIC, forming a multi-level APIC system, as shown in Figure 2.
Figure 2: Multi-level I/O APIC system
Most uniprocessor systems today include an I/O APIC chip, which can be configured in one of two ways:
1) As a standard 8259A-style external PIC connected to the CPU. The local APIC is disabled, and its two local IRQ lines LINT0 and LINT1 are connected to the INTR and NMI pins, respectively.
2) As a standard external I/O APIC. The local APIC is enabled, and all external interrupts are received through the I/O APIC.
To check whether a system uses the I/O APIC, enter the following command at the command line:

# cat /proc/interrupts
           CPU0
  0:      90504    IO-APIC-edge  timer
  1:        131    IO-APIC-edge  i8042
  8:          4    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
...:        111    IO-APIC-edge  i8042
...:       1862    IO-APIC-edge  ide0
...:        ...    IO-APIC-edge  ide1
177:          9   IO-APIC-level  eth0
185:          0   IO-APIC-level  via82cxxx
...

If IO-APIC appears in the output, your system is using the APIC. If you see XT-PIC instead, your system is using the 8259A chip.
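The same check can be automated. The helpers below are a hypothetical sketch: they only search each line for the controller type names that appear in the listing above ("IO-APIC..." and "XT-PIC"), and they assume nothing else about the /proc/interrupts format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical classifier for one line of /proc/interrupts, based on the
 * controller type name printed in it. */
enum irq_chip_kind { CHIP_UNKNOWN, CHIP_IO_APIC, CHIP_XT_PIC };

static enum irq_chip_kind classify_line(const char *line)
{
    if (strstr(line, "IO-APIC") != NULL)
        return CHIP_IO_APIC;   /* e.g. "IO-APIC-edge", "IO-APIC-level" */
    if (strstr(line, "XT-PIC") != NULL)
        return CHIP_XT_PIC;    /* legacy cascaded 8259A */
    return CHIP_UNKNOWN;       /* header row, NMI/LOC counters, etc. */
}

/* Returns 1 if any line read from fp mentions the I/O APIC, else 0. */
static int uses_io_apic(FILE *fp)
{
    char line[512];

    while (fgets(line, sizeof line, fp) != NULL)
        if (classify_line(line) == CHIP_IO_APIC)
            return 1;
    return 0;
}
```

In a program, the stream would typically come from fopen("/proc/interrupts", "r").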
Interrupt classification
Interrupts can be divided into synchronous and asynchronous interrupts:
1. Synchronous interrupts are generated by the CPU control unit while it executes instructions. They are called synchronous because the control unit issues them only after it finishes executing an instruction, never in the middle of one; a system call is an example.
2. Asynchronous interrupts are generated by other hardware devices at arbitrary times with respect to the CPU clock signals, so they can arrive between any two instructions; a keyboard interrupt is an example.
In Intel's documentation, synchronous interrupts are called exceptions and asynchronous interrupts are called interrupts.
Interrupts can be further divided into maskable and non-maskable interrupts. Exceptions fall into three classes: faults, traps, and aborts.
Broadly speaking, then, interrupts fall into four categories: interrupts, traps, faults, and aborts. Table 1 summarizes their similarities and differences.
Table 1: Interrupt categories and their behavior
Category | Cause | Asynchronous/Synchronous | Return behavior
Interrupt | Signal from an I/O device | Asynchronous | Always returns to the next instruction
Trap | Intentional exception | Synchronous | Always returns to the next instruction
Fault | Potentially recoverable error | Synchronous | Returns to the current instruction
Abort | Unrecoverable error | Synchronous | Does not return
Each interrupt or exception on the x86 architecture is identified by a unique number called a vector (an 8-bit unsigned integer, 0-255). The vectors of non-maskable interrupts and exceptions are fixed, while the vectors of maskable interrupts can be changed by programming the interrupt controller.
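A rough map of the 256 vectors helps tie this to the Linux code discussed later. The sketch below is simplified and hedged: it assumes the i386 conventions of the 2.6 era (vectors 0-31 reserved by Intel for exceptions, with the NMI on vector 2; 0x80 used by Linux for system calls; the rest available for maskable interrupts) and ignores the high vectors the local APIC reserves. FIRST_EXTERNAL_VECTOR and SYSCALL_VECTOR echo constants from the i386 headers; NMI_VECTOR is defined here only for illustration.

```c
#include <assert.h>

/* Simplified classification of an x86 vector on Linux 2.6/i386. */
enum vector_kind { VEC_EXCEPTION, VEC_NMI, VEC_SYSCALL, VEC_MASKABLE_IRQ };

#define NMI_VECTOR            2     /* fixed by the architecture */
#define FIRST_EXTERNAL_VECTOR 0x20  /* 32: first vector not reserved by Intel */
#define SYSCALL_VECTOR        0x80  /* 128: int $0x80 system call */

static enum vector_kind classify_vector(unsigned char vector)
{
    if (vector == NMI_VECTOR)
        return VEC_NMI;
    if (vector < FIRST_EXTERNAL_VECTOR)
        return VEC_EXCEPTION;      /* fixed: divide error, page fault, ... */
    if (vector == SYSCALL_VECTOR)
        return VEC_SYSCALL;        /* software interrupt, not a device IRQ */
    return VEC_MASKABLE_IRQ;       /* assigned by programming the controller */
}
```

This is also why the entry code described below treats vectors 32-255 specially and skips 128.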
How interrupt handling works in Linux 2.6
The Interrupt Descriptor Table (IDT) is a system table that associates each interrupt or exception vector with the entry address of the corresponding interrupt or exception handler. Before interrupts may be enabled, that is, during system initialization, the kernel must load the base address of the IDT into the idtr register and initialize every entry in the table.
In real mode, the IDT is initialized and used by the BIOS. Once Linux takes over, however, the IDT is moved to another area of RAM and initialized a second time, because Linux does not use any BIOS routines; it uses its own interrupt service routines (ISRs). Interrupt and exception handlers look much like regular C functions.
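Each IDT entry is an 8-byte gate descriptor that packs the handler's entry address together with a code segment selector and a type byte. The sketch below builds and decodes a 32-bit gate following the layout in Intel's manuals; it is an illustration, not Linux's own _set_gate code, and the selector and address used in the test are arbitrary values.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit x86 IDT gate descriptor layout:
 *   bits  0-15  handler offset, low half
 *   bits 16-31  code segment selector
 *   bits 32-39  reserved (zero)
 *   bits 40-47  type/attributes: present bit, DPL, gate type
 *   bits 48-63  handler offset, high half */
#define GATE_INTERRUPT_DPL0 0x8E /* present, DPL 0, 32-bit interrupt gate */
#define GATE_TRAP_DPL3      0xEF /* present, DPL 3, 32-bit trap gate */

static uint64_t make_gate(uint32_t handler, uint16_t selector, uint8_t type_attr)
{
    uint64_t lo = (uint64_t)(handler & 0xFFFFu) | ((uint64_t)selector << 16);
    uint64_t hi = ((uint64_t)type_attr << 8) | ((uint64_t)(handler >> 16) << 16);

    return lo | (hi << 32);
}

/* Recover the handler entry address stored in a gate. */
static uint32_t gate_handler(uint64_t gate)
{
    return (uint32_t)(gate & 0xFFFFu) | ((uint32_t)((gate >> 48) & 0xFFFFu) << 16);
}
```

An interrupt gate clears the interrupt flag on entry, while a trap gate does not; a DPL of 3 is what allows user mode to execute int $0x80 against the system-call entry.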
Three main data structures hold all the information related to an IRQ: hw_interrupt_type, irq_desc_t, and irqaction. Figure 3 shows how they relate to each other.
Figure 3: Relationships among the IRQ structures
On x86, the hw_interrupt_type structure is given different values for the two kinds of interrupt controllers, the 8259A and the I/O APIC, as shown in Table 2.
Table 2: hw_interrupt_type for the 8259A and the I/O APIC

8259A:

static struct hw_interrupt_type i8259A_irq_type = {
    "XT-PIC",
    startup_8259A_irq,
    shutdown_8259A_irq,
    enable_8259A_irq,
    disable_8259A_irq,
    mask_and_ack_8259A,
    end_8259A_irq,
    NULL
};

I/O APIC:

static struct hw_interrupt_type ioapic_edge_type = {
    .typename     = "IO-APIC-edge",
    .startup      = startup_edge_ioapic,
    .shutdown     = shutdown_edge_ioapic,
    .enable       = enable_edge_ioapic,
    .disable      = disable_edge_ioapic,
    .ack          = ack_edge_ioapic,
    .end          = end_edge_ioapic,
    .set_affinity = set_ioapic_affinity,
};

static struct hw_interrupt_type ioapic_level_type = {
    .typename     = "IO-APIC-level",
    .startup      = startup_level_ioapic,
    .shutdown     = shutdown_level_ioapic,
    .enable       = enable_level_ioapic,
    .disable      = disable_level_ioapic,
    .ack          = mask_and_ack_level_ioapic,
    .end          = end_level_ioapic,
    .set_affinity = set_ioapic_affinity,
};
During interrupt initialization, the handler member of each irq_desc_t structure is initialized with the appropriate hw_interrupt_type variable. Older systems use the cascaded 8259A, so handler is initialized with i8259A_irq_type; on SMP systems it is initialized with either ioapic_edge_type or ioapic_level_type.
Each peripheral's interrupt handler is registered with the Linux kernel either statically (as a global variable declared static) or dynamically (by calling request_irq()). Either way, an irqaction structure is declared or allocated, whose handler field points to the interrupt service routine, and setup_irq() is then called to associate the irq_desc_t with the irqaction.
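The association made by setup_irq() can be pictured with cut-down versions of the three structures. Everything below is a simplified, hypothetical model: the real definitions carry many more fields and take locks, the real function returns -EBUSY rather than -1, and the SA_SHIRQ value here is only illustrative. It does show the essential shape: irq_desc_t points to its controller operations and to a chain of irqaction entries, and a line can be shared only if every registrant asks for sharing.

```c
#include <assert.h>
#include <stddef.h>

#define SA_SHIRQ 0x04000000UL /* "this handler may share the line"; value illustrative */

struct hw_interrupt_type {                /* per-controller operations */
    const char *typename;
    void (*enable)(unsigned int irq);
    void (*disable)(unsigned int irq);
};

struct irqaction {                        /* one per registered handler */
    int (*handler)(int irq, void *dev_id);
    unsigned long flags;
    const char *name;
    void *dev_id;
    struct irqaction *next;               /* shared IRQs form a chain */
};

typedef struct {
    unsigned int status;
    struct hw_interrupt_type *handler;    /* controller ops for this line */
    struct irqaction *action;             /* head of the handler chain */
} irq_desc_t;

#define NR_IRQS 16
static irq_desc_t irq_desc[NR_IRQS];

/* Hypothetical sketch of the linking step performed by setup_irq(). */
static int setup_irq_sketch(unsigned int irq, struct irqaction *new)
{
    struct irqaction **p;

    if (irq >= NR_IRQS || new == NULL || new->handler == NULL)
        return -1;
    p = &irq_desc[irq].action;
    if (*p != NULL && !((*p)->flags & new->flags & SA_SHIRQ))
        return -1;                        /* line busy and not shareable */
    while (*p != NULL)
        p = &(*p)->next;                  /* walk to the tail of the chain */
    new->next = NULL;
    *p = new;
    return 0;
}

/* Example handler used only to demonstrate registration. */
static int demo_handler(int irq, void *dev_id)
{
    (void)irq;
    (void)dev_id;
    return 1;
}
```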
When an interrupt occurs, the entry address of the interrupt service routine is obtained from the interrupt descriptor table. For an interrupt vector i with 32 ≤ i ≤ 255 (i ≠ 128), the entry code executes push $i-256 followed by jmp common_interrupt. The do_IRQ() function is then called; it converts the vector into an index into the irq_desc[] array, obtains the action pointer, and calls the interrupt service routine pointed to by handler.
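The dispatch step can be modeled in a few lines. This is a hypothetical sketch, not the kernel's code: it assumes that the value pushed by the stub is made negative (n - 256) so it cannot be confused with a system call number in the saved orig_eax slot, and that do_IRQ() recovers the index from its low 8 bits before walking the irqaction chain, in the manner of handle_IRQ_event().

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down structures for the dispatch model. */
struct irqaction {
    int (*handler)(int irq, void *dev_id);
    void *dev_id;
    struct irqaction *next;
};

struct irq_desc { struct irqaction *action; };

#define NR_IRQS 256
static struct irq_desc irq_desc[NR_IRQS];

/* Value the entry stub would push for entry n (always negative). */
static long stub_push_value(int n)
{
    return (long)n - 256;
}

/* do_IRQ()-like recovery of the index, then a handle_IRQ_event()-like loop
 * that calls every handler registered on the line. */
static int do_irq_sketch(long orig_eax)
{
    unsigned int irq = (unsigned int)((unsigned long)orig_eax & 0xffUL);
    int handled = 0;

    for (struct irqaction *a = irq_desc[irq].action; a != NULL; a = a->next)
        handled |= a->handler((int)irq, a->dev_id);
    return handled;
}

/* Example handler: counts its invocations through dev_id. */
static int demo_handler(int irq, void *dev_id)
{
    (void)irq;
    ++*(int *)dev_id;
    return 1;
}
```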
From the description above, the whole interrupt handling flow is easy to see, as shown in Figure 4:
Figure 4: x86 interrupt handling flow
One of the authors of this article has carried out a scenario analysis of the interrupt system in kernel 2.6.10; interested readers can contact the author to obtain the related material.
Interrupt binding: interrupt affinity (IRQ affinity)
On SMP systems, one or more processes can be bound to run on one or more specific processors by setting their CPU affinity through a system call and a set of related macros. Interrupts are no different in this respect: interrupt affinity means binding one or more interrupt sources to run on specific CPUs. Interrupt affinity was originally designed and implemented by Ingo Molnar.
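For comparison with interrupt affinity, here is process CPU affinity in practice. The sketch assumes Linux with glibc, where sched_getaffinity(), sched_setaffinity(), and the CPU_* macros are available once _GNU_SOURCE is defined; it pins the calling process to the first CPU in its current allowed set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Pin the calling process to one CPU taken from its current affinity mask.
 * Returns the chosen CPU number, or -1 on failure. */
static int pin_to_first_allowed_cpu(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &set))
            continue;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);          /* mask with exactly one bit set */
        return sched_setaffinity(0, sizeof(set), &set) == 0 ? cpu : -1;
    }
    return -1;
}
```

The mask is a CPU bitmap in both cases; for interrupts it is written through /proc instead of passed to a system call.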
Under /proc/irq, every hardware device that has registered an interrupt handler has a directory named after its interrupt number (IRQ#). On SMP systems, the IRQ# directory contains a file named smp_affinity, a CPU bitmask that can be used to set the interrupt's affinity. The default value, 0xffffffff, means the interrupt may be sent to all CPUs for processing. If the interrupt controller does not support IRQ affinity, this default cannot be changed. The mask also cannot exclude every CPU, that is, it cannot be set to 0x0.
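The rules for smp_affinity values can be expressed as a small parser. This is a hypothetical sketch limited to 64 CPUs: it reads a hexadecimal mask in which bit n selects CPU n and, in the spirit of the kernel, refuses a mask that selects no online CPU. The online-CPU mask is passed in by the caller as an assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical parser for a value written to /proc/irq/<N>/smp_affinity. */
static int parse_affinity(const char *text, unsigned long long online,
                          unsigned long long *mask_out)
{
    char *end;
    unsigned long long mask;

    errno = 0;
    mask = strtoull(text, &end, 16);
    if (errno != 0 || end == text)
        return -EINVAL;               /* not a hexadecimal number */
    while (*end == ' ' || *end == '\n')
        end++;                        /* allow the newline echo appends */
    if (*end != '\0')
        return -EINVAL;               /* trailing garbage */
    if ((mask & online) == 0)
        return -EINVAL;               /* would turn off every CPU */
    *mask_out = mask;
    return 0;
}

/* Non-zero if the mask routes the interrupt to the given CPU. */
static int cpu_selected(unsigned long long mask, unsigned int cpu)
{
    return cpu < 64 && ((mask >> cpu) & 1ULL) != 0;
}
```

With 8 online CPUs, "0f" selects CPU0-3 and "f0" selects CPU4-7, which are exactly the two values used in the example that follows.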
As an example, take a network card (eth1, interrupt number 44) on a server with 8 CPUs and set the affinity of its interrupt (the data below comes from the kernel source file Documentation/IRQ-affinity.txt):
# cat smp_affinity
ffffffff
# echo 0f > smp_affinity
# cat smp_affinity
0000000f
# ping -f h
PING hell (195.4.7.3): data bytes
...
--- hell ping statistics ---
6029 packets transmitted, 6027 packets received, 0% packet loss
round-trip min/avg/max = 0.1/0.1/0.4 ms
# cat /proc/interrupts | grep 44:
 44:          0       1785       1785       1783       1783          1          1          0   IO-APIC-level  eth1
# echo f0 > smp_affinity
# ping -f h
PING hell (195.4.7.3): data bytes
.
--- hell ping statistics ---
2779 packets transmitted, 2777 packets received, 0% packet loss
round-trip min/avg/max = 0.1/0.5/585.4 ms
# cat /proc/interrupts | grep 44:
 44:       1068       1785       1785       1784       1784       1069       1070       1069   IO-APIC-level  eth1
#
In this example, we first allow the network card interrupt only on CPU0-3 and run ping; the network card interrupt is clearly not handled on CPU4-7. We then allow it only on CPU4-7, so that CPU0-3 no longer handle it; after running ping again, /proc/interrupts shows that the interrupt counts on CPU4-7 rise markedly while those on CPU0-3 barely change.
Before discussing how interrupt affinity is implemented, let us first look at the components of the I/O APIC.
The I/O APIC consists of a set of 24 IRQ lines, a 24-entry interrupt redirection table, programmable registers, and a message unit that sends and receives APIC messages over the APIC bus. Interrupt affinity is tied to the interrupt redirection table: each entry in the table can be programmed individually to specify the interrupt vector and priority, the destination processor, and how that processor is selected.
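Each redirection table entry is 64 bits wide. The sketch below packs the fields the text mentions (vector, how the processor is selected, destination) plus the trigger and mask bits, following the bit positions in Intel's 82093AA I/O APIC datasheet; the struct and helper names are hypothetical, and the polarity and read-only status bits are left at zero.

```c
#include <assert.h>
#include <stdint.h>

/* Delivery modes relevant here (bits 8-10 of an entry). */
enum delivery_mode { DM_FIXED = 0, DM_LOWEST_PRIORITY = 1 };

struct rte_fields {
    uint8_t vector;          /* bits 0-7: IDT vector to raise */
    enum delivery_mode mode; /* bits 8-10: how the target CPU is chosen */
    int logical_dest;        /* bit 11: 0 = physical, 1 = logical APIC ID */
    int level_triggered;     /* bit 15: 0 = edge, 1 = level */
    int masked;              /* bit 16: 1 = interrupt masked */
    uint8_t destination;     /* bits 56-63: APIC ID or logical CPU set */
};

static uint64_t pack_rte(const struct rte_fields *f)
{
    uint64_t e = f->vector;

    e |= (uint64_t)(f->mode & 0x7) << 8;
    e |= (uint64_t)(f->logical_dest ? 1 : 0) << 11;
    e |= (uint64_t)(f->level_triggered ? 1 : 0) << 15;
    e |= (uint64_t)(f->masked ? 1 : 0) << 16;
    e |= (uint64_t)f->destination << 56;
    return e;
}
```

Only the destination byte in the upper 32 bits needs to change when affinity is adjusted, which is what the affinity path described below relies on.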
Table 2 shows that the biggest difference between the 8259A and APIC interrupt controllers lies in the last member of the hw_interrupt_type variable: for the 8259A type, set_affinity is set to NULL, while for the SMP APIC types, set_affinity is assigned set_ioapic_affinity.
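That one NULL pointer is what makes affinity unchangeable on an 8259A system. The sketch below is a hypothetical model: a write is refused when the controller has no set_affinity operation (2.6 kernels return -EIO in that case, as far as we know), and otherwise forwarded to the controller, represented here by a plain array.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef unsigned long cpumask_sketch_t;   /* one bit per CPU */

struct hw_interrupt_type {
    const char *typename;
    void (*set_affinity)(unsigned int irq, cpumask_sketch_t mask);
};

static cpumask_sketch_t programmed_mask[256]; /* stands in for the hardware */

static void set_ioapic_affinity_sketch(unsigned int irq, cpumask_sketch_t mask)
{
    programmed_mask[irq] = mask;
}

static struct hw_interrupt_type xt_pic = { "XT-PIC", NULL };
static struct hw_interrupt_type io_apic_level = { "IO-APIC-level",
                                                  set_ioapic_affinity_sketch };

/* Model of the write path for /proc/irq/<N>/smp_affinity. */
static int write_affinity(struct hw_interrupt_type *chip, unsigned int irq,
                          cpumask_sketch_t mask)
{
    if (irq >= 256)
        return -EINVAL;
    if (chip->set_affinity == NULL)
        return -EIO;              /* controller cannot route per CPU */
    chip->set_affinity(irq, mask);
    return 0;
}
```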
During system initialization on SMP architectures, setup_IO_APIC_irqs() is called to initialize the I/O APIC chip and fill in the 24 entries of its interrupt redirection table. During system startup, every CPU executes setup_local_APIC() to initialize its local APIC. When an interrupt is triggered, the value in the corresponding redirection table entry is converted into a message, which is sent over the APIC bus to one or more local APIC units. The interrupt can thus be delivered immediately to a specific CPU, to a set of CPUs, or to all CPUs, which is how interrupt affinity is achieved.
When we write a CPU mask into the smp_affinity file with the echo command, the call path is:
write() -> sys_write() -> vfs_write() -> proc_file_write() -> irq_affinity_write_proc() -> set_affinity() -> set_ioapic_affinity() -> set_ioapic_affinity_irq() -> io_apic_write()
set_ioapic_affinity_irq() takes the interrupt number and the CPU mask as parameters and then calls io_apic_write() to modify the corresponding redirection table entry, completing the affinity setting. When ping triggers the network card interrupt, the multi-APIC system uses the value in the redirection table and its arbitration mechanism to select one CPU among CPU0-3 and sends the signal to that CPU's local APIC, which then interrupts its CPU. The event is not communicated to any other CPU.
Outlook on a new feature: interrupt threading (Interrupt Threads)
In the embedded field, industry demand for real-time Linux keeps growing, which makes reworking interrupt handling imperative. In Linux, interrupts have the highest priority: whenever an interrupt occurs, the kernel immediately executes the corresponding handler and returns to normal tasks only after all pending interrupts and softirqs have been processed, which can prevent real-time tasks from being handled in time. With interrupt threading, interrupts run as kernel threads and are given different real-time priorities, so real-time tasks can have a higher priority than interrupt threads. The highest-priority real-time tasks are then handled first, and real-time behavior is guaranteed even under heavy load.
Even the recent Linux 2.6.17 does not support interrupt threading. However, the real-time patch designed and implemented by Ingo Molnar does implement it. The latest version is available at:
http://people.redhat.com/~mingo/realtime-preempt/patch-2.6.17-rt9
The following is a brief analysis of interrupt threading.
In the initialization phase, threaded interrupts are initialized in essentially the same way as regular interrupts: start_kernel() calls trap_init() and init_IRQ() to initialize the irq_desc_t structures. The main difference lies in the init kernel thread, which calls init_hardirqs() (in kernel/irq/manage.c, with the patch mentioned above applied) to create a kernel thread for each IRQ. The highest real-time priority assigned is 50, decreasing for each successive IRQ thread down to 25, so the lowest real-time priority of any IRQ thread is 25.
void __init init_hardirqs(void)
{
    ...
    for (i = 0; i < NR_IRQS; i++) {
        irq_desc_t *desc = irq_desc + i;

        if (desc->action && !(desc->status & IRQ_NODELAY))
            desc->thread = kthread_create(do_irqd, desc, "IRQ%d", irq);
        ......
    }
}

static int do_irqd(void * __desc)
{
    ...
    /* scale IRQ thread priorities from prio 50 to prio 25 */
    param.sched_priority = curr_irq_prio;
    if (param.sched_priority > 25)
        curr_irq_prio = param.sched_priority - 1;
    ......
}
If IRQ_NODELAY is set in an interrupt's status field, that interrupt cannot be threaded.
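The priority scaling in do_irqd() is easy to simulate. The sketch below reproduces only the arithmetic described in the text (start at 50, drop by one per IRQ thread, never below 25); the macro and function names are hypothetical.

```c
#include <assert.h>

#define IRQ_PRIO_START 50   /* highest IRQ thread priority, per the text */
#define IRQ_PRIO_FLOOR 25   /* lowest IRQ thread priority, per the text */

static int curr_irq_prio = IRQ_PRIO_START;

/* Each new IRQ thread takes the current value; the counter then drops by
 * one until it reaches the floor. */
static int next_irq_thread_prio(void)
{
    int prio = curr_irq_prio;

    if (prio > IRQ_PRIO_FLOOR)
        curr_irq_prio = prio - 1;
    return prio;
}
```

The 26th IRQ thread and every one after it therefore share priority 25.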
In the interrupt-handling phase, threaded and non-threaded interrupts share the same entry: when an interrupt occurs, the CPU calls do_IRQ() to handle it, and do_IRQ() performs the necessary bookkeeping and then calls __do_IRQ(). The key difference lies in __do_IRQ(): it checks whether the interrupt has been threaded (if the IRQ_NODELAY flag is not set in the interrupt descriptor's status field, the interrupt is threaded), and for a non-threaded interrupt it calls handle_IRQ_event() directly.
fastcall notrace unsigned int __do_IRQ(unsigned int irq, struct pt_regs *regs)
{
    ...
    if (redirect_hardirq(desc))
        goto out_no_end;
    ......
    action_ret = handle_IRQ_event(irq, regs, action);
    ...
}

int redirect_hardirq(struct irq_desc *desc)
{
    ...
    if (!hardirq_preemption || (desc->status & IRQ_NODELAY) || !desc->thread)
        return 0;
    ......
    if (desc->thread && desc->thread->state != TASK_RUNNING)
        wake_up_process(desc->thread);
    ...
}
For a threaded interrupt, wake_up_process() wakes the interrupt-handling thread, which starts running and calls do_hardirq() to handle the interrupt. do_hardirq() checks whether there are interrupts waiting to be handled and, if so, calls handle_IRQ_event() to handle them; handle_IRQ_event() in turn calls the registered interrupt handlers directly.
So both threaded and non-threaded interrupts ultimately execute handle_IRQ_event() to invoke the corresponding handlers; the only difference is that threaded interrupt handlers run in a kernel thread.
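The branch described above can be traced with a small state model. This is a hypothetical sketch: the structures, the IRQ_NODELAY value, and the task states are invented for illustration, and "waking" a thread just flips a state field. It shows that both paths end in the same handle_IRQ_event() step and differ only in when and where it runs.

```c
#include <assert.h>
#include <stddef.h>

#define IRQ_NODELAY 0x1u                 /* illustrative value */

enum task_state { TASK_RUNNING, TASK_SLEEPING };

struct task { enum task_state state; };

struct irq_desc {
    unsigned int status;
    struct task *thread;                 /* NULL if no IRQ thread exists */
    int events_handled;                  /* counts handle_IRQ_event() runs */
};

static int hardirq_preemption = 1;       /* mirrors the switch tested above */

static void handle_irq_event_sketch(struct irq_desc *desc)
{
    desc->events_handled++;
}

static int redirect_hardirq_sketch(struct irq_desc *desc)
{
    if (!hardirq_preemption || (desc->status & IRQ_NODELAY) || !desc->thread)
        return 0;                        /* handle in interrupt context */
    if (desc->thread->state != TASK_RUNNING)
        desc->thread->state = TASK_RUNNING;  /* stands in for wake_up_process() */
    return 1;                            /* deferred to the IRQ thread */
}

/* __do_IRQ()-like entry: returns 1 if handling was deferred. */
static int do_irq_sketch(struct irq_desc *desc)
{
    if (redirect_hardirq_sketch(desc))
        return 1;
    handle_irq_event_sketch(desc);
    return 0;
}

/* One pass of the IRQ thread, in the role of do_hardirq(). */
static void irq_thread_run_once(struct irq_desc *desc)
{
    handle_irq_event_sketch(desc);
    desc->thread->state = TASK_SLEEPING;
}
```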
Not every interrupt can be threaded. The timer interrupt, for example, maintains the system time and timers; it is the heartbeat of the operating system, and if it were threaded it could be suspended, with potentially disastrous consequences, so it should not be threaded. An interrupt that must be handled in real time can declare itself non-threaded with SA_NODELAY, as the timer interrupt does:
static struct irqaction irq0 = { timer_interrupt, SA_INTERRUPT | SA_NODELAY, CPU_MASK_NONE, "timer", NULL, NULL };
The conversion from SA_NODELAY to IRQ_NODELAY is done in setup_irq().
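The translation is a single flag test at registration time. In this hypothetical sketch both flag values and structure names are made up; it only shows that SA_NODELAY, requested in irqaction.flags, ends up as the IRQ_NODELAY status bit that __do_IRQ() later checks.

```c
#include <assert.h>

#define SA_NODELAY  0x02000000UL  /* registration flag; value illustrative */
#define IRQ_NODELAY 0x00000040U   /* descriptor status bit; value illustrative */

struct irqaction_sketch { unsigned long flags; const char *name; };
struct irq_desc_sketch { unsigned int status; };

/* The part of setup_irq() that carries the request into the descriptor. */
static void setup_irq_flags_sketch(struct irq_desc_sketch *desc,
                                   const struct irqaction_sketch *new)
{
    if (new->flags & SA_NODELAY)
        desc->status |= IRQ_NODELAY;  /* never hand this IRQ to a thread */
}

static int irq_is_threaded(const struct irq_desc_sketch *desc)
{
    return !(desc->status & IRQ_NODELAY);
}
```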
Interrupt load balancing under the SMP architecture
Interrupt load balancing is implemented mainly in the file arch/i386/kernel/io_apic.c. If the CONFIG_IRQBALANCE option is selected when the kernel is compiled, interrupt load balancing for the SMP architecture is present in the kernel as a module.
late_initcall(balanced_irq_init);

#define late_initcall(fn) module_init(fn)    /* include/linux/init.h */

The balanced_irq_init() function creates a kernel thread that performs interrupt load balancing:

static int __init balanced_irq_init(void)
{
    ...
    printk(KERN_INFO "Starting balanced_irq\n");
    if (kernel_thread(balanced_irq, NULL, CLONE_KERNEL) >= 0)
        return 0;
    else
        printk(KERN_ERR "balanced_irq_init: failed to spawn balanced_irq");
    ......
}
The balanced_irq() function calls do_irq_balance() every 5*HZ jiffies (5 seconds) to migrate interrupts: interrupts on heavily loaded CPUs are moved to more idle CPUs for processing.
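The idea of moving interrupts from busy CPUs to idle ones can be illustrated with a greedy step. This is a deliberately simplified, hypothetical model, not the heuristic do_irq_balance() actually uses: it sums recent interrupt counts per CPU, then moves the busiest CPU's heaviest IRQ to the idlest CPU, but only if the move does not just swap which CPU is overloaded.

```c
#include <assert.h>

#define NCPUS 4
#define NIRQS 8

/* load[irq] = recent interrupt count; owner[irq] = CPU currently handling it.
 * Returns the IRQ that was moved, or -1 if nothing was worth moving. */
static int balance_once(const unsigned long load[NIRQS], int owner[NIRQS])
{
    unsigned long cpu_load[NCPUS] = { 0 };
    int busiest = 0, idlest = 0, victim = -1;

    for (int irq = 0; irq < NIRQS; irq++)
        cpu_load[owner[irq]] += load[irq];
    for (int cpu = 1; cpu < NCPUS; cpu++) {
        if (cpu_load[cpu] > cpu_load[busiest])
            busiest = cpu;
        if (cpu_load[cpu] < cpu_load[idlest])
            idlest = cpu;
    }
    for (int irq = 0; irq < NIRQS; irq++)
        if (owner[irq] == busiest && (victim < 0 || load[irq] > load[victim]))
            victim = irq;
    /* Moving only helps if the idlest CPU stays below the busiest one. */
    if (victim < 0 || busiest == idlest ||
        cpu_load[idlest] + load[victim] >= cpu_load[busiest])
        return -1;
    owner[victim] = idlest;
    return victim;
}
```

A periodic thread would call such a step on each interval, which matches the structure of balanced_irq() calling do_irq_balance().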
Summary
With interrupt affinity and interrupt threading implemented one after the other, the Linux kernel's SMP and real-time performance is becoming increasingly satisfactory, and there is every reason to believe that interrupt threading will be merged into the mainline kernel in the near future. The analysis of interrupt threading in this article is only a starting point, meant to keep readers from feeling lost when the new feature is released.
- Note 1: Polling is not useless; NAPI, for example, is a classic case of combining polling with interrupts.
Linux Kernel Interrupt Insider