Original website: http://www.ibm.com/developerworks/cn/linux/l-cn-linuxkernelint/index.html
Brief Introduction: This article analyzes the Linux interrupt system in depth, covering interrupt controllers, interrupt classification, interrupt affinity, interrupt threading, and interrupt migration on SMP. It first reviews how interrupts work, then discusses in detail how interrupt affinity is implemented, and finally compares the implementation of threaded and non-threaded interrupts.
Release Date: May 14, 2007
Level: Intermediate
What is an interrupt
The Linux kernel has to manage every hardware device connected to the computer, and to manage a device it first has to communicate with it. There are generally two ways to do this: with polling, the kernel periodically queries the state of the device and then handles whatever it finds; with interrupts, the hardware signals the kernel when it needs attention (the hardware, rather than the kernel, becomes the active party).
The first approach makes the kernel do a lot of unnecessary work: polling repeats periodically whether or not the device needs service, consumes a great deal of CPU time, and is therefore inefficient. For this reason the second approach is usually adopted (see Note 1).
Physically, an interrupt is an electrical signal generated by a hardware device and fed to an input pin of an interrupt controller (such as the 8259A), which in turn sends a corresponding signal to the processor. Once the processor detects this signal, it suspends what it is currently doing and handles the interrupt instead. The processor then notifies the operating system that an interrupt has occurred, so that the OS can handle it appropriately. Different devices use different interrupts, and each interrupt is identified by a unique number, usually called the interrupt request (IRQ) line.
APIC vs 8259A
An x86 CPU provides only two external pins for interrupts: NMI and INTR. NMI carries nonmaskable interrupts and is typically used for power failure and physical memory parity errors. INTR carries maskable interrupts, which can be disabled by setting the interrupt mask bit; it mainly receives interrupt signals from external hardware, which are passed to the CPU by the interrupt controller.
There are two common interrupt controllers:
1. Programmable Interrupt Controller 8259A
The traditional PIC (Programmable Interrupt Controller) consists of two 8259A-style external chips connected in a cascade. Each chip can handle up to 8 different IRQs. Because the INT output line of the slave PIC is connected to the IRQ2 pin of the master PIC, the number of available IRQ lines is 15, as shown in Figure 1.
Figure 1: 8259A cascade schematic
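The cascade arithmetic can be sketched in a few lines of C. This is only an illustration of the wiring described above, assuming the conventional layout in which IRQ 0 to 7 sit on the master and IRQ 8 to 15 on the slave; the names pic_pin, pic_route, pic_irq_is_usable and pic_usable_lines are made up for this sketch and do not come from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Where an IRQ line lands in a master/slave 8259A pair. */
struct pic_pin {
    bool on_slave;  /* false: master chip, true: slave chip */
    int  pin;       /* input pin 0..7 on that chip */
};

/* IRQ2 on the master is taken by the slave's INT output,
 * so no device can use it. */
bool pic_irq_is_usable(int irq)
{
    return irq >= 0 && irq <= 15 && irq != 2;
}

/* Map IRQ 0..15 to a chip and pin (assumed layout:
 * 0..7 on the master, 8..15 on the slave). */
struct pic_pin pic_route(int irq)
{
    assert(irq >= 0 && irq <= 15);
    struct pic_pin p = { irq >= 8, irq % 8 };
    return p;
}

/* Count the lines that devices can actually use. */
int pic_usable_lines(void)
{
    int n = 0;
    for (int irq = 0; irq <= 15; irq++)
        if (pic_irq_is_usable(irq))
            n++;
    return n;
}
```

Calling pic_usable_lines() counts 16 input pins minus the one consumed by the cascade, which is where the figure of 15 comes from.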
2. Advanced Programmable Interrupt Controller (APIC)
The 8259A is suitable only for single-CPU systems. To exploit the parallelism of an SMP architecture, it must be possible to deliver interrupts to every CPU in the system. For this reason Intel introduced a new component, the I/O Advanced Programmable Interrupt Controller (I/O APIC), to replace the old 8259A programmable interrupt controller. The architecture has two main parts. The first is the local APIC, which delivers interrupt signals to a specific processor; a machine with three processors, for example, must have three local APICs. The second is the I/O APIC, which collects interrupt signals from I/O devices and sends them to the local APICs when those devices need to interrupt. A system can contain at most 8 I/O APICs.
Each local APIC has 32-bit registers, an internal clock, a local timer, and two additional IRQ lines, LINT0 and LINT1, reserved for local interrupts. All local APICs are connected to the I/O APIC, forming a multilevel APIC system, as shown in Figure 2.
Figure 2: Multilevel I/O APIC system
Most uniprocessor systems now include an I/O APIC chip as well, which can be configured in one of two ways:
1) As a standard 8259A-style controller. The local APIC is disabled, the external I/O APIC is connected to the CPU, and the LINT0 and LINT1 pins are connected to the INTR and NMI pins respectively.
2) As a standard external I/O APIC. The local APIC is enabled, and all external interrupts are received through the I/O APIC.
To tell if a system is using I/O APIC, you can enter the following command at the command line:
    # cat /proc/interrupts
               CPU0
      0:      90504    IO-APIC-edge   timer
      1:        131    IO-APIC-edge   i8042
      8:          4    IO-APIC-edge   rtc
      9:          0    IO-APIC-level  acpi
       :        111    IO-APIC-edge   i8042
       :       1862    IO-APIC-edge   ide0
       :               IO-APIC-edge   ide1
    177:          9    IO-APIC-level  eth0
    185:          0    IO-APIC-level  via82cxxx
    ...
If IO-APIC appears in the output, your system is using the APIC. If you see XT-PIC instead, your system is using the 8259A chip.
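The same check can be automated. The sketch below classifies a buffer holding the text of /proc/interrupts by looking for the two controller names mentioned above; the enum and the function name detect_controller are invented for this example, and reading the file into the buffer is left to the caller.

```c
#include <assert.h>
#include <string.h>

enum irq_controller { CTRL_UNKNOWN, CTRL_8259A, CTRL_IO_APIC };

/* Classify the text of /proc/interrupts by the controller
 * name that appears in it. */
enum irq_controller detect_controller(const char *proc_interrupts)
{
    assert(proc_interrupts != NULL);
    if (strstr(proc_interrupts, "IO-APIC"))
        return CTRL_IO_APIC;
    if (strstr(proc_interrupts, "XT-PIC"))
        return CTRL_8259A;
    return CTRL_UNKNOWN;
}
```

A caller would typically fill the buffer with fopen() and fread() and then pass it to detect_controller(). The check is deliberately crude: it reports the I/O APIC as soon as any line mentions it.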
Interrupt classification
Interrupts can be divided into synchronous interrupts and asynchronous interrupts:
1. Synchronous interrupts are produced by the CPU control unit while it executes instructions. They are called synchronous because the CPU raises them only after an instruction has finished executing, never in the middle of one. A system call is an example.
2. Asynchronous interrupts are generated by other hardware devices at arbitrary times with respect to the CPU clock signal, which means they can arrive between any two instructions. A keyboard interrupt is an example.
According to Intel's documentation, synchronous interrupts are called exceptions, and asynchronous interrupts are called interrupts.
Interrupts can be further divided into maskable interrupts and nonmaskable interrupts. Exceptions fall into three classes: faults, traps, and aborts.
Broadly speaking, then, there are four categories: interrupts, faults, traps, and aborts. Table 1 summarizes the similarities and differences between them.
Table 1: Interrupt categories and their behavior
Category | Cause | Asynchronous/Synchronous | Return behavior
Interrupt | Signal from an I/O device | Asynchronous | Always returns to the next instruction
Trap | Intentional exception | Synchronous | Always returns to the next instruction
Fault | Potentially recoverable error | Synchronous | Returns to the current instruction
Abort | Nonrecoverable error | Synchronous | Does not return
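The return behavior listed in Table 1 can be written down as a small C function. This is a sketch for illustration only: the enum, the function names, and the idea of passing the instruction address and length as plain integers are assumptions of the example, not how the processor or the kernel represents them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum event_class { EV_INTERRUPT, EV_TRAP, EV_FAULT, EV_ABORT };

/* Only interrupts are asynchronous; the three exception
 * classes are synchronous. */
bool event_is_synchronous(enum event_class c)
{
    return c != EV_INTERRUPT;
}

/*
 * Compute where execution resumes once the handler returns.
 * insn_addr is the address of the instruction associated with the
 * event and insn_len its length in bytes.
 * Returns false for aborts, which never return.
 */
bool resume_address(enum event_class c, uint32_t insn_addr,
                    uint32_t insn_len, uint32_t *resume)
{
    assert(resume != NULL);
    switch (c) {
    case EV_INTERRUPT:
    case EV_TRAP:
        *resume = insn_addr + insn_len;  /* next instruction */
        return true;
    case EV_FAULT:
        *resume = insn_addr;             /* retry the same instruction */
        return true;
    case EV_ABORT:
        return false;                    /* no return at all */
    }
    return false;
}
```

The fault case is the one that makes demand paging possible: after the handler repairs the cause, the same instruction is executed again.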
Every interrupt and exception on the x86 architecture is identified by a unique number, called its vector (an 8-bit unsigned integer). The vectors of nonmaskable interrupts and exceptions are fixed, while the vectors of maskable interrupts can be changed by programming the interrupt controller.
Introduction to interrupt handling in Linux 2.6
The Interrupt Descriptor Table (IDT) is a system table that associates each interrupt or exception vector with the entry address of the corresponding interrupt or exception handler. Before enabling interrupts, that is, during system initialization, the kernel must load the base address of the IDT into the idtr register and initialize every entry in the table.
In real mode, the IDT is initialized and used by the BIOS. Once Linux takes over, the IDT is moved to another area of RAM and initialized a second time, because Linux does not use any BIOS routines and must install its own interrupt service routines (ISRs). Interrupt and exception handlers look much like ordinary C functions.
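To make the idea of a vector-indexed table concrete, here is a minimal user-space sketch of a 256-entry IDT that stores and retrieves handler addresses. The 8-byte gate layout follows the 32-bit protected-mode descriptor format as I understand it from the Intel manuals; the names idt_gate, idt_set_gate and idt_handler_address are hypothetical, and loading idtr (the lidt instruction) is not modeled at all.

```c
#include <assert.h>
#include <stdint.h>

#define IDT_ENTRIES       256
#define GATE_INTERRUPT_32 0x8E  /* present, DPL 0, 32-bit interrupt gate */

/* One 8-byte gate descriptor (32-bit protected mode layout). */
struct idt_gate {
    uint16_t offset_low;   /* handler address, bits 15..0 */
    uint16_t selector;     /* code segment selector */
    uint8_t  zero;         /* reserved, must be 0 */
    uint8_t  type_attr;    /* present bit, DPL, gate type */
    uint16_t offset_high;  /* handler address, bits 31..16 */
};

static struct idt_gate idt[IDT_ENTRIES];

/* Install a handler address for one vector. */
void idt_set_gate(unsigned vector, uint32_t handler,
                  uint16_t selector, uint8_t type_attr)
{
    assert(vector < IDT_ENTRIES);
    struct idt_gate *g = &idt[vector];
    g->offset_low  = (uint16_t)(handler & 0xffffu);
    g->selector    = selector;
    g->zero        = 0;
    g->type_attr   = type_attr;
    g->offset_high = (uint16_t)(handler >> 16);
}

/* Reassemble the handler address stored for a vector. */
uint32_t idt_handler_address(unsigned vector)
{
    assert(vector < IDT_ENTRIES);
    return ((uint32_t)idt[vector].offset_high << 16)
           | idt[vector].offset_low;
}
```

For example, idt_set_gate(32, 0xC0105A3C, 0x60, GATE_INTERRUPT_32) followed by idt_handler_address(32) returns the same address; both the address and the selector value here are made-up illustration values.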
Three main data structures hold all the information related to an IRQ: hw_interrupt_type, irq_desc_t, and irqaction. Figure 3 shows how they relate to each other.
Figure 3: Relationship between the IRQ data structures
On x86, the hw_interrupt_type structure is initialized with different values for the two kinds of interrupt controller, the 8259A and the I/O APIC. The specific differences are shown in Table 2.
Table 2: hw_interrupt_type definitions for the 8259A and the I/O APIC
8259A:
    static struct hw_interrupt_type i8259A_irq_type = {
        "XT-PIC",
        startup_8259A_irq,
        shutdown_8259A_irq,
        enable_8259A_irq,
        disable_8259A_irq,
        mask_and_ack_8259A,
        end_8259A_irq,
        NULL
    };
I/O APIC:
    static struct hw_interrupt_type ioapic_edge_type = {
        .typename     = "IO-APIC-edge",
        .startup      = startup_edge_ioapic,
        .shutdown     = shutdown_edge_ioapic,
        .enable       = enable_edge_ioapic,
        .disable      = disable_edge_ioapic,
        .ack          = ack_edge_ioapic,
        .end          = end_edge_ioapic,
        .set_affinity = set_ioapic_affinity,
    };
    static struct hw_interrupt_type ioapic_level_type = {
        .typename     = "IO-APIC-level",
        .startup      = startup_level_ioapic,
        .shutdown     = shutdown_level_ioapic,
        .enable       = enable_level_ioapic,
        .disable      = disable_level_ioapic,
        .ack          = mask_and_ack_level_ioapic,
        .end          = end_level_ioapic,
        .set_affinity = set_ioapic_affinity,
    };
During interrupt initialization, a variable of type hw_interrupt_type is used to initialize the handler member of the irq_desc_t structure. Earlier systems used cascaded 8259A chips, so initialization is done with i8259A_irq_type; on SMP systems, the handler member is initialized with ioapic_edge_type or ioapic_level_type.
For each peripheral, the interrupt handler is registered with the Linux kernel either statically (declared as a static global variable) or dynamically (by calling request_irq()). Either way, an irqaction structure is declared or allocated, with its handler member pointing to the interrupt service routine, and setup_irq() is then called to link the irqaction to the irq_desc_t.
When an interrupt occurs, the CPU looks up the entry address of the interrupt service code in the IDT. For each interrupt vector i with 32 ≤ i ≤ 255 (i ≠ 128), that entry code executes the instructions pushl $i-256 and jmp common_interrupt. Control then reaches do_IRQ(), which indexes irq_desc[] with the interrupt number, follows the action pointer, and calls the interrupt service routine that handler points to.
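The registration and dispatch steps can be imitated in user space. The sketch below is a simplified model, not kernel code: every name starting with sim_ is invented, locking and controller acknowledgement are omitted, and it assumes that the number is recovered from the pushed value by masking the low 8 bits, which is one straightforward way to undo the subtraction of 256.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_NR_IRQS 256

/* Simplified stand-in for struct irqaction. */
struct sim_irqaction {
    int (*handler)(int irq, void *dev_id);
    void *dev_id;
    struct sim_irqaction *next;
};

/* Simplified stand-in for irq_desc_t: only the action list. */
struct sim_irq_desc {
    struct sim_irqaction *action;
};

static struct sim_irq_desc sim_desc_table[SIM_NR_IRQS];

/* Like setup_irq(): append the action to the list for this IRQ. */
void sim_setup_irq(unsigned irq, struct sim_irqaction *new_action)
{
    assert(irq < SIM_NR_IRQS && new_action != NULL);
    struct sim_irqaction **p = &sim_desc_table[irq].action;
    while (*p)
        p = &(*p)->next;
    new_action->next = NULL;
    *p = new_action;
}

/* Value pushed by the entry stub: always negative. */
int32_t sim_stub_push(unsigned i)
{
    assert(i < 256);
    return (int32_t)i - 256;
}

/* Like do_IRQ(): recover the number (assumed: low 8 bits) and run
 * every registered handler. Returns how many handlers claimed it. */
int sim_do_irq(int32_t pushed)
{
    unsigned irq = (uint32_t)pushed & 0xffu;
    int handled = 0;
    for (struct sim_irqaction *a = sim_desc_table[irq].action; a; a = a->next)
        handled += a->handler((int)irq, a->dev_id);
    return handled;
}

/* Example handler: counts invocations in the int that dev_id points to. */
int sim_count_handler(int irq, void *dev_id)
{
    (void)irq;
    (*(int *)dev_id)++;
    return 1;
}
```

Registering two sim_irqaction objects on the same number with sim_setup_irq() and then calling sim_do_irq(sim_stub_push(n)) runs both handlers in registration order, which mirrors how a shared IRQ line is served.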
From the above description, it is not difficult to see the entire interrupt process, as shown in Figure 4:
Figure 4: Interrupt flow on x86
One of the authors of this article has written a scenario-based analysis of the 2.6.10 interrupt system; interested readers can contact the author for the material.
Interrupt binding: interrupt affinity (IRQ affinity)
On SMP systems, CPU affinity can be set through system calls and a set of related macros, binding one or more processes to one or more processors. Interrupts have the same property: interrupt affinity means binding one or more interrupt sources to specific CPUs. Interrupt affinity was originally designed and implemented by Ingo Molnar.
Under /proc/irq there is a directory, named after the IRQ number, for every hardware device that has registered an interrupt handler. Each of these directories contains a file called smp_affinity (present only on SMP systems). The file holds a CPU bitmask that sets the affinity of the interrupt. The default value, 0xffffffff, means the interrupt may be delivered to any CPU. If the interrupt controller does not support IRQ affinity, the default cannot be changed. Nor can every CPU be masked out: the value 0x0 is not allowed.
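The bitmask semantics are easy to get wrong by one bit, so here is a small user-space sketch that parses a hexadecimal mask the way a script might before writing it to smp_affinity, and rejects the all-zero mask mentioned above. The function names are invented for this example, and the parsing rules (32-bit mask, optional trailing newline) are assumptions of the sketch rather than a description of the kernel's parser.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a hexadecimal CPU mask such as "0f" or "f0\n".
 * Returns 0 on success, -1 for malformed input or an empty mask. */
int parse_affinity_mask(const char *text, uint32_t *mask)
{
    assert(text != NULL && mask != NULL);
    if (!isxdigit((unsigned char)text[0]))
        return -1;                      /* no sign or leading blanks */
    char *end;
    errno = 0;
    unsigned long value = strtoul(text, &end, 16);
    if (errno != 0 || value > 0xffffffffUL)
        return -1;                      /* does not fit in 32 bits */
    while (*end == '\n' || *end == ' ')
        end++;
    if (*end != '\0')
        return -1;                      /* trailing garbage */
    if (value == 0)
        return -1;                      /* masking out every CPU is not allowed */
    *mask = (uint32_t)value;
    return 0;
}

/* Nonzero if the given CPU may receive the interrupt under this mask. */
int cpu_in_mask(uint32_t mask, unsigned cpu)
{
    return cpu < 32 && ((mask >> cpu) & 1u) != 0;
}
```

With the mask 0f, cpu_in_mask() is true for CPU0 to CPU3 and false for CPU4 to CPU7; with f0 it is the other way round.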
As an example, we take a network card (eth1, interrupt number 44) on a server with 8 CPUs and set the affinity of its interrupt (the data below comes from Documentation/IRQ-affinity.txt in the kernel source):
    [root@moon 44]# cat smp_affinity
    ffffffff
    [root@moon 44]# echo 0f > smp_affinity
    [root@moon 44]# cat smp_affinity
    0000000f
    [root@moon 44]# ping -f h
    PING hell (195.4.7.3): data bytes
    ...
    --- hell ping statistics ---
    6029 packets transmitted, 6027 packets received, 0% packet loss
    round-trip min/avg/max = 0.1/0.1/0.4 ms
    [root@moon 44]# cat /proc/interrupts | grep 44:
     44:  0  1785  1785  1783  1783  1  1  0  IO-APIC-level  eth1
    [root@moon 44]# echo f0 > smp_affinity
    [root@moon 44]# ping -f h
    PING hell (195.4.7.3): data bytes
    .
    --- hell ping statistics ---
    2779 packets transmitted, 2777 packets received, 0% packet loss
    round-trip min/avg/max = 0.1/0.5/585.4 ms
    [root@moon 44]# cat /proc/interrupts | grep 44:
     44:  1068  1785  1785  1784  1784  1069  1070  1069  IO-APIC-level  eth1
    [root@moon 44]#
In the example above, we first allow the network card interrupt only on CPU0 to CPU3 and then run ping. It is easy to see that the network card interrupt is not handled on CPU4 to CPU7. We then allow the interrupt only on CPU4 to CPU7, so that CPU0 to CPU3 handle no network card interrupts. After running ping again and looking at /proc/interrupts, the interrupt counts on CPU4 to CPU7 have increased markedly, while the counts on CPU0 to CPU3 have barely changed.
Before exploring how interrupt affinity is implemented, let us look at the makeup of the I/O APIC.
The I/O APIC consists of a set of 24 IRQ lines, a 24-entry Interrupt Redirection Table, programmable registers, and a message unit that sends and receives APIC messages over the APIC bus. The Interrupt Redirection Table is the part most closely tied to interrupt affinity: each entry can be programmed individually to specify the interrupt vector and priority, the destination processor, and the way the processor is selected.
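A redirection table entry can be pictured as a 64-bit word with a few packed fields. The sketch below packs and unpacks such a word. The bit positions follow my reading of the Intel 82093AA I/O APIC datasheet and should be checked against it before being relied on; the rte_ names are invented for this example, and several fields (delivery status, pin polarity, remote IRR) are left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout of one 64-bit redirection table entry. */
#define RTE_DELIVERY_SHIFT   8            /* bits 8..10: delivery mode */
#define RTE_DESTMODE_LOGICAL (1ull << 11) /* 0: physical, 1: logical */
#define RTE_TRIGGER_LEVEL    (1ull << 15) /* 0: edge, 1: level */
#define RTE_MASKED           (1ull << 16) /* 1: interrupt masked */
#define RTE_DEST_SHIFT       56           /* bits 56..63: destination */

enum rte_delivery { RTE_FIXED = 0, RTE_LOWEST_PRIORITY = 1 };

/* Build an entry from its fields. */
uint64_t rte_pack(uint8_t vector, enum rte_delivery mode, int logical,
                  int level, int masked, uint8_t dest)
{
    assert(mode == RTE_FIXED || mode == RTE_LOWEST_PRIORITY);
    uint64_t e = vector;                       /* bits 0..7: vector */
    e |= (uint64_t)mode << RTE_DELIVERY_SHIFT;
    if (logical)
        e |= RTE_DESTMODE_LOGICAL;
    if (level)
        e |= RTE_TRIGGER_LEVEL;
    if (masked)
        e |= RTE_MASKED;
    e |= (uint64_t)dest << RTE_DEST_SHIFT;
    return e;
}

/* Extract the interrupt vector. */
uint8_t rte_vector(uint64_t e)
{
    return (uint8_t)(e & 0xffu);
}

/* Extract the destination field (a CPU bitmask in logical mode). */
uint8_t rte_dest(uint64_t e)
{
    return (uint8_t)(e >> RTE_DEST_SHIFT);
}
```

In logical destination mode the destination field acts as a set of CPUs, which is why changing those 8 bits is enough to change the affinity of an interrupt.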
Table 2 shows that the biggest difference between the 8259A and APIC interrupt controllers is the last member of the hw_interrupt_type variable. For the 8259A, set_affinity is NULL; for the APIC on SMP, set_affinity is set to set_ioapic_affinity.
During system initialization on SMP, setup_IO_APIC_irqs() is called to initialize the I/O APIC chip and fill in the 24 entries of its Interrupt Redirection Table. During startup, every CPU also executes setup_local_APIC() to initialize its local APIC. When an interrupt is raised, the value in the corresponding redirection table entry is converted into a message, which is sent over the APIC bus to one or more local APICs. In this way an interrupt can be delivered immediately to one particular CPU, to a group of CPUs, or to all CPUs, which is how interrupt affinity is implemented.
When we write a CPU mask to the smp_affinity file with the echo command, the call chain is: write() -> sys_write() -> vfs_write() -> proc_file_write() -> irq_affinity_write_proc() -> set_affinity() -> set_ioapic_affinity() -> set_ioapic_affinity_irq() -> io_apic_write(). set_ioapic_affinity_irq() takes the interrupt number and the CPU mask as arguments and calls io_apic_write() to modify the corresponding Interrupt Redirection Table entry, which completes the affinity setting. When ping runs, the network card raises an interrupt. Based on the value in the redirection table and on its arbitration mechanism, the APIC system selects one CPU from CPU0 to CPU3 and passes the signal to that CPU's local APIC, which then interrupts its CPU. None of the other CPUs is notified of the event.
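The tail end of that call chain can be modeled in a few lines. In this sketch a controller is a small struct with a set_affinity function pointer, as in Table 2: the 8259A leaves it NULL, so a write is refused, while the I/O APIC version rewrites the destination field of a simulated redirection entry. All sim_ names are hypothetical, the 56-bit shift is the assumed position of the destination field, and the "arbitration" is replaced by simply picking the lowest-numbered allowed CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_PINS       24
#define SIM_DEST_SHIFT 56   /* assumed position of the destination field */

/* Stand-in for hw_interrupt_type: only the member that matters here. */
struct sim_controller {
    const char *name;
    int (*set_affinity)(unsigned irq, uint8_t cpu_mask); /* NULL: unsupported */
};

static uint64_t sim_redir_table[SIM_PINS];

/* Rewrite only the destination bits of one redirection entry. */
static int sim_set_ioapic_affinity(unsigned irq, uint8_t cpu_mask)
{
    if (irq >= SIM_PINS)
        return -1;
    sim_redir_table[irq] &= ~(0xffull << SIM_DEST_SHIFT);
    sim_redir_table[irq] |= (uint64_t)cpu_mask << SIM_DEST_SHIFT;
    return 0;
}

const struct sim_controller sim_8259a  = { "XT-PIC", NULL };
const struct sim_controller sim_ioapic = { "IO-APIC-level",
                                           sim_set_ioapic_affinity };

/* Roughly what a write to smp_affinity has to decide. */
int sim_write_affinity(const struct sim_controller *c, unsigned irq,
                       uint8_t cpu_mask)
{
    assert(c != NULL);
    if (c->set_affinity == NULL)
        return -1;              /* controller cannot redirect interrupts */
    if (cpu_mask == 0)
        return -1;              /* at least one CPU must remain */
    return c->set_affinity(irq, cpu_mask);
}

/* Read back the destination field of an entry. */
uint8_t sim_read_affinity(unsigned irq)
{
    assert(irq < SIM_PINS);
    return (uint8_t)(sim_redir_table[irq] >> SIM_DEST_SHIFT);
}

/* Placeholder for hardware arbitration: lowest allowed CPU, or -1. */
int sim_pick_cpu(uint8_t cpu_mask)
{
    for (int cpu = 0; cpu < 8; cpu++)
        if (cpu_mask & (1u << cpu))
            return cpu;
    return -1;
}
```

Writing the mask 0x0f through sim_ioapic and then calling sim_pick_cpu() on the stored value always yields a CPU between 0 and 3; the same write through sim_8259a fails, mirroring the NULL set_affinity member.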
Outlook on a new feature: interrupt threading (Interrupt Threads)
In the embedded field, demand for real-time Linux keeps growing, which makes reworking interrupt handling imperative. In Linux, interrupts have the highest priority. Whenever an interrupt occurs, the kernel immediately runs the corresponding interrupt handler and returns to normal tasks only after all pending interrupts and softirqs have been processed, so real-time tasks may not be handled in time. With interrupt threading, interrupts run as kernel threads and are assigned different real-time priorities, and a real-time task can be given a higher priority than the interrupt threads. The highest-priority real-time task is then served first, so real-time behavior is preserved even under heavy load.
At present, the recent Linux 2.6.17 kernel does not yet support interrupt threading. However, the real-time patch designed and implemented by Ingo Molnar does implement it. The latest version can be downloaded from:
http://people.redhat.com/~mingo/realtime-preempt/patch-2.6.17-rt9
The following is a brief analysis of how interrupt threading works.
In the initialization phase, threaded interrupts are set up in essentially the same way as conventional interrupts: start_kernel() calls trap_init() and init_IRQ() to initialize the irq_desc_t structures. The main difference is that, when the kernel sets up the init thread, the threaded version also calls init_hardirqs() (in kernel/irq/manage.c, added by the patch mentioned above) from the init() function to create a kernel thread for each IRQ. The first thread gets real-time priority 50, the next 49, and so on down to 25, so every IRQ thread has a real-time priority of at least 25.
    void __init init_hardirqs(void)
    {
        ...
        for (i = 0; i < NR_IRQS; i++) {
            irq_desc_t *desc = irq_desc + i;
            if (desc->action && !(desc->status & IRQ_NODELAY))
                desc->thread = kthread_create(do_irqd, desc, "IRQ%d", irq);
            ...
        }
    }
    static int do_irqd(void * __desc)
    {
        ...
        /*
         * Scale IRQ thread priorities from prio 50 to prio 25
         */
        param.sched_priority = curr_irq_prio;
        if (param.sched_priority > 25)
            curr_irq_prio = param.sched_priority - 1;
        ...
    }
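The priority scaling described above (50 for the first IRQ thread, counting down to a floor of 25) can be tried out in isolation. The function below is a user-space sketch of just that rule; the name next_irq_thread_prio and the two constants are invented here, and nothing about scheduling itself is modeled.

```c
#include <assert.h>

#define IRQ_PRIO_MAX 50   /* priority given to the first IRQ thread */
#define IRQ_PRIO_MIN 25   /* floor shared by all later IRQ threads */

/* Priority that the next IRQ thread to start will receive. */
static int curr_irq_prio = IRQ_PRIO_MAX;

/* Hand out one priority and step the counter down,
 * but never below IRQ_PRIO_MIN. */
int next_irq_thread_prio(void)
{
    int prio = curr_irq_prio;
    if (prio > IRQ_PRIO_MIN)
        curr_irq_prio = prio - 1;
    assert(prio >= IRQ_PRIO_MIN && prio <= IRQ_PRIO_MAX);
    return prio;
}
```

Successive calls return 50, 49, 48 and so on; once 25 has been reached, every further call returns 25, so a machine with many IRQ threads ends up with several of them sharing the lowest priority.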
If the IRQ_NODELAY bit is set in the status field of an interrupt's descriptor, that interrupt cannot be threaded.
In the interrupt handling phase, the two variants have the following in common: when an interrupt occurs, the CPU calls do_IRQ() to handle it, and do_IRQ() calls __do_IRQ() after some necessary preparation. The biggest difference lies inside __do_IRQ(), which checks whether the interrupt has been threaded (an interrupt is threaded if the status field of its descriptor does not contain the IRQ_NODELAY flag). For interrupts that are not threaded, handle_IRQ_event() is called directly.
    fastcall notrace unsigned int __do_IRQ(unsigned int irq, struct pt_regs *regs)
    {
        ...
        if (redirect_hardirq(desc))
            goto out_no_end;
        ...
        action_ret = handle_IRQ_event(irq, regs, action);
        ...
    }
    int redirect_hardirq(struct irq_desc *desc)
    {
        ...
        if (!hardirq_preemption || (desc->status & IRQ_NODELAY) || !desc->thread)
            return 0;
        ...
        if (desc->thread && desc->thread->state != TASK_RUNNING)
            wake_up_process(desc->thread);
        ...
    }
For an interrupt that has been threaded, wake_up_process() wakes the interrupt handling thread. Once it runs, the kernel thread calls do_hardirq(), which checks whether an interrupt is pending and, if so, calls handle_IRQ_event() to process it. handle_IRQ_event() calls the appropriate interrupt handler directly to complete the processing.
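The decision made on the hard interrupt path can be condensed into one predicate. The following user-space sketch mirrors the three conditions shown in redirect_hardirq(); the sim_ types, the flag value, and the way a wakeup is recorded are all stand-ins chosen for this example and do not reflect the real task or descriptor structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_IRQ_NODELAY 0x1u   /* hypothetical bit value for this sketch */

enum sim_task_state { SIM_TASK_RUNNING, SIM_TASK_SLEEPING };

/* Stand-in for the IRQ thread's task structure. */
struct sim_thread {
    enum sim_task_state state;
    int wakeups;               /* how many times it was woken */
};

/* Stand-in for the interrupt descriptor. */
struct sim_desc {
    unsigned status;
    struct sim_thread *thread; /* NULL if no IRQ thread exists */
};

/*
 * Returns true when the interrupt is handed to its IRQ thread,
 * false when the caller must run the handler directly.
 */
bool sim_redirect_hardirq(bool hardirq_preemption, struct sim_desc *desc)
{
    assert(desc != NULL);
    if (!hardirq_preemption || (desc->status & SIM_IRQ_NODELAY) || !desc->thread)
        return false;
    if (desc->thread->state != SIM_TASK_RUNNING) {
        /* stand-in for wake_up_process() */
        desc->thread->state = SIM_TASK_RUNNING;
        desc->thread->wakeups++;
    }
    return true;
}
```

A descriptor with a sleeping thread and no SIM_IRQ_NODELAY bit is redirected and its thread is woken once; setting the bit, removing the thread, or switching hardirq_preemption off each sends the interrupt down the direct path instead.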
Clearly, whether an interrupt is threaded or not, handle_IRQ_event() is eventually executed to call the appropriate interrupt handler; the only difference is that the handler of a threaded interrupt runs in a kernel thread.
Not every interrupt can be threaded. The clock interrupt, for example, maintains the system time and the timers, and the timers are the pulse of the operating system. If it were threaded it could be suspended, with disastrous consequences, so it must not be threaded. An interrupt that has to be handled in real time can declare itself non-threaded with the SA_NODELAY flag, as the clock interrupt does:
    static struct irqaction irq0 = {
        timer_interrupt, SA_INTERRUPT | SA_NODELAY, CPU_MASK_NONE, "timer", NULL, NULL
    };
The conversion between SA_NODELAY and IRQ_NODELAY is done in the setup_irq() function.
Interrupt load balancing on SMP
The implementation of interrupt load balancing lives mainly in arch/i386/kernel/io_apic.c. If the CONFIG_IRQBALANCE option is enabled when the kernel is compiled, interrupt load balancing for SMP is included in the kernel in the form of a module.
    late_initcall(balanced_irq_init);
    #define late_initcall(fn) module_init(fn)    // include/linux/init.h
In balanced_irq_init(), a kernel thread is created to carry out interrupt load balancing:
    static int __init balanced_irq_init(void)
    {
        ...
        printk(KERN_INFO "Starting balanced_irq\n");
        if (kernel_thread(balanced_irq, NULL, CLONE_KERNEL) >= 0)
            return 0;
        else
            printk(KERN_ERR "balanced_irq_init: failed to spawn balanced_irq");
        ...
    }
In balanced_irq(), do_irq_balance() is called once every 5*HZ jiffies (that is, every 5 seconds) to perform one round of interrupt migration, moving interrupts from heavily loaded CPUs to more idle ones.
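The idea of one migration round can be shown with a deliberately simplified model. The sketch below is not the algorithm used by do_irq_balance(); it is a stand-in that sums a load figure per CPU, finds the busiest and the idlest CPU, and moves the heaviest interrupt whose move still narrows the gap between them. The struct, the function name, and the load metric are all assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_MAX_CPUS 32

/* One interrupt source: the CPU it is bound to and its recent load. */
struct sim_irq {
    size_t cpu;
    unsigned long load;
};

/*
 * One balancing round. Moves at most one interrupt from the busiest
 * CPU to the idlest one. Returns the index of the interrupt that was
 * moved, or -1 if no move would reduce the imbalance.
 */
int sim_irq_balance_once(struct sim_irq *irqs, size_t nirqs, size_t ncpus)
{
    assert(irqs != NULL && ncpus > 0 && ncpus <= SIM_MAX_CPUS);
    unsigned long cpu_load[SIM_MAX_CPUS] = { 0 };
    for (size_t i = 0; i < nirqs; i++) {
        assert(irqs[i].cpu < ncpus);
        cpu_load[irqs[i].cpu] += irqs[i].load;
    }

    size_t busiest = 0, idlest = 0;
    for (size_t c = 1; c < ncpus; c++) {
        if (cpu_load[c] > cpu_load[busiest])
            busiest = c;
        if (cpu_load[c] < cpu_load[idlest])
            idlest = c;
    }

    /* Moving load L helps only if L is smaller than the gap. */
    unsigned long gap = cpu_load[busiest] - cpu_load[idlest];
    int best = -1;
    for (size_t i = 0; i < nirqs; i++) {
        if (irqs[i].cpu != busiest || irqs[i].load == 0 || irqs[i].load >= gap)
            continue;
        if (best < 0 || irqs[i].load > irqs[best].load)
            best = (int)i;
    }
    if (best >= 0)
        irqs[best].cpu = idlest;
    return best;
}
```

In the kernel the round is repeated periodically; here a caller would simply invoke sim_irq_balance_once() in a loop until it returns -1, at which point no single move improves the balance any further.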
Summary
With interrupt affinity already implemented and interrupt threading on the way, the Linux kernel is becoming increasingly satisfying in terms of SMP support and real-time performance, and there is every reason to believe that interrupt threading will be merged into the mainline kernel in the near future. The analysis of interrupt threading in this article is only a starting point, intended to keep readers from being caught off guard when the new feature is released.
Note 1: Polling is not useless either. NAPI, for example, is a classic case of combining polling with interrupts.