Understanding Linux Network Internals: interrupts and network drivers


Notifying the driver when a frame is received
In a network environment, when a device (network card) receives a data frame, it must notify the driver so that the frame can be processed. There are several notification mechanisms:
Polling: The kernel repeatedly checks whether the device has anything to report. (This wastes resources, but in some cases it is the best method.)
Interrupts: When a particular event occurs, the device raises a hardware interrupt; the kernel suspends its other activity and runs the handler that the device driver registered. Most network drivers use interrupts.
Processing multiple frames per interrupt: Once the interrupt has been signaled, the driver's handler runs and keeps receiving frames until a specified number of frames has been queued, until the device's queue is empty, or until a specified time has elapsed.
Timer-driven interrupts: The driver instructs the device to generate interrupts at regular intervals (here the driver takes the initiative rather than the device, unlike the previous mechanisms). The handler then processes the frames that have arrived since its last run. This mechanism adds latency to frame processing: for example, with an interval of 100 ms a frame may have arrived 0 ms, 50 ms, or 100 ms before the interrupt, so the average delay is 50 ms.
Combinations: Use interrupts under low traffic load and timer-driven interrupts under high traffic load.
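The "multiple frames per interrupt" mechanism can be illustrated with a small user-space sketch. This is not driver code: `struct rx_ring`, `rx_ring_put`, `rx_drain`, and the ring size are hypothetical names chosen for the illustration, and the time limit mentioned above is left out for brevity. The handler stops when the queue is empty or when a frame budget is used up, whichever comes first.

```c
#include <assert.h>
#include <stddef.h>

#define RX_RING_SIZE 8   /* hypothetical capacity of the device queue */

/* Hypothetical stand-in for a device receive queue (a simple ring buffer). */
struct rx_ring {
    int    frames[RX_RING_SIZE];
    size_t head;    /* index of the oldest frame */
    size_t count;   /* number of frames waiting */
};

/* Device side: queue one frame. Returns 0 on success, -1 if the ring is full. */
int rx_ring_put(struct rx_ring *ring, int frame)
{
    if (ring->count == RX_RING_SIZE)
        return -1;
    ring->frames[(ring->head + ring->count) % RX_RING_SIZE] = frame;
    ring->count++;
    return 0;
}

/*
 * Driver side, run once per interrupt: receive frames until the ring is
 * empty or `budget` frames have been handled, whichever comes first.
 * Returns the number of frames handled. Adding each frame to `sum` is a
 * stand-in for real frame processing.
 */
size_t rx_drain(struct rx_ring *ring, size_t budget, long *sum)
{
    size_t handled = 0;

    assert(ring != NULL && sum != NULL);
    while (ring->count > 0 && handled < budget) {
        *sum += ring->frames[ring->head];
        ring->head = (ring->head + 1) % RX_RING_SIZE;
        ring->count--;
        handled++;
    }
    return handled;
}
```

With five frames queued and a budget of three, the first call handles three frames and the second call handles the remaining two, so a burst is absorbed by two handler runs instead of five separate interrupts.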

Advantages and disadvantages of interrupts
Interrupts are a good choice under low traffic load. Under high traffic load, however, every received frame raises its own interrupt, so the CPU can easily spend all of its time servicing interrupts, and the system may even crash.
The code responsible for receiving frames is divided into two parts (the top half and the bottom half of the interrupt handler). The top half copies the frame into the input queue and performs other work that must not be preempted. The bottom half is where the kernel processes the frames in the input queue (passing each frame to the appropriate protocol handler). Because the top half can preempt the bottom half, under high traffic load the top half may keep running while the bottom half is starved, so the input queue overflows and the system crashes.

Interrupt handlers
Why bottom halves exist
Simply put, bottom halves exist because an interrupt handler cannot be preempted, and if the kernel spends too long handling one interrupt, other interrupts are delayed. For this reason the interrupt handler is divided into a top half and a bottom half. The top half does the work that must not be preempted (such as copying the frame from the device to the input queue), and the bottom half does the work that can be preempted (such as passing the frames to their respective protocols).
The top half runs with exclusive use of the CPU, whereas the bottom half can be preempted by other interrupts. With bottom halves, the interrupt handling model is as follows:
1) The device signals the CPU to notify it of the interrupt event.
2) The CPU disables interrupts and runs the top half.
3) The top half executes.
4) When the top half completes, interrupts are enabled again and the bottom half is executed.
The main work of the top half is:
a) Save to RAM all the information about the interrupt event that the kernel will need to process it later.
b) Set a flag so that the kernel later knows that the interrupt needs handling and how to handle it.
c) Re-enable interrupts.
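The split between the two halves can be mimicked in ordinary user-space C. The sketch below is only a model under stated assumptions: `struct model_irq_state`, `model_top_half`, and `model_bottom_half` are hypothetical names, the "interrupt flag" is a plain boolean, and summing frame values stands in for protocol processing. It follows steps a) to c): save the data, set a flag, re-enable interrupts, and do the heavy work later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_QUEUE_LEN 16   /* hypothetical input queue capacity */

/* Hypothetical model of the state shared by the two halves. */
struct model_irq_state {
    int    input_queue[MODEL_QUEUE_LEN]; /* frames saved by the top half */
    size_t queued;                       /* number of saved frames */
    bool   irqs_enabled;                 /* models the CPU interrupt flag */
    bool   bh_pending;                   /* flag telling the kernel work is waiting */
    long   delivered;                    /* stand-in for protocol processing */
};

/* Top half: runs with interrupts disabled and does the minimum work. */
void model_top_half(struct model_irq_state *s, int frame)
{
    s->irqs_enabled = false;                 /* the CPU disables interrupts */
    if (s->queued < MODEL_QUEUE_LEN)
        s->input_queue[s->queued++] = frame; /* a) save the event data to RAM */
    s->bh_pending = true;                    /* b) flag the deferred work */
    s->irqs_enabled = true;                  /* c) re-enable interrupts */
}

/* Bottom half: runs later with interrupts enabled. Returns frames processed. */
size_t model_bottom_half(struct model_irq_state *s)
{
    size_t done = 0;

    assert(s->irqs_enabled);                 /* must not run inside the top half */
    if (!s->bh_pending)
        return 0;
    s->bh_pending = false;
    for (done = 0; done < s->queued; done++)
        s->delivered += s->input_queue[done];
    s->queued = 0;
    return done;
}
```

In this model a frame that arrives while the queue is full is silently dropped by the top half, which is the overflow situation described earlier for high traffic loads.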

Bottom half solutions
The kernel provides several bottom half solutions, mainly three: old-style bottom halves, tasklets, and softirqs. They differ mainly in their execution context and in their concurrency and locking rules.
1) Old-style bottom half: At any moment, only one old-style bottom half can be running, regardless of the number of CPUs.
2) Tasklet: At any moment, only one instance of a given tasklet can be running, although different tasklets can run at the same time on different CPUs. (The choice in most cases.)
3) Softirq: At any moment, only one instance of a given softirq can run on each CPU, although the same softirq can run on different CPUs at the same time. (The choice for network tasks that need a timely response, such as sending and receiving frames.)
/*********************** linux-2.6.32 ***********************/
// include/linux/hardirq.h
in_irq()                  // returns true when the CPU is serving a hardware interrupt
in_softirq()              // returns true when the CPU is serving a software interrupt
in_interrupt()            // returns true if the CPU is serving a hardware or software interrupt, or if bottom halves are disabled

// arch/x86/include/asm/hardirq.h
local_softirq_pending()   // returns true if at least one softirq is pending on the local CPU

// include/linux/interrupt.h
__raise_softirq_irqoff()  // sets the flag associated with the softirq, marking it as pending
raise_softirq_irqoff()    // wrapper around __raise_softirq_irqoff; when in_interrupt() is false, also wakes ksoftirqd
raise_softirq()           // wrapper around raise_softirq_irqoff; disables interrupts before calling it

// kernel/softirq.c
__local_bh_enable()       // enables bottom halves on the local CPU
local_bh_enable()         // if any softirq is pending and in_interrupt() returns false, calls invoke_softirq
local_bh_disable()        // disables bottom halves on the local CPU

// include/linux/irqflags.h
local_irq_enable()        // enables interrupts on the local CPU
local_irq_disable()       // disables interrupts on the local CPU
local_irq_save()          // saves the interrupt state of the local CPU, then disables interrupts
local_irq_restore()       // restores the interrupt state of the local CPU that local_irq_save saved

// include/linux/spinlock.h
spin_lock_bh()            // acquires the spinlock and disables bottom halves and preemption
spin_unlock_bh()          // releases the spinlock and re-enables bottom halves and preemption



Preemption
Kernels from Linux 2.5 onward are fully preemptible (that is, the kernel itself can also be preempted). Sometimes, however, the kernel performs a task during which it must not be preempted (for example, while it is servicing hardware), and it then needs to disable preemption. The following functions manage preemption.
// include/linux/preempt.h
preempt_disable()             // disables preemption for the current task; can be called repeatedly, incrementing a reference counter
preempt_enable()              // re-enables preemption (preemption resumes only once the reference counter is back to 0)
preempt_enable_no_resched()   // decrements the reference counter; preemption is enabled again only when the counter reaches 0
preempt_check_resched()       // called by preempt_enable; checks whether the reference counter is 0

// arch/x86/include/asm/thread_info.h
struct thread_info {
    ...
    int preempt_count;    /* 0 => preemptable, <0 => BUG */
                          // preemption counter; specifies whether the process can be preempted
    ...
};
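The nesting behaviour of the preemption counter can be shown with a tiny user-space model. The names `struct task_model`, `model_preempt_disable`, and `model_preempt_enable` are hypothetical and only mirror the idea of the kernel functions listed above; the "scheduler" is reduced to a counter. The point of the sketch is that preemption comes back only when every disable has been matched by an enable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task preempt_count field. */
struct task_model {
    int  preempt_count;   /* 0 => preemptible, >0 => preemption disabled */
    bool need_resched;    /* a reschedule was requested while disabled */
    int  reschedules;     /* how many times the model "scheduler" ran */
};

/* Nested calls are allowed: each one increments the counter. */
void model_preempt_disable(struct task_model *t)
{
    t->preempt_count++;
}

/*
 * Decrements the counter. Only when it drops back to 0 is the task
 * preemptible again, and only then is a deferred reschedule honoured.
 * Returns true if the task is preemptible after the call.
 */
bool model_preempt_enable(struct task_model *t)
{
    assert(t->preempt_count > 0);   /* an unbalanced enable would be a bug */
    t->preempt_count--;
    if (t->preempt_count == 0 && t->need_resched) {
        t->need_resched = false;
        t->reschedules++;           /* stands in for calling the scheduler */
    }
    return t->preempt_count == 0;
}
```

After two nested disables, the first enable leaves the task non-preemptible and the second one restores preemption and runs the deferred reschedule.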


Bottom half infrastructure
The infrastructure behind bottom halves has the following parts:
1) Classification: classifying the bottom half into the proper type.
2) Association: registering the association between a bottom half type and its handler function.
3) Scheduling: scheduling the bottom half so that it is ready for execution.
4) Notification: notifying the kernel that a scheduled bottom half exists.
Old-style bottom halves (Linux 2.2 and earlier)
The old-style bottom half model (as in Linux 2.2) divides bottom halves into many types, as follows:
    enum {
        TIMER_BH = 0,
        CONSOLE_BH,
        TQUEUE_BH,
        DIGI_BH,
        SERIAL_BH,
        RISCOM8_BH,
        SPECIALIX_BH,
        AURORA_BH,
        ESP_BH,
        NET_BH,        // the networking bottom half
        SCSI_BH,
        IMMEDIATE_BH,
        KEYBOARD_BH,
        CYCLADES_BH,
        CM206_BH,
        JS_BH,
        MACSERIAL_BH,
        ISICOM_BH
    };


Each type is associated with its handler function through init_bh(). For example, the networking bottom half is associated in net_dev_init:
    __initfunc(int net_dev_init(void))
    {
        ...
        init_bh(NET_BH, net_bh);
        ...
    }


When an interrupt handler wants to trigger a bottom half, it uses mark_bh to set the corresponding bit in the global bitmap bh_active:
    extern inline void mark_bh(int nr)
    {
        set_bit(nr, &bh_active);
    }


When a network device receives a frame, it calls netif_rx to notify the kernel. netif_rx copies the frame to the input queue backlog and then marks the NET_BH bottom half:
    skb_queue_tail(&backlog, skb);
    mark_bh(NET_BH);
    return;
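The register / mark / run cycle of the old-style bottom halves can be sketched in user-space C. Everything prefixed with `model_` is a hypothetical name for this illustration, the index values are arbitrary, and the dispatch loop is a simplified assumption rather than the 2.2 kernel's actual code. The sketch shows why marking a bottom half twice before it runs still results in a single run: the request is only one bit in a bitmap.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_BH 32

/* Hypothetical indices; only their distinctness matters here. */
enum { MODEL_TIMER_BH = 0, MODEL_NET_BH = 9 };

static void (*model_bh_base[MODEL_NR_BH])(void);  /* handler table */
static unsigned long model_bh_active;             /* bitmap of marked bottom halves */
static int model_net_runs;                        /* how often the net handler ran */

/* Association: bind a bottom half type to its handler. */
void model_init_bh(int nr, void (*routine)(void))
{
    model_bh_base[nr] = routine;
}

/* Scheduling: an interrupt handler marks the bottom half as pending. */
void model_mark_bh(int nr)
{
    model_bh_active |= 1UL << nr;
}

/* Execution: run every marked bottom half once. Returns how many ran. */
int model_do_bottom_half(void)
{
    int ran = 0;

    for (int nr = 0; nr < MODEL_NR_BH; nr++) {
        unsigned long bit = 1UL << nr;

        if (!(model_bh_active & bit))
            continue;
        model_bh_active &= ~bit;          /* clear the flag before running */
        if (model_bh_base[nr] != NULL) {
            model_bh_base[nr]();
            ran++;
        }
    }
    return ran;
}

/* Stand-in for net_bh(): just counts its invocations. */
void model_net_bh(void)
{
    model_net_runs++;
}
```

A real handler would therefore drain the whole backlog queue on each run, since several frames may have been queued for one marked bit.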
Introducing softirqs
Softirqs were introduced into the Linux kernel in version 2.4. (A softirq can be considered a multithreaded version of the old-style bottom half.)
The newer kernels define the following softirq types (early kernels defined only six; the others were added later):
    // include/linux/interrupt.h
    enum
    {
        HI_SOFTIRQ=0,           // high-priority tasklets
        TIMER_SOFTIRQ,
        NET_TX_SOFTIRQ,         // networking softirq (transmit)
        NET_RX_SOFTIRQ,         // networking softirq (receive)
        BLOCK_SOFTIRQ,
        BLOCK_IOPOLL_SOFTIRQ,
        TASKLET_SOFTIRQ,        // normal-priority tasklets
        SCHED_SOFTIRQ,
        HRTIMER_SOFTIRQ,
        RCU_SOFTIRQ,            /* Preferable RCU should always be the last softirq */

        NR_SOFTIRQS
    };
Only one instance of a given softirq can run on a CPU at a time.
For this reason, the networking softirqs keep their data in an array of softnet_data structures. The array has one element per CPU, and each CPU works only on its own softnet_data, so no locking is needed.
    /*
     * Incoming packets are placed on per-cpu queues so that
     * no locking is needed.
     */
    struct softnet_data
    {
        struct Qdisc        *output_queue;      // Qdisc is short for queueing discipline, that is, QoS; this controls the output frames
        struct sk_buff_head input_pkt_queue;    // input frames wait in this queue before the driver processes them (not used by NAPI drivers, which have their own private queues)
        struct list_head    poll_list;          // list of devices that have input frames to be processed
        struct sk_buff      *completion_queue;  // list of frames that have been transmitted successfully

        struct napi_struct  backlog;            // for compatibility with non-NAPI drivers
    };

The structures are initialized in net_dev_init:
    static int __init net_dev_init(void)
    {
        ...
        for_each_possible_cpu(i) {
            struct softnet_data *queue;

            queue = &per_cpu(softnet_data, i);
            skb_queue_head_init(&queue->input_pkt_queue);
            queue->completion_queue = NULL;
            INIT_LIST_HEAD(&queue->poll_list);

            queue->backlog.poll = process_backlog;
            queue->backlog.weight = weight_p;
            queue->backlog.gro_list = NULL;
            queue->backlog.gro_count = 0;
        }
        ...
    }
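The idea of one queue per CPU can be modelled in user space with a plain array indexed by CPU number. This is only a sketch: `struct model_softnet_data`, `model_net_dev_init`, `model_enqueue`, and the sizes are hypothetical, and the CPU number is passed explicitly instead of being obtained from the kernel. Because each CPU touches only its own element, the enqueue path needs no lock.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS   4   /* hypothetical number of CPUs */
#define MODEL_QUEUE_LEN 8   /* hypothetical per-CPU queue capacity */

/* Hypothetical, much reduced stand-in for struct softnet_data. */
struct model_softnet_data {
    int    input_pkt_queue[MODEL_QUEUE_LEN];
    size_t qlen;      /* frames currently queued */
    size_t dropped;   /* frames dropped because the queue was full */
};

/* One element per CPU; each CPU only ever touches its own element. */
static struct model_softnet_data model_softnet[MODEL_NR_CPUS];

/* Mirrors the initialization loop: reset every CPU's private queue. */
void model_net_dev_init(void)
{
    for (size_t cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        model_softnet[cpu].qlen = 0;
        model_softnet[cpu].dropped = 0;
    }
}

/*
 * Queue a frame on the queue of the CPU that received it.
 * Returns 0 on success, -1 if that CPU's queue is full.
 */
int model_enqueue(size_t cpu, int frame)
{
    struct model_softnet_data *queue;

    assert(cpu < MODEL_NR_CPUS);
    queue = &model_softnet[cpu];
    if (queue->qlen == MODEL_QUEUE_LEN) {
        queue->dropped++;
        return -1;
    }
    queue->input_pkt_queue[queue->qlen++] = frame;
    return 0;
}
```

Frames queued on CPU 0 never appear in the queue of CPU 1, and a full queue on one CPU does not affect the others.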


Softirq registration and scheduling
The registration and scheduling mechanism for softirqs is similar to the old model; only the functions differ.
Where the old model uses init_bh(), softirqs use open_softirq() to register the association between a softirq type and its handler function.
    // kernel/softirq.c
    void open_softirq(int nr, void (*action)(struct softirq_action *))
    {
        softirq_vec[nr].action = action;
    }

A softirq is scheduled on the local CPU, ready for execution, through the following functions:
    __raise_softirq_irqoff()  // sets the flag associated with the softirq, marking it as pending
    raise_softirq_irqoff()    // wrapper around __raise_softirq_irqoff; when in_interrupt() is false, also wakes ksoftirqd
    raise_softirq()           // wrapper around raise_softirq_irqoff; disables interrupts before calling it
For the details of how softirqs are executed, see the other posts on do_IRQ, schedule, and do_softirq.
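Registration and raising can be tied together in a short user-space sketch. All `model_` names are hypothetical, only two softirq types are modelled, and the single `model_pending` word stands in for the pending mask of one CPU. The execution loop is a simplified assumption: it takes a snapshot of the pending mask, clears it, and then calls the registered action for every bit that was set, lowest number first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced set of softirq types for the model. */
enum { MODEL_NET_TX = 0, MODEL_NET_RX, MODEL_NR_SOFTIRQS };

struct model_softirq_action {
    void (*action)(struct model_softirq_action *);
};

static struct model_softirq_action model_softirq_vec[MODEL_NR_SOFTIRQS];
static unsigned int model_pending;   /* pending mask of a single CPU */
static int model_rx_runs;            /* how often the RX handler ran */

/* Registration: associate a softirq type with its handler. */
void model_open_softirq(int nr, void (*action)(struct model_softirq_action *))
{
    model_softirq_vec[nr].action = action;
}

/* Scheduling: mark the softirq as pending on this CPU. */
void model_raise_softirq(int nr)
{
    model_pending |= 1U << nr;
}

/* Execution: snapshot and clear the mask, then run the raised handlers. */
int model_do_softirq(void)
{
    unsigned int pending = model_pending;
    int ran = 0;

    model_pending = 0;
    for (int nr = 0; nr < MODEL_NR_SOFTIRQS; nr++) {
        if ((pending & (1U << nr)) && model_softirq_vec[nr].action != NULL) {
            model_softirq_vec[nr].action(&model_softirq_vec[nr]);
            ran++;
        }
    }
    return ran;
}

/* Stand-in for a receive handler: just counts its invocations. */
void model_net_rx_action(struct model_softirq_action *h)
{
    (void)h;
    model_rx_runs++;
}
```

Clearing the mask before running the handlers means that a softirq raised again during execution is not lost; it simply stays pending for the next pass.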

Tasklets
Tasklets are built on top of softirqs. They correspond to the softirqs HI_SOFTIRQ (high-priority tasklets) and TASKLET_SOFTIRQ (normal-priority tasklets).
Each CPU has two lists of tasklet_struct structures, one for HI_SOFTIRQ and one for TASKLET_SOFTIRQ.
    /*
     * Tasklets
     */
    struct tasklet_head
    {
        struct tasklet_struct *head;
        struct tasklet_struct **tail;
    };

    static DEFINE_PER_CPU(struct tasklet_head, tasklet_vec);
    static DEFINE_PER_CPU(struct tasklet_head, tasklet_hi_vec);


Tasklets have the following characteristics (and differences from old-style bottom halves):
1) There is no limit on the number of tasklets, whereas each bh_base identifier of the old model is limited to a single bottom half function.
2) Tasklets provide two priority levels.
3) Different tasklets can run on different CPUs at the same time.
4) Tasklets are dynamic compared with old-style bottom halves: they do not need to be declared statically in the XXX_BH or XXX_SOFTIRQ enumeration lists.
    struct tasklet_struct
    {
        struct tasklet_struct *next;    // links the structures associated with the same CPU
        unsigned long state;            // bitmap of flags; the possible values are listed by the TASKLET_STATE_XXX enum
        atomic_t count;                 // counter: nonzero means the tasklet is disabled and cannot run; 0 means it is enabled
        void (*func)(unsigned long);    // the function to execute
        unsigned long data;             // the argument passed to func
    };

    enum
    {
        TASKLET_STATE_SCHED,    /* Tasklet is scheduled for execution */
        TASKLET_STATE_RUN       /* Tasklet is running (SMP only) */
    };
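How the head/tail list, the scheduled flag, and the count field work together can be sketched in user-space C. The `model_` names are hypothetical, there are no atomic operations, and a single list stands in for one CPU's tasklet_vec, so this is an illustration of the idea and not the kernel's implementation. A tasklet that is already scheduled is not queued twice, and a tasklet whose count is nonzero (disabled) stays queued until it is enabled.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_STATE_SCHED 1UL   /* hypothetical "scheduled" flag bit */

struct model_tasklet {
    struct model_tasklet *next;
    unsigned long state;          /* MODEL_STATE_SCHED while queued */
    int count;                    /* nonzero => disabled, 0 => enabled */
    void (*func)(unsigned long);
    unsigned long data;
};

struct model_tasklet_head {
    struct model_tasklet *head;
    struct model_tasklet **tail;  /* points at the last `next` pointer */
};

static unsigned long model_total; /* accumulates the data of run tasklets */

void model_add(unsigned long data) { model_total += data; }

void model_head_init(struct model_tasklet_head *h)
{
    h->head = NULL;
    h->tail = &h->head;
}

/* Append t to the list. Returns 1 if queued, 0 if it was already scheduled. */
int model_tasklet_schedule(struct model_tasklet_head *h, struct model_tasklet *t)
{
    if (t->state & MODEL_STATE_SCHED)
        return 0;
    t->state |= MODEL_STATE_SCHED;
    t->next = NULL;
    *h->tail = t;
    h->tail = &t->next;
    return 1;
}

/* Detach the list, run enabled tasklets, requeue disabled ones. */
int model_tasklet_action(struct model_tasklet_head *h)
{
    struct model_tasklet *list = h->head;
    int ran = 0;

    model_head_init(h);
    while (list != NULL) {
        struct model_tasklet *t = list;

        list = list->next;
        if (t->count == 0) {
            t->state &= ~MODEL_STATE_SCHED;
            t->func(t->data);
            ran++;
        } else {
            t->next = NULL;        /* still disabled: keep it scheduled */
            *h->tail = t;
            h->tail = &t->next;
        }
    }
    return ran;
}
```

The `tail` pointer-to-pointer makes appending constant time without a special case for the empty list, which is presumably why tasklet_head stores `**tail` rather than a pointer to the last element.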



