Linux Kernel network packet reception process

Source: Internet
Author: User


This article describes, step by step, how a packet travels from the NIC to a user-space process on a Linux system.

If you are comfortable reading English, the two articles listed in the references are strongly recommended; they describe the process in more detail.

This article covers only physical Ethernet network cards, does not involve virtual devices, and uses the reception of a UDP packet as its example.

The function call relationships shown here come from kernel 3.13.0. If your kernel is a different version, the function names and paths may differ, but the underlying principles should be the same or only slightly different.

NIC to memory

A network card needs a driver to work. The driver is a kernel module that connects the network card to the kernel's network module. When the driver is loaded, it registers itself with the network module; when the corresponding network card receives a packet, the network module calls that driver to process the data.

The following diagram shows how a packet enters memory and is handed to the kernel's network module:

                   +-----+
                   |     |                            Memory
+--------+   1     |     |  2  DMA     +--------+--------+--------+--------+
| Packet |-------->| NIC |------------>| Packet | Packet | Packet | ...... |
+--------+         |     |             +--------+--------+--------+--------+
                   |     |<--------+
                   +-----+         |
                      |            +---------------+
                      |                            |
                    3 | Raise IRQ                  | Disable IRQ
                      |                          5 |
                      |                            |
                      ↓                            |
                   +-----+                   +------------+
                   |     |  Run IRQ handler  |            |
                   | CPU |------------------>| NIC Driver |
                   |     |       4           |            |
                   +-----+                   +------------+
                                                   |
                                                6  | Raise soft IRQ
                                                   |
                                                   ↓

    • 1: The packet enters the physical NIC from the outside network. If the destination address does not belong to this NIC and the NIC is not in promiscuous mode, the NIC discards the packet.

    • 2: The NIC writes the packet through DMA to a memory address that was allocated and initialized by the NIC driver. Note: old NICs may not support DMA, but newer NICs generally do.

    • 3: The NIC notifies the CPU through a hardware interrupt (IRQ) that data has arrived.

    • 4: The CPU looks up the registered interrupt handler in the interrupt table and calls it; the handler in turn calls the corresponding function in the NIC driver.

    • 5: The driver first disables the NIC's interrupts. This signals that the driver already knows there is data in memory, so the NIC writes subsequent packets straight to memory without notifying the CPU again. This improves efficiency and keeps the CPU from being interrupted continuously.

    • 6: The driver raises a soft interrupt, after which the hardware interrupt handler returns. A hardware interrupt handler cannot be interrupted while it runs, so if it takes too long the CPU cannot respond to other hardware interrupts. The kernel therefore introduces soft interrupts, which let the time-consuming part of the hardware interrupt handler be moved into a soft interrupt handler and processed later.
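
One way to watch the soft interrupts raised in the last step is the per-CPU counters in /proc/softirqs, where the NET_RX row counts network receive soft interrupts. The Python sketch below parses that file's layout. The name parse_softirqs and the sample text are illustrative assumptions (the sample numbers are made up), not something taken from the kernel or from this article.

```python
from pathlib import Path


def parse_softirqs(text):
    """Return {softirq_name: [count_cpu0, count_cpu1, ...]} from /proc/softirqs text."""
    counts = {}
    for line in text.strip().splitlines()[1:]:   # first line is the CPU header row
        if ":" not in line:
            continue
        name, _, rest = line.partition(":")
        counts[name.strip()] = [int(n) for n in rest.split()]
    return counts


# Made-up sample in the /proc/softirqs layout, for illustration only.
SAMPLE = """\
                    CPU0       CPU1
      NET_TX:          3          1
      NET_RX:        120         45
"""

sample_counts = parse_softirqs(SAMPLE)
print("NET_RX per CPU (sample):", sample_counts["NET_RX"])

proc_file = Path("/proc/softirqs")
if proc_file.exists():                           # only present on Linux
    live_counts = parse_softirqs(proc_file.read_text())
    print("NET_RX per CPU (live):", live_counts.get("NET_RX"))
```

Running this twice while traffic is arriving and comparing the NET_RX values gives a rough picture of which CPUs are handling receive soft interrupts.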

Network module of the kernel

The soft interrupt triggers the soft interrupt handler in the kernel's network module. The subsequent flow is as follows:

                                                     +-----+
                                             17      |     |
                                        +----------->| NIC |
                                        |            |     |
                                        |Enable IRQ  +-----+
                                        |
                                        |
                                  +------------+                                      Memory
                                  |            |        Read           +--------+--------+--------+--------+
                 +--------------->| NIC Driver |<--------------------- | Packet | Packet | Packet | ...... |
                 |                |            |          9            +--------+--------+--------+--------+
                 |                +------------+
                 |                      |    |        skb
            Poll | 8      Raise softIRQ | 6  +-----------------+
                 |                      |             10       |
                 |                      ↓                      ↓
         +---------------+  Call  +-----------+        +------------------+        +--------------------+  12  +---------------------+
         | net_rx_action |<-------| ksoftirqd |        | napi_gro_receive |------->| enqueue_to_backlog |----->| CPU input_pkt_queue |
         +---------------+   7    +-----------+        +------------------+   11   +--------------------+      +---------------------+
                                                               |                                                      | 13
                                                            14 |        +---------------------------------------------+
                                                               ↓        ↓
                                                    +--------------------------+    15      +------------------------+
                                                    | __netif_receive_skb_core |----------->| packet taps(AF_PACKET) |
                                                    +--------------------------+            +------------------------+
                                                               |
                                                               | 16
                                                               ↓
                                                      +-----------------+
                                                      | protocol layers |
                                                      +-----------------+

  • 7: The ksoftirqd kernel thread is dedicated to processing soft interrupts. When it receives one, it calls the handler registered for that soft interrupt. For the soft interrupt raised by the NIC driver in step 6, ksoftirqd calls the network module's net_rx_action function.

  • 8: net_rx_action calls the poll function in the NIC driver to process the packets one by one.

  • 9: In the poll function, the driver reads, one after another, the packets that the NIC has written to memory. Only the driver knows the format of these in-memory packets.

  • 10: The driver converts each in-memory packet to the skb format understood by the kernel's network module, and then calls napi_gro_receive.

  • 11: napi_gro_receive handles GRO, that is, it merges the packets that can be merged so that the protocol stack needs to be called only once for them. It then checks whether RPS is enabled; if it is, enqueue_to_backlog is called.

  • 12: In enqueue_to_backlog, the packet is placed in the input_pkt_queue of the CPU's softnet_data structure, and the function returns. If input_pkt_queue is full, the packet is dropped. The queue size can be configured through net.core.netdev_max_backlog.

  • 13: The CPU then processes the packets in its own input_pkt_queue in soft interrupt context (by calling __netif_receive_skb_core).

  • 14: If RPS is not enabled, napi_gro_receive calls __netif_receive_skb_core directly.

  • 15: Check whether there is a socket of type AF_PACKET (what is commonly called a raw socket); if so, a copy of the packet is delivered to it. This is where tcpdump captures packets.

  • 16: Call the corresponding protocol stack function and hand the packet over to the protocol stack.

  • 17: When all packets in memory have been processed (that is, the poll function has finished), the NIC's hardware interrupt is re-enabled so that the CPU is notified the next time the NIC receives data.
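
The drop behaviour of enqueue_to_backlog described in the list above can be illustrated with a toy model. This is only a sketch of the idea (a bounded queue that drops when full, later drained by a handler); the class Backlog and its method names are hypothetical, and the real kernel code differs in detail.

```python
from collections import deque


class Backlog:
    """Toy stand-in for a per-CPU input_pkt_queue with a fixed limit."""

    def __init__(self, max_backlog):
        # Plays the role of net.core.netdev_max_backlog in this model.
        self.max_backlog = max_backlog
        self.queue = deque()
        self.dropped = 0

    def enqueue(self, packet):
        """Queue the packet, or count a drop when the queue is already full."""
        if len(self.queue) >= self.max_backlog:
            self.dropped += 1
            return False
        self.queue.append(packet)
        return True

    def process_all(self, handler):
        """Drain the queue, handing each packet to the next stage."""
        handled = 0
        while self.queue:
            handler(self.queue.popleft())
            handled += 1
        return handled


backlog = Backlog(max_backlog=3)
results = [backlog.enqueue(f"pkt{i}") for i in range(5)]
delivered = []
backlog.process_all(delivered.append)
print(results, backlog.dropped, delivered)
```

In this model a burst of five packets against a limit of three loses the last two, which is the situation that raising net.core.netdev_max_backlog is meant to relieve.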

The enqueue_to_backlog function is also called by netif_rx, which is the function called when the lo device sends a packet.
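
Packets dropped because input_pkt_queue is full are counted per CPU, and those counters are commonly read from /proc/net/softnet_stat, which has one row of hexadecimal fields per CPU. The sketch below decodes the first three fields. The field meanings used here (processed, dropped, time_squeeze) follow the usual description of this file and are an assumption that should be checked against your kernel version; parse_softnet_stat and the sample row values are made up for illustration.

```python
from pathlib import Path


def parse_softnet_stat(text):
    """Return one dict per CPU row with the first three hex counters decoded."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        processed, dropped, time_squeeze = (int(f, 16) for f in fields[:3])
        rows.append(
            {"processed": processed, "dropped": dropped, "time_squeeze": time_squeeze}
        )
    return rows


# Made-up sample rows in the softnet_stat layout, for illustration only.
SAMPLE = (
    "0000a1b2 00000003 00000001 00000000\n"
    "000000ff 00000000 00000000 00000000\n"
)

for cpu, row in enumerate(parse_softnet_stat(SAMPLE)):
    print(f"cpu{cpu} (sample):", row)

stat_file = Path("/proc/net/softnet_stat")
if stat_file.exists():                      # only present on Linux
    for cpu, row in enumerate(parse_softnet_stat(stat_file.read_text())):
        print(f"cpu{cpu} (live):", row)
```

A dropped counter that keeps growing on one CPU is a hint that its backlog queue is overflowing.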

Protocol stack: IP layer

Because this is a UDP packet, it first enters the IP layer and is then passed down through the following functions, one level at a time:

          |
          |
          ↓         promiscuous mode &&
      +--------+    PACKET_OTHERHOST (set by driver)   +------------------+
      | ip_rcv |-------------------------------------->| drop this packet |
      +--------+                                       +------------------+
          |
          |
          ↓
+---------------------+
| NF_INET_PRE_ROUTING |
+---------------------+
          |
          |
          ↓
      +---------+
      |         | enabled ip forward  +------------+        +-----------------+
      | routing |-------------------->| ip_forward |------->| NF_INET_FORWARD |
      |         |                     +------------+        +-----------------+
      +---------+                                                   |
          |                                                         |
          | destination IP is local                                 ↓
          ↓                                                 +---------------+
 +------------------+                                       | dst_output_sk |
 | ip_local_deliver |                                       +---------------+
 +------------------+
          |
          |
          ↓
 +------------------+
 | NF_INET_LOCAL_IN |
 +------------------+
          |
          |
          ↓
    +-----------+
    | UDP layer |
    +-----------+

    • ip_rcv: ip_rcv is the entry function of the IP module. The first thing it does is discard garbage packets (packets whose destination MAC address is not the current NIC but which were received because the NIC is in promiscuous mode). It then calls the functions registered on the NF_INET_PRE_ROUTING hook.

    • NF_INET_PRE_ROUTING: a hook that netfilter places in the protocol stack. Through iptables you can attach packet-processing functions here to modify or discard packets. If the packet is not discarded, it continues down the path.

    • routing: a routing decision is made. If the destination IP is not a local IP and IP forwarding is not enabled, the packet is discarded. If IP forwarding is enabled, the packet enters ip_forward.

    • ip_forward: ip_forward first calls the functions that netfilter has registered on NF_INET_FORWARD. If the packet is not discarded, it goes on to call dst_output_sk.

    • dst_output_sk: this function calls the corresponding IP layer functions to send the packet out, which is the same as the second half of the packet sending process.

    • ip_local_deliver: if the routing step finds that the destination IP is a local IP, this function is called. It calls the hooks registered on NF_INET_LOCAL_IN; if the packet passes them, it is handed to the UDP layer.
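
The branching described in the list above can be summarised as a small decision function. This is a simplified model under stated assumptions: the netfilter hooks are assumed to accept every packet, and the function name ip_layer_path and its boolean parameters are hypothetical names chosen for illustration, not kernel identifiers.

```python
def ip_layer_path(other_host, dst_is_local, ip_forward_enabled):
    """Return the list of stages a packet visits in the simplified IP-layer flow."""
    path = ["ip_rcv"]
    if other_host:
        # Frame addressed to another MAC, seen only because of promiscuous mode.
        return path + ["drop"]
    path.append("NF_INET_PRE_ROUTING")   # netfilter hook, assumed to accept
    path.append("routing")
    if dst_is_local:
        return path + ["ip_local_deliver", "NF_INET_LOCAL_IN", "UDP layer"]
    if ip_forward_enabled:
        return path + ["ip_forward", "NF_INET_FORWARD", "dst_output_sk"]
    return path + ["drop"]               # not local and forwarding is off


for case in [(True, True, False), (False, True, False),
             (False, False, True), (False, False, False)]:
    print(case, "->", " -> ".join(ip_layer_path(*case)))
```

Each printed line corresponds to one exit from the diagram: a drop in ip_rcv, local delivery to the UDP layer, forwarding, or a drop after routing.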

UDP layer
          |
          |
          ↓
      +---------+            +-----------------------+
      | udp_rcv |----------->| __udp4_lib_lookup_skb |
      +---------+            +-----------------------+
          |
          |
          ↓
 +--------------------+      +-----------+
 | sock_queue_rcv_skb |----->| sk_filter |
 +--------------------+      +-----------+
          |
          |
          ↓
 +------------------+
 | __skb_queue_tail |
 +------------------+
          |
          |
          ↓
  +---------------+
  | sk_data_ready |
  +---------------+

    • udp_rcv: udp_rcv is the entry function of the UDP module. It calls other functions, mainly to perform some necessary checks. One important call is __udp4_lib_lookup_skb, which finds the matching socket based on the destination IP and port. If no matching socket is found, the packet is discarded; otherwise processing continues.

    • sock_queue_rcv_skb: this function does two main things. First, it checks whether the socket's receive buffer is full and discards the packet if it is. Second, it calls sk_filter to check whether the packet satisfies the socket's filter; if a filter is set on the socket and the packet does not match, the packet is discarded. (In Linux, a filter can be defined on any socket in the same way as in tcpdump, and packets that do not meet its conditions are discarded.)

    • __skb_queue_tail: places the packet at the tail of the socket's receive queue.

    • sk_data_ready: notifies the socket that a packet is ready.
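
The UDP-layer checks listed above can be modelled in a few lines. The sketch below is a toy model, not kernel code: ToySocket, udp_receive, the byte-count buffer accounting, and the addresses and ports are all hypothetical, chosen only to show the order of the checks (socket lookup, receive buffer, filter, enqueue, notify).

```python
class ToySocket:
    """Minimal stand-in for a UDP socket's receive side."""

    def __init__(self, rcvbuf_bytes, packet_filter=None):
        self.rcvbuf_bytes = rcvbuf_bytes      # receive buffer limit in this model
        self.packet_filter = packet_filter    # optional callable: payload -> bool
        self.queue = []
        self.queued_bytes = 0
        self.ready_notifications = 0


def udp_receive(sockets, dst_ip, dst_port, payload):
    """Walk one payload through the modelled UDP receive path."""
    sock = sockets.get((dst_ip, dst_port))    # stands in for __udp4_lib_lookup_skb
    if sock is None:
        return "drop: no socket"
    if sock.queued_bytes + len(payload) > sock.rcvbuf_bytes:
        return "drop: receive buffer full"
    if sock.packet_filter is not None and not sock.packet_filter(payload):
        return "drop: filtered"               # stands in for sk_filter
    sock.queue.append(payload)                # stands in for __skb_queue_tail
    sock.queued_bytes += len(payload)
    sock.ready_notifications += 1             # stands in for sk_data_ready
    return "queued"


sockets = {("10.0.0.1", 5000): ToySocket(8, lambda p: p.startswith(b"ok"))}
for port, payload in [(9999, b"ok"), (5000, b"bad!"), (5000, b"ok12"), (5000, b"ok345")]:
    print(port, payload, "->", udp_receive(sockets, "10.0.0.1", port, payload))
```

The four calls exercise each outcome in turn: no matching socket, rejected by the filter, queued, and dropped because the receive buffer would overflow.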

Once sk_data_ready has been called, processing of the packet is complete and the packet waits for the application to read it. All of the functions above run in soft interrupt context.

Socket

An application generally receives data in one of two ways. In the first, it blocks in recvfrom waiting for data; when the socket is notified, recvfrom is woken up and reads the data from the receive queue. In the second, it monitors the socket with epoll or select and calls recvfrom to read from the receive queue once it is notified. In both cases the packets are received normally.
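
The two receive styles can be tried from user space with a pair of UDP sockets on the loopback interface. The sketch below uses Python's standard socket and select modules; the helper names recv_blocking, recv_when_ready, and demo are hypothetical, and the socket timeout is an added safeguard so the example cannot hang, not part of the mechanism described above.

```python
import select
import socket


def recv_blocking(sock, bufsize=2048):
    """First style: sit in recvfrom until a datagram is available."""
    return sock.recvfrom(bufsize)


def recv_when_ready(sock, timeout=1.0, bufsize=2048):
    """Second style: wait with select(), then read; return None on timeout."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None
    return sock.recvfrom(bufsize)


def demo():
    """Send two datagrams over loopback and receive one with each style."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
        receiver.settimeout(2.0)             # safeguard so the demo cannot hang
        addr = receiver.getsockname()

        sender.sendto(b"first", addr)
        data1, _ = recv_blocking(receiver)

        sender.sendto(b"second", addr)
        ready = recv_when_ready(receiver)
        data2 = ready[0] if ready else None
        return data1, data2
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(demo())
```

In a long-running server the select-based style (or epoll on Linux) is usually preferred when many sockets have to be watched at once, while a single blocking recvfrom is simpler when there is only one socket.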

Conclusion

Understanding how a packet is received helps us work out where packets can be monitored and modified and in which situations they may be dropped, which gives us a reference for troubleshooting network problems. Knowing where the netfilter hooks sit also helps in understanding how to use iptables, and will make Linux virtual network devices easier to understand later on.

The next few articles will introduce virtual network devices and iptables under Linux.
