Analysis of the CORE network packet receiving and transmitting process
Being able to receive real network traffic is a significant advantage of CORE: it makes it easy to emulate existing systems by connecting them to virtual networks. CORE virtualizes network devices using LXC (Linux namespace) technology, and builds the virtual network out of virtual NIC pairs (veth), bridges, and Quagga. This document aims to understand CORE's network emulation by analyzing how network data is transferred inside CORE.
Topological structure
To facilitate the description, the flow of data from the physical NIC eth0 to virtual node n2 is analyzed using the example topology shown in Figure 1.
Figure 1 Example topology
The virtual network is created by the CORE backend according to the topology and configuration drawn in the frontend, by executing the following commands:
# Create container n1
/usr/sbin/vnoded -v -c /tmp/pycore.52385/n1 -l /tmp/pycore.52385/n1.log -p /tmp/pycore.52385/n1.pid -c /tmp/pycore.52385/n1.conf
# Create container n2
/usr/sbin/vnoded -v -c /tmp/pycore.52385/n2 -l /tmp/pycore.52385/n2.log -p /tmp/pycore.52385/n2.pid -c /tmp/pycore.52385/n2.conf
# Create bridge b.38079.56309
/usr/sbin/brctl addbr b.38079.56309
# Disable the spanning tree protocol and the forwarding delay, then bring the bridge up
/usr/sbin/brctl stp b.38079.56309 off
/usr/sbin/brctl setfd b.38079.56309 0
/sbin/ip link set b.38079.56309 up
# Create the ebtables forwarding filter rules for receiving and forwarding
/sbin/ebtables -N b.38079.56309 -P ACCEPT
/sbin/ebtables -A FORWARD --logical-in b.38079.56309 -j b.38079.56309
# Executed via vcmd: turn off multicast snooping
echo '0' > /sys/devices/virtual/net/b.38079.56311/bridge/multicast_snooping
# Create the virtual NIC pair n1.eth0.44
/sbin/ip link add name n1.eth0.44 type veth peer name n1.0.44
# Move one end of the pair into the container's namespace and rename it eth0
/sbin/ip link set n1.0.44 netns 17122   # 17122 is the PID of node n1's vnoded process
/sbin/ip link set n1.0.44 name eth0
/sbin/ip link set n1.eth0.44 up
# Attach the virtual NIC to the bridge
/usr/sbin/brctl addif b.38079.56309 n1.eth0.44
/sbin/ip link set n1.eth0.44 up
# Executed via vcmd: set the NIC MAC address
/sbin/ip link set dev eth0 address 00:00:00:aa:00:00   # executed inside the node
/sbin/ip addr add 10.0.0.1/24 dev eth0                 # executed inside the node
/sbin/ip link set eth0 up                              # executed inside the node
# Executed via vcmd: start the routing services in container n1
sh quaggaboot.sh zebra
sh quaggaboot.sh ospfd
sh quaggaboot.sh vtysh
From these commands it is not difficult to derive the internal structure of the example topology, shown in Figure 2.
Figure 2 Example topology internal structure
In the example topology, CORE creates two containers (namespaces), two bridges, three veth pairs (one end of each pair inside a container), and a virtual routing layer inside each namespace. To receive data, the virtual NIC inside the namespace is attached to a bridge, and the bridge can in turn be connected to the physical NIC or to other virtual NICs. This realizes the interconnection between the virtual nodes and the physical network card.
Data Flow Analysis
Taking the example topology, the packet path is as follows:
1) The first step is receiving the packet, which is completed in the NIC's hardware interrupt handler. To finish as quickly as possible, the handler builds an skb for the received data, calls netif_rx(skb) to queue the work as a softirq on the CPU's scheduling queue, and returns immediately.
2) In the second step, ksoftirqd handles the softirq. ksoftirqd calls a different handler depending on the softirq type; in this example it calls netif_receive_skb to process the skb. netif_receive_skb can be understood as receiving the packet from the physical layer, so it is also considered the link-layer entry function. Inside it, handle_bridge is called, which hands the skb to the bridge (if one is configured) through br_handle_frame_hook.
3) In the third step, the bridge forwards, broadcasts, or drops the skb according to the ebtables rules. The rule CORE defines is forwarding, so the bridge forwards the skb to another (virtual) NIC attached to it by calling veth_xmit.
4) In the fourth step, for a virtual NIC transmitting is receiving on the peer, so veth_xmit re-queues the skb as a softirq on the CPU's scheduling queue; at this point the skb is already running inside the container.
5) In the fifth step, ksoftirqd calls netif_receive_skb again. Since no bridge is configured inside the container, netif_receive_skb calls the packet_type member func = ip_rcv, which delivers the packet to the L3 routing layer (see the sketch after Figure 3).
6) In the sixth step, the routing layer receives the skb in ip_rcv and calls ip_rcv_finish to route it (via skb_rtable), deciding whether to deliver the skb locally or forward it.
Figure 3 Data stream receive and forward process
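How netif_receive_skb finds ip_rcv in step 5 can be illustrated by the way IPv4 registers its receive handler. The following is a minimal sketch based on the 2.6-era net/ipv4/af_inet.c; netif_receive_skb matches skb->protocol against the registered packet_type list and invokes the matching func member:

/* Sketch: IPv4 registers ip_rcv as the handler for ETH_P_IP frames */
static struct packet_type ip_packet_type __read_mostly = {
    .type = cpu_to_be16(ETH_P_IP),
    .func = ip_rcv,
};

static int __init inet_init(void)
{
    /* ... protocol and socket initialization omitted ... */
    dev_add_pack(&ip_packet_type);   /* add to the kernel's packet_type list */
    return 0;
}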
Components
Physical NIC eth0
The physical NIC, i.e. the Ethernet card, appears in the kernel as an Ethernet device (here abbreviated eth0). Its implementation is provided by the NIC driver. The NIC is both a PCI device and a network device; when it is brought up, the driver must register the interrupt handler xxx_interrupt:
request_irq(dev->irq, &xxx_interrupt,
            lp->shared_irq ? IRQF_SHARED : 0, dev->name,
            (void *)dev);
When a packet arrives, the NIC raises an interrupt; the driver allocates an skb for the incoming packet, reads the data from the NIC into the skb, and hands it to the upper layers of the kernel stack via netif_rx. Taking the r8169 driver as an example:
if (rtl8169_rx_vlan_skb(tp, desc, skb) < 0)
        netif_rx(skb);

dev->stats.rx_bytes += pkt_size;
dev->stats.rx_packets++;
It is necessary here to introduce the skb (struct sk_buff), the structure used to store network packets. The skb is created by the NIC driver. Because it carries the net_device, the protocol type, the addresses, and so on, each layer can process an skb according to its own business logic.
Figure 4 SKB Structure
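The fields most relevant to this analysis are sketched below; this is an abbreviated excerpt in the spirit of the 2.6-era linux/skbuff.h, with most members omitted:

/* Abbreviated sketch of struct sk_buff; only fields used in this analysis */
struct sk_buff {
    struct sk_buff    *next;      /* linkage in a queue such as input_pkt_queue */
    struct sk_buff    *prev;
    struct net_device *dev;       /* device the packet arrived on or leaves by */
    ktime_t            tstamp;    /* arrival timestamp */
    __be16             protocol;  /* L3 protocol, e.g. ETH_P_IP; used to pick ip_rcv */
    __u8               pkt_type;  /* PACKET_HOST, PACKET_BROADCAST, ... */
    atomic_t           users;     /* reference count checked by skb_share_check */
    unsigned char     *head, *data, *tail, *end;   /* packet data buffer pointers */
};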
Bridge
A bridge is an L2 (data link layer) device that bridges Ethernet devices. A bridge can bind several Ethernet interface devices (using the brctl addif command) to interconnect NICs.
(1) Module initialization
In Linux, bridging is provided by the dynamically loaded module bridge. When the module is initialized, the most important thing besides completing its own module initialization is registering the bridge hook with the system; the hook is later called from handle_bridge:
br_handle_frame_hook = br_handle_frame;
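For reference, the following sketch shows how net/core/dev.c of that era dispatches into the bridge: when the receiving device has a bridge port, netif_receive_skb hands the frame to the hook registered above.

/* Sketch of handle_bridge in net/core/dev.c (2.6.x, CONFIG_BRIDGE) */
static inline struct sk_buff *handle_bridge(struct sk_buff *skb,
                                            struct packet_type **pt_prev,
                                            int *ret,
                                            struct net_device *orig_dev)
{
    struct net_bridge_port *port;

    /* not bridged: loopback frames or devices without a bridge port pass through */
    if (skb->pkt_type == PACKET_LOOPBACK ||
        (port = rcu_dereference(skb->dev->br_port)) == NULL)
        return skb;

    if (*pt_prev) {
        *ret = deliver_skb(skb, *pt_prev, orig_dev);
        *pt_prev = NULL;
    }

    /* br_handle_frame_hook == br_handle_frame once the bridge module is loaded */
    return br_handle_frame_hook(port, skb);
}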
(2) Binding with NIC
Figure 5 net_bridge structure
The binding between a bridge and a NIC is implemented through bridge ports. As shown in Figure 5, the net_bridge structure maintains a linked list of net_bridge_port structures, and the net_device (embedded in the Ethernet device structure) holds a pointer to a net_bridge_port, which can point to an element of that linked list.
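An abbreviated sketch of the two structures, in the spirit of the 2.6-era net/bridge/br_private.h (most members omitted):

struct net_bridge_port {
    struct net_bridge *br;     /* bridge this port belongs to */
    struct net_device *dev;    /* Ethernet device bound to this port */
    struct list_head   list;   /* entry in br->port_list */
    u8                 state;  /* BR_STATE_FORWARDING, BR_STATE_LEARNING, ... */
};

struct net_bridge {
    spinlock_t         lock;
    struct list_head   port_list;  /* the list of net_bridge_port structures */
    struct net_device *dev;        /* the bridge's own net_device */
};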
(3) Bridging processing
Bridge processing consists of receiving data on one side and sending it on the other. In this example the data is not sent out through the bridge itself, so sending is not discussed here. Receiving starts from br_handle_frame:
struct sk_buff *br_handle_frame(struct net_bridge_port *p, struct sk_buff *skb)
{
    const unsigned char *dest = eth_hdr(skb)->h_dest;
    int (*rhook)(struct sk_buff *skb);

    /* require a valid source address: not all-zero and not broadcast/multicast */
    if (!is_valid_ether_addr(eth_hdr(skb)->h_source))
        goto drop;

    /* if the skb is shared (skb->users != 1), copy it; otherwise use it directly */
    skb = skb_share_check(skb, GFP_ATOMIC);
    if (!skb)
        return NULL;
    /* link-local multicast addresses: 01:80:c2:00:00:0x */
    if (unlikely(is_link_local(dest))) {
        /* pause frames shouldn't be passed up by the driver anyway */
        if (skb->protocol == htons(ETH_P_PAUSE))
            goto drop;

        /* if STP is turned off, treat STP frames as ordinary traffic and forward */
        if (p->br->stp_enabled == BR_NO_STP && dest[5] == 0)
            goto forward;

        if (NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, skb->dev,
                    NULL, br_handle_local_finish))
            return NULL;   /* frame consumed by filter */
        else
            return skb;    /* continue processing */
    }

forward:
    switch (p->state) {
    case BR_STATE_FORWARDING:
        /* br_should_route_hook is set by ebtables to the ebt_broute function,
         * which decides from the user's rules whether the frame should be
         * routed at L3; it is generally NULL. If the frame is to be routed
         * at L3, it is returned directly. */
        rhook = rcu_dereference(br_should_route_hook);
        if (rhook != NULL) {
            if (rhook(skb))
                return skb;
            dest = eth_hdr(skb)->h_dest;
        }
        /* fall through */
    case BR_STATE_LEARNING:
        /* if the destination MAC is the bridge device's own MAC, mark it as host */
        if (!compare_ether_addr(p->br->dev->dev_addr, dest))
            skb->pkt_type = PACKET_HOST;

        /* invoke the hooks mounted at NF_BR_PRE_ROUTING, then br_handle_frame_finish */
        NF_HOOK(PF_BRIDGE, NF_BR_PRE_ROUTING, skb, skb->dev, NULL,
                br_handle_frame_finish);
        break;
    default:
drop:
        kfree_skb(skb);
    }
    return NULL;
}
Forwarding and learning are bridge port states; a bridge port is generally in one of five states:
1) Disabled: disabled by the administrator
2) Blocking: idle, not participating in packet forwarding
3) Listening: monitoring the network
4) Learning: learning MAC address information in preparation for forwarding
5) Forwarding: normal operation, forwarding packets
(4) Bridge filter ebtables_filter
The bridge provides configurable filtering and forwarding rules, implemented through ebtables_filter. ebtables_filter, ebtables, and x_tables are all loadable modules. ebtables stores the filter rules; each filter rule corresponds to a table entry in ebtables. As shown in Figure 6, the ebtables_filter module registers various hook functions with the bridge when it is initialized, and the bridge invokes the corresponding hooks when it receives, sends, and forwards packets, which implements the filtering function.
Figure 6 netfilter hook structure
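The registration can be sketched as follows, in the spirit of the 2.6-era net/bridge/netfilter/ebtable_filter.c; each nf_hook_ops entry attaches the filter to one of the bridge's netfilter hook points:

/* Sketch: ebtable_filter attaching to the bridge hook points */
static struct nf_hook_ops ebt_ops_filter[] __read_mostly = {
    {
        .hook     = ebt_hook,
        .owner    = THIS_MODULE,
        .pf       = PF_BRIDGE,
        .hooknum  = NF_BR_LOCAL_IN,    /* frames addressed to the bridge itself */
        .priority = NF_BR_PRI_FILTER_BRIDGED,
    },
    {
        .hook     = ebt_hook,
        .owner    = THIS_MODULE,
        .pf       = PF_BRIDGE,
        .hooknum  = NF_BR_FORWARD,     /* frames forwarded between bridge ports */
        .priority = NF_BR_PRI_FILTER_BRIDGED,
    },
    {
        .hook     = ebt_hook,
        .owner    = THIS_MODULE,
        .pf       = PF_BRIDGE,
        .hooknum  = NF_BR_LOCAL_OUT,   /* frames originated by the bridge */
        .priority = NF_BR_PRI_FILTER_OTHER,
    },
};

static int __init ebtable_filter_init(void)
{
    /* table registration omitted */
    return nf_register_hooks(ebt_ops_filter, ARRAY_SIZE(ebt_ops_filter));
}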
Virtual NIC pair (veth pair)
To communicate with the nodes inside containers, CORE creates a veth pair for each node and moves one end into the container.
# Create the virtual NIC pair
/sbin/ip link add name n1.eth0.44 type veth peer name n1.0.44
# Move one end of the pair into the container's namespace and rename it eth0
/sbin/ip link set n1.0.44 netns 17122   # 17122 is the PID of node n1's vnoded process
A virtual NIC is essentially a NIC driver with the hardware-related operations removed. Unlike a physical NIC, it has no ability to receive external data; data can only be handed to it by other kernel modules, which is done by calling veth_xmit.
stats->tx_bytes += length;
stats->tx_packets++;
rcv_stats->rx_bytes += length;
rcv_stats->rx_packets++;
netif_rx(skb);
In its implementation of veth_xmit, veth updates its own transmit statistics and the peer's receive statistics, then calls netif_rx(skb) to hand the skb to the upper layers for processing.
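A condensed sketch of veth_xmit, in the spirit of the 2.6-era drivers/net/veth.c (statistics and error handling omitted), shows how transmitting on one end becomes receiving on the peer:

/* Condensed sketch: transmit on one end of the pair = receive on the other */
static int veth_xmit(struct sk_buff *skb, struct net_device *dev)
{
    struct veth_priv *priv = netdev_priv(dev);
    struct net_device *rcv = priv->peer;       /* the other end of the pair */

    /* retarget the skb at the peer device ... */
    skb->pkt_type = PACKET_HOST;
    skb->protocol = eth_type_trans(skb, rcv);  /* also sets skb->dev = rcv */

    /* ... and push it back into the stack as received data */
    netif_rx(skb);
    return 0;
}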
Packet loss analysis
When CORE transfers packets through the virtual network, it relies on the Linux softirq mechanism. When traffic is particularly heavy, the load of the softirq daemon (ksoftirqd) can become very high, sometimes reaching 100%, and packets may then be dropped. This kind of packet loss is different from a protocol discarding malformed packets after format and checksum validation; it is the bottleneck of the CORE network emulation system. Let us look at the netif_rx function.
int netif_rx(struct sk_buff *skb)
{
    struct softnet_data *queue;
    unsigned long flags;

    /* if netpoll wants it, pretend we never saw it */
    if (netpoll_rx(skb))
        return NET_RX_DROP;

    if (!skb->tstamp.tv64)
        net_timestamp(skb);                  /* timestamp processing */

    local_irq_save(flags);                   /* save interrupt state, disable interrupts */

    queue = &__get_cpu_var(softnet_data);    /* this CPU's input queue */
    __get_cpu_var(netdev_rx_stat).total++;

    if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
        if (queue->input_pkt_queue.qlen) {
enqueue:
            __skb_queue_tail(&queue->input_pkt_queue, skb);  /* insert into the queue */
            local_irq_restore(flags);        /* re-enable interrupts */
            return NET_RX_SUCCESS;
        }

        napi_schedule(&queue->backlog);      /* wake ksoftirqd to process the queue */
        goto enqueue;
    }

    __get_cpu_var(netdev_rx_stat).dropped++; /* queue full: count the drop */
    local_irq_restore(flags);                /* re-enable interrupts */

    kfree_skb(skb);
    return NET_RX_DROP;
}
netif_rx appends the skb to the current CPU's softnet_data input queue and drops the packet if the queue is full. The per-CPU drop counts recorded in netdev_rx_stat can be inspected from user space through /proc/net/softnet_stat.
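As a practical aid, the following minimal userspace sketch reads /proc/net/softnet_stat, whose first two hexadecimal columns correspond to the total and dropped counters of netdev_rx_stat, one line per CPU:

#include <stdio.h>

int main(void)
{
    /* each line of /proc/net/softnet_stat is one CPU; column 1 is the
     * total processed count, column 2 the dropped count (hexadecimal) */
    unsigned int total, dropped;
    char line[256];
    int cpu = 0;
    FILE *f = fopen("/proc/net/softnet_stat", "r");

    if (!f) {
        perror("fopen");
        return 1;
    }
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "%x %x", &total, &dropped) == 2)
            printf("cpu%d: total=%u dropped=%u\n", cpu++, total, dropped);
    }
    fclose(f);
    return 0;
}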