Analysis of the packet transmission path at layer 2 (the link layer)
--lvyilong316
Description: The kernel version covered in this series of posts is 2.6.32
When the upper layer has a packet ready, it hands it down to the link layer, where transmission is handled primarily by the dev_queue_xmit function. A packet can go out along two paths: the normal path, straight through the network card driver, or via the TX softirq (see note 3). For ease of understanding, first look at the overall call graph of dev_queue_xmit.
dev_queue_xmit
This function queues the skb to be sent on a device queue. The skb's device and priority must be set before calling it, and it may be called from interrupt context.
Return value:
A non-zero return (positive or negative) indicates an error; 0 indicates success, but success does not mean the packet was actually transmitted, because it may still be dropped for reasons such as rate limiting.
The skb passed in is consumed when the function returns, so if you want to keep control of the packet (for example to implement retransmission) you must take an extra reference on the skb first.
Interrupts must be enabled when this function is called, because re-enabling bottom halves requires interrupts to be enabled; otherwise a deadlock can occur.
- int dev_queue_xmit(struct sk_buff *skb)
- {
-     struct net_device *dev = skb->dev;
-     struct netdev_queue *txq;
-     struct Qdisc *q;
-     int rc = -ENOMEM;
-     /* GSO will handle the following emulations directly. */
-     if (netif_needs_gso(dev, skb))
-         goto gso;
-     if (skb_has_frags(skb) &&
-         !(dev->features & NETIF_F_FRAGLIST) &&
-         __skb_linearize(skb))
-         goto out_kfree_skb;
- If the skb is fragmented but the sending device does not support fragment lists, or fragments sit in high memory and the device cannot DMA there, everything must be merged into a single segment. __skb_linearize is really __pskb_pull_tail(skb, skb->data_len), which works much like pskb_may_pull: it checks whether the skb's main buffer has enough room to pull in len bytes and, if not, reallocates the skb and copies the data from the frags into the newly allocated main buffer. Since len is skb->data_len here, all of the data ends up copied into the main buffer, i.e. the skb is linearized.
-     if (skb_shinfo(skb)->nr_frags &&
-         (!(dev->features & NETIF_F_SG) || illegal_highdma(dev, skb)) &&
-         __skb_linearize(skb))
-         goto out_kfree_skb;
- If the packet's checksum has not yet been computed and the sending device does not support checksumming for this protocol, the checksum is computed here (see note 1). If the skb was already linearized above, __skb_linearize returns immediately. Note the difference between frags and frag_list: the former places extra data on separately allocated pages while there is still only one sk_buff; the latter chains multiple sk_buffs together.
-     if (skb->ip_summed == CHECKSUM_PARTIAL) {
-         skb_set_transport_header(skb, skb->csum_start -
-                                       skb_headroom(skb));
-         if (!dev_can_checksum(dev, skb) && skb_checksum_help(skb))
-             goto out_kfree_skb;
-     }
- gso:
-     /* Disable softirqs and CPU preemption */
-     rcu_read_lock_bh();
- Select a transmit queue. If the device provides a select_queue callback, it is used; otherwise the kernel picks a queue. This is only the kernel side of multiqueue support; to actually use multiple queues the NIC itself must support them, and most NICs have only one queue. The number of queues is set when alloc_etherdev allocates the net_device.
-     txq = dev_pick_tx(dev, skb);
- Get the device's qdisc from the netdev_queue structure:
-     q = rcu_dereference(txq->qdisc);
- If the device has a queue, call __dev_xmit_skb:
-     if (q->enqueue) {
-         rc = __dev_xmit_skb(skb, q, dev, txq);
-         goto out;
-     }
- The code below handles devices without a transmit queue. Software devices such as lo and tunnel devices generally have none: all we can do is call the driver's hard_start_xmit and send the packet; if the send fails, the packet is dropped, because there is no queue to hold it.
-     if (dev->flags & IFF_UP) {  /* is the device up? */
-         int cpu = smp_processor_id(); /* ok because BHs are off */
-         if (txq->xmit_lock_owner != cpu) {  /* lock not already held on this CPU */
-             HARD_TX_LOCK(dev, txq, cpu);
-             if (!netif_tx_queue_stopped(txq)) {  /* the queue is in the running state */
-                 rc = NET_XMIT_SUCCESS;
-                 if (!dev_hard_start_xmit(skb, dev, txq)) {
-                     HARD_TX_UNLOCK(dev, txq);
-                     goto out;
-                 }
-             }
-             HARD_TX_UNLOCK(dev, txq);
-             if (net_ratelimit())
-                 printk(KERN_CRIT "Virtual device %s asks to "
-                        "queue packet!\n", dev->name);
-         } else {  /* txq->xmit_lock_owner == cpu: recursion has occurred */
-             if (net_ratelimit())
-                 printk(KERN_CRIT "Dead loop on virtual device "
-                        "%s, fix it urgently!\n", dev->name);
-         }
-     }
-     rc = -ENETDOWN;
-     rcu_read_unlock_bh();
- out_kfree_skb:
-     kfree_skb(skb);
-     return rc;
- out:
-     rcu_read_unlock_bh();
-     return rc;
- }
__dev_xmit_skb
The __dev_xmit_skb function mainly does two things:
(1) If the traffic-control queue (qdisc) is empty, try to send the packet directly.
(2) If it is not empty, enqueue the packet on the qdisc and run the qdisc.
- static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
-                                  struct net_device *dev,
-                                  struct netdev_queue *txq)
- {
-     spinlock_t *root_lock = qdisc_lock(q);  /* see note 2 */
-     int rc;
-     spin_lock(root_lock);  /* lock the qdisc */
-     if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {  /* is the queue deactivated? */
-         kfree_skb(skb);
-         rc = NET_XMIT_DROP;
-     } else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) &&
-                !test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)) {
-         /*
-          * This is a work-conserving queue; there are no old skbs
-          * waiting to be sent out; and the qdisc is not running -
-          * xmit the skb directly.
-          */
-         __qdisc_update_bstats(q, skb->len);
-         if (sch_direct_xmit(skb, q, dev, txq, root_lock))
-             __qdisc_run(q);
-         else
-             clear_bit(__QDISC_STATE_RUNNING, &q->state);
-         rc = NET_XMIT_SUCCESS;
-     } else {
-         rc = qdisc_enqueue_root(skb, q);
-         qdisc_run(q);
-     }
-     spin_unlock(root_lock);
-     return rc;
- }
qdisc_run
qdisc_run() is called at two points:
1. __dev_xmit_skb()
2. the softirq service routine for NET_TX_SOFTIRQ
- static inline void qdisc_run(struct Qdisc *q)
- {
-     if (!test_and_set_bit(__QDISC_STATE_RUNNING, &q->state))  /* mark the queue as running */
-         __qdisc_run(q);
- }
__qdisc_run
- void __qdisc_run(struct Qdisc *q)
- {
-     unsigned long start_time = jiffies;
-     while (qdisc_restart(q)) {  /* a return value > 0 means the qdisc is not yet empty */
-         /*
-          * Postpone processing if:
-          * 1. another process needs the CPU;
-          * 2. we've been doing it for too long.
-          * If the queue has been running too long, stop and put this
-          * qdisc at the head of the output_queue list.
-          */
-         if (need_resched() || jiffies != start_time) {  /* no longer allowed to keep running this qdisc */
-             __netif_schedule(q);  /* link this qdisc into the per-CPU softnet_data output_queue */
-             break;
-         }
-     }
-     clear_bit(__QDISC_STATE_RUNNING, &q->state);  /* clear the queue's running flag */
- }
The loop above repeatedly calls qdisc_restart to send data. qdisc_restart is the function that actually transmits a packet: it takes one frame off the queue and tries to send it; if the send fails, the frame is normally requeued.
Return value: on a successful send it returns the remaining queue length; it returns 0 when the send fails (or when the send succeeds and the remaining queue length is 0).
qdisc_restart
The __QDISC_STATE_RUNNING flag guarantees that only one CPU processes this qdisc at a time, and qdisc_lock(q) serializes access to the queue itself.
Usually netif_tx_lock serializes (gives exclusive) access to the device driver, while qdisc_lock(q) serializes access to the qdisc. The two are mutually exclusive: before taking one, the other must be released.
- static inline int qdisc_restart(struct Qdisc *q)
- {
-     struct netdev_queue *txq;
-     struct net_device *dev;
-     spinlock_t *root_lock;
-     struct sk_buff *skb;
-     /* Dequeue packet */
-     skb = dequeue_skb(q);  /* take a packet off the head of the queue */
-     if (unlikely(!skb))
-         return 0;  /* 0 means the queue is empty or throttled */
-     root_lock = qdisc_lock(q);
-     dev = qdisc_dev(q);
-     txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));
-     return sch_direct_xmit(skb, q, dev, txq, root_lock);  /* actually send the packet */
- }
sch_direct_xmit
Sends one skb while the qdisc is in the __QDISC_STATE_RUNNING state, which guarantees that only one CPU runs this function at a time. Returns 0 if the queue is empty or sending is throttled, and a value greater than 0 if the queue is non-empty.
- int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
-                     struct net_device *dev, struct netdev_queue *txq,
-                     spinlock_t *root_lock)
- {
-     int ret = NETDEV_TX_BUSY;
-     spin_unlock(root_lock);  /* release the qdisc lock, since the device lock is taken next */
- HARD_TX_LOCK expands to __netif_tx_lock, i.e. spin_lock(&txq->_xmit_lock), which guarantees exclusive access to the device driver:
-     HARD_TX_LOCK(dev, txq, smp_processor_id());
-     if (!netif_tx_queue_stopped(txq) &&  /* the queue is neither stopped */
-         !netif_tx_queue_frozen(txq))     /* nor frozen */
-         ret = dev_hard_start_xmit(skb, dev, txq);  /* send the packet */
-     HARD_TX_UNLOCK(dev, txq);  /* calls __netif_tx_unlock */
-     spin_lock(root_lock);
-     switch (ret) {
-     case NETDEV_TX_OK:  /* the driver sent the packet successfully */
-         ret = qdisc_qlen(q);  /* return the remaining queue length */
-         break;
-     case NETDEV_TX_LOCKED:  /* taking the device lock failed */
-         ret = handle_dev_cpu_collision(skb, txq, q);
-         break;
-     default:  /* the device is busy; requeue and resend later (via softirq) */
-         if (unlikely(ret != NETDEV_TX_BUSY && net_ratelimit()))
-             printk(KERN_WARNING "BUG %s code %d qlen %d\n",
-                    dev->name, ret, q->q.qlen);
-         ret = dev_requeue_skb(skb, q);
-         break;
-     }
-     if (ret && (netif_tx_queue_stopped(txq) ||
-                 netif_tx_queue_frozen(txq)))
-         ret = 0;
-     return ret;
- }
dev_hard_start_xmit
- int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
-                         struct netdev_queue *txq)
- {
-     const struct net_device_ops *ops = dev->netdev_ops;
-     int rc;
-     if (likely(!skb->next)) {
- As you can see here, a copy of every outgoing packet is also delivered to the taps on ptype_all. A packet socket created with protocol ETH_P_ALL registers an entry on ptype_all, so such sockets see both sent and received traffic:
-         if (!list_empty(&ptype_all))
-             dev_queue_xmit_nit(skb, dev);
-         if (netif_needs_gso(dev, skb)) {
-             if (unlikely(dev_gso_segment(skb)))
-                 goto out_kfree_skb;
-             if (skb->next)
-                 goto gso;
-         }
- If the sending device does not need skb->dst, it is released here:
-         if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
-             skb_dst_drop(skb);
- Call the transmit routine the driver registered, i.e. dev->netdev_ops->ndo_start_xmit(skb, dev):
-         rc = ops->ndo_start_xmit(skb, dev);
-         if (rc == NETDEV_TX_OK)
-             txq_trans_update(txq);
-         return rc;
-     }
- gso:
-     ......
- }
dev_queue_xmit_nit
- static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
- {
-     struct packet_type *ptype;
- #ifdef CONFIG_NET_CLS_ACT
-     if (!(skb->tstamp.tv64 && (G_TC_FROM(skb->tc_verd) & AT_INGRESS)))
-         net_timestamp(skb);  /* record the packet's timestamp */
- #else
-     net_timestamp(skb);
- #endif
-     rcu_read_lock();
-     list_for_each_entry_rcu(ptype, &ptype_all, list) {
-         /* Never send packets back to the socket
-          * they originated from */
- Walk the ptype_all list, find every packet socket that matches, and hand each of them a copy of the packet:
-         if ((ptype->dev == dev || !ptype->dev) &&
-             (ptype->af_packet_priv == NULL ||
-              (struct sock *)ptype->af_packet_priv != skb->sk)) {
- Since the packet delivered to the packet socket is an extra copy, it has to be cloned:
-             struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);
-             if (!skb2)
-                 break;
-             /* skb->nh should be correctly
-              * set by sender, so that the second statement is
-              * just protection against buggy protocols.
-              */
-             skb_reset_mac_header(skb2);  /* ensure a correct header offset */
-             if (skb_network_header(skb2) < skb2->data ||
-                 skb2->network_header > skb2->tail) {
-                 if (net_ratelimit())  /* net_ratelimit limits how often network code may printk */
-                     printk(KERN_CRIT "protocol %04x is "
-                            "buggy, dev %s\n",
-                            skb2->protocol, dev->name);
-                 skb_reset_network_header(skb2);  /* reset the L3 header offset */
-             }
-             skb2->transport_header = skb2->network_header;
-             skb2->pkt_type = PACKET_OUTGOING;
-             ptype->func(skb2, skb->dev, ptype, skb->dev);  /* invoke the ptype_all receive handler */
-         }
-     }
-     rcu_read_unlock();
- }
The loopback device
For the loopback device, the device's ops->ndo_start_xmit is initialized to the loopback_xmit function.
- static const struct net_device_ops loopback_ops = {
-     .ndo_init       = loopback_dev_init,
-     .ndo_start_xmit = loopback_xmit,
-     .ndo_get_stats  = loopback_get_stats,
- };
drivers/net/loopback.c:
- static netdev_tx_t loopback_xmit(struct sk_buff *skb,
-                                  struct net_device *dev)
- {
-     struct pcpu_lstats *pcpu_lstats, *lb_stats;
-     int len;
-     skb_orphan(skb);
-     skb->protocol = eth_type_trans(skb, dev);
-     /* it's OK to use per_cpu_ptr() because BHs are off */
-     pcpu_lstats = dev->ml_priv;
-     lb_stats = per_cpu_ptr(pcpu_lstats, smp_processor_id());
-     len = skb->len;
-     if (likely(netif_rx(skb) == NET_RX_SUCCESS)) {  /* hand straight to netif_rx for receive processing */
-         lb_stats->bytes += len;
-         lb_stats->packets++;
-     } else
-         lb_stats->drops++;
-     return NETDEV_TX_OK;
- }
1. CHECKSUM_PARTIAL means hardware checksumming is in use: the L4 code has already computed the pseudo-header checksum and stored it in uh->check, so the device only has to compute the checksum over the entire L4 segment (header plus payload).
2. The packet-send logic involves three kinds of mutually exclusive access:
(1) spinlock_t *root_lock = qdisc_lock(q);
(2) test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)
(3) __netif_tx_lock, i.e. spin_lock(&txq->_xmit_lock)
(1) and (3) are spinlocks; (2) is a queue state bit. To understand how these three synchronization mechanisms are used in the code, first look at the relationships between the relevant data structures, shown below.
The green parts of the figure are the two spinlocks, (1) and (3). First, the code behind (1):
- static inline spinlock_t *qdisc_lock(struct Qdisc *qdisc)
- {
-     return &qdisc->q.lock;
- }
So root_lock is the lock that serializes access to the skb queue inside the qdisc; it must be held whenever the queue is manipulated (enqueue, dequeue, requeue).
The __QDISC_STATE_RUNNING flag ensures that a qdisc is never processed by multiple CPUs at the same time.
The spinlock in (3), _xmit_lock in struct netdev_queue, guarantees mutually exclusive access to the device's registered transmit function, i.e. it synchronizes the driver.
Also, the kernel code comments state that (1) and (3) are mutually exclusive: to take the lock in (1) you must first release the lock in (3), and vice versa. I have not figured out why; if anyone knows, please explain.
3. Given that dev_queue_xmit already exists, why is a softirq also needed for sending?
As we have seen, dev_queue_xmit finishes processing the skb (merging fragments into one buffer, computing checksums, and so on), and the processed skb can then be sent directly. dev_queue_xmit first enqueues the skb (this is normally where the skb enters the queue) and calls qdisc_run to attempt a send; the send may fail, in which case the skb is requeued, the TX softirq is scheduled, and dev_queue_xmit simply returns.
The softirq only sends the skbs still sitting in the queue and frees skbs whose transmission has completed; it never needs to linearize or checksum an skb again. Also, if the queue is stopped, dev_queue_xmit can still add packets to the queue but cannot send them, so when the queue is woken up again the softirq is needed to send the packets that piled up while the queue was stopped. In short, dev_queue_xmit does the final processing of the skb and makes the first transmit attempt, while the softirq transmits whatever dev_queue_xmit failed to send or never sent. (Freeing completed skbs in the softirq matters too: transmit completion is often handled in a hardware interrupt, and to keep the hardirq handler fast the kernel offers a way to defer the free to a softirq. The driver just calls dev_kfree_skb_irq, which puts the skb on softnet_data's completion_queue and raises the TX softirq; net_tx_action then frees the skbs on completion_queue in softirq context.)