Analysis of the packet sending process at layer two (the link layer)



--lvyilong316

Note: the kernel version covered in this series of posts is 2.6.32.
When the upper layers have a packet ready, it is handed down to the link layer, where transmission is handled mainly by the dev_queue_xmit function. A packet can be sent along two paths: the normal path, which goes straight through the network card driver, and the soft-interrupt path (see note 3). For ease of understanding, first look at the overall call graph of the dev_queue_xmit function.

dev_queue_xmit

This function queues an skb on a device for transmission. The caller must have set skb->dev and skb->priority before calling it; the function can be called from interrupt context.

Return value:

A non-zero return (positive or negative) indicates an error; 0 indicates success. Success does not guarantee that the packet was actually transmitted, because it may still be dropped, for example by traffic shaping.

The skb passed in is consumed (freed) by the time the function returns, so a caller that wants to retransmit the skb must take an extra reference on it beforehand.

Interrupts must be enabled when this function is called, because the BH-enable code requires IRQs to be enabled; otherwise a deadlock can occur.
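For reference, here is a minimal caller-side sketch (the example_xmit function and out_dev parameter are hypothetical, not from the kernel or the original post); it only shows the fields dev_queue_xmit() expects to be set:

  /* Hypothetical caller sketch: the skb is assumed to already hold a
   * complete frame; only the setup relevant to dev_queue_xmit() is shown.
   */
  static int example_xmit(struct sk_buff *skb, struct net_device *out_dev)
  {
      skb->dev = out_dev;    /* the output device must be set by the caller */
      skb->priority = 0;     /* priority, used by some qdiscs for classification */

      /* dev_queue_xmit() consumes the skb; to be able to retry on error,
       * take an extra reference (skb_get) before calling it.
       */
      return dev_queue_xmit(skb);
  }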

 
 
  int dev_queue_xmit(struct sk_buff *skb)
  {
      struct net_device *dev = skb->dev;
      struct netdev_queue *txq;
      struct Qdisc *q;
      int rc = -ENOMEM;

      /* GSO will handle the following emulations directly. */
      if (netif_needs_gso(dev, skb))
          goto gso;

      if (skb_has_frags(skb) &&
          !(dev->features & NETIF_F_FRAGLIST) &&
          __skb_linearize(skb))
          goto out_kfree_skb;

      /* If the skb has fragments but the device cannot handle scatter/gather,
       * or some fragments lie in high memory and the device cannot DMA from
       * there, all the data has to be merged into a single linear buffer.
       * __skb_linearize() is in fact __pskb_pull_tail(skb, skb->data_len);
       * like pskb_may_pull(), it checks whether the main buffer has room for
       * len bytes and, if not, reallocates it and copies the data from the
       * frags into the new main buffer. With len set to skb->data_len, all
       * the data ends up in the main buffer, i.e. the skb is linearized.
       */
      if (skb_shinfo(skb)->nr_frags &&
          (!(dev->features & NETIF_F_SG) || illegal_highdma(dev, skb)) &&
          __skb_linearize(skb))
          goto out_kfree_skb;

      /* If the packet's checksum has not been computed yet and the device
       * cannot checksum this protocol, compute it here (see note 1). If the
       * skb was already linearized above, __skb_linearize() simply returns.
       * Note the difference between frags and frag_list: the former keeps
       * extra data in separately allocated pages of one sk_buff, the latter
       * chains several sk_buffs together.
       */
      if (skb->ip_summed == CHECKSUM_PARTIAL) {
          skb_set_transport_header(skb, skb->csum_start -
                                        skb_headroom(skb));
          if (!dev_can_checksum(dev, skb) && skb_checksum_help(skb))
              goto out_kfree_skb;
      }

  gso:
      /* Disable soft interrupts and CPU preemption. */
      rcu_read_lock_bh();

      /* Pick a transmit queue. If the driver provides a select_queue
       * callback it is used, otherwise the kernel picks one. This is only
       * the kernel side of multi-queue support; to really make use of it
       * the NIC must be multi-queue capable (most NICs have a single
       * queue). The number of queues is fixed when alloc_etherdev()
       * allocates the net_device.
       */
      txq = dev_pick_tx(dev, skb);

      /* Get the qdisc attached to this netdev_queue. */
      q = rcu_dereference(txq->qdisc);

      /* If the device has a queue, hand the skb to __dev_xmit_skb(). */
      if (q->enqueue) {
          rc = __dev_xmit_skb(skb, q, dev, txq);
          goto out;
      }

      /* What follows handles devices without a transmit queue. Software
       * devices (lo, tunnels, ...) usually have none: all we can do is call
       * the driver's hard_start_xmit directly and, if that fails, drop the
       * packet, since there is no queue to park it in.
       */
      if (dev->flags & IFF_UP) {            /* is the device up? */
          int cpu = smp_processor_id();     /* ok because BHs are off */

          if (txq->xmit_lock_owner != cpu) {    /* lock not already held by this CPU */
              HARD_TX_LOCK(dev, txq, cpu);

              if (!netif_tx_queue_stopped(txq)) {   /* queue is running */
                  rc = NET_XMIT_SUCCESS;
                  if (!dev_hard_start_xmit(skb, dev, txq)) {
                      HARD_TX_UNLOCK(dev, txq);
                      goto out;
                  }
              }
              HARD_TX_UNLOCK(dev, txq);
              if (net_ratelimit())
                  printk(KERN_CRIT "Virtual device %s asks to "
                         "queue packet!\n", dev->name);
          } else {
              /* txq->xmit_lock_owner == cpu: recursion has occurred */
              if (net_ratelimit())
                  printk(KERN_CRIT "Dead loop on virtual device "
                         "%s, fix it urgently!\n", dev->name);
          }
      }

      rc = -ENETDOWN;
      rcu_read_unlock_bh();

  out_kfree_skb:
      kfree_skb(skb);
      return rc;
  out:
      rcu_read_unlock_bh();
      return rc;
  }

__dev_xmit_skb

The __dev_xmit_skb function mainly does two things:

(1) If the flow-control object (qdisc) is empty, it tries to send the packet directly.

(2) If the flow-control object is not empty, it enqueues the packet on the qdisc and then runs the qdisc.

 
 
  static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
                                   struct net_device *dev,
                                   struct netdev_queue *txq)
  {
      spinlock_t *root_lock = qdisc_lock(q);    /* see note 2 */
      int rc;

      spin_lock(root_lock);                     /* lock the qdisc */
      if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
          /* the qdisc has been deactivated: drop the packet */
          kfree_skb(skb);
          rc = NET_XMIT_DROP;
      } else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) &&
                 !test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)) {
          /*
           * This is a work-conserving queue; there are no old skbs
           * waiting to be sent out; and the qdisc is not running -
           * xmit the skb directly.
           */
          __qdisc_update_bstats(q, skb->len);
          if (sch_direct_xmit(skb, q, dev, txq, root_lock))
              __qdisc_run(q);
          else
              clear_bit(__QDISC_STATE_RUNNING, &q->state);

          rc = NET_XMIT_SUCCESS;
      } else {
          rc = qdisc_enqueue_root(skb, q);    /* enqueue on the qdisc (sketched below) */
          qdisc_run(q);
      }
      spin_unlock(root_lock);

      return rc;
  }
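For completeness, the enqueue path taken in the last branch above looks roughly as follows (a sketch abridged from include/net/sch_generic.h in 2.6.32; details may differ): qdisc_enqueue_root records the packet length in the skb control block and then calls the qdisc's own enqueue callback.

  static inline int qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch)
  {
  #ifdef CONFIG_NET_SCHED
      if (sch->stab)
          qdisc_calculate_pkt_len(skb, sch->stab);    /* size-table adjustment */
  #endif
      return sch->enqueue(skb, sch);    /* the qdisc's enqueue callback, e.g. pfifo_fast */
  }

  static inline int qdisc_enqueue_root(struct sk_buff *skb, struct Qdisc *sch)
  {
      qdisc_skb_cb(skb)->pkt_len = skb->len;    /* remember the packet length */
      return qdisc_enqueue(skb, sch) & NET_XMIT_MASK;
  }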

qdisc_run

qdisc_run() is called at two points:

1. __dev_xmit_skb()

2. the NET_TX_SOFTIRQ soft interrupt (its handler, net_tx_action)

 
  
  
  static inline void qdisc_run(struct Qdisc *q)
  {
      if (!test_and_set_bit(__QDISC_STATE_RUNNING, &q->state))    /* mark the qdisc as running */
          __qdisc_run(q);
  }

__qdisc_run

 
  
  
  void __qdisc_run(struct Qdisc *q)
  {
      unsigned long start_time = jiffies;

      while (qdisc_restart(q)) {    /* a return value > 0 means the queue is not yet empty */
          /*
           * Postpone processing if
           * 1. another process needs the CPU;
           * 2. we've been doing it for too long.
           */
          if (need_resched() || jiffies != start_time) {
              /* not allowed to keep running this qdisc: link it onto the head
               * of the per-CPU softnet_data output_queue list (see the sketch
               * below) and stop */
              __netif_schedule(q);
              break;
          }
      }

      /* clear the running flag of the qdisc */
      clear_bit(__QDISC_STATE_RUNNING, &q->state);
  }
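The __netif_schedule call used above hands the qdisc over to the transmit soft interrupt. A sketch of the 2.6.32 implementation (abridged) is shown below: the qdisc is linked onto the head of the per-CPU softnet_data output_queue list and NET_TX_SOFTIRQ is raised, so net_tx_action will resume sending later.

  void __netif_schedule(struct Qdisc *q)
  {
      if (!test_and_set_bit(__QDISC_STATE_SCHED, &q->state))
          __netif_reschedule(q);    /* schedule each qdisc at most once */
  }

  static inline void __netif_reschedule(struct Qdisc *q)
  {
      struct softnet_data *sd;
      unsigned long flags;

      local_irq_save(flags);
      sd = &__get_cpu_var(softnet_data);
      q->next_sched = sd->output_queue;        /* link at the head of output_queue */
      sd->output_queue = q;
      raise_softirq_irqoff(NET_TX_SOFTIRQ);    /* wake the transmit soft interrupt */
      local_irq_restore(flags);
  }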

__qdisc_run calls qdisc_restart in a loop to send the data. qdisc_restart is the function that actually transmits a packet: it takes one frame off the queue and tries to send it; if the send fails, the frame is normally requeued.

Its return value is the remaining queue length on a successful send, and 0 if the send failed (0 is also returned when the send succeeded but the remaining queue length is 0).

qdisc_restart

The __QDISC_STATE_RUNNING flag guarantees that only one CPU processes this qdisc at a time, and qdisc_lock(q) serializes access to its queue.

netif_tx_lock is usually what serializes (gives exclusive) access to the device driver, while qdisc_lock(q) serializes access to the qdisc; the two are mutually exclusive: whoever grabs one must have released the other.

 
  
  
  static inline int qdisc_restart(struct Qdisc *q)
  {
      struct netdev_queue *txq;
      struct net_device *dev;
      spinlock_t *root_lock;
      struct sk_buff *skb;

      /* Dequeue a packet from the head of the queue (see the sketch below) */
      skb = dequeue_skb(q);
      if (unlikely(!skb))
          return 0;    /* 0 means the queue is empty or transmission is throttled */

      root_lock = qdisc_lock(q);
      dev = qdisc_dev(q);
      txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));

      return sch_direct_xmit(skb, q, dev, txq, root_lock);    /* actually send the packet */
  }
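The dequeue_skb helper called above also explains when transmission counts as "throttled": a packet that was requeued earlier (kept in q->gso_skb) is only released again once its transmit queue is no longer stopped or frozen. A sketch, abridged from the 2.6.32 source:

  static inline struct sk_buff *dequeue_skb(struct Qdisc *q)
  {
      struct sk_buff *skb = q->gso_skb;    /* a packet requeued earlier, if any */

      if (unlikely(skb)) {
          /* check the reason for the requeue without taking the tx lock first */
          struct netdev_queue *txq;

          txq = netdev_get_tx_queue(qdisc_dev(q), skb_get_queue_mapping(skb));
          if (!netif_tx_queue_stopped(txq) && !netif_tx_queue_frozen(txq)) {
              q->gso_skb = NULL;    /* the queue woke up: release the packet */
              q->q.qlen--;
          } else
              skb = NULL;           /* still stopped or frozen: send nothing */
      } else
          skb = q->dequeue(q);      /* normal case: the qdisc's dequeue callback */

      return skb;
  }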

sch_direct_xmit

Sends one skb. The queue must be in the __QDISC_STATE_RUNNING state, which guarantees that only one CPU runs this function at a time. Returns 0 if the queue is empty or transmission is throttled, and a value greater than 0 if the queue is non-empty.

 
 
  int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
                      struct net_device *dev, struct netdev_queue *txq,
                      spinlock_t *root_lock)
  {
      int ret = NETDEV_TX_BUSY;

      /* Release the qdisc lock, because the driver lock is taken next. */
      spin_unlock(root_lock);

      /* HARD_TX_LOCK ends up in __netif_tx_lock -> spin_lock(&txq->_xmit_lock),
       * which guarantees exclusive access to the device driver. */
      HARD_TX_LOCK(dev, txq, smp_processor_id());
      if (!netif_tx_queue_stopped(txq) &&         /* queue not stopped ... */
          !netif_tx_queue_frozen(txq))            /* ... and not frozen */
          ret = dev_hard_start_xmit(skb, dev, txq);    /* send the packet */
      HARD_TX_UNLOCK(dev, txq);                   /* calls __netif_tx_unlock */

      spin_lock(root_lock);

      switch (ret) {
      case NETDEV_TX_OK:
          /* the driver sent the packet successfully */
          ret = qdisc_qlen(q);                    /* return the remaining queue length */
          break;

      case NETDEV_TX_LOCKED:
          /* taking the driver lock failed (sketched below) */
          ret = handle_dev_cpu_collision(skb, txq, q);
          break;

      default:
          /* the device is busy: requeue the skb and retry later (via softirq) */
          if (unlikely(ret != NETDEV_TX_BUSY && net_ratelimit()))
              printk(KERN_WARNING "BUG %s code %d qlen %d\n",
                     dev->name, ret, q->q.qlen);

          ret = dev_requeue_skb(skb, q);
          break;
      }

      if (ret && (netif_tx_queue_stopped(txq) ||
                  netif_tx_queue_frozen(txq)))
          ret = 0;

      return ret;
  }
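The NETDEV_TX_LOCKED branch calls handle_dev_cpu_collision, which distinguishes a real lock collision from a driver recursing on the same CPU. A sketch, abridged from the 2.6.32 source (treat the details as approximate):

  static inline int handle_dev_cpu_collision(struct sk_buff *skb,
                                             struct netdev_queue *dev_queue,
                                             struct Qdisc *q)
  {
      int ret;

      if (unlikely(dev_queue->xmit_lock_owner == smp_processor_id())) {
          /* The same CPU already holds the device lock: the driver has
           * recursed into the transmit path, so the packet is dropped. */
          kfree_skb(skb);
          if (net_ratelimit())
              printk(KERN_WARNING "Dead loop on netdevice %s, fix it urgently!\n",
                     dev_queue->dev->name);
          ret = qdisc_qlen(q);
      } else {
          /* Another CPU holds the driver lock: count the collision and
           * requeue the packet so it is retried later. */
          __get_cpu_var(netdev_rx_stat).cpu_collision++;
          ret = dev_requeue_skb(skb, q);
      }

      return ret;
  }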

dev_hard_start_xmit

 
 
  int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
                          struct netdev_queue *txq)
  {
      const struct net_device_ops *ops = dev->netdev_ops;
      int rc;

      if (likely(!skb->next)) {
          /* Every transmitted packet is also delivered to the handlers on
           * ptype_all. A packet socket created with protocol ETH_P_ALL
           * registers an entry on ptype_all, so such sockets see both
           * transmitted and received traffic (see the user-space sketch
           * after dev_queue_xmit_nit below).
           */
          if (!list_empty(&ptype_all))
              dev_queue_xmit_nit(skb, dev);

          if (netif_needs_gso(dev, skb)) {
              if (unlikely(dev_gso_segment(skb)))
                  goto out_kfree_skb;
              if (skb->next)
                  goto gso;
          }

          /* If the device does not need skb->dst, release it here. */
          if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
              skb_dst_drop(skb);

          /* Call the transmit routine registered by the driver, i.e.
           * dev->netdev_ops->ndo_start_xmit(skb, dev). */
          rc = ops->ndo_start_xmit(skb, dev);
          if (rc == NETDEV_TX_OK)
              txq_trans_update(txq);
          return rc;
      }

  gso:
      ......
  }

dev_queue_xmit_nit

 
 
  static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
  {
      struct packet_type *ptype;

  #ifdef CONFIG_NET_CLS_ACT
      if (!(skb->tstamp.tv64 && (G_TC_FROM(skb->tc_verd) & AT_INGRESS)))
          net_timestamp(skb);    /* record the packet's timestamp */
  #else
      net_timestamp(skb);
  #endif

      rcu_read_lock();
      /* Walk the ptype_all list and deliver a copy of the packet to every
       * registered handler (raw socket) that matches. */
      list_for_each_entry_rcu(ptype, &ptype_all, list) {
          /* Never send packets back to the socket they originated from. */
          if ((ptype->dev == dev || !ptype->dev) &&
              (ptype->af_packet_priv == NULL ||
               (struct sock *)ptype->af_packet_priv != skb->sk)) {
              /* The packet handed to the raw socket is an extra copy, so the
               * skb has to be cloned. */
              struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);
              if (!skb2)
                  break;

              /* skb->nh should be correctly set by the sender, so the second
               * statement is just protection against buggy protocols. */
              skb_reset_mac_header(skb2);

              if (skb_network_header(skb2) < skb2->data ||
                  skb2->network_header > skb2->tail) {
                  if (net_ratelimit())    /* limits the rate of printk in network code */
                      printk(KERN_CRIT "protocol %04x is "
                             "buggy, dev %s\n",
                             skb2->protocol, dev->name);
                  skb_reset_network_header(skb2);    /* reset the L3 header offset */
              }

              skb2->transport_header = skb2->network_header;
              skb2->pkt_type = PACKET_OUTGOING;
              /* invoke the protocol's (ptype_all) receive function */
              ptype->func(skb2, skb->dev, ptype, skb->dev);
          }
      }
      rcu_read_unlock();
  }
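To observe the ptype_all path from user space, the hypothetical sketch below opens a packet socket with protocol ETH_P_ALL; such a socket registers a handler on ptype_all, so dev_queue_xmit_nit delivers it a clone of every outgoing frame (marked PACKET_OUTGOING) in addition to received traffic. The program is illustrative only (minimal error handling, needs root/CAP_NET_RAW):

  /* Hypothetical user-space sketch: observe outgoing frames via ETH_P_ALL. */
  #include <stdio.h>
  #include <sys/types.h>
  #include <sys/socket.h>
  #include <arpa/inet.h>
  #include <linux/if_ether.h>
  #include <linux/if_packet.h>

  int main(void)
  {
      /* ETH_P_ALL => a packet_type entry is added to ptype_all in the kernel */
      int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
      unsigned char buf[2048];
      struct sockaddr_ll from;
      socklen_t fromlen;

      if (fd < 0) {
          perror("socket");
          return 1;
      }

      for (;;) {
          fromlen = sizeof(from);
          ssize_t n = recvfrom(fd, buf, sizeof(buf), 0,
                               (struct sockaddr *)&from, &fromlen);
          if (n < 0)
              break;
          /* frames copied by dev_queue_xmit_nit() arrive as PACKET_OUTGOING */
          if (from.sll_pkttype == PACKET_OUTGOING)
              printf("outgoing frame, %zd bytes on ifindex %d\n",
                     n, from.sll_ifindex);
      }
      return 0;
  }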

The loopback device

For the loopback device, ops->ndo_start_xmit is initialized to the loopback_xmit function.

 
  
  
  static const struct net_device_ops loopback_ops = {
      .ndo_init       = loopback_dev_init,
      .ndo_start_xmit = loopback_xmit,
      .ndo_get_stats  = loopback_get_stats,
  };

drivers/net/loopback.c

 
  
  
  static netdev_tx_t loopback_xmit(struct sk_buff *skb,
                                   struct net_device *dev)
  {
      struct pcpu_lstats *pcpu_lstats, *lb_stats;
      int len;

      skb_orphan(skb);

      skb->protocol = eth_type_trans(skb, dev);

      /* it's OK to use per_cpu_ptr() because BHs are off */
      pcpu_lstats = dev->ml_priv;
      lb_stats = per_cpu_ptr(pcpu_lstats, smp_processor_id());

      len = skb->len;
      /* hand the skb straight to the receive path via netif_rx() */
      if (likely(netif_rx(skb) == NET_RX_SUCCESS)) {
          lb_stats->bytes += len;
          lb_stats->packets++;
      } else
          lb_stats->drops++;

      return NETDEV_TX_OK;
  }


• Notes:

1. CHECKSUM_PARTIAL indicates that hardware checksumming is used: the L4 layer has already computed the pseudo-header checksum and stored it in the check field (uh->check for UDP), so the device only has to compute the checksum over the layer-4 header and payload.
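When the device cannot offload the checksum, skb_checksum_help finishes it in software: it checksums everything from csum_start to the end of the packet and folds the 16-bit result into the location csum_start + csum_offset, completing the value that L4 left there. A simplified sketch of the core of the 2.6.32 function (cloning, linearity checks and error handling omitted):

  int skb_checksum_help(struct sk_buff *skb)
  {
      __wsum csum;
      int offset;

      /* offset of the L4 header (where checksumming starts) in the linear data */
      offset = skb->csum_start - skb_headroom(skb);

      /* checksum from the L4 header to the end of the packet ... */
      csum = skb_checksum(skb, offset, skb->len - offset, 0);

      /* ... and store the folded 16-bit result at csum_start + csum_offset,
       * e.g. into udphdr->check or tcphdr->check */
      *(__sum16 *)(skb->data + offset + skb->csum_offset) = csum_fold(csum);

      skb->ip_summed = CHECKSUM_NONE;
      return 0;
  }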


2. The packet sending path involves three mechanisms for mutually exclusive access:

(1) spinlock_t *root_lock = qdisc_lock(q);

(2) test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)

(3) __netif_tx_lock, i.e. spin_lock(&txq->_xmit_lock)

(1) and (3) are spinlocks, while (2) is a bit in the qdisc state. To understand how the three are used in the code, first look at the relationships between the relevant data structures, shown in the figure below.

The green parts of the figure are the two spinlocks (1) and (3). First, look at the code behind (1):

 
   
   
  static inline spinlock_t *qdisc_lock(struct Qdisc *qdisc)
  {
      return &qdisc->q.lock;
  }

So root_lock is the lock that protects access to the skb queue inside the qdisc; it must be held whenever that queue is enqueued, dequeued or requeued.

The __QDISC_STATE_RUNNING flag ensures that a flow-control object (qdisc) is not processed by several CPUs at the same time.

The spinlock in (3), _xmit_lock in struct netdev_queue, guarantees mutually exclusive access to the transmit routine registered by the device, i.e. driver synchronization.

In addition, the kernel comments say that (1) and (3) are mutually exclusive: to take the lock in (1), the lock in (3) must be free, and vice versa. I have not yet figured out why; if anyone knows, please share.

3. Since dev_queue_xmit already exists, why is a soft interrupt also needed for transmission?

As we have seen, dev_queue_xmit performs the final processing of the skb (merging fragments, computing the checksum, and so on); after that the skb can be sent directly. dev_queue_xmit also enqueues the skb first (this is normally where the skb joins the queue) and calls qdisc_run to attempt transmission, but the attempt may fail, in which case the skb is requeued, the transmit soft interrupt is scheduled, and dev_queue_xmit simply returns.

The soft interrupt then only has to send the skbs already sitting in the queue and free the skbs that have been transmitted; it never needs to linearize or checksum an skb. In addition, if the queue has been stopped, dev_queue_xmit can still add packets to the queue but cannot send them, so when the queue is woken up again it is the soft interrupt that transmits the backlog accumulated while it was stopped. In short, dev_queue_xmit gives the skb its final processing and makes the first transmission attempt, while the soft interrupt transmits the packets that failed earlier or could not be sent. (The transmit soft interrupt actually has a second job: freeing skbs that have already been sent. In some cases transmit completion is handled in a hardware interrupt, and to keep that handler efficient the kernel offers a way to free the skb in a soft interrupt instead: dev_kfree_skb_irq places the skb on the softnet_data completion_queue and raises the transmit soft interrupt, and net_tx_action then frees the skbs on completion_queue.)
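The transmit soft interrupt handler, net_tx_action, does exactly these two jobs: it frees the skbs queued on the per-CPU completion_queue by dev_kfree_skb_irq, and it runs the qdiscs that __netif_schedule linked onto output_queue. A sketch, abridged from the 2.6.32 source:

  static void net_tx_action(struct softirq_action *h)
  {
      struct softnet_data *sd = &__get_cpu_var(softnet_data);

      /* Part 1: free the skbs queued by dev_kfree_skb_irq(). */
      if (sd->completion_queue) {
          struct sk_buff *clist;

          local_irq_disable();
          clist = sd->completion_queue;
          sd->completion_queue = NULL;
          local_irq_enable();

          while (clist) {
              struct sk_buff *skb = clist;
              clist = clist->next;
              __kfree_skb(skb);
          }
      }

      /* Part 2: run the qdiscs scheduled by __netif_schedule(). */
      if (sd->output_queue) {
          struct Qdisc *head;

          local_irq_disable();
          head = sd->output_queue;
          sd->output_queue = NULL;
          local_irq_enable();

          while (head) {
              struct Qdisc *q = head;
              spinlock_t *root_lock = qdisc_lock(q);

              head = head->next_sched;

              if (spin_trylock(root_lock)) {
                  clear_bit(__QDISC_STATE_SCHED, &q->state);
                  qdisc_run(q);    /* resume sending from this qdisc */
                  spin_unlock(root_lock);
              } else {
                  /* could not get the qdisc lock: reschedule unless deactivated */
                  if (!test_bit(__QDISC_STATE_DEACTIVATED, &q->state))
                      __netif_reschedule(q);
                  else
                      clear_bit(__QDISC_STATE_SCHED, &q->state);
              }
          }
      }
  }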

