Analysis of the packet transmission path at layer 2 (the link layer)
--lvyilong316
Description: The kernel version covered in this series of posts is 2.6.32
When the upper layer has a packet ready, it hands it down to the link layer, where transmission is handled primarily by the dev_queue_xmit function. A packet can go out along two paths: the normal path, straight through the network card driver, or via the TX softirq (see note 3). For ease of understanding, first look at the overall call graph of dev_queue_xmit.
dev_queue_xmit
This function queues the skb to be sent on a device queue. The skb's device and priority must be set before calling it, and it may be called from interrupt context.
Return value:
A non-zero return (positive or negative) indicates an error; 0 indicates success, but success does not mean the packet was actually transmitted, because it may still be dropped for reasons such as rate limiting.
The skb passed in is consumed when the function returns, so if you want to keep control of the packet (for example to implement retransmission) you must take an extra reference on the skb first.
Interrupts must be enabled when this function is called, because re-enabling bottom halves requires interrupts to be enabled; otherwise a deadlock can occur.
- int dev_queue_xmit(struct sk_buff *skb)
- {
-     struct net_device *dev = skb->dev;
-     struct netdev_queue *txq;
-     struct Qdisc *q;
-     int rc = -ENOMEM;
-     /* GSO will handle the following emulations directly. */
-     if (netif_needs_gso(dev, skb))
-         goto gso;
-     if (skb_has_frags(skb) &&
-         !(dev->features & NETIF_F_FRAGLIST) &&
-         __skb_linearize(skb))
-         goto out_kfree_skb;
- If the skb is fragmented but the sending device does not support fragment lists, or fragments sit in high memory and the device cannot DMA there, everything must be merged into a single segment. __skb_linearize is really __pskb_pull_tail(skb, skb->data_len), which works much like pskb_may_pull: it checks whether the skb's main buffer has enough room to pull in len bytes and, if not, reallocates the skb and copies the data from the frags into the newly allocated main buffer. Since len is skb->data_len here, all of the data ends up copied into the main buffer, i.e. the skb is linearized.
-     if (skb_shinfo(skb)->nr_frags &&
-         (!(dev->features & NETIF_F_SG) || illegal_highdma(dev, skb)) &&
-         __skb_linearize(skb))
-         goto out_kfree_skb;
- If the packet's checksum has not yet been computed and the sending device does not support checksumming for this protocol, the checksum is computed here (see note 1). If the skb was already linearized above, __skb_linearize returns immediately. Note the difference between frags and frag_list: the former places extra data on separately allocated pages while there is still only one sk_buff; the latter chains multiple sk_buffs together.
-     if (skb->ip_summed == CHECKSUM_PARTIAL) {
-         skb_set_transport_header(skb, skb->csum_start -
-                                       skb_headroom(skb));
-         if (!dev_can_checksum(dev, skb) && skb_checksum_help(skb))
-             goto out_kfree_skb;
-     }
- gso:
-     /* Disable softirqs and CPU preemption */
-     rcu_read_lock_bh();
- Select a transmit queue. If the device provides a select_queue callback, it is used; otherwise the kernel picks a queue. This is only the kernel side of multiqueue support; to actually use multiple queues the NIC itself must support them, and most NICs have only one queue. The number of queues is set when alloc_etherdev allocates the net_device.
-     txq = dev_pick_tx(dev, skb);
- Get the device's qdisc from the netdev_queue structure:
-     q = rcu_dereference(txq->qdisc);
- If the device has a queue, call __dev_xmit_skb:
-     if (q->enqueue) {
-         rc = __dev_xmit_skb(skb, q, dev, txq);
-         goto out;
-     }
- The code below handles devices without a transmit queue. Software devices such as lo and tunnel devices generally have none: all we can do is call the driver's hard_start_xmit and send the packet; if the send fails, the packet is dropped, because there is no queue to hold it.
-     if (dev->flags & IFF_UP) {  /* is the device up? */
-         int cpu = smp_processor_id(); /* ok because BHs are off */
-         if (txq->xmit_lock_owner != cpu) {  /* lock not already held on this CPU */
-             HARD_TX_LOCK(dev, txq, cpu);
-             if (!netif_tx_queue_stopped(txq)) {  /* the queue is in the running state */
-                 rc = NET_XMIT_SUCCESS;
-                 if (!dev_hard_start_xmit(skb, dev, txq)) {
-                     HARD_TX_UNLOCK(dev, txq);
-                     goto out;
-                 }
-             }
-             HARD_TX_UNLOCK(dev, txq);
-             if (net_ratelimit())
-                 printk(KERN_CRIT "Virtual device %s asks to "
-                        "queue packet!\n", dev->name);
-         } else {  /* txq->xmit_lock_owner == cpu: recursion has occurred */
-             if (net_ratelimit())
-                 printk(KERN_CRIT "Dead loop on virtual device "
-                        "%s, fix it urgently!\n", dev->name);
-         }
-     }
-     rc = -ENETDOWN;
-     rcu_read_unlock_bh();
- out_kfree_skb:
-     kfree_skb(skb);
-     return rc;
- out:
-     rcu_read_unlock_bh();
-     return rc;
- }
__dev_xmit_skb
The __dev_xmit_skb function mainly does two things:
(1) If the traffic-control queue (qdisc) is empty, try to send the packet directly.
(2) If it is not empty, enqueue the packet on the qdisc and run the qdisc.
- static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
-                                  struct net_device *dev,
-                                  struct netdev_queue *txq)
- {
-     spinlock_t *root_lock = qdisc_lock(q);  /* see note 2 */
-     int rc;
-     spin_lock(root_lock);  /* lock the qdisc */
-     if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {  /* is the queue deactivated? */
-         kfree_skb(skb);
-         rc = NET_XMIT_DROP;
-     } else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) &&
-                !test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)) {
-         /*
-          * This is a work-conserving queue; there are no old skbs
-          * waiting to be sent out; and the qdisc is not running -
-          * xmit the skb directly.
-          */
-         __qdisc_update_bstats(q, skb->len);
-         if (sch_direct_xmit(skb, q, dev, txq, root_lock))
-             __qdisc_run(q);
-         else
-             clear_bit(__QDISC_STATE_RUNNING, &q->state);
-         rc = NET_XMIT_SUCCESS;
-     } else {
-         rc = qdisc_enqueue_root(skb, q);
-         qdisc_run(q);
-     }
-     spin_unlock(root_lock);
-     return rc;
- }
qdisc_run
qdisc_run() is called at two points:
1. __dev_xmit_skb()
2. the softirq service routine for NET_TX_SOFTIRQ
- static inline void qdisc_run(struct Qdisc *q)
- {
-     if (!test_and_set_bit(__QDISC_STATE_RUNNING, &q->state))  /* mark the queue as running */
-         __qdisc_run(q);
- }
__qdisc_run
- void __qdisc_run(struct Qdisc *q)
- {
-     unsigned long start_time = jiffies;
-     while (qdisc_restart(q)) {  /* a return value > 0 means the qdisc is not yet empty */
-         /*
-          * Postpone processing if:
-          * 1. another process needs the CPU;
-          * 2. we've been doing it for too long.
-          * If the queue has been running too long, stop and put this
-          * qdisc at the head of the output_queue list.
-          */
-         if (need_resched() || jiffies != start_time) {  /* no longer allowed to keep running this qdisc */
-             __netif_schedule(q);  /* link this qdisc into the per-CPU softnet_data output_queue */
-             break;
-         }
-     }
-     clear_bit(__QDISC_STATE_RUNNING, &q->state);  /* clear the queue's running flag */
- }
The loop above repeatedly calls qdisc_restart to send data. qdisc_restart is the function that actually transmits a packet: it takes one frame off the queue and tries to send it; if the send fails, the frame is normally requeued.
Return value: on a successful send it returns the remaining queue length; it returns 0 when the send fails (or when the send succeeds and the remaining queue length is 0).
qdisc_restart
The __QDISC_STATE_RUNNING flag guarantees that only one CPU processes this qdisc at a time, and qdisc_lock(q) serializes access to the queue itself.
Usually netif_tx_lock serializes (gives exclusive) access to the device driver, while qdisc_lock(q) serializes access to the qdisc. The two are mutually exclusive: before taking one, the other must be released.
- static inline int qdisc_restart(struct Qdisc *q)
- {
-     struct netdev_queue *txq;
-     struct net_device *dev;
-     spinlock_t *root_lock;
-     struct sk_buff *skb;
-     /* Dequeue packet */
-     skb = dequeue_skb(q);  /* take a packet off the head of the queue */
-     if (unlikely(!skb))
-         return 0;  /* 0 means the queue is empty or throttled */
-     root_lock = qdisc_lock(q);
-     dev = qdisc_dev(q);
-     txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));
-     return sch_direct_xmit(skb, q, dev, txq, root_lock);  /* actually send the packet */
- }
sch_direct_xmit
Sends one skb while the qdisc is in the __QDISC_STATE_RUNNING state, which guarantees that only one CPU runs this function at a time. Returns 0 if the queue is empty or sending is throttled, and a value greater than 0 if the queue is non-empty.
- int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
-                     struct net_device *dev, struct netdev_queue *txq,
-                     spinlock_t *root_lock)
- {
-     int ret = NETDEV_TX_BUSY;
-     spin_unlock(root_lock);  /* release the qdisc lock, since the device lock is taken next */
- HARD_TX_LOCK expands to __netif_tx_lock, i.e. spin_lock(&txq->_xmit_lock), which guarantees exclusive access to the device driver:
-     HARD_TX_LOCK(dev, txq, smp_processor_id());
-     if (!netif_tx_queue_stopped(txq) &&  /* the queue is neither stopped */
-         !netif_tx_queue_frozen(txq))     /* nor frozen */
-         ret = dev_hard_start_xmit(skb, dev, txq);  /* send the packet */
-     HARD_TX_UNLOCK(dev, txq);  /* calls __netif_tx_unlock */
-     spin_lock(root_lock);
-     switch (ret) {
-     case NETDEV_TX_OK:  /* the driver sent the packet successfully */
-         ret = qdisc_qlen(q);  /* return the remaining queue length */
-         break;
-     case NETDEV_TX_LOCKED:  /* taking the device lock failed */
-         ret = handle_dev_cpu_collision(skb, txq, q);
-         break;
-     default:  /* the device is busy; requeue and resend later (via softirq) */
-         if (unlikely(ret != NETDEV_TX_BUSY && net_ratelimit()))
-             printk(KERN_WARNING "BUG %s code %d qlen %d\n",
-                    dev->name, ret, q->q.qlen);
-         ret = dev_requeue_skb(skb, q);
-         break;
-     }
-     if (ret && (netif_tx_queue_stopped(txq) ||
-                 netif_tx_queue_frozen(txq)))
-         ret = 0;
-     return ret;
- }
dev_hard_start_xmit
- int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
-                         struct netdev_queue *txq)
- {
-     const struct net_device_ops *ops = dev->netdev_ops;
-     int rc;
-     if (likely(!skb->next)) {
- As you can see here, a copy of every outgoing packet is also delivered to the taps on ptype_all. A packet socket created with protocol ETH_P_ALL registers an entry on ptype_all, so such sockets see both sent and received traffic:
-         if (!list_empty(&ptype_all))
-             dev_queue_xmit_nit(skb, dev);
-         if (netif_needs_gso(dev, skb)) {
-             if (unlikely(dev_gso_segment(skb)))
-                 goto out_kfree_skb;
-             if (skb->next)
-                 goto gso;
-         }
- If the sending device does not need skb->dst, it is released here:
-         if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
-             skb_dst_drop(skb);
- Call the transmit routine the driver registered, i.e. dev->netdev_ops->ndo_start_xmit(skb, dev):
-         rc = ops->ndo_start_xmit(skb, dev);
-         if (rc == NETDEV_TX_OK)
-             txq_trans_update(txq);
-         return rc;
-     }
- gso:
-     ......
- }
dev_queue_xmit_nit
- static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
- {
-     struct packet_type *ptype;
- #ifdef CONFIG_NET_CLS_ACT
-     if (!(skb->tstamp.tv64 && (G_TC_FROM(skb->tc_verd) & AT_INGRESS)))
-         net_timestamp(skb);  /* record the packet's timestamp */
- #else
-     net_timestamp(skb);
- #endif
-     rcu_read_lock();
-     list_for_each_entry_rcu(ptype, &ptype_all, list) {
-         /* Never send packets back to the socket
-          * they originated from */
- Walk the ptype_all list, find every packet socket that matches, and hand each of them a copy of the packet:
-         if ((ptype->dev == dev || !ptype->dev) &&
-             (ptype->af_packet_priv == NULL ||
-              (struct sock *)ptype->af_packet_priv != skb->sk)) {
- Since the packet delivered to the packet socket is an extra copy, it has to be cloned:
-             struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);
-             if (!skb2)
-                 break;
-             /* skb->nh should be correctly
-              * set by sender, so that the second statement is
-              * just protection against buggy protocols.
-              */
-             skb_reset_mac_header(skb2);  /* ensure a correct header offset */
-             if (skb_network_header(skb2) < skb2->data ||
-                 skb2->network_header > skb2->tail) {
-                 if (net_ratelimit())  /* net_ratelimit limits how often network code may printk */
-                     printk(KERN_CRIT "protocol %04x is "
-                            "buggy, dev %s\n",
-                            skb2->protocol, dev->name);
-                 skb_reset_network_header(skb2);  /* reset the L3 header offset */
-             }
-             skb2->transport_header = skb2->network_header;
-             skb2->pkt_type = PACKET_OUTGOING;
-             ptype->func(skb2, skb->dev, ptype, skb->dev);  /* invoke the ptype_all receive handler */
-         }
-     }
-     rcu_read_unlock();
- }
The loopback device
For the loopback device, the device's ops->ndo_start_xmit is initialized to the loopback_xmit function.
- static const struct net_device_ops loopback_ops = {
-     .ndo_init       = loopback_dev_init,
-     .ndo_start_xmit = loopback_xmit,
-     .ndo_get_stats  = loopback_get_stats,
- };
drivers/net/loopback.c:
- static netdev_tx_t loopback_xmit(struct sk_buff *skb,
-                                  struct net_device *dev)
- {
-     struct pcpu_lstats *pcpu_lstats, *lb_stats;
-     int len;
-     skb_orphan(skb);
-     skb->protocol = eth_type_trans(skb, dev);
-     /* it's OK to use per_cpu_ptr() because BHs are off */
-     pcpu_lstats = dev->ml_priv;
-     lb_stats = per_cpu_ptr(pcpu_lstats, smp_processor_id());
-     len = skb->len;
-     if (likely(netif_rx(skb) == NET_RX_SUCCESS)) {  /* hand straight to netif_rx for receive processing */
-         lb_stats->bytes += len;
-         lb_stats->packets++;
-     } else
-         lb_stats->drops++;
-     return NETDEV_TX_OK;
- }
1. CHECKSUM_PARTIAL means hardware checksumming is in use: the L4 code has already computed the pseudo-header checksum and stored it in uh->check, so the device only has to compute the checksum over the entire L4 segment (header plus payload).
2. The packet-send logic involves three kinds of mutually exclusive access:
(1) spinlock_t *root_lock = qdisc_lock(q);
(2) test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)
(3) __netif_tx_lock, i.e. spin_lock(&txq->_xmit_lock)
(1) and (3) are spinlocks; (2) is a queue state bit. To understand how these three synchronization mechanisms are used in the code, first look at the relationships between the relevant data structures, shown below.
The green parts of the figure are the two spinlocks, (1) and (3). First, the code behind (1):
- static inline spinlock_t *qdisc_lock(struct Qdisc *qdisc)
- {
-     return &qdisc->q.lock;
- }
So root_lock is the lock that serializes access to the skb queue inside the qdisc; it must be held whenever the queue is manipulated (enqueue, dequeue, requeue).
The __QDISC_STATE_RUNNING flag ensures that a qdisc is never processed by multiple CPUs at the same time.
The spinlock in (3), _xmit_lock in struct netdev_queue, guarantees mutually exclusive access to the device's registered transmit function, i.e. it synchronizes the driver.
Also, the kernel code comments state that (1) and (3) are mutually exclusive: to take the lock in (1) you must first release the lock in (3), and vice versa. I have not figured out why; if anyone knows, please explain.
3. Given that dev_queue_xmit already exists, why is a softirq also needed for sending?
As we have seen, dev_queue_xmit finishes processing the skb (merging fragments into one buffer, computing checksums, and so on), and the processed skb can then be sent directly. dev_queue_xmit first enqueues the skb (this is normally where the skb enters the queue) and calls qdisc_run to attempt a send; the send may fail, in which case the skb is requeued, the TX softirq is scheduled, and dev_queue_xmit simply returns.
The softirq only sends the skbs still sitting in the queue and frees skbs whose transmission has completed; it never needs to linearize or checksum an skb again. Also, if the queue is stopped, dev_queue_xmit can still add packets to the queue but cannot send them, so when the queue is woken up again the softirq is needed to send the packets that piled up while the queue was stopped. In short, dev_queue_xmit does the final processing of the skb and makes the first transmit attempt, while the softirq transmits whatever dev_queue_xmit failed to send or never sent. (Freeing completed skbs in the softirq matters too: transmit completion is often handled in a hardware interrupt, and to keep the hardirq handler fast the kernel offers a way to defer the free to a softirq. The driver just calls dev_kfree_skb_irq, which puts the skb on softnet_data's completion_queue and raises the TX softirq; net_tx_action then frees the skbs on completion_queue in softirq context.)