Alan Cox: the prev pointer in a one-way linked list


In an earlier post I showed an example of using a pointer to a pointer on a singly linked list, to illustrate how flexible C pointers are. Here is another example of pointer manipulation on a linked list, with a completely different usage.

This example comes from the network protocol stack of linux-1.2.13, specifically its implementation of linked-list traversal and data copying. The source file is /net/inet/dev.c. The kernel source can be downloaded from the official website.

Since its earliest versions (0.96c), Linux networking has been built on the TCP/IP protocol family, the most widely used set of network protocols. Described in terms of the classic seven-layer OSI model, dev.c is a link-layer implementation. Functionally, it sits between the network device drivers and the network-layer protocol modules and acts as the packet channel between the two. It exposes one interface function to each side: netif_rx for the driver layer and net_bh for the network layer. The former is called from the driver's interrupt routine to hand encapsulated link-layer data frames to the stack. The latter runs as the bottom half of that interrupt routine; it parses the data frames and passes them to the upper layers.

For ease of understanding, here is some background on how network communication works and on the Linux driver interrupt mechanism. Starting at the physical layer: when hosts and routers communicate, the data travels over the physical medium (coaxial cable, optical fiber, and so on) as signal levels. The hardware interface of the host or router (the network interface card, NIC) sends and receives these signals. When a signal arrives at the interface, the card's built-in modem converts it into binary data that can be held in the card's hardware buffer. The card's device driver then fetches the data from the hardware buffer by way of a hardware interrupt. A driver is the operating system module that deals directly with a hardware device; the hardware interrupt is enabled by setting a control register during initialization, and it notifies the driver that new data has arrived in the hardware buffer.

In Linux, the NIC driver's interrupt service routine (ISR) copies the data from the hardware buffer into a kernel buffer and packages it as a link-layer frame, which then has to be parsed and distributed to the protocol layers. The ISR context is atomic and runs with interrupts masked, and the whole job is lengthy, so doing all of it inside the ISR would hurt the real-time response to other interrupts. Linux therefore provides a bottom-half (soft interrupt) mechanism that splits the work into two parts. The top half runs with all interrupts masked and handles only urgent, time-critical work, such as copying the hardware buffer and packaging the frame. The bottom half runs with interrupts enabled (although its code must not be re-entered) and handles time-consuming, non-urgent work, including parsing and distributing data frames. The net_bh function discussed below belongs to the bottom half.
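The split between the two halves can be illustrated with a minimal user-space sketch. This is not kernel code: the names `top_half_rx`, `bottom_half_run`, `BACKLOG_MAX` and `FRAME_MAX` are hypothetical, and real interrupt masking is not modeled. It only shows the division of labor, where the top half does a quick copy and enqueue and the bottom half later drains the queue and does the slow work.

```c
#include <stddef.h>
#include <string.h>

#define BACKLOG_MAX 8    /* hypothetical queue depth */
#define FRAME_MAX   64   /* hypothetical maximum frame size */

struct frame {
    size_t len;
    unsigned char data[FRAME_MAX];
};

/* Ring buffer shared by the two halves (plays the role of a backlog queue). */
static struct frame backlog[BACKLOG_MAX];
static size_t head, tail, count;
static size_t bytes_processed;

/* Top half: stands in for the work done inside the ISR.
 * It only copies the data out of the "hardware buffer" and queues it.
 * Returns 0 on success, -1 if the frame has to be dropped. */
int top_half_rx(const unsigned char *hw_buf, size_t len)
{
    if (len > FRAME_MAX || count == BACKLOG_MAX)
        return -1;
    backlog[tail].len = len;
    memcpy(backlog[tail].data, hw_buf, len);
    tail = (tail + 1) % BACKLOG_MAX;
    count++;
    return 0;
}

/* Bottom half: runs later, outside the ISR, and drains the queue.
 * Adding up the lengths stands in for parsing and dispatching.
 * Returns the number of frames handled in this run. */
size_t bottom_half_run(void)
{
    size_t handled = 0;
    while (count > 0) {
        bytes_processed += backlog[head].len;
        head = (head + 1) % BACKLOG_MAX;
        count--;
        handled++;
    }
    return handled;
}

/* Total number of bytes the bottom half has processed so far. */
size_t total_bytes_processed(void)
{
    return bytes_processed;
}
```

In this sketch, several calls to `top_half_rx` can pile up frames before a single call to `bottom_half_run` processes them all, which is the point of deferring the expensive part.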

We are mainly concerned with the logic of distributing link frames to the protocol layer. The following is a piece of code from the net_bh function:

    526 void net_bh(void *tmp)
    527 {
            ...
    577
    578   /*
    579    * We got a packet ID.  Now loop over the "known protocols"
    580    * table (which is actually a linked list, but this will
    581    * change soon if I get my way- FvK), and forward the packet
    582    * to anyone who wants it.
    583    *
    584    * [FvK didn't get his way but he is right this ought to be
    585    * hashed so we typically get a single hit. The speed cost
    586    * here is minimal but no doubt adds up at the 4,000+ pkts/second
    587    * rate we can hit flat out]
    588    */
    589   pt_prev = NULL;
    590   for (ptype = ptype_base; ptype != NULL; ptype = ptype->next)
    591   {
    592     if ((ptype->type == type || ptype->type == htons(ETH_P_ALL)) && (!ptype->dev || ptype->dev==skb->dev))
    593     {
    594       /*
    595        * We already have a match queued. Deliver
    596        * to it and then remember the new match
    597        */
    598       if(pt_prev)
    599       {
    600         struct sk_buff *skb2;
    601         skb2=skb_clone(skb, GFP_ATOMIC);
    602         /*
    603          * Kick the protocol handler. This should be fast
    604          * and efficient code.
    605          */
    606         if(skb2)
    607           pt_prev->func(skb2, skb->dev, pt_prev);
    608       }
    609       /* Remember the current last to do */
    610       pt_prev=ptype;
    611     }
    612   } /* End of protocol list loop */
    613   /*
    614    * Is there a last item to send to ?
    615    */
    616   if(pt_prev)
    617     pt_prev->func(skb, skb->dev, pt_prev);
    618   /*
    619    *  Has an unknown packet has been received ?
    620    */
    621   else
    622     kfree_skb(skb, FREE_WRITE);
    623
            ...
    640 }

Here is a brief explanation of the data structures. skb (struct sk_buff) is the kernel buffer that encapsulates socket data. It is used throughout the protocol stack, from the link layer up to the transport layer, although the encapsulation differs from layer to layer (mainly in which protocol headers are present). For this discussion you can simply think of it as a link-layer data frame. The logic of the code is to take the protocol field parsed from the skb, look up the matching protocol nodes in the protocol-type linked list headed by ptype_base, and call each node's function pointer func, thereby distributing the data frame to the corresponding protocol layers (such as ARP, IP, 802.2, and 802.3).
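As a rough illustration of the lookup described above, here is a simplified user-space sketch of such a protocol list. The types and names (`struct packet`, `struct proto_handler`, `handler_base`, `add_handler`, `handler_matches`, `count_matches`, `PROTO_ALL`) are hypothetical stand-ins, not the kernel's definitions, and byte-order conversion (the `htons` call in the kernel code) is left out for brevity.

```c
#include <stddef.h>

#define PROTO_ALL 0x0003   /* stand-in for ETH_P_ALL: "give me every packet" */

struct device {
    const char *name;
};

/* Simplified stand-in for an skb: only the fields the lookup needs. */
struct packet {
    unsigned short type;   /* protocol ID parsed from the frame header */
    struct device *dev;    /* interface the frame arrived on */
};

/* One node of the protocol list (plays the role of a ptype entry). */
struct proto_handler {
    unsigned short type;   /* protocol ID wanted, or PROTO_ALL */
    struct device *dev;    /* NULL means "any device" */
    int (*func)(struct packet *pkt, struct device *dev,
                struct proto_handler *self);
    struct proto_handler *next;
};

static struct proto_handler *handler_base;   /* plays the role of ptype_base */

/* Push a handler onto the front of the singly linked list. */
void add_handler(struct proto_handler *h)
{
    h->next = handler_base;
    handler_base = h;
}

/* Same shape as the condition on line 592: the type must match (or the
 * handler wants everything) and the device must match (or be unset). */
int handler_matches(const struct proto_handler *h, const struct packet *pkt)
{
    return (h->type == pkt->type || h->type == PROTO_ALL) &&
           (h->dev == NULL || h->dev == pkt->dev);
}

/* Walk the list and count how many handlers want this packet. */
int count_matches(const struct packet *pkt)
{
    const struct proto_handler *h;
    int n = 0;

    for (h = handler_base; h != NULL; h = h->next)
        if (handler_matches(h, pkt))
            n++;
    return n;
}
```

Because a "catch-all" handler can sit in the same list as a protocol-specific one, a single packet may match more than one node, which is why the dispatch loop has to deal with delivering one skb to several receivers.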

Does this look odd at first glance? The code uses a pt_prev pointer to keep track of the previous matching node in the ptype list, which costs an extra conditional branch. Doesn't that seem superfluous? Compared with my earlier post on pointer-to-pointer handling of a singly linked list, this is just the opposite. If you remove pt_prev, the code becomes simpler:

    for (ptype = ptype_base; ptype != NULL; ptype = ptype->next)
    {
      if ((ptype->type == type || ptype->type == htons(ETH_P_ALL)) && (!ptype->dev || ptype->dev==skb->dev))
      {
        /*
         * Every match gets its own copy of the packet.
         */
        struct sk_buff *skb2;
        skb2=skb_clone(skb, GFP_ATOMIC);
        /*
         * Kick the protocol handler. This should be fast
         * and efficient code.
         */
        if(skb2)
          ptype->func(skb2, skb->dev, ptype);
      }
    } /* End of protocol list loop */
    /* Every match has its own clone, so the original can be released */
    kfree_skb(skb, FREE_WRITE);

Why not write it this way? Keep in mind that the Linux kernel source published on the Internet has been reviewed and verified by many expert hackers. There must be a good reason for writing it the way it is, so don't trust your intuition too much. Let's go back to the code and see what this loop actually does, in particular lines 592 to 611.

The skb copied in from the network is unique, but it may have to be delivered to several protocol handlers, so it must be cloned before a callback. Note that this is a deep copy, which costs the equivalent of a kmalloc. If every handler is given a clone, the original skb has served its purpose once distribution is finished and must be released with kfree_skb (the call on line 622). Note that both operations have a cost inside the kernel.

So why does pt_prev keep track of the previous node? It is deliberate: its purpose is to delay the callback for the current matching protocol entry. When the traversal finds a match, the current iteration only records it in pt_prev and does nothing else. The callback for that recorded match is not made until the next match is found or the loop ends. (pt_prev == NULL means that nothing has matched so far.) This is a deferral strategy. What does it buy? It saves unnecessary operations. Which ones? Work backwards from the code: the clone is executed only when the protocol field matches and pt_prev != NULL, while the kfree is executed only when pt_prev == NULL after the loop. So if N protocols in the ptype list match, this code performs only N-1 clones and no kfree, because the last match receives the original skb. Without pt_prev, the code performs N clones and one kfree. If exactly one protocol in the ptype list matches, the loop never executes the clone on line 601, and the kfree on line 622 is not executed either.

In other words, when at least one entry in the list matches, the pt_prev version saves one clone and one kfree compared with the version without it; when nothing matches, both versions perform just one kfree. That is the benefit of the deferral strategy.
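The counts above can be checked with a small user-space simulation. This is only a sketch under simplifying assumptions: `struct buf`, `struct handler`, `deliver_naive`, `deliver_deferred` and the counters are hypothetical, `malloc`/`free` stand in for `skb_clone`/`kfree_skb`, and each handler is assumed to take ownership of the buffer it receives (its own release is not counted, since it is the same in both versions).

```c
#include <stddef.h>
#include <stdlib.h>

struct buf { int payload; };                 /* stand-in for an skb */

struct handler {
    int wanted;                              /* 1 if this node matches the packet */
    void (*func)(struct buf *b);             /* takes ownership of b */
    struct handler *next;
};

struct stats { int clones; int frees; int delivered; };
static struct stats st;                      /* counters for the dispatcher */

struct buf *buf_new(int payload)
{
    struct buf *b = malloc(sizeof *b);
    if (b)
        b->payload = payload;
    return b;
}

static struct buf *buf_clone(const struct buf *b)   /* stand-in for skb_clone */
{
    struct buf *c = malloc(sizeof *c);
    if (c) {
        *c = *b;
        st.clones++;
    }
    return c;
}

static void buf_free(struct buf *b)                 /* stand-in for kfree_skb */
{
    free(b);
    st.frees++;
}

/* Example handler: consumes the buffer it is given. */
void consume(struct buf *b)
{
    st.delivered++;
    free(b);
}

/* Version without pt_prev: clone for every match, then free the original. */
struct stats deliver_naive(struct handler *list, struct buf *b)
{
    struct handler *h;

    st = (struct stats){0, 0, 0};
    for (h = list; h != NULL; h = h->next) {
        if (h->wanted) {
            struct buf *c = buf_clone(b);
            if (c)
                h->func(c);
        }
    }
    buf_free(b);
    return st;
}

/* Version with a deferred "previous match": the last match gets the original. */
struct stats deliver_deferred(struct handler *list, struct buf *b)
{
    struct handler *h, *prev = NULL;

    st = (struct stats){0, 0, 0};
    for (h = list; h != NULL; h = h->next) {
        if (h->wanted) {
            if (prev) {                      /* an earlier match is pending */
                struct buf *c = buf_clone(b);
                if (c)
                    prev->func(c);
            }
            prev = h;                        /* defer delivery to this match */
        }
    }
    if (prev)
        prev->func(b);                       /* no clone needed for the last one */
    else
        buf_free(b);                         /* nobody wanted the packet */
    return st;
}
```

With three matching handlers in the list, `deliver_naive` reports 3 clones and 1 free, while `deliver_deferred` reports 2 clones and 0 frees; with no match, both report 0 clones and 1 free.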

Developers familiar with TCP/IP will know about the MTU (Maximum Transmission Unit), which differs from one link type to another. For example, the MTU of an Ethernet frame is 1500 bytes, that of an 802.3 frame is 1492 bytes, that of a low-delay point-to-point (PPP) link is 296 bytes, and that of Hyperchannel is theoretically 65535 bytes. On a high-throughput network where data travels as a stream of small fragments, consider what it means for kernel-level code to save this overhead on every packet.

In fact, we can set aside all the networking knowledge. This is just a very ordinary operation on a singly linked list, and the logic is not complicated. But look at how top hackers think when they code, and compare it with code you have written yourself: how often have you processed data with a simple for loop without thinking further about what is crude or wasteful in it? Faced with real programmers of such shrewdness and depth, do you feel an inexplicable unease? Are you sure you really know how to use and manipulate a basic linked list in C? If the answer is yes, let's shake hands. (Don't get me wrong; the paragraph above is really just me talking to myself.)

Finally, let's thank the distinguished Alan Cox for his outstanding and meticulous contributions to the Linux community! (In the photo, Alan is the one in the middle wearing a red hat.)


The protocol stack implementation in the more recent Linux-2.6.x kernels has changed significantly, but the netif_receive_skb function in /net/core/dev.c still uses pt_prev. The purpose is the same: an optimization that reduces system overhead.

About Alan: while working at Swansea University, he installed an early version of Linux on a university server for general use. He fixed many problems and rewrote large parts of the networking subsystem, and later became an important member of the Linux kernel development team. Alan Cox maintained the 2.2 kernel and kept his own branch of 2.4, whose versions carry the suffix "ac" (for example, 2.4.13-ac1). That branch was very stable and corrected many bugs, and many vendors used it. Before he left to study for a master's degree in business administration, he was involved in many aspects of Linux kernel development. He held a high standing in the community and was sometimes regarded as second in command after Linus.

However, on January 28 this year, Alan announced that he was stepping away from the Linux project for family reasons. The following is his Google+ statement:

"I'm leaving the Linux world and Intel for a bit for family reasons. I'm aware that 'family reasons' is usually management speak for 'I think the boss is an asshole' but I'd like to assure everyone that while I frequently think Linus is an asshole (and therefore very good as kernel dictator) I am departing quite genuinely for family reasons and not because I've fallen out with Linus or Intel or anyone else. Far from it, I've had great fun working there."

