Anatomy of the Linux Network Stack (Tim's Anatomy Series)

Level: elementary

M. Tim Jones, Consultant Engineer, Emulex

July 16, 2007

One of the greatest features of the Linux operating system is its network stack. It was originally derived from the BSD network stack and is well organized, with a clean set of interfaces. These interfaces range from the protocol-independent layers (such as the general socket layer interface or the device layer) to the specific layers of the individual networking protocols. This article explores the structure of the Linux network stack from the perspective of its layers and introduces some of its major structures.

Protocol Introduction

Although formal introductions to networking usually refer to the Open Systems Interconnection (OSI) model, this article maps the basic Linux network stack onto the four-layer Internet model (see Figure 1).


Figure 1. Internet model of the network stack

At the bottom of the stack is the link layer. The link layer refers to the device drivers that provide access to the physical layer, which can be any of numerous media, such as serial links or Ethernet devices. Above the link layer is the network layer, which is responsible for directing packets to their destinations. The next layer, called the transport layer, is responsible for peer-to-peer communication (for example, within a host). While the network layer manages communication between hosts, the transport layer manages communication between endpoints within those hosts. Finally, there is the application layer, which is commonly the semantic layer that understands the data being moved. For example, the Hypertext Transfer Protocol (HTTP) moves requests and responses for web content between a server and a client.

In practice, the layers of the network stack go by more familiar names. At the link layer you find Ethernet, the most common high-speed medium. Older link-layer protocols include serial protocols such as SLIP (Serial Line Internet Protocol), CSLIP (Compressed SLIP), and PPP (Point-to-Point Protocol). The most common network layer protocol is IP (Internet Protocol), but other protocols at this layer serve other needs, such as ICMP (Internet Control Message Protocol) and ARP (Address Resolution Protocol). At the transport layer are TCP (Transmission Control Protocol) and UDP (User Datagram Protocol). Finally, the application layer includes many familiar protocols, including HTTP, the standard web protocol, and SMTP (Simple Mail Transfer Protocol), the e-mail protocol.
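The layering described above can be made concrete by wrapping a payload in one header per layer. The following is a minimal sketch in Python (chosen only to keep the example short), not kernel code. It assumes a UDP segment inside an IPv4 packet inside an Ethernet II frame; the function names, port numbers, and addresses are hypothetical values picked for illustration.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, as used by the IPv4 header."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def udp_segment(src_port: int, dst_port: int, payload: bytes) -> bytes:
    # Transport layer: 8-byte UDP header. A checksum of 0 means
    # "not computed", which IPv4 permits.
    return struct.pack("!HHHH", src_port, dst_port, 8 + len(payload), 0) + payload


def ipv4_packet(src: bytes, dst: bytes, segment: bytes) -> bytes:
    # Network layer: 20-byte IPv4 header without options; protocol 17 is UDP.
    def header(checksum: int) -> bytes:
        return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(segment),
                           0, 0, 64, 17, checksum, src, dst)
    return header(internet_checksum(header(0))) + segment


def ethernet_frame(dst_mac: bytes, src_mac: bytes, packet: bytes) -> bytes:
    # Link layer: 14-byte Ethernet II header; EtherType 0x0800 is IPv4.
    return dst_mac + src_mac + struct.pack("!H", 0x0800) + packet


def build_frame(payload: bytes) -> bytes:
    # Application data goes in at the top and gains one header per layer.
    segment = udp_segment(5000, 53, payload)
    packet = ipv4_packet(bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]), segment)
    return ethernet_frame(b"\xff" * 6, b"\x02\x00\x00\x00\x00\x01", packet)
```

Calling build_frame(b"hello") returns the bytes that a link-layer driver would be asked to send: the Ethernet header first, then the IP header, then the UDP header, then the payload.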




Core Network Architecture

Now let's move on to the architecture of the Linux network stack and how it implements this Internet model. Figure 2 provides a high-level view of the Linux network stack. At the top is the user space layer, or application layer, which defines the users of the network stack. At the bottom are the physical devices, which provide connectivity to the networks (serial links or high-speed networks such as Ethernet). In the middle, in kernel space, is the networking subsystem that is the focus of this article. Socket buffers (sk_buffs) flow through the interior of the network stack and move packet data between sources and sinks. The sk_buff structure is described shortly.


Figure 2. High-level architecture of the Linux network stack

First, here is a quick overview of the core elements of the Linux networking subsystem, with more detail in the sections that follow. At the top (see Figure 2) is the system call interface, which gives user-space applications a way to access the kernel's networking subsystem. Next is a protocol-independent layer that provides a common way to work with the underlying transport-level protocols. Next are the actual protocols, which in Linux include the built-in protocols TCP, UDP, and IP. Then comes another independent layer that provides a common interface to and from the individual device drivers. The device drivers themselves are at the bottom.




System Call Interface

The system call interface can be described from two perspectives. First, when a user makes a networking call, it is multiplexed through the system call interface into the kernel. It ends up as a call to sys_socketcall in ./net/socket.c, which then demultiplexes the call to its intended target. The other perspective is the use of normal file operations for networking I/O. For example, typical read and write operations can be performed on a networking socket (which is represented by a file descriptor, just like a normal file). Therefore, while many operations are specific to networking (creating a socket with the socket call, connecting it to a destination with the connect call, and so on), a number of standard file operations also apply to networking objects just as they do to regular files. Finally, the system call interface provides the means to transfer control between the user-space application and the kernel.
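The point about ordinary file operations can be checked from user space. The following minimal sketch assumes a POSIX system and uses Python's standard os and socket modules; the function name echo_through_file_io is hypothetical. It creates a connected pair of sockets and moves data with the generic os.write and os.read calls, which take only integer file descriptors.

```python
import os
import socket


def echo_through_file_io(message: bytes) -> bytes:
    """Send bytes over a socket pair using plain file-descriptor I/O."""
    left, right = socket.socketpair()  # two connected sockets, no network needed
    try:
        # os.write() and os.read() are the generic file calls; they know
        # nothing about sockets and see only integer file descriptors.
        os.write(left.fileno(), message)
        return os.read(right.fileno(), len(message))
    finally:
        left.close()
        right.close()
```

A call such as echo_through_file_io(b"ping") travels through the same kernel socket layer that a send/recv pair would use; only the entry point into the kernel differs.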




Protocol-independent interface

The socket layer is a protocol-independent interface that provides a set of common functions to support a variety of protocols. It supports not only the typical TCP and UDP protocols but also IP, raw Ethernet, and other transport protocols, such as SCTP (Stream Control Transmission Protocol).

Communication through the network stack takes place through a socket. The socket structure in Linux is struct sock, which is defined in linux/include/net/sock.h. This large structure contains all of the state required for a particular socket, including the particular protocol used by the socket and the operations that may be performed on it.

The networking subsystem knows about the available protocols through a special structure that defines their capabilities. Each protocol maintains a structure called proto (found in linux/include/net/sock.h). This structure defines the particular socket operations that can be performed from the socket layer to the transport layer (for example, how to create a socket, how to establish a connection with a socket, and how to close a socket).




Network Protocol

The network protocols section defines the particular networking protocols that are available (such as TCP and UDP). These are initialized in a function called inet_init in linux/net/ipv4/af_inet.c (because TCP and UDP are both part of the inet family of protocols). The inet_init function registers each of the built-in protocols using the proto_register function. This function is defined in linux/net/core/sock.c. In addition to adding the protocol to the active protocol list, it can allocate one or more slab caches if required.

You can see how individual protocols identify themselves through the proto structure in the files udp.c and raw.c in linux/net/ipv4/. Each of these protocol structures is mapped by type and protocol into inetsw_array, which maps the built-in protocols to their operations. The structure of inetsw_array and its relationships are shown in Figure 3. Each protocol in this array is initialized into inetsw through calls to inet_register_protosw from inet_init. The inet_init function also initializes the various inet modules, such as the ARP, ICMP, and IP modules and the TCP and UDP modules.


Figure 3. Internet Protocol Array Structure

Relationship between sockets and protocols
Recall that when you create a socket, you specify the type and protocol, as in my_sock = socket(AF_INET, SOCK_STREAM, 0). AF_INET indicates the Internet address family, and SOCK_STREAM selects a stream socket (as shown in inetsw_array).

Note in Figure 3 that the proto structure defines the transport-specific methods, while the proto_ops structure defines the general socket methods. Additional protocols can be added to the inetsw protocol switch through a call to inet_register_protosw. For example, SCTP adds itself through a call to sctp_init in linux/net/sctp/protocol.c. For more information about SCTP, see References.
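The lookup by type and protocol can be sketched as a small table. This is a toy model in Python, not the kernel's data structure: ProtoSw, register_protosw, and lookup are hypothetical names, and the strings tcp_prot, udp_prot, inet_stream_ops, and inet_dgram_ops are used purely as labels for the transport-specific and general method tables. The sketch assumes that protocol 0 selects the default protocol for a socket type, which is how the socket call shown above is normally used.

```python
import socket
from collections import defaultdict
from typing import NamedTuple


class ProtoSw(NamedTuple):
    """Toy stand-in for one entry of the protocol switch table."""
    sock_type: int
    protocol: int
    prot: str  # label for the transport-specific method table (proto)
    ops: str   # label for the general socket method table (proto_ops)


inetsw = defaultdict(list)  # socket type -> list of registered entries


def register_protosw(entry: ProtoSw) -> None:
    inetsw[entry.sock_type].append(entry)


def lookup(sock_type: int, protocol: int) -> ProtoSw:
    """Pick the entry for a socket(AF_INET, sock_type, protocol) request."""
    for entry in inetsw[sock_type]:
        # Protocol 0 asks for the default protocol of this socket type.
        if protocol in (0, entry.protocol):
            return entry
    raise LookupError("protocol not supported for this socket type")


register_protosw(ProtoSw(socket.SOCK_STREAM, socket.IPPROTO_TCP,
                         "tcp_prot", "inet_stream_ops"))
register_protosw(ProtoSw(socket.SOCK_DGRAM, socket.IPPROTO_UDP,
                         "udp_prot", "inet_dgram_ops"))
```

With these two registrations, lookup(socket.SOCK_STREAM, 0) selects the TCP entry, while asking for UDP on a stream socket fails because no such combination was registered.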

Data movement for sockets takes place using a core structure called the socket buffer (sk_buff). An sk_buff contains packet data as well as state data that spans multiple layers of the protocol stack. Each packet sent or received is represented by an sk_buff. The sk_buff structure is defined in linux/include/linux/skbuff.h and shown in Figure 4.


Figure 4. Socket buffer and its relationship with other structures

As shown, multiple sk_buffs can be chained together for a given connection. Each sk_buff identifies the device structure (net_device) to which the packet is being sent or from which it was received. Because each packet is represented by an sk_buff, the packet headers are conveniently located through a set of pointers (th, iph, and mac, the last for the Media Access Control, or MAC, header). Because the sk_buff is central to socket data management, a number of support functions exist to manage it. There are functions to create and destroy sk_buffs, to clone them, and to manage queues of them.

Socket buffers are designed to be linked together for a given socket and carry a large amount of information, including links to the protocol headers, a timestamp (when the packet was sent or received), and the device associated with the packet.
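One way to picture why a socket buffer keeps separate pointers into a single data area is a toy model with two offsets. The sketch below is a simplified Python illustration, not the kernel's struct sk_buff. The class name SkBuff and its methods are hypothetical; the method names are loosely modeled on the kernel helpers skb_reserve, skb_put, skb_push, and skb_pull.

```python
class SkBuff:
    """Toy model of a socket buffer: one byte area with data/tail offsets."""

    def __init__(self, size: int):
        self.buf = bytearray(size)
        self.data = 0     # offset of the first valid packet byte
        self.tail = 0     # offset just past the last valid packet byte
        self.next = None  # link to the next buffer in a queue
        self.dev = None   # device the packet came from or is going to

    def reserve(self, headroom: int) -> None:
        # Leave room in front for headers; only valid on an empty buffer.
        self.data += headroom
        self.tail += headroom

    def put(self, payload: bytes) -> None:
        # Append bytes at the tail (for example, the application payload).
        end = self.tail + len(payload)
        if end > len(self.buf):
            raise ValueError("not enough tailroom")
        self.buf[self.tail:end] = payload
        self.tail = end

    def push(self, header: bytes) -> None:
        # Prepend a header as the packet moves down one layer.
        if len(header) > self.data:
            raise ValueError("not enough headroom")
        self.data -= len(header)
        self.buf[self.data:self.data + len(header)] = header

    def pull(self, length: int) -> bytes:
        # Strip a header as the packet moves up one layer.
        if length > self.tail - self.data:
            raise ValueError("not enough data")
        header = bytes(self.buf[self.data:self.data + length])
        self.data += length
        return header

    def packet(self) -> bytes:
        return bytes(self.buf[self.data:self.tail])
```

Because each layer only moves an offset, headers are added on the way down and removed on the way up without copying the payload, which is the main reason for this design.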




Device-independent interface

Below the protocols layer is another independent interface layer, which connects the protocols to a variety of hardware device drivers with varying capabilities. This layer provides a common set of functions that lower-level network device drivers use to operate with the higher-level protocol stack.

First, device drivers may register or unregister themselves with the kernel through a call to register_netdevice or unregister_netdevice. The caller first fills out the net_device structure and then passes it in for registration. The kernel calls the structure's init function (if one is defined), performs a number of sanity checks, creates a sysfs entry, and then adds the new device to the device list (a linked list of the devices active in the kernel). You can find the net_device structure in linux/include/linux/netdevice.h. The functions are implemented in linux/net/core/dev.c.

To send an sk_buff from the protocol layer to a device, use the dev_queue_xmit function. This function enqueues an sk_buff for eventual transmission by the underlying device driver (the network device is identified by the net_device referenced in the sk_buff, that is, sk_buff->dev). The dev structure contains a method called hard_start_xmit, which holds the driver function that initiates transmission of an sk_buff.

Packets are conventionally received with netif_rx. When a lower-level device driver receives a packet (contained within an allocated sk_buff), it passes the sk_buff up to the network layer through a call to netif_rx. This function then queues the sk_buff on an upper-layer protocol's queue for later processing through netif_rx_schedule. You can find the dev_queue_xmit and netif_rx functions in linux/net/core/dev.c.

Recently, a new application program interface (NAPI) was introduced into the kernel to allow drivers to interface with the device-independent layer (dev). Some drivers use NAPI, but the large majority still use the older frame reception interface (by a rough factor of six to one). NAPI can yield better performance under high load because it avoids taking an interrupt for each incoming frame.
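The benefit of not interrupting for every frame can be seen with a simple count. The following toy model in Python compares the two reception styles; it is not driver code. The function names and the budget of 16 frames per poll are assumptions made for illustration, and the model assumes that each burst of frames arrives before polling of that burst begins.

```python
from collections import deque


def receive_per_frame(frames):
    """Older model: every arriving frame raises its own interrupt."""
    interrupts = 0
    delivered = []
    for frame in frames:
        interrupts += 1          # the interrupt handler runs once per frame
        delivered.append(frame)  # and passes the frame up the stack
    return interrupts, delivered


def receive_polled(bursts, budget=16):
    """NAPI-style model: one interrupt per burst, then budgeted polling."""
    interrupts = 0
    polls = 0
    delivered = []
    for burst in bursts:
        ring = deque(burst)  # frames waiting in the device's receive ring
        if not ring:
            continue
        interrupts += 1      # the first frame interrupts; further ones are masked
        while ring:          # poll until the ring is drained
            polls += 1
            for _ in range(min(budget, len(ring))):
                delivered.append(ring.popleft())
        # Ring is empty: interrupts are re-enabled for the next burst.
    return interrupts, polls, delivered
```

For two bursts of 40 and 5 frames, the per-frame model takes 45 interrupts, while the polled model takes 2 interrupts and 4 polls. The gap widens as load increases, which matches the claim that the polled approach helps most under high load.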




Device Driver

At the bottom of the network stack are the device drivers that manage the physical network devices. Examples of devices at this layer include the SLIP driver over a serial interface and an Ethernet driver over an Ethernet device.

At initialization time, a device driver allocates a net_device structure and then initializes it with the routines it needs. One of these routines, dev->hard_start_xmit, defines how the upper layer should enqueue an sk_buff for transmission. This routine takes an sk_buff as its argument. Its operation depends on the underlying hardware, but commonly the packet described by the sk_buff is moved to a hardware ring or queue. Frame reception, as described for the device-independent layer, uses the netif_rx interface, or netif_receive_skb for a NAPI-compliant network driver. A NAPI driver places requirements on the capabilities of the underlying hardware. For more information, see References.

After a device driver configures its interfaces in the dev structure, a call to register_netdevice makes the device available for use. You can find the drivers specific to network devices in linux/drivers/net.
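The division of labor between the device-independent layer and a driver can be sketched as a structure of driver-supplied routines. The following is a toy Python model rather than kernel code. The Python functions borrow the names register_netdevice, unregister_netdevice, dev_queue_xmit, and hard_start_xmit from the text for readability, but their signatures and behavior here are simplifying assumptions, and Packet and ToyDriver are hypothetical.

```python
from dataclasses import dataclass


class NetDevice:
    """Toy stand-in for net_device: a name plus driver-supplied routines."""

    def __init__(self, name, hard_start_xmit, init=None):
        self.name = name
        self.hard_start_xmit = hard_start_xmit  # driver transmit routine
        self.init = init                        # optional driver init routine


@dataclass
class Packet:
    """Toy stand-in for an sk_buff: payload plus the device it is bound to."""
    data: bytes
    dev: NetDevice


registered_devices = []  # stands in for the kernel's list of active devices


def register_netdevice(dev):
    if dev.init is not None:
        dev.init(dev)  # run the driver's init routine, if one is defined
    if any(d.name == dev.name for d in registered_devices):
        raise ValueError("device name already registered")  # one sanity check
    registered_devices.append(dev)


def unregister_netdevice(dev):
    registered_devices.remove(dev)


def dev_queue_xmit(skb):
    # The device-independent layer calls whatever routine the driver
    # installed; it does not need to know how the hardware works.
    return skb.dev.hard_start_xmit(skb, skb.dev)


class ToyDriver:
    """Hypothetical driver that 'transmits' by appending to a list."""

    def __init__(self):
        self.hardware_ring = []

    def xmit(self, skb, dev):
        self.hardware_ring.append(skb.data)
        return 0  # 0 is used in this sketch to mean "accepted"
```

A driver object supplies xmit as the transmit routine when it builds its NetDevice, registers the device, and from then on receives every packet whose dev field points at that device.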




Outlook

The Linux source code is a great way to learn about the design of device drivers for many types of devices, including network device drivers. You will find variation in the design and in the use of the available kernel APIs, but each driver is useful either as instruction or as a starting point for a new device driver. The rest of the code in the network stack is common and is used as is unless you need a new protocol. Even then, the implementations of TCP (for a streaming protocol) or UDP (for a message-based protocol) serve as useful models for new development.

References

Learning

  • For more information, see the original article on the developerWorks global site.

  • See "Introduction to the Internet Protocols" on www.linuxjunkies.org for a brief introduction to TCP/IP, UDP, and ICMP.

  • "Kernel command using Linux system calls" (developerWorks, March 2007) covers the Linux system call interface, an important layer in the Linux kernel, with user-space support provided by the GNU C Library (glibc) so that function calls can be made between user space and the kernel.

  • "Access the Linux kernel using the /proc filesystem" (developerWorks, March 2006) looks at the /proc file system, a virtual file system that provides a novel way for user-space applications to communicate with the kernel. The article demonstrates /proc as well as loadable kernel modules.

  • Like BSD, Linux is a great operating system if you are interested in networking protocols. "Better networking with SCTP" (developerWorks, February 2006) covers SCTP, one of the most interesting networking protocols. It operates much like TCP but adds useful features such as message framing, multi-homing, and multi-streaming.

  • "Anatomy of the Linux slab allocator" (developerWorks, May 2007) covers one of the most interesting aspects of memory management in Linux: the slab allocator. This mechanism originated in SunOS but has found a friendly home in the Linux kernel.

  • NAPI drivers have a number of advantages over drivers that use the older packet processing framework, from interrupt management to packet handling. Learn more about the NAPI interface and its design at OSDL.

  • For more information about Linux user-space programming, see the book GNU/Linux Application Programming.

  • In Tim's book BSD Sockets Programming from a Multi-Language Perspective, you can learn about socket programming using the BSD Sockets API.

  • Learn more about free software at the Free Software Foundation web site. Because Linux is free software, anyone who is willing to do the work can assemble and release their own distribution.

  • Find more resources for Linux developers, including Linux tutorials, in the developerWorks Linux zone.

  • Stay current with developerWorks technical events and webcasts.


About the author

M. Tim Jones is an embedded software engineer and the author of GNU/Linux Application Programming, AI Application Programming, and BSD Sockets Programming from a Multilanguage Perspective, among other books. His engineering background ranges from the development of kernels for geosynchronous spacecraft to embedded systems architecture and networking protocol development. Tim is a Consultant Engineer for Emulex Corp. in Longmont, Colorado.
