Understanding the Linux Network Protocol Stack Through PPPoE

Source: Internet
Author: User

http://www.cnblogs.com/zmkeil/archive/2013/05/01/3053545.html

The title is a little convoluted. I used to think of PPPoE only as the underlying carrier of the PPP protocol, but in fact it is a complete protocol in its own right. Because its implementation is relatively simple, it is a convenient way to understand how the Linux network stack is implemented.

1. Overview

In Linux user space, network programming starts by creating a socket with sockfd = socket(family, type, protocol) and then passing sockfd to various system calls to communicate over the network. family selects the protocol domain (such as inet or unix), type selects the kind of interface (such as stream or datagram), and protocol selects a specific protocol within the domain (such as TCP or UDP in inet). protocol is usually set to 0, in which case the default protocol for that type in that family is used (for example, a stream socket in inet defaults to TCP).

Linux implements the protocol system with a modular mechanism, which makes it easy to extend. Its basic modules are shown in the figure.

Look first at the right side of the figure. The top-level socket module provides sock_register() for the protocol domain modules; each call adds an entry to the global net_families[] array. Each protocol domain module in turn provides a similar register_xx_proto() function for its specific protocols, which adds an entry to the domain's private xx_proto[] array (for PPPoX these are register_pppox_proto() and pppox_protos[]). Both arrays hold pointers to the data structures shown in the figure.

These structures are clearly used to create the different kinds of sockets, and creation is hierarchical. The top-level create function (__sock_create() in net/socket.c) performs common work such as allocating memory and then calls the next layer's create(); the protocol domain's create() performs the initialization common to the whole domain; finally, the specific protocol's create() performs protocol-specific initialization. The next section covers this in detail.
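The registration tables and the three-layer create chain can be modeled in a few lines of user-space C. This is a simplified sketch, not kernel code: families[] stands in for net_families[], domain_protos[] for a domain's private array such as pppox_protos[], and all struct and function names (mini_socket, register_family, generic_create, and so on) are hypothetical.

```c
#include <string.h>

#define NPROTO_FAMILIES   64   /* hypothetical table sizes */
#define MAX_DOMAIN_PROTOS 8

struct mini_socket {
    int family;
    int protocol;
    char trace[64];            /* records which layers ran, for illustration only */
};

/* Per-family entry, loosely modeled on struct net_proto_family. */
struct family_ops {
    int family;
    int (*create)(struct mini_socket *sock, int protocol);
};

/* Per-protocol entry inside one domain, loosely modeled on struct pppox_proto. */
struct domain_proto_ops {
    int (*create)(struct mini_socket *sock);
};

static const struct family_ops *families[NPROTO_FAMILIES];              /* like net_families[] */
static const struct domain_proto_ops *domain_protos[MAX_DOMAIN_PROTOS]; /* like pppox_protos[] */

static int register_family(const struct family_ops *ops)
{
    if (ops->family < 0 || ops->family >= NPROTO_FAMILIES || families[ops->family])
        return -1;
    families[ops->family] = ops;
    return 0;
}

static int register_domain_proto(int proto, const struct domain_proto_ops *ops)
{
    if (proto < 0 || proto >= MAX_DOMAIN_PROTOS || domain_protos[proto])
        return -1;
    domain_protos[proto] = ops;
    return 0;
}

/* Layer 3: protocol-specific initialization. */
static int demo_proto_create(struct mini_socket *sock)
{
    strcat(sock->trace, "proto;");
    return 0;
}

/* Layer 2: domain-wide initialization, then dispatch to the protocol. */
static int demo_domain_create(struct mini_socket *sock, int protocol)
{
    if (protocol < 0 || protocol >= MAX_DOMAIN_PROTOS || !domain_protos[protocol])
        return -1;
    strcat(sock->trace, "domain;");
    return domain_protos[protocol]->create(sock);
}

/* Layer 1: generic work (here just zeroing), then dispatch to the family. */
static int generic_create(struct mini_socket *sock, int family, int protocol)
{
    if (family < 0 || family >= NPROTO_FAMILIES || !families[family])
        return -1;
    memset(sock, 0, sizeof(*sock));
    sock->family = family;
    sock->protocol = protocol;
    strcat(sock->trace, "generic;");
    return families[family]->create(sock, protocol);
}
```

Registering family 24 (the value of AF_PPPOX) with protocol 0 (PX_PROTO_OE) and calling generic_create() leaves the trace "generic;domain;proto;", showing each layer running before handing off to the next. The trace string exists only to make the call order visible.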

On the left side of the figure, the top-level socket module also provides four functions. The first two are generally called by the specific protocol modules, because they concern the interaction between the protocol stack and the application layer; details follow later. The last two are generally called by the protocol domain module, because they concern the interaction between the underlying device and the protocol stack. This division is not absolute: in PPPoE, all four are called by the specific protocol module, because the PPPoX domain has little common code and its protocols are almost independent of each other. What these functions do and which data structures they use are described in detail where they are first used.

2. Socket creation

Let us first look at what a socket finally consists of. The structure is quite large, so only its framework is given:

    1. Basic attributes: state (listening, accepting, and so on), flags (blocking, etc.) and type. family and protocol are not stored here, because once the socket has been created they are already reflected in the rest of the structure.
    2. file points to a struct file. In Linux a socket is also abstracted as a file, so the application layer usually manipulates it with standard file operations.
    3. ops points to a struct proto_ops, which is specific to each protocol; through it, the application layer's system calls are finally mapped to the operations of the specific protocol in the network stack.
    4. sk points to a struct sock, allocated with extra space for a protocol-private part, which holds much of the protocol-specific information. It begins with a struct sock_common, which holds the protocol's basic information, and it contains a pointer (sk_prot) to the struct proto described in section 1. That struct proto is registered with the kernel through proto_register() and holds the operations and information for the interaction between the application layer and the protocol stack (the application-to-transport-layer interface). Next comes the sk_backlog_rcv function pointer, which is called after the protocol stack has processed a received packet; it generally just places the packet on the socket's receive queue to wait for the application to read it. The final, protocol-private part holds information such as the PPPoE session ID and daddr, or the TCP connection 4-tuple. This information is very important, because it is used to tell apart multiple sockets of the same protocol.

Section 1 has already outlined the overall creation process. Taking PPPoE as an example, the concrete steps for creating a socket are shown in the figure.

Almost all of the key points mentioned so far are involved here. Note that the struct proto here is very simple: PPPoE has almost no transport layer, so it needs no intermediate operations, only obj_size, which gives the size of the whole socket object to allocate, that is, the struct sock followed by the protocol-private part. The contents of the private part are generally initialized when connect is called.
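The obj_size idea, one allocation holding both the generic struct sock and the protocol-private fields, can be sketched in user space as follows. The names (mini_sock, mini_pppox_sock, mini_sk_alloc) are hypothetical; the only point carried over from the kernel is that the generic part is the first member, so the protocol can convert a sock pointer into its own type, much as pppox_sk() does for PPPoE.

```c
#include <stdlib.h>
#include <stdint.h>

/* Generic part, loosely modeled on struct sock. */
struct mini_sock {
    int family;
    int state;
};

/* Protocol-private object, loosely modeled on struct pppox_sock.
 * The generic struct must be the FIRST member, so a pointer to the
 * whole object is also a valid pointer to its generic part. */
struct mini_pppox_sock {
    struct mini_sock sk;        /* must stay first */
    uint16_t sid;               /* PPPoE session ID, filled in at connect time */
    unsigned char remote[6];    /* peer MAC address */
};

/* Loosely modeled on struct proto: for PPPoE, obj_size is the key field. */
struct mini_proto {
    const char *name;
    size_t obj_size;
};

static const struct mini_proto demo_pppoe_proto = {
    .name = "PPPOE",
    .obj_size = sizeof(struct mini_pppox_sock),
};

/* Generic allocator: it knows only obj_size, not the private layout. */
static struct mini_sock *mini_sk_alloc(const struct mini_proto *prot, int family)
{
    struct mini_sock *sk = calloc(1, prot->obj_size);   /* zero-filled */
    if (sk)
        sk->family = family;
    return sk;
}

/* Protocol-side accessor, analogous in spirit to pppox_sk(). */
static struct mini_pppox_sock *mini_pppox_sk(struct mini_sock *sk)
{
    return (struct mini_pppox_sock *)sk;
}
```

The generic code never needs to know about sid or remote; it allocates obj_size bytes, and only the protocol module interprets the tail of the object.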

Once a socket has been created, how do fops, proto_ops, sk_backlog_rcv and the rest work together to provide network communication? That is the subject of the following sections.

3. Active processes

An active process is one in which the application layer, through a system call, triggers the socket to perform some action. Some of these system calls resemble standard file operations, so they can be described directly by the fops of sockfd, for example read, write and ioctl. Others are unique to the socket interface and need their own system call entry points; Linux defines these with the SYSCALL_DEFINEn() macros, for example bind and accept. These system calls are generally simple and end up calling the interface functions in the socket's proto_ops.

As shown for the socket layer in the figure, not all file operations apply to sockets, so socket_file_ops specifies only some of them; socket.c also defines several socket-specific system calls, the familiar bind, listen, connect and accept. These entry points are generic, and they generally just invoke the corresponding proto_ops operation of the specific socket.
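The "generic entry point, protocol-specific method" pattern can be illustrated with a tiny user-space model. mini_socket, mini_proto_ops and sys_connect() are hypothetical names; the sketch only mirrors the idea that the generic layer forwards to sock->ops and reports an error when the protocol does not implement the operation.

```c
#include <errno.h>
#include <stddef.h>

struct mini_socket;

/* Loosely modeled on struct proto_ops: one function pointer per operation. */
struct mini_proto_ops {
    int (*connect)(struct mini_socket *sock, const void *addr, int len);
    int (*ioctl)(struct mini_socket *sock, unsigned int cmd, unsigned long arg);
};

/* Loosely modeled on struct sock (protocol-side state). */
struct mini_sock {
    int family;
};

/* Loosely modeled on struct socket. */
struct mini_socket {
    int state;
    int type;
    const struct mini_proto_ops *ops;   /* protocol-specific methods */
    struct mini_sock *sk;               /* protocol-side state */
};

/* Generic "system call" layer: knows nothing about the protocol, it only forwards. */
static int sys_connect(struct mini_socket *sock, const void *addr, int len)
{
    if (!sock->ops || !sock->ops->connect)
        return -EOPNOTSUPP;
    return sock->ops->connect(sock, addr, len);
}

/* A hypothetical protocol's connect implementation. */
static int demo_connect(struct mini_socket *sock, const void *addr, int len)
{
    (void)addr;
    (void)len;
    sock->state = 1;                    /* mark as "connected" */
    return 0;
}

static const struct mini_proto_ops demo_ops = { .connect = demo_connect, .ioctl = NULL };
```

In this sketch a protocol that leaves a slot NULL gets -EOPNOTSUPP from the generic layer; the kernel's PPPoE ops instead fill such slots with stubs like sock_no_bind().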

Within the protocol stack, the socket-specific proto_ops operations do most of the work, but complex protocols such as TCP need additional operations to support these interfaces; those are placed in the struct proto referenced from struct sock. PPPoE is simple enough that it needs no struct proto operations, but obj_size still matters, as described earlier.

As the PPPoE part of the figure shows, a protocol does not have to implement every operation; PPPoE does not implement bind or accept, for example. A few operations are examined below to show the active process of a socket in detail.

ioctl system call: ioctl is invoked through a standard file operation, as shown in the figure.

In the top-level sock_ioctl(), some special cases such as VLAN and bridge do not operate on the socket itself but call creation functions in the VLAN or bridge module. This looks a little out of place, but it is a pragmatic choice: it is convenient, and it keeps all network-related operations behind the socket interface.

pppoe_ioctl() performs the operation selected by cmd. One option worth noting is PPPIOCGCHAN, which turns the PPPoE socket into a special channel. It exists mainly so that PPPoE can serve the PPP protocol and has little to do with the network stack itself; it is examined later.

read system call: read is also a standard file operation, but note that in the network stack read is not the receive process. It only removes skbs from the sock's receive queue and hands them to the application layer, as shown in the figure. How those skbs get there is a complex passive process, described below.
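The split between the receive path (which fills the queue) and read() (which only drains it) can be shown with a small FIFO. This is a hypothetical stand-in for sk->sk_receive_queue, which in the kernel is a list of skbs; here packets are just strings and the capacity is fixed.

```c
#include <stddef.h>

#define RXQ_CAP 8   /* hypothetical fixed capacity */

/* A tiny ring-buffer FIFO standing in for a socket's receive queue. */
struct rx_queue {
    const char *pkts[RXQ_CAP];
    size_t head;    /* index of the oldest packet */
    size_t count;   /* number of queued packets */
};

/* Receive path (protocol side): append a packet; drop it if the queue is full. */
static int rxq_enqueue(struct rx_queue *q, const char *pkt)
{
    if (q->count == RXQ_CAP)
        return -1;
    q->pkts[(q->head + q->count) % RXQ_CAP] = pkt;
    q->count++;
    return 0;
}

/* read() side (application): take the oldest packet. No network I/O
 * happens here; an empty queue is reported as NULL, where a real read()
 * would block or fail with EAGAIN. */
static const char *rxq_dequeue(struct rx_queue *q)
{
    if (q->count == 0)
        return NULL;
    const char *pkt = q->pkts[q->head];
    q->head = (q->head + 1) % RXQ_CAP;
    q->count--;
    return pkt;
}
```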

connect system call: connect is one of the socket-specific system calls defined in socket.c. Its code is very simple and ends by calling the protocol stack's pppoe_connect(), one of the most important operations of the PPPoE protocol, as shown in the figure.

First, the question of addresses, a basic issue in network programming. Each protocol uses its own address structure. In the application layer, for readability, you can use the protocol-specific address structure, as long as it follows the standard layout (its first member is the family); it is then cast to struct sockaddr * and passed to the generic system call. When the protocol module's interface function is finally called, it casts the pointer back.
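The address-casting convention can be sketched in user space. struct demo_sockaddr and the two helper functions are hypothetical; the real PPPoE address type is struct sockaddr_pppox from <linux/if_pppox.h>, which is not reproduced here.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical protocol-specific address. Like every sockaddr variant,
 * it starts with the address family so generic code can read it. */
struct demo_sockaddr {
    sa_family_t   family;
    uint16_t      session_id;
    unsigned char remote_mac[6];
};

/* Generic layer: sees only a struct sockaddr * and a length. */
static int generic_peek_family(const struct sockaddr *addr, socklen_t len)
{
    if (len < (socklen_t)sizeof(sa_family_t))
        return -1;
    return addr->sa_family;
}

/* Protocol layer: validates the length, then casts back to its own type. */
static int demo_connect(const struct sockaddr *uaddr, socklen_t len, uint16_t *sid_out)
{
    if (len < (socklen_t)sizeof(struct demo_sockaddr))
        return -1;
    const struct demo_sockaddr *sp = (const struct demo_sockaddr *)uaddr;
    *sid_out = sp->session_id;
    return 0;
}
```

The length check before casting back matters: the generic layer cannot know how large the protocol's address really is, so the protocol must verify it.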

Looking at pppoe_connect(): it first obtains the pppox_sock pointer po from the sock pointer, which is easy because the two are allocated together (as described earlier), and also a pointer pn to the pppoe_net structure, which is shared by the whole protocol. It then copies the user-supplied addr data into the socket and calls set_item(), which, based on the addr information, stores the socket pointer in the protocol's global pppoe_net structure (this step is important for the receive process, covered later). Finally, it initializes the socket's own chan structure and calls ppp_register_net_channel(), which exists mainly to serve PPP.

4. Sending process

Sending is also an active process, but it is important enough in the protocol system to get its own section. In the socket framework, sending is done through the standard file operation write. The socket's write operation is sock_aio_write(), which eventually calls proto_ops->sendmsg(), that is, pppoe_sendmsg() in the PPPoE module, as shown in the figure.

pppoe_sendmsg() first gets the relevant information from the sock. The most important item is the dev device: for PPPoE the device is chosen by the user (supplied in useraddr), whereas protocols such as IP let the stack choose dev automatically from the protocol address. It then allocates an skb and builds the packet, which is the protocol-specific core of sending; PPPoE is so simple that it only needs to add a PPPoE header. Finally, it calls dev_queue_xmit(skb) directly to send the packet through the device.
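For reference, the header that has to be prepended is small. The sketch below builds an Ethernet header plus a PPPoE session header byte by byte, following the RFC 2516 layout (VER and TYPE both 1, CODE 0 for session data, then the session ID and payload length, all in network byte order). The function name and flat-buffer handling are assumptions for illustration; the kernel works with struct pppoe_hdr on an skb rather than a plain array.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

#define ETH_HDR_LEN          14
#define PPPOE_HDR_LEN        6
#define ETHERTYPE_PPPOE_SES  0x8864   /* same value as ETH_P_PPP_SES */

/* Build Ethernet + PPPoE session headers followed by the payload.
 * buf must hold ETH_HDR_LEN + PPPOE_HDR_LEN + payload_len bytes.
 * Returns the total frame length. */
static size_t build_pppoe_frame(uint8_t *buf,
                                const uint8_t dst[6], const uint8_t src[6],
                                uint16_t session_id,
                                const uint8_t *payload, uint16_t payload_len)
{
    uint16_t v;

    memcpy(buf, dst, 6);                     /* destination MAC (the peer) */
    memcpy(buf + 6, src, 6);                 /* source MAC (our device) */
    v = htons(ETHERTYPE_PPPOE_SES);
    memcpy(buf + 12, &v, 2);                 /* EtherType */

    uint8_t *ph = buf + ETH_HDR_LEN;
    ph[0] = 0x11;                            /* VER = 1 (high nibble), TYPE = 1 (low nibble) */
    ph[1] = 0x00;                            /* CODE = 0 for session data */
    v = htons(session_id);
    memcpy(ph + 2, &v, 2);                   /* SESSION_ID */
    v = htons(payload_len);
    memcpy(ph + 4, &v, 2);                   /* LENGTH: PPPoE payload only */

    memcpy(ph + PPPOE_HDR_LEN, payload, payload_len);
    return ETH_HDR_LEN + PPPOE_HDR_LEN + payload_len;
}
```

The LENGTH field counts only the PPP payload, not the Ethernet or PPPoE headers, so a 3-byte payload gives a 23-byte frame with LENGTH = 3.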

5. Summary of the network protocol stack structure

The question here is which layer PPPoE belongs to. It is usually called a link-layer protocol, but from the description above it is more accurate to say that PPPoE is a complete protocol, running from the application layer through the protocol module down to the device. In this sense it is on a par with the protocols of the inet domain, as shown in the figure.

A protocol here means the complete path from the application layer down to the physical device. Some protocols are similar to one another (for example TCP, UDP and raw IP, all built on IP), and they are grouped into a protocol domain. Protocol layering, by contrast, is a conceptual matter: PPPoE's main function lies in the link layer, so it is usually called a link-layer protocol, and in the narrow sense TCP and UDP are transport-layer protocols (while in the broad sense used above, TCP and UDP are complete protocols spanning the transport layer, the IP-based network layer, and the link layer).

This is a bit convoluted, but it does not matter much. What matters is understanding that the job of the protocol stack is to take data from the socket interface, encapsulate it into a given packet structure, and finally send it out through the physical device (receiving is the reverse). The concrete implementation depends on the characteristics of each protocol; for complex protocols, layering is the better choice.

Some protocols are special. VLAN, mentioned earlier, never even enters the protocol stack: it is converted to plain Ethernet in the device driver layer, so the stack needs no processing interface for it. A typical comparable case is ICMP, which can be either a complete protocol called from the application layer (as in the classic ping program) or merely an auxiliary protocol for TCP, processed only by TCP and invisible to the application layer. PPPoE is similar: this article describes how it works as a complete protocol, and it can also serve as the underlying basis of the PPP protocol, whose implementation the next article describes.

6. Passive process: receiving

Receiving is a passive process that starts at the device layer, usually triggered by interrupts, and its implementation is much more complex than sending. In the protocol stack it is also very asymmetric with the sending process. When sending, the host is in control; when receiving, one packet faces many possible receiving modules (one-to-many), and the stack has to analyze the information in the packet bit by bit to find the right receiver.

The framework of the receiving process is given first, and then its implementation is analyzed step by step, as shown in the figure.

Ignoring the orange part of the figure: a receive is triggered by an interrupt from the physical device. The device driver processes it, produces the standard data structure sk_buff (skb) used throughout the protocol stack, and hands the data to the right protocol according to a special global data structure, packet_type. The protocol processes the skb according to its own design and, through the global variable xx_net_id and its own private data structure xx_net, finds the application-layer socket the packet belongs to and places it on that socket's receive queue. Finally, at some point, the application layer reads the data with a read system call (as described in section 3).

6.1 Processing in the device driver layer

Receiving in the device driver layer was described in the previous chapter. It is usually triggered by a hardware interrupt and then handled either in interrupt mode or in NAPI mode. Either way, the fundamental task is the same: using prior knowledge of the device (an Ethernet driver, for example, knows the basic structure of an Ethernet frame), convert the raw received data into the standard skb structure the protocol stack understands (which makes the underlying device transparent to the upper layers), and submit it to the appropriate protocol. Two questions follow: what is an skb, and what must be prepared in it? And how does the driver know whom to submit it to?

Preparing the skb. First look at the composition of sk_buff, shown in the figure. The skb is only a control structure; the actual data lives in a separate data buffer and is indexed by a series of skb fields (see the right side of the figure). Some of these fields, such as head, end, data and tail, are fixed when the buffer is allocated and the data copied in. Others have to be determined by parsing: mac_header is generally set in the device driver, while network_header and transport_header are set by the protocol stack, and each protocol handles them differently; PPPoE's headers are trivial, whereas TCP's header information is complex. From skb->data and the header lengths the position of the application data can be found, so the application layer reads only application data.
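The head/data/tail/end bookkeeping can be modeled with plain offsets. toy_skb and its helpers are hypothetical; their names echo skb_reserve(), skb_put(), skb_push() and skb_pull(), but this is a simplified user-space illustration with a fixed 256-byte buffer, not the kernel's implementation.

```c
#include <stddef.h>
#include <string.h>

/* A toy sk_buff: one buffer plus offsets, with head <= data <= tail <= end. */
struct toy_skb {
    unsigned char buf[256];
    size_t head, data, tail, end;
};

static void toy_init(struct toy_skb *skb)
{
    skb->head = skb->data = skb->tail = 0;
    skb->end = sizeof(skb->buf);
}

/* Leave headroom so lower-layer headers can be prepended later. */
static void toy_reserve(struct toy_skb *skb, size_t len)
{
    skb->data += len;
    skb->tail += len;
}

/* Append len bytes at the tail; returns the new area, or NULL if no room. */
static unsigned char *toy_put(struct toy_skb *skb, size_t len)
{
    if (skb->tail + len > skb->end)
        return NULL;
    unsigned char *p = skb->buf + skb->tail;
    skb->tail += len;
    return p;
}

/* Prepend len bytes before data (e.g. adding a header on transmit). */
static unsigned char *toy_push(struct toy_skb *skb, size_t len)
{
    if (skb->data < skb->head + len)
        return NULL;
    skb->data -= len;
    return skb->buf + skb->data;
}

/* Strip len bytes from the front (e.g. consuming a header on receive). */
static unsigned char *toy_pull(struct toy_skb *skb, size_t len)
{
    if (skb->data + len > skb->tail)
        return NULL;
    skb->data += len;
    return skb->buf + skb->data;
}

static size_t toy_len(const struct toy_skb *skb)
{
    return skb->tail - skb->data;
}
```

Reserving headroom first is what lets each lower layer push its header in front of the payload without copying the data.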

Other skb fields are important too. vlan_tci holds the VLAN ID; its use was discussed earlier. The dev field runs through the entire process, because much of the information in that large structure is used throughout the network system. Note that dev is set by the device driver and is generally the physical device that received the packet. In Linux, however, a network device is represented by a net_device structure, and one physical device can have several protocol devices on top of it; this is obvious with VLANs and bridges, and it is also a key point in the PPP protocol, discussed later. The sk field indicates which application-level socket the packet belongs to; it is filled in by the specific protocol, as described later. The protocol field is the focus of this section; it is determined by bytes in the MAC header.

The device driver cares only about the MAC header, the first part of the packet. As mentioned earlier, this requires prior knowledge: an Ethernet driver, for example, knows in advance that the Ethernet header consists of the destination MAC, the source MAC and a two-byte protocol field. The receive path of the RTL8012 driver (drivers/net/ethernet/realtek/atp.c in the kernel source) is an example.
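A minimal version of that receive-side parsing reads the two-byte protocol field of a raw Ethernet frame and reports where the payload starts. parse_eth() is a hypothetical helper that plays the role eth_type_trans() plays in real drivers, but it ignores VLAN tags and 802.3 length-field frames.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN_DEMO 14

/* Read the protocol field of a raw Ethernet frame (returned in host byte
 * order) and the offset at which the payload starts. */
static int parse_eth(const uint8_t *frame, size_t len, uint16_t *proto, size_t *payload_off)
{
    if (len < ETH_HLEN_DEMO)
        return -1;                                       /* runt frame */
    /* bytes 0-5: destination MAC, 6-11: source MAC, 12-13: protocol */
    *proto = (uint16_t)((frame[12] << 8) | frame[13]);   /* big-endian on the wire */
    *payload_off = ETH_HLEN_DEMO;
    return 0;
}
```

A frame carrying 0x88 0x63 in bytes 12-13 yields protocol 0x8863, the PPPoE discovery type, and a payload offset of 14.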

Submitting to the protocol stack. The key is the skb->protocol field, together with another important data structure, packet_type.

The final submission from the device driver goes through netif_receive_skb(). It traverses the packet_type entries in the system, finds those with the same protocol and a matching dev (a packet_type whose dev is NULL matches any device), and calls the func handler of each match; for example, the func of ip_packet_type is ip_rcv(). At that point the skb has entered the protocol stack.

The global packet_type entries are kept in two places: handlers registered for ETH_P_ALL (such as packet sniffers) go on the global list ptype_all, and all others go into the global hash table ptype_base[], where entries of the same type are chained together, which makes traversal efficient.

Where do these global packet_type structures come from? Look back at the figure in section 1: one of the four functions on the left is dev_add_pack(struct packet_type *). When a protocol module is loaded, it calls this function to register its own packet_type structure with the kernel, and the func handler in that structure is defined by the protocol itself.
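Registration and dispatch can be modeled as a hash of chained handlers. Everything here is hypothetical: demo_packet_type, demo_add_pack() and demo_deliver() only mimic the idea of dev_add_pack() and the dispatch step in netif_receive_skb(); ifindex 0 stands in for a NULL dev (any device), and the ETH_P_ALL list is not modeled. The two handlers use the PPPoE discovery (0x8863) and session (0x8864) types.

```c
#include <stddef.h>
#include <stdint.h>

#define PTYPE_HASH_SIZE_DEMO 16   /* hypothetical table size */

/* Loosely modeled on struct packet_type. */
struct demo_packet_type {
    uint16_t type;                               /* protocol, e.g. 0x8864 */
    int ifindex;                                 /* 0 = any device, else only this one */
    int (*func)(const void *pkt, int ifindex);   /* the protocol's receive handler */
    struct demo_packet_type *next;               /* hash-chain link */
};

static struct demo_packet_type *ptype_hash[PTYPE_HASH_SIZE_DEMO];

/* Chain the handler into the bucket for its protocol type. */
static void demo_add_pack(struct demo_packet_type *pt)
{
    struct demo_packet_type **head = &ptype_hash[pt->type % PTYPE_HASH_SIZE_DEMO];
    pt->next = *head;
    *head = pt;
}

/* Call every handler whose type matches and whose device is "any" or the
 * receiving one. Returns the number of handlers invoked. */
static int demo_deliver(uint16_t type, const void *pkt, int ifindex)
{
    int delivered = 0;
    for (struct demo_packet_type *pt = ptype_hash[type % PTYPE_HASH_SIZE_DEMO]; pt; pt = pt->next) {
        if (pt->type == type && (pt->ifindex == 0 || pt->ifindex == ifindex)) {
            pt->func(pkt, ifindex);
            delivered++;
        }
    }
    return delivered;
}

/* Two hypothetical handlers standing in for PPPoE discovery and session. */
static int disc_hits, sess_hits;
static int demo_disc_rcv(const void *pkt, int ifindex) { (void)pkt; (void)ifindex; return ++disc_hits; }
static int demo_sess_rcv(const void *pkt, int ifindex) { (void)pkt; (void)ifindex; return ++sess_hits; }
```

With the discovery handler registered for any device and the session handler bound to device 2, a session packet arriving on device 5 reaches no handler, while the same packet on device 2 reaches exactly one.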

Finally, open if_ether.h and you will find the protocols currently defined: ETH_P_IP, ETH_P_ARP, ETH_P_8021Q, ETH_P_PPP_SES, ETH_P_PPP_DISC and so on. Viewed through traditional protocol layering this looks messy, mixing network-layer and link-layer entries and even listing two entries for the same protocol, but with the concept from section 5 it is easy to understand. Now look at which modules register them. TCP and UDP are both built on IP, so a single registration by the inet domain module is enough. ARP also belongs to the inet domain, but it needs its own packet_type. PPPoE is a single protocol, but it has two stages (discovery and session), so it has two different packet_types. This shows how flexible the implementation is: it adapts to the characteristics of each protocol.

6.2 Protocol stack receive processing

Protocol stack processing is determined by each protocol. TCP processing, for example, is quite complex, whereas PPPoE processing is very simple; that lets us skip the details and see the outline of the process more clearly, as shown in the figure.

PPPoE does almost no processing: it just sets the skb's network_header and transport_header, uses get_item() to find the socket the packet belongs to, and submits it directly to the upper layer. The right side of the figure shows, by contrast, the typical TCP receive process, which is quite complex; there the TCP and IP layers also need additional private data structures to connect them.

The submission function is sk_backlog_rcv, here pppoe_rcv_core(sk, skb). It first checks whether the data is for a PPP channel and, if so, hands it to the PPP protocol. Otherwise, it calls sock_queue_rcv_skb(sk, skb) to place the skb on the socket's receive queue.

Matching the application-layer socket: after the protocol stack has processed the packet, it must determine which socket the packet belongs to. The kernel has a complete mechanism for this, whose framework is shown in the figure.

First, the kernel has a global structure, net_generic, whose most important member is an array of pointers. When each protocol module is loaded, it calls register_pernet_device(struct pernet_operations *) (see the figure in section 1). The two key fields of pernet_operations are size, which tells the kernel how large a private data structure to allocate for the module (for PPPoE, struct pppoe_net), and id, the module's xx_net_id, which tells the kernel to make net_generic.ptr[xx_net_id] point to that structure. In this way each protocol module can easily find the private structure the kernel allocated for it through its own xx_net_id. Finally, the protocol's private structure typically contains an array of pointers that indexes each socket belonging to the protocol.
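The id/size handshake can be sketched in user space with two namespaces, which also anticipates section 6.3. The structures and functions below are hypothetical stand-ins for pernet_operations, struct net, net_generic and register_pernet_device(); they keep only the idea that a module gets one id, and every namespace gets its own zeroed block of size bytes at that slot.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_PERNET_IDS 8   /* hypothetical */

/* Loosely modeled on struct pernet_operations: the module supplies a size
 * and a place for the "kernel" to write its id. */
struct demo_pernet_ops {
    unsigned int *id;
    size_t size;
};

/* Loosely modeled on struct net plus its net_generic pointer array. */
struct demo_net {
    void *gen_ptr[MAX_PERNET_IDS];
};

static const struct demo_pernet_ops *registered[MAX_PERNET_IDS];
static unsigned int next_id = 1;   /* slot 0 is left unused in this model */

/* Register a module: assign it the next free id. */
static int demo_register_pernet(const struct demo_pernet_ops *ops)
{
    if (next_id >= MAX_PERNET_IDS)
        return -1;
    *ops->id = next_id;
    registered[next_id] = ops;
    next_id++;
    return 0;
}

/* Create one namespace: allocate a private block for every registered module. */
static int demo_net_init(struct demo_net *net)
{
    for (unsigned int i = 1; i < next_id; i++) {
        net->gen_ptr[i] = calloc(1, registered[i]->size);
        if (!net->gen_ptr[i])
            return -1;
    }
    return 0;
}

static void demo_net_exit(struct demo_net *net)
{
    for (unsigned int i = 1; i < next_id; i++) {
        free(net->gen_ptr[i]);
        net->gen_ptr[i] = NULL;
    }
}

/* Analogous in spirit to net_generic(net, id). */
static void *demo_net_generic(const struct demo_net *net, unsigned int id)
{
    return net->gen_ptr[id];
}
```

After registration, demo_net_generic(&ns1, id) and demo_net_generic(&ns2, id) return different blocks: one module id, but separate private state in each namespace.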

The workflow is straightforward; in practice it rests on two functions, shown in the figure.

Even without the details, the prototypes of the two functions are enough to understand them. Their pn parameter is obtained from the global net_generic structure and the protocol's private xx_net_id, as described above. set_item() is called from connect (see section 3). It takes the session ID and remote MAC stored in the pppox_sock (both supplied through *useraddr, as the protocol analysis in the next article explains), computes a hash value hashint from them, and makes pn->hash_ptr[hashint] point to the socket. Conversely, on receive, the same two values are taken from the packet's protocol header, the same hashint is computed, and the corresponding socket is easy to find.
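The set_item()/get_item() pair can be illustrated with a chained hash table keyed on (session ID, remote MAC). The hash function, table size and all names below are assumptions; the kernel's PPPoE code uses its own hash and locking, which are not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HASH_BITS 4
#define DEMO_HASH_SIZE (1u << DEMO_HASH_BITS)

/* Per-socket key fields, as stored at connect() time. */
struct demo_po {
    uint16_t sid;                 /* PPPoE session ID */
    unsigned char remote[6];      /* peer MAC */
    struct demo_po *next;         /* hash-chain link */
};

/* Per-namespace table, in the role of the pppoe_net hash array. */
struct demo_pn {
    struct demo_po *hash_table[DEMO_HASH_SIZE];
};

/* Hypothetical hash: XOR-fold the session ID and MAC into DEMO_HASH_BITS bits. */
static unsigned int demo_hash(uint16_t sid, const unsigned char remote[6])
{
    unsigned int h = (sid & 0xffu) ^ (sid >> 8);
    for (int i = 0; i < 6; i++)
        h ^= remote[i];
    return (h ^ (h >> DEMO_HASH_BITS)) & (DEMO_HASH_SIZE - 1);
}

/* connect() path: insert the socket, refusing a duplicate (sid, remote) pair. */
static int demo_set_item(struct demo_pn *pn, struct demo_po *po)
{
    unsigned int h = demo_hash(po->sid, po->remote);
    for (struct demo_po *p = pn->hash_table[h]; p; p = p->next)
        if (p->sid == po->sid && memcmp(p->remote, po->remote, 6) == 0)
            return -1;            /* already bound */
    po->next = pn->hash_table[h];
    pn->hash_table[h] = po;
    return 0;
}

/* Receive path: look up using fields taken from the incoming packet's headers. */
static struct demo_po *demo_get_item(struct demo_pn *pn, uint16_t sid, const unsigned char remote[6])
{
    for (struct demo_po *p = pn->hash_table[demo_hash(sid, remote)]; p; p = p->next)
        if (p->sid == sid && memcmp(p->remote, remote, 6) == 0)
            return p;
    return NULL;
}
```

Because both paths compute the hash from the same two fields, a packet whose header carries a bound (session ID, MAC) pair lands in the same bucket as the socket that connect() stored.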

Each protocol uses different methods and parameters, but the idea is the same: based on the protocol's own parameters (such as the TCP connection 4-tuple), when the socket is created or connected (that is, before any data is received), a pointer to it is stored in the protocol's private xx_net according to some algorithm, so that it can be found from the protocol parameters of each datagram when it arrives.

6.3 Network namespaces

The socket indexing method above takes a detour: each protocol's xx_net structure could simply be allocated by the protocol module itself, and indexing would be just as convenient without the global net_generic. The kernel does it this way for another purpose, namely network namespaces, a mechanism for virtualizing the system for multiple users. I have not studied it closely, and at the time of writing the kernel's namespace support did not yet seem complete.

With network namespaces, a protocol module does not have just one xx_net private structure. The number is decided by the kernel as a whole: every time a new namespace is created (somewhat like a virtual machine), a new xx_net structure is allocated for each module. As a result, sockets in different namespaces can use identical connection parameters and still be looked up as different sockets.

This is why many of the functions above take a net parameter. The main implementation is in net_namespace.c.

7. Summary

This article used the PPPoE protocol to study how the Linux network stack is implemented. Because PPPoE itself is very simple and its code is small, it makes the outline of a protocol implementation easier to grasp. The Linux network stack, inherited from UNIX, is organized around the socket: creation, protocol connection, active processes, the matching mechanism, passive processes, and so on.

Note that in practice applications rarely communicate directly over PPPoE; it is used mainly as the underlying basis of the PPP protocol, which requires some techniques in the protocol implementation to support. The next article describes them.
