In-Depth Analysis of the Linux Kernel Linked List

Last Update:2017-08-16 Source: Internet

Author: User




Yang Sha Zhou (pubb@163.net)
Computer College, National Defense University of Science and Technology

This article analyzes the implementation of the linked list structure in the 2.6.x kernel in detail and walks through each list operation interface with a concrete example.

I. Data Structure of Linked Lists
A linked list is a commonly used data structure for organizing ordered data. It connects a series of data nodes into a chain through pointers and is one of the main ways to implement a linear list. Compared with an array, a linked list is more dynamic: you do not need to know the total amount of data in advance when creating it, space can be allocated on demand, and nodes can be inserted or deleted efficiently at any position. The costs of a linked list are mainly sequential access and the space consumed by the link pointers.

A linked list node should generally contain at least two fields: a data field and a pointer field. The data field stores the data, and the pointer field establishes the connection to the next node. Depending on how the pointer fields are organized and how nodes are connected, linked lists can be divided into singly linked lists, doubly linked lists, circular linked lists, and other types. The common types are described below:

1. Singly linked list

Figure 1 Singly linked list

A singly linked list is the simplest kind of linked list. It has only one pointer field, which points to the successor node, so a singly linked list can only be traversed in order from head to tail (the tail is usually marked by a NULL pointer).

2. Doubly linked list

Figure 2 Doubly linked list

With two pointer fields, a doubly linked list can be traversed in both directions, which is what distinguishes it from a singly linked list. If the predecessor/successor relationship between the two pointers is dropped, the same node layout can form a binary tree; if the predecessor of the first node points to the tail node and the successor of the tail node points to the first node (the dotted lines in Figure 2), the result is a circular linked list; and with more pointer fields, a variety of more complex tree structures can be built.

3. Circular linked list
A circular linked list is one whose tail node points back to the first node. The doubly linked circular list was already shown in the previous subsection; its defining property is that, starting from any node and moving in either direction, every item in the list can be reached. Removing the predecessor pointer gives a singly linked circular list.

The Linux kernel uses linked lists extensively to organize data, including device lists and the data kept by various functional modules. Most of these lists use the implementation in [include/linux/list.h]. The following sections describe the organization and use of this data structure through examples.

II. Implementation of the Linux 2.6 Kernel Linked List Data Structure
Although this article uses the 2.6 kernel as its basis, the linked list structure in the 2.4 kernel is essentially the same. The difference is that 2.6 adds two extended list data structures: the read-copy update (RCU) list and the hash list (hlist). Both extensions build on the basic list structure, so this article mainly covers the basic list and then briefly introduces RCU and hlist.

The definition of the linked list data structure is very simple (excerpted from [include/linux/list.h]; unless stated otherwise, all code below is taken from this file):

struct list_head {
    struct list_head *next, *prev;
};

The list_head structure contains two pointers to list_head structures, prev and next, so the kernel's list can act as a doubly linked list. In practice it is usually organized as a doubly linked circular list.

Unlike the doubly linked list model introduced in section I, list_head has no data field. In the Linux kernel, the list structure does not contain the data; instead, the data structure contains a list node.

In data structure textbooks, the classic definition of a linked list looks like this (using a singly linked list as an example):

struct list_node {
    struct list_node *next;
    ElemType data;
};

Because of ElemType, every data item type needs its own list structure definition. Experienced C++ programmers will know that the Standard Template Library uses C++ templates to abstract list operation interfaces that are independent of the data item type.

In the Linux kernel, a data structure that needs to be organized in a list usually contains a struct list_head member. For example, [include/linux/netfilter.h] defines an nf_sockopt_ops structure that describes the getsockopt/setsockopt interface Netfilter provides for a protocol family. It has a member (struct list_head list), and the nf_sockopt_ops structures of all protocol families are linked into one list through this member. The list head is nf_sockopts (a struct list_head), defined in [net/core/netfilter.c]. As we can see, this generic list structure avoids having to define a separate list for each data item type. Linux's simple, practical approach, which does not chase perfection or standard style, is fully reflected here.

Figure 3 nf_sockopts linked list
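The idea of embedding the node in the data can be sketched in user space as follows. This is only an illustration under stated assumptions: struct list_head is copied by hand rather than included from the kernel, and sockopt_entry, device_entry, link_after() and list_count() are hypothetical names that do not appear in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written user-space copy of the kernel's node type. */
struct list_head {
    struct list_head *next, *prev;
};

/* Two unrelated, hypothetical record types; each embeds a list_head. */
struct sockopt_entry {
    struct list_head list;
    int pf;
};

struct device_entry {
    int id;
    struct list_head list;      /* the member does not have to come first */
};

/* One generic routine counts the nodes of any list, whatever embeds them. */
static size_t list_count(const struct list_head *head)
{
    const struct list_head *p;
    size_t n = 0;

    for (p = head->next; p != head; p = p->next)
        n++;
    return n;
}

/* Link node in right after head (hypothetical helper). */
static void link_after(struct list_head *node, struct list_head *head)
{
    node->next = head->next;
    node->prev = head;
    head->next->prev = node;
    head->next = node;
}

/* Returns count(sockopts) * 10 + count(devices). */
size_t demo_generic_count(void)
{
    struct list_head sockopts = { &sockopts, &sockopts };
    struct list_head devices = { &devices, &devices };
    struct sockopt_entry a = { .pf = 2 }, b = { .pf = 10 };
    struct device_entry d = { .id = 7 };

    link_after(&a.list, &sockopts);
    link_after(&b.list, &sockopts);
    link_after(&d.list, &devices);

    assert(sockopts.next == &b.list);   /* last linked node is first */
    return list_count(&sockopts) * 10 + list_count(&devices);
}
```

The same list_count() works for both record types because it only ever touches the embedded list_head, which is the point of the kernel's design.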

III. Linked List Operation Interface

1. Declaration and initialization
In fact, Linux only defines the list node and does not define a dedicated list head. So how is a list created? Let's look at the LIST_HEAD() macro:

#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)

When we declare a list head named nf_sockopts with LIST_HEAD(nf_sockopts), its next and prev pointers are both initialized to point to itself, which gives us an empty list. Linux decides whether a list is empty by checking whether the head's next pointer points back to the head:

static inline int list_empty(const struct list_head *head)
{
    return head->next == head;
}

Besides the LIST_HEAD() macro, which initializes a list at declaration, Linux also provides the INIT_LIST_HEAD macro for initializing a list at run time:

#define INIT_LIST_HEAD(ptr) do { \
    (ptr)->next = (ptr); (ptr)->prev = (ptr); \
} while (0)

It is used as INIT_LIST_HEAD(&nf_sockopts).
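The two initialization styles can be compared side by side in a small user-space sketch. The macros below are retyped from the definitions quoted above; static_head and demo_init() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct list_head {
    struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)
#define INIT_LIST_HEAD(ptr) do { \
    (ptr)->next = (ptr); (ptr)->prev = (ptr); \
} while (0)

static inline int list_empty(const struct list_head *head)
{
    return head->next == head;
}

/* Initialized at declaration: no code has to run before the head is usable. */
static LIST_HEAD(static_head);

/* Returns 1 if both heads report an empty list, 0 otherwise. */
int demo_init(void)
{
    struct list_head *dyn = malloc(sizeof(*dyn));
    int ok;

    if (!dyn)
        return 0;
    INIT_LIST_HEAD(dyn);            /* run-time initialization */
    assert(dyn->prev == dyn);
    ok = list_empty(&static_head) && list_empty(dyn);
    free(dyn);
    return ok;
}
```

LIST_HEAD() suits heads with static storage, while INIT_LIST_HEAD() is needed when the head lives in memory that is allocated later, for example inside a dynamically allocated structure.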

2. Insert/delete/merge
A) Insert

Two insert operations are available: insertion at the head of the list and insertion at the tail. Linux provides one interface for each:

static inline void list_add(struct list_head *new, struct list_head *head);
static inline void list_add_tail(struct list_head *new, struct list_head *head);

Because the Linux list is circular, and the head's next and prev point to the first and last nodes respectively, list_add and list_add_tail differ very little. In fact, Linux uses

__list_add(new, head, head->next);

and

__list_add(new, head->prev, head);

to implement the two interfaces. In other words, a head insert places the node after head, and a tail insert places it after head->prev.

Suppose a new nf_sockopt_ops variable new_sockopt needs to be added at the head of the nf_sockopts list. We would write:

list_add(&new_sockopt.list, &nf_sockopts);

Note that what the nf_sockopts list records is not the address of new_sockopt but the address of its list member. How, then, do we get to new_sockopt through the list? This is explained in detail below.
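As a rough user-space sketch of the two insert interfaces, the code below rebuilds them on top of one three-argument helper. The helper is named list_add_between() here because identifiers starting with two underscores are reserved outside the kernel; struct item and demo_insert_order() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

#define LIST_HEAD(name) struct list_head name = { &(name), &(name) }

/* Link node between two adjacent nodes (the role of the kernel's __list_add). */
static inline void list_add_between(struct list_head *node,
                                    struct list_head *prev,
                                    struct list_head *next)
{
    next->prev = node;
    node->next = next;
    node->prev = prev;
    prev->next = node;
}

/* Head insert: node goes right after head. */
static inline void list_add(struct list_head *node, struct list_head *head)
{
    list_add_between(node, head, head->next);
}

/* Tail insert: node goes right after head->prev, that is, just before head. */
static inline void list_add_tail(struct list_head *node, struct list_head *head)
{
    list_add_between(node, head->prev, head);
}

struct item {                   /* hypothetical payload type */
    int id;
    struct list_head list;
};

/* Builds a list and returns its ids in order, packed as decimal digits. */
int demo_insert_order(void)
{
    LIST_HEAD(head);
    struct item a = { .id = 1 }, b = { .id = 2 }, c = { .id = 3 };
    struct list_head *p;
    int digits = 0;

    list_add(&a.list, &head);       /* list: 1 */
    list_add(&b.list, &head);       /* list: 2 1 */
    list_add_tail(&c.list, &head);  /* list: 2 1 3 */
    assert(head.prev == &c.list);

    for (p = head.next; p != &head; p = p->next) {
        struct item *it =
            (struct item *)((char *)p - offsetof(struct item, list));
        digits = digits * 10 + it->id;
    }
    return digits;
}
```

Repeated list_add() calls behave like a stack (last in, first out), while repeated list_add_tail() calls behave like a queue.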

B) Delete

static inline void list_del(struct list_head *entry);

To remove the new_sockopt entry we added to the nf_sockopts list:

list_del(&new_sockopt.list);

After removal, the prev and next pointers of new_sockopt.list are set to two special values, LIST_POISON2 and LIST_POISON1 respectively. This ensures that a node no longer on a list cannot be followed by accident: dereferencing LIST_POISON1 or LIST_POISON2 causes a page fault. By contrast, list_del_init() unlinks the node and then calls INIT_LIST_HEAD() to leave it in the empty-list state.
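The difference between list_del() and list_del_init() can be observed in a user-space sketch like the one below. The functions are re-implemented by hand, list_unlink() is a hypothetical name for the internal unlink step, and the numeric poison values should be treated as illustrative placeholders rather than as a statement about any particular kernel version.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Illustrative poison values; the pointers are compared, never dereferenced. */
#define LIST_POISON1 ((void *)0x00100100)
#define LIST_POISON2 ((void *)0x00200200)

#define INIT_LIST_HEAD(ptr) do { \
    (ptr)->next = (ptr); (ptr)->prev = (ptr); \
} while (0)

static inline void list_add(struct list_head *node, struct list_head *head)
{
    node->next = head->next;
    node->prev = head;
    head->next->prev = node;
    head->next = node;
}

/* Make the two neighbours point at each other (hypothetical helper name). */
static inline void list_unlink(struct list_head *prev, struct list_head *next)
{
    next->prev = prev;
    prev->next = next;
}

static inline void list_del(struct list_head *entry)
{
    list_unlink(entry->prev, entry->next);
    entry->next = LIST_POISON1;
    entry->prev = LIST_POISON2;
}

static inline void list_del_init(struct list_head *entry)
{
    list_unlink(entry->prev, entry->next);
    INIT_LIST_HEAD(entry);
}

/* Returns a bit mask: 1 = head empty again, 2 = node poisoned, 4 = node reset. */
int demo_delete(void)
{
    struct list_head head, node;
    int flags = 0;

    INIT_LIST_HEAD(&head);
    list_add(&node, &head);
    assert(head.next == &node);

    list_del(&node);
    if (head.next == &head && head.prev == &head)
        flags |= 1;
    if (node.next == LIST_POISON1 && node.prev == LIST_POISON2)
        flags |= 2;

    list_add(&node, &head);
    list_del_init(&node);
    if (node.next == &node && node.prev == &node)
        flags |= 4;
    return flags;
}
```

A node removed with list_del_init() can safely be tested with list_empty() or added to another list later, whereas a poisoned node is meant to fault if anything tries to follow its pointers.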

C) Move

Linux provides an operation that moves a node from one list to another, with two variants depending on where the node is inserted in the new list:

static inline void list_move(struct list_head *list, struct list_head *head);
static inline void list_move_tail(struct list_head *list, struct list_head *head);

For example, list_move(&new_sockopt.list, &nf_sockopts) removes new_sockopt from whatever list it is on and links it in at the head of nf_sockopts.
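A move is simply an unlink followed by an insert, which the following user-space sketch makes explicit. The helpers are hand-written stand-ins for the kernel functions, and demo_move() is a hypothetical test driver.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

#define INIT_LIST_HEAD(ptr) do { \
    (ptr)->next = (ptr); (ptr)->prev = (ptr); \
} while (0)

static inline int list_empty(const struct list_head *head)
{
    return head->next == head;
}

static inline void list_add_between(struct list_head *node,
                                    struct list_head *prev,
                                    struct list_head *next)
{
    next->prev = node;
    node->next = next;
    node->prev = prev;
    prev->next = node;
}

static inline void list_add(struct list_head *node, struct list_head *head)
{
    list_add_between(node, head, head->next);
}

/* Unlink from the current list, then insert at the head of another. */
static inline void list_move(struct list_head *list, struct list_head *head)
{
    list->next->prev = list->prev;
    list->prev->next = list->next;
    list_add_between(list, head, head->next);
}

/* Same, but insert at the tail. */
static inline void list_move_tail(struct list_head *list, struct list_head *head)
{
    list->next->prev = list->prev;
    list->prev->next = list->next;
    list_add_between(list, head->prev, head);
}

/* Returns 1 if both moves leave the lists in the expected shape. */
int demo_move(void)
{
    struct list_head src, dst, a, b;
    int ok;

    INIT_LIST_HEAD(&src);
    INIT_LIST_HEAD(&dst);
    list_add(&a, &src);             /* src: a */
    list_add(&b, &dst);             /* dst: b */

    list_move(&a, &dst);            /* src: empty, dst: a b */
    assert(list_empty(&src));
    ok = dst.next == &a && a.next == &b && dst.prev == &b;

    list_move_tail(&a, &dst);       /* dst: b a */
    ok = ok && dst.next == &b && b.next == &a && dst.prev == &a;
    return ok;
}
```

Note that the second call moves a node within the same list, which works because the node is fully unlinked before it is reinserted.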

D) Merge

Besides inserting and deleting single nodes, the Linux list also supports inserting an entire list into another:

static inline void list_splice(struct list_head *list, struct list_head *head);

Suppose there are two lists with heads list1 and list2 (both struct list_head variables). Calling list_splice(&list1, &list2) does nothing if list1 is empty; otherwise the contents of list1 are linked into list2 between list2 and list2.next (the original first node of list2). The new list2 has the original first node of list1 as its first node, and its tail node is unchanged, as shown in Figure 4 (dashed arrows are next pointers):

Figure 4 list_splice(&list1, &list2)

After list1 has been spliced into list2, the next and prev of the old head list1 still point to the original nodes. To avoid confusion, Linux provides list_splice_init():

static inline void list_splice_init(struct list_head *list, struct list_head *head);

After merging list into head, this function calls INIT_LIST_HEAD(list) to reset list to an empty list.
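The pointer updates behind a splice can be sketched as follows. This is a hand-written user-space version that follows the description above; struct item and demo_splice() are hypothetical, and the four-pointer update in list_splice_init() is one straightforward way to implement it rather than a quotation of the kernel source.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

#define LIST_HEAD(name) struct list_head name = { &(name), &(name) }
#define INIT_LIST_HEAD(ptr) do { \
    (ptr)->next = (ptr); (ptr)->prev = (ptr); \
} while (0)

static inline int list_empty(const struct list_head *head)
{
    return head->next == head;
}

static inline void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Link all nodes of list in between head and head->next, then reset list. */
static inline void list_splice_init(struct list_head *list,
                                    struct list_head *head)
{
    if (!list_empty(list)) {
        struct list_head *first = list->next;
        struct list_head *last = list->prev;
        struct list_head *at = head->next;

        first->prev = head;
        head->next = first;
        last->next = at;
        at->prev = last;
        INIT_LIST_HEAD(list);
    }
}

struct item {                   /* hypothetical payload type */
    int id;
    struct list_head list;
};

/* Returns the ids of list2 after the splice, packed as decimal digits. */
int demo_splice(void)
{
    LIST_HEAD(list1);
    LIST_HEAD(list2);
    struct item n[4] = { { .id = 1 }, { .id = 2 }, { .id = 3 }, { .id = 4 } };
    struct list_head *p;
    int digits = 0;

    list_add_tail(&n[0].list, &list1);      /* list1: 1 2 */
    list_add_tail(&n[1].list, &list1);
    list_add_tail(&n[2].list, &list2);      /* list2: 3 4 */
    list_add_tail(&n[3].list, &list2);

    list_splice_init(&list1, &list2);       /* list2: 1 2 3 4 */
    assert(list_empty(&list1));

    for (p = list2.next; p != &list2; p = p->next) {
        struct item *it =
            (struct item *)((char *)p - offsetof(struct item, list));
        digits = digits * 10 + it->id;
    }
    return digits;
}
```

Only four pointers change regardless of how long either list is, so a splice takes constant time.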

3. Traversal
Traversal is one of the most common list operations. To make traversal convenient for kernel code, the Linux list abstracts it into several macros. Before introducing these macros, let's look at how to get from a list node to the data item we actually need.

A) From list node to data item variable

We know that a Linux list only stores the address of the list_head member inside each data item. How do we get from the list_head member to the structure that owns it? Linux provides the list_entry(ptr, type, member) macro for this. Here ptr is a pointer to the list_head member inside the data item (that is, the address stored in the list), type is the type of the data item, and member is the name of the list_head member in that type's definition. For example, to access the first nf_sockopt_ops variable on the nf_sockopts list:

list_entry(nf_sockopts.next, struct nf_sockopt_ops, list);

Here "list" is the name of the member that nf_sockopt_ops defines for list operations.

list_entry is defined as follows:

#define list_entry(ptr, type, member) container_of(ptr, type, member)

The container_of macro is defined in [include/linux/kernel.h]:

#define container_of(ptr, type, member) ({ \
    const typeof( ((type *)0)->member ) *__mptr = (ptr); \
    (type *)( (char *)__mptr - offsetof(type,member) );})

The offsetof macro is defined in [include/linux/stddef.h]:

#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)

size_t is ultimately defined as unsigned int (on i386).

This uses a small trick that relies on the compiler: first obtain the offset of the member within the structure, then compute the address of the containing structure from the address of the member.

container_of() and offsetof() are not used only for list operations. The most interesting part is ((type *)0)->member, which "casts" address 0 to a pointer to the type structure and then refers to its member. In container_of() it supplies the argument of typeof() (a gcc extension, similar in spirit to sizeof()) to obtain the data type of member; in offsetof() the address of that member is exactly the offset of member within the type structure.

If this is not easy to understand, take a look at the figure below:

Figure 5 Principle of offsetof () Macro

For a given structure, offsetof(type, member) is a constant, and list_entry() uses this constant offset to compute the address of the list's data item variable.
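The offset arithmetic can be tried out in user space with the standard offsetof from <stddef.h>. The container_of below is a simplified sketch: it drops the typeof() type check of the kernel macro so that it compiles as plain ISO C, and sockopt_entry is a hypothetical structure in which the list member is deliberately not the first field.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Simplified: no typeof() check that ptr really points at a `member`. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)

struct sockopt_entry {          /* hypothetical payload type */
    int pf;
    char name[12];
    struct list_head list;      /* at a non-zero offset on purpose */
};

/* Returns 1 if the owner is recovered correctly from the member address. */
int demo_container_of(void)
{
    struct sockopt_entry e = { .pf = 2 };
    struct list_head *node = &e.list;       /* what a list would store */
    struct sockopt_entry *owner =
        list_entry(node, struct sockopt_entry, list);

    assert(offsetof(struct sockopt_entry, list) > 0);
    return owner == &e && owner->pf == 2;
}
```

Because the offset is a compile-time constant, list_entry() costs a single pointer subtraction at run time.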

B) Traversal macros

The nf_register_sockopt() function in [net/core/netfilter.c] contains code like this:

......
struct list_head *i;
......
list_for_each(i, &nf_sockopts) {
    struct nf_sockopt_ops *ops = (struct nf_sockopt_ops *)i;
    ......
}
......

The function first declares a (struct list_head *) pointer variable i and then traverses the list with list_for_each(i, &nf_sockopts). In [include/linux/list.h], the list_for_each() macro is defined as:

#define list_for_each(pos, head) \
    for (pos = (head)->next, prefetch(pos->next); pos != (head); \
        pos = pos->next, prefetch(pos->next))

It is simply a for loop that uses the pos argument as the loop variable, starts at the head, and advances pos along next until it comes back to head (prefetch() can be ignored here; it is used to prefetch data and speed up traversal).

So nf_register_sockopt() is traversing the nf_sockopts list. Why can it treat the address of the list_head member directly as the address of a struct nf_sockopt_ops variable? Because list is the first member of struct nf_sockopt_ops, its address is also the address of the structure. The more standard way to obtain the data item address would be:

struct nf_sockopt_ops *ops = list_entry(i, struct nf_sockopt_ops, list);

In most cases a traversal needs the data item of each node, so list_for_each() and list_entry() are almost always used together. For this, Linux provides the list_for_each_entry() macro:

#define list_for_each_entry(pos, head, member) ......

Unlike list_for_each(), pos here is a pointer to the data item structure rather than a (struct list_head *). With this macro, nf_register_sockopt() could be written more simply:

......
struct nf_sockopt_ops *ops;
list_for_each_entry(ops, &nf_sockopts, list) {
    ......
}
......
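The two traversal styles can be compared in a user-space sketch. The macros below are simplified re-creations: prefetch() is left out, list_entry() omits the kernel's type check, and __typeof__ is assumed to be available (it is a GCC and Clang extension). struct item and demo_traverse() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

#define LIST_HEAD(name) struct list_head name = { &(name), &(name) }

#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* pos is a struct list_head pointer. */
#define list_for_each(pos, head) \
    for (pos = (head)->next; pos != (head); pos = pos->next)

/* pos is a pointer to the structure that embeds the node. */
#define list_for_each_entry(pos, head, member) \
    for (pos = list_entry((head)->next, __typeof__(*pos), member); \
         &pos->member != (head); \
         pos = list_entry(pos->member.next, __typeof__(*pos), member))

static inline void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

struct item {                   /* hypothetical payload type */
    int value;
    struct list_head list;
};

/* Sums the values with both macros; returns the sum, or -1 on disagreement. */
int demo_traverse(void)
{
    LIST_HEAD(head);
    struct item items[3] = { { .value = 10 }, { .value = 20 }, { .value = 30 } };
    struct list_head *p;
    struct item *it;
    int sum_nodes = 0, sum_entries = 0, i;

    for (i = 0; i < 3; i++)
        list_add_tail(&items[i].list, &head);
    assert(head.next == &items[0].list);

    list_for_each(p, &head)
        sum_nodes += list_entry(p, struct item, list)->value;

    list_for_each_entry(it, &head, list)
        sum_entries += it->value;

    return sum_nodes == sum_entries ? sum_entries : -1;
}
```

The entry variant terminates when the member address equals the head, so the loop variable is never dereferenced as a real item once the walk has wrapped around.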

Some code needs to traverse a list in reverse. Linux provides list_for_each_prev() and list_for_each_entry_reverse() for this; they are used in the same way as list_for_each() and list_for_each_entry() described above.
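Reverse traversal only swaps next for prev, as this short user-space sketch suggests. The macros are simplified re-creations without prefetch(), __typeof__ is assumed to be supported by the compiler, and struct item and demo_reverse() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

#define LIST_HEAD(name) struct list_head name = { &(name), &(name) }

#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define list_for_each_prev(pos, head) \
    for (pos = (head)->prev; pos != (head); pos = pos->prev)

#define list_for_each_entry_reverse(pos, head, member) \
    for (pos = list_entry((head)->prev, __typeof__(*pos), member); \
         &pos->member != (head); \
         pos = list_entry(pos->member.prev, __typeof__(*pos), member))

static inline void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

struct item {                   /* hypothetical payload type */
    int id;
    struct list_head list;
};

/* Returns the ids seen in reverse order, packed as decimal digits. */
int demo_reverse(void)
{
    LIST_HEAD(head);
    struct item items[3] = { { .id = 1 }, { .id = 2 }, { .id = 3 } };
    struct list_head *p;
    struct item *it;
    int by_node = 0, by_entry = 0, i;

    for (i = 0; i < 3; i++)
        list_add_tail(&items[i].list, &head);   /* list: 1 2 3 */
    assert(head.prev == &items[2].list);

    list_for_each_prev(p, &head)
        by_node = by_node * 10 + list_entry(p, struct item, list)->id;

    list_for_each_entry_reverse(it, &head, list)
        by_entry = by_entry * 10 + it->id;

    return by_node == by_entry ? by_entry : -1;
}
```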

If the traversal should start not at the list head but at a known node pos, use list_for_each_entry_continue(pos, head, member). Sometimes the requirement is this: after some computation, if pos has a value, traverse onward from pos; if not, start from the list head. For this case Linux provides the list_prepare_entry(pos, head, member) macro, whose return value is passed as the pos argument of list_for_each_entry_continue().
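How the two macros cooperate can be sketched in user space as follows. The definitions are simplified re-creations that assume __typeof__ is available; struct item, sum_after() and demo_continue() are hypothetical. When pos is NULL, list_prepare_entry() manufactures a pseudo-entry whose list member coincides with the head, so the continue loop starts at the first real node.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

#define LIST_HEAD(name) struct list_head name = { &(name), &(name) }

#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* If pos is NULL, pretend the head itself is embedded in an entry. */
#define list_prepare_entry(pos, head, member) \
    ((pos) ? (pos) : list_entry(head, __typeof__(*pos), member))

/* Visit the entries after pos, up to the end of the list. */
#define list_for_each_entry_continue(pos, head, member) \
    for (pos = list_entry(pos->member.next, __typeof__(*pos), member); \
         &pos->member != (head); \
         pos = list_entry(pos->member.next, __typeof__(*pos), member))

static inline void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

struct item {                   /* hypothetical payload type */
    int value;
    struct list_head list;
};

/* Sum the values that follow start, or all values if start is NULL. */
static int sum_after(struct item *start, struct list_head *head)
{
    struct item *pos = list_prepare_entry(start, head, list);
    int sum = 0;

    list_for_each_entry_continue(pos, head, list)
        sum += pos->value;
    return sum;
}

/* Returns sum_after(second item) * 100 + sum_after(NULL). */
int demo_continue(void)
{
    LIST_HEAD(head);
    struct item items[4] = {
        { .value = 1 }, { .value = 2 }, { .value = 3 }, { .value = 4 }
    };
    int i;

    for (i = 0; i < 4; i++)
        list_add_tail(&items[i].list, &head);
    assert(head.next == &items[0].list);

    /* After the item with value 2: 3 + 4 = 7. From the head: 1+2+3+4 = 10. */
    return sum_after(&items[1], &head) * 100 + sum_after(NULL, &head);
}
```

Note that the continue macro starts at the node after pos, so the node passed in is itself not visited.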

4. Safety considerations
In a concurrent environment, list operations generally need to consider synchronization. For convenience, Linux leaves this to the calling code. The Linux list itself addresses safety in two main places:

A) The list_empty() check

The basic list_empty() decides whether a list is empty only by checking whether the head's next points to itself. The Linux list also provides list_empty_careful(), which checks both next and prev of the head and returns true only if both point to the head. This is mainly to cope with next and prev being temporarily inconsistent while another CPU is operating on the same list. However, the code comments admit that this protection is limited: unless the only operation other CPUs perform on the list is list_del_init(), safety is still not guaranteed, so lock protection is still required.
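The difference between the two checks can be illustrated with a single-threaded simulation. The sketch below does not involve real concurrency: it simply builds, by hand, the kind of half-updated head that another CPU might leave visible in the middle of an insert, and demo_careful() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

static inline int list_empty(const struct list_head *head)
{
    return head->next == head;
}

/* True only if both next and prev point back at the head. */
static inline int list_empty_careful(const struct list_head *head)
{
    struct list_head *next = head->next;

    return (next == head) && (next == head->prev);
}

/* Returns 1 if the two checks behave as described in the text. */
int demo_careful(void)
{
    struct list_head head, node;
    int plain, careful;

    /* A consistent empty list: both checks agree. */
    head.next = &head;
    head.prev = &head;
    assert(list_empty(&head) && list_empty_careful(&head));

    /*
     * Simulated snapshot of an insert in progress: prev already points
     * at the new node, but next has not been updated yet.
     */
    node.next = &head;
    node.prev = &head;
    head.prev = &node;

    plain = list_empty(&head);              /* still looks empty */
    careful = list_empty_careful(&head);    /* notices the mismatch */
    return plain == 1 && careful == 0;
}
```

This only shows what the extra comparison detects; as the text says, it is not a substitute for proper locking.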

B) Deleting nodes during traversal

The traversal macros introduced above all work by advancing the pos pointer. If the loop body deletes the node that pos points to, however, the traversal breaks, because list_del(pos) sets pos's prev and next to the special values LIST_POISON2 and LIST_POISON1.

The caller could of course save the next pointer itself so that the traversal can continue, but for programming consistency the Linux list provides two "_safe" interfaces corresponding to the basic traversal operations: list_for_each_safe(pos, n, head) and list_for_each_entry_safe(pos, n, head, member). They require the caller to supply an additional pointer n of the same type as pos, which the for loop uses to hold the address of the node after pos, so that the chain is not broken when the pos node is removed.
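A user-space sketch of deletion during traversal is shown below. list_for_each_safe() is re-created in simplified form, the poison values are illustrative placeholders, and struct item and demo_safe_delete() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Illustrative poison values; never dereferenced here. */
#define LIST_POISON1 ((void *)0x00100100)
#define LIST_POISON2 ((void *)0x00200200)

#define LIST_HEAD(name) struct list_head name = { &(name), &(name) }

#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define list_for_each(pos, head) \
    for (pos = (head)->next; pos != (head); pos = pos->next)

/* n always holds the successor of pos, saved before the body runs. */
#define list_for_each_safe(pos, n, head) \
    for (pos = (head)->next, n = pos->next; pos != (head); \
         pos = n, n = pos->next)

static inline void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

static inline void list_del(struct list_head *entry)
{
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = LIST_POISON1;
    entry->prev = LIST_POISON2;
}

struct item {                   /* hypothetical payload type */
    int value;
    struct list_head list;
};

/* Removes the even values while walking; returns the sum of what is left. */
int demo_safe_delete(void)
{
    LIST_HEAD(head);
    struct item items[6];
    struct list_head *pos, *n;
    int i, sum = 0;

    for (i = 0; i < 6; i++) {
        items[i].value = i + 1;                 /* values 1..6 */
        list_add_tail(&items[i].list, &head);
    }

    list_for_each_safe(pos, n, &head) {
        if (list_entry(pos, struct item, list)->value % 2 == 0)
            list_del(pos);      /* pos is poisoned, but n is still valid */
    }
    assert(items[1].list.next == LIST_POISON1);

    list_for_each(pos, &head)
        sum += list_entry(pos, struct item, list)->value;
    return sum;                 /* 1 + 3 + 5 */
}
```

The saved pointer n only protects against removal of the current node; if the loop body could also remove the node after pos, n itself would become stale.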

IV. Extensions

1. hlist

Figure 6 list and hlist

The designer of the Linux list (list.h is unsigned, so quite possibly Linus Torvalds himself) considered a two-pointer head (next, prev) "too wasteful" for hash tables and therefore designed a separate hlist data structure for them: a doubly linked list whose head holds a single pointer. As the figure shows, the hlist head has only a pointer to the first node and no pointer to the tail node, so in a huge hash table the space taken by the bucket heads is cut in half.

Because the head and the nodes now have different types, inserting between the head and the first node no longer works the way list_add() does: the head's first pointer must be changed to point to the new node, and this cannot be expressed uniformly with a prev pointer to a node. For this reason, the prev of an hlist node is not a pointer to the previous node but a pointer to the next pointer inside the previous node (or to first, when the previous "node" is the head), declared as struct hlist_node **pprev. An operation at the head can then read and modify the predecessor's next (or first) pointer uniformly through *(node->pprev).
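The effect of pprev can be seen in a small user-space sketch. The two structures and hlist_add_head() follow the description above; hlist_unlink() is a hypothetical name for the plain unlink step (without any poisoning), and demo_hlist() is a hypothetical test driver.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node {
    struct hlist_node *next;
    struct hlist_node **pprev;  /* address of the pointer that points at us */
};

struct hlist_head {
    struct hlist_node *first;   /* a single pointer per hash bucket */
};

static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Unlink n; the same code works whether n is first or in the middle. */
static inline void hlist_unlink(struct hlist_node *n)
{
    struct hlist_node *next = n->next;
    struct hlist_node **pprev = n->pprev;

    *pprev = next;              /* updates head->first or a node's next */
    if (next)
        next->pprev = pprev;
}

/* Returns 1 if removing the first node needs no special case. */
int demo_hlist(void)
{
    struct hlist_head bucket = { NULL };
    struct hlist_node a, b;
    int ok;

    hlist_add_head(&a, &bucket);        /* bucket: a */
    hlist_add_head(&b, &bucket);        /* bucket: b a */
    assert(bucket.first == &b && a.pprev == &b.next);

    hlist_unlink(&b);                   /* bucket: a */
    ok = bucket.first == &a && a.pprev == &bucket.first;

    hlist_unlink(&a);                   /* bucket: empty */
    return ok && bucket.first == NULL;
}
```

With an ordinary prev pointer, unlinking the first node would need a branch to update the head; the pointer-to-pointer makes the head's first field and a node's next field interchangeable.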

2. read-copy update
The Linux list interface also contains a series of macros ending in "_rcu", which correspond one-to-one with many of the functions described above. RCU (Read-Copy Update) is a technique introduced in the 2.5/2.6 kernel that improves synchronization performance by deferring write operations.

As we know, reads of data in a system far outnumber writes, and on SMP systems the performance of the rwlock mechanism drops rapidly as the number of processors grows (see reference 4). Against this background, Paul E. McKenney of the IBM Linux Technology Center proposed the read-copy update technique and applied it to the Linux kernel. The core of RCU is that a write is split into two steps, write and update. Reads are allowed at any time; when there is a write, the update step is deferred until all readers of the data have finished. The RCU functions in the Linux list are only a small part of Linux RCU. An analysis of the RCU implementation is beyond the scope of this article, and interested readers can consult the references; using an RCU list is essentially the same as using a basic list.

V. Example
The program in the attachment does nothing useful beyond printing a file forward and backward; it only serves to demonstrate the use of the Linux list.

For simplicity, the example is written as a user-space program. To run it, compile it with the following command:

gcc -D__KERNEL__ -I/usr/src/linux-2.6.7/include pfile.c -o pfile

The kernel list is restricted to kernel-mode use, but the data structure itself does not depend on running in kernel mode, so the compilation uses the "-D__KERNEL__" switch to "fool" the compiler.

References
1. Wikipedia (http://zh.wikipedia.org), an online encyclopedia released under the GNU Free Documentation License, an extension of the free software idea. The "linked list" concept in this article follows its entry.
2. "Linux Kernel Scenario Analysis". Mr. Mao's book on the Linux kernel answers the vast majority of questions about the kernel, including several key data structures of the kernel linked list.
3. Linux kernel 2.6.7 source code. Any question you cannot figure out can always be settled by reading the code carefully.
4. "Kernel Korner: Using RCU in the Linux 2.5 Kernel". Paul McKenney, the main developer of RCU, published this article on RCU in Linux Journal in October 2003. More material is available at http://www.rdrop.com/users/paulmck/rclock/

