A generic list structure (from Linux source code)

Last Update: 2018-07-06 Source: Internet

Author: User



Take a doubly linked circular list as an example. A conventionally defined list, such as a linked list of integers, uses a structure like this:

struct list_int {
   int n;                    /* the data */
   struct list_int *next;    /* next item */
   struct list_int *prev;    /* previous item */
};

This has an obvious drawback: if you change int to some other data type and want a similar list, you have to define another list_xxx structure and rewrite all the list operations again.

There is a workaround for this problem, described in the book "Linux Kernel Design and Implementation" (the Chinese edition of Robert Love's "Linux Kernel Development"). Since this is my own article, though, I won't quote the book; I'll explain it in my own words.

Define the following structure:

struct list_t {
   struct list_t *next;
   struct list_t *prev;
};

Notice that this structure contains no int data. So how do we define a linked list of ints? With a structure like this:

struct list_int {
   struct list_t list;   /* embedded links, placed first */
   int n;
};

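The list_t members link the list_int structures together. Insertion, deletion, and other operations work just as in an ordinary list and are easy to implement, and because they only touch list_t, they can be written once for every element type.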
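A minimal sketch of what such type-independent operations might look like, assuming a circular list with a dummy head node of type list_t. The function names list_init, list_insert_after, list_remove, and list_length are my own illustrative choices; the Linux kernel's own versions (INIT_LIST_HEAD, list_add, list_del in include/linux/list.h) differ in detail.

```c
#include <stddef.h>

/* Generic node: only links, no payload. */
struct list_t {
    struct list_t *next;
    struct list_t *prev;
};

/* A list_int embeds the generic node as its first member. */
struct list_int {
    struct list_t list;
    int n;
};

/* Make head an empty circular list: it points to itself both ways. */
void list_init(struct list_t *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert node right after pos; works for any struct that embeds a list_t. */
void list_insert_after(struct list_t *pos, struct list_t *node)
{
    node->prev = pos;
    node->next = pos->next;
    pos->next->prev = node;
    pos->next = node;
}

/* Unlink node from its list and leave it pointing to itself. */
void list_remove(struct list_t *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->next = node;
    node->prev = node;
}

/* Count the nodes in a circular list with a dummy head. */
size_t list_length(const struct list_t *head)
{
    size_t count = 0;
    for (const struct list_t *p = head->next; p != head; p = p->next)
        count++;
    return count;
}
```

None of these functions mention int or list_int: the same code links any structure that embeds a list_t.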

For example, suppose we have a list_int item at address x and want to reach the item after it. We write x->list.next. At first glance this seems to give only the address of the list member inside the next item, whereas we want the address of the whole next item. This turns out to be easy, but explaining why requires looking at the memory layout.

Assume a 32-bit machine and a list_int stored starting at memory address p. At p comes a complete list_t structure containing two addresses, next and prev, each taking 4 bytes (on a 32-bit system, 32 bits = 4 bytes), so a list_t occupies 8 bytes. Then comes the int, which takes 4 bytes. The whole list_int therefore occupies 12 bytes: the first 4 bytes are the next pointer, the next 4 bytes are the prev pointer, and the last 4 bytes are the int n.
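The layout can be checked with the standard offsetof macro from <stddef.h>. This is a small sketch with hypothetical helper names. On a 32-bit system the offsets of list, prev, and n come out as 0, 4, and 8, as described above; on a typical 64-bit system pointers are 8 bytes, so they become 0, 8, and 16, and sizeof(struct list_int) is 24 rather than 20 because of trailing padding.

```c
#include <stddef.h>

struct list_t {
    struct list_t *next;
    struct list_t *prev;
};

struct list_int {
    struct list_t list;
    int n;
};

/* Byte offset of the embedded list_t (and of its next pointer). */
size_t offset_of_list(void) { return offsetof(struct list_int, list); }

/* Byte offset of list.prev: the list member's offset plus prev's offset inside list_t. */
size_t offset_of_prev(void)
{
    return offsetof(struct list_int, list) + offsetof(struct list_t, prev);
}

/* Byte offset of the payload n, right after the whole list_t. */
size_t offset_of_n(void) { return offsetof(struct list_int, n); }

/* Total size, including any trailing padding the compiler adds. */
size_t size_of_item(void) { return sizeof(struct list_int); }
```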

What x->list.next yields is the address of the list_t structure (the next/prev pair) inside the item after x. That list_t sits at the very front of the whole list_int structure, so x->list.next is already the address of the next item! This one address is at the same time the address of the whole list_int structure, the address of its list_t member, and the address of its next pointer. In C, a cast converts the list_t pointer into a list_int pointer. It may look like an extra step, but it costs nothing in the generated assembly, because the value does not change.
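A sketch of this zero-cost cast, assuming the list_t/list_int definitions above and a dummy head node. The cast is valid only because list is the first member of list_int: C guarantees that a pointer to a structure, suitably converted, points to its initial member, and vice versa. next_item and sum_list are illustrative names.

```c
#include <stddef.h>

struct list_t { struct list_t *next, *prev; };
struct list_int { struct list_t list; int n; };

/* Valid only because list is the FIRST member of struct list_int:
   the address of that member equals the address of the whole item. */
struct list_int *next_item(struct list_int *x)
{
    return (struct list_int *)x->list.next;
}

/* Sum the n fields of a circular list whose dummy head is a bare list_t.
   The head is checked before casting, because it is not a list_int. */
int sum_list(struct list_t *head)
{
    int total = 0;
    for (struct list_t *p = head->next; p != head; p = p->next)
        total += ((struct list_int *)p)->n;   /* the cast changes no bits */
    return total;
}
```

Calling next_item on the last item would return the dummy head, which is not a list_int, so real code compares against the head before casting, as sum_list does.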

In the previous example, getting the address of the next item directly from the address of one of its members was convenient only because that member sits at the beginning, so the member's address equals the address of the whole structure. Can that member always be at the beginning of the structure? Not necessarily. A structure may even belong to two linked lists at once, such as the following:

struct list_int {
   struct list_t list1;   /* links for the first list */
   struct list_t list2;   /* links for the second list */
   int n;
};

These two list_t members cannot both be at the beginning; here list2 is not. So when looking for the next item in the second list, x->list2.next no longer gives the next item's address directly. Time to look at the memory layout again.

A list_int now occupies 20 bytes: 4 bytes for list1.next, 4 for list1.prev, 4 for list2.next, 4 for list2.prev, and 4 for n. If the whole list_int structure starts at address p, these members are at p, p+4, p+8, p+12, and p+16, so list1 is at p and list2 is at p+8.

So what x->list2.next actually yields is the "p+8" of the next item. To get that item's list_int address, subtract 8 from it, and then apply the same zero-cost cast.
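A sketch, assuming the two-list list_int above, that checks this claim: following list2.next lands offsetof(struct list_int, list2) bytes past the start of the next item (8 on the 32-bit layout, 16 on a typical 64-bit one), while list1.next lands exactly on it. link_pair and list2_next_skew are hypothetical helpers, and the two-item lists have no dummy head, for brevity.

```c
#include <stddef.h>

struct list_t { struct list_t *next, *prev; };
struct list_int {
    struct list_t list1;
    struct list_t list2;
    int n;
};

/* Put a and b into BOTH lists as a two-element circle (no dummy head). */
void link_pair(struct list_int *a, struct list_int *b)
{
    a->list1.next = &b->list1; b->list1.prev = &a->list1;
    b->list1.next = &a->list1; a->list1.prev = &b->list1;
    a->list2.next = &b->list2; b->list2.prev = &a->list2;
    b->list2.next = &a->list2; a->list2.prev = &b->list2;
}

/* How many bytes past the start of expected_next does x->list2.next point?
   Both pointers refer to the same object, so the char difference is defined. */
ptrdiff_t list2_next_skew(struct list_int *x, struct list_int *expected_next)
{
    return (char *)x->list2.next - (char *)expected_next;
}
```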

One thing to watch in C, though: suppose I have the address q of an item's list2 and want the item's starting address. Writing (struct list_int *)(q - 8) is a serious mistake. Because q is a struct list_t pointer and a list_t occupies 8 bytes, pointer arithmetic counts in units of 8 bytes, so q - 8 actually subtracts 64 bytes. Writing q - 1 happens to work, but only in this particular layout. The correct approach is to first convert q into a pointer to a single-byte type such as char, so the correct expression is (struct list_int *)((char *)q - 8).
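The correct subtraction as a function, a minimal sketch with the hypothetical name owner_of_list2. Instead of hard-coding 8, which is right only for the 32-bit layout, it uses the standard offsetof macro from <stddef.h>; the essential part is the (char *) conversion before subtracting.

```c
#include <stddef.h>

struct list_t { struct list_t *next, *prev; };
struct list_int {
    struct list_t list1;
    struct list_t list2;
    int n;
};

/* WRONG would be (struct list_int *)(q - 8): arithmetic on a struct list_t *
   scales by sizeof(struct list_t), so it moves 8 * sizeof(struct list_t) bytes.
   Correct: convert to char * first, so the subtraction counts single bytes. */
struct list_int *owner_of_list2(struct list_t *q)
{
    return (struct list_int *)((char *)q - offsetof(struct list_int, list2));
}
```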

Still, the problem is not completely solved. Having to work out by hand, at every use, how many bytes into the structure a member sits is clearly unacceptable. In fact, this job can be left entirely to the compiler.

In the example above, we want to know where list2 lies relative to the start of the whole list_int structure. Consider the expression (struct list_int *)0, which turns 0 into a list_int address. At first sight ((struct list_int *)0)->list2 looks like an illegal access, and reading memory through it would indeed fail. But change it to &((struct list_int *)0)->list2, that is, take the address of that member. This expression does not fault, because it only computes an address and never uses it to access memory. And that address is exactly what we want, the offset of list2 relative to the start of list_int: if an imaginary list_int started at address 0, its list2 member would naturally be at address 8.

So we might write a macro: #define OFFSET(type, member) (&((type *)0)->member). But that is not right. Look at the type of the whole expression: it takes the address of member, which in our example is list2, of type list_t, so the result is a list_t address. What we want is a number saying how many bytes the member lies past the start of the structure, because in the end we subtract that number from an address. So an outer cast to an integer type is needed. On a 32-bit system an address fits in unsigned int; on a 64-bit system it needs a 64-bit type such as unsigned long long. The standard library's size_t is typically 4 bytes on 32-bit systems and 8 bytes on 64-bit systems, so it fits both. The macro then looks like this (so many parentheses):

   #define OFFSET(type, member) ((size_t) &((type *)0)->member)

There must be no space between OFFSET and the opening parenthesis; otherwise the preprocessor defines an object-like macro. To get the offset of list2 within list_int, call OFFSET(struct list_int, list2), then recover the item as described above, or write another macro to do the rest. The standard header <stddef.h> already provides this macro as offsetof(type, member).
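Both macros side by side, as a sketch. OFFSET is the macro from the text; CONTAINER_OF is my own illustrative name for the "other macro" that finishes the job, modeled on the Linux kernel's container_of (the kernel's version adds type checking). next_in_list2 is likewise hypothetical. The null-pointer trick works on common compilers but is formally undefined behavior in ISO C, so CONTAINER_OF uses the standard offsetof instead.

```c
#include <stddef.h>

struct list_t { struct list_t *next, *prev; };
struct list_int {
    struct list_t list1;
    struct list_t list2;
    int n;
};

/* The macro from the text: address of member in an imaginary object at 0. */
#define OFFSET(type, member) ((size_t) &((type *)0)->member)

/* Hypothetical helper: from a pointer to `member` inside a `type`,
   step back by the member's byte offset to get the whole `type`. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Next item along the second list, recovered from the embedded link. */
struct list_int *next_in_list2(struct list_int *x)
{
    return CONTAINER_OF(x->list2.next, struct list_int, list2);
}
```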

Finally, a brief word on performance. It looks as though x->list2.next needs an extra subtraction to get the whole list_int address, so is this list slower than the traditional one? In that respect performance does drop, though negligibly. But elsewhere performance actually improves. In a traditional linked list, if the next pointer is not at the beginning of the structure, following next requires an addition (item address plus the offset of next) before the pointer can be loaded; in this generic list that addition is unnecessary.

Recall the original definition:

struct list_int {
   int n;                    /* the data */
   struct list_int *next;    /* next item */
   struct list_int *prev;    /* previous item */
};

With this definition, following next gives the start address of the next list_int. To keep going to the item after that, you must add an offset to get the address of its next field, load it, then add again, and so on. In the generic list, following next directly yields the address of the next item's next field, so that addition is not needed. If you follow prev instead of next, however, the addition is still required.
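To make the argument concrete, this sketch does not time anything; it reports the offset that must be added to a node's address before its link can be loaded, for each layout. The struct name list_int_trad and the helper names are illustrative.

```c
#include <stddef.h>

/* Traditional layout from the text: payload first, links after it. */
struct list_int_trad {
    int n;
    struct list_int_trad *next;
    struct list_int_trad *prev;
};

/* Generic layout: the pointers refer to the embedded link node itself. */
struct list_t { struct list_t *next, *prev; };

/* Following a link means loading from (node address + field offset). */
size_t trad_next_offset(void)    { return offsetof(struct list_int_trad, next); }
size_t generic_next_offset(void) { return offsetof(struct list_t, next); }
size_t generic_prev_offset(void) { return offsetof(struct list_t, prev); }
```

On many architectures, including x86 and ARM, a constant offset folds into the load instruction's addressing mode, so in practice the difference in either direction is usually small.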


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
