Recently I have been reading Chapter 13 of Code Complete, "Unusual Data Types", which covers structures and pointers in C. It reminded me of an article I once read about the length of a struct in C, most of which I have since forgotten. So today I went back over the material I had saved and wrote this article to summarize it and, along the way, to organize what I already know.
First, the length of the basic data type in memory
The lengths of the basic data types are not the same on every machine. To understand the composite types clearly (only struct and union are covered here), this section first summarizes how much space the basic data types occupy on 32-bit and 64-bit machines. The basic types are generally those built into the language: char, short, int, long, float, double, long double, and long long. Whether the system is 32-bit or 64-bit, the results can still differ, mainly because compilers differ. On a 64-bit system with VS2013, the compiler can be set to a 32-bit target or to an x64 target; the results for both cases are shown in Figure 1. On a 32-bit system the compiler target cannot be set to 64 bits, because the resulting program fails when it is run; the number of bytes each data type occupies on a 32-bit machine is shown in Figure 2.
Figure 1 (a) 64-bit system: bytes occupied by each data type with the compiler set to the 32-bit target
Figure 1 (b) 64-bit system: bytes occupied by each data type with the compiler set to the x64 target
Figure 2 Bytes occupied by each data type on a 32-bit machine
Of course, with other 64-bit compilers a long, or even an int, may occupy 8 bytes; this depends on the compiler environment. Table 1 gives another view of how many bits the different data types occupy:
| Type | LP64 | ILP64 | LLP64 | ILP32 | LP32 |
| --- | --- | --- | --- | --- | --- |
| char | 8 | 8 | 8 | 8 | 8 |
| short | 16 | 16 | 16 | 16 | 16 |
| int | 32 | 64 | 32 | 32 | 16 |
| long | 64 | 64 | 32 | 32 | 32 |
| float | 32 | 32 | 32 | 32 | 32 |
| double | 64 | 64 | 64 | 64 | 64 |
| long double | 64 | 64 | 64 | 64 | 64 |
| long long | N/A | N/A | 64 | N/A | N/A |
| pointer | 64 | 64 | 64 | 32 | 32 |
Table 1 Lengths in bits of the data types under different data models
In this table, LP64, ILP64, and LLP64 are data models (word-length models) for 64-bit environments, and ILP32 and LP32 are data models for 32-bit environments. LP64 means that long and pointers occupy 64 bits; ILP64 means that int, long, and pointers occupy 64 bits; LLP64 means that long long and pointers occupy 64 bits; ILP32 means that int, long, and pointers occupy 32 bits; LP32 means that long and pointers occupy 32 bits.
The 32-bit Windows environment uses the ILP32 data model and 64-bit Windows uses the LLP64 data model, so according to Table 1 only the pointer and long long entries differ between the 32-bit and 64-bit environments. In practice, compilers that support long long make it 64 bits under both models, so the pointer size is the only one that actually changes.
It is also worth noting that, on a given platform, the number of bytes a pointer occupies does not depend on the type it points to; it depends only on the platform. This is because a pointer merely holds the starting address of a variable, and the pointed-to type is only needed to interpret the data stored at that address.
Second, the length of the struct
In languages that do not support class-based programming, struct variables offer the following benefits:
- A structure makes the relationships among data items clear;
- A structure simplifies operations on a block of data, such as assignment;
- A structure simplifies parameter lists: a single struct parameter can replace many separate function parameters;
- A structure makes the code easier to maintain.
Before explaining the length of a struct, it is necessary to describe how data addresses are aligned in a computer. What is alignment of data addresses, and why are data addresses aligned?
Alignment: Memory in modern computers is divided into bytes, so in theory a variable of any type could start at any address. In practice, however, a variable of a given type is usually accessed only at particular memory addresses. This requires each type of data to be laid out in memory according to certain rules rather than simply stored one item after another; that layout is alignment.
Cause: Hardware platforms differ greatly in how they handle memory. Some platforms can access certain types of data only at specific addresses. Other platforms have no such restriction, but most commonly, access becomes less efficient when data is not aligned as the platform requires. For example, some platforms begin every read at an even address. If a 4-byte int (on a 32-bit system) is stored starting at an even address, it can be read in one cycle; if it is stored starting at an odd address, it may take two cycles, which clearly reduces read efficiency. This is a trade-off between space and time.
The alignment algorithm:
Because compilers differ, a 64-bit compiler is used here as the example for working through how alignment is done.
The structure is defined as follows:
struct A {
    int a;
    char b;
    short c;
};
struct A contains a 4-byte int, a 1-byte char, and a 2-byte short, so its members need 7 bytes of storage in total. Because of compiler alignment, however, sizeof(struct A) = 8, as shown in Figure 3.
Figure 3 Length of struct A
However, if the order of the members is changed, the length of the structure changes too. Struct B is defined as:
struct B {
    char b;
    int a;
    short c;
};
The members again need 7 bytes of storage in total, but now sizeof(struct B) = 12, as shown in Figure 4.
Figure 4 Length of struct B
The examples above use the default alignment. If the preprocessor directive #pragma pack(value) is used to tell the compiler to apply a programmer-specified alignment instead of the default, the length of the same struct changes again. Adding #pragma pack(2) before the definition of struct B gives the following definition:
#pragma pack(2) // specify two-byte alignment
struct C {
    char b;
    int a;
    short c;
};
#pragma pack() // cancel the specified alignment and restore the default
Now sizeof(struct C) = 8, as shown in Figure 5.
Figure 5 Length of the struct with 2-byte alignment specified
If the alignment is changed to 1 byte, that is, #pragma pack(2) is replaced with #pragma pack(1), then sizeof(struct C) = 7, as shown in Figure 6.
Figure 6 Length of the struct with 1-byte alignment specified
To understand these changes, four concepts need to be defined:
- The self-alignment value of a data type: for a basic type, the number of bytes the type occupies in memory;
- The specified alignment value: the value given in #pragma pack(value);
- The self-alignment value of a struct: the largest self-alignment value among its members;
- The effective alignment value of a data member or struct: the smaller of its self-alignment value and the specified alignment value.
With these concepts it is easy to discuss the alignment of the members of a particular structure and of the structure itself. The following paragraph on alignment is the critical step in determining the length of a structure.
The effective alignment value is the most important of these values, because it ultimately determines where the data is stored. An effective alignment value of n means "aligned on n": the storage address of the data must satisfy address % n = 0. The members of a struct are stored in the order in which they are defined, and the starting address of the first member is the starting address of the struct. After the members have been aligned and laid out, the struct itself is rounded up according to its own effective alignment value (that is, the total length occupied by the struct must be an integer multiple of the struct's effective alignment value).
This explains why the struct lengths listed above differ. Take struct B as an example. Assume B is stored starting at address 0x0000. This example specifies no alignment value; in the author's compiler environment the default is then the self-alignment value of the largest member of B, the 4-byte int, that is, 4. The first member, b, has a self-alignment value of 1, which is less than the default alignment value of 4, so its effective alignment value is 1, and its storage address 0x0000 satisfies 0x0000 % 1 = 0. The second member, a, has a self-alignment value of 4 and an effective alignment value of 4, so it can only be stored in the four contiguous bytes from 0x0004 to 0x0007, the first address after b that satisfies 0x0004 % 4 = 0; the bytes from 0x0001 to 0x0003 are left as padding. The third member, c, has a self-alignment value of 2 and the default alignment value is 4, so its effective alignment value is 2, and it is placed at 0x0008 to 0x0009. Finally, the self-alignment value of struct B is the largest alignment value among its members, namely 4, so the effective alignment value of the struct is 4. By the rounding rule, 0x000a and 0x000b are also occupied by struct B. B therefore occupies the 12 bytes from 0x0000 to 0x000b, and sizeof(struct B) = 12.
The same analysis applies to struct C. C uses the preprocessor directive #pragma pack(2), which sets the specified alignment value to 2. The first member, b, has a self-alignment value of 1 and the specified value is 2, so its effective alignment value is 1. Assuming again that C starts at 0x0000, b is stored at 0x0000, which satisfies 0x0000 % 1 = 0. The second member, a, has a self-alignment value of 4 and the specified value is 2, so its effective alignment value is 2; it is stored in the four contiguous bytes 0x0002, 0x0003, 0x0004, and 0x0005, which satisfies 0x0002 % 2 = 0. The third member, c, has a self-alignment value of 2 and the specified value is 2, so its effective alignment value is 2; it is stored next, at 0x0006 and 0x0007, which satisfies 0x0006 % 2 = 0. The members of C thus occupy eight bytes in total, from 0x0000 to 0x0007. The self-alignment value of C is 4 and the specified value is 2, so the effective alignment value of C is 2; since 8 % 2 = 0, C occupies exactly the 8 bytes from 0x0000 to 0x0007.
Structs can also be nested, that is, one struct can contain another. The inner struct must start at an address that is an integer multiple of the size of its largest internal member (for example, if struct E contains struct F, and struct F contains char, int, and double members, then F must start at an integer multiple of 8). For example:
struct F { char a; int b; double c; };
struct E { char b; int a; short c; struct F obj; char e; };
Here sizeof(struct E) = 40.
Figure 7 Size of the nested structure
First, assume that struct E starts at 0x0000. Following the analysis of B and C, the self-alignment value, effective alignment value, and start address of each member are as shown in Table 2. Note that the largest member to consider in struct E is member c of the nested struct F, whose size is 8.
| Member | Self-alignment value | Effective alignment value | Start address |
| --- | --- | --- | --- |
| E.b | 1 | 1 | 0x0000 |
| E.a | 4 | 4 | 0x0004 |
| E.c | 2 | 2 | 0x0008 |
| E.obj.a | 1 | 1 | 0x0010 |
| E.obj.b | 4 | 4 | 0x0014 |
| E.obj.c | 8 | 8 | 0x0018 |
| E.e | 1 | 1 | 0x0020 |
Table 2 Self-alignment value, effective alignment value, and start address of each member
The last member, E.e, starts at 0x0020, so 0x0020 holds E.e. But the effective alignment value of struct E is 8, so by the rounding rule the space from 0x0021 to 0x0027 is also occupied by E, and the total size of E is 40 bytes.
Third, the union
A union is a comparatively rarely used construct. All members of a union share the same memory location, which holds values of different types and different lengths at different times; only the value of one member can be stored at any one time. When a union is declared, the compiler gives it a length that is an integer multiple of the size of its largest basic member type and greater than or equal to the storage occupied by its largest member.
union G { char name[30]; double al; char sex; int age; float height; };
Here the member of G that needs the most space is char name[30], but the largest basic type among the members is double, which is 8 bytes. The size of union G must therefore be an integer multiple of 8 (the size of al) and at least 30, so sizeof(union G) = 32, as shown in Figure 8.
Figure 8 Size of the union
A union on its own is fairly easy to understand, but there are two further cases, a union containing a struct and a struct containing a union, which are described below.
1. A union contains a struct
When a union contains a struct, its size is determined in much the same way as for a plain union. In fact, an array in a union can be viewed as a special struct whose members all have the same type. The size is still at least that of the largest member of the union, and it is still rounded up to an integer multiple of the size of the largest basic member type. For example:
struct inner { char a; double b; char c; };
union data { struct inner a; int b; char c; };
Here sizeof(struct inner) = 24, which makes it the largest member of data. The largest basic type among all members, including those inside inner, is double, which is 8 bytes; 24 is already a multiple of 8, so no extra padding is needed and sizeof(union data) = 24.
Figure 9 Size of a union containing a struct
2. A struct contains a union
When a struct contains a union, the union is aligned within the struct according to the union's own alignment. For example:
union H { int a; int array[5]; char c; };
struct I { int a; short b; union H c; char d[5]; };
From the discussion of unions above, sizeof(union H) = 20. In struct I, a and b occupy the first 8 bytes (including padding), and c must start at an integer multiple of 4, so sizeof(struct I) = 4 + 2 + 2 (padding) + 20 + 5 + 3 (padding) = 36. The size reported on the computer and the start address of each member are shown in Figure 10.
Figure 10 Size of the struct containing a union and the start address of each member
Summary:
1. Unions and structs both consist of data members of several different types, but at any one time a union holds only one selected member, whereas all members of a struct exist at once;
2. Assigning to one member of a union overwrites the other members, whose previous values are lost; assigning to one member of a struct does not affect the others;
3. Knowing how much memory members of different types occupy is the basis for understanding the size of a struct.
A detailed description of the struct and union lengths in C