Byte alignment in C

Source: Internet
Author: User

At the end of the article there is a picture that makes the topic clear at a glance. This issue is discussed a lot on the Internet, but it is rarely explained thoroughly.

I. Concepts

Alignment concerns where data is located in memory. If the memory address of a variable is an integer multiple of its size, the variable is said to be naturally aligned. For example, on a 32-bit CPU, an int variable at address 0x00000004 is naturally aligned.

II. Why byte alignment

The root reason for byte alignment is the efficiency with which the CPU accesses data. Suppose the integer variable above is not naturally aligned, for example at address 0x00000002. To fetch its value the CPU may then need to access memory twice: first it reads a short from 0x00000002-0x00000003, then a short from 0x00000004-0x00000005, and combines the two to obtain the value. If the variable is at address 0x00000003, three accesses are needed: a char, then a short, then another char, which are then combined into the integer. If the variable is naturally aligned, the data can be fetched in a single access. Some systems have strict alignment requirements, and on them reading unaligned data causes an error. For example:

char ch[8];
char *p = &ch[1];
int i = *(int *)p;

  
On a system with strict alignment requirements this code fails at run time with a fault (a segmentation fault or bus error). On x86 no error occurs, but the access is less efficient.
  
III. Correct handling of byte alignment
  
For a standard data type, the address only needs to be an integer multiple of the type's size. Non-standard (aggregate) data types are aligned as follows:
  
Array: aligned according to its basic element type. Once the first element is aligned, the following elements are naturally aligned too.
Union: aligned according to its largest basic member type.
Struct: every member of the struct must be aligned.
For example, the following struct:
  
struct stu {
    char sex;
    int length;
    char name[10];
};
struct stu my_stu;

  
On x86, GCC aligns int on 4 bytes by default. It inserts three bytes of padding after sex so that length is aligned, and two bytes after name so that the whole struct is aligned. So sizeof(my_stu) is 20, not 15.
  
IV. __attribute__ options

We can also tell the compiler which alignment to use. GNU C uses the __attribute__ syntax for this. For example, to lay the struct out on one-byte boundaries, with no padding, we can define it like this:

struct stu {
    char sex;
    int length;
    char name[10];
} __attribute__((packed));
struct stu my_stu;

Then sizeof(my_stu) is 15.

Note that __attribute__((aligned(1))) on its own does not have this effect. In GCC, aligned on a struct can only raise the alignment; to lower it, packed must be specified as well. With the following definition sizeof(my_stu) is still 20:

struct stu {
    char sex;
    int length;
    char name[10];
} __attribute__((aligned(1)));
struct stu my_stu;

__attribute__((packed)) requests the smallest possible alignment for a variable or struct member: one-byte alignment for variables and bit alignment for bit-fields.
  
V. When do I need to set alignment?
  
When designing a communication protocol between different CPUs, or when writing a hardware driver in which a struct describes a register layout, the struct must be aligned on one byte. Even if the layout already appears to be naturally aligned, set the alignment explicitly so that different compilers do not generate different layouts.

 

1. Quick understanding

1. What is byte alignment?

In C, a structure is a composite data type. Its members can be variables of basic data types (such as int, long, float, and so on) or of composite data types (such as arrays, structures, and unions). Within a structure, the compiler allocates space for each member according to its natural boundary (alignment). Members are stored in memory in the order in which they are declared, and the address of the first member is the same as the address of the whole structure.

To let the CPU access variables quickly, the starting address of a variable should have a certain property, the so-called "alignment". For example, the starting address of a 4-byte int should lie on a 4-byte boundary, that is, it should be divisible by 4.

2. What is the role of byte alignment?

Byte alignment not only lets the CPU access data quickly; used sensibly, it can also save storage space.

On a 32-bit machine, 4-byte alignment speeds up CPU access. For example, if a long variable is stored across a 4-byte boundary, the CPU has to read twice to fetch it, which is less efficient. Conversely, using 1-byte or 2-byte alignment on a 32-bit machine slows down variable access. So both the processor type and the compiler have to be taken into account. With VC and with GNU gcc on 32-bit x86, 4-byte types such as int and long are aligned on 4 bytes by default.

3. Change the default byte alignment of the C compiler.

By default, the C compiler allocates space for each variable or data unit according to its natural boundary. In general, the default alignment can be changed as follows:
· With the directive #pragma pack(n), the C compiler aligns on n bytes.
· With the directive #pragma pack(), the custom byte alignment is cancelled.

In addition, GNU C provides the following:
· __attribute__((aligned(n))): aligns the structure on an n-byte natural boundary. If a member of the structure needs a larger alignment than n, the largest member alignment is used instead.
· __attribute__((packed)): cancels the alignment padding that the compiler would otherwise add, so that members are laid out according to the number of bytes they actually occupy.

4. Examples

Example 1

struct test
{
    char x1;
    short x2;
    float x3;
    char x4;
};

By default, the compiler aligns this struct on natural boundaries. The first member, x1, is at offset 0 and occupies the first byte. The second member, x2, is a short, so its starting address must be 2-byte aligned, and the compiler inserts one padding byte between x1 and x2. The third member, x3, and the fourth member, x4, already fall on their natural boundaries, so no padding is needed before them. In struct test, x3 requires 4-byte alignment, which is the largest alignment required by any member, so the natural alignment of the whole struct is 4 bytes and the compiler adds three padding bytes after x4. The whole struct occupies 12 bytes.

Example 2

#pragma pack(1) // make the compiler align this structure on 1 byte
struct test
{
    char x1;
    short x2;
    float x3;
    char x4;
};
#pragma pack() // cancel 1-byte alignment and restore the default alignment

Now sizeof(struct test) is 8.

Example 3

#define GNUC_PACKED __attribute__((packed))
struct test
{
    char x1;
    short x2;
    float x3;
    char x4;
} GNUC_PACKED;

Here sizeof(struct test) is again 8.

2. In-depth understanding

I. What is byte alignment, and why align?
TragicJun, published at 9:41:00
In modern computers, memory is divided into bytes. In theory, a variable of any type could be accessed starting at any address. In practice, however, variables of a given type are often accessed only at particular memory addresses, which requires data of each type to be arranged in memory according to certain rules rather than simply placed one after another. This is alignment.
The purpose of and reasons for alignment: hardware platforms differ greatly in how they handle memory. Some platforms can only access certain types of data at certain addresses. For example, on some architectures the CPU raises an error when it accesses a misaligned variable, so on such architectures programs must guarantee byte alignment. Other platforms may not fail, but the most common consequence is that data not stored according to the platform's alignment requirements is slower to access. For example, some platforms always start a read at an even address. If an int (assume 32 bits) is stored starting at an even address, its 32 bits can be read in one read cycle; if it is stored starting at an odd address, two read cycles are needed, and the high and low bytes of the two results have to be pieced together to obtain the 32-bit value. Reading efficiency is clearly much lower.
II. Effect of byte alignment on programs:

Let's look at a few examples (32-bit x86 environment, gcc compiler):
The structs are defined as follows:
struct A
{
    int a;
    char b;
    short c;
};
struct B
{
    char b;
    int a;
    short c;
};
The sizes of the data types on a 32-bit machine are known to be:
char: 1 (signed and unsigned)
short: 2 (signed and unsigned)
int: 4 (signed and unsigned)
long: 4 (signed and unsigned)
float: 4, double: 8
What are the sizes of the two structs above?
The result is:
sizeof(struct A) is 8.
sizeof(struct B) is 12.

Struct A contains a four-byte int, a one-byte char, and a two-byte short, and so does B. By that count, A and B should each be 7 bytes.
The results above arise because the compiler aligns the data members in memory. They reflect the compiler's default alignment settings. Can we change those default settings? Of course. For example:
#pragma pack(2) /* align on 2 bytes */
struct C
{
    char b;
    int a;
    short c;
};
#pragma pack() /* cancel the specified alignment and restore the default alignment */
sizeof(struct C) is 8.
Now change the alignment value to 1:
#pragma pack(1) /* align on 1 byte */
struct D
{
    char b;
    int a;
    short c;
};
#pragma pack() /* cancel the specified alignment and restore the default alignment */
sizeof(struct D) is 7.
Next we explain what #pragma pack does.

III. What principles does the compiler follow for alignment?

Let's first look at four important basic concepts:


1. The own alignment value of a data type: for char it is 1, for short it is 2, and for int, float, and double it is 4, in bytes (in the author's 32-bit x86 gcc environment).
2. The own alignment value of a struct or class: the largest own alignment value among its members.
3. The specified alignment value: the value given in #pragma pack(value).
4. The valid alignment value of a data member, struct, or class: the smaller of its own alignment value and the specified alignment value.
With these values, it is easy to discuss the members of a data structure and the structure's own alignment. The valid alignment value N is the value that finally determines where data is stored. Valid alignment N means "aligned on N", that is, the starting address of the data satisfies address % N = 0. The member variables of a data structure are laid out in the order in which they are defined, and the starting address of the first member is the starting address of the structure. Each member variable of a struct must be aligned when it is placed, and the struct itself must be rounded up according to its own valid alignment value (that is, the total length occupied by the struct must be an integer multiple of the struct's valid alignment value). With this, the values in the examples above are not hard to understand.
Example analysis:
Analyzing example B:
struct B
{
    char b;
    int a;
    short c;
};
Assume B is placed starting at address 0x0000. No alignment value is specified in this example; in the author's environment the default is 4. The first member, b, has an own alignment value of 1, which is smaller than the specified or default value of 4, so its valid alignment value is 1, and its address 0x0000 satisfies 0x0000 % 1 = 0. The second member, a, has an own alignment value of 4, so its valid alignment value is 4. It can therefore only be stored in the four consecutive bytes 0x0004 to 0x0007, which satisfies 0x0004 % 4 = 0 and is the nearest such position after the first member. The third member, c, has an own alignment value of 2, so its valid alignment value is also 2, and it can be stored in the two bytes 0x0008 to 0x0009, which satisfies 0x0008 % 2 = 0. So the contents of B occupy 0x0000 to 0x0009. Now look at the own alignment value of struct B: it is the largest alignment value among its members (here that of a), so it is 4, and the valid alignment value of the struct is also 4. By the rounding rule, 0x0000 to 0x0009 is 10 bytes, and (10 + 2) % 4 = 0, so 0x000A to 0x000B are also occupied by struct B. B therefore spans 0x0000 to 0x000B, 12 bytes in total, and sizeof(struct B) = 12.
In fact, a single struct of 10 bytes would already satisfy byte alignment, because its starting address is 0 and so it is certainly aligned. The two bytes are added at the end so that arrays of the struct can be accessed efficiently. Suppose we define an array of struct B. The first element starts at address 0, but what about the second? By the definition of an array, all of its elements are adjacent. If the size of the struct were not rounded up to a multiple of 4, the next element would start at 0x000A, which clearly does not satisfy the struct's address alignment. That is why the size of a struct is rounded up to an integer multiple of its valid alignment value. The own alignment values of the basic types (1 for char, 2 for short, 4 for int, float, and double) are themselves based on the same array consideration; their alignment values are simply known because their lengths are known.
Similarly, analyze example C above:
#pragma pack(2) /* align on 2 bytes */
struct C
{
    char b;
    int a;
    short c;
};
#pragma pack() /* cancel the specified alignment and restore the default alignment */
The first member, b, has an own alignment value of 1 and the specified alignment value is 2, so its valid alignment value is 1. Suppose C starts at 0x0000; then b is stored at 0x0000, which satisfies 0x0000 % 1 = 0. The second member, a, has an own alignment value of 4 and the specified alignment value is 2, so its valid alignment value is 2. It is therefore stored next, in the four consecutive bytes 0x0002, 0x0003, 0x0004, and 0x0005, which satisfies 0x0002 % 2 = 0. The third member, c, has an own alignment value of 2, so its valid alignment value is 2, and it is stored next, at 0x0006 and 0x0007, which satisfies 0x0006 % 2 = 0. So struct C occupies the eight bytes from 0x0000 to 0x0007. The own alignment value of C is 4, so its valid alignment value is 2. Since 8 % 2 = 0, C occupies exactly the eight bytes from 0x0000 to 0x0007, and sizeof(struct C) = 8.

IV. How to modify the default alignment value of the compiler?

1. In the VC IDE, it can be changed as follows: [Project] | [Settings], C/C++ tab, Category: Code Generation, option Struct Member Alignment. The default is 8 bytes.
2. It can be changed in the code itself with #pragma pack. Note: the spelling is pragma, not progma.

V. How should we consider byte alignment in programming?
If we want to save space, we only need to assume that the struct starts at address 0 and then arrange the members by the rules above. The basic principle is to declare the members of a struct in order of type size, from smallest to largest, so that as little padding as possible is needed. The other approach trades space for time: we explicitly fill in the space used for alignment. For example, one way to trade space for time is to insert a reserved member explicitly:
struct A {
    char a;
    char reserved[3]; // trade space for time
    int b;
};

The reserved member has no meaning for the program; it only fills the space needed for byte alignment. Of course, even without this member the compiler usually inserts the padding for us automatically. Adding it ourselves simply serves as an explicit reminder.

VI. Potential risks of byte alignment:

Many alignment risks in code are implicit, for example in forced type conversions (casts):
unsigned int i = 0x12345678;
unsigned char *p = NULL;
unsigned short *p1 = NULL;

p = (unsigned char *)&i;
*p = 0x00;
p1 = (unsigned short *)(p + 1);
*p1 = 0x0000;
The last two statements access an unsigned short at an odd address, which clearly violates the alignment rules.
On x86 such an operation only affects efficiency, but on MIPS or SPARC it may be an error, because those architectures require aligned access.

VII. How to find problems with byte alignment:

If a problem with alignment or assignment occurs, first check:
1. The compiler's big-endian / little-endian setting.
2. Whether the system itself supports unaligned access.
3. If it does, whether alignment has been set; if it does not, the access must be marked specially so that the compiler handles it as an unaligned access.

Example:


#include <stdio.h>

int main(void)
{
    struct A {
        int a;
        char b;
        short c;
    };
    struct B {
        char b;
        int a;
        short c;
    };
#pragma pack(2) /* align on 2 bytes */
    struct C {
        char b;
        int a;
        short c;
    };
#pragma pack() /* cancel the specified alignment and restore the default alignment */
#pragma pack(1) /* align on 1 byte */
    struct D {
        char b;
        int a;
        short c;
    };
#pragma pack() /* cancel the specified alignment and restore the default alignment */
    int s1 = sizeof(struct A);
    int s2 = sizeof(struct B);
    int s3 = sizeof(struct C);
    int s4 = sizeof(struct D);
    printf("%d\n", s1);
    printf("%d\n", s2);
    printf("%d\n", s3);
    printf("%d\n", s4);
    return 0;
}

Output:

8
12
8
7

 

Modify the code:

struct A {
    // int a;
    char b;
    short c;
};

struct B {
    char b;
    // int a;
    short c;
};

Output:

4
4

Both values are 4, which shows that the int member was what affected alignment.

See the picture to understand
