http://blog.csdn.net/embeddedman/article/details/7429976
Let's start with a program that introduces the topic:
// Environment: VC6 + Windows SP2
// Program 1
#include <iostream>

using namespace std;

struct St1
{
    char a;
    int b;
    short c;
};

struct St2
{
    short c;
    char a;
    int b;
};

int main()
{
    cout << "sizeof (ST1) is " << sizeof(St1) << endl;
    cout << "sizeof (ST2) is " << sizeof(St2) << endl;
    return 0;
}
The output of the program is:
sizeof (ST1) is 12
sizeof (ST2) is 8
Here is the puzzle: the two structures contain exactly the same members, so why does sizeof report different sizes?
The main purpose of this article is to explain that.
The different results are caused by memory alignment.
For most programmers, memory alignment is essentially transparent: it is the compiler's job. The compiler places each data unit in the program at a suitable address, which is why structures with the same members can differ in size depending on the order in which the members are declared.
So why does the compiler perform memory alignment? Intuitively, both sizeof(St1) and sizeof(St2) in Program 1 should be 7: 4 (int) + 2 (short) + 1 (char) = 7. With memory alignment, the structures take up more space instead.
Before explaining what memory alignment is for, consider its rules:
1. Member alignment: the first member of the struct is placed at offset 0, and the offset of each subsequent data member must be a multiple of min(the value specified by #pragma pack(), the size of that data member).
2. After the data members have been aligned, the structure (or union) itself is aligned to min(the value specified by #pragma pack(), the size of the largest data member of the structure (or union)), and its total size must be a multiple of that value.
#pragma pack(n) sets the alignment to n bytes. VC6 defaults to 8-byte alignment.
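To see the effect of #pragma pack, here is a small sketch. The struct names DefaultSt1 and PackedSt1 are made up for this illustration, and it assumes a compiler that accepts the push/pop form of the pragma (MSVC, GCC and Clang do).

```cpp
#include <cstddef>

// Laid out with the compiler's default packing.
struct DefaultSt1
{
    char a;
    int b;
    short c;
};

#pragma pack(push, 1)   // from here on, align every member to 1 byte
struct PackedSt1
{
    char a;
    int b;
    short c;
};
#pragma pack(pop)       // restore the previous packing value

// With 1-byte packing, min(1, member size) is always 1, so no padding is added.
static_assert(sizeof(PackedSt1) == sizeof(char) + sizeof(int) + sizeof(short),
              "1-byte packing leaves no padding");
```

On a platform where int is 4 bytes and short is 2 bytes, sizeof(PackedSt1) is 7, the "intuitive" size, while sizeof(DefaultSt1) is larger because of padding.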
The alignment rules are illustrated below, using Program 1 as an example:
St1: char takes 1 byte and starts at offset 0. int takes 4 bytes, and min(the value specified by #pragma pack(), the member's own size) = min(8, 4) = 4 (VC6 defaults to 8-byte alignment), so int is 4-byte aligned and its starting offset must be a multiple of 4. Its starting offset is therefore 4, and the compiler inserts 3 padding bytes after the char that hold no data. short takes 2 bytes and is 2-byte aligned; its starting offset is 8, which is already a multiple of 2, so no padding is needed. Once member alignment under rule 1 is done, the memory layout is:
oxxx|oooo|oo
0123 4567 89 (address)
(x indicates extra bytes added)
That is 10 bytes in total. Next, under rule 2, the structure itself is aligned to the smaller of the value specified by #pragma pack and the size of the largest data member. The largest data member of St1 is the int, at 4 bytes, and the default #pragma pack value is 8, so the structure is aligned to 4 bytes and its total size must be a multiple of 4. The compiler adds 2 more padding bytes, bringing the total size to 12. The memory layout is now:
oxxx|oooo|ooxx
0123 4567 89ab (address)
This completes the memory alignment. St1 occupies 12 bytes rather than 7.
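The layout can also be checked against a real compiler with offsetof. This is only a sketch: print_st1_layout is a name invented here, and the exact numbers depend on the platform's type sizes and default packing.

```cpp
#include <cstddef>
#include <iostream>

struct St1
{
    char a;
    int b;
    short c;
};

// Print where the compiler actually placed each member of St1.
inline void print_st1_layout()
{
    std::cout << "a at offset " << offsetof(St1, a) << '\n'
              << "b at offset " << offsetof(St1, b) << '\n'
              << "c at offset " << offsetof(St1, c) << '\n'
              << "sizeof(St1) = " << sizeof(St1) << '\n';
}
```

Calling print_st1_layout() on a typical platform where int is 4 bytes with 4-byte alignment and short is 2 bytes should report offsets 0, 4 and 8 and a size of 12, matching the walk-through above.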
St2 is aligned by the same method as St1; the reader can work through it as an exercise.
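The two rules can also be applied mechanically. The following is a minimal sketch of that calculation, not a description of how any compiler is implemented. The names align_up, min_of, struct_size, kSt1Sizes and kSt2Sizes are made up for this example, and it assumes every member is a scalar whose natural alignment equals its size (char 1, short 2, int 4).

```cpp
#include <cstddef>

// Round `offset` up to the next multiple of `alignment`.
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment)
{
    return (offset + alignment - 1) / alignment * alignment;
}

constexpr std::size_t min_of(std::size_t a, std::size_t b)
{
    return a < b ? a : b;
}

// Total size of a struct whose members have the given sizes, in declaration
// order, under the packing value `pack`.
template <std::size_t N>
constexpr std::size_t struct_size(const std::size_t (&member_sizes)[N], std::size_t pack)
{
    std::size_t offset = 0;
    std::size_t largest = 1;
    for (std::size_t i = 0; i < N; ++i) {
        // Rule 1: each member starts at a multiple of min(pack, member size).
        offset = align_up(offset, min_of(pack, member_sizes[i]));
        offset += member_sizes[i];
        if (member_sizes[i] > largest) {
            largest = member_sizes[i];
        }
    }
    // Rule 2: the total size is a multiple of min(pack, largest member size).
    return align_up(offset, min_of(pack, largest));
}

// Member sizes of St1 (char, int, short) and St2 (short, char, int).
constexpr std::size_t kSt1Sizes[] = {1, 4, 2};
constexpr std::size_t kSt2Sizes[] = {2, 1, 4};
```

Under these assumptions, struct_size(kSt1Sizes, 8) evaluates to 12 and struct_size(kSt2Sizes, 8) to 8, matching the program output. With a pack value of 1, both evaluate to 7.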
The main reasons for memory alignment are:
1. Platform (portability) reasons: not every hardware platform can access arbitrary data at an arbitrary address. Some platforms can only fetch certain types of data at certain addresses, and raise a hardware exception otherwise.
2. Performance reasons: the CPU accesses aligned data much faster. The specific reasons are explained below.
Figure 1:
This is how the average programmer pictures memory: a sequence of individual bytes. The CPU does not see it that way.
Figure 2:
The CPU treats memory as a sequence of blocks. A block can be 2, 4, 8, or 16 bytes in size, so the CPU reads memory one block at a time. The block size is called the memory access granularity, referred to here as the "memory read granularity".
Suppose the CPU is going to read a 4-byte int into a register, in two cases:
1. The data starts at byte 0
2. The data starts at byte 1
Assume also that the memory read granularity is 4.
Figure 3:
When the data starts at byte 0, the CPU needs only one memory read to load all 4 bytes into the register.
When the data starts at byte 1, things become more complicated: the int no longer sits on a memory read boundary, which makes it unaligned data.
Figure 4:
In this case the CPU first reads bytes 0-3 into a register, then reads bytes 4-7 into a register, then discards byte 0 and bytes 5, 6, and 7, and finally merges bytes 1, 2, 3, and 4 into one register. All this extra work for a single unaligned value greatly reduces CPU performance.
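The extra work can be imitated in software. This is only a simulation under stated assumptions, not what any particular CPU does internally: memory is modelled as a byte array, the read granularity is fixed at 4, the value is assembled in little-endian order, and the names simulated_load, LoadResult, kGranularity and kMemory are invented for this sketch.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kGranularity = 4;  // assumed memory read granularity

struct LoadResult
{
    std::uint32_t value;      // the assembled 4-byte value
    std::size_t block_reads;  // how many aligned block reads were needed
};

// Load 4 bytes starting at `address` when memory can only be read in
// aligned blocks of kGranularity bytes.
constexpr LoadResult simulated_load(const std::uint8_t* memory, std::size_t address)
{
    LoadResult result = {0, 0};
    std::size_t kept = 0;
    std::size_t base = address / kGranularity * kGranularity;  // start of the first block
    while (kept < 4) {
        ++result.block_reads;  // one block read: bytes base .. base + 3
        for (std::size_t i = 0; i < kGranularity && kept < 4; ++i) {
            const std::size_t byte_address = base + i;
            if (byte_address < address) {
                continue;  // discard bytes that lie before the value
            }
            // Merge the wanted byte into the result; bytes after the value are
            // discarded because the loop stops once 4 bytes have been kept.
            result.value |= static_cast<std::uint32_t>(memory[byte_address]) << (8 * kept);
            ++kept;
        }
        base += kGranularity;
    }
    return result;
}

// Eight bytes of sample memory, addresses 0 to 7.
constexpr std::uint8_t kMemory[8] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77};
```

With this sample memory, simulated_load(kMemory, 0) needs 1 block read, while simulated_load(kMemory, 1) needs 2 block reads to assemble bytes 1 to 4.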
And this is the optimistic case. As mentioned above, one reason for memory alignment is portability: only some CPUs are willing to do this extra work, while others simply refuse when they encounter a misaligned access.