When reprinting, please keep the following statement
Zhaozong
Source: https://www.cnblogs.com/zhao-zongsheng/p/9099603.html
Many people who write C++ know the concept of memory alignment and its rules, but do not necessarily understand it deeply. This article tries to explain C++ memory alignment more thoroughly, from the hardware up to the C++ language.
What is memory alignment
First, what is memory alignment? The concept comes from the hardware level. An executable program is made up of a series of CPU instructions, and some of those instructions need to access memory. The most common are "load from memory into a register" and "store from a register into memory". Older architectures (including x86) also have instructions that operate directly on memory, and these instructions imply memory reads as well. On many CPU architectures, these instructions require the memory address being operated on (more precisely, the starting address of the accessed memory) to be divisible by the size of the access. A memory access that satisfies this requirement is called an aligned memory access; otherwise it is an unaligned memory access. For example, ARM's LDRH instruction reads 2 bytes from memory into a register. If the specified address is 0x2587C20, the access is aligned, because 0x2587C20 is divisible by 2. If the specified address is 0x2587C33, the access is unaligned, because 0x2587C33 is not divisible by 2.
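The divisibility rule above can be written down as a one-line check. The following is a minimal sketch; is_aligned is a hypothetical helper name chosen for this illustration, and the two addresses are the ones from the LDRH example.

```cpp
#include <cstdint>

// Returns true when `address` is a multiple of `access_size` (in bytes).
// `is_aligned` is a hypothetical helper, not part of any library.
constexpr bool is_aligned(std::uintptr_t address, std::uintptr_t access_size) {
    return address % access_size == 0;
}

// The two addresses from the LDRH example (a 2-byte load).
static_assert(is_aligned(0x2587C20, 2), "0x2587C20 is divisible by 2");
static_assert(!is_aligned(0x2587C33, 2), "0x2587C33 is not divisible by 2");
```

Because alignments are powers of two in practice, the same test is often written as (address & (access_size - 1)) == 0.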
What happens if you access misaligned memory? This depends on the CPU.
- Some CPU architectures can access misaligned memory, but with a performance penalty. x86 CPUs are the typical example
- Some CPUs will throw an exception
- Some CPUs will not throw any exceptions and will silently access the wrong address
- In recent years, some of the CPU's instructions can normally access misaligned memory without any performance impact.
Because CPUs handle unaligned memory access differently, unaligned access should be avoided as much as possible. This is why C++ has a memory alignment mechanism.
The C++ memory alignment mechanism
In C++, each type has two properties: its size and its alignment requirement, also simply called its alignment. The C++ standard does not specify the alignment of each type, but the following rules generally hold.
- The alignment of every fundamental type equals the size of that type.
- The alignment of a struct, class, or union type equals the largest alignment among its non-static data members.
In addition, the standard specifies that all alignments must be a power of 2.
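These rules can be checked at compile time with alignof. The sketch below is only an illustration: is_power_of_two and CharAndDouble are hypothetical names, and the assertion that alignof(int) equals sizeof(int) reflects typical platforms rather than a guarantee of the standard.

```cpp
#include <cstddef>

// True when n is a non-zero power of two.
constexpr bool is_power_of_two(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Hypothetical aggregate used only for this illustration.
struct CharAndDouble {
    char   c;
    double d;
};

// Every alignment is a power of two.
static_assert(is_power_of_two(alignof(char)), "alignof(char) is a power of two");
static_assert(is_power_of_two(alignof(int)), "alignof(int) is a power of two");
static_assert(is_power_of_two(alignof(double)), "alignof(double) is a power of two");

// Typical, but not guaranteed: a fundamental type is aligned to its size.
static_assert(alignof(int) == sizeof(int), "int is aligned to its size here");

// A struct takes the largest alignment among its members.
static_assert(alignof(CharAndDouble) == alignof(double),
              "struct alignment follows its most strictly aligned member");
```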
When allocating memory for a variable, the compiler computes and satisfies the alignment requirement of its type. The byte offset of each non-static data member within a struct or class also satisfies the alignment requirement of that member's type.
For example,
    class MyObject
    {
        char c;
        int i;
        short s;
    };
c is of type char, with an alignment requirement of 1; i is of type int, with an alignment requirement of 4; s is of type short, with an alignment requirement of 2. MyObject therefore takes the largest of these, 4, as its own alignment requirement. If a variable of type MyObject is declared in a function, the starting address of the memory allocated to it is divisible by 4.
Now look at the member variables of MyObject. c is the first member, so its byte offset is 0, which means c occupies the first byte of MyObject. The alignment requirement of i is 4, so its byte offset must be a multiple of 4, and since i must come after c, its byte offset is 4. That is, i occupies the 5th to 8th bytes of MyObject, and the 2nd to 4th bytes are padding. The alignment requirement of s is 2, and since s must come after i, its byte offset is 8, so s occupies the 9th and 10th bytes of MyObject. In addition, because every element of an array of a struct, class, or union type must be aligned, the size of such a type is an integer multiple of its alignment. The size of MyObject is therefore 12, which means that s is followed by 2 bytes of padding.
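The layout described above can be verified with offsetof and sizeof. This is a sketch under the assumption of a typical platform where int is 4 bytes and short is 2 bytes. LayoutDemo is a hypothetical name; it is declared as a struct only so that its members are public and offsetof can be applied.

```cpp
#include <cstddef>  // offsetof

// Same members, in the same order, as the class discussed above.
struct LayoutDemo {
    char  c;  // offset 0, followed by 3 bytes of padding
    int   i;  // offset 4
    short s;  // offset 8, followed by 2 bytes of tail padding
};

static_assert(offsetof(LayoutDemo, c) == 0, "c is the first byte");
static_assert(offsetof(LayoutDemo, i) == 4, "i starts at the next multiple of 4");
static_assert(offsetof(LayoutDemo, s) == 8, "s follows i");
static_assert(alignof(LayoutDemo) == 4, "largest member alignment");
static_assert(sizeof(LayoutDemo) == 12, "size is rounded up to a multiple of 4");
```

Reordering the members as int, short, char would leave only 1 byte of tail padding and shrink the size to 8 on the same platform, which is why member order matters for size.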
Because memory access in C++ goes through reads and writes of variables, this mechanism ensures that all variables are properly aligned, and therefore that the memory accesses in the program are aligned.
Of course, C++ does not prevent us from accessing unaligned memory. For example, the following code is likely to access unaligned memory:
    char buf[10];
    int* ptr = (int*)(buf + 1);
    ++*ptr;
Code like this does show up in real work. It is a dangerous way to write code, because it is likely to access misaligned memory. This is one reason C++ programmers recommend against C-style casts in favor of static_cast, dynamic_cast, const_cast and reinterpret_cast. With those, the code above has to use reinterpret_cast, and since everyone knows that reinterpret_cast is dangerous, they may try harder to avoid this kind of logic.
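One common way to avoid the misaligned dereference is to copy the bytes into a properly aligned local variable with std::memcpy and copy the result back. The sketch below is not from the original article; increment_int_at and read_int_at are hypothetical helper names used only here.

```cpp
#include <cassert>
#include <cstring>  // std::memcpy

// Reads an int stored at an arbitrary, possibly unaligned, byte position.
inline int read_int_at(const char* bytes) {
    int value;
    std::memcpy(&value, bytes, sizeof value);  // byte-wise copy into an aligned int
    return value;
}

// Increments the int stored at `bytes` without ever forming a misaligned int*.
inline void increment_int_at(char* bytes) {
    int value = read_int_at(bytes);
    ++value;
    std::memcpy(bytes, &value, sizeof value);  // byte-wise copy back
}

// Small self-check mirroring the buffer example above.
inline void demo_unaligned_increment() {
    char buf[10] = {};
    increment_int_at(buf + 1);          // buf + 1 is generally not int-aligned
    assert(read_int_at(buf + 1) == 1);
}
```

Compilers usually turn such fixed-size memcpy calls into whatever load and store instructions are safe for the target, so the extra copy rarely costs much.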
Unaligned memory access on common CPUs
According to Intel's latest Intel 64 and IA-32 architecture manuals, the Intel 64 and IA-32 architectures support unaligned memory access, but with additional performance overhead (see http://www.intel.com/products/processor/manuals). In practice, however, recent Core series CPUs can access unaligned memory without additional overhead.
On ARMv8, the most common architecture in mobile phones, an unaligned access that is an ordinary memory access and is not used for multi-core synchronization may either generate an alignment fault or be performed as an unaligned memory operation. In other words, whether it faults or executes normally depends on the specific CPU implementation. Even when it executes normally there are limitations: for example, reads and writes are not guaranteed to be atomic (except for single-byte accesses), and additional overhead is likely (see https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile). The Cortex-A series in ARMv8 is a common CPU family in mobile phones. These CPUs handle unaligned memory access normally, but typically with additional overhead (see http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15414.html).
We can also write a simple program to test how your own CPU handles unaligned memory access. Here is the code:
    #include <iostream>
    #include <chrono>

    using namespace std;
    using namespace std::chrono;

    // Use a volatile pointer to prevent the compiler from optimizing the loop away.
    milliseconds test_duration(volatile int* ptr)
    {
        auto start = steady_clock::now();
        // The iteration count here is an assumption; adjust it for your machine.
        for (unsigned i = 0; i < 100'000'000; ++i)
        {
            ++(*ptr);
        }
        auto end = steady_clock::now();
        return duration_cast<milliseconds>(end - start);
    }

    int main()
    {
        int raw[2] = {0, 0};
        {
            int* ptr = raw;
            cout << "Address of aligned pointer:" << (void*)ptr << endl;
            cout << "aligned access:" << test_duration(ptr).count() << "ms" << endl;
            *ptr = 0;
        }
        {
            // Deliberately misaligned by one byte: this is the access being measured.
            int* ptr = (int*)(((char*)raw) + 1);
            cout << "address of unaligned pointer:" << (void*)ptr << endl;
            cout << "unaligned access:" << test_duration(ptr).count() << "ms" << endl;
            *ptr = 0;
        }
        cin.get();
        return 0;
    }
The computer I tested on has an Intel Core i7 2630QM, a second-generation Intel Core CPU. The test results are:
    Address of aligned pointer:000000668deffa78
    aligned access:282ms
    address of unaligned pointer:000000668deffa79
    unaligned access:285ms
As you can see, on this CPU there is no performance difference between aligned and unaligned memory access.
Modifying alignment requirements in C++
In general we do not need to customize alignment requirements, but there are special cases where they have to be adjusted. In C++, the alignas keyword can be used to modify the alignment requirement of a type or a variable. For example:
    class MyObject
    {
        char c;
        alignas(8) int i;
        short s;
    };
Here the alignment requirement of the member i changes from 4 to 8. As a result, the byte offset of i moves from 4 to 8, the byte offset of s moves from 8 to 12, the alignment requirement of MyObject becomes 8, and its size becomes 16.
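The effect of alignas on a single member can again be confirmed with offsetof. This sketch assumes a typical platform where int is 4 bytes and short is 2 bytes. AlignedMemberDemo is a hypothetical name, declared as a struct so that offsetof can reach the members.

```cpp
#include <cstddef>  // offsetof

struct AlignedMemberDemo {
    char           c;  // offset 0, followed by 7 bytes of padding
    alignas(8) int i;  // offset 8 instead of 4
    short          s;  // offset 12, followed by 2 bytes of tail padding
};

static_assert(offsetof(AlignedMemberDemo, i) == 8, "i moved from offset 4 to 8");
static_assert(offsetof(AlignedMemberDemo, s) == 12, "s moved from offset 8 to 12");
static_assert(alignof(AlignedMemberDemo) == 8, "the type inherits the stricter alignment");
static_assert(sizeof(AlignedMemberDemo) == 16, "size is rounded up to a multiple of 8");
```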
We can also apply alignas to the definition of MyObject itself:
    class alignas(16) MyObject
    {
        char c;
        int i;
        short s;
    };
A type can also be written inside alignas, and several alignas specifiers can be used together, in which case the strictest alignment requirement wins. For example, the alignment requirement of the following MyObject is 16:
    class alignas(int) alignas(16) MyObject
    {
        char c;
        int i;
        short s;
    };
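A compile-time check makes the "strictest one wins" rule concrete. The sketch below assumes a typical platform where alignof(int) is 4. OverAligned is a hypothetical name used only for this illustration.

```cpp
// Two alignas specifiers: alignas(int) asks for 4, alignas(16) asks for 16.
struct alignas(int) alignas(16) OverAligned {
    char  c;
    int   i;
    short s;
};

static_assert(alignof(OverAligned) == 16, "the strictest specifier wins");
// The members need 10 bytes including padding; the size is rounded up to 16.
static_assert(sizeof(OverAligned) == 16, "size is a multiple of the alignment");
static_assert(sizeof(OverAligned) % alignof(OverAligned) == 0,
              "array elements stay aligned");
```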
alignas has one limitation: it cannot be used to make an alignment requirement smaller. For example, the following code is an error:
    alignas(1) int i;
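For contrast, the following sketch shows requests that are accepted: an alignment equal to the natural one, and a stricter one. SameAsInt and Stricter are hypothetical names. The weaker request is left commented out because the standard makes it ill-formed, although how it is diagnosed can vary between compilers.

```cpp
// Equal to the natural alignment of the member: allowed, no effect.
struct alignas(alignof(int)) SameAsInt {
    int v;
};

// Stricter than the natural alignment: allowed.
struct alignas(2 * alignof(int)) Stricter {
    int v;
};

// Weaker than the natural alignment: ill-formed.
// struct alignas(1) Weaker { int v; };

static_assert(alignof(SameAsInt) == alignof(int), "unchanged alignment");
static_assert(alignof(Stricter) == 2 * alignof(int), "doubled alignment");
static_assert(sizeof(Stricter) == 2 * sizeof(int), "padded up to the new alignment");
```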
In addition, C++ has a special type, max_align_t. Any alignment not greater than the alignment of max_align_t is called a fundamental alignment, and any alignment greater than it is called an extended alignment. The C++ standard requires every platform to support fundamental alignments, while support for extended alignments is up to each platform. Generally speaking, the alignment of max_align_t equals the alignment of long double.
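The distinction between fundamental and extended alignments can be expressed as a small predicate. This is a minimal sketch; is_fundamental_alignment is a hypothetical helper name, and the concrete value of alignof(std::max_align_t) depends on the platform.

```cpp
#include <cstddef>  // std::max_align_t, std::size_t

// True when `alignment` is no stricter than that of std::max_align_t.
constexpr bool is_fundamental_alignment(std::size_t alignment) {
    return alignment <= alignof(std::max_align_t);
}

// max_align_t is at least as strictly aligned as every scalar type.
static_assert(alignof(std::max_align_t) >= alignof(long double),
              "max_align_t covers long double");
static_assert(is_fundamental_alignment(alignof(double)),
              "double has a fundamental alignment");
static_assert(!is_fundamental_alignment(2 * alignof(std::max_align_t)),
              "anything stricter than max_align_t is an extended alignment");
```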
C++ has many more features that support memory alignment, such as the alignof keyword for querying an alignment, the aligned_storage template for creating a type with an arbitrary size and alignment requirement, and alignment_of, which is convenient in template programming. They are not described in detail here.
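As a brief taste of these facilities, the sketch below uses all three at compile time. Buffer is a hypothetical alias, and the size and alignment values are arbitrary choices for this illustration. Note that std::aligned_storage is deprecated as of C++23, so newer compilers may warn about it.

```cpp
#include <type_traits>  // std::alignment_of, std::aligned_storage

// alignof is the keyword; std::alignment_of is the equivalent type trait.
static_assert(std::alignment_of<int>::value == alignof(int),
              "the trait and the keyword agree");
static_assert(std::alignment_of<double>::value == alignof(double),
              "the trait and the keyword agree");

// Uninitialized storage of at least 64 bytes, aligned to 16 bytes.
using Buffer = std::aligned_storage<64, 16>::type;

static_assert(sizeof(Buffer) >= 64, "large enough for 64 bytes");
static_assert(alignof(Buffer) % 16 == 0, "suitably aligned for 16-byte objects");
```

Such storage is typically used together with placement new to construct an object inside it later.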
From hardware to language: C++ memory alignment in detail