Each data type has an alignment requirement that is dictated by the processor architecture rather than by the language itself. Aligned data elements allow the processor to fetch data from memory efficiently and thereby improve performance. To get the best performance, the compiler tries to maintain the natural alignment of each data element. On 32-bit and 64-bit Linux systems, the typical alignment requirements for the data types used by the Intel C++ compiler are as follows:
Data Type | 32-bit (bytes) | 64-bit (bytes)
char | 1 | 1
short | 2 | 2
int | 4 | 4
long | 4 | 8
float | 4 | 4
double | 8 | 8
long long | 8 | 8
long double | 4 | 16
Any pointer | 4 | 8
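The values a particular compiler and target actually use can be checked directly. The following is a minimal sketch, assuming a C++11 compiler; the helper names `report` and `report_all` are hypothetical, and the printed numbers depend on the ABI of the machine it is built for.

```cpp
#include <cstddef>
#include <cstdio>

// Print the size and alignment the compiler uses for one type on this target.
template <typename T>
void report(const char* name) {
    std::printf("%-12s size=%2zu align=%2zu\n", name, sizeof(T), alignof(T));
}

// Print one row per data type listed in the table above.
void report_all() {
    report<char>("char");
    report<short>("short");
    report<int>("int");
    report<long>("long");
    report<float>("float");
    report<double>("double");
    report<long long>("long long");
    report<long double>("long double");
    report<void*>("any pointer");
}
```

Calling `report_all()` from `main` prints one line per type, which can be compared against the table for the platform in use.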
In general, the compiler satisfies the alignment requirements of these data elements whenever possible. With the Intel C++ and Fortran compilers, you can use the -align (C/C++, Fortran) compiler switch to enforce or disable the natural alignment rules. For structures, which typically contain data elements of different types, the compiler tries to keep each element aligned by inserting unused storage between elements. This technique is called "padding". In addition, the compiler aligns the structure as a whole to the alignment of its most strictly aligned member. The compiler may also increase the size of the structure: when necessary, it adds padding at the end so that the total size is a multiple of the structure's alignment. This is called "tail padding". Padding therefore improves performance at the cost of wasted storage. On an Intel Xeon Phi coprocessor, where the memory available to the application is limited, this waste can become a serious problem.
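The effect of padding and tail padding can be observed with `offsetof`, `sizeof` and `alignof`. The following is a small sketch; the structure `Sample` and its members are hypothetical and chosen only to show padding both between members and at the end.

```cpp
#include <cstddef>

// Hypothetical structure mixing a narrow and a wider member.
struct Sample {
    char tag;    // offset 0
    int  value;  // the compiler inserts padding so this starts on an int boundary
    char flag;   // followed by tail padding up to the structure's alignment
};

// Bytes that hold no data: total size minus the sizes of the members.
constexpr std::size_t padding_bytes =
    sizeof(Sample) - (sizeof(char) + sizeof(int) + sizeof(char));

// The wider member is placed on a boundary that suits its type.
static_assert(offsetof(Sample, value) % alignof(int) == 0,
              "value is aligned for int");
// The structure takes the alignment of its most strictly aligned member.
static_assert(alignof(Sample) == alignof(int),
              "struct alignment follows the strictest member");
// Tail padding makes the size a multiple of the alignment.
static_assert(sizeof(Sample) % alignof(Sample) == 0,
              "size is a multiple of the alignment");
```

On a typical target with 4-byte `int`, half of this structure's storage ends up as padding, which is the waste the text describes.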
Best Design: Minimize memory waste
Developers can minimize this waste by ordering the structure members so that the widest (largest) elements come first, followed by the second widest, and so on. The following example illustrates how the ordering of data elements affects the size of a structure:
[Figure: code listing defining structure S1]
The structure S1 has 11 padding bytes, as shown in the following table:
[Figure: memory layout of structure S1 showing its padding bytes]
Look at the following structure S2:
[Figure: code listing defining structure S2]
This structure contains only 3 bytes of tail padding, as shown here:
[Figure: memory layout of structure S2 showing its tail padding]
This saves memory. Simply rearranging the data elements in the structure definition is therefore enough to avoid the waste.
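A pair of layouts consistent with the counts quoted above (11 padding bytes for S1, 3 bytes of tail padding for S2) can be sketched as follows. The member names and types are assumptions chosen to reproduce those counts on a target where `double` is 8-byte aligned inside structures; they are not necessarily the original listings.

```cpp
#include <cstddef>

// Members in an arbitrary order: padding appears after a, after b1 and after e.
struct S1 {
    char   a;
    short  a1;
    char   b1;
    float  b;
    int    c;
    char   e;
    double f;
};

// The same members ordered from widest to narrowest: only tail padding remains.
struct S2 {
    double f;
    float  b;
    int    c;
    short  a1;
    char   a;
    char   b1;
    char   e;
};

// Bytes of real data held by either structure.
constexpr std::size_t payload =
    3 * sizeof(char) + sizeof(short) + sizeof(float) + sizeof(int) + sizeof(double);

constexpr std::size_t padding_s1 = sizeof(S1) - payload;
constexpr std::size_t padding_s2 = sizeof(S2) - payload;

// Reordering must never make the structure larger.
static_assert(sizeof(S2) <= sizeof(S1), "widest-first ordering does not grow the struct");
```

Because both structures hold the same members, the difference between `sizeof(S1)` and `sizeof(S2)` is exactly the padding saved by the reordering.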
Best Design: Touch only a few elements at a time
There is one exception to this ordering rule: if your structure is larger than a cache line (64 bytes on the Intel Xeon Phi coprocessor) and some loops or kernels touch only part of the structure. In that case it can be beneficial to keep the parts of the structure that are accessed together next to each other in memory, which may improve cache locality.
Best Design: Break up large structures
If your structure is larger than a cache line and some loops or kernels touch only part of it, consider reorganizing the data by breaking the large structure into several smaller structures that are stored in separate arrays. This potentially increases the density of the data that is actually touched and therefore improves cache locality.
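As an illustration of this decomposition, the sketch below splits a hypothetical particle record into the part a motion kernel touches and the part it does not. All type, member and function names (`Particle`, `ParticleMotion`, `ParticleInfo`, `Particles`, `advance`) are assumptions made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record that is larger than one 64-byte cache line.
struct Particle {
    double position[3];
    double velocity[3];
    double mass;
    char   label[40];
};

// The same data broken into two smaller structures.
struct ParticleMotion {   // touched by the motion kernel
    double position[3];
    double velocity[3];
};
struct ParticleInfo {     // rarely touched
    double mass;
    char   label[40];
};

// Each smaller structure is stored in its own array.
struct Particles {
    std::vector<ParticleMotion> motion;
    std::vector<ParticleInfo>   info;
};

// The kernel streams only through the densely packed motion records,
// so no cache space is spent on mass or label.
void advance(Particles& p, double dt) {
    assert(p.motion.size() == p.info.size());  // the two arrays stay in step
    for (ParticleMotion& m : p.motion)
        for (int k = 0; k < 3; ++k)
            m.position[k] += m.velocity[k] * dt;
}
```

The trade-off is that code needing both halves of a record now has to index two arrays, so the split pays off mainly when the hot kernels use only one of them.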
Best Design: Force alignment of specific elements
You can also use the __declspec(align) attribute to direct the compiler to align data more strictly than it otherwise would. The syntax for this extended attribute is as follows:
C/C++:
__declspec(align(n)) <data type declaration>
Fortran:
cDEC$ ATTRIBUTES ALIGN: n :: <data type declaration>
Here n is the requested alignment. It must be a power of 2, up to a maximum of 4096 for the Intel C++ compiler and 16384 for the Intel Fortran compiler. You can use this attribute to request alignment for individual variables with static or automatic storage duration. For structures, however, note that although the attribute raises the alignment of the structure as a whole, it does not change the alignment of the elements inside the structure. By placing __declspec(align) in front of the keyword struct, you request that alignment only for objects of that type. The following example illustrates this:
[Figure: code listing of structure S1 containing a structure S2 declared with __declspec(align(32))]
In the example above, the members a2 (a char) and b2 (an int) keep their default alignments of 1 byte and 4 bytes respectively. However, every instance of struct S2 is aligned on a 32-byte boundary, as requested by the __declspec declaration. Consequently, each instance of S2 inside structure S1 is also aligned on a 32-byte boundary.
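The same behaviour can be checked at compile time. The sketch below uses the standard C++11 `alignas(32)` specifier in place of the Intel-specific `__declspec(align(32))` so that it builds with any conforming compiler; the type names `Inner` and `Outer` and the member `a1` are assumptions, while `a2` and `b2` follow the text.

```cpp
#include <cstddef>

// alignas(32) raises the alignment of the type as a whole.
struct alignas(32) Inner {
    char a2;  // keeps its default 1-byte alignment
    int  b2;  // keeps its default int alignment
};

struct Outer {
    char  a1;
    Inner inner;  // every Inner is placed on a 32-byte boundary, also inside Outer
};

// The type as a whole is 32-byte aligned.
static_assert(alignof(Inner) == 32, "Inner is aligned to 32 bytes");
// Members inside the structure are not affected by the attribute.
static_assert(offsetof(Inner, b2) == alignof(int), "b2 keeps its default alignment");
// The nested instance starts on the next 32-byte boundary after a1.
static_assert(offsetof(Outer, inner) == 32, "inner starts on a 32-byte boundary");
// The enclosing structure inherits the stricter alignment.
static_assert(alignof(Outer) == 32, "Outer inherits the alignment of Inner");
```

Note the cost: `a1` occupies one byte, yet 31 bytes of padding follow it so that `inner` can start on its boundary.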
Best Design: Dynamically allocate aligned memory
We can extend this example further by dynamically allocating an array of structure S2:
[Figure: code listing that dynamically allocates the array arr1 of structure S2]
In this case you still need to use _mm_malloc, or a POSIX equivalent such as posix_memalign, to allocate aligned memory for the pointer. The __declspec(align(32)) declaration then forces each element of the array arr1 onto a 32-byte boundary.
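A minimal sketch of such an allocation is shown below, assuming a POSIX system. It uses `posix_memalign` rather than `_mm_malloc` and the standard `alignas(32)` rather than `__declspec(align(32))`; the names `Elem`, `alloc_aligned` and `is_aligned` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign and free on POSIX systems

// Element type whose size and alignment are both 32 bytes.
struct alignas(32) Elem {
    char a2;
    int  b2;
};

// True if p lies on a multiple of `boundary` bytes.
bool is_aligned(const void* p, std::size_t boundary) {
    return reinterpret_cast<std::uintptr_t>(p) % boundary == 0;
}

// Allocate `count` elements starting on a 32-byte boundary.
// Returns nullptr on failure; release the memory with free().
Elem* alloc_aligned(std::size_t count) {
    void* raw = nullptr;
    if (posix_memalign(&raw, alignof(Elem), count * sizeof(Elem)) != 0)
        return nullptr;
    assert(is_aligned(raw, alignof(Elem)));
    return static_cast<Elem*>(raw);
}
```

Because `sizeof(Elem)` is a multiple of 32, aligning the start of the block is enough to put every element of the array on a 32-byte boundary. A plain `malloc` gives no such guarantee for the first element.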
Best Design: Use align(n) and structs to enforce cache locality for small data elements
You can also use this alignment support to take advantage of cache lines. By grouping small objects that are frequently used together into a structure, and forcing that structure to be allocated at the start of a cache line, you can effectively ensure that all of them are loaded into the cache as soon as any one of them is accessed, which can give a noticeable performance benefit. For example, consider two frequently accessed variables i and j, which might otherwise be placed in different cache lines. You can declare them as follows:
[Figure: code listing declaring i and j inside a structure aligned to the cache line]
With the variables declared this way, the compiler ensures that they are placed in the same cache line.
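One way to write and check such a declaration is sketched below, using the standard `alignas` specifier and the 64-byte cache line size mentioned earlier for the Intel Xeon Phi coprocessor. The names `HotPair`, `hot`, `cache_line_of` and `check_hot_pair` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;  // cache line size assumed from the text

// Two small, frequently used variables grouped into one aligned structure.
struct alignas(kCacheLine) HotPair {
    int i;
    int j;
};

HotPair hot;  // starts at the beginning of a cache line

// Index of the cache line that contains address p.
std::uintptr_t cache_line_of(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / kCacheLine;
}

// Runtime check that both members fall into the same cache line.
void check_hot_pair() {
    assert(cache_line_of(&hot.i) == cache_line_of(&hot.j));
}
```

The structure is padded to a full 64 bytes, so this technique trades a little memory for the guarantee that `i` and `j` travel together.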