Introduction:
With the release of the Intel Pentium III processor, a number of new features became available to software developers, who can use them to build better products for users. The new features of the Pentium III and Pentium III Xeon processors include a processor serial number (a unique processor ID) and the SSE instruction set, which lets these processors run suitable code faster than the Pentium II and Pentium II Xeon. SSE extends the instruction set in much the same way that the MMX instructions were added on top of the classic Pentium instruction set.
1. Data swizzling
The speedup offered by the Pentium III's SSE instructions comes at a cost, because SSE instructions can operate only on the new 128-bit data type that SSE defines. If your application uses its own data format, it must convert the data to this new type before performing SSE operations, and convert the results back afterwards.
The operation of converting data from one format to another is called "data swizzling".
This conversion takes time and consumes processor core cycles. If an application converts between data formats frequently, the wasted cycles add up quickly, so the cost of these conversions deserves careful attention.
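The round trip described above can be sketched as follows. This is only a minimal illustration, assuming an x86 compiler that provides the SSE intrinsics header `<xmmintrin.h>`; the `Vec4` struct and the `scale_vec4` function are hypothetical names chosen for this example and do not come from the original article.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (assumes an x86 target with SSE enabled)

// Hypothetical application-defined format: four separate float fields.
struct Vec4 {
    float x, y, z, w;
};

// Multiply all four components by `factor` with a single SSE multiply.
Vec4 scale_vec4(const Vec4& in, float factor) {
    // Swizzle in: pack the application's fields into one 128-bit SSE value.
    // _mm_set_ps takes its arguments from the highest lane down to the lowest.
    __m128 packed = _mm_set_ps(in.w, in.z, in.y, in.x);

    // The actual SSE work: four multiplications in one instruction.
    __m128 scaled = _mm_mul_ps(packed, _mm_set1_ps(factor));

    // Swizzle out: store the lanes to memory and copy them back
    // into the application's own format.
    float lanes[4];
    _mm_storeu_ps(lanes, scaled);
    return Vec4{lanes[0], lanes[1], lanes[2], lanes[3]};
}
```

Only one of the three steps in `scale_vec4` does useful arithmetic; the packing and unpacking around it are the swizzling overhead, which is why it pays to keep data in an SSE-friendly layout across many operations instead of converting for each one.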
1.1 Data organization
Typically, a 3D application stores each vertex in a dedicated data structure and represents multiple vertices as an array of that structure, known as an array of structures (AoS). A typical vertex holds X, Y, and Z coordinates. The following code gives a data structure that represents a 3D vertex; a large number of such vertices is represented by an array of this structure, as shown in Figure 9.
struct point {
    float x, y, z;
};
...
point dataset[...];
Figure 9: Array of structures (AoS)
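As a point of comparison, here is a minimal sketch of how an application might process such an array of structures without SSE. The function name `squared_lengths_aos` and the choice of computation (squared distance from the origin) are assumptions made for illustration only.

```cpp
#include <cstddef>

// The vertex structure from the text: one x, y, z triple per vertex.
struct point {
    float x, y, z;
};

// Compute the squared distance from the origin for each vertex.
// Each iteration handles exactly one vertex, one value at a time.
void squared_lengths_aos(const point* dataset, std::size_t count, float* out) {
    for (std::size_t i = 0; i < count; ++i) {
        const point& p = dataset[i];
        out[i] = p.x * p.x + p.y * p.y + p.z * p.z;
    }
}
```

In this layout the x coordinates of consecutive vertices are not adjacent in memory, because each is followed by the y and z of the same vertex.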
The advantage of SSE is that multiple vertices can be processed at the same time. To exploit it, the application must be able to load, in one step, data that spans several vertices (for example, 4 floating-point numbers holding the x coordinates of 4 vertices). This is achievable: instead of keeping the X, Y, and Z values of each vertex together, the application rearranges the data into three separate arrays, or creates a structure whose members are arrays, one per coordinate. This layout is called a structure of arrays (SoA). (My understanding: without SSE, the three coordinates of a vertex are grouped into one data structure and processed one value at a time. With SSE, the coordinates of all vertices are gathered into 3 arrays, and 4 values taken from one array are processed simultaneously.)
The following code defines such a structure of arrays, as shown in Figure 10.
struct point {
    float *x, *y, *z;
};
Figure 10: Structure of arrays (SoA)
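To illustrate why this layout suits SSE, the sketch below processes four vertices per iteration. It is a hedged example, not the article's own code: it assumes an x86 compiler with `<xmmintrin.h>`, and the function name `squared_lengths_soa`, the `count` parameter, and the choice of computation are all hypothetical. Unaligned loads and stores are used so that no alignment requirement is placed on the caller's arrays.

```cpp
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics (assumes an x86 target with SSE enabled)

// Structure of arrays: each pointer refers to an array with one float per vertex.
struct point {
    float *x, *y, *z;
};

// Compute the squared distance from the origin for `count` vertices.
void squared_lengths_soa(const point& data, std::size_t count, float* out) {
    std::size_t i = 0;

    // Main loop: each load fetches the same coordinate of 4 consecutive vertices.
    for (; i + 4 <= count; i += 4) {
        __m128 x = _mm_loadu_ps(data.x + i);
        __m128 y = _mm_loadu_ps(data.y + i);
        __m128 z = _mm_loadu_ps(data.z + i);
        __m128 sum = _mm_add_ps(_mm_mul_ps(x, x),
                                _mm_add_ps(_mm_mul_ps(y, y), _mm_mul_ps(z, z)));
        _mm_storeu_ps(out + i, sum);
    }

    // Scalar tail for the remaining 0 to 3 vertices.
    for (; i < count; ++i) {
        out[i] = data.x[i] * data.x[i] + data.y[i] * data.y[i] + data.z[i] * data.z[i];
    }
}
```

Because each coordinate array is contiguous, no per-vertex packing step is needed inside the loop: four x values can be loaded directly into one 128-bit register.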