Introduction to SSE instruction set-based programming

Source: Internet
Author: User

SSE technology overview
Intel's Streaming SIMD Extensions (SSE) technology can substantially improve a CPU's floating-point performance. Visual Studio .NET 2003 supports programming with the SSE instruction set: you can use SSE instructions directly in C++ code without writing assembly. The SSE topics in MSDN [1] may confuse beginners who are not familiar with SSE assembly programming. Reading the Intel software manuals [2] alongside the MSDN documentation gives a clearer understanding of the key points of SSE programming.
SIMD (single instruction, multiple data) is a CPU execution model in which a single instruction processes multiple data elements at once. Consider the following task: compute the square root of every element in a very long floating-point array. The algorithm for this task can be written as follows:
for each f in array    // for each element in the array
    f = sqrt(f)        // calculate its square root
To see the implementation details, we can expand the code above as follows:
for each f in array
{
    load f from memory into a floating-point register
    calculate the square root
    store the result from the register back to memory
}
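For comparison, the scalar loop above can be sketched in C++ roughly as follows. This is only an illustration: the function name sqrt_scalar and the in-place update are assumptions, not part of the original text.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: replaces each element with its square root,
// handling one element per loop iteration.
void sqrt_scalar(float* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        // Load one float, compute its square root, store it back.
        data[i] = std::sqrt(data[i]);
    }
}
```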
A processor that supports the Intel SSE instruction set has eight 128-bit registers, each of which can hold four 32-bit single-precision floating-point numbers. SSE also provides instructions that load floating-point numbers into these 128-bit registers, perform arithmetic and logical operations on them, and write the results back to memory. With SSE, the algorithm can be written as follows:
for each 4 members in array    // for each group of 4 elements in the array
{
    load the four numbers from the array into a 128-bit SSE register
    calculate the square roots of all four numbers with a single instruction
    store the four results back to memory
}
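The four-at-a-time loop can be sketched in C++ with the SSE intrinsics from xmmintrin.h, where __m128 is the 128-bit data type that holds four floats. This is a minimal sketch under some assumptions: the function name sqrt_sse is hypothetical, the unaligned load and store intrinsics are used so the array needs no particular alignment, and a scalar loop handles the leftover elements when the length is not a multiple of four.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <xmmintrin.h>

// Hypothetical helper: in-place square root, four floats per iteration.
void sqrt_sse(float* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);  // load four floats (no alignment required)
        v = _mm_sqrt_ps(v);                 // four square roots with one instruction
        _mm_storeu_ps(data + i, v);         // store the four results back to memory
    }
    // Scalar tail for the remaining 0 to 3 elements.
    for (; i < n; ++i) {
        data[i] = std::sqrt(data[i]);
    }
}
```

If the array is known to be 16-byte aligned, _mm_load_ps and _mm_store_ps could be used in place of the unaligned variants.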
C++ programmers do not need to manage these 128-bit registers directly when programming with the SSE intrinsic functions. You can use the 128-bit data type "__m128" and a set of C++ functions to perform the arithmetic and logical operations; deciding which SSE registers the program uses, and optimizing the code, is the job of the C++ compiler. When you need to process the elements of a long floating-point array, SSE is a very efficient approach.
SSE program design details
Header files to include:
All SSE intrinsic functions and the __m128 data type are defined in xmmintrin.h:
#include <xmmintrin.h>
Because the compiler generates the SSE processor instructions used in the program directly, there is no associated .lib library file.
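As a small illustration of using the header, the sketch below includes xmmintrin.h and combines two arithmetic intrinsics on __m128 values. The helper name mul_add4 and its four-element interface are assumptions made for this example. No extra library needs to be linked; the compiler emits the SSE instructions itself.

```cpp
#include <cassert>
#include <xmmintrin.h>

// Hypothetical helper: out[i] = a[i] * b[i] + c[i] for i = 0..3.
void mul_add4(const float* a, const float* b, const float* c, float* out) {
    assert(a != nullptr && b != nullptr && c != nullptr && out != nullptr);
    __m128 va = _mm_loadu_ps(a);  // four floats from a
    __m128 vb = _mm_loadu_ps(b);  // four floats from b
    __m128 vc = _mm_loadu_ps(c);  // four floats from c
    // Multiply and add element-wise; the compiler chooses the registers.
    __m128 result = _mm_add_ps(_mm_mul_ps(va, vb), vc);
    _mm_storeu_ps(out, result);   // write the four results to out
}
```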
