Description of the gather feature in AVX2 on Intel processors

Source: Internet
Author: User
Tags: Intel Core i7

Intel introduced the gather feature with AVX2 in the Haswell architecture. It lets the CPU use vector-indexed memory addressing to fetch non-contiguous data elements from memory. The gather instructions introduce a new form of memory addressing, which consists of a base address register (still a general-purpose register) and a vector register (XMM or YMM) that holds multiple indices. The data element size can be 32-bit or 64-bit, and the data type can be floating-point or integer.


Let's first review the general x86 addressing method: [<base register> + <index register> * <scale> + <OFFSET>]

In AT&T syntax, this is written as: <OFFSET>(<base register>, <index register>, <scale>)

Here, <base register> is the base address register; <index register> is the index register; <scale> is the scale factor, which is an immediate value and can only be 1, 2, 4, or 8. <OFFSET> is the offset (displacement), which is also an immediate value.
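The address arithmetic above can be written out as a minimal C sketch. The function name effective_address and its parameters are hypothetical and chosen only for illustration; the sketch models the computation, not any particular instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar x86 effective address: base + index * scale + offset.
 * Hypothetical helper for illustration; not part of the original article. */
static uint64_t effective_address(uint64_t base, uint64_t index,
                                  unsigned scale, int32_t offset)
{
    /* The scale is encoded in two bits, so only 1, 2, 4 and 8 exist. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    /* The offset (displacement) is sign-extended before it is added. */
    return base + index * scale + (uint64_t)(int64_t)offset;
}
```

For instance, a base of 0x1000, an index of 3, a scale of 4, and an offset of 16 yield 0x1000 + 12 + 16 = 0x101c.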

Next, let's look at the vector memory addressing mentioned above.


Vector SIB (VSIB) memory addressing


In AVX2, the SIB byte (S stands for scale, I for index, B for base) that follows the ModR/M byte can support VSIB memory addressing, which produces an array of linear addresses. VSIB addressing is only supported by a subset of the AVX2 instructions. VSIB memory addressing requires a 32-bit or 64-bit effective address. In 32-bit mode, VSIB addressing is not supported when the address-size attribute is overridden to 16 bits. In 16-bit protected mode, VSIB addressing is permitted if the address-size attribute is overridden to 32 bits. In addition, VSIB memory addressing is supported only with the VEX prefix.

In VSIB memory addressing, the SIB byte consists of the following parts:

● The scale field (bits 7:6) specifies the scale factor.

● The index field (bits 5:3) specifies the register number of the vector index register. Each element in the vector register specifies an index.

● The base field (bits 2:0) specifies the register number of the base address register.
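The field layout above can be illustrated with a small C sketch. The names sib_fields, sib_decode and sib_encode are hypothetical helpers, and the sketch only covers the three bits each field has inside the SIB byte itself (registers numbered 8 to 15 need an extra bit from the VEX prefix, which is not modeled here).

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a SIB/VSIB byte. Hypothetical struct for illustration. */
struct sib_fields {
    unsigned scale;  /* decoded scale factor: 1, 2, 4 or 8 */
    unsigned index;  /* low 3 bits of the vector index register number */
    unsigned base;   /* low 3 bits of the base register number */
};

static struct sib_fields sib_decode(uint8_t sib)
{
    struct sib_fields f;
    f.scale = 1u << (sib >> 6);   /* bits 7:6 hold log2(scale) */
    f.index = (sib >> 3) & 7u;    /* bits 5:3 */
    f.base  = sib & 7u;           /* bits 2:0 */
    return f;
}

static uint8_t sib_encode(unsigned scale, unsigned index, unsigned base)
{
    unsigned ss = 0;
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    assert(index < 8 && base < 8);
    while ((1u << ss) != scale)   /* find log2(scale) */
        ss++;
    return (uint8_t)((ss << 6) | (index << 3) | base);
}
```

With RDI as register number 7 and xmm2 as register number 2, an operand with scale 4 encodes as sib_encode(4, 2, 7), which is binary 10 010 111, or 0x97.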

For example:

vgatherdpd    %xmm0, 128(%rdi, %xmm2, 4), %xmm3

In the preceding instruction, the base address register is RDI, the index register is xmm2, the scale factor is 4, and the offset is 128. The vgatherdpd instruction treats the elements of the index register as doublewords (4 bytes each), multiplies each one by the scale factor, and adds the result to the base address. The offset is added to each of the resulting element addresses.
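The per-element behavior can be modeled in scalar C. This is a minimal sketch of the 128-bit form only, under stated assumptions: gather_dpd_model is a hypothetical name, the two 64-bit elements are handled as raw bit patterns rather than as double values, and the sketch also reflects two documented properties of the instruction, namely that only the most significant bit of each mask element is examined and that the mask register is cleared on completion.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of the 128-bit form of vgatherdpd (two 64-bit elements).
 * Hypothetical helper; parameter names are illustrative only. */
static void gather_dpd_model(uint64_t dst[2], uint64_t mask[2],
                             const void *base, const int32_t index[2],
                             unsigned scale, int32_t offset)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    for (int i = 0; i < 2; i++) {
        /* Only the most significant bit of each mask element matters. */
        if (mask[i] >> 63) {
            /* Each 32-bit index is sign-extended, scaled, and added to
             * the base address together with the offset. */
            int64_t byte_off = (int64_t)index[i] * scale + offset;
            memcpy(&dst[i], (const unsigned char *)base + byte_off, 8);
        }
        /* Elements whose mask bit is clear keep their old value in dst;
         * the mask itself is cleared once the element is done. */
        mask[i] = 0;
    }
}
```

Calling the model with indices {1, 3}, a scale of 8, and an offset of 8 reads the 8-byte elements at byte offsets 16 and 32 from the base, provided both mask elements have their top bit set.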


The following complete example demonstrates the vgatherdpd instruction.

First, the assembly code:

    // export the symbol so that the C code can link against it
    .globl _InstTest
_InstTest:
    // set each element of the index register
    mov     $4, %eax                    // the first index is 4
    movd    %eax, %xmm2
    mov     $8, %eax                    // the second index is 8
    pinsrd  $1, %eax, %xmm2
    // set the masks of both double elements to all ones
    mov     $0xffffffffffffffff, %rax
    movq    %rax, %xmm0
    punpcklqdq %xmm0, %xmm0
    vgatherdpd %xmm0, 8(%rdi, %xmm2, 2), %xmm3
    ret

Here, the RDI register carries the first function argument, which serves as the base address.

The following is a C function call:

int main(void)
{
    extern void InstTest(void *p);

    unsigned __attribute__((aligned(64))) buffer[] = {
        0x01020304, 0x05060708, 0x090a0b0c, 0x10121314,
        0x15161718, 0x191a1b1c, 0x20212223, 0x24252627
    };

    InstTest(buffer);

    return 0;
}

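We set a breakpoint at the return 0; statement and use the lldb debugger to inspect the xmm3 register, whose final content is:

xmm3 = {0x18 0x17 0x16 0x15 0x1c 0x1b 0x1a 0x19 0x23 0x22 0x21 0x20 0x27 0x26 0x25 0x24}

The two element addresses are buffer + 4 * 2 + 8 = buffer + 16 and buffer + 8 * 2 + 8 = buffer + 24, so xmm3 holds the 16 bytes of buffer[4] through buffer[7] in little-endian byte order.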


The code above was compiled on OS X 10.9.3 with Xcode 5.1 and Apple LLVM 5.1.

Running environment: MacBook Air 2013, Intel Core i7-4650U, 8 GB DDR3.

 
