One of Intel's Big Pitfalls: The Missing SSE2 128-bit Shift Instruction, Vanished Like Malaysia Airlines MH370??

Last Update: 2014-06-26 Source: Internet

Author: User


Background

Recently, out of personal interest, I have been writing some optimized string functions. When I tried to implement a 128-bit logical shift, however, I fell into a big pitfall.

Logical shifts

The obvious candidates are the MMX and SSE shift instructions:

Logical left shift: PSLLW/PSLLD/PSLLQ, Shift Packed Data Left Logical
Logical right shift: PSRLW/PSRLD/PSRLQ, Shift Packed Data Right Logical

As the names suggest, W stands for WORD, D for DWORD and Q for QWORD. PSLLW shifts each 16-bit word left independently, PSLLD shifts each 32-bit doubleword, and PSLLQ shifts each 64-bit quadword. So far, everything looks fine.
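To make the "each element on its own" behaviour concrete, here is a minimal sketch using the SSE2 intrinsics that correspond to these instructions: _mm_slli_epi16 (PSLLW), _mm_slli_epi32 (PSLLD) and _mm_slli_epi64 (PSLLQ). It assumes an x86 compiler with SSE2 enabled (the default on x86-64); the sample data and the helper names are made up for illustration. The same 128 bits are shifted left by 4 at three different element widths, and bits pushed out of the top of an element are simply discarded instead of carrying into the next element.

```c
#include <emmintrin.h> /* SSE2 intrinsics: PSLLW / PSLLD / PSLLQ */
#include <stdint.h>

/* Hypothetical sample: 16-bit words 0x8001, 0x8000, 0x0001, 0x8000, then zeros.
   Read little-endian, the same bytes are the dwords 0x80008001, 0x80000001
   and the low qword 0x8000000180008001. */
static const uint16_t sample[8] = {0x8001, 0x8000, 0x0001, 0x8000, 0, 0, 0, 0};

static __m128i load_sample(void) {
    return _mm_loadu_si128((const __m128i *)sample);
}

/* PSLLW xmm, 4: every 16-bit word is shifted independently. */
static void shift_words_by_4(uint16_t out[8]) {
    _mm_storeu_si128((__m128i *)out, _mm_slli_epi16(load_sample(), 4));
}

/* PSLLD xmm, 4: every 32-bit dword is shifted independently. */
static void shift_dwords_by_4(uint32_t out[4]) {
    _mm_storeu_si128((__m128i *)out, _mm_slli_epi32(load_sample(), 4));
}

/* PSLLQ xmm, 4: every 64-bit qword is shifted independently. */
static void shift_qwords_by_4(uint64_t out[2]) {
    _mm_storeu_si128((__m128i *)out, _mm_slli_epi64(load_sample(), 4));
}
```

With this input, shift_words_by_4 leaves 0x0010, 0x0000, 0x0010, 0x0000 in the first four words: the top bit of 0x8001 is lost rather than moved into the neighbouring word. The qword version keeps that bit, because it only has to stay inside its 64-bit element.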

I'll use the logical left shift as the example here. For details of the logical left shift instructions, see:

http://moeto.comoj.com/project/intel/instruct32_hh/vc256.htm

or http://x86.renejeschke.de/html/file_module_x86_id_259.html

The right shift is similar, so we will not describe it here.

The problem arises.

What we need is a bit-granular logical shift of the whole 128-bit register. SSE2 has the PSLLDQ instruction, where DQ stands for Double QWORD, i.e. 128 bits.

Isn't that exactly the 128-bit shift we need? No! Let's look at Intel's documentation:

PSLLDQ -- Packed Shift Left Logical Double Quadword

http://moeto.comoj.com/project/intel/instruct32_hh/vc255.htm


Unfortunately, SSE2 does not provide a bit-granular 128-bit shift. PSLLDQ shifts the 128-bit value by whole bytes only, so the smallest step is one byte (eight bits), which is pretty unscientific. Given that Intel does not really do full 128-bit data processing (most SSE instructions work on elements of at most 64 bits; a double-precision floating-point number, for example, is 64 bits), fine, we accept that!! But!! Intel, aren't you making a mistake here? PSLLDQ only accepts an imm8 operand. What does imm8 mean? An 8-bit immediate: the shift count has to be hard-coded as a constant in the assembly, and there is no form that takes the count from a register. What the f*ck??

Well, you designed the CPU; there is nothing we can do about it. If PSLLDQ accepted a count in a 32- or 64-bit register, things would be much more convenient: we could first use PSLLDQ to shift by the whole-byte part of the count, and then use PSLLQ to shift the remaining bits (more on why later). With imm8 only, that approach is out!! This imm8 is a real pain... On top of that, PSLLQ on a 128-bit register can only shift 16 bits at a time (what a rip-off), so going down this road means several rounds of if/jump...
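Because PSLLDQ's count is an imm8, a byte count that is only known at run time has to be dispatched to one hard-coded immediate per possible value, which is exactly the kind of if/jump chain complained about above. Below is a minimal sketch of that workaround, assuming an x86 compiler with SSE2: _mm_slli_si128 is the intrinsic for PSLLDQ and, like the instruction, needs a compile-time constant, while slli_bytes_var and shift_bytes_left are hypothetical helper names.

```c
#include <emmintrin.h> /* SSE2: _mm_slli_si128 is the intrinsic for PSLLDQ */
#include <stdint.h>

/* Hypothetical helper: PSLLDQ only takes an imm8, so a run-time byte count
   is mapped onto one of the possible immediates with a switch. */
static __m128i slli_bytes_var(__m128i v, unsigned nbytes) {
    switch (nbytes) {
    case 0:  return v;
    case 1:  return _mm_slli_si128(v, 1);
    case 2:  return _mm_slli_si128(v, 2);
    case 3:  return _mm_slli_si128(v, 3);
    case 4:  return _mm_slli_si128(v, 4);
    case 5:  return _mm_slli_si128(v, 5);
    case 6:  return _mm_slli_si128(v, 6);
    case 7:  return _mm_slli_si128(v, 7);
    case 8:  return _mm_slli_si128(v, 8);
    case 9:  return _mm_slli_si128(v, 9);
    case 10: return _mm_slli_si128(v, 10);
    case 11: return _mm_slli_si128(v, 11);
    case 12: return _mm_slli_si128(v, 12);
    case 13: return _mm_slli_si128(v, 13);
    case 14: return _mm_slli_si128(v, 14);
    case 15: return _mm_slli_si128(v, 15);
    /* PSLLDQ with an immediate above 15 clears the register; do the same. */
    default: return _mm_setzero_si128();
    }
}

/* Hypothetical wrapper: shift 16 bytes in memory toward higher addresses,
   which is "left" for the little-endian x86 byte order. */
static void shift_bytes_left(const uint8_t in[16], uint8_t out[16], unsigned nbytes) {
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, slli_bytes_var(v, nbytes));
}
```

For input bytes 1, 2, ..., 16, shifting by 3 yields three zero bytes followed by 1, 2, ..., 13. This only handles the whole-byte part of a shift; a bit-granular count still needs more work on top.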

Big pitfall begins

Fine, let's settle for the next best thing. Since a bit-granular 128-bit shift isn't available, we can split it into two 64-bit shifts. That costs just one extra branch plus a few operations to merge the two halves; it won't be as fast as a real 128-bit shift, but there is no other way...
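The split described above can be sketched in plain C. This is a minimal sketch under stated assumptions: it works on two ordinary uint64_t halves rather than on XMM registers, and the type u128_halves and the functions shl128/shr128 are hypothetical names. The single branch (count below 64, or 64 and above) is the one extra judgment mentioned above, and the OR merges the bits that cross from one half into the other. A right shift is included because it mirrors the left shift.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value stored as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128_halves;

/* Logical left shift by n (0..127) built from 64-bit shifts and one branch. */
static u128_halves shl128(u128_halves x, unsigned n) {
    u128_halves r;
    assert(n < 128);
    if (n == 0) return x;               /* avoids x.lo >> 64, undefined in C */
    if (n >= 64) {                      /* the low half moves entirely into the high half */
        r.hi = x.lo << (n - 64);
        r.lo = 0;
    } else {
        r.hi = (x.hi << n) | (x.lo >> (64 - n)); /* carry top bits of lo into hi */
        r.lo = x.lo << n;
    }
    return r;
}

/* Logical right shift by n (0..127), the mirror image of shl128. */
static u128_halves shr128(u128_halves x, unsigned n) {
    u128_halves r;
    assert(n < 128);
    if (n == 0) return x;
    if (n >= 64) {                      /* the high half moves entirely into the low half */
        r.lo = x.hi >> (n - 64);
        r.hi = 0;
    } else {
        r.lo = (x.lo >> n) | (x.hi << (64 - n)); /* carry low bits of hi into lo */
        r.hi = x.hi >> n;
    }
    return r;
}
```

For example, shifting {lo = 0x8000000000000001, hi = 0} left by 1 gives {lo = 0x2, hi = 0x1}: the top bit of the low half crosses into the high half, which is exactly what a per-element shift cannot do on its own.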
Okay, let's get started... GO!!! Switching to PSLLQ, I ran PSLLQ xmm0, 32, and also tried a count of 32 held in ecx (PSLLQ cannot take ecx directly as its count, so the value has to go through an XMM register or memory). And then? Why is xmm0 all zeros?? What is going on??

Let's go back to Intel's documentation once more:

The key is the two lines marked in red. When PSLLQ operates on a 64-bit MMX register, the maximum is COUNT = 64 (strictly speaking 63; writing 64 is just a habit of mine).

When PSLLQ operates on a 128-bit XMM register, however, something strange happens: according to the documentation page I was reading, the maximum is COUNT = 16 bits (15, strictly speaking).

If I hadn't gone back to Intel's documentation, and with nothing obviously wrong turning up while debugging, who would ever have guessed that the shift tops out at 15 bits??? Did Intel get its head caught in a door?? Why?? On the MMX registers a shift of up to 63 bits is possible, so why not on the SSE registers? Yes, we know the MMX and SSE registers are separate (the MMX registers are aliased onto the x87 floating-point registers), but you managed a 64-bit shift on the MMX registers, so why can a 128-bit SSE register only be shifted by 15 bits at most?? If you say it was hard to implement, fine, I'll accept that even though I don't see why. But then you went ahead and implemented PSLLDQ, a byte-granular shift of the full 128 bits, so how do you explain that?? As its name implies, PSLLDQ ought to be a bit-granular 128-bit shift; I can put that one down to historical reasons. But there is no excuse for PSLLQ on a 128-bit SSE register shifting at most 15 bits. Is it really that difficult?? Really???? If it is that difficult, how did you manage PSLLDQ's 128-bit byte shift??

Looking for answers

With these questions in mind, I asked Mr. Google about "128-bit shift" and found that plenty of people had run into the same problem, for example:

Looking for sse 128 bit shift operation for non-immediate shift value

What is SSE !@#$% good for? #2: Bit vector operations

In the end, Mr. Google led me to the best answer, on Intel's forum:

Missing instruction in SSE: PSLLDQ with _bit_ shift amount?


Solution


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
