ARM Assembly Instruction Set Summary (1)

Source: Internet
Author: User

ARM cores keep gaining capability while staying power efficient, which is why they are so widely used in mobile devices. In the 3G era, mobile devices carry more and more multimedia features and need better performance; many smartphones can already play high-definition video. This article therefore summarizes the ARM assembly instruction sets so that the relevant instructions can be applied flexibly.

Currently, the main instruction sets are ARMv4, ARMv5E, ARMv6, and NEON.

Each ARM CPU core implements a particular instruction set. Which one a given core uses can be checked in ARM's own documentation (useful information is available on the company homepage). For example, ARM7TDMI, ARM720T, and ARM920T use the ARMv4 ISA; ARM7EJ-S, ARM926EJ-S, and ARM1020E use ARMv5E; ARM1036J-S and ARM1136J-S use the ARMv6 ISA; Cortex-A8 and other cores add NEON. ARM cores are backward compatible with the earlier instruction sets.

Taking ARMv4 as the baseline, this article summarizes the instructions added by the ARMv5E, ARMv6, and NEON ISAs and analyzes the performance improvements they bring.

The ARMv5E extension provides many new instructions.

(1) CLZ Rd, Rm ---- the count-leading-zeros instruction returns the number of zero bits above the most significant 1 bit of Rm (32 if Rm is 0).

This instruction is useful for normalization in speech and audio codecs. Implementing a leading-zero count with the ARMv4 ISA, by contrast, takes several instructions.
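To make the cost concrete, here is a minimal C sketch of a leading-zero count done without CLZ, using a binary search over the bit positions. The names clz32 and norm_shift are hypothetical helpers introduced for illustration, not part of any ARM library; on ARMv5E the whole of clz32 corresponds to one CLZ instruction.

```c
#include <stdint.h>

/* Software model of CLZ: counts the zero bits above the most
   significant 1 bit. Returns 32 for an input of 0, as CLZ does. */
unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0) return 32;
    /* Binary search: test the upper half, shift it out if it is empty. */
    if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
    if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8;  }
    if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4;  }
    if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2;  }
    if ((x & 0x80000000u) == 0) { n += 1; }
    return n;
}

/* Hypothetical normalization helper: for a strictly positive value,
   the left shift that moves the top 1 bit to bit 30 (just below the sign bit). */
unsigned norm_shift(int32_t x)
{
    return clz32((uint32_t)x) - 1;   /* assumes x > 0 */
}
```

Each of the five tests compiles to a compare, a conditional add, and a conditional shift, which is the kind of multi-instruction sequence the text refers to.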

(2) QADD Rd, Rm, Rn ---- Rd = SAT(Rm + Rn)

QDADD Rd, Rm, Rn ---- Rd = SAT(Rm + SAT(Rn << 1))

QSUB Rd, Rm, Rn ---- Rd = SAT(Rm - Rn)

QDSUB Rd, Rm, Rn ---- Rd = SAT(Rm - SAT(Rn << 1))

Fixed-point code performs many saturating operations. Implementing them in C or with the ARMv4 ISA is very inefficient, whereas ARMv5E needs only a single instruction for each.
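As an illustration of what each of these instructions replaces, the following C sketch models the four saturating operations with a 64-bit intermediate. The function names (sat32, model_qadd, and so on) are hypothetical and chosen for this example; the sketch also leaves out the sticky Q flag that the real instructions set when saturation occurs.

```c
#include <stdint.h>

/* Clamp a 64-bit intermediate result to the signed 32-bit range. */
int32_t sat32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* QADD: Rd = SAT(Rm + Rn) */
int32_t model_qadd(int32_t rm, int32_t rn)
{
    return sat32((int64_t)rm + rn);
}

/* QSUB: Rd = SAT(Rm - Rn) */
int32_t model_qsub(int32_t rm, int32_t rn)
{
    return sat32((int64_t)rm - rn);
}

/* QDADD: Rd = SAT(Rm + SAT(2 * Rn)); the doubling saturates first. */
int32_t model_qdadd(int32_t rm, int32_t rn)
{
    return model_qadd(rm, sat32((int64_t)rn * 2));
}

/* QDSUB: Rd = SAT(Rm - SAT(2 * Rn)) */
int32_t model_qdsub(int32_t rm, int32_t rn)
{
    return model_qsub(rm, sat32((int64_t)rn * 2));
}
```

For example, model_qadd(INT32_MAX, 1) stays at INT32_MAX instead of wrapping to a negative value, which is the behavior fixed-point codecs rely on.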

(3) Multiply and multiply-accumulate instructions

ARMv5E is very effective for optimizing audio algorithms, mainly because it adds 16-bit x 16-bit and 32-bit x 16-bit multiply and multiply-accumulate instructions. Much audio source data is 16 bits wide, while some filter coefficients may be 32 bits wide; in these cases the ARMv5E instructions are especially useful.

SMLAxy Rd, Rm, Rs, Rn ---- Rd = (Rm.x * Rs.y) + Rn

SMLALxy RdLo, RdHi, Rm, Rs ---- [RdHi, RdLo] += Rm.x * Rs.y

SMLAWy Rd, Rm, Rs, Rn ---- Rd = ((Rm * Rs.y) >> 16) + Rn

SMULxy Rd, Rm, Rs ---- Rd = Rm.x * Rs.y

SMULWy Rd, Rm, Rs ---- Rd = (Rm * Rs.y) >> 16

In these mnemonics, x and y are each B (bottom halfword, bits 15:0) or T (top halfword, bits 31:16).
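The arithmetic of these instructions can be sketched in C as follows. This is a model for illustration only: the function names and the top/bottom selector arguments are hypothetical, accumulation is assumed not to overflow (the real SMLAxy and SMLAWy set the Q flag in that case), and the 32 x 16 forms assume that >> on a negative signed value is an arithmetic shift, as it is on common compilers.

```c
#include <stdint.h>

/* Select the bottom (top == 0) or top (top != 0) halfword of a
   register and sign-extend it to 32 bits. */
int32_t half16(uint32_t r, int top)
{
    uint32_t h = top ? (r >> 16) : (r & 0xFFFFu);
    return (int32_t)(h ^ 0x8000u) - 0x8000;
}

/* SMULxy: Rd = Rm.x * Rs.y (16 x 16 -> 32, cannot overflow) */
int32_t model_smulxy(uint32_t rm, int x, uint32_t rs, int y)
{
    return half16(rm, x) * half16(rs, y);
}

/* SMLAxy: Rd = (Rm.x * Rs.y) + Rn (no overflow assumed here) */
int32_t model_smlaxy(uint32_t rm, int x, uint32_t rs, int y, int32_t rn)
{
    return model_smulxy(rm, x, rs, y) + rn;
}

/* SMLALxy: 64-bit accumulator [RdHi, RdLo] += Rm.x * Rs.y */
int64_t model_smlalxy(int64_t acc, uint32_t rm, int x, uint32_t rs, int y)
{
    return acc + model_smulxy(rm, x, rs, y);
}

/* SMULWy: Rd = (Rm * Rs.y) >> 16, keeping the top 32 bits of the
   48-bit product (arithmetic right shift assumed). */
int32_t model_smulwy(int32_t rm, uint32_t rs, int y)
{
    return (int32_t)(((int64_t)rm * half16(rs, y)) >> 16);
}

/* SMLAWy: Rd = ((Rm * Rs.y) >> 16) + Rn (no overflow assumed here) */
int32_t model_smlawy(int32_t rm, uint32_t rs, int y, int32_t rn)
{
    return model_smulwy(rm, rs, y) + rn;
}
```

With two 16-bit samples packed into one register, a filter loop can pick either half through the x and y selectors without separate unpacking shifts, which is where much of the speedup on 16-bit audio data comes from.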

 
