ARM Floating-Point Operations in Detail

Source: Internet
Author: User
One: Floating-point emulation on early ARM:

Early ARM cores had no floating-point coprocessor, so floating-point operations were emulated in software (float math emulation) by the CPU. A single floating-point operation could take thousands of cycles to complete, so floating-point code was particularly slow.

Even today, the following options are available when configuring the ARM kernel:

Floating point emulation --->

[ ] NWFPE math emulation

[ ] FASTFPE math emulation (EXPERIMENTAL)

Here you can configure the ARM floating-point emulator.

The floating-point emulator works through the undefined instruction handler: each floating-point instruction encountered raises an undefined-instruction exception, and the handler performs the calculation. The result is very frequent exceptions, which greatly increase interrupt latency and reduce the real-time performance of the system.

Two: Soft floating-point technology:

Soft floating-point support is a feature of the cross toolchain and is independent of the Linux kernel. When floating-point code is compiled with a soft-float toolchain, the compiler replaces each floating-point operation with a call into a floating-point library, so the generated machine code contains no floating-point instructions at all, yet still produces correct floating-point results.

Three: Floating-point coprocessors:

Newer versions of ARM can include a coprocessor, and some ARM CPUs added a floating-point coprocessor to better handle floating-point computation. ARM also defines a set of floating-point instructions; if no actual hardware exists, these instructions are trapped and executed by the floating-point emulator module.

Four: The hardware floating-point coprocessor and its instruction set:

To use a hardware floating-point coprocessor for the floating-point operations in an application, several prerequisites must be met:

1. The kernel supports the hardware coprocessor.

2. The compiler supports translating floating-point operations into hardware floating-point instructions, or the corresponding floating-point instructions are invoked manually wherever a floating-point operation is needed.

1. Kernel support:

If the kernel does not support the floating-point coprocessor, the coprocessor's instructions cannot run, because of issues such as access to the coprocessor's registers.

An expert online pointed out:

The CP15 c1 coprocessor access control register specifies user-mode and privileged-mode access to each coprocessor. To use the VFP from user mode, access to CP10 and CP11 must of course be enabled.
The other register is the VFP's FPEXC: bit 30 is the enable bit for the VFP unit.
In practice the operating system already does both of these things, so user programs can simply use the VFP. Of course, the kernel handles other details besides these two.

Floating point emulation --->
[*] VFP-format floating point maths

Include VFP support code in the kernel. This is needed if your hardware includes a VFP unit.

2. The compiler selects the floating-point instructions:

The compiler can be told explicitly which floating-point instructions to translate floating-point operations into.

If the compiler supports soft floating-point, it may translate floating-point operations into calls to its own floating-point library, with no real floating-point instructions at all.

Otherwise, operations can be translated into FPA (Floating Point Accelerator) instructions; whether FPA instructions work then depends on whether a floating-point emulator is present.

Floating-point operations can also be compiled into VFP (Vector Floating Point) instructions or NEON vector floating-point instructions.

Five: Telling the compiler to emit hardware floating-point instructions:

Time a floating-point operation such as repeated addition:

float src_mem_32[1024] = {1.024};

float dst_mem_32[1024] = {0.933};

for (j = 0; j < 1024; j++)
{
    for (i = 0; i < 1024; i++)
    {
        src_32 = src_mem_32[i] + dst_mem_32[i];
    }
}

The throughput is measured by printing (printf) the difference, in milliseconds, between timestamps taken before and after the loop.

Compile:

arm-hisiv200-linux-gcc -c -Wall fcpu.c -o fcpu.o

arm-hisiv200-linux-gcc fcpu.o -o fcpu -L./

Run it, and you get the time required for the 1024*1024 32-bit floating-point additions.

To use the VFP instead:

arm-hisiv200-linux-gcc -c -Wall -mfpu=vfp -mfloat-abi=softfp fcpu.c -o fcpu.o

arm-hisiv200-linux-gcc -Wall -mfpu=vfp -mfloat-abi=softfp fcpu.o -o fcpu -L./

After running, the time required drops by almost half: the VFP is clearly effective.

For an explanation of -mfpu and -mfloat-abi, see Appendix 2.

In addition, how can you check directly whether the VFP is actually being used?

You can reach a conclusion by looking at the compiled assembly:

# arm-hisiv200-linux-objdump -d fcpu.o

00000000 <test_f32bit_addition>:
   0: e52db004  push {fp}          ; (str fp, [sp, #-4]!)
   4: e28db000  add fp, sp, #0
   8: e24dd00c  sub sp, sp, #12
   c: e3a03000  mov r3, #0
  10: e50b300c  str r3, [fp, #-12]
  14: e3a03000  mov r3, #0
  18: e50b3008  str r3, [fp, #-8]
  1c: e3a03000  mov r3, #0
  20: e50b3008  str r3, [fp, #-8]
  24: ea000017  b 88 <test_f32bit_addition+0x88>
  28: e3a03000  mov r3, #0
  2c: e50b300c  str r3, [fp, #-12]
  30: ea00000d  b 6c <test_f32bit_addition+0x6c>
  34: e51b200c  ldr r2, [fp, #-12]
  38: e59f3064  ldr r3, [pc, #100] ; a4 <test_f32bit_addition+0xa4>
  3c: e0831102  add r1, r3, r2, lsl #2
  40: ed917a00  vldr s14, [r1]
  44: e51b200c  ldr r2, [fp, #-12]
  48: e59f3058  ldr r3, [pc, #88]  ; a8 <test_f32bit_addition+0xa8>
  4c: e0831102  add r1, r3, r2, lsl #2
  50: edd17a00  vldr s15, [r1]
  54: ee777a27  vadd.f32 s15, s14, s15
  58: e59f304c  ldr r3, [pc, #76]  ; ac <test_f32bit_addition+0xac>
  5c: edc37a00  vstr s15, [r3]
  60: e51b300c  ldr r3, [fp, #-12]
  64: e2833001  add r3, r3, #1
  68: e50b300c  str r3, [fp, #-12]
  6c: e51b200c  ldr r2, [fp, #-12]
  70: e59f3038  ldr r3, [pc, #56]  ; b0 <test_f32bit_addition+0xb0>
  74: e1520003  cmp r2, r3
  78: daffffed  ble 34 <test_f32bit_addition+0x34>
  7c: e51b3008  ldr r3, [fp, #-8]
  80: e2833001  add r3, r3, #1
  84: e50b3008  str r3, [fp, #-8]
  88: e51b2008  ldr r2, [fp, #-8]
  8c: e59f301c  ldr r3, [pc, #28]  ; b0 <test_f32bit_addition+0xb0>
  90: e1520003  cmp r2, r3
  94: daffffe3  ble 28 <test_f32bit_addition+0x28>
  98: e28bd000  add sp, fp, #0
  9c: e49db004  pop {fp}           ; (ldr fp, [sp], #4)
  a0: e12fff1e  bx lr

This clearly contains VFP instructions (vldr, vadd.f32, vstr). Therefore, to use VFP instructions:

arm-hisiv200-linux-gcc -c -Wall -mfpu=vfp -mfloat-abi=softfp fcpu.c -o fcpu.o

Note: the VFP instruction set is covered in Appendix 1.

If instead you use:

arm-hisiv200-linux-gcc -c -Wall fcpu.c -o fcpu.o

# arm-hisiv200-linux-objdump -d fcpu.o

00000000 <test_f32bit_addition>:
   0: e92d4800  push {fp, lr}
   4: e28db004  add fp, sp, #4
   8: e24dd008  sub sp, sp, #8
   c: e3a03000  mov r3, #0
  10: e50b300c  str r3, [fp, #-12]
  14: e3a03000  mov r3, #0
  18: e50b3008  str r3, [fp, #-8]
  1c: e3a03000  mov r3, #0
  20: e50b3008  str r3, [fp, #-8]
  24: ea000019  b 90 <test_f32bit_addition+0x90>
  28: e3a03000  mov r3, #0
  2c: e50b300c  str r3, [fp, #-12]
  30: ea00000f  b 74 <test_f32bit_addition+0x74>
  34: e51b200c  ldr r2, [fp, #-12]
  38: e59f3068  ldr r3, [pc, #104] ; a8 <test_f32bit_addition+0xa8>
  3c: e7932102  ldr r2, [r3, r2, lsl #2]
  40: e51b100c  ldr r1, [fp, #-12]
  44: e59f3060  ldr r3, [pc, #96]  ; ac <test_f32bit_addition+0xac>
  48: e7933101  ldr r3, [r3, r1, lsl #2]
  4c: e1a00002  mov r0, r2
  50: e1a01003  mov r1, r3
  54: ebfffffe  bl 0 <__aeabi_fadd>
  58: e1a03000  mov r3, r0
  5c: e1a02003  mov r2, r3
  60: e59f3048  ldr r3, [pc, #72]  ; b0 <test_f32bit_addition+0xb0>
  64: e5832000  str r2, [r3]
  68: e51b300c  ldr r3, [fp, #-12]
  6c: e2833001  add r3, r3, #1
  70: e50b300c  str r3, [fp, #-12]
  74: e51b200c  ldr r2, [fp, #-12]
  78: e59f3034  ldr r3, [pc, #52]  ; b4 <test_f32bit_addition+0xb4>
  7c: e1520003  cmp r2, r3
  80: daffffeb  ble 34 <test_f32bit_addition+0x34>
  84: e51b3008  ldr r3, [fp, #-8]
  88: e2833001  add r3, r3, #1
  8c: e50b3008  str r3, [fp, #-8]
  90: e51b2008  ldr r2, [fp, #-8]
  94: e59f3018  ldr r3, [pc, #24]  ; b4 <test_f32bit_addition+0xb4>
  98: e1520003  cmp r2, r3
  9c: daffffe1  ble 28 <test_f32bit_addition+0x28>
  a0: e24bd004  sub sp, fp, #4
  a4: e8bd8800  pop {fp, pc}

No VFP instructions appear; instead the code calls __aeabi_fadd.

Appendix 1: VFP instructions

See ARM's RealView documentation:

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204ic/Bcffbdga.html

Appendix 2:

-mfpu=name
-mfpe=number
-mfp=number

This specifies what floating-point hardware (or hardware emulation) is available on the target. Permissible names are: fpa, fpe2, fpe3, maverick, vfp. -mfp and -mfpe are synonyms for -mfpu=fpe<number>, for compatibility with older versions of GCC.

-mfloat-abi=name
Specifies which ABI to use for floating-point values. Permissible values are: soft, softfp and hard.

soft and hard are equivalent to -msoft-float and -mhard-float respectively. softfp allows the generation of floating-point instructions, but still uses the soft-float calling conventions.
