A Math Library Based on NEON Instructions

This is an open-source library that runs only on Cortex-A processors with NEON support. According to the project description, GCC's NEON support is poor (presumably code built from NEON intrinsics is less efficient than hand-written assembly), so the core computation code is written in inline assembly. If you want to build and test it, you can download the makefile written by the author (address: http://gitorious.org/vjaquez-misc/math-neon/commit/14ba470caad37c33cf7245be69efc9a1366d8f99?format=patch).
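For comparison with the inline-assembly approach described above, here is a minimal sketch of a four-element dot product (the operation benchmarked as dot4 in the results below) written with GCC's NEON intrinsics from `<arm_neon.h>`, falling back to plain C when the compiler is not targeting NEON. It is not code from the library, which uses inline assembly instead; the function name `dot4` is hypothetical.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not part of math-neon):
   returns a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3]. */
static float dot4(const float a[4], const float b[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t va = vld1q_f32(a);        /* load four floats from a */
    float32x4_t vb = vld1q_f32(b);        /* load four floats from b */
    float32x4_t prod = vmulq_f32(va, vb); /* four lane-wise products */
    /* Fold four lanes down to two, then pairwise-add the two lanes. */
    float32x2_t sum = vadd_f32(vget_low_f32(prod), vget_high_f32(prod));
    sum = vpadd_f32(sum, sum);
    return vget_lane_f32(sum, 0);
#else
    /* Portable scalar fallback used when NEON is not available. */
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
#endif
}
```

With intrinsics, register allocation and instruction scheduling are left to the compiler, which is the part the library's authors chose to control by hand in assembly.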

I want to use it on WinCE (a Cortex-A8 platform). Because the code uses a lot of inline assembly, porting it to WinCE would mean either rewriting the assembly or switching to the intrinsic functions of the WEC7 compiler (see http://blog.csdn.net/alien75/article/details/8740641), and both take a lot of work. I therefore turned to mingw32ce, a toolchain I had not used for a long time (see http://blog.csdn.net/alien75/article/details/6998223). Since all I need is a static library in PE format, which mingw32ce can build, it fully meets the requirement: only the makefile has to be modified for a normal build.

The content of the original makefile is as follows:

 

CFLAGS := -O2 -ggdb -mcpu=cortex-a9 -mfloat-abi=softfp -mfpu=neon -ansi -std=gnu99 -pedantic
WARNINGS := -Wall -Wextra -Wno-unused-parameter -Wmissing-prototypes
ASSEMBLER := -Wa,-mimplicit-it=thumb
override CFLAGS += $(WARNINGS) $(ASSEMBLER)
LIBS := -lm

all: math_debug

libmathneon.a: math_acosf.o math_ldexpf.o math_powf.o math_sqrtfv.o \
    math_asinf.o math_expf.o math_log10f.o math_runfast.o math_tanf.o \
    math_atan2f.o math_fabsf.o math_logf.o math_sincosf.o math_tanhf.o \
    math_atanf.o math_floorf.o math_mat2.o math_sinf.o math_vec2.o \
    math_ceilf.o math_fmodf.o math_mat3.o math_sinfv.o math_vec3.o \
    math_cosf.o math_frexpf.o math_mat4.o math_sinhf.o math_vec4.o \
    math_coshf.o math_invsqrtf.o math_modf.o math_sqrtf.o

math_debug: math_debug.o libmathneon.a
	$(CC) $(LDFLAGS) -o $@ $^ $(LIBS)

%.o:: %.c
	$(CC) $(CFLAGS) -o $@ -c $<

%.a::
	$(AR) rcs $@ $^

clean:
	$(RM) -v math_debug *.o *.a

The modified makefile is as follows:

CC=arm-mingw32ce-gcc
AR=arm-mingw32ce-ar rc
CFLAGS := -O2 -ggdb -mcpu=cortex-a8 -mfloat-abi=softfp -mfpu=neon -ansi -std=gnu99 -pedantic -DNO_ERRNO_H -D_WIN32_WCE
LDFLAGS := -L.
WARNINGS := -Wall -Wextra -Wno-unused-parameter -Wmissing-prototypes
ASSEMBLER := -Wa,-mimplicit-it=thumb
override CFLAGS += $(WARNINGS) $(ASSEMBLER)
#LIBS := -lm

all: math_debug

libmathneon.a: math_acosf.o math_ldexpf.o math_powf.o math_sqrtfv.o \
    math_asinf.o math_expf.o math_log10f.o math_runfast.o math_tanf.o \
    math_atan2f.o math_fabsf.o math_logf.o math_sincosf.o math_tanhf.o \
    math_atanf.o math_floorf.o math_mat2.o math_sinf.o math_vec2.o \
    math_ceilf.o math_fmodf.o math_mat3.o math_sinfv.o math_vec3.o \
    math_cosf.o math_frexpf.o math_mat4.o math_sinhf.o math_vec4.o \
    math_coshf.o math_invsqrtf.o math_modf.o math_sqrtf.o

math_debug: math_debug.o libmathneon.a
	$(CC) $(LDFLAGS) -o $@ $^ $(LIBS)

%.o:: %.c
	$(CC) $(CFLAGS) -o $@ -c $<

%.a::
	$(AR) $@ $^

clean:
	$(RM) -v math_debug *.o *.a
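The two extra -D flags in the modified makefile only take effect through preprocessor checks in the sources. The sketch below shows the kind of guard such flags typically drive; it is illustrative only, since the exact checks inside math-neon are not shown here. The functions `build_target` and `report_domain_error` are hypothetical, and the assumption is that NO_ERRNO_H is needed because the Windows CE headers do not provide `<errno.h>`.

```c
/* Hypothetical guards; the library's actual use of these macros may differ. */
#ifndef NO_ERRNO_H
#include <errno.h>  /* skipped when building with -DNO_ERRNO_H */
#endif

/* Reports which platform branch the -D flags selected at compile time. */
static const char *build_target(void)
{
#ifdef _WIN32_WCE
    return "wince";
#else
    return "generic";
#endif
}

/* Flags a domain error through errno where <errno.h> is available;
   compiles to an empty function when NO_ERRNO_H is defined. */
static void report_domain_error(void)
{
#ifndef NO_ERRNO_H
    errno = EDOM;
#endif
}
```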

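Test results. In the matrix tests, rate is the C time divided by the NEON time. In the cmath tests, each function is run in three versions: the system library function (no suffix), the optimized C function (_c) and the NEON assembly function (_neon); the Rate column is the system function's time divided by that version's time, so a higher rate means faster code: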

RUNFAST: Enabled
------------------------------------------------------------------------------------------------------
MATRIX FUNCTION TESTS
------------------------------------------------------------------------------------------------------
matmul2_c =
|2.66, -2.73|
|-5.74, -15.83|
matmul2_neon =
|2.66, -2.73|
|-5.74, -15.83|
matmul2: c=112000  neon=65000  rate=1.72

matvec2_c = |2.66, -5.74|
matvec2_neon = |2.66, -5.74|
matvec2: c=66000  neon=53000  rate=1.25

matmul3_c =
|-17.73, -8.39, -1.10|
|8.30, -5.32, 23.03|
|-5.67, -7.81, 9.07|
matmul3_neon =
|-17.73, -8.39, -1.10|
|8.30, -5.32, 23.03|
|-5.67, -7.81, 9.07|
matmul3: c=394000  neon=120000  rate=3.28

matvec3_c = |-17.73, 8.30, -5.67|
matvec3_neon = |-17.73, 8.30, -5.67|
matvec3: c=66000  neon=53000  rate=1.25

matmul4_c =
|-8.86, 8.70, -17.78, -7.64|
|-13.15, 20.92, -10.97, -14.02|
|17.37, -14.46, -13.16, 33.82|
|15.42, -27.32, -5.66, -6.37|
matmul4_neon =
|-8.86, 8.70, -17.78, -7.64|
|-13.15, 20.92, -10.97, -14.02|
|17.37, -14.46, -13.16, 33.82|
|15.42, -27.32, -5.66, -6.37|
matmul4: c=991000  neon=141000  rate=7.03

matvec4_c = |-8.86, -13.15, 17.37, 15.418112|
matvec4_neon = |-8.86, -13.15, 17.37, 15.418112|
matvec4: c=66000  neon=53000  rate=1.25

dot2_c = 3.756326
dot2_neon = 3.756326
dot2: c=532000  neon=497000  rate=1.07

normalize2_c = [-0.74, -0.68]
normalize2_neon = [-0.74, -0.68]
normalize2: c=691000  neon=313000  rate=2.21

dot3_c = 3.698457
dot3_neon = 3.698457
dot3: c=572000  neon=514000  rate=1.11

normalize3_c = [-0.74, -0.68, -0.01]
normalize3_neon = [-0.74, -0.68, -0.01]
normalize3: c=806000  neon=353000  rate=2.28

cross3_c = [-4.69, 5.12, -1.46]
cross3_neon = [-4.69, 5.12, -1.46]
cross3: c=586000  neon=373000  rate=1.57

dot4_c = -4.564567
dot4_neon = -4.564566
dot4: c=625000  neon=487000  rate=1.28

normalize4_c = [-0.24, -0.22, -0.00, 0.95]
normalize4_neon = [-0.24, -0.22, -0.00, 0.95]
normalize4: c=924000  neon=343000  rate=2.69
------------------------------------------------------------------------------------------------------
CMATH FUNCTION TESTS
------------------------------------------------------------------------------------------------------
Function       Range            Number   ABS Max Error  REL Max Error  RMS Error  Time     Rate
------------------------------------------------------------------------------------------------------
sinf           [-3.14, 3.14]    500000   0.00e+000      0.00e+000%     0.00e+000  880000   x1.00
sinf_c         [-3.14, 3.14]    500000   8.34e-007      1.00e+002%     4.09e-007  162000   x5.43
sinf_neon      [-3.14, 3.14]    500000   8.34e-007      1.00e+002%     4.09e-007  96000    x9.17
cosf           [-3.14, 3.14]    500000   0.00e+000      0.00e+000%     0.00e+000  906000   x1.00
cosf_c         [-3.14, 3.14]    500000   8.34e-007      6.74e-001%     4.16e-007  192000   x4.72
cosf_neon      [-3.14, 3.14]    500000   1.41e+000      6.64e+007%     1.00e+000  142000   x6.38
tanf           [-0.79, 0.79]    500000   0.00e+000      0.00e+000%     0.00e+000  1140000  x1.00
tanf_c         [-0.79, 0.79]    500000   2.98e-006      7.97e-004%     1.31e-006  200000   x5.70
tanf_neon      [-0.79, 0.79]    500000   1.91e-006      3.62e-004%     6.66e-007  126000   x9.05
asinf          [-1.00, 1.00]    500000   0.00e+000      0.00e+000%     0.00e+000  2732000  x1.00
asinf_c        [-1.00, 1.00]    500000   5.53e-005      1.06e-002%     1.69e-005  277000   x9.86
asinf_neon     [-1.00, 1.00]    500000   4.65e-005      8.87e-003%     1.#Re+000  151000   x18.09
acosf          [-1.00, 1.00]    500000   0.00e+000      0.00e+000%     0.00e+000  2670000  x1.00
acosf_c        [-1.00, 1.00]    500000   5.56e-005      6.46e-003%     1.69e-005  312000   x8.56
acosf_neon     [-1.00, 1.00]    500000   4.67e-005      6.35e-003%     1.#Re+000  171000   x15.61
atanf          [-1.00, 1.00]    500000   0.00e+000      0.00e+000%     0.00e+000  1021000  x1.00
atanf_c        [-1.00, 1.00]    500000   1.67e-004      2.12e-002%     7.40e-005  198000   x5.16
atanf_neon     [-1.00, 1.00]    500000   1.67e-004      2.12e-002%     7.40e-005  121000   x8.44
sinhf          [-3.14, 3.14]    500000   0.00e+000      0.00e+000%     0.00e+000  1509000  x1.00
sinhf_c        [-3.14, 3.14]    500000   1.91e-006      1.52e-001%     2.37e-007  280000   x5.39
sinhf_neon     [-3.14, 3.14]    500000   1.91e-006      1.52e-001%     1.90e-007  108000   x13.97
coshf          [-3.14, 3.14]    500000   0.00e+000      0.00e+000%     0.00e+000  1163000  x1.00
coshf_c        [-3.14, 3.14]    500000   1.91e-006      2.37e-005%     2.28e-007  283000   x4.11
coshf_neon     [-3.14, 3.14]    500000   1.91e-006      2.22e-005%     1.68e-007  108000   x10.77
tanhf          [-3.14, 3.14]    500000   0.00e+000      0.00e+000%     0.00e+000  1555000  x1.00
tanhf_c        [-3.14, 3.14]    500000   1.21e-005      2.48e-001%     5.48e-006  235000   x6.62
tanhf_neon     [-3.14, 3.14]    500000   2.38e-007      2.47e-001%     5.40e-008  119000   x13.07
expf           [0.00, 10.00]    500000   0.00e+000      0.00e+000%     0.00e+000  960000   x1.00
expf_c         [0.00, 10.00]    500000   9.77e-003      6.58e-005%     1.64e-003  132000   x7.27
expf_neon      [0.00, 10.00]    500000   9.77e-003      6.58e-005%     1.64e-003  88000    x10.91
logf           [1.00, 1000.00]  500000   0.00e+000      0.00e+000%     0.00e+000  1027000  x1.00
logf_c         [1.00, 1000.00]  500000   7.63e-006      1.03e-002%     1.07e-006  116000   x8.85
logf_neon      [1.00, 1000.00]  500000   7.63e-006      1.03e-002%     1.07e-006  82000    x12.52
log10f         [1.00, 1000.00]  500000   0.00e+000      0.00e+000%     0.00e+000  1202000  x1.00
log10f_c       [1.00, 1000.00]  500000   3.34e-006      6.68e-003%     4.84e-007  116000   x10.36
log10f_neon    [1.00, 1000.00]  500000   3.34e-006      6.68e-003%     4.84e-007  81000    x14.84
floorf         [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000  4705000  x1.00
floorf_c       [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000  819000   x5.74
floorf_neon    [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000  671000   x7.01
ceilf          [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000  5734000  x1.00
ceilf_c        [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000  814000   x7.04
ceilf_neon     [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000  696000   x8.24
fabsf          [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000  2005000  x1.00
fabsf_c        [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000  455000   x4.41
fabsf_neon     [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000  446000   x4.50
sqrtf          [1.00, 1000.00]  500000   0.00e+000      0.00e+000%     0.00e+000  3222000  x1.00
sqrtf_c        [1.00, 1000.00]  500000   2.33e-004      1.06e-003%     8.69e-005  139000   x23.18
sqrtf_neon     [1.00, 1000.00]  500000   7.63e-006      2.91e-005%     1.60e-006  85000    x37.91
invsqrtf       [1.00, 1000.00]  500000   0.00e+000      0.00e+000%     0.00e+000  106000   x1.00
invsqrtf_c     [1.00, 1000.00]  500000   4.35e-006      4.78e-004%     2.00e-007  94000    x1.13
invsqrtf_neon  [1.00, 1000.00]  500000   1.19e-007      2.12e-005%     4.81e-009  70000    x1.51
atan2f         [0.10, 10.00]    10000    0.00e+000      0.00e+000%     0.00e+000  2388000  x1.00
atan2f_c       [0.10, 10.00]    10000    1.73e-004      2.23e-002%     0.00e+000  657000   x3.63
atan2f_neon    [0.10, 10.00]    10000    1.67e-004      2.12e-002%     0.00e+000  278000   x8.59
powf           [1.00, 10.00]    10000    0.00e+000      0.00e+000%     0.00e+000  8316000  x1.00
powf_c         [1.00, 10.00]    10000    1.36e+005      5.88e-003%     0.00e+000  493000   x16.87
powf_neon      [1.00, 10.00]    10000    1.36e+005      5.88e-003%     0.00e+000  292000   x28.48
fmodf          [1.00, 10.00]    10000    0.00e+000      0.00e+000%     0.00e+000  1394000  x1.00
fmodf_c        [1.00, 10.00]    10000    9.99e+000      8.06e-002%     0.00e+000  341000   x4.09
fmodf_neon     [1.00, 10.00]    10000    9.97e+000      8.06e-002%     0.00e+000  238000   x5.86

 

 
