A Mathematical Library Based on NEON Instructions

Last Update: 2018-12-05 Source: Internet

Author: User


This is an open-source library. It runs only on the Cortex-A architecture, because it depends on NEON instructions. According to the project introduction, GCC's NEON support is weak (presumably the code GCC generates from NEON intrinsics is less efficient than hand-written assembly), so the core computation code is written in inline assembly. If you want to compile and test it, you can download the makefile written by the author (address: http://gitorious.org/vjaquez-misc/math-neon/commit/14ba470caad37c33cf7245be69efc9a1366d8f99?format=patch).
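For comparison with the inline-assembly route the project chose, the same kind of NEON computation can also be written with GCC's <arm_neon.h> intrinsics. The sketch below computes a 4-element dot product that way. The name dot4_intrinsics is hypothetical (it is not the library's dot4_neon), and on targets without NEON the function falls back to plain C so the file compiles anywhere.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical example (not part of math-neon): dot product of two
 * 4-float vectors. On a NEON-enabled ARM build (e.g. -mfpu=neon) it uses
 * GCC intrinsics; on any other target it uses scalar C. */
float dot4_intrinsics(const float a[4], const float b[4])
{
    assert(a != NULL && b != NULL);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t va = vld1q_f32(a);         /* load a[0..3] into one Q register */
    float32x4_t vb = vld1q_f32(b);         /* load b[0..3] */
    float32x4_t prod = vmulq_f32(va, vb);  /* four multiplies in one instruction */
    /* horizontal sum: add the low and high halves, then add the pair */
    float32x2_t sum = vadd_f32(vget_low_f32(prod), vget_high_f32(prod));
    sum = vpadd_f32(sum, sum);
    return vget_lane_f32(sum, 0);
#else
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
#endif
}
```

Whether such intrinsic code runs as fast as hand-written assembly depends on the GCC version; the project's remark applies to the compiler it was written against.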

I want to use this library on WinCE (on a Cortex-A8 platform). Because the code uses a lot of inline assembly, porting it to WinCE would require either rewriting the assembly or switching to the intrinsic functions of the WEC7 compiler (see http://blog.csdn.net/alien75/article/details/8740641), and both options involve a lot of work. I therefore considered mingw32ce, a toolchain I had not used for a long time (see http://blog.csdn.net/alien75/article/details/6998223). All I need is a static library in PE format, which mingw32ce can build, so only the makefile has to be modified for the code to compile normally.

The content of the original makefile is as follows:

CFLAGS := -O2 -ggdb -mcpu=cortex-a9 -mfloat-abi=softfp -mfpu=neon -ansi -std=gnu99 -pedantic
WARNINGS := -Wall -Wextra -Wno-unused-parameter -Wmissing-prototypes
ASSEMBLER := -Wa,-mimplicit-it=thumb
override CFLAGS += $(WARNINGS) $(ASSEMBLER)
LIBS := -lm

all: math_debug

libmathneon.a: math_acosf.o math_ldexpf.o math_powf.o math_sqrtfv.o \
    math_asinf.o math_expf.o math_log10f.o math_runfast.o math_tanf.o \
    math_atan2f.o math_fabsf.o math_logf.o math_sincosf.o math_tanhf.o \
    math_atanf.o math_floorf.o math_mat2.o math_sinf.o math_vec2.o \
    math_ceilf.o math_fmodf.o math_mat3.o math_sinfv.o math_vec3.o \
    math_cosf.o math_frexpf.o math_mat4.o math_sinhf.o math_vec4.o \
    math_coshf.o math_invsqrtf.o math_modf.o math_sqrtf.o

math_debug: math_debug.o libmathneon.a
	$(CC) $(LDFLAGS) -o $@ $^ $(LIBS)

%.o:: %.c
	$(CC) $(CFLAGS) -o $@ -c $<

%.a::
	$(AR) rcs $@ $^

clean:
	$(RM) -v math_debug *.o *.a

The modified makefile is as follows:

CC=arm-mingw32ce-gcc
AR=arm-mingw32ce-ar rc
CFLAGS := -O2 -ggdb -mcpu=cortex-a8 -mfloat-abi=softfp -mfpu=neon -ansi -std=gnu99 -pedantic -DNO_ERRNO_H -D_WIN32_WCE
LDFLAGS := -L.
WARNINGS := -Wall -Wextra -Wno-unused-parameter -Wmissing-prototypes
ASSEMBLER := -Wa,-mimplicit-it=thumb
override CFLAGS += $(WARNINGS) $(ASSEMBLER)
#LIBS := -lm

all: math_debug

libmathneon.a: math_acosf.o math_ldexpf.o math_powf.o math_sqrtfv.o \
    math_asinf.o math_expf.o math_log10f.o math_runfast.o math_tanf.o \
    math_atan2f.o math_fabsf.o math_logf.o math_sincosf.o math_tanhf.o \
    math_atanf.o math_floorf.o math_mat2.o math_sinf.o math_vec2.o \
    math_ceilf.o math_fmodf.o math_mat3.o math_sinfv.o math_vec3.o \
    math_cosf.o math_frexpf.o math_mat4.o math_sinhf.o math_vec4.o \
    math_coshf.o math_invsqrtf.o math_modf.o math_sqrtf.o

math_debug: math_debug.o libmathneon.a
	$(CC) $(LDFLAGS) -o $@ $^ $(LIBS)

%.o:: %.c
	$(CC) $(CFLAGS) -o $@ -c $<

%.a::
	$(AR) $@ $^

clean:
	$(RM) -v math_debug *.o *.a
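The modified makefile switches CC and AR to the mingw32ce cross tools, targets Cortex-A8, and adds -DNO_ERRNO_H -D_WIN32_WCE. The library sources are not shown here, so how they consume these macros is an assumption. The sketch below only illustrates the usual pattern, in which a -D flag compiles out a header that the WinCE runtime does not provide. REPORT_POLE_ERROR and checked_recipf are hypothetical names.

```c
#include <assert.h>
#include <math.h>   /* HUGE_VALF and signbit are macros: no libm needed */

/* ASSUMPTION: NO_ERRNO_H means "the target has no <errno.h>", as on the
 * WinCE build configured by the modified makefile. */
#ifndef NO_ERRNO_H
#include <errno.h>
/* report like the standard math functions do for a pole error */
#define REPORT_POLE_ERROR() (errno = ERANGE)
#else
/* no errno on this platform: the error is silently dropped */
#define REPORT_POLE_ERROR() ((void)0)
#endif

/* Hypothetical helper: 1/x that flags division by zero via errno when it can. */
float checked_recipf(float x)
{
    if (x == 0.0f) {
        REPORT_POLE_ERROR();
        return signbit(x) ? -HUGE_VALF : HUGE_VALF;
    }
    return 1.0f / x;
}
```

Building the same file with -DNO_ERRNO_H, as the WinCE makefile does, removes the errno write and leaves the return value as the only error signal.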

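Test results. The matrix tests compare the optimized C implementation (suffix _c) with the NEON assembly implementation (suffix _neon), and rate is the C time divided by the NEON time. The CMATH tests also include the system library function (no suffix); there the Rate column is the system function's time divided by each version's time, so the system function itself is always x1.00.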

RUNFAST: Enabled
------------------------------------------------------------------------------------------------------
MATRIX FUNCTION TESTS
------------------------------------------------------------------------------------------------------
matmul2_c =
|2.66, -2.73|
|-5.74, -15.83|
matmul2_neon =
|2.66, -2.73|
|-5.74, -15.83|
matmul2: c=112000  neon=65000  rate=1.72

matvec2_c = |2.66, -5.74|
matvec2_neon = |2.66, -5.74|
matvec2: c=66000  neon=53000  rate=1.25

matmul3_c =
|-17.73, -8.39, -1.10|
|8.30, -5.32, 23.03|
|-5.67, -7.81, 9.07|
matmul3_neon =
|-17.73, -8.39, -1.10|
|8.30, -5.32, 23.03|
|-5.67, -7.81, 9.07|
matmul3: c=394000  neon=120000  rate=3.28

matvec3_c = |-17.73, 8.30, -5.67|
matvec3_neon = |-17.73, 8.30, -5.67|
matvec3: c=66000  neon=53000  rate=1.25

matmul4_c =
|-8.86, 8.70, -17.78, -7.64|
|-13.15, 20.92, -10.97, -14.02|
|17.37, -14.46, -13.16, 33.82|
|15.42, -27.32, -5.66, -6.37|
matmul4_neon =
|-8.86, 8.70, -17.78, -7.64|
|-13.15, 20.92, -10.97, -14.02|
|17.37, -14.46, -13.16, 33.82|
|15.42, -27.32, -5.66, -6.37|
matmul4: c=991000  neon=141000  rate=7.03

matvec4_c = |-8.86, -13.15, 17.37, 15.418112|
matvec4_neon = |-8.86, -13.15, 17.37, 15.418112|
matvec4: c=66000  neon=53000  rate=1.25

dot2_c = 3.756326
dot2_neon = 3.756326
dot2: c=532000  neon=497000  rate=1.07

normalize2_c = [-0.74, -0.68]
normalize2_neon = [-0.74, -0.68]
normalize2: c=691000  neon=313000  rate=2.21

dot3_c = 3.698457
dot3_neon = 3.698457
dot3: c=572000  neon=514000  rate=1.11

normalize3_c = [-0.74, -0.68, -0.01]
normalize3_neon = [-0.74, -0.68, -0.01]
normalize3: c=806000  neon=353000  rate=2.28

cross3_c = [-4.69, 5.12, -1.46]
cross3_neon = [-4.69, 5.12, -1.46]
cross3: c=586000  neon=373000  rate=1.57

dot4_c = -4.564567
dot4_neon = -4.564566
dot4: c=625000  neon=487000  rate=1.28

normalize4_c = [-0.24, -0.22, -0.00, 0.95]
normalize4_neon = [-0.24, -0.22, -0.00, 0.95]
normalize4: c=924000  neon=343000  rate=2.69
------------------------------------------------------------------------------------------------------
CMATH FUNCTION TESTS
------------------------------------------------------------------------------------------------------
Function       Range            Number   ABS Max Error  REL Max Error  RMS Error   Time     Rate
------------------------------------------------------------------------------------------------------
sinf           [-3.14, 3.14]    500000   0.00e+000      0.00e+000%     0.00e+000   880000   x1.00
sinf_c         [-3.14, 3.14]    500000   8.34e-007      1.00e+002%     4.09e-007   162000   x5.43
sinf_neon      [-3.14, 3.14]    500000   8.34e-007      1.00e+002%     4.09e-007   96000    x9.17
cosf           [-3.14, 3.14]    500000   0.00e+000      0.00e+000%     0.00e+000   906000   x1.00
cosf_c         [-3.14, 3.14]    500000   8.34e-007      6.74e-001%     4.16e-007   192000   x4.72
cosf_neon      [-3.14, 3.14]    500000   1.41e+000      6.64e+007%     1.00e+000   142000   x6.38
tanf           [-0.79, 0.79]    500000   0.00e+000      0.00e+000%     0.00e+000   1140000  x1.00
tanf_c         [-0.79, 0.79]    500000   2.98e-006      7.97e-004%     1.31e-006   200000   x5.70
tanf_neon      [-0.79, 0.79]    500000   1.91e-006      3.62e-004%     6.66e-007   126000   x9.05
asinf          [-1.00, 1.00]    500000   0.00e+000      0.00e+000%     0.00e+000   2732000  x1.00
asinf_c        [-1.00, 1.00]    500000   5.53e-005      1.06e-002%     1.69e-005   277000   x9.86
asinf_neon     [-1.00, 1.00]    500000   4.65e-005      8.87e-003%     1.#Re+000   151000   x18.09
acosf          [-1.00, 1.00]    500000   0.00e+000      0.00e+000%     0.00e+000   2670000  x1.00
acosf_c        [-1.00, 1.00]    500000   5.56e-005      6.46e-003%     1.69e-005   312000   x8.56
acosf_neon     [-1.00, 1.00]    500000   4.67e-005      6.35e-003%     1.#Re+000   171000   x15.61
atanf          [-1.00, 1.00]    500000   0.00e+000      0.00e+000%     0.00e+000   1021000  x1.00
atanf_c        [-1.00, 1.00]    500000   1.67e-004      2.12e-002%     7.40e-005   198000   x5.16
atanf_neon     [-1.00, 1.00]    500000   1.67e-004      2.12e-002%     7.40e-005   121000   x8.44
sinhf          [-3.14, 3.14]    500000   0.00e+000      0.00e+000%     0.00e+000   1509000  x1.00
sinhf_c        [-3.14, 3.14]    500000   1.91e-006      1.52e-001%     2.37e-007   280000   x5.39
sinhf_neon     [-3.14, 3.14]    500000   1.91e-006      1.52e-001%     1.90e-007   108000   x13.97
coshf          [-3.14, 3.14]    500000   0.00e+000      0.00e+000%     0.00e+000   1163000  x1.00
coshf_c        [-3.14, 3.14]    500000   1.91e-006      2.37e-005%     2.28e-007   283000   x4.11
coshf_neon     [-3.14, 3.14]    500000   1.91e-006      2.22e-005%     1.68e-007   108000   x10.77
tanhf          [-3.14, 3.14]    500000   0.00e+000      0.00e+000%     0.00e+000   1555000  x1.00
tanhf_c        [-3.14, 3.14]    500000   1.21e-005      2.48e-001%     5.48e-006   235000   x6.62
tanhf_neon     [-3.14, 3.14]    500000   2.38e-007      2.47e-001%     5.40e-008   119000   x13.07
expf           [0.00, 10.00]    500000   0.00e+000      0.00e+000%     0.00e+000   960000   x1.00
expf_c         [0.00, 10.00]    500000   9.77e-003      6.58e-005%     1.64e-003   132000   x7.27
expf_neon      [0.00, 10.00]    500000   9.77e-003      6.58e-005%     1.64e-003   88000    x10.91
logf           [1.00, 1000.00]  500000   0.00e+000      0.00e+000%     0.00e+000   1027000  x1.00
logf_c         [1.00, 1000.00]  500000   7.63e-006      1.03e-002%     1.07e-006   116000   x8.85
logf_neon      [1.00, 1000.00]  500000   7.63e-006      1.03e-002%     1.07e-006   82000    x12.52
log10f         [1.00, 1000.00]  500000   0.00e+000      0.00e+000%     0.00e+000   1202000  x1.00
log10f_c       [1.00, 1000.00]  500000   3.34e-006      6.68e-003%     4.84e-007   116000   x10.36
log10f_neon    [1.00, 1000.00]  500000   3.34e-006      6.68e-003%     4.84e-007   81000    x14.84
floorf         [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000   4705000  x1.00
floorf_c       [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000   819000   x5.74
floorf_neon    [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000   671000   x7.01
ceilf          [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000   5734000  x1.00
ceilf_c        [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000   814000   x7.04
ceilf_neon     [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000   696000   x8.24
fabsf          [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000   2005000  x1.00
fabsf_c        [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000   455000   x4.41
fabsf_neon     [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000   446000   x4.50
sqrtf          [1.00, 1000.00]  500000   0.00e+000      0.00e+000%     0.00e+000   3222000  x1.00
sqrtf_c        [1.00, 1000.00]  500000   2.33e-004      1.06e-003%     8.69e-005   139000   x23.18
sqrtf_neon     [1.00, 1000.00]  500000   7.63e-006      2.91e-005%     1.60e-006   85000    x37.91
invsqrtf       [1.00, 1000.00]  500000   0.00e+000      0.00e+000%     0.00e+000   106000   x1.00
invsqrtf_c     [1.00, 1000.00]  500000   4.35e-006      4.78e-004%     2.00e-007   94000    x1.13
invsqrtf_neon  [1.00, 1000.00]  500000   1.19e-007      2.12e-005%     4.81e-009   70000    x1.51
atan2f         [0.10, 10.00]    10000    0.00e+000      0.00e+000%     0.00e+000   2388000  x1.00
atan2f_c       [0.10, 10.00]    10000    1.73e-004      2.23e-002%     0.00e+000   657000   x3.63
atan2f_neon    [0.10, 10.00]    10000    1.67e-004      2.12e-002%     0.00e+000   278000   x8.59
powf           [1.00, 10.00]    10000    0.00e+000      0.00e+000%     0.00e+000   8316000  x1.00
powf_c         [1.00, 10.00]    10000    1.36e+005      5.88e-003%     0.00e+000   493000   x16.87
powf_neon      [1.00, 10.00]    10000    1.36e+005      5.88e-003%     0.00e+000   292000   x28.48
fmodf          [1.00, 10.00]    10000    0.00e+000      0.00e+000%     0.00e+000   1394000  x1.00
fmodf_c        [1.00, 10.00]    10000    9.99e+000      8.06e-002%     0.00e+000   341000   x4.09
fmodf_neon     [1.00, 10.00]    10000    9.97e+000      8.06e-002%     0.00e+000   238000   x5.86
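The accuracy and Rate columns above can be reproduced from raw outputs roughly as follows. This is a sketch under assumptions: the exact formulas used by the project's math_debug program are not given in this article, and error_stats, compare_outputs and speedup are hypothetical names. The RMS column would be the square root of mean_sq; the sqrt call is left out so the sketch has no libm dependency.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one table row's accuracy columns. */
typedef struct {
    double abs_max;      /* ABS Max Error: max |approx - ref| */
    double rel_max_pct;  /* REL Max Error: max |approx - ref| / |ref|, in percent */
    double mean_sq;      /* mean of (approx - ref)^2; RMS Error = sqrt(mean_sq) */
} error_stats;

/* Compare n outputs of an approximation against reference values
 * (e.g. sinf_neon against the system sinf). Points where ref == 0 are
 * skipped for the relative error to avoid dividing by zero. */
error_stats compare_outputs(const float *ref, const float *approx, size_t n)
{
    error_stats s = {0.0, 0.0, 0.0};
    double sum_sq = 0.0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        double d = (double)approx[i] - (double)ref[i];
        double err = d < 0.0 ? -d : d;
        double mag = ref[i] < 0.0f ? -(double)ref[i] : (double)ref[i];
        if (err > s.abs_max)
            s.abs_max = err;
        if (mag > 0.0 && 100.0 * err / mag > s.rel_max_pct)
            s.rel_max_pct = 100.0 * err / mag;
        sum_sq += d * d;
    }
    s.mean_sq = sum_sq / (double)n;
    return s;
}

/* Rate column: baseline (system function) time divided by this version's time. */
double speedup(double baseline_time, double time)
{
    assert(time > 0.0);
    return baseline_time / time;
}
```

Plugging in the sinf rows, speedup(880000, 96000) gives about 9.17 for sinf_neon and speedup(880000, 162000) about 5.43 for sinf_c, matching the Rate column.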

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
