A Math Library Based on NEON Instructions


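This is an open-source library hosted at https://code.google.com/p/math-neon/. According to the project description, it is a math library implemented with NEON instructions. It covers floating-point trigonometric, logarithmic and exponential functions as well as matrix operations. Because it relies on NEON instructions, it only runs on ARM Cortex-A processors with NEON support. The project description says that GCC's NEON support is not very good (presumably meaning that NEON intrinsics are less efficient than hand-written assembly), so the core routines are written in inline assembly. To build and test the library, you can download the Makefile written by the author (available at http://gitorious.org/vjaquez-misc/math-neon/commit/14ba470caad37c33cf7245be69efc9a1366d8f99?format=patch).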
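To make the intrinsics-versus-assembly remark concrete, here is a minimal sketch of a 4-element dot product written with NEON intrinsics, with a scalar fallback for targets without NEON. It is not code from math-neon (whose core routines use inline assembly); the name dot4_sketch is hypothetical, and the intrinsics come from the standard arm_neon.h header.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dot product of two 4-element float vectors. */
static float dot4_sketch(const float *a, const float *b)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Load both vectors and multiply lane by lane. */
    float32x4_t prod = vmulq_f32(vld1q_f32(a), vld1q_f32(b));
    /* Add the high and low halves: (p0 + p2, p1 + p3). */
    float32x2_t sum2 = vadd_f32(vget_low_f32(prod), vget_high_f32(prod));
    /* Pairwise add collapses the two remaining lanes into lane 0. */
    sum2 = vpadd_f32(sum2, sum2);
    return vget_lane_f32(sum2, 0);
#else
    /* Scalar fallback for compilers or targets without NEON. */
    float sum = 0.0f;
    for (size_t i = 0; i < 4; i++)
        sum += a[i] * b[i];
    return sum;
#endif
}
```

The NEON path adds the products in a different order from the scalar loop, so the two paths may differ in the last digit for some inputs.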

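I wanted to use the library on Windows CE (on a Cortex-A8 platform). Because the code uses a lot of inline assembly, porting it to WinCE would mean either rewriting the code as assembly files or using the intrinsics of the WEC7 compiler (see http://blog.csdn.net/alien75/article/details/8740641). Both are a lot of work. I then remembered the mingw32ce toolchain, which I had not used in a long time (see http://blog.csdn.net/alien75/article/details/6998223). Since the goal is only to build a static library in PE format, this tool is fully adequate. The Makefile just needs a few changes to build correctly.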

The original Makefile reads as follows:

 

CFLAGS := -O2 -ggdb -mcpu=cortex-a9 -mfloat-abi=softfp -mfpu=neon -ansi -std=gnu99 -pedantic
WARNINGS := -Wall -Wextra -Wno-unused-parameter -Wmissing-prototypes
ASSEMBLER := -Wa,-mimplicit-it=thumb

override CFLAGS += $(WARNINGS) $(ASSEMBLER)

LIBS := -lm

all: math_debug

libmathneon.a: math_acosf.o math_ldexpf.o math_powf.o math_sqrtfv.o \
	math_asinf.o math_expf.o math_log10f.o math_runfast.o math_tanf.o \
	math_atan2f.o math_fabsf.o math_logf.o math_sincosf.o math_tanhf.o \
	math_atanf.o math_floorf.o math_mat2.o math_sinf.o math_vec2.o \
	math_ceilf.o math_fmodf.o math_mat3.o math_sinfv.o math_vec3.o \
	math_cosf.o math_frexpf.o math_mat4.o math_sinhf.o math_vec4.o \
	math_coshf.o math_invsqrtf.o math_modf.o math_sqrtf.o

math_debug: math_debug.o libmathneon.a
	$(CC) $(LDFLAGS) -o $@ $^ $(LIBS)

%.o:: %.c
	$(CC) $(CFLAGS) -o $@ -c $<

%.a::
	$(AR) rcs $@ $^

clean:
	$(RM) -v math_debug *.o *.a

The modified Makefile:

CC=arm-mingw32ce-gcc
AR=arm-mingw32ce-ar rc

CFLAGS := -O2 -ggdb -mcpu=cortex-a8 -mfloat-abi=softfp -mfpu=neon -ansi -std=gnu99 -pedantic -DNO_ERRNO_H -D_WIN32_WCE
LDFLAGS := -L.
WARNINGS := -Wall -Wextra -Wno-unused-parameter -Wmissing-prototypes
ASSEMBLER := -Wa,-mimplicit-it=thumb

override CFLAGS += $(WARNINGS) $(ASSEMBLER)

#LIBS := -lm

all: math_debug

libmathneon.a: math_acosf.o math_ldexpf.o math_powf.o math_sqrtfv.o \
	math_asinf.o math_expf.o math_log10f.o math_runfast.o math_tanf.o \
	math_atan2f.o math_fabsf.o math_logf.o math_sincosf.o math_tanhf.o \
	math_atanf.o math_floorf.o math_mat2.o math_sinf.o math_vec2.o \
	math_ceilf.o math_fmodf.o math_mat3.o math_sinfv.o math_vec3.o \
	math_cosf.o math_frexpf.o math_mat4.o math_sinhf.o math_vec4.o \
	math_coshf.o math_invsqrtf.o math_modf.o math_sqrtf.o

math_debug: math_debug.o libmathneon.a
	$(CC) $(LDFLAGS) -o $@ $^ $(LIBS)

%.o:: %.c
	$(CC) $(CFLAGS) -o $@ -c $<

%.a::
	$(AR) $@ $^

clean:
	$(RM) -v math_debug *.o *.a

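Test results (for the comparison between the system functions, the optimized C functions and the NEON assembly functions, see the figure after Rate):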

RUNFAST: Enabled
-----------------------------------------------------------------------------------------------
MATRIX FUNCTION TESTS
-----------------------------------------------------------------------------------------------
matmul2_c =
|2.66, -2.73|
|-5.74, -15.83|
matmul2_neon =
|2.66, -2.73|
|-5.74, -15.83|
matmul2: c=112000  neon=65000  rate=1.72
matvec2_c = |2.66, -5.74|
matvec2_neon = |2.66, -5.74|
matvec2: c=66000  neon=53000  rate=1.25
matmul3_c =
|-17.73, -8.39, -1.10|
|8.30, -5.32, 23.03|
|-5.67, -7.81, 9.07|
matmul3_neon =
|-17.73, -8.39, -1.10|
|8.30, -5.32, 23.03|
|-5.67, -7.81, 9.07|
matmul3: c=394000  neon=120000  rate=3.28
matvec3_c = |-17.73, 8.30, -5.67|
matvec3_neon = |-17.73, 8.30, -5.67|
matvec3: c=66000  neon=53000  rate=1.25
matmul4_c =
|-8.86, 8.70, -17.78, -7.64|
|-13.15, 20.92, -10.97, -14.02|
|17.37, -14.46, -13.16, 33.82|
|15.42, -27.32, -5.66, -6.37|
matmul4_neon =
|-8.86, 8.70, -17.78, -7.64|
|-13.15, 20.92, -10.97, -14.02|
|17.37, -14.46, -13.16, 33.82|
|15.42, -27.32, -5.66, -6.37|
matmul4: c=991000  neon=141000  rate=7.03
matvec4_c = |-8.86, -13.15, 17.37, 15.418112|
matvec4_neon = |-8.86, -13.15, 17.37, 15.418112|
matvec4: c=66000  neon=53000  rate=1.25
dot2_c = 3.756326
dot2_neon = 3.756326
dot2: c=532000  neon=497000  rate=1.07
normalize2_c = [-0.74, -0.68]
normalize2_neon = [-0.74, -0.68]
normalize2: c=691000  neon=313000  rate=2.21
dot3_c = 3.698457
dot3_neon = 3.698457
dot3: c=572000  neon=514000  rate=1.11
normalize3_c = [-0.74, -0.68, -0.01]
normalize3_neon = [-0.74, -0.68, -0.01]
normalize3: c=806000  neon=353000  rate=2.28
cross3_c = [-4.69, 5.12, -1.46]
cross3_neon = [-4.69, 5.12, -1.46]
cross3: c=586000  neon=373000  rate=1.57
dot4_c = -4.564567
dot4_neon = -4.564566
dot4: c=625000  neon=487000  rate=1.28
normalize4_c = [-0.24, -0.22, -0.00, 0.95]
normalize4_neon = [-0.24, -0.22, -0.00, 0.95]
normalize4: c=924000  neon=343000  rate=2.69
-----------------------------------------------------------------------------------------------
CMATH FUNCTION TESTS
-----------------------------------------------------------------------------------------------
Function      Range            Number   ABS Max Error  REL Max Error  RMS Error  Time     Rate
-----------------------------------------------------------------------------------------------
sinf          [-3.14, 3.14]    500000   0.00e+000      0.00e+000%     0.00e+000  880000   x1.00
sinf_c        [-3.14, 3.14]    500000   8.34e-007      1.00e+002%     4.09e-007  162000   x5.43
sinf_neon     [-3.14, 3.14]    500000   8.34e-007      1.00e+002%     4.09e-007  96000    x9.17
cosf          [-3.14, 3.14]    500000   0.00e+000      0.00e+000%     0.00e+000  906000   x1.00
cosf_c        [-3.14, 3.14]    500000   8.34e-007      6.74e-001%     4.16e-007  192000   x4.72
cosf_neon     [-3.14, 3.14]    500000   1.41e+000      6.64e+007%     1.00e+000  142000   x6.38
tanf          [-0.79, 0.79]    500000   0.00e+000      0.00e+000%     0.00e+000  1140000  x1.00
tanf_c        [-0.79, 0.79]    500000   2.98e-006      7.97e-004%     1.31e-006  200000   x5.70
tanf_neon     [-0.79, 0.79]    500000   1.91e-006      3.62e-004%     6.66e-007  126000   x9.05
asinf         [-1.00, 1.00]    500000   0.00e+000      0.00e+000%     0.00e+000  2732000  x1.00
asinf_c       [-1.00, 1.00]    500000   5.53e-005      1.06e-002%     1.69e-005  277000   x9.86
asinf_neon    [-1.00, 1.00]    500000   4.65e-005      8.87e-003%     1.#Re+000  151000   x18.09
acosf         [-1.00, 1.00]    500000   0.00e+000      0.00e+000%     0.00e+000  2670000  x1.00
acosf_c       [-1.00, 1.00]    500000   5.56e-005      6.46e-003%     1.69e-005  312000   x8.56
acosf_neon    [-1.00, 1.00]    500000   4.67e-005      6.35e-003%     1.#Re+000  171000   x15.61
atanf         [-1.00, 1.00]    500000   0.00e+000      0.00e+000%     0.00e+000  1021000  x1.00
atanf_c       [-1.00, 1.00]    500000   1.67e-004      2.12e-002%     7.40e-005  198000   x5.16
atanf_neon    [-1.00, 1.00]    500000   1.67e-004      2.12e-002%     7.40e-005  121000   x8.44
sinhf         [-3.14, 3.14]    500000   0.00e+000      0.00e+000%     0.00e+000  1509000  x1.00
sinhf_c       [-3.14, 3.14]    500000   1.91e-006      1.52e-001%     2.37e-007  280000   x5.39
sinhf_neon    [-3.14, 3.14]    500000   1.91e-006      1.52e-001%     1.90e-007  108000   x13.97
coshf         [-3.14, 3.14]    500000   0.00e+000      0.00e+000%     0.00e+000  1163000  x1.00
coshf_c       [-3.14, 3.14]    500000   1.91e-006      2.37e-005%     2.28e-007  283000   x4.11
coshf_neon    [-3.14, 3.14]    500000   1.91e-006      2.22e-005%     1.68e-007  108000   x10.77
tanhf         [-3.14, 3.14]    500000   0.00e+000      0.00e+000%     0.00e+000  1555000  x1.00
tanhf_c       [-3.14, 3.14]    500000   1.21e-005      2.48e-001%     5.48e-006  235000   x6.62
tanhf_neon    [-3.14, 3.14]    500000   2.38e-007      2.47e-001%     5.40e-008  119000   x13.07
expf          [0.00, 10.00]    500000   0.00e+000      0.00e+000%     0.00e+000  960000   x1.00
expf_c        [0.00, 10.00]    500000   9.77e-003      6.58e-005%     1.64e-003  132000   x7.27
expf_neon     [0.00, 10.00]    500000   9.77e-003      6.58e-005%     1.64e-003  88000    x10.91
logf          [1.00, 1000.00]  500000   0.00e+000      0.00e+000%     0.00e+000  1027000  x1.00
logf_c        [1.00, 1000.00]  500000   7.63e-006      1.03e-002%     1.07e-006  116000   x8.85
logf_neon     [1.00, 1000.00]  500000   7.63e-006      1.03e-002%     1.07e-006  82000    x12.52
log10f        [1.00, 1000.00]  500000   0.00e+000      0.00e+000%     0.00e+000  1202000  x1.00
log10f_c      [1.00, 1000.00]  500000   3.34e-006      6.68e-003%     4.84e-007  116000   x10.36
log10f_neon   [1.00, 1000.00]  500000   3.34e-006      6.68e-003%     4.84e-007  81000    x14.84
floorf        [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000  4705000  x1.00
floorf_c      [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000  819000   x5.74
floorf_neon   [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000  671000   x7.01
ceilf         [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000  5734000  x1.00
ceilf_c       [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000  814000   x7.04
ceilf_neon    [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000  696000   x8.24
fabsf         [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000  2005000  x1.00
fabsf_c       [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000  455000   x4.41
fabsf_neon    [1.00, 1000.00]  5000000  0.00e+000      0.00e+000%     0.00e+000  446000   x4.50
sqrtf         [1.00, 1000.00]  500000   0.00e+000      0.00e+000%     0.00e+000  3222000  x1.00
sqrtf_c       [1.00, 1000.00]  500000   2.33e-004      1.06e-003%     8.69e-005  139000   x23.18
sqrtf_neon    [1.00, 1000.00]  500000   7.63e-006      2.91e-005%     1.60e-006  85000    x37.91
invsqrtf      [1.00, 1000.00]  500000   0.00e+000      0.00e+000%     0.00e+000  106000   x1.00
invsqrtf_c    [1.00, 1000.00]  500000   4.35e-006      4.78e-004%     2.00e-007  94000    x1.13
invsqrtf_neon [1.00, 1000.00]  500000   1.19e-007      2.12e-005%     4.81e-009  70000    x1.51
atan2f        [0.10, 10.00]    10000    0.00e+000      0.00e+000%     0.00e+000  2388000  x1.00
atan2f_c      [0.10, 10.00]    10000    1.73e-004      2.23e-002%     0.00e+000  657000   x3.63
atan2f_neon   [0.10, 10.00]    10000    1.67e-004      2.12e-002%     0.00e+000  278000   x8.59
powf          [1.00, 10.00]    10000    0.00e+000      0.00e+000%     0.00e+000  8316000  x1.00
powf_c        [1.00, 10.00]    10000    1.36e+005      5.88e-003%     0.00e+000  493000   x16.87
powf_neon     [1.00, 10.00]    10000    1.36e+005      5.88e-003%     0.00e+000  292000   x28.48
fmodf         [1.00, 10.00]    10000    0.00e+000      0.00e+000%     0.00e+000  1394000  x1.00
fmodf_c       [1.00, 10.00]    10000    9.99e+000      8.06e-002%     0.00e+000  341000   x4.09
fmodf_neon    [1.00, 10.00]    10000    9.97e+000      8.06e-002%     0.00e+000  238000   x5.86

 

 
