YUV420 to RGB24 Conversion Efficiency with NEON Instructions

Last Update:2018-12-05 Source: Internet

Author: User



I found some code on the Internet that uses NEON instructions to optimize YUV420-to-RGB24 conversion. I tested it on a Cortex-A8 CPU clocked at 1 GHz, converting one frame of QCIF (176x144) data, and compared it with a popular C implementation that is also widely circulated online. The NEON version is more than 700 times faster: over 1000 iterations, the NEON version takes 112 ms, while the C version takes 88645 ms. The code for both versions is given below.
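The quoted figures can be checked with a little arithmetic. This assumes both timings cover the same 1000 conversions of one QCIF frame, which is how the numbers read; the cycles-per-pixel figure is a derived estimate, not a measurement from the source.

```latex
\begin{aligned}
\frac{88645\ \text{ms}}{112\ \text{ms}} &\approx 791
  && \text{(NEON speedup over C)}\\
t_{\text{NEON}} = \frac{112\ \text{ms}}{1000} &= 0.112\ \text{ms/frame},
  \qquad t_{\text{C}} = \frac{88645\ \text{ms}}{1000} \approx 88.6\ \text{ms/frame}\\
\frac{10^{9}\ \text{cycles/s} \times 0.112 \times 10^{-3}\ \text{s}}{176 \times 144\ \text{pixels}}
  &= \frac{112000}{25344} \approx 4.4\ \text{cycles/pixel}
  && \text{(NEON version at 1 GHz)}
\end{aligned}
```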

Assembly Code

    AREA |.text|, CODE, READONLY    ; name this block of code
    EXPORT imgyuv2rgb24_neon

; void imgyuv2rgb24_neon(u8 *pu8rgbbuffer, u8 *pu8srcyuv, L32 l32width, L32 l32height)
; The loop counts assume l32width is a multiple of 8 and l32height is even.
imgyuv2rgb24_neon
    ;push {r4, r5, r6, r7, r8, r9, r10, lr}
    stmfd   sp!, {r4-r10, lr}
    vpush   {d8-d15}                ; d8-d15 are callee-saved under the AAPCS
    add     r4, r2, r2
    add     r4, r4, r2              ; r4: l32dststride = 3 * l32width
    mul     r5, r4, r3
    sub     r5, r5, r4
    add     r0, r0, r5              ; r0: pu8dst = pu8dst + l32dststride * (l32height - 1)
    mul     r5, r2, r3
    add     r6, r1, r5              ; r6: pu8srcu = pu8srcyuv + l32width * l32height
    add     r7, r6, r5, lsr #2      ; r7: pu8srcv = pu8srcu + ((l32width * l32height) >> 2)
    ;lsr    r8, r2, #3
    mov     r8, r2, lsr #3          ; r8: column loop count (r2 holds the YUV image width)
    ;lsr    lr, r3, #1
    mov     lr, r3, lsr #1          ; lr: row loop count (r3 holds the YUV image height)
    add     r3, r1, r2              ; r1: pu8src1, r3: pu8src2, r2: l32width
    sub     r5, r0, r4              ; r5: pu8dst2 = pu8dst - l32dststride
    mov     r9, #16
    vdup.8  d8, r9                  ; d8: 16
    mov     r10, #128
    vdup.8  d9, r10                 ; d9: 128
    mov     r9, #75
    vdup.16 q5, r9                  ; q5: 75
    mov     r10, #102
    vdup.16 q6, r10                 ; q6: 102
    mov     r9, #25
    vdup.16 q7, r9                  ; q7: 25
    mov     r10, #52
    vdup.16 q8, r10                 ; q8: 52
    mov     r9, #129
    vdup.16 q9, r9                  ; q9: 129

loop_row
loop_col
    subs    r8, r8, #1
    vld1.u8 {d0}, [r1]!             ; Y, line 1
    vld1.u8 {d2}, [r3]!             ; Y, line 2
    vld1.32 {d4[0]}, [r6]!          ; U
    vld1.32 {d4[1]}, [r7]!          ; V
    vsubl.u8 q0, d0, d8             ; Y line 1 - 16
    vsubl.u8 q1, d2, d8             ; Y line 2 - 16
    vsubl.u8 q2, d4, d9             ; U - 128, V - 128
    vmov    q3, q2
    vzip.s16 q2, q3                 ; q2: U - 128, q3: V - 128 (each value duplicated)

    ; multiplication part
    vmul.s16 q10, q3, q8
    vmla.s16 q10, q2, q7            ; q10: 52*V + 25*U, the U/V term of G
    vmul.s16 q11, q2, q9            ; q11: 129*U, the U term of B
    vmul.s16 q12, q3, q6            ; q12: 102*V, the V term of R
    vmul.s16 q0, q0, q5             ; q0 (d0, d1): 75*Y for the 8 points of line 1
    vmul.s16 q1, q1, q5             ; q1 (d2, d3): 75*Y for the 8 points of line 2

    ; G component of both lines
    vqsub.s16 q13, q0, q10
    vqsub.s16 q14, q1, q10
    vqrshrun.s16 d27, q13, #6       ; G, line 1
    vqrshrun.s16 d30, q14, #6       ; G, line 2

    ; B component of both lines
    vqadd.s16 q10, q0, q11
    vqadd.s16 q11, q1, q11
    vqrshrun.s16 d26, q10, #6       ; B, line 1
    vqrshrun.s16 d29, q11, #6       ; B, line 2

    ; R component of both lines
    vqadd.s16 q11, q0, q12
    vqadd.s16 q12, q1, q12
    vqrshrun.s16 d28, q11, #6       ; R, line 1
    vqrshrun.s16 d31, q12, #6       ; R, line 2

    ; interleave into 24-bit pixels (bytes B, G, R) and store to the target buffer
    vst3.8  {d26, d27, d28}, [r0]!
    vst3.8  {d29, d30, d31}, [r5]!
    bgt     loop_col

    subs    lr, lr, #1
    sub     r0, r5, r4, lsl #1
    sub     r5, r0, r4
    add     r1, r1, r2
    add     r3, r3, r2
    ;lsr    r8, r2, #3
    mov     r8, r2, lsr #3
    bgt     loop_row

    vpop    {d8-d15}
    ;pop {r4, r5, r6, r7, r8, r9, r10, lr}
    ldmfd   sp!, {r4-r10, lr}
    bx      lr
    END
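The constants loaded into q5 through q9 appear to be the usual BT.601 limited-range YCbCr-to-RGB coefficients scaled by 64 (2^6, matching the #6 shift in vqrshrun) and rounded to integers. This reading is an interpretation; the source does not name the standard. Writing y' = Y - 16, u' = U - 128 and v' = V - 128, and ignoring the intermediate 16-bit saturation done by vqadd/vqsub, each output channel works out to the following, where the +32 is the rounding term that vqrshrun adds before shifting:

```latex
\begin{aligned}
R &= \operatorname{sat}_{[0,255]}\!\big((75\,y' + 102\,v' + 32) \gg 6\big)\\
G &= \operatorname{sat}_{[0,255]}\!\big((75\,y' - 52\,v' - 25\,u' + 32) \gg 6\big)\\
B &= \operatorname{sat}_{[0,255]}\!\big((75\,y' + 129\,u' + 32) \gg 6\big)\\[4pt]
75 &\approx 1.164 \times 64,\quad 102 \approx 1.596 \times 64,\quad
52 \approx 0.813 \times 64,\quad 25 \approx 0.391 \times 64,\quad 129 \approx 2.018 \times 64
\end{aligned}
```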

C Code

    void yuv420p_to_rgb24(unsigned char *yuv420[3], unsigned char *rgb24, int width, int height)
    {
        // int begin = GetTickCount();
        int R, G, B, Y, U, V;
        int x, y;
        int nwidth = width >> 1; // color signal width
        for (y = 0; y
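When checking the NEON output on a host machine, a scalar C model of the same arithmetic can serve as a reference. The sketch below is hypothetical and is not the author's C routine: `yuv420_to_bgr24_model`, `sat_s16` and `rshr6_sat_u8` are illustrative names. It mirrors what the assembly listing does: contiguous planar YUV420 input (Y, then U, then V), the constants 75/102/52/25/129, 16-bit saturation, a rounding shift by 6, B, G, R byte order, and bottom-up row order. Like the assembly, it assumes the width is a multiple of 8 and the height is even.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mimic the saturating vqadd.s16 / vqsub.s16: clamp to the int16 range. */
static int sat_s16(int v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return v;
}

/* Mimic vqrshrun.s16 #6: add the rounding constant 32, shift right by 6,
 * then saturate to 0..255. Every negative input ends up as 0, so negative
 * values are handled before the shift. */
static uint8_t rshr6_sat_u8(int v)
{
    if (v < 0) return 0;
    v = (v + 32) >> 6;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Hypothetical scalar model of imgyuv2rgb24_neon.
 * src: contiguous planar YUV420 (Y plane, then U plane, then V plane).
 * dst: packed 24-bit pixels, bytes in B, G, R order, rows stored bottom-up. */
void yuv420_to_bgr24_model(uint8_t *dst, const uint8_t *src,
                           int width, int height)
{
    assert(width % 8 == 0 && height % 2 == 0); /* same limits as the assembly */

    size_t luma_size = (size_t)width * (size_t)height;
    const uint8_t *src_u = src + luma_size;
    const uint8_t *src_v = src_u + luma_size / 4;
    size_t dst_stride = (size_t)width * 3;
    size_t chroma_stride = (size_t)width / 2;

    for (int row = 0; row < height; row++) {
        /* Source row 0 is written to the last destination row. */
        uint8_t *out = dst + (size_t)(height - 1 - row) * dst_stride;
        const uint8_t *y_line = src + (size_t)row * (size_t)width;
        const uint8_t *u_line = src_u + (size_t)(row / 2) * chroma_stride;
        const uint8_t *v_line = src_v + (size_t)(row / 2) * chroma_stride;

        for (int col = 0; col < width; col++) {
            int y = 75 * (y_line[col] - 16);   /* vsubl, then vmul by q5 */
            int u = u_line[col / 2] - 128;     /* one U per 2x2 block */
            int v = v_line[col / 2] - 128;     /* one V per 2x2 block */

            int b = sat_s16(y + 129 * u);
            int g = sat_s16(y - (52 * v + 25 * u));
            int r = sat_s16(y + 102 * v);

            out[3 * col + 0] = rshr6_sat_u8(b);
            out[3 * col + 1] = rshr6_sat_u8(g);
            out[3 * col + 2] = rshr6_sat_u8(r);
        }
    }
}
```

For example, converting an 8x2 frame whose top row has Y = 16 and bottom row has Y = 235 (with U = V = 128) gives black pixels in the last destination row and white pixels in the first, which reflects the bottom-up row order. Comparing this model byte for byte with the NEON output is one way to catch register or pointer mistakes in the assembly.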

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
