The unexpected pain of deblocking

Source: Internet
Author: User
That day the camera was pointed at a white wall, and a problem showed up in the decoded video.

Symptom:
The decoded image shows obvious blocks on the wall, while regions where the image varies a lot (such as the face) show no obvious blocks.

Problem Analysis:
This had not happened before; the only thing that changed was the encoding bit rate. To cope with network conditions, we had lowered the bit rate considerably. At a low bit rate the quantization is coarse, so the quantization error between neighboring macroblocks grows. Where the color changes very gently, this small error becomes visible as an obvious color jump at macroblock boundaries, that is, as squares.
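To make the effect concrete, here is a toy model in C. It quantizes sample values directly with a uniform step, which is a simplification: a real MPEG-4 encoder quantizes DCT coefficients, not pixels. The function names `quantize_level` and `largest_jump` are made up for this illustration.

```c
#include <assert.h>

/* Round value to the nearest multiple of step (a uniform quantizer). */
int quantize_level(int value, int step)
{
    assert(step > 0 && value >= 0);
    return ((value + step / 2) / step) * step;
}

/* Largest jump between neighbouring samples after quantization. */
int largest_jump(const int *samples, int count, int step)
{
    int worst = 0;
    for (int i = 1; i < count; i++) {
        int d = quantize_level(samples[i], step)
              - quantize_level(samples[i - 1], step);
        if (d < 0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

On a gentle ramp that rises by 1 per sample, a step of 2 leaves jumps of at most 2, while a step of 8 turns the ramp into a staircase with jumps of 8. On a flat white wall, a jump like that is easy to see.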

Solution:
Fortunately, the MPEG4 standard anticipates this phenomenon and provides a deblock post-processing step, applied after decoding, to weaken it. After adding deblock, the image is indeed much smoother and the blocks are gone.

Deblock principle:
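The general approach is to smooth the color across each block boundary, using 10 points across the boundary in the horizontal direction and 10 points in the vertical direction. Specifically, the image is divided into 8x8 blocks and averaging is applied along the block boundaries, as shown below. The row V0-V9 straddles a vertical boundary; the column V0-V9 straddles a horizontal boundary.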

                      |
                      |
       Block 1        |        Block 2           V0
   V0 V1 V2 V3 V4     |     V5 V6 V7 V8 V9       V1
                      |                          V2
                      |                          V3
                      |                          V4
 ----------------------------------------------------------------
                      |                          V5
                      |                          V6
                      |                          V7
       Block 3        |        Block 4           V8
                      |                          V9
                      |
                      |
 ----------------------------------------------------------------

Each calculation takes 10 points. First, measure how flat the 10 points are. If the region is flat (most adjacent points are nearly equal), use DC offset mode to correct V1-V8, 8 points in total; otherwise, use default mode to make a small correction to V4 and V5 only. The discriminant is:
eq_cnt = y(V0-V1) + y(V1-V2) + y(V2-V3) + y(V3-V4) + y(V4-V5)
       + y(V5-V6) + y(V6-V7) + y(V7-V8) + y(V8-V9)
where y(x) = 1 if |x| <= thr1, else y(x) = 0; // thr1 is an empirical value of 1.

if (eq_cnt < thr2) // thr2 is an empirical value of 6
    default mode
else
    DC offset mode
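The mode decision can be sketched in C as follows. This is a minimal sketch, not Xvid's actual code. It assumes the usual convention that a pair of neighbours contributes 1 to eq_cnt when its absolute difference is at most thr1, and that all nine adjacent pairs among the 10 points are counted. The names `count_equal_pairs`, `choose_mode` and the enum are invented for the example; the thresholds are the empirical values quoted in the text.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed thresholds, taken from the empirical values quoted in the text. */
#define THR1 1  /* neighbours count as "equal" if |diff| <= THR1 */
#define THR2 6  /* at least THR2 equal pairs means a flat region */

typedef enum { DEBLOCK_DEFAULT, DEBLOCK_DC_OFFSET } deblock_mode;

/* Count the adjacent pairs in v[0..9] that are nearly equal. */
int count_equal_pairs(const int v[10])
{
    int eq_cnt = 0;
    assert(v != NULL);
    for (int i = 0; i < 9; i++) {
        if (abs(v[i] - v[i + 1]) <= THR1)
            eq_cnt++;
    }
    return eq_cnt;
}

/* Flat regions (many equal pairs) get the stronger DC offset filter. */
deblock_mode choose_mode(const int v[10])
{
    return (count_equal_pairs(v) < THR2) ? DEBLOCK_DEFAULT
                                         : DEBLOCK_DC_OFFSET;
}
```

A flat wall with a single step at the block boundary has eight equal pairs out of nine and therefore selects DC offset mode, while a textured region such as a face falls through to default mode.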

DC offset mode rules:

max = max(V[1], V[2], V[3], V[4], V[5], V[6], V[7], V[8]);
min = min(V[1], V[2], V[3], V[4], V[5], V[6], V[7], V[8]);
if ((max - min) < 2 * quant)
    average the colors
else
    do nothing

The averaging uses weights of 4, 2, and 1, with a total weight of 16 per output point. For example:

S[1] = (uint8_t)((6 * P0 + (V[1] << 2) + (V[2] << 1) + (V[3] << 1) + V[4] + V[5] + 8) >> 4);
S[2] = (uint8_t)(((P0 << 2) + (V[1] << 1) + (V[2] << 2) + (V[3] << 1) + (V[4] << 1) + V[5] + V[6] + 8) >> 4);
S[3] = (uint8_t)(((P0 << 1) + (V[1] << 1) + (V[2] << 1) + (V[3] << 2) + (V[4] << 1) + (V[5] << 1) + V[6] + V[7] + 8) >> 4);

Calculate S[1]-S[8] and write them back to V[1]-V[8].
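For reference, here is a sketch of the whole DC offset step in C, written as a 9-tap filter with taps {1, 1, 2, 2, 4, 2, 2, 1, 1}, which reproduces the three formulas above. The text does not define P0; the sketch assumes the padding rule from the MPEG-4 post-processing description as I understand it (P0 is V[0] if |V[1]-V[0]| < quant, otherwise V[1], and symmetrically for the far side, called `p9` here). The function name `dc_offset_filter` is made up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Filter taps; they sum to 16, hence the shift by 4 below. */
static const int TAPS[9] = {1, 1, 2, 2, 4, 2, 2, 1, 1};

/* Smooth v[1..8] in place. v holds the 10 samples across a block edge,
 * quant is the quantizer. Returns 1 if filtered, 0 if left untouched. */
int dc_offset_filter(uint8_t v[10], int quant)
{
    int max = v[1], min = v[1];
    int out[9];

    assert(v != NULL);
    for (int i = 2; i <= 8; i++) {
        if (v[i] > max) max = v[i];
        if (v[i] < min) min = v[i];
    }
    if (max - min >= 2 * quant)
        return 0;               /* likely a real edge: do nothing */

    /* Assumed padding rule for positions outside v[1..8]. */
    int p0 = (abs(v[1] - v[0]) < quant) ? v[0] : v[1];
    int p9 = (abs(v[8] - v[9]) < quant) ? v[9] : v[8];

    for (int n = 1; n <= 8; n++) {
        int sum = 8;            /* rounding term before the shift */
        for (int k = -4; k <= 4; k++) {
            int m = n + k;
            int p = (m < 1) ? p0 : (m > 8) ? p9 : v[m];
            sum += TAPS[k + 4] * p;
        }
        out[n] = sum >> 4;
    }
    for (int n = 1; n <= 8; n++)
        v[n] = (uint8_t)out[n];
    return 1;
}
```

With quant = 8, a step from 100 to 110 at the block boundary is turned into a gradual ramp, while a step from 0 to 200 exceeds 2 * quant and is left alone as a genuine edge.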

 

Calculation amount analysis:
Statistics show that most operations use DC offset mode. Take a 352*288 image and consider only the luminance component.
There are 352/8-1 = 43 vertical boundaries between 8*8 blocks, each 288 points long, so vertical deblocking performs 43*288 = 12384 averaging operations.
There are 288/8-1 = 35 horizontal boundaries, each 352 points long, so horizontal deblocking performs 35*352 = 12320 averaging operations.
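The same arithmetic as a small C helper, useful for estimating the cost at other resolutions. The function names are made up, and the sketch assumes the dimensions are multiples of 8.

```c
#include <assert.h>

/* Filtered positions along the vertical 8x8 block boundaries. */
int vertical_edge_ops(int width, int height)
{
    assert(width % 8 == 0 && height > 0);
    return (width / 8 - 1) * height;
}

/* Filtered positions along the horizontal 8x8 block boundaries. */
int horizontal_edge_ops(int width, int height)
{
    assert(height % 8 == 0 && width > 0);
    return (height / 8 - 1) * width;
}
```

For a 352*288 luminance plane these return 12384 and 12320, for a total of 24704 averaging operations per frame.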
Each averaging operation does the following:
(1) Read 10 bytes from memory. In vertical mode the 10 bytes are contiguous; in horizontal mode consecutive bytes are one row width apart.
(2) Calculate eq_cnt.
(3) Find the maximum and minimum of V[1]-V[8].
(4) Calculate S[1]-S[8].
(5) Store S[1]-S[8] to V[1]-V[8].

In addition, these operations modify the image. An I or P frame also serves as the reference for subsequent frames, so its data must not be changed. The frame therefore has to be copied to a temporary buffer for processing, which also takes considerable time.
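A minimal sketch of that copy in C, assuming each plane is stored row by row with a stride that may be larger than the visible width. `copy_plane` is a hypothetical helper, not a function from Xvid.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy a width x height plane row by row so the deblock filter can
 * work on dst while src stays intact as the reference frame. */
void copy_plane(uint8_t *dst, int dst_stride,
                const uint8_t *src, int src_stride,
                int width, int height)
{
    assert(dst != NULL && src != NULL);
    assert(dst_stride >= width && src_stride >= width);
    for (int y = 0; y < height; y++) {
        memcpy(dst + (size_t)y * dst_stride,
               src + (size_t)y * src_stride,
               (size_t)width);
    }
}
```

The copy touches every byte of the frame once for reading and once for writing before any filtering starts, which is part of why memory traffic dominates the cost.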

Deblocking reads and writes each frame's data roughly twice, which makes it more time-consuming than YUV-to-RGB conversion. Measurements confirmed this.
Frustratingly, decoding had been optimized to 32 fps, and it dropped straight to 16 fps once deblock was added.

Optimization analysis:
(1) Reading and writing memory takes most of the time, but this part is hard to optimize.
(2) Calculating eq_cnt requires little computation and is not a concern.
(3) Find the maximum and minimum of V[1]-V[8].

max = max(V[1], max(V[2], max(V[3], max(V[4], max(V[5], max(V[6], max(V[7], V[8])))))));
min = min(V[1], min(V[2], min(V[3], min(V[4], min(V[5], min(V[6], min(V[7], V[8])))))));

Xvid provides two ways to calculate max and min:

#define min(x, y) ((x) < (y) ? (x) : (y))
#define max(x, y) ((x) > (y) ? (x) : (y))
#define fast_max(x, y) ((x) - ((((x) - (y)) >> (32 - 1)) & ((x) - (y))))
#define fast_min(x, y) ((x) + ((((y) - (x)) >> (32 - 1)) & ((y) - (x))))
The second method takes 4 instructions.
With assembly, only three instructions are needed:

MOV R1, R2 // R1 = max(R2, R3)
CMP R3, R1
MOVGT R1, R3
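The branchless max/min trick can be written as plain C functions and checked against ordinary comparisons. This sketch assumes that right-shifting a negative `int32_t` is an arithmetic shift (implementation-defined in C, but true on common compilers) and that the difference does not overflow, which always holds for 8-bit pixel values. `range_of_8` is a hypothetical helper for the max - min test used in DC offset mode.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless max: (d >> 31) is all ones when x < y, zero otherwise. */
int32_t fast_max(int32_t x, int32_t y)
{
    int32_t d = x - y;
    return x - ((d >> 31) & d);
}

/* Branchless min, using the same sign-mask idea. */
int32_t fast_min(int32_t x, int32_t y)
{
    int32_t d = y - x;
    return x + ((d >> 31) & d);
}

/* max - min over eight pixel values. */
int32_t range_of_8(const uint8_t v[8])
{
    int32_t hi = v[0], lo = v[0];
    assert(v != 0);
    for (int i = 1; i < 8; i++) {
        hi = fast_max(hi, v[i]);
        lo = fast_min(lo, v[i]);
    }
    return hi - lo;
}
```

Whether this beats a compare-and-select depends on the target: on ARM the conditional move shown above is already branch-free, so the bit trick mainly helps where the compiler would otherwise emit a branch.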

(4) Calculate S[1]-S[8]. The formulas contain repeated sub-expressions; compute the shared parts once and reuse them.
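One possible way to share the work, shown here as a sketch and not as what Xvid does: the taps {1, 1, 2, 2, 4, 2, 2, 1, 1} equal a 9-point box sum plus a 5-point box sum plus twice the centre sample, and both box sums can be updated incrementally from one output to the next. The input `q` is assumed to be the already padded sample row (the left padding value four times, then V[1]-V[8], then the right padding value four times); both function names are made up.

```c
#include <assert.h>

/* Direct form: nine multiply-adds per output sample. */
void smooth_direct(const int q[16], int out[8])
{
    static const int taps[9] = {1, 1, 2, 2, 4, 2, 2, 1, 1};
    assert(q != 0 && out != 0);
    for (int n = 0; n < 8; n++) {
        int sum = 8;                      /* rounding term */
        for (int k = 0; k < 9; k++)
            sum += taps[k] * q[n + k];
        out[n] = sum >> 4;
    }
}

/* Running-sum form: each step adds two samples and drops two. */
void smooth_running(const int q[16], int out[8])
{
    int box9 = 0, box5 = 0;
    assert(q != 0 && out != 0);
    for (int k = 0; k < 9; k++)
        box9 += q[k];                     /* window q[0..8] */
    for (int k = 2; k < 7; k++)
        box5 += q[k];                     /* window q[2..6] */
    for (int n = 0; n < 8; n++) {
        out[n] = (box9 + box5 + 2 * q[n + 4] + 8) >> 4;
        if (n < 7) {                      /* slide both windows by one */
            box9 += q[n + 9] - q[n];
            box5 += q[n + 7] - q[n + 2];
        }
    }
}
```

Both functions produce identical results; the second replaces most of the per-output additions and shifts with four additions to slide the windows. As the next paragraph notes, this still does not address memory access, which is the larger cost.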

None of these optimizations touches the truly expensive part, which is memory access. Depending on the image quality you need, you can reduce the number of points processed: the standard reads 10 points and writes 8 for each averaging operation, and this can be reduced to reading 8 points and writing 6.

 

 
