Highlights of XviD Technology

Last Update:2018-12-06 Source: Internet

Author: User


Quarter Pel = quarter pixel = 1/4 pixel = qpel:
In MPEG compression, a P-frame is compressed with reference to the previous picture, and a B-frame is compressed with reference to the previous picture, the following picture, or both.
In both cases the encoder only needs to record the difference from the reference picture, that is, the prediction error, together with the direction and distance the object has moved (the motion vector, MV).
The whole picture does not have to be compressed again, which saves a lot of bits and gives a high compression ratio.
B-frames have the highest compression efficiency because they can refer to the previous and the following picture at the same time.
The average of the two pictures, (previous + following)/2, can be used as the reference picture, which greatly reduces the prediction error.
(The smaller the prediction error, the fewer bits are needed to record it, the smaller the file, and the higher the compression efficiency.)
In addition, the MPEG-4 B-VOP has a fourth prediction mode, called direct mode, which takes the MV of the following P-frame and divides it by two to obtain the B-VOP's motion vector. No MV has to be recorded, which saves space and improves compression efficiency further.
For example:
I B P
We can predict that an object in B lies somewhere between its positions in I and P, so the MV of B will be close to half of the MV of P.
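The averaging of two reference pictures and the direct-mode vector described above can be sketched in a few lines of Python. This is only an illustration: the function names and the one-dimensional sample rows are made up for the example, and the halving assumes the simple I B P case where B sits exactly midway.

```python
def prediction_error(current, reference):
    """Sum of absolute differences between a picture and its prediction."""
    return sum(abs(c - r) for c, r in zip(current, reference))


def average_reference(previous, following):
    """Bidirectional reference: rounded per-pixel mean of the two pictures."""
    return [(p + f + 1) // 2 for p, f in zip(previous, following)]


def direct_mode_mv(p_frame_mv):
    """Direct mode for I B P: half of the following P-frame's vector (assumed midpoint)."""
    dx, dy = p_frame_mv
    return (dx / 2, dy / 2)


# Hypothetical rows of pixel values from three consecutive pictures.
previous = [10, 20, 30, 40]
current = [12, 22, 32, 42]
following = [14, 24, 34, 44]

forward_error = prediction_error(current, previous)
bidirectional_error = prediction_error(current, average_reference(previous, following))
b_vector = direct_mode_mv((8, -4))
```

With these sample values the forward prediction leaves an error of 8, while the averaged reference matches the current row exactly, which is the effect the text attributes to B-frames.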
That is the general compression principle. Now back to what qpel is.
As mentioned above, for each block in a P- or B-frame the encoder looks for the closest matching block in the reference picture and records the error relative to that block, together with its distance and direction (the MV).
MPEG compression works in units of 16x16 blocks, called macroblocks (MB). The encoder goes through the picture one MB at a time and searches for the reference block with the best match and the smallest error.
(That is, it searches for where the object has moved.)
The search is limited to a certain range, for example a 32x32 area around the block, and is not expanded without limit.
So when the picture is very dynamic and an object moves beyond the search range, or when the picture changes too much, no reference block with a small error can be found. The compression ratio then drops and a large number of bits is needed to record the error.
Obviously, the movement of an object has nothing to do with the pixel grid. Objects do not move one grid cell at a time, landing exactly on an integer pixel position every time.
Therefore, if we search and compare only at integer pixel positions, we will often fail to find the best-matching reference block with the smallest error.
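The integer-pixel search described above can be sketched as an exhaustive block match. This is a simplified illustration, not XviD's actual search: it uses the sum of absolute differences (SAD) as the error measure, a 4x4 block instead of a 16x16 macroblock to keep the data small, and function names invented for the example.

```python
def sad(current, reference, bx, by, dx, dy, size):
    """SAD between the block at (bx, by) in `current` and the block
    displaced by (dx, dy) in `reference`."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(current[by + y][bx + x]
                         - reference[by + dy + y][bx + dx + x])
    return total


def full_search(current, reference, bx, by, size, search_range):
    """Try every integer displacement within +/- search_range and
    return (best_mv, best_cost)."""
    height, width = len(reference), len(reference[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            # Skip candidates that fall outside the reference picture.
            if x0 < 0 or y0 < 0 or x0 + size > width or y0 + size > height:
                continue
            cost = sad(current, reference, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Hypothetical 12x12 reference picture; the current picture is the same
# content moved 2 pixels right and 1 pixel down.
reference = [[7 * x + 13 * y for x in range(12)] for y in range(12)]
current = [[reference[y - 1][x - 2] if y >= 1 and x >= 2 else 0
            for x in range(12)] for y in range(12)]

wide = full_search(current, reference, 4, 4, 4, 3)    # range covers the motion
narrow = full_search(current, reference, 4, 4, 4, 1)  # range is too small
```

With a search range of 3 the exact match at (-2, -1) is found with zero error. With a range of 1 the motion lies outside the window, so the best candidate still leaves a large error, which is the situation the text describes for fast-moving scenes.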
To overcome this problem, MPEG-2 compression first interpolates the reference picture, computing sub-pixel values between neighboring pixels, for example:
A x B
x x x
C x D
If the value of pixel A is 11 and the value of pixel B is 13, we can predict that the value of the sub-pixel x between A and B is 12.
Once all the x values, that is, the 1/2-pixel values, have been computed, the reference-block search can be carried out on the reference picture with 1/2-pixel accuracy.
In this way we find reference blocks with smaller errors, which gives a higher compression ratio: smaller files at the same quality, or higher quality at the same file size.
According to tests, ME (motion estimation) with 1/2-pixel precision can raise the PSNR (peak signal-to-noise ratio, a common objective measure of picture quality) by 3 to 5 dB.
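The half-pixel interpolation above amounts to averaging neighboring pixels. The following sketch builds the 1/2-pel grid with plain bilinear averaging and round-to-nearest. The function name is made up for the example, and the exact rounding rules of the MPEG standards are not modeled.

```python
def half_pel_grid(picture):
    """Return a (2H-1) x (2W-1) grid: original pixels at even positions,
    bilinear half-pixel samples in between."""
    h, w = len(picture), len(picture[0])
    grid = [[0] * (2 * w - 1) for _ in range(2 * h - 1)]
    for y in range(2 * h - 1):
        for x in range(2 * w - 1):
            y0, x0 = y // 2, x // 2
            y1, x1 = y0 + (y % 2), x0 + (x % 2)  # neighbor, or same pixel
            total = (picture[y0][x0] + picture[y0][x1]
                     + picture[y1][x0] + picture[y1][x1])
            grid[y][x] = (total + 2) // 4        # round to nearest
    return grid


# A = 11, B = 13 as in the text; C = 15 and D = 17 are extra sample values.
grid = half_pel_grid([[11, 13],
                      [15, 17]])
```

The sample between A and B comes out as 12, matching the example in the text, and the center sample is the average of all four pixels.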
Quarter pixel = 1/4 pel, used by MPEG-4, is more accurate than 1/2 pel: further values are interpolated between the 1/2-pixel positions to obtain 1/4-pel positions. Theoretically, 1/4 pel can raise the PSNR by a further 2 to 3 dB.
A o x o B
o o o o o
x o x o x
o o o o o
C o x o D
x: 1/2 pel, o: 1/4 pel
However, if 1/4 pel is poorly implemented, it fails to find better reference blocks. And because a 1/4-pel motion vector has to be recorded with twice the precision of a 1/2-pel one (for example, 1.5 --> 1.25), the files become larger instead. (Quality is worse at the same file size.)
In the early days, XviD's qpel was not implemented well and contained some errors, so it did not help compression efficiency, and files came out larger with it enabled.
Now XviD's qpel has been corrected. It fully complies with the MPEG-4 standard and delivers the compression gain that theory predicts.
You can test this yourself by encoding at the same quality (fixed quantizer). With qpel enabled, the file size is reduced by about 3%.
This shows that encoding with qpel is more efficient than encoding without it.
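The remark about vector precision can be made concrete by storing each displacement as an integer count of sub-pixel steps. This is a sketch under that assumption (the helper name is invented), not the bitstream encoding MPEG-4 actually uses.

```python
def mv_to_units(displacement, steps_per_pixel):
    """Express a displacement in pixels as an integer number of
    1/steps_per_pixel steps (2 for half-pel, 4 for quarter-pel)."""
    units = displacement * steps_per_pixel
    if units != int(units):
        raise ValueError("displacement not representable at this precision")
    return int(units)


half_pel = mv_to_units(1.5, 2)      # 1.5 pixels in half-pel steps
quarter_pel = mv_to_units(1.25, 4)  # 1.25 pixels in quarter-pel steps
```

A displacement of 1.25 pixels cannot be written in half-pel steps at all, and for the same range of motion the quarter-pel values are twice as large, so they take more bits to record. That cost only pays off when the finer vectors reduce the prediction error enough.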
With the basics of MPEG compression covered, here are several terms mentioned last time:
Chroma ME
ME = Motion Estimation
The process of searching the reference picture for the closest block and finding the distance and direction between the two, that is, the motion vector (MV), is called ME.
MC = Motion Compensation
The reference block is subtracted from the block to be compressed and the error between them is recorded, so that the decoder can add this error back to the reference block when decompressing. This process is called MC.
In MPEG compression, pixels are split into three planes: Y, U and V. Normally ME is performed only on the Y (luminance) plane, searching for the MV with the smallest Y error.
The motion vector for UV (color, chroma) is obtained by dividing the motion vector found for Y by two.
(MPEG stores YUV with the UV resolution at only half that of Y in each dimension. For example, if Y is 640x480, UV is only 320x240. So dividing the Y vector by two gives an approximate value.)
This is because the human eye is sensitive to luminance (Y) and not very sensitive to color (C), so the eye hardly notices the reduced color resolution. Reducing the space taken by C and spending a little more on Y improves the visual quality at a limited bit rate.
However, taking this shortcut in ME, searching accurately only for the Y MV and simply using the Y MV divided by two for C, speeds up compression but also lowers quality. (For C, the reference block with the smallest error has not actually been found, so more bits are needed to record the error and the compression ratio drops.)
XviD now has a chroma ME option. It searches for the MV with the smallest error on the Y and C planes at the same time. It is slower, but the quality is better.
The effect is most noticeable on animation.
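The difference between luma-only ME and chroma ME comes down to which error the search minimizes. The sketch below assumes the combined cost is simply the Y error plus the C error. The candidate list and its error values are hypothetical, and XviD's real cost function may weight things differently.

```python
def chroma_mv_from_luma(luma_mv):
    """Chroma planes are half-size in each dimension, so the vector is halved."""
    dx, dy = luma_mv
    return (dx / 2, dy / 2)


def pick_mv(candidates, use_chroma):
    """Pick the candidate with the smallest error, optionally including chroma."""
    def cost(candidate):
        extra = candidate["c_error"] if use_chroma else 0
        return candidate["y_error"] + extra
    return min(candidates, key=cost)["mv"]


# Hypothetical candidates: the first is slightly better on Y alone,
# the second is much better once the chroma error is counted.
candidates = [
    {"mv": (4, 2), "y_error": 100, "c_error": 90},
    {"mv": (6, 2), "y_error": 110, "c_error": 20},
]

luma_only_choice = pick_mv(candidates, use_chroma=False)
chroma_me_choice = pick_mv(candidates, use_chroma=True)
chroma_vector = chroma_mv_from_luma(luma_only_choice)
```

Evaluating the chroma error for every candidate is what makes the option slower. The payoff is largest where color carries much of the picture's structure, which fits the remark about animation.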
Why should the number of consecutive B-frames be reduced when there is a lot of motion? Don't B-frames have the highest compression ratio, so the more the better?
MPEG-1 defines three types of frame:
I-frame: compressed independently without reference to other pictures. It has the lowest compression ratio, needs the most bits, and is the largest frame.
P-frame: compressed with reference to the previous I- or P-frame. Its compression ratio lies between those of I and B.
B-frame: compressed with reference to the I- or P-frames before and after it. It has the highest compression ratio. A B-frame cannot be used as a reference picture by other frames.
A B-frame (the correct MPEG-4 name is B-VOP) has four prediction modes:
A. Forward prediction: refers to the previous picture and records the difference from it. This works the same way as P-frame prediction.
B. Backward prediction: refers to the following picture and records the difference from it.
C. Bi-directional prediction: refers to both the previous and the following picture and records the difference from the average of the two. It is also called interpolated prediction and has the highest compression ratio.
D. Direct mode: no motion vector is searched for or recorded. The vector is derived directly from the following P-frame. For example, in I B P we can predict that the motion in B lies between I and P, so half of P's MV is used directly as B's motion vector, which saves the space needed to record an MV.
When a B-frame is compressed, whichever of these prediction modes gives the smallest error is selected.
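The mode decision described above can be sketched as "compute the error of each candidate prediction and keep the smallest". The example below covers only forward, backward and bi-directional prediction with zero motion. Direct mode is left out because it needs motion vectors. All names and sample values are made up for illustration.

```python
def sad(a, b):
    """Sum of absolute differences between two rows of pixels."""
    return sum(abs(x - y) for x, y in zip(a, b))


def choose_b_mode(current, previous, following):
    """Return (best_mode, costs) for a B-frame row, assuming zero motion."""
    predictions = {
        "forward": previous,
        "backward": following,
        "bidirectional": [(p + f + 1) // 2 for p, f in zip(previous, following)],
    }
    costs = {mode: sad(current, pred) for mode, pred in predictions.items()}
    return min(costs, key=costs.get), costs


# Smooth motion: the current row lies halfway between its neighbors.
smooth_mode, smooth_costs = choose_b_mode(
    [12, 22, 32, 42], [10, 20, 30, 40], [14, 24, 34, 44])

# High motion: the following picture looks nothing like the current one.
busy_mode, busy_costs = choose_b_mode(
    [11, 21, 31, 41], [10, 20, 30, 40], [200, 10, 90, 5])
```

In the smooth case the bi-directional average wins with zero error. In the high-motion case the encoder falls back to forward prediction, so the B-frame behaves like a P-frame and its usual advantage is lost.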
When the maximum number of B-frames is set too high and the encoder's frame-type decision is not good enough, it misjudges and inserts too many B-frames in high-motion scenes.
For example, in a high-motion scene:
I B B B B P
P is too far from the I-frame it references, so the error is too large and its size surges. For the first B-frame, the gap between I and P is so large that the average (I + P)/2 differs greatly from the B-frame, and it is better to predict from the I-frame alone. That B-frame then references only the preceding I, which makes it equivalent to a P-frame.
The B-frames in the middle are compressed with reference to (I + P)/2, but because of the large difference they still cannot reach a good compression ratio.
The last B-frame references only the P-frame.
In the end, none of the four B-frames achieves a good compression ratio, and each is almost as large as a P-frame.
In this case, using
0 1 2 3 4 5
I P P P P P
gives a better compression ratio instead.
(Because P1 references I0, the difference is small. P2 references P1, and the difference is again small. And so on.)
XviD's dynamic frame-type decision is now much better than before, and the maximum number of B-frames can safely be set to 4.
In DivX 5, the maximum number of consecutive B-frames is only 1, so the sequence can only be I B P B ..., and it has nothing like the advanced I/P/B frame-type decision XviD currently uses. In this respect DivX 5 is no match for XviD.
Finally, a topic that raises many questions about XviD: quantization.
In MPEG compression, ME is performed on macroblocks of 16x16 pixels. Each macroblock is then split into four 8x8 blocks, which undergo a transform called the DCT.
After the DCT, the 64 pixel values in an 8x8 block become coefficients that represent spatial frequencies.
The human eye is not sensitive to high frequencies, while the low frequencies matter more. Quantization is therefore used to discard more of the high frequencies and keep the important low-frequency coefficients, which improves perceived quality at a limited bit rate.
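To show what "coefficients that represent spatial frequencies" means, here is a minimal one-dimensional DCT-II in Python. The 8x8 transform used by MPEG applies the same operation along rows and then columns. This floating-point version is a textbook sketch, not the fast integer transform a real codec uses.

```python
import math


def dct_1d(samples):
    """Orthonormal DCT-II: coefficient k measures how much of the
    k-th cosine frequency is present in the samples."""
    n = len(samples)
    coefficients = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(s * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, s in enumerate(samples))
        coefficients.append(scale * total)
    return coefficients


# A perfectly flat row of 8 pixels contains no detail at all.
flat = dct_1d([100] * 8)
```

For the flat row, only the first (lowest-frequency) coefficient is non-zero and the other seven are zero up to rounding error. Smooth picture areas therefore concentrate their energy in a few low-frequency coefficients, which is what makes discarding the high ones cheap.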
XviD can use two different quantization methods (quantization types): H.263 and MPEG.
The H.263 quantization method, as the name suggests, is the one used in the H.263 compression standard. All DCT coefficients in an 8x8 block are divided by the same number. (This operation is called quantization.)
For example, if all values are divided by 32, a DCT coefficient of 15, being less than 32, is quantized to 0 after the division, which saves many bits.
Of course, the larger the divisor, the larger the quantization error and the poorer the quality, but also the higher the compression ratio and the smaller the resulting file.
A further parameter adjusts the quantization error and thus controls the final quality and file size. This parameter is called the quantizer.
The quantization coefficient is multiplied by the quantizer. For example, if the original quantization coefficient is 32 and the quantizer is 2, the coefficients are divided by 32*2 = 64.
Therefore, the larger the quantizer, the larger the divisor, the larger the quantization error, and the poorer the quality, but the smaller the file.
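The uniform quantization just described can be written out as follows. The sketch follows the simplified model in the text (one base divisor for every coefficient, multiplied by the quantizer) rather than the exact formulas of the H.263 standard. The function names and coefficient values are made up for illustration.

```python
def quantize_uniform(coefficients, base_divisor, quantizer):
    """Divide every coefficient by the same number, truncating toward zero."""
    divisor = base_divisor * quantizer
    return [int(c / divisor) for c in coefficients]


def dequantize_uniform(levels, base_divisor, quantizer):
    """Decoder side: multiply back. The truncated part is lost for good."""
    divisor = base_divisor * quantizer
    return [level * divisor for level in levels]


# Hypothetical DCT coefficients, including the value 15 from the text.
coefficients = [640, -130, 70, 15]

levels_q1 = quantize_uniform(coefficients, 32, 1)  # divide by 32
levels_q2 = quantize_uniform(coefficients, 32, 2)  # divide by 64
restored_q2 = dequantize_uniform(levels_q2, 32, 2)
```

The coefficient 15 becomes 0 in both cases. Doubling the quantizer halves the remaining levels, which take fewer bits to store but reconstruct less accurately (70 comes back as 64).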
The H.263 quantization method also stipulates that the quantizers of two adjacent macroblocks may not differ by more than 2.
The other method, MPEG quantization, allows the high- and low-frequency coefficients to be divided by different quantization coefficients, so that more of the high frequencies can be removed as needed. The 8x8 set of quantization coefficients is called the quantization matrix.
XviD also lets you customize and edit the coefficients of this matrix, so you can design the most suitable quantization matrix for the video content and bit rate.
(Select MPEG-custom as the quantization type, then change the default quantization matrix under edit quantizer matrix. Currently this feature cannot be used together with B-frames.)
The MPEG quantization method places no limit on the quantizer difference between adjacent MBs.
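Matrix-based quantization differs from the uniform method only in that each coefficient position has its own divisor. The sketch below uses a tiny 2x2 block and matrix so the numbers stay readable. A real quantization matrix is 8x8, and both the values and the function name here are hypothetical.

```python
def quantize_with_matrix(block, matrix, quantizer):
    """Divide each coefficient by its own matrix entry times the quantizer,
    truncating toward zero."""
    return [[int(c / (m * quantizer)) for c, m in zip(block_row, matrix_row)]
            for block_row, matrix_row in zip(block, matrix)]


# Top-left is the lowest frequency, bottom-right the highest.
block = [[640, 70],
         [70, 15]]
matrix = [[8, 16],
          [16, 32]]   # larger divisors for higher frequencies

levels_q1 = quantize_with_matrix(block, matrix, 1)
levels_q2 = quantize_with_matrix(block, matrix, 2)
```

The low-frequency coefficient is divided by only 8 and keeps most of its precision, while the high-frequency one is divided by 32 and drops to zero. Editing the matrix changes this balance per frequency, which the single divisor of the uniform method cannot do.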
Based on experience, the H.263 method, which uses a uniform quantization matrix (uniform quantization), gives a blurrier picture.
The MPEG method gives a sharper picture. (However, some noise may appear around sharp lines and the edges of objects.)
MS MPEG-4, that is, DivX 3.11, uses the MPEG quantization method, which is why it has always been regarded as giving a sharp picture that retains more detail.
DivX 4 and DivX 5 use the H.263 quantization method, and the picture, especially with DivX 4, is very blurry. Although it shows fewer visible defects on the surface, the details are all wiped out.
(See the pictures provided by net1999 above.)
(DivX 5 can in fact be switched to the MPEG quantization method by editing the registry, but there is obviously a bug.)
XviD lets users decide which quantization method to use, or switch methods dynamically as needed.
(With Modulated quantization, MPEG quantization is used when the quantizer is less than or equal to 3, and H.263 quantization is used when it is greater than 3. The new Modulated HQ does the reverse.)
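The Modulated rule in the note above is a simple threshold on the quantizer. A sketch, with a function name and an `hq` flag invented for the example (this is not XviD's API):

```python
def modulated_quant_type(quantizer, hq=False):
    """Modulated: MPEG for quantizer <= 3, H.263 above that.
    Modulated HQ (hq=True) swaps the two."""
    use_mpeg = quantizer <= 3
    if hq:
        use_mpeg = not use_mpeg
    return "MPEG" if use_mpeg else "H.263"


choices = [modulated_quant_type(q) for q in (2, 3, 4, 8)]
choices_hq = [modulated_quant_type(q, hq=True) for q in (2, 3, 4, 8)]
```

In plain Modulated mode the sharper MPEG method is used at low quantizers, where there are enough bits to keep the detail, and the smoother H.263 method at high quantizers.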
GMC, that is, S(GMC)-VOP, is only used when most of the blocks in the whole picture move in the same direction. For example, S(GMC)-VOP is used when the camera pans (moves from left to right or from right to left), when the whole picture moves from top to bottom or from bottom to top, or when it zooms in or out (objects grow or shrink).
(There are other capabilities as well, such as warping and rotation, but currently neither DivX nor XviD implements them fully.)
When GMC is used, the frame uses a frame type that exists only in MPEG-4, called S-VOP.
(Because MPEG-4 compresses in units of objects, frames are called Video Object Planes, VOPs. There are I-VOP, P-VOP and B-VOP, plus the special S-VOP.)
To distinguish it from the sprite S-VOP, we call it S(GMC)-VOP.
Therefore, to compare the effect of GMC, you must compare the same picture, and it must be an S(GMC)-VOP (that is, a VOP that uses GMC), to see what GMC does.
Currently XviD's GMC has only basic functionality. Using global MC is not more efficient than the original local MC, so it does not help compression. Files actually come out larger with it (compression efficiency is worse, and quality is worse at the same file size).
There are also some correctness problems still to be fixed (the output must comply with the ISO MPEG-4 standard, otherwise it is wrong and cannot be decoded by other standard MPEG-4 decoders), so it is not recommended.
The XviD programmers are among the top programming experts in the world. It is not that they are unaware of the codec's current problems. They are still working out solutions.
For example, B-frames currently require DX50 B-VOP compatibility to be checked. The developers do know about this closed GOV (the equivalent of the MPEG-1/2 closed GOP) problem, but it is very difficult to solve.
(We may think it is very simple. True, it is very simple in theory, and that is exactly what shows how hard it is to implement.)
In addition, the developers are busy with their own studies and jobs and can only spend a little spare time on this programming work, so these problems cannot be fixed overnight.
Even so, XviD is still one of the best-quality MPEG-4 encoders in the world...
(I can't say it too absolutely. I need to leave a little room for improvement.)

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
