- CPU MLAA
- GPU MLAA
- Jimenez's MLAA
Original article: http://www.cnblogs.com/gongminmin/archive/2011/05/16/2047506.html
Anti-aliasing (AA) is widely used in graphics to improve rendering quality. After decades of development, AA has gradually spread from offline rendering into real-time rendering. This series of articles summarizes the past and present of the AA methods used in real-time rendering. This article focuses on the AA methods provided by hardware.
Figure 1. Sample points within a pixel. The 16 red circles represent 16 sample points; blue and yellow represent the two triangles that cover this pixel.
Super-sampling anti-aliasing (SSAA) is the most intuitive AA method. One implementation is to render a larger image and then downsample it, which is equivalent to sampling on a uniform grid within each final pixel. More generally, each pixel contains multiple sample points (distributed uniformly, by a Poisson distribution, randomly, with jitter, and so on), each sample point has its own color and depth, and the pixel shader runs once at each sample point. In the case of Figure 1, this yields one white, one light blue, and 14 yellow samples. The final value of the pixel is the average of the 16 samples: ((1, 1, 1) + (0.77, 0.77, 1) + 14 * (1, 1, 0)) / 16 ≈ (0.986, 0.986, 0.125). Of the methods discussed here, SSAA gives the best quality; after all, it is the most brute-force one. On D3D 10.1 and later, you can choose whether the pixel shader executes per sample or per pixel, so SSAA is supported directly.
Performance statistics (N samples, likewise below): the pixel shader (PS) executes N times per pixel, and the storage required is N colors + N depths.
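The averaging in the Figure 1 example can be checked with a few lines of Python. This is a minimal sketch that assumes a plain box filter over the 16 samples; `resolve_box` and `figure1_samples` are names chosen here for illustration, not part of any graphics API.

```python
def resolve_box(samples):
    """Average a list of (r, g, b) sample colors with equal weights."""
    n = len(samples)
    return tuple(sum(s[c] for s in samples) / n for c in range(3))


# Figure 1: one white sample, one light blue sample, 14 yellow samples.
figure1_samples = [(1.0, 1.0, 1.0), (0.77, 0.77, 1.0)] + [(1.0, 1.0, 0.0)] * 14
ssaa_pixel = resolve_box(figure1_samples)

# SSAA cost model from the text: N shader runs and N colors + N depths per pixel.
shader_runs = len(figure1_samples)
print([round(c, 3) for c in ssaa_pixel], shader_runs)
```

The red and green channels come out as 15.77 / 16 and the blue channel as 2 / 16, which matches the value quoted in the text up to rounding.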
SSAA has to execute the PS once at every sample point and store a color and a depth for each, so both the time and the space overhead are enormous. Multi-sampling anti-aliasing (MSAA) improves on this considerably. MSAA executes the PS only once per pixel, and the output color is written to every sample that passes the depth-stencil test. Before Shader Model 3, PS inputs had to be taken at the center of the pixel. With the addition of centroid interpolation, the input attributes of the PS can instead be taken at the center of all the sample points covered by the triangle:
In the case of Figure 1, without centroid interpolation the two samples of the blue triangle get pure blue (the color of the third pixel), and the final pixel color is (2 * (0, 0, 1) + 14 * (1, 1, 0)) / 16 = (0.875, 0.875, 0.125). With centroid interpolation they get a more accurate light blue, ((1, 1, 1) + (0.77, 0.77, 1)) / 2 = (0.885, 0.885, 1), and the final pixel color is (2 * (0.885, 0.885, 1) + 14 * (1, 1, 0)) / 16 ≈ (0.986, 0.986, 0.125), as shown in the figure:
Performance statistics: the PS executes once per pixel for each triangle covering that pixel, and the storage required is N colors + N depths.
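The two MSAA results for Figure 1 can be reproduced with a small sketch. It assumes, as the example in the text does, that the centroid color is simply the mean of the colors at the two covered samples; `resolve_msaa` is a hypothetical helper, not a hardware or API function.

```python
def resolve_msaa(fragments):
    """fragments: list of (color, covered_sample_count), one entry per triangle.

    Each triangle is shaded once and its color is replicated to the samples
    it covers; the resolve then averages over all samples.
    """
    total = sum(count for _, count in fragments)
    return tuple(
        sum(color[c] * count for color, count in fragments) / total
        for c in range(3)
    )


yellow = (1.0, 1.0, 0.0)

# Without centroid interpolation the text gives pure blue for the blue triangle.
center_blue = (0.0, 0.0, 1.0)
no_centroid = resolve_msaa([(center_blue, 2), (yellow, 14)])

# With centroid interpolation: mean of the colors at the two covered samples.
covered = [(1.0, 1.0, 1.0), (0.77, 0.77, 1.0)]
centroid_blue = tuple(sum(s[c] for s in covered) / len(covered) for c in range(3))
with_centroid = resolve_msaa([(centroid_blue, 2), (yellow, 14)])

print(no_centroid, [round(c, 3) for c in with_centroid])
```

Only two shader executions are needed here (one per triangle), compared with 16 for SSAA, while the storage is still 16 colors and 16 depths.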
Although MSAA solves the computation problem, its storage footprint is still large, especially at sample counts of 8 or more. NVIDIA added Coverage Sampling Anti-Aliasing (CSAA) to G80 and later GPUs. CSAA decouples the color/depth buffer from the coverage buffer, so a small amount of color/depth storage can deliver quality that previously required a high sample count. Take Figure 1 rendered with 16x CSAA as an example. The pixel is divided into four regions: top left, top right, bottom left, and bottom right. Each region has four coverage samples, which share one color and one depth. The result for Figure 1 is (0.25 * (0.885, 0.885, 1) + 0.25 * (0.885, 0.885, 1) + 3.5 * (1, 1, 0)) / 4 ≈ (0.986, 0.986, 0.125).
Performance statistics: the PS executes once per pixel for each triangle covering that pixel, and the storage required is M colors + M depths, where M is less than N.
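The CSAA example can be worked through in the same style. The sketch below follows the simplified model described in the text (four regions, four coverage samples each, with the blue triangle covering one coverage sample in two of the regions); that coverage layout is an assumption read from Figure 1, and `blend` is a hypothetical helper. It does not model how real hardware stores coverage.

```python
def blend(weighted_colors):
    """Weighted average of (color, weight) pairs."""
    total = sum(w for _, w in weighted_colors)
    return tuple(sum(c[i] * w for c, w in weighted_colors) / total for i in range(3))


light_blue = (0.885, 0.885, 1.0)
yellow = (1.0, 1.0, 0.0)

# Each region lists (color, number of coverage samples showing that color).
regions = [
    [(light_blue, 1), (yellow, 3)],  # blue covers 1 of 4 coverage samples
    [(light_blue, 1), (yellow, 3)],  # same in a second region
    [(yellow, 4)],                   # fully yellow
    [(yellow, 4)],                   # fully yellow
]

region_colors = [blend(r) for r in regions]
csaa_pixel = blend([(c, 1) for c in region_colors])
print([round(c, 3) for c in csaa_pixel])
```

The light blue contributes 0.25 + 0.25 and the yellow 0.75 + 0.75 + 1 + 1 = 3.5 out of a total weight of 4, so the result matches the 16-sample answer while storing only M = 4 colors and depths.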
These three methods are common AA methods directly supported by hardware. The next article will describe various post process-based AA methods.
In the previous article Anti-alias's past and present (I), we introduced the AA method supported by hardware. This article will focus on the emerging post process-based AA.
Although hardware supports SSAA, MSAA, and CSAA, their additional overhead should not be underestimated. On the one hand, they consume an enormous amount of storage. On non-desktop platforms in particular, memory is scarce, and enabling AA can exhaust it. If MRT and AA are used together, the memory overhead becomes astronomical. On the other hand, the "edges" they consider are primitive boundaries. They spend the computation whether or not an edge really needs AA, which wastes a great deal of work.
Chapter 9 of GPU Gems 2, Deferred Shading in S.T.A.L.K.E.R., promoted the concept of Deferred Shading in the game industry for the first time and also pointed out that a Deferred framework cannot use hardware MSAA. Although Deferred Lighting partially solves this problem, the cost of rendering the scene a second time is not small. More importantly, the introduction of the Deferred framework finally made people face the fact that MSAA wastes a great deal of time and space. As a result, post-process-based AA has flourished over the past few years and looks poised to replace hardware AA.
Edge AA is the method proposed in Deferred Shading in S.T.A.L.K.E.R. It performs edge detection based on how much the depth and normal differ between neighboring pixels. Each pixel gets a weight indicating how strongly it "looks like an edge":
Based on this weight, the colors of neighboring pixels in the back buffer are blended to obtain the AA effect. For details about the shader, see the link in the original article.
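The idea of a per-pixel edge weight followed by a neighbor blend can be sketched as follows. This is not the shader from the chapter: the four-neighbor pattern, the way depth and normal differences are combined, and the `depth_scale` / `normal_scale` parameters are all assumptions made for illustration.

```python
def edge_weight(depth, normal, x, y, depth_scale=1.0, normal_scale=1.0):
    """Edge weight in [0, 1] for interior pixel (x, y) from its four neighbors.

    depth[y][x] is a float; normal[y][x] is a unit (nx, ny, nz) tuple.
    """
    d0, n0 = depth[y][x], normal[y][x]
    depth_diff = 0.0
    normal_diff = 0.0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        d1, n1 = depth[y + dy][x + dx], normal[y + dy][x + dx]
        depth_diff += abs(d1 - d0)
        # 1 - dot(n0, n1) is 0 for identical normals and grows as they diverge.
        normal_diff += 1.0 - sum(a * b for a, b in zip(n0, n1))
    raw = (depth_scale * depth_diff + normal_scale * normal_diff) / 4.0
    return min(1.0, max(0.0, raw))


def edge_aa(color, depth, normal, x, y):
    """Blend a pixel toward the mean of its four neighbors by its edge weight."""
    w = edge_weight(depth, normal, x, y)
    neighbors = [color[y][x - 1], color[y][x + 1], color[y - 1][x], color[y + 1][x]]
    mean = tuple(sum(n[c] for n in neighbors) / 4.0 for c in range(3))
    return tuple((1.0 - w) * color[y][x][c] + w * mean[c] for c in range(3))
```

In a flat region the weight is 0 and the pixel is left untouched; across a large depth step the weight saturates at 1 and the pixel takes the neighbor average.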
In the GPU Gems 3 chapter Deferred Shading in Tabula Rasa, NCsoft makes some minor improvements to Edge AA: edge detection no longer depends on the image resolution and is more stable.
Directionally Adaptive Edge AA
Edge AA ushered in the post-process AA era, but its quality is not good enough to compete with hardware AA. AMD's HPG09 paper, A directionally adaptive edge anti-aliasing filter, improves on Edge AA. Instead of using isolated edge points to decide how to blend, it determines an isoline from the neighborhood of each edge point and then chooses the blend direction perpendicular to that isoline. As a result, an edge is blended along its orientation, which recovers more precise sub-pixel information. This method made its way into AMD's driver and is enabled automatically when adaptive AA is turned on.
Directionally adaptive edge AA opened the research direction of replacing points with lines, but computing isolines is expensive after all, and rendering performance drops noticeably once it is enabled. Morphological antialiasing (MLAA) made another effort in this direction. It does not compute isolines; instead it classifies edges into a few specific shapes, such as Z, U, and L, where Z and U can be decomposed into L.
Finally, a triangle is constructed from each L to determine the blend area. This avoids all the heavy computation and makes AA faster. In AMD's newer drivers, MLAA replaces directionally adaptive edge AA as the preferred choice for adaptive AA. The MLAA framework has spawned several different methods:
Intel's HPG09 paper, Morphological Antialiasing, is implemented on the CPU. The code uses very deep branching to determine the edge shape. It is optimized entirely for the CPU and is not suitable for GPU hardware or real-time rendering.
The SIGGRAPH 2010 poster Practical morphological antialiasing on the GPU uses a SAT to determine the edge shape, which requires log(width) + log(height) passes. Once an L is determined, a precomputed 512 × 512 R32F texture is queried, in which each texel holds the blend area for one specific L. In other words, the maximum edge length is 512 pixels. This method is clearly brute force; although it may be faster than reading back to the CPU, the overhead is still very large.
The article Practical Morphological Anti-Aliasing in GPU Pro 2 presents a more practical GPU MLAA method, called Jimenez's MLAA here to distinguish it. In this MLAA, Z and U do not need to be decomposed into simpler L shapes; a precomputed table is queried directly. Each pixel looks up the value it needs in the table based on its position within the shape. At 720p resolution, this method takes 3.79 ms on the Xbox 360 and 0.44 ms on a GeForce 9800 GTX+. 8x MSAA requires 5 ms under the same conditions.
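The blend area that MLAA derives from an L shape, and that the GPU variants store in a precomputed table, can be sketched with simple trapezoid areas. The sketch assumes that the short side of the L is one pixel tall and that the reconstructed edge runs from its midpoint (height 0.5) to the far end of the long side (height 0); `l_shape_blend_weights` and `blend` are illustrative names, not code from any of the cited implementations.

```python
def l_shape_blend_weights(length):
    """Blend weights for the `length` pixels along the long side of an L shape.

    The reconstructed edge falls linearly from height 0.5 at the corner of the
    L to height 0 at the far end. Each weight is the trapezoid area under that
    line inside one pixel (pixel width is 1).
    """
    weights = []
    for i in range(length):
        left = 0.5 * (1.0 - i / length)          # line height at the pixel's near side
        right = 0.5 * (1.0 - (i + 1) / length)   # line height at the pixel's far side
        weights.append(0.5 * (left + right))
    return weights


def blend(pixel, neighbor, weight):
    """Mix `weight` of the color across the edge into the pixel color."""
    return tuple((1.0 - weight) * p + weight * n for p, n in zip(pixel, neighbor))


weights = l_shape_blend_weights(4)
print(weights)
```

The weights always sum to the area of the triangle, 0.25 × length, which is a quick sanity check for any precomputed area table built this way.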
NVIDIA provides a method called Fast Approximate Anti-Aliasing (FXAA) in Graphics SDK 11. It is very similar to MLAA, but it recognizes only long edges rather than shapes. Given a long edge, the sub-pixel coverage in each pixel can be estimated from where the edge intersects the pixel, and the AA blend is performed accordingly. Later, Timothy Lottes developed FXAA II, which trades some quality for more speed. On the Xbox 360 it takes 2.0 ms at 720p.
Another AA method published at GDC11 is Directionally Localized Anti-Aliasing (DLAA); the Jimenez's MLAA and FXAA described earlier were also made public around GDC11. This method takes a different route: it performs horizontal edge detection on a vertically blurred image and then blends the result to obtain the anti-aliased image.
Left: before AA. Right: after DLAA.
DLAA is also fairly fast: on the Xbox 360, 720p takes 2.2 ms.
This article introduces several AA methods based on post process. The next article will discuss how to combine hardware AA and post process AA.
The previous article introduced several post process-based AA methods. Is it possible to combine post process AA with hardware AA? This article is about hybrid AA.
First, the comparison figures below show how much computation MSAA wastes:
Edges computed by MSAA
Edges that actually need AA
This comparison should give an intuitive feel. MSAA wastes a large amount of computation on pixels that do not actually need AA, and the higher the sample count, the more serious the waste.
All the post-process-based methods mentioned in the previous article are really doing one thing: using pixel-level information to estimate sub-pixel-level geometry and then performing AA. Edge AA estimates with isolated points, MLAA estimates with L shapes, and FXAA and DLAA estimate with line segments. Hybrid AA asks: why "estimate" at all, instead of simply storing the sub-pixel geometry in the first place?
Subpixel Reconstruction Anti-Aliasing (SRAA) is a new approach published by NVIDIA researchers at I3D 2011. It exploits the fact that shading generally varies at a lower frequency than geometry, so shading can be done at a lower resolution while geometry is reconstructed at a higher resolution. The basic SRAA process, within a Deferred Shading framework, is to render a high-resolution G-Buffer (or a G-Buffer with MSAA) but perform shading only at normal resolution (or without MSAA). The shaded result then uses the G-Buffer to reconstruct sub-pixel information, and AA is computed in a manner similar to MLAA. The method combines MSAA and MLAA, yet it can reach the quality of a higher sample count without increasing the shading workload. Because of how it works, SRAA can only be used in a Deferred framework (though this is not a big problem for modern games).
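The reconstruction step can be illustrated with a toy version that uses only depth and gray values. It is a rough sketch of the principle (geometry stored per subsample, shading stored sparsely, and each subsample borrowing the shading whose geometry matches best), not the filter from the SRAA paper; the Gaussian depth weight and the `sharpness` parameter are assumptions.

```python
import math


def sraa_resolve(sub_depths, shaded, sharpness=10.0):
    """Reconstruct one pixel from geometry subsamples and sparse shading.

    sub_depths: depths of the pixel's geometry subsamples (high resolution).
    shaded: list of (depth, gray_value) pairs shaded at normal resolution nearby.
    """
    total = 0.0
    for g in sub_depths:
        # Weight each shaded value by how closely its depth matches the subsample.
        weights = [math.exp(-sharpness * (g - d) ** 2) for d, _ in shaded]
        norm = sum(weights)
        if norm == 0.0:
            # No shaded sample resembles this subsample; fall back to a plain mean.
            weights, norm = [1.0] * len(shaded), float(len(shaded))
        total += sum(w * v for w, (_, v) in zip(weights, shaded)) / norm
    return total / len(sub_depths)


# Three subsamples on a near white surface, one on a far black surface.
pixel = sraa_resolve([1.0, 1.0, 1.0, 10.0], [(1.0, 1.0), (10.0, 0.0)])
print(pixel)
```

Only two shaded values are needed to produce a four-sample edge gradient, which is the saving the method is after.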
Geometric Post-process Anti-Aliasing (GPAA) is an independent AA method proposed by Humus. The basic idea is to render the geometry a second time in wireframe mode, which yields the coverage of each triangle in each pixel:
With this coverage, AA is easy to compute. The results compare as follows:
The cost of this method is an extra wireframe pass over the geometry, but it can be used in both Forward and Deferred frameworks. Unfortunately, Humus said on Twitter that this method had actually been patented by someone else in 1996.
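How a coverage value turns into a blend can be sketched for the simplest case, an axis-aligned edge. The sketch assumes that the pass over the edges provides the distance from the pixel center to the edge, measured toward the neighboring pixel; `gpaa_coverage` and `gpaa_blend` are hypothetical names, and this is not Humus's shader.

```python
def gpaa_coverage(distance):
    """Fraction of the pixel lying beyond an edge that is `distance` pixels
    away from the pixel center (0 = through the center, 0.5 = on the border)."""
    return max(0.0, 0.5 - abs(distance))


def gpaa_blend(pixel, neighbor, distance):
    """Mix in the neighbor's color in proportion to the covered fraction."""
    w = gpaa_coverage(distance)
    return tuple((1.0 - w) * p + w * n for p, n in zip(pixel, neighbor))


print(gpaa_blend((1.0, 1.0, 1.0), (0.0, 0.0, 0.0), 0.25))
```

An edge through the pixel center gives a 50% blend, and an edge on the pixel border leaves the pixel unchanged, so the result varies smoothly as the edge moves.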
In the SIGGRAPH 2010 course Deferred Rendering for Current and Future Rendering Pipelines, Intel mentioned a very simple, brute-force AA method: compute per sample in edge regions and per pixel in non-edge regions. Like the post-process-based methods, this requires an edge detection pass whose result is marked in the stencil buffer, after which the two cases can be computed separately. In the figure, detected edges are marked with red lines:
The result of this method is the same as SSAA's, without MSAA's problem of redundant computation.
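The split between per-sample and per-pixel shading can be sketched as follows. The depth-range edge test and its `depth_threshold` are assumptions standing in for the edge detection pass, and a boolean takes the place of the stencil mark; `shade_pixel` is a hypothetical helper.

```python
def shade_pixel(samples, shade, depth_threshold=0.1):
    """Shade one pixel of a multisampled G-buffer.

    samples: list of dicts with at least a 'depth' key, one per sample.
    shade: function mapping one sample to a gray value.
    Returns (resolved_value, number_of_shader_invocations).
    """
    depths = [s["depth"] for s in samples]
    is_edge = max(depths) - min(depths) > depth_threshold
    if is_edge:
        values = [shade(s) for s in samples]   # per-sample shading on edges
    else:
        values = [shade(samples[0])]           # per-pixel shading elsewhere
    return sum(values) / len(values), len(values)


interior = [{"depth": 1.0, "albedo": 0.5} for _ in range(4)]
edge = [{"depth": 1.0, "albedo": 1.0}] * 3 + [{"depth": 5.0, "albedo": 0.0}]
print(shade_pixel(interior, lambda s: s["albedo"]))
print(shade_pixel(edge, lambda s: s["albedo"]))
```

Interior pixels cost one shader invocation, and only pixels flagged as edges pay the full per-sample cost.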
Finally, we introduce a very powerful yet simple anti-aliasing algorithm.
The quincunx (five-point) sampling pattern also requires extra samples, but it only needs one additional buffer of the same size, which reduces the space complexity. How are the samples taken in the quincunx pattern? See the following two figures:
That is all there is to it. The scene is rendered in 3D as usual. When rendering completes, the frame is written to the frame buffer and an additional copy is written to a second buffer. One of the two images is shifted along the horizontal and vertical axes of the view, and then the samples are combined and the frame is updated pixel by pixel. The anti-aliasing quality of this algorithm is close to that of anti-aliasing at four times the resolution.
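The five-point (quincunx) combination described above can be sketched on gray-value grids. The text does not give the offset or the filter weights; the half-pixel shift and the weights of 1/2 for the center and 1/8 for each corner are the values commonly associated with quincunx AA and are assumptions here, as is the helper name `quincunx_resolve`.

```python
def quincunx_resolve(center, corner):
    """Combine a frame with its shifted copy using a five-point pattern.

    center: H x W grid of gray values sampled at pixel centers.
    corner: (H+1) x (W+1) grid from the copy shifted by half a pixel, so that
            corner[y][x] .. corner[y+1][x+1] are the four corners of pixel (x, y).
    """
    height, width = len(center), len(center[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            corners = (corner[y][x] + corner[y][x + 1]
                       + corner[y + 1][x] + corner[y + 1][x + 1])
            row.append(0.5 * center[y][x] + 0.125 * corners)
        out.append(row)
    return out


# A single pixel whose top corners are white and bottom corners are black.
print(quincunx_resolve([[1.0]], [[1.0, 1.0], [0.0, 0.0]]))
```

Because corner samples are shared between neighboring pixels, the pattern uses five samples per pixel while storing only about two.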
Hybrid AA combines the advantages (or disadvantages) of hardware AA and post-process AA. The advantage is that the same quality can be achieved with less memory overhead and computation than hardware AA. The disadvantage is that the original rendering pipeline has to be modified.
This series has summarized three categories of spatial AA methods; I hope it is helpful to you.