By José María Méndez
Link: http://www.gamedev.net/reference/programming/features/simpleSSAO/
Introduction
Global Illumination (GI) is a term used in computer graphics to refer to all the lighting that results from surfaces interacting with each other (light bounces between them, is reflected, or is blocked): color bleeding, caustics, and shadows are examples. In many cases the term GI is used to refer only to ambient lighting.
Direct lighting, where light arrives straight from the light source, is very easy for today's hardware to compute. GI is not, because we need to gather information about the surrounding surfaces for every surface in the scene, and the complexity of this quickly gets out of control. However, there are some approximations to GI that are much easier to manage. When light travels and bounces around a scene, there are areas it cannot easily reach: corners, tight gaps between objects, creases, and so on. This makes those regions look darker than the rest of the scene.
This effect is called ambient occlusion (AO), and the usual way to simulate it is to test, for each surface, how much it is blocked by the surfaces around it. Computing this is much faster than full global illumination, but most existing AO algorithms still cannot run in real time.
Real-time AO was considered an unreachable goal until Screen Space Ambient Occlusion (SSAO) appeared. It was first used in Crytek's game "Crysis", and many other games have adopted the technique since. In this article I will explain a simple and concise approach that gives better results than the traditional SSAO method.
SSAO in Crysis
Preparations
Crytek's original implementation uses the depth buffer as its input and works roughly as follows: for each pixel in the depth buffer, sample a few points in 3D space around it, project them back to screen space, and compare the depth of each sample with the depth stored in the buffer at that position to determine whether the sample is in front of the surface (not occluded) or behind it (it hits an occluder). An occlusion buffer is then obtained by averaging the distances of the occluded samples. This approach has some problems (such as self-occlusion and haloing) that I will explain later.
All the calculations of the algorithm described here are done in 2D space and require no projections. It uses the per-pixel position and normal buffers, so if you are using deferred shading half of the work is already done. If not, you can either reconstruct the positions from the depth buffer or store the position of each pixel in a floating-point buffer. If this is your first SSAO implementation I suggest the latter, since I will not explain here how to reconstruct positions from depth. Either way, for the rest of the article I will assume that both buffers are available. Both the positions and the normals must be in view space.
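If you are not using deferred rendering yet, a minimal pass that fills these two buffers might look like the sketch below. This is only an illustration of one possible setup, not part of the original article: the matrix names (g_world_view, g_world_view_proj) and the MRT layout are hypothetical, and it assumes D3D9-style HLSL with two floating-point render targets.

float4x4 g_world_view;        // hypothetical name: world * view matrix
float4x4 g_world_view_proj;   // hypothetical name: world * view * projection matrix

struct VS_GBUFFER_OUTPUT
{
    float4 pos      : POSITION;
    float3 view_pos : TEXCOORD0;   // view-space position
    float3 view_nrm : TEXCOORD1;   // view-space normal
};

VS_GBUFFER_OUTPUT vs_gbuffer(float4 pos : POSITION, float3 normal : NORMAL)
{
    VS_GBUFFER_OUTPUT o;
    o.pos      = mul(pos, g_world_view_proj);
    o.view_pos = mul(pos, g_world_view).xyz;
    // Assumes a rigid transform (no non-uniform scale) for the normal.
    o.view_nrm = mul(float4(normal, 0.0), g_world_view).xyz;
    return o;
}

struct PS_GBUFFER_OUTPUT
{
    float4 pos  : COLOR0;   // floating-point target holding view-space positions
    float4 norm : COLOR1;   // normals packed into [0,1]
};

PS_GBUFFER_OUTPUT ps_gbuffer(VS_GBUFFER_OUTPUT i)
{
    PS_GBUFFER_OUTPUT o;
    o.pos  = float4(i.view_pos, 1.0);
    // Packed with * 0.5 + 0.5; the SSAO shader later unpacks with * 2 - 1.
    o.norm = float4(normalize(i.view_nrm) * 0.5 + 0.5, 1.0);
    return o;
}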
The next thing to do is to use these position and normal buffers to generate an occlusion buffer, with one component per pixel. How you use this occlusion information is up to you; the usual way is to subtract it from the ambient lighting of the scene, but you can also use it for non-photorealistic rendering effects if you prefer.
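As a rough illustration of the usual approach, here is how the occlusion buffer could be combined with the ambient term in a later shading pass. This sketch is not from the original article; the sampler and variable names (g_ssao, g_ambient_color) are hypothetical.

sampler g_ssao;            // hypothetical: the occlusion buffer produced by the SSAO pass
float3  g_ambient_color;   // hypothetical: ambient light color of the scene

float3 applyAmbientOcclusion(in float2 uv, in float3 albedo, in float3 direct_light)
{
    // The buffer stores occlusion, so the ambient term is scaled by (1 - AO).
    float ao = tex2D(g_ssao, uv).r;
    float3 ambient = g_ambient_color * (1.0 - ao);
    return albedo * (ambient + direct_light);
}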
Algorithm
For every pixel in the scene we can compute its ambient occlusion like this: treat all the surrounding pixels as small spheres and add up their individual contributions. To simplify things, we treat the occluders as points with no orientation, so an occluder is only a position, while the occludee (the pixel receiving the occlusion) is a <position, normal> pair.
Therefore, the contribution of each occlusion depends on two factors:
- The distance "d" from the occludee.
- The angle between the occludee's normal "N" and the vector "V" that goes from the occludee to the occluder.
With these two factors in mind, a simple formula to compute the occlusion is:
occlusion = max(0.0, dot(N, V)) * (1.0 / (1.0 + d))
The first term, max(0.0, dot(N, V)), captures the intuition that an occluder located directly above the occludee contributes more than one at a grazing angle. The second term is a linear attenuation with distance. You can use quadratic attenuation or some other falloff function instead, but I prefer the linear one.
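Swapping the falloff only changes the attenuation factor. The small sketch below shows both variants; the helper names are mine, not the article's.

// Linear falloff, as used in the formula above.
float attenLinear(float d)  { return 1.0 / (1.0 + d); }

// A possible quadratic alternative: occluders lose influence faster with distance.
float attenSquared(float d) { return 1.0 / (1.0 + d * d); }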
The algorithm itself is very simple: sample a few neighboring points around the current pixel and accumulate their occlusion contributions using the formula above. To gather the occlusion I use the four base offsets <1,0>, <-1,0>, <0,1> and <0,-1>, rotated by 45° and 90°, and reflect them with a random normal texture.
A few tricks can speed up the computation, for example using half-sized position and normal buffers. If you want, you can also apply a bilateral blur to the final SSAO buffer to reduce the noise introduced by the sampling. Note that both tricks can be applied to any SSAO algorithm.
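The article does not include code for such a blur; the sketch below is one possible version, applied to the SSAO buffer produced by the pixel shader shown next, with hypothetical sampler and parameter names. It averages neighboring AO values but rejects samples whose depth differs too much from the center pixel, so occlusion does not bleed across edges.

sampler g_ssao_buffer;     // hypothetical: the raw SSAO output
sampler g_buffer_pos;      // view-space position buffer, used here only for its depth (z)
float2  g_inv_screen_size; // hypothetical: float2(1.0 / width, 1.0 / height)
float   g_depth_tolerance; // hypothetical: maximum view-space depth difference, e.g. 0.5

float4 ps_bilateral_blur(float2 uv : TEXCOORD0) : COLOR0
{
    float center_depth = tex2D(g_buffer_pos, uv).z;
    float sum = 0.0;
    float weight_sum = 0.0;

    // Average a 4x4 neighborhood, but only over samples that lie on the same surface.
    for (int x = -2; x < 2; ++x)
    {
        for (int y = -2; y < 2; ++y)
        {
            float2 offset = float2(x, y) * g_inv_screen_size;
            float  depth  = tex2D(g_buffer_pos, uv + offset).z;

            // Reject samples across depth discontinuities so AO does not bleed over edges.
            float w = (abs(depth - center_depth) < g_depth_tolerance) ? 1.0 : 0.0;
            sum        += tex2D(g_ssao_buffer, uv + offset).r * w;
            weight_sum += w;
        }
    }
    return sum / max(weight_sum, 1.0);
}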
Here is the HLSL pixel shader code, applied to a full-screen quad:
sampler g_buffer_norm;
sampler g_buffer_pos;
sampler g_random;
float random_size;
float g_sample_rad;
float g_intensity;
float g_scale;
float g_bias;
float2 g_screen_size;

struct PS_INPUT
{
    float2 uv : TEXCOORD0;
};

struct PS_OUTPUT
{
    float4 color : COLOR0;
};

float3 getPosition(in float2 uv)
{
    return tex2D(g_buffer_pos, uv).xyz;
}

float3 getNormal(in float2 uv)
{
    return normalize(tex2D(g_buffer_norm, uv).xyz * 2.0f - 1.0f);
}

float2 getRandom(in float2 uv)
{
    return normalize(tex2D(g_random, g_screen_size * uv / random_size).xy * 2.0f - 1.0f);
}

float doAmbientOcclusion(in float2 tcoord, in float2 uv, in float3 p, in float3 cnorm)
{
    float3 diff = getPosition(tcoord + uv) - p;
    const float3 v = normalize(diff);
    const float d = length(diff) * g_scale;
    return max(0.0, dot(cnorm, v) - g_bias) * (1.0 / (1.0 + d)) * g_intensity;
}

PS_OUTPUT main(PS_INPUT i)
{
    PS_OUTPUT o = (PS_OUTPUT)0;

    o.color.rgb = 1.0f;
    const float2 vec[4] = { float2(1, 0), float2(-1, 0),
                            float2(0, 1), float2(0, -1) };

    float3 p = getPosition(i.uv);
    float3 n = getNormal(i.uv);
    float2 rand = getRandom(i.uv);

    float ao = 0.0f;
    float rad = g_sample_rad / p.z;

    //**SSAO calculation**//
    int iterations = 4;
    for (int j = 0; j < iterations; ++j)
    {
        float2 coord1 = reflect(vec[j], rand) * rad;
        float2 coord2 = float2(coord1.x * 0.707 - coord1.y * 0.707,
                               coord1.x * 0.707 + coord1.y * 0.707);

        ao += doAmbientOcclusion(i.uv, coord1 * 0.25, p, n);
        ao += doAmbientOcclusion(i.uv, coord2 * 0.5, p, n);
        ao += doAmbientOcclusion(i.uv, coord1 * 0.75, p, n);
        ao += doAmbientOcclusion(i.uv, coord2, p, n);
    }
    ao /= (float)iterations * 4.0;
    //**END**//

    // Do stuff here with your occlusion value "ao": modulate ambient lighting,
    // write it to a buffer for later use, etc.
    return o;
}
This screen-space solution is very similar to "Hardware Accelerated Ambient Occlusion Techniques on GPUs" [1]; the main differences are the sampling pattern and the AO function. It can also be seen as an image-space version of "Dynamic Ambient Occlusion and Indirect Lighting" [2].
Some details of the code worth mentioning:
- The radius is divided by p.z so that it scales with the distance to the camera. If you skip this division, every pixel on the screen uses the same screen-space sampling radius and the output loses the sense of perspective.
- Inside the for loop, coord1 holds the original sampling coordinates at 90° steps, and coord2 is the same coordinates rotated by 45°.
- The random texture contains randomized normal vectors, so it is your usual normal map. This is the random normal texture I used:
It is tiled across the screen and sampled at every pixel using these texture coordinates:
g_screen_size * uv / random_size
"G_screen_size" contains the width and height of the screen (in pixels), and "random_size" is the random texture size (I use 64x64 ). the sampled normal is used to mirror the sampling vector in the for loop to obtain different sampling modes for each screen pixel. (For details, see "interleaved sampling" in the references ")
In the end, the shader boils down to iterating over a few occluders, calling our AO function for each of them, and accumulating the result. There are four artist-controlled variables:
- g_scale: scales the distance between occluder and occludee.
- g_bias: controls the width of the occlusion cone considered by the occludee.
- g_sample_rad: the sampling radius.
- g_intensity: the AO intensity.
Once you tweak them a bit and observe how they affect the image, it becomes intuitive to achieve the effect you want.
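The values below are not from the article, just a hypothetical starting point one might use before tweaking:

// Hypothetical starting values, not taken from the article; tune them per scene.
float g_scale      = 1.0;
float g_bias       = 0.05;   // small bias to suppress any residual self-occlusion
float g_sample_rad = 1.0;    // in view-space units, divided by p.z in the shader
float g_intensity  = 2.0;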
Results
a) Direct output of the algorithm, one pass with 16 samples. b) Direct output, one pass with 8 samples. c) Direct lighting only. d) Direct lighting minus AO, two passes with 16 samples each.
As you can see, the code is short and simple, and the result shows no self-occlusion and only very slight haloing. These two artifacts are the main problems of the algorithms that use only the depth buffer as input, as the following picture shows:
Self-occlusion appears because the traditional algorithm samples inside a sphere around each pixel, so on flat, unoccluded surfaces at least half of the samples are marked as "occluded". This produces an overall greyish look. Haloing shows up as soft white edges around objects, because self-occlusion does not take place in those regions. So by avoiding self-occlusion we also reduce the haloing problem.
This method gives good results when the camera moves. If you value quality over speed, you can use two or more passes with different radii (duplicating the for loop in the code), one to collect more global AO and the others to refine the small crevices; a sketch of the duplicated loop is shown below. Once lighting and textures are applied, the artifacts produced by the sampling are barely visible, which is why, in general, a blur pass is not needed.
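The article does not give code for this multi-radius variant; the following is a hedged sketch of what duplicating the loop with a second, larger radius might look like inside the main shader. The 3.0 factor is an arbitrary choice of mine, not the article's.

// Hypothetical second sampling pass with a larger radius, for more global occlusion.
// The 3.0 factor is arbitrary; adjust it to the scale of your scene.
float rad2 = (g_sample_rad * 3.0) / p.z;
for (int j = 0; j < iterations; ++j)
{
    float2 coord1 = reflect(vec[j], rand) * rad2;
    float2 coord2 = float2(coord1.x * 0.707 - coord1.y * 0.707,
                           coord1.x * 0.707 + coord1.y * 0.707);

    ao += doAmbientOcclusion(i.uv, coord1 * 0.25, p, n);
    ao += doAmbientOcclusion(i.uv, coord2 * 0.5, p, n);
    ao += doAmbientOcclusion(i.uv, coord1 * 0.75, p, n);
    ao += doAmbientOcclusion(i.uv, coord2, p, n);
}
// Divide by the new total number of samples instead of iterations * 4.0.
ao /= (float)iterations * 8.0;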
Advanced
I have described a simple and practical SSAO implementation that is well suited to games. However, we can get better quality if we also take into account the faces pointing away from the camera. Normally this requires three buffers: two position/depth buffers (front and back faces) and one normal buffer.
However, you can get away with two buffers by storing the front and back depths in the red and green channels of a single buffer and reconstructing each position from its depth. That way the first buffer holds the positions/depths and the second holds the normals.
These are the results when taking 16 samples in each of the two position buffers:
Left: front-face occlusion only. Right: back-face occlusion only.
To implement this, you only need to add a second call inside the loop, "doAmbientOcclusionBack()", which samples the back-face position buffer. As you can see, the contribution of the back faces is very subtle, yet it doubles the number of samples and nearly doubles the render time. You could take fewer samples for the back faces, but it is still not very practical.
Here is the additional code:
Add these calls inside the loop:
ao += doAmbientOcclusionBack(i.uv, coord1 * (0.25 + 0.125), p, n);
ao += doAmbientOcclusionBack(i.uv, coord2 * (0.5 + 0.125), p, n);
ao += doAmbientOcclusionBack(i.uv, coord1 * (0.75 + 0.125), p, n);
ao += doAmbientOcclusionBack(i.uv, coord2 * 1.125, p, n);
And add these two functions to the shader:
float3 getPositionBack(in float2 uv)
{
    return tex2D(g_buffer_posb, uv).xyz;
}

float doAmbientOcclusionBack(in float2 tcoord, in float2 uv, in float3 p, in float3 cnorm)
{
    float3 diff = getPositionBack(tcoord + uv) - p;
    const float3 v = normalize(diff);
    const float d = length(diff) * g_scale;
    return max(0.0, dot(cnorm, v) - g_bias) * (1.0 / (1.0 + d));
}
Add a sampler "g_buffer_posb" that saves the back position. (enable the front-end Rendering scenario to generate it)
Another change we can make, this time trading quality for speed instead, is to add a simple level-of-detail system to the shader. Replace the fixed number of iterations with this:
int iterations = lerp(6.0, 2.0, p.z / g_far_clip);
The variable "g_far_clip" is the distance of the remote cropping plane. The shader must be input as a parameter. currently, the number of iterations of each pixel application depends on the distance to the camera. Therefore, only rough sampling is performed for distant pixels, which improves the efficiency with a low quality. however, I did not use this technique in the following performance measurements.
Summary and performance measurements
As I mentioned at the beginning of this article, this method is especially well suited to games that use deferred lighting, since it needs two buffers that are usually already available. It is very direct to implement and the quality is good: it solves the self-occlusion problem and reduces the haloing. However, it shares the remaining drawbacks of other SSAO techniques:
Disadvantages:
- Hidden geometry is not taken into account (in particular, geometry outside the frustum).
- Performance depends heavily on the sampling radius and the distance to the camera, since objects close to the near clipping plane use much larger screen-space radii than distant ones.
- The output is noisy.
As a speed/quality tradeoff, we can apply a 4x4 Gaussian blur to the 16-sample implementation, since only one texture fetch is taken per sample and the AO function is very simple, but in practice it is still a little slow. Here is a table showing the speed of the algorithm, without blur, in a 900x650 scene containing the Hebe model on an NVIDIA 8800 GT:
Settings                            | FPS | SSAO (ms)
High (32 samples, front and back)   | 150 | 3.3
Medium (16 samples, front only)     | 290 | 0.27
Low (8 samples, front only)         | 310 | 0.08
Finally, here is the algorithm applied to different models. Maximum quality (32 samples, front and back faces, large radius, 3x3 bilateral blur):
Minimum quality (8 samples, front faces only, no blur, small radius):
It is also useful to compare this technique with ray-traced AO, to see how well it can approximate the real AO with a given number of samples.
Left: SSAO, 48 samples per pixel (32 front and 16 back), no blur. Right: ray-traced AO in Mental Ray, 32 samples, spread = 2.0, maxdistance = 1.0, falloff = 1.0.
One last piece of advice: do not expect to just drop the shader into your pipeline and get realistic results automatically. Although this implementation has a good quality/performance ratio, SSAO is a time-consuming effect and you need to tune it carefully to get the best possible performance: adding or removing samples, adding a bilateral blur, changing the intensity, and so on. You should also consider whether SSAO is right for your project at all: unless you have plenty of dynamic objects in your scene, you may not need SSAO; light maps might be enough and can give better quality for static scenes.
I hope you benefit from this article. All the code included here is released under the MIT license.
About the author
José María Méndez is a 23-year-old computer engineering student. He has been programming games in his spare time for six years and is now the lead programmer at the startup Minimal Drama Game Studio.
References
[1] Hardware Accelerated Ambient Occlusion Techniques on GPUs (Perumaal Shanmugam)
[2] Dynamic Ambient Occlusion and Indirect Lighting (Michael Bunnell)
[3] Image-Based Proxy Accumulation for Real-Time Soft Global Illumination (Peter-Pike Sloan, Naga K. Govindaraju, Derek Nowrouzezahrai, John Snyder)
[4] Interleaved Sampling (Alexander Keller, Wolfgang Heidrich)
Crytek's Sponza scene rendered at 1024x768, 175 FPS, with one directional light.
The same scene at 1024x768, 110 FPS, with SSAO at medium settings: 16 samples, front faces only, no blur. The ambient lighting has been multiplied by (1.0 - AO).
Sponza can be downloaded from the Crytek website.