[Repost] Shadow Map & Shadow Volume

Source: Internet
Author: User

Transferred from: http://blog.csdn.net/hippig/article/details/7858574

For FPS players and graphics enthusiasts, the term shadow volume became a household word with the announcement of DOOM3. Although the game has not yet shipped, John Carmack's legendary track record and some stunning DOOM3 previews give us every reason to expect it to be one of the hottest FPS titles of 2004. id Software has never been stingy about adopting the most advanced rendering technology in pursuit of the best image quality, which has repeatedly forced players to reach into their pockets and upgrade their computers just to play its games. Will we be spared this time?

Since the release of DX9, everyone's attention seems to have been captured by the shader: forum discussions always revolve around shader-based rendering, and for a while the debate over internal GPU precision drowned out everything else. Yet compared with glittering metal spheres and shimmering water, a few simple shadows often bring far more realism to a scene. Perhaps that is one reason DOOM3 stands out among a plethora of FPS games.

There are many ways to implement shadows; the two most popular today are shadow mapping and the shadow volume. The former is relatively simple to implement and exploits the programmable pipeline of current GPUs, but by its nature shadow mapping becomes too expensive when light sources or objects are dynamic, so it often serves as a cheap alternative for static scenes. The shadow volume's strength is precisely shadow mapping's weakness: for a game like DOOM3, with dynamic light sources and shadows cast by every moving object, the shadow volume is the only choice at this stage.

The principle of Shadow mapping:

The basic idea of the shadow mapping algorithm is that an object is in shadow because there is an occluder between it and the light source; in other words, the occluder is closer to the light source than the object is.

Pass 1: Render the entire scene from the light source, i.e., in the light source's coordinate system, to obtain a depth map of all objects relative to the light (this is what we call the shadow map): the value of each pixel in this image is the depth of the scene fragment closest to the light source. Since only depth matters in this pass, we can turn off all lighting calculations and enable only the Z-test and Z-write render states.

Pass 2: Restore the viewpoint to its normal position and render the entire scene. For each pixel, compute its distance to the light source and compare that value with the corresponding value in the depth map to determine whether the pixel is in shadow. Shadowed and lit fragments are then given different lighting calculations, producing the shadowed image.
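As an illustration, the two passes can be sketched in a few lines of Python. This is a toy model of the depth comparison only, with hypothetical texel indices and light-space depths, not the GPU pipeline:

```python
def build_shadow_map(occluder_depths):
    """Pass 1: keep, per texel, only the depth nearest to the light.
    occluder_depths is a list of (texel, light_space_depth) samples."""
    shadow_map = {}
    for texel, depth in occluder_depths:
        if texel not in shadow_map or depth < shadow_map[texel]:
            shadow_map[texel] = depth
    return shadow_map

def in_shadow(shadow_map, texel, fragment_depth, bias=1e-3):
    """Pass 2: a fragment is shadowed if something nearer to the light
    was recorded at the same texel (the bias avoids self-shadow acne)."""
    nearest = shadow_map.get(texel, float("inf"))
    return fragment_depth > nearest + bias
```

A fragment at depth 5.0 behind an occluder recorded at depth 2.0 on the same texel is reported shadowed; the occluder itself, re-tested at depth 2.0, is lit.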

From this analysis it is clear that the depth map depends only on the positions of the light source and the objects in the scene. No matter how the viewpoint moves, as long as the relative positions of lights and objects are unchanged, the shadow map can be reused, so for scenes without dynamic light sources shadow mapping is a wise choice.

Besides handling dynamic lights poorly, shadow mapping also suffers from the problem common to everything that uses textures: aliasing. By the sampling theorem, a texture is reproduced without distortion only if its resolution is no greater than the resolution at which the object appears on screen. When a large texture is mapped onto an object smaller than it, one fragment covers several texels, and to reproduce the fragment's color accurately the contributions of all the texels it covers must be considered; this is the basic principle behind the various texture filtering methods. However, since the depth map changes constantly, its mip-maps cannot be precomputed and kept in memory like an ordinary texture's. There is a method that uses the pixel shader to bilinearly filter the depth map, but it is so expensive that it has no practical value at present.

The opposite problem occurs when the texture resolution is lower than the screen resolution: several fragments map to the same texel. There is no distortion from the texture's point of view, but because multiple fragments share one texture value, aliasing remains. Worse, no filtering technique can fundamentally solve this kind of aliasing, because mathematically no operation can create more information than was originally sampled. In recent years much effort has gone into shadow mapping's aliasing problem; the most promising results are the Adaptive Shadow Map (ASM) and the Perspective Shadow Map (PSM). Both work by raising the sampling rate wherever aliasing may occur, so that each fragment corresponds to at least one texel; the difference is that ASM raises the sampling rate at shadow edges, while PSM raises it near the viewpoint. Mathematically, tinkering with a flawed approach lacks elegance, as John Carmack put it in an email in August 2002:

"Shadow buffers make good looking demos with controlled circumstances, but if you start using them for a 'real' application, you find that you need absolutely massive resolution to get acceptable results for omni-directional lights, and a lot of the artifacts need to be tweaked on a per-light basis. While it is possible to do shadow buffers on GF1/Radeon class hardware, without percentage closer filtering they look wretched. If we were targeting only the newest hardware, shadow buffers would have a better shot, but even then, they have more drawbacks than is commonly appreciated."

It seems John Carmack had found a better way to implement shadows. Let's see what it is.

The principle of Shadow Volume:

The shadow volume algorithm was first presented by Franklin C. Crow in his 1977 paper "Shadow Algorithms for Computer Graphics". The basic idea is to compute, from the positions of the light source and the occluder, the region of space in which shadow falls (the shadow volume), and then test every object to determine whether it is affected by the shadow.

The green object in the figure is the occluder, and the gray region is the shadow volume.

Only objects inside the shadow volume are affected by the shadow.

The algorithm of Shadow volume

Now that the principle of the shadow volume is clear, how do we determine whether an object, or part of one, is inside the shadow volume? This is where the stencil buffer comes in.

Z-pass algorithm:

Z-pass is the original, standard algorithm used with shadow volumes to determine whether a pixel is in shadow. It works as follows:

Pass 1: Enable Z-buffer writes and render the entire scene to obtain a depth map of all objects. Note the difference from the depth map in shadow mapping: here the depth map is rendered from the real viewpoint, whereas in shadow mapping it is rendered from the light source.

Pass 2: Disable Z-buffer writes, enable stencil buffer writes, and render all the shadow volumes. For a front face of a shadow volume (the side facing the viewpoint), if the depth test passes, the stencil value of the pixel is incremented; if the depth test fails, the stencil value is unchanged. For a back face of a shadow volume (the side facing away from the viewpoint), if the depth test passes, the stencil value is decremented; otherwise it remains unchanged.

To summarize the Z-pass algorithm in one sentence: draw a ray from the viewpoint to the object; each time the ray enters a shadow volume the stencil value is incremented, and each time it leaves one the stencil value is decremented. If the final stencil value is zero, the ray entered and left shadow volumes an equal number of times, which naturally means the object is not inside any shadow volume.

Pass 3: After the second pass, the stencil value of each pixel tells us whether it is in shadow (if the stencil value is greater than 0, the pixel is inside a shadow volume; otherwise it is outside), and the shadow effect is drawn accordingly.
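The per-ray stencil counting of Z-pass can be sketched as a toy simulation. Each `(is_front, depth_test_passed)` tuple is a hypothetical stand-in for one shadow-volume face crossed by the eye ray, ordered from the eye toward the pixel being shaded:

```python
def z_pass_stencil(volume_faces):
    """Z-pass stencil count along one eye ray.
    volume_faces: list of (is_front_face, depth_test_passed) per
    shadow-volume face crossed between the eye and the scene."""
    stencil = 0
    for is_front, depth_pass in volume_faces:
        if is_front and depth_pass:
            stencil += 1      # ray enters a shadow volume in front of the object
        elif not is_front and depth_pass:
            stencil -= 1      # ray leaves a shadow volume in front of the object
    return stencil
```

A ray that enters and exits a volume before reaching the object counts back to 0 (lit); if the object sits inside the volume, the back face fails the depth test and the count stays at 1 (shadowed).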

In this picture the line of sight enters and leaves the shadow volumes three times each; the final stencil value is zero, so the object is outside the shadow volumes and is unaffected by the shadow.

In this picture the line of sight enters three times and leaves once; the stencil value is 2, so the object is inside a shadow volume and a shadow is generated.

In this picture the object lies in front of the shadow volume, so every face of the volume fails the depth test. The stencil value stays at zero, indicating the object is outside the shadow.

Disadvantages and remedies of the Z-pass algorithm

The discussion above assumes the viewpoint is outside the shadow volumes. When that condition holds, the Z-pass algorithm works very well, but as soon as the viewpoint enters a shadow volume, Z-pass fails immediately.

In this picture the line of sight enters twice and leaves twice, so by the Z-pass algorithm the final stencil value is 0, indicating the object is outside the shadow, when in fact it is in shadow. The error arises because the viewpoint itself is inside a shadow volume, so the sight line loses one chance to enter it, turning a stencil value that should be 1 into 0.

This incorrect behavior of Z-pass can be seen in the figure below: watch the shadow on the ground.

Z-fail algorithm:

The Z-fail algorithm was independently invented by John Carmack, Bill Bilodeau, and Mike Songy to solve the failure of the Z-pass algorithm when the viewpoint enters a shadow volume.

Pass 1: Enable Z-write/Z-test, render the entire scene, and obtain the depth map. (This step is exactly the same as in Z-pass.)

Pass 2: Disable Z-write, enable Z-test and stencil writes, and render the shadow volumes. For a back face, if the Z-test fails, the stencil value is incremented; if it passes, the stencil value is unchanged. For a front face, if the Z-test fails, the stencil value is decremented; if it passes, the stencil value is unchanged.

All the shadow-volume faces in the figure are at Z-pass positions, so the stencil value does not change.

With the viewpoint inside the shadow volume there is now no problem: the final stencil value is 2, indicating the object is in shadow.

The scene that Z-pass could not handle above gives the correct result under Z-fail:
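The Z-fail counting can be sketched with the same toy model used for Z-pass, with hypothetical `(is_front, depth_test_passed)` tuples standing in for rendered faces:

```python
def z_fail_stencil(volume_faces):
    """Z-fail stencil count along one eye ray.
    volume_faces: list of (is_front_face, depth_test_passed) per
    shadow-volume face at that pixel."""
    stencil = 0
    for is_front, depth_pass in volume_faces:
        if not is_front and not depth_pass:
            stencil += 1      # back face hidden behind the receiver
        elif is_front and not depth_pass:
            stencil -= 1      # front face hidden behind the receiver
    return stencil
```

A receiver inside the volume (front face visible, back face hidden) counts to 1 and is shadowed; a volume entirely in front of the receiver, or entirely behind it, counts to 0.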

Conditions for using the Z-fail algorithm

Capping for Z-fail

Because the Z-fail algorithm determines the stencil value from the parts of the shadow volume that fail the Z-test, the shadow volume must be closed. The red solid lines in the figure below represent the capping; imagine that without artificially added capping, the stencil values for shadowed objects 1/2 would be 0, whereas the correct stencil value is 1, because both are in shadow.


The relationship between Z-pass and the near clipping plane:

In the Z-pass algorithm, additional capping is needed to guarantee a correct result when the shadow volume is clipped by the view frustum. After frustum clipping, part of the shadow volume may become open, as at the "additional capping" position in the figure. If those polygons are not added artificially, the +1 stencil operation never happens when the shadow volume is rendered (with no polygon there is nothing to compare against the depth map), and the final result is obviously wrong.

How to build the shadow volume?

Building the shadow volume is the most important part of the whole algorithm. Before GPUs appeared, shadow volumes were computed on the CPU; as GPU programming matured, the computation was ported to the GPU, though the GPU method requires preprocessing the object's geometry. The two methods are explained below:

CPU-based method:

Anyone familiar with shadow volumes will know the term silhouette edge: the contour of an object as seen from the light source. The shadow volume is built by extruding the silhouette edges a certain distance, or to infinity. There are many ways to determine the silhouette edges; the basic idea is to find the edges shared by two triangles of which one faces the light and the other faces away, because only such edges can become silhouette edges. All other edges, seen from the light source, lie inside the object's projection rather than on its boundary.


The figure shows a polygon composed of 4 triangles. Assuming the light source is at the reader's head, the outer ring of solid lines is the silhouette edge. All we have to do is remove the 4 redundant internal edges (the dashed lines) from the original data. The concrete implementation is:

1. Iterate over all the triangles of the model.

2. Compute dot3(light_direction, triangle_normal). Use the result to determine whether the triangle faces the light source (dot3 > 0) or faces away from it (dot3 < 0).

3. For each triangle facing the light, push all three of its edges onto a stack, comparing against the edges already there; if a duplicate is found (edge1 and edge2 in the figure), delete both copies.

4. After all triangles have been examined, the edges remaining on the stack are the silhouette edges for the current light/object configuration.

5. Depending on the direction of the light source, use the CPU or a vertex shader to extrude these silhouette edges to form the shadow volume.
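The silhouette-edge extraction above might be sketched like this in Python. The index-triple triangles, per-triangle normals, and light direction are hypothetical inputs for the sketch; real code would work on the mesh's own edge data:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def silhouette_edges(triangles, normals, light_dir):
    """Edges of light-facing triangles not shared with another
    light-facing triangle.  triangles: (i0, i1, i2) vertex-index
    triples; normals: matching per-triangle normals; light_dir:
    direction toward the light."""
    edges = set()
    for tri, n in zip(triangles, normals):
        if dot(n, light_dir) <= 0:          # triangle faces away from the light
            continue
        for a, b in ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0])):
            edge = (min(a, b), max(a, b))   # order-independent edge key
            if edge in edges:
                edges.remove(edge)          # interior edge, pushed twice
            else:
                edges.add(edge)             # candidate silhouette edge
    return edges
```

For a quad made of two lit triangles, the shared diagonal cancels and only the four boundary edges survive.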

It is worth mentioning that this is the scheme DOOM3 uses. One problem is that the silhouette edges depend on the relative positions of the light source and the object: whenever that relationship changes, the silhouette edges must be recomputed and the updated data uploaded to the video card again to render the shadow volume, which is no small test of CPU power and AGP bandwidth.

GPU-based method:

With the advent of the vertex shader, people naturally wondered whether it could accelerate shadow volume rendering. But even the most advanced vertex shader 3.0 cannot create new geometry. Simply put, a vertex shader only accepts a vertex, modifies its attributes (position, color, texture coordinates, etc.), and outputs it to the rasterization stage, after which the pixel shader runs. Wherever new vertices are needed, only the CPU can operate on the vertex buffer directly.

Another approach is to reserve in advance the space the shadow volume will need, then use vertex shader operations to shape it as required. It is as if I had to store a batch of data without knowing its size, so I allocate a large area beforehand; this can be wasteful, but it works.


Since any edge of the object may become a silhouette edge, we insert a degenerate quad (the red triangles in the figure) along every edge beforehand. These quads have zero area and, without any transformation, are invisible and cause no visual defects. But where needed, they can be stretched into the side walls of the shadow volume.

Obviously, inserting redundant vertices creates enormous waste, since most edges never become silhouette edges, meaning their degenerate quads are useless. The benefit is that the geometry needs to be sent to the video card only once; whatever the light position, the preprocessed geometry can generate the shadow volume. Unlike the method just explained, there is no need to recompute the silhouette edges on the CPU and re-upload the results whenever the relative position of light and object changes.

In practice some improvements are possible. A flat surface casts no silhouette of its own, so there is no need to insert degenerate quads on the edges interior to such surfaces. And all the preprocessing should be done during software development, so that the user's program loads models with the quads already inserted and no CPU computation is needed at run time.

Vertex shader code for building/rendering the shadow volume:

c0: light position in object space
c1: (1, 1, 1, 0)
c2-c5: Light * View * Proj = LightClip matrix
c6-c9: WorldInvLight matrix
c10: color for exposing the shadow volume


mov  oD0, c10       // output a fixed color to make the shadow volume visible
sub  r1, v0, c0     // vector from the light source to the vertex
m4x4 r4, v0, c6     // transform the vertex into the light coordinate system
nrm  r1, r1         // normalize the light vector so the extruded sides of the volume have equal length
mov  r10, c1
dp3  r10.w, v1, r1  // dot of vertex normal and light vector determines the vertex's facing
slt  r10, c1.w, r10 // set r10's fourth component to 1 or 0 from the sign of the dp3 result
mul  r4, r4, r10    // scale r4, setting its w according to the facing
m4x4 r5, r4, c2     // transform the vertex to clip space
mov  oPos, r5

Algorithm optimization of the shadow volume (I)

The basic shadow volume algorithm is now complete; below are some of the more commonly used optimizations.

(I) Z-pass vs. Z-fail

As mentioned earlier, Z-pass is faster than Z-fail, so we can use Z-pass wherever it causes no problems. But how do we know when Z-pass is safe? Z-pass fails for two main reasons:

Reason one: the viewpoint enters the shadow volume, for example:

As long as these two situations can be detected, we can switch to the Z-fail algorithm when needed. For condition one, draw a line segment between the viewpoint and the light source; if this segment intersects an occluder, the viewpoint is certainly inside that occluder's shadow volume, and we switch to Z-fail.
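The eye-to-light test can be sketched with a segment/triangle intersection routine in the style of Moller-Trumbore. This standalone version with 3-tuple points is an illustration of the idea; an engine would run it against every triangle of the occluder:

```python
def segment_hits_triangle(p0, p1, tri, eps=1e-9):
    """True if the segment p0->p1 crosses triangle tri (three 3-tuples).
    Used here to check whether the viewpoint-to-light segment crosses
    an occluder, which would put the eye inside its shadow volume."""
    sub = lambda a, b: tuple(x - y for x, y in zip(a, b))
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    cross = lambda a, b: (a[1] * b[2] - a[2] * b[1],
                          a[2] * b[0] - a[0] * b[2],
                          a[0] * b[1] - a[1] * b[0])
    d = sub(p1, p0)                          # segment direction
    e1, e2 = sub(tri[1], tri[0]), sub(tri[2], tri[0])
    h = cross(d, e2)
    a = dot(e1, h)
    if abs(a) < eps:                         # segment parallel to the plane
        return False
    f = 1.0 / a
    s = sub(p0, tri[0])
    u = f * dot(s, h)
    if u < 0.0 or u > 1.0:                   # outside first barycentric bound
        return False
    q = cross(s, e1)
    v = f * dot(d, q)
    if v < 0.0 or u + v > 1.0:               # outside triangle
        return False
    t = f * dot(e2, q)                       # hit parameter along the segment
    return 0.0 <= t <= 1.0
```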

Reason two: the shadow volume intersects the near clipping plane

For condition two, the light pyramid formed by the light source and the near clipping plane (the red shaded region) can be intersected with the occluder. If the occluder is completely outside the light pyramid, the shadow volume it generates cannot intersect the near clipping plane and Z-pass can be used; otherwise Z-fail must be used.

Algorithm optimization of the shadow volume (II)

(II) Tricks to save fill rate:

As mentioned earlier, the two most time-consuming steps of the shadow volume algorithm are silhouette-edge determination and shadow volume rendering. Rendering the shadow volumes is a thorough test of GPU fill rate: although today's video cards easily reach fill rates of dozens of gigafragments per second, the pipeline is inevitably overwhelmed in complex scenes. In addition, frequent stencil buffer operations occupy part of the memory bandwidth. Any way to shrink the rendered shadow volumes is therefore an effective optimization:

Limiting the range of illumination (attenuated light bounds):

If the light source has attenuation, the scissor test can be used to restrict rendering to the light's range, since no shadow exists beyond that range and the corresponding part of the shadow volume need not be rendered. The scissor test artificially defines a rectangle in screen coordinates; only fragments whose coordinates fall inside this rectangle pass the test and can be written to the frame buffer.
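The scissor test itself is trivially expressed. This is a stand-in for the fixed-function test configured with glScissor; the rectangle layout and the half-open interval convention are assumptions of this sketch:

```python
def scissor_pass(x, y, rect):
    """True if fragment (x, y) lies inside the scissor rectangle.
    rect = (x0, y0, width, height) in screen coordinates; fragments
    outside it are discarded before any frame-buffer write."""
    x0, y0, w, h = rect
    return x0 <= x < x0 + w and y0 <= y < y0 + h
```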

NVIDIA's shadow acceleration technology (Ultra Shadow):

Ultra Shadow surfaced with the release of NV35 and was carried over into NV36/NV38; we can expect to see it in NVIDIA's future products as well.

id Software programmer John Carmack once said that NV35 is a GPU tailor-made for DOOM3, and we have reason to suspect he said so because NV35 integrates the Ultra Shadow acceleration technology (the GeForce FX series recently became DOOM3's recommended GPU). So what exactly is Ultra Shadow, and how does it accelerate shadow rendering?

In fact, Ultra Shadow uses just one OpenGL extension recently submitted by NVIDIA, EXT_depth_bounds_test. Let's look at NVIDIA's official introduction to this extension at GDC 2003:

First, notice the name: at GDC 2003 in March this was still an NVIDIA-exclusive extension, and in April it was renamed EXT_depth_bounds_test. The EXT prefix indicates that several vendors are working on the technology, so perhaps we will soon see Ultra Shadow implemented on ATI GPUs.

The depth bounds test compares the z-value already stored in the depth buffer at the current fragment's screen coordinates (xw, yw) against a user-specified range [zmin, zmax] set with glDepthBoundsNV(GLclampd zmin, GLclampd zmax). If that z-value lies outside the range, the current fragment is discarded from the pipeline and no stencil buffer operation takes place. Note that the value tested is not the z of the shadow-volume fragment itself, but the z of the shadow receiver rendered in a previous pass. See the figure:

As the figure shows, the z-value of point A lies outside [zmin, zmax], so that point cannot be darkened by the shadow and the fragments at A1/A2 can be discarded. The z-value of point B lies inside [zmin, zmax], so the fragment at B1 must still undergo the stencil buffer operation.
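The effect of the depth bounds test can be modeled as a simple cull over the receiver depths already in the depth buffer. The fragment records here are hypothetical; the real test is fixed-function hardware that skips the stencil update per pixel:

```python
def depth_bounds_cull(fragments, zmin, zmax):
    """Keep only shadow-volume fragments whose *receiver* depth (the z
    already in the depth buffer at that pixel) lies in [zmin, zmax].
    Fragments outside the bounds can never change the shadow result,
    so their stencil work is skipped entirely."""
    return [f for f in fragments if zmin <= f["receiver_z"] <= zmax]
```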

(For a detailed technical introduction, see "NVIDIA's Revenge: GeForce FX 5900 Ultra".)

Prospects for shadow rendering technology

The shadow volume is a relatively good technique for achieving a unified lighting model in the near term. Its main problems are that the CPU-based approach leans heavily on the processor, which may fall short when AI and physics also demand computation, while the GPU-based approach is inefficient because it produces large numbers of redundant vertices, since current GPUs (including the upcoming NV40/R420) cannot generate new vertices on chip. Microsoft is aware of this and has listed such a capability among the goals of its DirectX Next development plan:

In the longer term, lighting models based on real physics (spherical harmonic lighting, ray tracing, radiosity) are the direction of development. Then there will be no need to design separate algorithms for shadows: all illumination and shadow effects will be wrapped up in a unified lighting model, and every effect will arise naturally, just as in the real world. Of course, all of this depends on progress in semiconductor manufacturing, and we will not see hardware implementations in the near future (5 to 10 years).
