I have always thought that post-processing techniques would be used by more and more games. The release of StarCraft 2 has proved this to some extent: it moves lighting and shadowing into post-processing, which maximizes the flexibility of scene rendering and, of course, brings some very hard problems with it. My main purpose in translating this article is to become familiar with these techniques and to leave myself some notes. Along the way, I would also like to admire the technical charm of StarCraft 2 :). (The English of the original makes me want to vomit, and translating it faithfully would only make readers vomit too. Forgive me for changing the original order of the narration; if you prefer the original, see the English version.)
 
 
 
 
Early on, we decided to put more of the graphics workload on the GPU and free up the CPU. The main reason is that in StarCraft 2 players can create and control a large number of game objects, so a lot of CPU time is needed for AI and for managing a large amount of system resources. Most of the GPU load comes from the pixel shaders rather than the vertex shaders, because our engine keeps the number of draw calls and vertices as low as possible, and rendering millions of vertices per frame is a piece of cake for a modern GPU architecture anyway. In contrast, because our game uses deferred shading, the pixel shader carries almost all of the visual effects. The advantage of deferred shading is that overdraw stays very low, and the pixel shader load stays roughly constant instead of swinging dramatically with the complexity of the scene.
 
 
Our engine has to deal with two completely different kinds of game scene: the normal strategy mode and the story mode. In strategy mode, the camera usually sits high above the scene and looks down on the whole battlefield. Players care mainly about overall strategy rather than details, and most game objects cover only a small number of pixels on screen, so a small number of vertices is enough to get the desired look. Story mode is different. Players observe the game world from a first-person view, which is more immersive and draws more attention to the details of the scene, so we need more refined art assets to improve the experience. Technically, the two modes have completely different properties. Strategy mode needs many draw calls to complete one frame (there are too many game objects in view), but each individual object does not need to be very detailed. Story mode needs few draw calls, but every object must be rendered carefully, because players can look at it from close up. Next we explain how these requirements are met technically.
 
 
 
Screen-based Special Effects
 
 
 
An important goal of the StarCraft 2 graphics engine design is to show complex lighting environments in story mode. Think back to Warcraft 3: an object can only be affected by a limited number of lights at any time, which causes a visible pop in the object's lighting whenever the set of lights affecting it changes. For that reason, dynamic lights are rarely used in Warcraft 3. In addition, the traditional 3D rendering method greatly increases the number of draw calls. Imagine a group of starships in the scene, each with many flashing lights that affect the ship itself and the ships around it. In this case every starship has to be rendered on its own, because each one has a different lighting environment, and this makes the GPU very inefficient. Because our terrain is built by blending multiple layers, the traditional rendering method also causes serious problems for terrain rendering.
 
 
The best solution to the above problems is deferred shading. The principle is to first rasterize all objects into pixels and store the depth, normal, and color of each pixel in buffers, which are then used by later per-pixel lighting passes. You may say that the advantages of this technique are debatable, since it also brings many problems: buffering that much data consumes a lot of memory and bandwidth, and we have to sample the buffers many more times. You are right about that. However, the biggest benefit is that the relationship between lighting workload and scene complexity changes from a multiplicative one (every object times every light that affects it) to a linear one. No matter what kind of post effect we add to the screen, its cost is fixed and has nothing to do with the complexity of the scene. Whether there are 500 objects or 5 objects in the scene, the cost of effect processing is about the same, because all effects work on pixels rather than on objects.
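As a rough illustration of that scaling argument, here is a toy counting model in Python. It assumes, purely for illustration, that a forward renderer draws each object once per light touching it, while a deferred renderer draws each object once and then draws each light volume once. The function names are hypothetical and the numbers say nothing about real StarCraft 2 workloads.

```python
def forward_draw_calls(num_objects: int, lights_per_object: int) -> int:
    """Assumed forward model: one pass per object per light affecting it."""
    return num_objects * lights_per_object


def deferred_draw_calls(num_objects: int, num_lights: int) -> int:
    """Assumed deferred model: fill the buffers once, then one pass per light."""
    return num_objects + num_lights


if __name__ == "__main__":
    for objects in (5, 500):
        print(objects, forward_draw_calls(objects, 8), deferred_draw_calls(objects, 8))
```

In this model the forward count grows with the product of objects and lights, while the deferred count grows with their sum.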
 
 
 
Keeping the number of draw calls low is very important in an RTS game, so we want to write as much information as possible into video memory in a single draw. Most hardware supports 4 multiple render targets (MRTs), each with 4 channels, so we have 16 channels in which to store the information we need. We use these channels as follows:
1. Color components that are not affected by local lighting, such as emissive color, environment maps, and lighting maps (ambient lighting colors).
2. Depth
3. Per-pixel normal
4. If static ambient occlusion is used, the baked ambient occlusion value of each pixel. If screen-space ambient occlusion is used, the baked value is ignored.
5. Diffuse material color
6. Specular material color
 
 
 
Note that in DX9 all MRTs must have the same size and bit depth. Some hardware supports independent write masks for MRTs, and we take advantage of this to avoid spending bandwidth on unused bits. StarCraft 2 uses HDR, so all four channels of each buffer must be in 16-bit floating point format. Using a high-precision format avoids precision problems and means the pixel shader does not need to decode the data. Unfortunately, every buffer must then use a 4 x 16-bit format, which raises the output bandwidth to 24 bytes per pixel (translator's note: the author may not have counted the specular target). We will soon see that this sacrifice is well worth the flexibility it brings.
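The bandwidth figure is simple arithmetic, sketched below in Python. The assumption, taken from the text, is that every render target has 4 channels of 16-bit floats; the function name is made up for this note.

```python
CHANNELS_PER_TARGET = 4   # assumed: RGBA
BYTES_PER_CHANNEL = 2     # assumed: 16-bit float


def gbuffer_bytes_per_pixel(num_targets: int) -> int:
    """Bytes written per pixel when num_targets render targets are bound."""
    return num_targets * CHANNELS_PER_TARGET * BYTES_PER_CHANNEL


if __name__ == "__main__":
    print(gbuffer_bytes_per_pixel(3))  # three targets
    print(gbuffer_bytes_per_pixel(4))  # four targets
```

Three targets give 24 bytes per pixel and four targets give 32, which is consistent with the guess that the 24-byte figure leaves one target out.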
 
 
The objects rendered this way are generally opaque. The biggest problem of deferred shading is handling transparency, and we will discuss how to deal with it later. Our terrain is multi-layered. The normal, diffuse, and specular values of each layer are blended into the buffers, but only the bottom terrain layer writes depth during rendering. The buffered values are used to implement various effects: depth is used for lighting, volumetric fog, dynamic ambient occlusion, smart displacement, depth of field, projections, edge detection, and thickness measurement; normals are used for dynamic ambient occlusion; diffuse and specular material colors are used to calculate lighting.
 
 
 
Deferred Shading
 
 
 
In StarCraft 2, only local lights use deferred shading, for example point lights and spotlights, which have a limited range of influence. Global lights are rendered first with the traditional method: a scene-wide light such as sunlight hits every model, so deferred shading brings little benefit for it. In fact, because the buffers have to be sampled again, deferred shading is less efficient in this case.
 
 
The results of traditional shading and deferred shading are the same, except that deferred shading applies lighting as a post-processing effect, which makes it more efficient in complex lighting environments: each light is computed only for the pixels it affects. The 3D position of a pixel can be obtained more efficiently under SM 3.0, because ps_3_0 adds a new register (VPOS) that the rasterizer fills with the x and y screen coordinates of the pixel. Combined with the depth value sampled from the buffer, this quickly gives us the camera-space position of the pixel, and from there its world-space position:
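 // map the pixel coordinates (x, y) on a w-by-h target to [-1, 1], with y pointing up
 float4 vviewpos;
 vviewpos.xy = float2(x, y) * float2(2.0f, -2.0f) / float2(w, h) + float2(-1.0f, 1.0f);
 vviewpos.zw = float2(1.0f, 1.0f);
 // scale the point on the z = 1 plane by the sampled depth to get the camera-space position
 vviewpos.xyz = vviewpos.xyz * fsampleddepth;
 // transform from camera space to world space
 float3 vworldpos = mul(p_minvviewtransform, vviewpos).xyz;
 (I think the code in the original article may be wrong, so I substituted some terms as listed below. If you spot a mistake of mine, please correct me.)
 interpolant_vpos = float2(x, y)
 p_vcameranearsize = 1.0f
 p_vreciprendertargetsize = float2(w, h)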
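To check the arithmetic outside a shader, the same reconstruction can be written as a small Python function. This is only a sketch under the simplification used above: the frustum half-extent at z = 1 is taken as 1 on both axes, and the hypothetical `near_size` parameter stands in for whatever the real engine passes.

```python
def view_position(x, y, w, h, sampled_depth, near_size=(1.0, 1.0)):
    """Camera-space position of pixel (x, y) on a w-by-h target.

    near_size is an assumed half-extent of the view frustum at z = 1.
    """
    # Map pixel coordinates to [-1, 1], flipping y so that +y points up.
    ndc_x = x * 2.0 / w - 1.0
    ndc_y = y * -2.0 / h + 1.0
    # Point on the z = 1 plane, then scaled out to the sampled depth.
    ray = (ndc_x * near_size[0], ndc_y * near_size[1], 1.0)
    return tuple(c * sampled_depth for c in ray)


if __name__ == "__main__":
    print(view_position(400, 300, 800, 600, 10.0))  # screen centre
    print(view_position(0, 0, 800, 600, 5.0))       # top-left corner
```

The screen centre maps to a point straight ahead of the camera, and the top-left corner maps to negative x and positive y, as expected from the sign flip on y.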
 
Stencil test: early-Z and early-stencil
We need a very efficient way to find the screen pixels that fall within a light's range of influence. Naturally we think of the Z test and the stencil test: the stencil test can reject the pixels behind the light's range, and the Z test can reject the pixels in front of it. The original article does not explain how to do this. Based on my understanding of deferred shading and shadow volumes, my guess is that it works as follows:
 
 
 
1. First, look at figure 2-0, where three rectangles of different colors represent three scene objects and the yellow circle marks the range of the light. The green object is in front of the light's range and is not affected by it, the red object is within the range and is affected, and the dark blue object is behind the range and is not affected.
2. Render the whole scene into the buffers to produce depth, normal, color, and the other data.
3. For each light:
A: Disable color writes and Z writes. Set the stencil test to always pass, set D3DRS_STENCILPASS to D3DSTENCILOP_INCR, set the Z test to D3DCMP_GREATER, and draw the back faces of the light's closed convex volume. This marks in the stencil buffer every pixel that lies in front of the back of the convex volume (figure 2-1).
B: Do the lighting pass. Set the stencil test to D3DCMP_EQUAL with a reference value of 1, set the Z test to D3DCMP_LESS, and draw the front faces of the light's closed convex volume. Now early-stencil and early-Z come into play. Early-Z rejects all pixels in front of the light's range, leaving the part circled in yellow in figure 2-2. Early-stencil rejects the pixels of the dark blue object behind the range (the area enclosed by the yellow lines in figure 2-3), because only the pixels marked in the stencil buffer in step A survive. That leaves only the pixels of the red object, which finally enter the pixel shader for the lighting calculation.
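The effect of the two passes on a single pixel can be simulated in a few lines of Python. This is a sketch of my reading of the steps, not code from the original article: depths are assumed to grow away from the camera, and `front_depth` and `back_depth` are the depths of the light volume's front and back faces along the pixel's view ray.

```python
def lit_by_light(scene_depth: float, front_depth: float, back_depth: float) -> bool:
    """Return True if the pixel reaches the lighting shader for this light."""
    # Pass A: mark the pixel if the scene surface lies in front of the
    # volume's back face.
    stencil = 1 if back_depth > scene_depth else 0
    # Pass B: the pixel is shaded only if it was marked and the volume's
    # front face lies in front of the scene surface.
    return stencil == 1 and front_depth < scene_depth


if __name__ == "__main__":
    # Light volume spans depths 4..8 along this ray.
    for name, depth in (("green", 2.0), ("red", 6.0), ("blue", 10.0)):
        print(name, lit_by_light(depth, 4.0, 8.0))
```

Only the object inside the volume survives both tests, which matches the green, red, and dark blue objects described in step 1.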
 
 
All of the above describes the algorithm when the camera is outside the light's range. When the camera is inside the light's range, you only need to draw the back faces of the bounding volume. Blizzard shows a picture of a scene with complex light sources. We can imagine how complicated our shaders and light range tests would be if we used traditional rendering methods.
 
 
 
SSAO: screen-space ambient occlusion
This technique was published a long time ago, but it only became popular recently with the release of Crysis. Its role is to make the light and shade in the scene softer (which is not physically correct) and so give the illusion of global illumination. Roughly, it works like this: for a given pixel, sample the depth at points around it on screen following some pattern, then evaluate the samples with some function to get an occlusion value. The key questions are how to sample and how to evaluate the samples. By the way, SSAO has an inherent limitation: the sample points only see visible surfaces, so occlusion from invisible surfaces cannot be evaluated properly. However, since AO is a low-frequency effect, this defect has little impact on the final result.
 
 
How should we sample?
I believe many people have implemented some form of AO on their own and were not satisfied with the result. For example, the shading is too grainy, and even after a blur it still does not satisfy our desire for beauty. Now is our chance to see how StarCraft does it. First, we must mention how an AO map is computed in 3D space. The principle is to trace a number of rays within the hemisphere around the surface normal and weight the result of each trace. To get a similar effect in SSAO, we first compute the view-space position of the given pixel (as described earlier), then use this position as the base point and select 8 to 32 sample points around it. The camera-space positions of the sample points are then projected back to screen coordinates to sample the depth buffer, and the sampled depth values are evaluated.

The biggest problem is how to select the sample points. If they are chosen badly, the final AO image will be very noisy. To avoid such defects as far as possible, StarCraft 2 uses random sampling: a set of random vectors is generated and stored in a texture. Note that the number of random vectors does not have to equal the number of pixels. When generating the AO image, the pixel shader samples this texture and obtains an interpolated random vector. We pass X (8 to 32) random vectors into the shader and reflect each of them by the vector fetched from the texture, which gives X pseudo-random vectors. The lengths of the X input vectors lie in the range 0.5 to 1 rather than 0 to 1, to prevent the sample points from clustering too close to the test point. The lengths are also scaled by an artist-adjustable variable, so the artist can control the sampling range.
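The sample generation just described can be sketched on the CPU in Python. Everything here is illustrative: the function names are made up, the fixed kernel is produced by rejection sampling, and `pixel_random_vector` stands in for the unit vector a shader would fetch from the random texture.

```python
import math
import random


def reflect(v, n):
    """Reflect v about the plane whose unit normal is n: v - 2 (v . n) n."""
    d = sum(a * b for a, b in zip(v, n))
    return tuple(a - 2.0 * d * b for a, b in zip(v, n))


def random_unit_vector(rng):
    """Uniform random direction, by rejection sampling inside the unit ball."""
    while True:
        v = (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
        length = math.sqrt(sum(c * c for c in v))
        if 1e-6 < length <= 1.0:
            return tuple(c / length for c in v)


def make_kernel(count, rng):
    """The X shared input vectors, with lengths in the range 0.5 to 1."""
    kernel = []
    for _ in range(count):
        direction = random_unit_vector(rng)
        length = rng.uniform(0.5, 1.0)
        kernel.append(tuple(c * length for c in direction))
    return kernel


def per_pixel_offsets(kernel, pixel_random_vector, radius):
    """Reflect the shared kernel by the pixel's random vector, then apply
    the artist-controlled radius."""
    return [tuple(c * radius for c in reflect(v, pixel_random_vector))
            for v in kernel]


if __name__ == "__main__":
    rng = random.Random(0)
    kernel = make_kernel(16, rng)
    offsets = per_pixel_offsets(kernel, random_unit_vector(rng), 2.0)
    print(len(offsets), offsets[0])
```

Reflection by a unit vector preserves length, so every offset stays between 0.5 and 1 times the radius, and a pixel gets the same offsets every frame as long as the kernel and the texture do not change.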
 
 
 
Maybe you now have a lot of questions. Why do we pass in X random vectors from outside instead of generating them in the pixel shader? Why do we reflect the X vectors by a pre-generated random vector? Why is the vector used for the reflection randomly generated rather than the pixel normal? Questions like these can spoil our appetite, but unfortunately the original article does not explain them explicitly. Below I give my unofficial guesses about these strange practices. If you disagree, please correct me.
 
 
Why not generate a random vector directly in PS?
There are two hard requirements for AO's random sampling. First, the sample points of each pixel must be random and different from those of other pixels. Because each pixel sits at a different position in 3D space, the sampling positions of each test point must differ in order to minimize noise in the final image, so that the grain is scattered across the screen as evenly as possible instead of clumping together. Second, the same pixel must use the same sample points in different frames. This is obvious: otherwise the screen would flicker even when the scene does not change. Given these two points, generating random vectors in the pixel shader is either impossible or very inefficient and complicated.
 
 
 
Why do we need to use a pre-generated random vector for reflection of X vectors?
I think the intention is obvious. It satisfies the first requirement above, so that the sample points of each pixel are random. Otherwise every pixel would use the same sampling pattern, which also makes the noise clump together. StarCraft 2 achieves this by using a random vector (fetched from the random texture) to reflect the other vectors. Reflection is cheap to compute and randomizes well. Of course, you could also add, multiply, subtract, or apply any other operation to the input vectors and the random vector, as long as the result is randomized.
 
 
 
Why does the vector used for mirroring need to be randomly generated instead of the pixel normal?
I am not sure about this. In fact, the normal is very useful during sampling: it is used to flip any random sample vector that points into the object, because sampling must happen in the hemisphere on the positive side of the normal, and sampling inside the object is simply wrong. My guess is that the normal is not used to perturb the sample vectors of each pixel because it is not random enough.
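The hemisphere constraint can be shown with a tiny Python helper. It is a sketch of the idea only: the name `flip_into_hemisphere` is hypothetical, and negating the vector is one simple way to "reverse" a sample that points into the surface.

```python
def flip_into_hemisphere(offset, normal):
    """Return offset unchanged if it points to the same side as normal,
    otherwise return its negation so that it does."""
    d = sum(a * b for a, b in zip(offset, normal))
    if d < 0.0:
        return tuple(-c for c in offset)
    return offset


if __name__ == "__main__":
    up = (0.0, 0.0, 1.0)
    print(flip_into_hemisphere((0.3, 0.1, -0.9), up))  # gets flipped
    print(flip_into_hemisphere((0.3, 0.1, 0.9), up))   # left alone
```

After the flip, every sample vector has a non-negative dot product with the normal, so no sample starts inside the object.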
 
 
How to evaluate the depth value obtained from sampling?
Now we know how to sample. Next, let's look at the last key point: how to evaluate the samples. Common sense says that sample points close to the test point occlude it more. In principle the occlusion is inversely proportional to the distance, but to keep the evaluation flexible we let the artist control the relationship between occlusion and distance. In any case, this relationship has several properties that must not be violated:
1. If the depth of the sample point is greater than the depth of the test point, the occlusion is 0, because the sample point is behind the test point.
2. The smaller the distance, the greater the occlusion.
3. Once the distance is large enough, the occlusion falls to 0.
 
 
 
The curve of the occlusion function is roughly as shown in the figure. We can bake this function into a one-dimensional texture and look it up in the pixel shader.
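One possible function with those three properties, together with the baking step, is sketched below in Python. The curve shape, the `full_range` cutoff, and the `power` exponent are all assumptions chosen for illustration; the real curve is whatever the artists tune.

```python
def occlusion(delta, full_range=1.0, power=2.0):
    """Occlusion for a depth difference delta = test depth - sample depth.

    delta <= 0 means the sample is at or behind the test point.
    """
    if delta <= 0.0 or delta >= full_range:
        return 0.0                                   # properties 1 and 3
    return (1.0 - delta / full_range) ** power       # property 2


def bake_table(size=64, full_range=1.0):
    """Tabulate the function over 0..full_range, like a 1D lookup texture."""
    return [occlusion(i / (size - 1) * full_range, full_range)
            for i in range(size)]


if __name__ == "__main__":
    table = bake_table(8)
    print([round(v, 3) for v in table])
```

A shader would then replace the function call with a single texture fetch indexed by the normalized depth difference.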
 
 
 
We can improve the AO map further, for example with a Gaussian blur to remove more of the graininess. However, this Gaussian blur needs some special handling, because we cannot simply smear black and white around everywhere: the light and dark areas of the AO map are tied to positions in space. For example, the darkness on a table must not be blurred onto a person standing next to it, because the two are far apart in depth and have nothing to do with each other. Therefore, when we blur, we compare the depth and the normal of each sample point with those of the target point. If the depth difference is too large or the angle between the two normals is too large, the weight of that sample is set to 0. Finally, the total weight of the Gaussian blur is recomputed so that the result is normalized correctly.
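The weighting rule can be sketched for one output pixel in Python. The thresholds `max_depth_diff` and `min_normal_dot` and the tuple layout of the taps are assumptions made for this example.

```python
def blurred_ao(center_ao, center_depth, center_normal, taps,
               max_depth_diff=0.5, min_normal_dot=0.8):
    """Depth- and normal-aware blur for a single pixel.

    taps is a list of (gaussian_weight, ao, depth, normal) tuples.
    """
    total = 0.0
    weight_sum = 0.0
    for weight, ao, depth, normal in taps:
        dot = sum(a * b for a, b in zip(normal, center_normal))
        # Reject taps that are too far away in depth or face another way.
        if abs(depth - center_depth) > max_depth_diff or dot < min_normal_dot:
            continue
        total += weight * ao
        weight_sum += weight
    # Renormalize by the weights that were actually used.
    return total / weight_sum if weight_sum > 0.0 else center_ao


if __name__ == "__main__":
    up = (0.0, 0.0, 1.0)
    taps = [(0.25, 0.0, 10.0, up),   # dark tap on a distant surface: rejected
            (0.5, 1.0, 1.0, up),
            (0.25, 1.0, 1.1, up)]
    print(blurred_ao(1.0, 1.0, up, taps))
```

The dark tap on the distant surface is ignored, so it does not bleed onto the nearby pixel, and dividing by the remaining weight keeps the result correctly normalized.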
 
 
 
Finally, there are a few headaches left to solve. Because the sample vectors are defined in camera space, their projection on the screen becomes longer as the camera approaches an object. Once a sample point falls outside the screen we are in trouble, because we have no depth information beyond the screen. Rendering a region larger than the display area clearly does not solve the problem well. A simpler solution is to return a huge depth value whenever the sample falls outside the screen, which ensures that this sample has no influence on the test point. We can achieve this with the border addressing mode of texture sampling.
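On the CPU, the same behaviour can be imitated with a bounds check, as in this Python sketch. `FAR_DEPTH` is an arbitrary large value chosen for the example, and `depth_buffer` is a plain list of rows standing in for the depth texture.

```python
FAR_DEPTH = 1.0e30  # assumed "huge" depth used as the border colour


def sample_depth(depth_buffer, x, y):
    """Fetch depth at integer pixel (x, y); off-screen fetches return
    FAR_DEPTH, like a texture sampler in border addressing mode."""
    height = len(depth_buffer)
    width = len(depth_buffer[0])
    if 0 <= x < width and 0 <= y < height:
        return depth_buffer[y][x]
    return FAR_DEPTH


if __name__ == "__main__":
    buffer = [[1.0, 2.0],
              [3.0, 4.0]]
    print(sample_depth(buffer, 1, 0), sample_depth(buffer, -1, 0))
```

Because an off-screen sample always looks far behind the test point, it contributes zero occlusion.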
 
 
To keep the SSAO effect from breaking down when the camera is too close to an object, we must limit the sampling range in screen space. If the camera is so close that the SSAO sample points would extend too far, the SSAO area consistency constraint is violated so that the noise does not become too obvious. Another solution is to vary the number of samples with the size of the sampling area, but that hurts the frame rate badly, so it was given up.
 
 
 
(I did not understand what "the SSAO area consistency constraint is violated" meant at first, so I went back through the text. My understanding is as follows. The SSAO area is the region of space that affects the AO of a given pixel. When you approach an object, the volume of this region in 3D space stays the same, but if you get very close, the projection of the region extends beyond the screen. At that point we have to give up the consistency of the SSAO area, for example by shortening the sample vectors so that the sample points fall back inside the screen. My god, I will leave a proper explanation to someone who really understands it.)
 
 
 
SSAO performance analysis
The biggest performance bottleneck of SSAO is sampling. Random sampling badly hurts the coherence of the GPU texture cache, and the size of the sampled texture area directly affects texture cache performance, so we can only use a 1/4-size depth buffer to improve performance. As the sampling area grows, the shading in dark areas becomes softer. Now we face a dilemma: the artists want strong light-dark contrast in the detailed areas of the scene, while flat areas must have wide, soft occlusion. To meet this requirement, we split the SSAO sample points into two groups. One group samples within a small range, and the other samples over a wide range with a relatively flat depth-occlusion function. Each group computes its own occlusion factor, and in the end we use the larger of the two.
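The combination step is small enough to show directly. In this Python sketch each group is assumed to average the occlusion of its own samples before the two results are compared; the averaging and the function name are assumptions, only the "take the larger one" rule comes from the text.

```python
def combined_occlusion(small_range_samples, large_range_samples):
    """Evaluate both sample groups separately and keep the stronger result."""
    small = sum(small_range_samples) / len(small_range_samples)
    large = sum(large_range_samples) / len(large_range_samples)
    return max(small, large)


if __name__ == "__main__":
    # A crease: strong local occlusion, weak wide-area occlusion.
    print(combined_occlusion([1.0, 0.5], [0.25, 0.25]))
```

Taking the maximum lets sharp small-scale shadows and soft large-scale shading coexist without one washing out the other.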
 
 
 
Depth of Field Effect
With deferred shading, every frame already has a depth buffer and a normal buffer. Since these buffers are generated anyway, we should make good use of them. For example, our story mode uses depth of field very often. Next, we discuss the problems we ran into when implementing depth of field and how we solved them.
 
 
Circle of confusion
Light reflected from any point on an object is imaged through the lens. If the receiving plane sits exactly on the image plane, the image of that point is, in theory, also a point. Otherwise it spreads into a circle, and when the image of every point on an object becomes a circle, the object naturally looks blurred. The so-called depth of field is the maximum distance by which the receiving plane may be offset in front of or behind the image plane (for a more precise explanation, see articles on photography). We want the artists to control the DOF effect conveniently rather than physically accurately, so we define the following parameters for them to adjust: the focus distance focaldepth, the in-focus range noblurrange, and the maximum blur range maxblurrange. We then define a blur factor F:
a0 = dofamount * max(0, abs(depth - focaldepth) - noblurrange)
a1 = maxblurrange - noblurrange
F = a0 / a1
F = 0 means no blur and F = 1 means fully blurred, so the blur factor varies linearly with depth. Usually we enlarge the sampling range of the Gaussian blur kernel according to F to get the desired effect, but the result is not ideal. The perfect solution would be to increase the number of sample points, but neither performance nor hardware support allows that.
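The blur factor is easy to check numerically. The Python sketch below follows the formula directly; the final clamp to 1 is my own assumption, added so that depths beyond the maximum blur range stay at "fully blurred".

```python
def blur_factor(depth, focaldepth, noblurrange, maxblurrange, dofamount=1.0):
    """Blur factor F: 0 means sharp, 1 means fully blurred."""
    a0 = dofamount * max(0.0, abs(depth - focaldepth) - noblurrange)
    a1 = maxblurrange - noblurrange
    return min(1.0, a0 / a1)  # clamp is an assumption, not in the formula


if __name__ == "__main__":
    for depth in (10.0, 13.0, 30.0):
        print(depth, blur_factor(depth, focaldepth=10.0,
                                 noblurrange=2.0, maxblurrange=6.0))
```

With a focus distance of 10, everything within 2 units stays sharp, the factor then rises linearly, and it reaches 1 at a distance of 6 units from the focus plane.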
 
 
 
In fact, the ultimate solution mentioned in Shawn Hargreaves' slides is depth peeling, which achieves order-independent alpha blending. However, its performance is said to be "amazing", so it is not practical yet, although it cannot be ruled out that GPUs will be fast enough for it in N years. Woo shadowing is similar in principle. Please search Google for papers on the subject.
 
 
Some open problems:
Although post-processing engines are the trend, there are still several troublesome problems to be solved.
1. A perfect alpha solution.
2. Support for special lighting materials. For example, a person's hair, clothing, and skin must be rendered with different lighting models: hair is anisotropic, clothing is diffuse, and skin has SSS. It is very important to know which shading method each pixel uses. Judging from SC2 screenshots, this problem does not seem to be solved well. For example, the Zerg should look wet and slimy, the Terran should mostly be metal surfaces with rough coatings, and Protoss objects should look like highly reflective materials. Right now, all objects look like plasticine, especially in strategy mode.
3. Some miscellaneous issues, such as reflections and shadows.
 
 
 
(Due to problems with my machine, the images cannot be pasted for now. They will be added later.)
 
 
 
From this technical analysis we can conclude that, with all effects turned on, StarCraft 2 must be very demanding on hardware. If you are planning to buy a computer, you may want to wait and see.