This article is a record of 18 hours of continuous work. Even today, during my internship at Siemens CNC, I was still writing and sketching on scratch paper. I think this technique is worth the effort. Maybe I am just not clever enough, or my foundations are too weak, but I have not managed to solve the problem completely.
Whenever we use the GPU for ray tracing, we need a way to accelerate scene traversal. Volume rendering implementations mostly use a fixed uniform grid, which is less efficient. The kd-tree shows up only in the papers on a few GPU ray tracers from Stanford. For how to partition a model with a kd-tree, see the article I wrote earlier; the code published by many well-known CG people also contains their own kd-tree implementations. The question is: how do we implement it on the GPU?
The GPU has many advantages, chiefly its very fast SIMD/MIMD processing units and its specially optimized texture addressing; in both respects the CPU is far behind. The critical limitation, however, is that current GPUs cannot access memory freely from a shader: there are no pointers, and dynamic memory allocation is impossible (the allocated memory would, of course, have to be video memory). With CUDA this can be done, but few people own a G80-series graphics card, so that route is not practical yet.
My idea is to add three texture maps to the shader. One is the kd-tree flattened into a 2D texture, and the other two hold the bounding boxes, storing the two corner points respectively. To traverse the triangles, we also need to store their three vertices, in order, in three textures. Because the kd-tree also has to work together with the triangle index texture, we must prepare a "chaotic index array", hereinafter referred to as the CIA (not that CIA). The following describes in detail how these textures work in the shader.
We all know the structure of the kd-tree. It has two types of nodes: node and leaf. The node type records the dimension and position of the current split and stores no geometry at all. At the bottom of the tree almost everything is of the leaf type. The leaf type records the number of triangles and the starting offset in the CIA. Why the CIA instead of the standard index starting from 0? Because my kd-tree construction uses stable_partition from the STL algorithms to process the triangle array, so the triangles belonging to a leaf are no longer in their original order.
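A minimal C++ sketch of why the indices end up shuffled, assuming the builder partitions an array of triangle indices by centroid. The helper names `identity_indices` and `split_range`, the `centroid` array, and the centroid-based predicate are hypothetical and chosen only for illustration; the actual builder may classify triangles differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: the index array before any split: 0, 1, 2, ...
std::vector<int> identity_indices(std::size_t n) {
    std::vector<int> idx(n);
    std::iota(idx.begin(), idx.end(), 0);
    return idx;
}

// Hypothetical helper: reorder cia[first, last) so that triangles whose
// centroid lies below split_pos come first (relative order is preserved).
// Returns the position where the right-hand part starts.
std::size_t split_range(std::vector<int>& cia, std::size_t first, std::size_t last,
                        const std::vector<float>& centroid, float split_pos) {
    auto begin = cia.begin() + static_cast<std::ptrdiff_t>(first);
    auto end   = cia.begin() + static_cast<std::ptrdiff_t>(last);
    auto mid = std::stable_partition(
        begin, end, [&](int tri) { return centroid[tri] < split_pos; });
    return static_cast<std::size_t>(mid - cia.begin());
}
```

With centroids {0.9, 0.1, 0.7, 0.3, 0.5} and a split at 0.5, the array 0 1 2 3 4 becomes 1 3 0 2 4 and the function returns 2. The left leaf would then store offset 0 and count 2, and the right leaf offset 2 and count 3, which is exactly the (offset, count) pair a leaf keeps for the CIA.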
In the shader we first compute the origin and direction of the ray, which is simple. We then pass in the position of the bounding box in world space (or simply draw only the bounding box) and start traversing the kd-tree texture in the fragment shader. First, fetch the first integer at the base of the texture. If it is 1, 2, or 3, the record is a node (for the first record it always is). For a node, read the splitpos entry that follows it, look up the actual split position in splitposarray, and decide whether the ray goes left or right. To descend into the left child, for example, add the stored offset to the t component of the texture coordinate and start a new lookup. Repeat until a leaf is found, then read its offset and triangle count and start the intersection tests.
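To check such a texture layout off-line, the descent can be mimicked on the CPU. The sketch below is only an illustration under assumptions: the six-value record, the use of slot 3 for the split-position index, and the names `kRecord`, `Leaf`, and `find_leaf` are all hypothetical, and the left/right decision is simplified to comparing a single point against the split plane instead of testing a ray.

```cpp
#include <cassert>
#include <vector>

// Assumed record layout, six integers per kd-tree record:
//   [0] dim: 1, 2, 3 = split on x, y, z; 4 = leaf
//   [1] offset (in records) to the left child
//   [2] offset (in records) to the right child
//   [3] index into the split-position array (assumption)
//   [4] leaf only: start offset in the CIA
//   [5] leaf only: number of triangles
constexpr int kRecord = 6;

struct Leaf { int offset; int count; };

// Follow a single path from the first record down to the leaf containing p.
// Assumes a well-formed tree; a malformed one would loop or read out of range.
Leaf find_leaf(const std::vector<int>& tex,
               const std::vector<float>& split_pos,
               const float p[3]) {
    int cur = 0;  // current record, i.e. the coordinate the shader advances
    for (;;) {
        const int* rec = &tex[static_cast<std::size_t>(cur) * kRecord];
        if (rec[0] == 4) return Leaf{rec[4], rec[5]};  // reached a leaf
        float split = split_pos[static_cast<std::size_t>(rec[3])];
        int axis = rec[0] - 1;                          // 0, 1, 2 for x, y, z
        cur += (p[axis] < split) ? rec[1] : rec[2];     // relative jump
    }
}
```

Because the child links are relative offsets, the whole tree can be copied into a texture without any pointer fix-up, which is the property the shader depends on.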
All textures are rectangle textures (GL_TEXTURE_RECTANGLE_NV) sampled with nearest filtering.
The pseudo-GLSL code is as follows:
ivec2 traversalptr = ivec2(0, 0);
bool found = false;
int offset = 0, count = 0;
while (!found) {
    // The first value of a record is the split dimension; 4 marks a leaf.
    int dim = int(texture2D(kdaccmap, traversalptr.st).x);
    if (dim == 4) {
        found = true;
        // A leaf stores its start offset in the CIA and its triangle count.
        offset = int(texture2D(kdaccmap, traversalptr + ivec2(0, 4)).x);
        count = int(texture2D(kdaccmap, traversalptr + ivec2(0, 5)).x);
        continue;
    }
    testray(); // decides whether the ray goes left or right of the split
    if (need_goto_left) {
        traversalptr.s += int(texture2D(kdaccmap, traversalptr + ivec2(0, 1)).x);
    } else if (need_goto_right) {
        traversalptr.s += int(texture2D(kdaccmap, traversalptr + ivec2(0, 2)).x);
    }
}
for (int i = 0; i < count; i++) {
    // Use the CIA and the standard triangle index to fetch the vertices
}
Looks pretty good, right? But there is a serious problem: what do we do when the ray has to traverse both the left and the right child? CPU implementations and the OpenRT design use a priority stack here. But how do we get one in a shader? Ignoring the case would cause errors only with small probability, but we still have to deal with it.
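For reference, this is roughly what stack-based traversal looks like on the CPU, as a C++ sketch. It is neither a shader nor OpenRT's code: the `FlatNode` record with absolute child indices, the `StackEntry` type, the function `visit_leaves`, and the fixed stack size of 32 are assumptions made for illustration. Triangle tests and early termination at the first hit are left out; the sketch only records the order in which leaves are visited.

```cpp
#include <cassert>
#include <vector>

// Hypothetical flat node record (absolute child indices for simplicity).
struct FlatNode {
    int   dim;     // 1, 2, 3 = split on x, y, z; 4 = leaf
    float split;   // split position (inner nodes only)
    int   left;    // index of the left child
    int   right;   // index of the right child
    int   offset;  // leaf: start offset in the CIA
    int   count;   // leaf: number of triangles
};

struct StackEntry { int node; float tmin, tmax; };

constexpr int kMaxStack = 32;  // assumed bound on tree depth

// Returns the leaves pierced by org + t * dir, t in [tmin, tmax], front to back.
std::vector<int> visit_leaves(const std::vector<FlatNode>& nodes,
                              const float org[3], const float dir[3],
                              float tmin, float tmax) {
    std::vector<int> visited;
    StackEntry stack[kMaxStack];
    int top = 0;
    stack[top++] = StackEntry{0, tmin, tmax};
    while (top > 0) {
        StackEntry e = stack[--top];
        int n = e.node;
        float t0 = e.tmin, t1 = e.tmax;
        while (nodes[n].dim != 4) {
            const FlatNode& nd = nodes[n];
            int axis = nd.dim - 1;
            bool left_first = org[axis] < nd.split ||
                              (org[axis] == nd.split && dir[axis] <= 0.0f);
            int first  = left_first ? nd.left : nd.right;
            int second = left_first ? nd.right : nd.left;
            if (dir[axis] == 0.0f) { n = first; continue; }  // parallel to plane
            float t = (nd.split - org[axis]) / dir[axis];
            if (t > t1 || t <= 0.0f) {
                n = first;                     // plane is beyond the segment
            } else if (t < t0) {
                n = second;                    // segment starts past the plane
            } else {
                assert(top < kMaxStack);
                stack[top++] = StackEntry{second, t, t1};  // far child for later
                n = first;
                t1 = t;
            }
        }
        visited.push_back(n);  // a real tracer would intersect triangles here
    }
    return visited;
}
```

The stack is what lets the ray come back to the far child after the near leaf produced no hit. A fragment shader of this generation has no writable array to play that role, which is exactly the difficulty raised above.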