PBRT Reading Notes: Chapter 4, Primitives and Intersection Acceleration, Section 4.4 (Construction)
Source: Internet
Author: User
4.4 kd-Tree Accelerator
Binary space partitioning (BSP) trees adaptively subdivide space into regions of varying sizes. Compared with uniform grids, a BSP tree is a more efficient data structure for scenes whose geometry is unevenly distributed. Building a BSP tree starts with a bounding box that encloses the entire scene. If the number of primitives in the box exceeds a threshold, the box is split in two by a plane. Each primitive is associated with the half it overlaps; a primitive that overlaps both halves is associated with both. This splitting is applied recursively until each leaf region contains a sufficiently small number of primitives or the recursion reaches a given maximum depth. Because the splitting plane can be placed at any position, and different parts of 3D space can be refined to different degrees, BSP trees handle unevenly distributed geometry easily.
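The recursive subdivision described above can be sketched in a few lines. This is only an illustration under simplifying assumptions that are not part of pbrt: space is one-dimensional, primitives are intervals, and the splitting plane is always placed at the midpoint of the region. The names Interval, BspNode, build, and countLeaves are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical 1D stand-in for a primitive's bounding box.
struct Interval { float lo, hi; };

struct BspNode {
    std::vector<int> prims;                 // filled in for leaves only
    std::unique_ptr<BspNode> below, above;  // both null for leaves
    bool isLeaf() const { return !below; }
};

// Recursively split the region [lo, hi] until a node holds at most
// maxPrims primitives or the remaining depth budget reaches zero.
std::unique_ptr<BspNode> build(const std::vector<Interval> &all,
                               const std::vector<int> &prims,
                               float lo, float hi,
                               int maxPrims, int depth) {
    assert(lo <= hi);
    std::unique_ptr<BspNode> node(new BspNode);
    if ((int)prims.size() <= maxPrims || depth == 0) {
        node->prims = prims;
        return node;
    }
    float split = 0.5f * (lo + hi);  // assumption: midpoint split
    std::vector<int> belowPrims, abovePrims;
    for (int id : prims) {
        // A primitive that straddles the plane is referenced by both children.
        if (all[id].lo < split) belowPrims.push_back(id);
        if (all[id].hi >= split) abovePrims.push_back(id);
    }
    node->below = build(all, belowPrims, lo, split, maxPrims, depth - 1);
    node->above = build(all, abovePrims, split, hi, maxPrims, depth - 1);
    return node;
}

int countLeaves(const BspNode *n) {
    return n->isLeaf() ? 1
                       : countLeaves(n->below.get()) + countLeaves(n->above.get());
}
```

The depth argument plays the role of the maximum recursion depth: when it reaches zero the node becomes a leaf regardless of how many primitives it holds.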
Two BSP variants are common: the kd-tree and the octree. The kd-tree requires each splitting plane to be perpendicular to one of the coordinate axes, which makes building and traversing the tree efficient at the cost of some flexibility in how space is subdivided. The octree uses three axis-perpendicular planes to split a region into eight subregions at once (usually at the center of the region). In this section we introduce the KdTreeAccel class, which implements a kd-tree for accelerating ray intersection.
<KdTreeAccel Declarations> =
class KdTreeAccel : public Aggregate {
public:
    <KdTreeAccel Public Methods>
private:
    <KdTreeAccel Private Data>
};
The KdTreeAccel constructor takes the primitives to be stored and several parameters that control how the tree is built. These parameters are stored in member variables for later use. For simplicity, KdTreeAccel requires that all of its primitives be intersectable. Therefore, before the tree is built, the constructor refines every non-intersectable primitive into intersectable ones.
<KdTreeAccel Method Definitions> =
KdTreeAccel::KdTreeAccel(const vector<Reference<Primitive> > &p,
        int icost, int tcost,
        float ebonus, int maxp, int maxDepth)
    : isectCost(icost), traversalCost(tcost), maxPrims(maxp), emptyBonus(ebonus) {
    vector<Reference<Primitive> > prims;
    // Refine every input primitive until it can be intersected directly
    for (u_int i = 0; i < p.size(); ++i)
        p[i]->FullyRefine(prims);
    <Initialize mailboxes for KdTreeAccel>
    <Build kd-tree for accelerator>
}
<KdTreeAccel Private Data> =
int isectCost, traversalCost, maxPrims;
float emptyBonus;
Like GridAccel, the kd-tree uses mailboxing to avoid repeated intersection tests against the same primitive. In fact, it uses the same MailboxPrim structure.
<Initialize mailboxes for KdTreeAccel> =
curMailboxId = 0;
nMailboxes = prims.size();
mailboxPrims = (MailboxPrim *)AllocAligned(nMailboxes * sizeof(MailboxPrim));
// Construct each MailboxPrim in place in the aligned buffer
for (u_int i = 0; i < nMailboxes; ++i)
    new (&mailboxPrims[i]) MailboxPrim(prims[i]);
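The idea behind mailboxing can be illustrated with a minimal sketch. It assumes that every ray carries a unique integer id and that each primitive remembers the id of the last ray tested against it. MailboxedPrim and testOnce are hypothetical names and do not reproduce the actual MailboxPrim structure.

```cpp
#include <cassert>

// Hypothetical stand-in for a primitive wrapped with a mailbox slot.
struct MailboxedPrim {
    int lastRayId = -1;  // id of the last ray tested against this primitive
    int testsRun = 0;    // how often the (expensive) test actually ran
};

// Runs the intersection test only if this ray has not yet been tested
// against the primitive. Returns true when the test was run.
bool testOnce(MailboxedPrim &prim, int rayId) {
    assert(rayId >= 0);
    if (prim.lastRayId == rayId)
        return false;  // already tested in another leaf, skip
    prim.lastRayId = rayId;
    ++prim.testsRun;   // the real ray-primitive test would go here
    return true;
}
```

Because a primitive that straddles a splitting plane is referenced by several leaves, a ray that visits those leaves would otherwise test it once per leaf.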
The kd-tree is a binary tree in which every interior node has two children and every leaf node stores the primitives that overlap it. Each interior node must provide three pieces of information:
· Split axis: which axis (x, y, or z) the node is split along.
· Split position: the position of the splitting plane along that axis.
· Children: the information needed to reach the two child nodes.
Each leaf node only needs to record which primitives overlap its region.
We go to some extra trouble so that every interior node and many leaf nodes occupy just 8 bytes (assuming 4-byte floats and pointers), so that four nodes exactly fill a 32-byte cache line. Because the tree has many nodes and each ray visits many of them, reducing the size of a node greatly improves cache efficiency. Our first implementation used 16 bytes per node; switching to 8 bytes gave a speedup of about 20%. Leaf nodes and interior nodes are both represented by the same KdAccelNode structure, which is built from unions. In its declaration, the comment on each union member indicates whether the member is used by interior nodes, leaf nodes, or both.
The low two bits of KdAccelNode::flags distinguish interior nodes split along x, y, or z (values 0, 1, and 2, respectively) from leaf nodes (value 3).
Fitting a leaf node into 8 bytes is relatively easy: because the low two bits of KdAccelNode::flags mark the node as a leaf, the upper 30 bits of KdAccelNode::nPrims are available to record how many primitives overlap it. As with GridAccel, if only one primitive overlaps the leaf, a pointer to its MailboxPrim is stored directly in KdAccelNode::onePrimitive. If several primitives overlap it, an array of these pointers is allocated dynamically and KdAccelNode::primitives points to that array.
Initializing a leaf node is simple: the primitive count is shifted left by two bits before it is stored, and the low two bits of KdAccelNode::flags are then set to 3 to mark the node as a leaf.
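The leaf encoding can be sketched as follows. This is not the KdAccelNode declaration itself: to keep the node at 8 bytes on 64-bit machines, the sketch assumes a 32-bit index in place of the 4-byte pointers the text relies on, and it uses plain integer words instead of unions. PackedNode, word0, word1, and primListIndex are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of an 8-byte node. A 32-bit index stands in for the 4-byte
// pointers assumed by the text (an assumption made for portability).
struct PackedNode {
    uint32_t word0;  // leaf: (nPrims << 2) | 3
    uint32_t word1;  // leaf: index of the primitive list (hypothetical)

    void initLeaf(uint32_t nPrims, uint32_t primListIndex) {
        assert(nPrims < (1u << 30));  // the count must fit in the high 30 bits
        word0 = (nPrims << 2) | 3u;   // low two bits == 3 marks a leaf
        word1 = primListIndex;
    }
    bool isLeaf() const { return (word0 & 3u) == 3u; }
    uint32_t primCount() const { return word0 >> 2; }
};

static_assert(sizeof(PackedNode) == 8, "node must stay 8 bytes");
```

Reading the count back is a single right shift, so the packing adds almost no cost during traversal.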
Thanks to KdAccelNode::onePrimitive, no memory has to be allocated for leaves with zero or one primitive. When several primitives overlap a leaf, the caller passes in a MemoryArena (see Section A.2.4), which is used to allocate the array of MailboxPrim pointers. The MemoryArena class reduces wasted space and places these arrays close together in memory, which improves cache efficiency.
<Store MailboxPrim *s for leaf node> =
if (np == 0)
    onePrimitive = NULL;
else if (np == 1)
    onePrimitive = &mailboxPrims[primNums[0]];
else {
    // Several primitives: allocate a pointer array from the arena
    primitives = (MailboxPrim **)arena.Alloc(np * sizeof(MailboxPrim *));
    for (int i = 0; i < np; ++i)
        primitives[i] = &mailboxPrims[primNums[i]];
}
Representing an interior node in 8 bytes takes more work. As explained earlier, the low two bits of KdAccelNode::flags record the split axis. However, the split position KdAccelNode::split is a floating-point number that shares the same memory as KdAccelNode::flags. At first this seems impossible: we cannot tell the compiler to treat only the upper 30 bits of KdAccelNode::split as a floating-point number.
However, as long as the low two bits of KdAccelNode::flags are set after KdAccelNode::split has been written, we get the desired effect. The technique relies on the memory layout of IEEE floating-point numbers: the two bits used by KdAccelNode::flags are the two least significant bits of the float's mantissa, so overwriting them changes the floating-point value only slightly.
Although this trick is rather obscure, the performance gained from 8-byte tree nodes makes it worthwhile. Moreover, a few KdAccelNode methods hide all of this complexity, so the rest of the implementation is unaffected by it.
We do not need extra memory in an interior node for pointers to both children. All nodes are stored in one contiguous block of memory, and the child representing the region below the splitting plane is stored immediately after its parent (this also improves cache efficiency, because at least one child is adjacent to its parent). The other child, representing the region above the splitting plane, is stored elsewhere in the array, and KdAccelNode::aboveChild records its position.
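The array layout can be illustrated with a small sketch that appends a complete binary tree to a vector in depth-first order. FlatNode and appendSubtree are hypothetical names and the nodes carry no geometry. The point is only that the below child always lands at the index right after its parent, so only the above child's index needs to be stored.

```cpp
#include <cassert>
#include <vector>

struct FlatNode {
    bool leaf;
    int aboveChild;  // interior only: index of the child above the plane
};

// Appends a complete tree of the given depth in depth-first order and
// returns the index of its root. (Hypothetical helper for illustration.)
int appendSubtree(std::vector<FlatNode> &nodes, int depth) {
    int self = (int)nodes.size();
    nodes.push_back({depth == 0, -1});
    if (depth > 0) {
        int below = appendSubtree(nodes, depth - 1);
        assert(below == self + 1);  // below child directly follows its parent
        int above = appendSubtree(nodes, depth - 1);
        // Store an index rather than a pointer so that the link stays
        // valid when the vector reallocates.
        nodes[self].aboveChild = above;
    }
    return self;
}
```

For a tree of depth 2 this yields seven nodes, with the root's above child at index 4, after the entire below subtree.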
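With these conventions in place, we can initialize an interior node. The split position is stored before the split axis is written into KdAccelNode::flags, and the axis is written by changing only the low two bits; assigning the axis to KdAccelNode::flags directly would overwrite the split position.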
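A minimal sketch of this initialization order follows. It assumes 32-bit IEEE floats and uses std::memcpy instead of a union to move bits between the float and the integer word. InteriorNode and its members are hypothetical names, not the KdAccelNode interface.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");

struct InteriorNode {
    uint32_t word0;       // float split with the axis in the low 2 bits
    uint32_t aboveChild;  // array index of the child above the plane

    void init(int axis, float split, uint32_t above) {
        assert(axis >= 0 && axis <= 2);
        std::memcpy(&word0, &split, sizeof word0);  // write the split first
        word0 = (word0 & ~3u) | uint32_t(axis);     // then replace the low 2 bits
        aboveChild = above;
    }
    int splitAxis() const { return int(word0 & 3u); }
    float splitPos() const {
        float s;
        std::memcpy(&s, &word0, sizeof s);  // low 2 bits still hold the axis
        return s;
    }
};
```

Overwriting the two lowest mantissa bits moves the stored value by at most three units in the last place, so for a split at 0.7 the value read back differs from the original by less than one part in a million.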
The kd-tree is built with a top-down recursive algorithm. At each step we have an axis-aligned region of space and the set of primitives that overlap it. The region is either split into two subregions, in which case it becomes an interior node, or it is turned into a leaf node, which ends the recursion.
In the discussion of KdAccelNode we mentioned that all tree nodes are stored in one contiguous array. KdTreeAccel::nextFreeNode records the next available node in that array, and KdTreeAccel::nAllocedNodes records the number of nodes for which memory has been allocated. Both are initialized to 0, which ensures that memory is allocated as soon as the first tree node is initialized.
If the user does not pass a maximum tree depth to the constructor, a default value must be chosen. Although the recursion normally terminates on its own at some depth, setting a maximum depth is still important because it prevents excessive memory use in pathological cases. We found 8 + 1.3 log2(N), where N is the number of primitives, to be a reasonable maximum depth for most scenes.
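As a worked example of this heuristic, the sketch below evaluates the formula for a given primitive count. It assumes the logarithm is base 2 and truncated to an integer before scaling, and that the result is rounded to the nearest integer. floorLog2 and defaultMaxDepth are hypothetical helpers, not functions from pbrt.

```cpp
#include <cassert>
#include <cmath>

// Floor of log2(n) for n >= 1; assumed here to mirror what an integer
// log2 helper would return.
int floorLog2(unsigned n) {
    assert(n >= 1);
    int r = 0;
    while (n >>= 1) ++r;
    return r;
}

// Default depth limit: 8 + 1.3 * log2(N), rounded to the nearest integer.
int defaultMaxDepth(unsigned nPrims) {
    return (int)std::lround(8.0f + 1.3f * floorLog2(nPrims));
}
```

Under these assumptions a scene with 1024 primitives gets a depth limit of 8 + 1.3 * 10 = 21, and a scene with a million primitives gets 33, so the limit grows slowly with scene size.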
<Build kd-tree for accelerator> =
nextFreeNode = nAllocedNodes = 0;
if (maxDepth <= 0)
    maxDepth = Round2Int(8 + 1.3f * Log2Int(float(prims.size())));
<Compute bounds for kd-tree construction>
<Allocate working memory for kd-tree construction>
<Initialize primNums for kd-tree construction>
<Start recursive construction of kd-tree>
<Free working memory for kd-tree construction>
<KdTreeAccel Private Data> +=
KdAccelNode *nodes;
int nAllocedNodes, nextFreeNode;
Because the construction routines use the primitives' bounding boxes repeatedly, we store them in a vector before building the tree. This avoids repeated calls to Primitive::WorldBound(), which may be slow.
<Compute bounds for kd-tree construction> =
vector<BBox> primBounds;
primBounds.reserve(prims.size());
for (u_int i = 0; i < prims.size(); ++i) {
    BBox b = prims[i]->WorldBound();
    bounds = Union(bounds, b);  // grow the overall scene bound
    primBounds.push_back(b);    // cache the per-primitive bound
}