KD Tree Construction and Search


Construction Algorithm
The k-d tree is a binary tree in which each node represents a range of space. Table 1 lists the fields contained in each node of the k-d tree.

Table 1: Fields of each node in the k-d tree

Field      Type          Description
Node-data  Data vector   A data point from the dataset; a k-dimensional vector.
Range      Space vector  The region of space represented by the node.
Split      Integer       Index of the coordinate axis perpendicular to the node's splitting hyperplane.
Left       k-d tree      The k-d tree built from all data points lying in the left subspace of the node's splitting hyperplane.
Right      k-d tree      The k-d tree built from all data points lying in the right subspace of the node's splitting hyperplane.
Parent     k-d tree      The parent node.
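The node layout in Table 1 maps naturally onto a small class. The following Python sketch is not part of the original article; it is one possible rendering of those fields (the Range field is omitted for brevity).

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class KDNode:
    """One k-d tree node, mirroring the fields of Table 1."""
    node_data: Sequence[float]          # the k-dimensional data point stored at this node
    split: int                          # index of the axis perpendicular to the splitting hyperplane
    left: Optional["KDNode"] = None     # k-d tree over the points in the left subspace
    right: Optional["KDNode"] = None    # k-d tree over the points in the right subspace
    parent: Optional["KDNode"] = None   # parent node (None for the root); Range is omitted here
```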
From the description of the node fields above, it can be seen that constructing a k-d tree is a step-by-step recursive process. Table 2 gives pseudo-code for building the tree.

Table 2: Pseudo-code for building a k-d tree
Algorithm: CreateKDTree (build a k-d tree)
Input: the data point set Data-set and the space Range it occupies
Output: Kd, a k-d tree

1. If Data-set is empty, return an empty k-d tree.
2. Call the node-generation procedure:
   (1) Determine the Split field: compute the variance of the data along every dimension. Taking the SURF feature as an example, the descriptor is 64-dimensional, so 64 variances are computed; the dimension with the largest variance becomes the value of Split. A large variance means the data is widely spread along that axis, so splitting along that direction separates the points best.
   (2) Determine the Node-data field: sort Data-set by the value of its Split dimension and take the median point as Node-data. Then set Data-set' = Data-set \ {Node-data} (all points except Node-data).
3. Dataleft = {d ∈ Data-set' : d[Split] ≤ Node-data[Split]}, Left-range = {Range ∩ Dataleft};
   Dataright = {d ∈ Data-set' : d[Split] > Node-data[Split]}, Right-range = {Range ∩ Dataright}.
4. Left = the k-d tree built from (Dataleft, Left-range), i.e. recursively call CreateKDTree(Dataleft, Left-range), and set Left's Parent field to Kd; Right = the k-d tree built from (Dataright, Right-range), i.e. recursively call CreateKDTree(Dataright, Right-range), and set Right's Parent field to Kd.
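As a concrete illustration of Table 2, the sketch below builds a k-d tree in Python, reusing the KDNode class from the earlier sketch. It chooses the split axis by maximum variance and the median point along that axis; the Range bookkeeping of the pseudo-code is left out, and the function and variable names are illustrative, not from the original article.

```python
def build_kdtree(points, parent=None):
    """Recursively build a k-d tree from a list of k-dimensional points (Table 2)."""
    if not points:
        return None                      # step 1: an empty set yields an empty tree

    k = len(points[0])

    # Step 2(1): pick the split axis as the dimension with the largest variance.
    def variance(axis):
        values = [p[axis] for p in points]
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    split = max(range(k), key=variance)

    # Step 2(2): sort by the split coordinate and take the median point as Node-data.
    points = sorted(points, key=lambda p: p[split])
    median = len(points) // 2
    node = KDNode(node_data=points[median], split=split, parent=parent)

    # Steps 3-4: recurse on the points on each side of the splitting hyperplane.
    node.left = build_kdtree(points[:median], parent=node)
    node.right = build_kdtree(points[median + 1:], parent=node)
    return node

# The 2-D example used below: the root splits on x and stores (7,2).
tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(tree.node_data)                    # (7, 2)
```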
Consider the simple 2-D example with the data points (2,3), (5,4), (9,6), (4,7), (8,1) and (7,2). Because the data are only 2-dimensional, the two coordinate axes x and y can simply be numbered 0 and 1, i.e. Split ∈ {0, 1}. The construction proceeds as follows:

(1) Determine the first value of the Split field. The variance of the data is computed along x and y; it is largest along x, so Split is first set to 0, i.e. the x axis.

(2) Determine the Node-data field. Sorting by the x coordinates 2, 5, 9, 4, 8, 7, the point with the median x value 7 is chosen, so Node-data = (7,2). The splitting hyperplane of this node is therefore the line x = 7, passing through (7,2) and perpendicular to axis Split = 0 (the x axis).

(3) Determine the left and right subspaces. The splitting hyperplane x = 7 divides the whole space into two parts, as shown in Figure 2. The part with x ≤ 7 is the left subspace and contains three points, {(2,3), (5,4), (4,7)}; the other part is the right subspace and contains two points, {(9,6), (8,1)}.

As the algorithm describes, constructing the k-d tree is a recursive process. Repeating the root-node procedure on the data in the left and right subspaces produces the next level of child nodes, (5,4) and (9,6) (the "roots" of the left and right subspaces), further subdividing the space and the dataset. This repeats until each subspace contains only one data point, as shown in Figure 1. The resulting k-d tree is shown in Figure 3.

Search Algorithm
Searching the k-d tree is also an important part of feature matching; its purpose is to retrieve the data point in the k-d tree that is closest to a query point. The basic idea of nearest-neighbor lookup is first described with a simple example. The asterisk marks the query point (2.1, 3.1). A binary search down the tree quickly yields a first approximation along the search path, the leaf node (2,3). However, the leaf found this way is not necessarily the true nearest neighbor: any point that is actually closer to the query must lie inside the circle centered at the query point and passing through the found leaf. To find the true nearest neighbor, the algorithm must also "backtrack": it walks back along the search path and checks whether other subtrees contain data points closer to the query point.

In this example, the binary search starts at (7,2), moves to (5,4), and finally reaches (2,3), so the nodes on the search path are <(7,2), (5,4), (2,3)>. First (2,3) is taken as the current nearest neighbor; its distance to the query point (2.1, 3.1) is 0.1414. The search then backtracks to the parent (5,4) and checks whether the other child subtree of that node could contain a data point closer to the query point. A circle of radius 0.1414 is drawn around (2.1, 3.1), as shown in Figure 4. The circle does not intersect the hyperplane y = 4, so there is no need to enter the right subspace of (5,4). Backtracking again to (7,2), the circle of radius 0.1414 around (2.1, 3.1) does not intersect the hyperplane x = 7 either, so the right subspace of (7,2) is not entered. At this point every node on the search path has been backtracked, the search ends, and the nearest neighbor (2,3) is returned with distance 0.1414.

A slightly more complex example is the query point (2, 4.5). Again a binary search is performed first: from (7,2) the search reaches (5,4), whose splitting hyperplane is y = 4; since the query point's y value is 4.5, the search enters the right subspace and reaches (4,7), forming the search path <(7,2), (5,4), (4,7)>. (4,7) is taken as the current nearest neighbor, at distance 3.202 from the query point. Backtracking to (5,4), its distance to the query point is 3.041, which is closer, so (5,4) becomes the current nearest neighbor. The circle of radius 3.041 around (2, 4.5), shown in Figure 5, intersects the hyperplane y = 4, so the left subspace of (5,4) must also be searched and the node (2,3) is added, making the search path <(7,2), (2,3)>. The leaf (2,3) is closer to (2, 4.5) than (5,4), so the nearest neighbor is updated to (2,3) and the nearest distance to 1.5. Backtracking to (7,2), the circle of radius 1.5 around (2, 4.5) does not intersect the splitting hyperplane x = 7, as shown in Figure 6. The search path is now exhausted; the nearest neighbor (2,3) is returned with distance 1.5.

The pseudo-code for the k-d tree query algorithm is shown below.
    1. Starting from the root node, search depth-first down to a leaf node, pushing each visited node onto a stack.
    2. When a leaf node is reached, take it as the current nearest neighbor.
    3. Then backtrack through the stack: if the node being examined is closer to the query point than the current nearest neighbor, update the nearest neighbor. Next, check whether the circle centered at the query point with radius equal to the current nearest distance intersects that node's splitting hyperplane. If it does, descend into the node's other child subtree and repeat the same depth-first search there; if it does not, continue backtracking, and that other child subtree is pruned from further consideration.
    4. When backtracking reaches the root node, the search is complete and the current nearest neighbor is the result; a Python sketch of this procedure is given below.
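The four steps above can be sketched in Python as follows, assuming the KDNode class and build_kdtree function from the earlier construction sketches. It uses recursion rather than an explicit stack, which is equivalent for this purpose; this is an illustrative sketch, not the article's original code.

```python
import math

def nearest_neighbor(root, query):
    """Return (nearest point, distance) for `query` in the k-d tree rooted at `root`."""
    best = [None, float("inf")]          # current nearest neighbor and its distance

    def search(node):
        if node is None:
            return
        axis = node.split
        # Step 1: descend first into the child on the query's side of the splitting hyperplane.
        near, far = (node.left, node.right) if query[axis] <= node.node_data[axis] else (node.right, node.left)
        search(near)

        # Steps 2-3: on the way back up, update the nearest neighbor if this node is closer.
        dist = math.dist(query, node.node_data)
        if dist < best[1]:
            best[0], best[1] = node.node_data, dist

        # Step 3: only cross the splitting hyperplane if the circle of radius best[1]
        # around the query point intersects it; otherwise the far subtree is pruned.
        if abs(query[axis] - node.node_data[axis]) < best[1]:
            search(far)

    search(root)
    return best[0], best[1]

# The two queries from the walkthrough above.
tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nearest_neighbor(tree, (2.1, 3.1)))   # ((2, 3), 0.1414...)
print(nearest_neighbor(tree, (2, 4.5)))     # ((2, 3), 1.5)
```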
