K-d Tree Learning Summary

Source: Internet
Author: User

This article is reproduced from http://blog.csdn.net/zhjchengfeng5/article/details/7855241#

Let's start with a question:

Given a point set E on a plane and another point V, how do we find the point U in E whose distance to V is smallest (Euclidean distance)?

Of course, one way is to enumerate all the points in E, compute each one's distance to V, and keep the nearest point U.
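The enumeration above can be sketched as follows. This is a minimal illustration rather than part of the original article; the names Point2, sqDist and nearestBruteForce are assumptions made for the example, and squared distances are compared so that no square root is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D point type used only for this sketch.
struct Point2 { long long x, y; };

// Squared Euclidean distance; comparing squares preserves the ordering
// of the true distances and keeps the arithmetic in exact integers.
long long sqDist(const Point2& a, const Point2& b) {
    long long dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Returns the index of the point in E nearest to V by scanning every point.
// Cost: O(n) per query. E must not be empty.
std::size_t nearestBruteForce(const std::vector<Point2>& E, const Point2& V) {
    assert(!E.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < E.size(); ++i)
        if (sqDist(E[i], V) < sqDist(E[best], V)) best = i;
    return best;
}
```

Each call scans the whole set, which is what makes repeated queries expensive.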

However, suppose there are now two point sets E1 and E2, and for each point Vi in E2 we must find a point Ui in E1 such that the distance from Vi to Ui is smallest. How do we do this? Enumerate again?

Since enumeration is expensive (O(n) per query, so O(n*m) for m query points), is there a way to lower the complexity? The answer is yes, by introducing a data structure: the k-d tree.

First, what is a k-d tree?

A binary tree (the kind of tree in which each node has a left son and a right son).

Second, what problems can it solve?

A k-d tree can find the point of a point set E closest to a fixed point V (a nearest neighbor query) in O(log n) time on average. The worst case is often quoted as O(sqrt(n)); that bound holds for axis-parallel range queries in the plane, while a nearest neighbor query can visit O(n) nodes on adversarial inputs. With a little extra processing, we can also find the k points of E closest to V (a k-nearest-neighbor query).

Third, how do we use a k-d tree to solve the problem above?

Build a binary tree from the points of E according to a certain rule, then query the nearest point on this tree. A query takes O(log n) time on average and more in the worst case.

Fourth, since it is a binary tree, how do we build it?

This is the most critical part. Whether it is a partition tree, a segment tree, a trie, or another data structure or algorithm (such as KMP), the reason it handles problems efficiently is mainly good preprocessing. A k-d tree is efficient for the same reason: the point set E is built into a binary tree according to a well-chosen rule.

Before discussing this rule, let's look at why this data structure is called a k-d tree.

K: the k in a k-nearest-neighbor query (in the standard reading, "k-d" is short for "k-dimensional", so k is the number of dimensions).

D: the space is D-dimensional (dimension).

Tree: you can think of it as a binary tree, or simply as a tree.

We have already used K, and we have already used the tree, but what about D? This article does not seem to have mentioned D so far.

The rule is exactly the "dimension" of the space.

To build the tree, each node must store some state:

State of the node:

Split Point (Split_point)

Split Mode (Split_method)

Left son (Left_son)

Right son (Right_son)
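The four pieces of state listed above could be held in a node type like the one below. This is only an illustrative sketch: the type name KdNode, the constant kDims, and the use of owning pointers are assumptions for the example. The template at the end of this article does not allocate nodes at all; it stores the tree implicitly in a sorted array.

```cpp
#include <array>
#include <memory>

// Number of dimensions; fixed at 2 here as an assumption for the sketch.
constexpr int kDims = 2;

struct KdNode {
    std::array<long long, kDims> split_point{};  // the point stored at this node
    int split_method = 0;                        // index of the dimension used to split
    std::unique_ptr<KdNode> left_son;            // points with smaller coordinate in split_method
    std::unique_ptr<KdNode> right_son;           // points with larger coordinate in split_method
};
```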

The rule we build by lives in one piece of node state: the split method (Split_method).

The reader has presumably noticed the keywords above: split point, split method. Why does the word "split" keep appearing? Does a k-d tree keep splitting the space?

Yes, building a k-d tree is the process of splitting space!

How to build it?

Build procedure:

For the current interval [L, R] (an interval of point indices, not of actual coordinates), compute the variance of each coordinate dimension over the points in the interval. Take the dimension with the largest variance, call it D, as the split method (Split_method). Sort the points in the interval by their D-th coordinate in ascending order and take the middle point sorted_mid as the split point recorded in the current node. Then build the left subtree on [L, sorted_mid-1] and the right subtree on [sorted_mid+1, R]. This determines every state of the current node:

Split_point = sorted_mid

Split_method = D

Left_son = [L, sorted_mid-1]

Right_son = [sorted_mid+1, R]
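The build procedure above can be sketched as follows. This is a hedged illustration, not the author's exact code: the names Point, variance and build are assumptions, indices are 0-based (the article counts from 1), the dimension count is fixed at 2, and std::nth_element replaces a full sort because only the middle element needs to land in its final position.

```cpp
#include <algorithm>
#include <array>
#include <vector>

constexpr int kDims = 2;  // assumed dimension count for this sketch
using Point = std::array<long long, kDims>;

// Population variance of coordinate `dim` over pts[l..r] (inclusive).
double variance(const std::vector<Point>& pts, int l, int r, int dim) {
    double mean = 0.0;
    for (int i = l; i <= r; ++i) mean += pts[i][dim];
    mean /= (r - l + 1);
    double var = 0.0;
    for (int i = l; i <= r; ++i) {
        double d = pts[i][dim] - mean;
        var += d * d;
    }
    return var / (r - l + 1);
}

// Rearranges pts[l..r] in place so that the middle index holds the split
// point, and records the chosen split dimension in split[mid].
// `split` must have the same size as `pts`.
void build(std::vector<Point>& pts, std::vector<int>& split, int l, int r) {
    if (l > r) return;
    int mid = l + (r - l) / 2;

    // Pick the dimension with the largest variance.
    int best = 0;
    double bestVar = variance(pts, l, r, 0);
    for (int d = 1; d < kDims; ++d) {
        double v = variance(pts, l, r, d);
        if (v > bestVar) { bestVar = v; best = d; }
    }
    split[mid] = best;

    // Place the median (by the chosen dimension) at `mid`; smaller
    // coordinates end up to its left, larger ones to its right.
    std::nth_element(pts.begin() + l, pts.begin() + mid, pts.begin() + r + 1,
                     [best](const Point& a, const Point& b) { return a[best] < b[best]; });

    build(pts, split, l, mid - 1);   // left subtree
    build(pts, split, mid + 1, r);   // right subtree
}
```

Calling build(pts, split, 0, n - 1) on n points leaves the tree stored implicitly: the root of any interval is its middle element.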

For the sake of understanding, let me give an example:

Suppose we have a point set E consisting of 5 points on the two-dimensional plane: (1,4) (5,8) (4,2) (7,9) (10,11)

[Figure: distribution of the five points on the plane]

First, we build on the interval [1, 5]:

Calculate the variance of the first dimension (the x-coordinate) over the interval:

Average: ave_1 = 5.4

Variance: variance_1 = 9.04

Then calculate the variance of the second dimension (the y-coordinate) over the interval:

Average: ave_2 = 6.8

Variance: variance_2 = 10.96

Clearly variance_2 > variance_1, so for this build the split method is Split_method = 2. Sorting all the points by their second coordinate in ascending order gives a new arrangement:

(4,2) (1,4) (5,8) (7,9) (10,11)
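The averages and variances quoted above can be checked by hand. The arithmetic below assumes the population variance (dividing by the number of points, as the template at the end of the article does); the symbols \bar{x}, \bar{y}, \sigma_x^2 and \sigma_y^2 are notation introduced here for ave_1, ave_2, variance_1 and variance_2.

```latex
\[
\bar{x} = \frac{1 + 5 + 4 + 7 + 10}{5} = 5.4, \qquad
\sigma_x^2 = \frac{4.4^2 + 0.4^2 + 1.4^2 + 1.6^2 + 4.6^2}{5} = \frac{45.2}{5} = 9.04
\]
\[
\bar{y} = \frac{4 + 8 + 2 + 9 + 11}{5} = 6.8, \qquad
\sigma_y^2 = \frac{2.8^2 + 1.2^2 + 4.8^2 + 2.2^2 + 4.2^2}{5} = \frac{54.8}{5} = 10.96
\]
```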

Take the middle point sorted_mid = (5,8) as the split point and make it the root node. Then build the left subtree on the interval [1, 2] and the right subtree on [4, 5]. At this point the line y = 8 splits the plane into two halves: the lower half goes to the left son and the upper half goes to the right son.

When building the left subtree on [1, 2], the variance of the first dimension is the larger one, so the split method is 1. Sort the points in [1, 2] by their first coordinate in ascending order and take the middle point (1,4) as the root node; then build the right subtree on the interval [2, 2] to get the node (4,2).

When building the right subtree on [4, 5], the variance of the first dimension is again the larger one. The binary tree obtained this way is the k-d tree. It divides the plane into regions so that each region contains at most one point.

[Figure: the resulting k-d tree and the partition of the plane]

As you can see, building the tree actually divides the entire plane into 4 parts.

The tree is built, so what about the query?

Query process:

A query in effect "adds" a point to the already built k-d tree, but without really adding it: we only need to find the subspace it belongs to, so the query is much simpler.

Each time we query an interval, first look at how the interval was split, that is, along which dimension. If the query point's coordinate in that dimension is smaller than the root node's, query the left subtree of the root; if it is larger, query the right subtree.

Each time we return to the root node (that is, the search of one of its subtrees has finished), take the query point as the center and the smallest distance found so far as the radius, and check whether this circle intersects the splitting plane of the interval's dimension. If it does, the nearest point may still be in the other subtree, so we must search the other subtree too, and also check whether the distance from the root node to the query point improves the nearest distance. The following example shows why.

Suppose that after searching the left son the smallest distance found is r = 10. Back at the father node, the circle centered at the target point (10,1) with radius r = 10 intersects the splitting plane y = 8. If we did not also search the father node's right son, we would miss the point (10,9), which is actually the closest point to the target point (10,1).
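The descend-then-check rule described above might look like the sketch below. It assumes the tree is stored implicitly in an array (the root of an interval is its middle element, and split[mid] holds that node's split dimension) and uses 0-based indices; the names Best and query, and the two-dimensional Point type, are illustrative assumptions. All distances are squared, so the circle-versus-plane test compares the squared gap with the best squared distance.

```cpp
#include <array>
#include <limits>
#include <vector>

constexpr int kDims = 2;  // assumed dimension count for this sketch
using Point = std::array<long long, kDims>;

// Best candidate found so far: squared distance and array index.
struct Best {
    long long sqDist = std::numeric_limits<long long>::max();
    int index = -1;
};

void query(const std::vector<Point>& pts, const std::vector<int>& split,
           int l, int r, const Point& target, Best& best) {
    if (l > r) return;
    int mid = l + (r - l) / 2;

    // Try the root of this interval as a candidate.
    long long d2 = 0;
    for (int k = 0; k < kDims; ++k) {
        long long diff = target[k] - pts[mid][k];
        d2 += diff * diff;
    }
    if (d2 < best.sqDist) { best.sqDist = d2; best.index = mid; }

    // Signed distance from the target to the splitting plane.
    int dim = split[mid];
    long long gap = target[dim] - pts[mid][dim];

    if (gap < 0) {
        query(pts, split, l, mid - 1, target, best);        // near side first
        if (gap * gap <= best.sqDist)                       // circle crosses the plane
            query(pts, split, mid + 1, r, target, best);
    } else {
        query(pts, split, mid + 1, r, target, best);
        if (gap * gap <= best.sqDist)
            query(pts, split, l, mid - 1, target, best);
    }
}
```

The far subtree is skipped only when every point in it must lie farther away than the best candidate already found, so pruning never changes the answer.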

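Because a query may have to search both the left and the right subtree of a node, its cost is not simply O(log n). In the worst case it is higher: O(sqrt(n)) is the figure usually quoted for the plane, and adversarial inputs can force a visit to nearly every node.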


That is most of what there is to k-d trees, although an implementation has many places that can be optimized. To turn the nearest neighbor query into a k-nearest-neighbor query, we use an array to record whether a point may still be used to update the nearest distance. A k-d tree template follows.

#include <iostream>
#include <cstdio>
#include <cstring>
#include <cmath>
#include <algorithm>
#include <vector>
#include <string>
#include <queue>
#include <stack>
#define INT_INF 0x3fffffff
#define LL_INF 0x3fffffffffffffff
#define EPS 1e-12
#define MOD 1000000007
#define PI 3.141592653589793
#define N 60000
using namespace std;

typedef long long LL;
typedef unsigned long long ULL;
typedef double DB;

struct Data
{
    LL pos[10];
    int id;
} T[N], op, point;
int split[N], now, n, demension;
bool use[N];
LL ans, id;
DB var[10];

// Compare two points by the split dimension of the node being built
bool cmp(Data a, Data b)
{
    return a.pos[split[now]] < b.pos[split[now]];
}

void build(int L, int R)
{
    if (L > R) return;

    int mid = (L + R) >> 1;

    // Calculate the variance of each dimension
    for (int pos = 0; pos < demension; pos++)
    {
        DB ave = var[pos] = 0.0;
        for (int i = L; i <= R; i++)
            ave += T[i].pos[pos];
        ave /= (R - L + 1);
        for (int i = L; i <= R; i++)
            var[pos] += (T[i].pos[pos] - ave) * (T[i].pos[pos] - ave);
        var[pos] /= (R - L + 1);
    }

    // Find the dimension with the greatest variance and use it as the
    // split_method of the current interval; it is saved in split[mid]
    split[now = mid] = 0;
    for (int i = 1; i < demension; i++)
        if (var[split[mid]] < var[i]) split[mid] = i;

    // Partially sort the interval so that the middle point is in place
    nth_element(T + L, T + mid, T + R + 1, cmp);

    build(L, mid - 1);
    build(mid + 1, R);
}
// After build, split[i] is the split method of the node whose center point is i

void query(int L, int R)
{
    if (L > R) return;
    int mid = (L + R) >> 1;

    // Squared distance from the target point op to the current root node
    LL dis = 0;
    for (int i = 0; i < demension; i++)
        dis += (op.pos[i] - T[mid].pos[i]) * (op.pos[i] - T[mid].pos[i]);

    // If the root node of the current interval may still be used to update
    // the nearest distance, and dis is less than the ans already obtained
    if (!use[T[mid].id] && dis < ans)
    {
        ans = dis;          // update the nearest distance
        point = T[mid];     // update the point at the nearest distance
        id = T[mid].id;     // update the id of that point
    }

    // Squared distance from op to the splitting plane
    LL radius = (op.pos[split[mid]] - T[mid].pos[split[mid]]) * (op.pos[split[mid]] - T[mid].pos[split[mid]]);

    // Query the sub-intervals: near side first, far side only if needed
    if (op.pos[split[mid]] < T[mid].pos[split[mid]])
    {
        query(L, mid - 1);
        if (radius <= ans) query(mid + 1, R);
    }
    else
    {
        query(mid + 1, R);
        if (radius <= ans) query(L, mid - 1);
    }
}

int main()
{
    while (scanf("%d%d", &n, &demension) != EOF)
    {
        // Read n points
        for (int i = 1; i <= n; i++)
        {
            for (int j = 0; j < demension; j++)
                scanf("%lld", &T[i].pos[j]);
            T[i].id = i;
        }

        build(1, n);    // build the tree

        int m, q;
        scanf("%d", &q);    // q queries
        while (q--)
        {
            memset(use, 0, sizeof(use));
            for (int i = 0; i < demension; i++)
                scanf("%lld", &op.pos[i]);
            scanf("%d", &m);
            printf("the closest %d points are:\n", m);
            while (m--)
            {
                ans = ((LL)INT_INF) * INT_INF;
                query(1, n);
                for (int i = 0; i < demension; i++)
                {
                    printf("%lld", point.pos[i]);
                    if (i == demension - 1) printf("\n");
                    else printf(" ");
                }
                use[id] = 1;    // exclude this point from later rounds
            }
        }
    }
    return 0;
}
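The template above answers a k-nearest-neighbor query by running the nearest neighbor query m times and marking each found point in use[]. A common alternative, which is not the article's method, is a single traversal that keeps the k best candidates in a bounded max-heap. The sketch below assumes the same implicit array layout with 0-based indices; the names Heap and kNearest and the two-dimensional Point type are assumptions for illustration.

```cpp
#include <array>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

constexpr int kDims = 2;  // assumed dimension count for this sketch
using Point = std::array<long long, kDims>;
// Max-heap of (squared distance, index): the top is the worst of the
// candidates kept so far.
using Heap = std::priority_queue<std::pair<long long, int>>;

void kNearest(const std::vector<Point>& pts, const std::vector<int>& split,
              int l, int r, const Point& target, std::size_t k, Heap& heap) {
    if (l > r || k == 0) return;
    int mid = l + (r - l) / 2;

    long long d2 = 0;
    for (int j = 0; j < kDims; ++j) {
        long long diff = target[j] - pts[mid][j];
        d2 += diff * diff;
    }
    if (heap.size() < k) {
        heap.push({d2, mid});                 // still room for candidates
    } else if (d2 < heap.top().first) {
        heap.pop();                           // replace the current worst
        heap.push({d2, mid});
    }

    int dim = split[mid];
    long long gap = target[dim] - pts[mid][dim];
    int nearL = gap < 0 ? l : mid + 1, nearR = gap < 0 ? mid - 1 : r;
    int farL = gap < 0 ? mid + 1 : l, farR = gap < 0 ? r : mid - 1;

    kNearest(pts, split, nearL, nearR, target, k, heap);
    // Visit the far side if the heap is not full yet, or if the splitting
    // plane is no farther away than the worst candidate kept.
    if (heap.size() < k || gap * gap <= heap.top().first)
        kNearest(pts, split, farL, farR, target, k, heap);
}
```

After the call, the heap holds up to k (squared distance, index) pairs, farthest first. The trade-off against the marking array is one traversal per query instead of m.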

