Google Interview Question: k-Nearest Neighbor (k-d tree)

Last Update: 2015-07-01   Source: Internet

Author: User


Question:

You are given information about hotels in a country/city. The X and Y coordinates of each hotel are known. You need to suggest a list of the nearest hotels to a user who is querying from a particular point (the X and Y coordinates of the user are given). Distance is calculated as the straight-line distance between the user and the hotel coordinates.
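Written out, for a user at (x_u, y_u) and a hotel at (x_h, y_h), the straight-line distance is the Euclidean distance below (the symbols are introduced here for illustration). Because the square root is monotonic, ranking hotels by squared distance gives the same order, which is why the code in this article can compare squared distances with integer arithmetic and never take a square root during ranking.

```latex
d(u, h) = \sqrt{(x_u - x_h)^2 + (y_u - y_h)^2},
\qquad
d(u, h_1) < d(u, h_2) \iff d(u, h_1)^2 < d(u, h_2)^2
```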

Let the number of hotels be n, and suppose we need the k nearest hotels. The most direct approach is to compute the distance from the query point to every hotel and keep the k closest in a max-heap of size k: whenever a hotel is closer than the farthest one at the top of the heap, that farthest one is evicted. The time complexity is O(n log k) and the extra space is O(k).
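A minimal sketch of this brute-force method, assuming Java 16+ (for records) and a hypothetical `Hotel` record holding a name and integer coordinates; neither name comes from the original. The heap is ordered so that its head is the farthest of the k hotels kept so far, and squared distances are computed in `long` to avoid `int` overflow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class BruteForceKNearest {
    // Hypothetical hotel type used only for this sketch.
    record Hotel(String name, int x, int y) {}

    // Squared Euclidean distance from the hotel to the query point.
    static long dist2(Hotel h, int qx, int qy) {
        long dx = (long) h.x() - qx, dy = (long) h.y() - qy;
        return dx * dx + dy * dy;
    }

    // Returns up to k hotels nearest to (qx, qy), closest first. O(n log k) time, O(k) space.
    static List<Hotel> kNearest(List<Hotel> hotels, int qx, int qy, int k) {
        List<Hotel> result = new ArrayList<>();
        if (k <= 0) return result;
        // Max-heap on distance: the head is the farthest of the hotels kept so far.
        PriorityQueue<Hotel> heap = new PriorityQueue<>(
            (a, b) -> Long.compare(dist2(b, qx, qy), dist2(a, qx, qy)));
        for (Hotel h : hotels) {
            if (heap.size() < k) {
                heap.offer(h);
            } else if (dist2(h, qx, qy) < dist2(heap.peek(), qx, qy)) {
                heap.poll();      // evict the current farthest
                heap.offer(h);
            }
        }
        result.addAll(heap);
        result.sort((a, b) -> Long.compare(dist2(a, qx, qy), dist2(b, qx, qy)));
        return result;
    }

    public static void main(String[] args) {
        List<Hotel> hotels = List.of(
            new Hotel("A", 0, 0), new Hotel("B", 5, 5),
            new Hotel("C", 1, 2), new Hotel("D", -3, 1));
        System.out.println(kNearest(hotels, 1, 1, 2));
    }
}
```

With the query at (1, 1) the squared distances are C = 1, A = 2, D = 16 and B = 32, so the two nearest are C and then A.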

We can speed up queries by preprocessing the data. First sort all hotels by x-coordinate. For a query point (x, y) and a guessed half-width a, binary search finds all hotels whose x-coordinates lie in the interval [x-a, x+a]; then, as in the previous method, a heap keeps the k nearest among them. We are not very concerned about the cost of preprocessing, which is O(n log n). For a query, the binary search takes O(log n); if M hotels survive the filter, the second step takes O(M log k), so a query costs O(log n + M log k). The difficulty is choosing a. A poor choice can (1) make M too large, which reduces query efficiency, or (2) make the result inaccurate, because some of the true k nearest hotels may lie outside the x-range. If an approximate result is acceptable, this method can be much faster than the first one.
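A minimal sketch of the sorted-by-x window method, again assuming Java 16+ and a hypothetical `Hotel` record. `lowerBound` is an ordinary binary search returning the first index whose x is at least the target, so the window [qx - a, qx + a] becomes the half-open index range [lowerBound(qx - a), lowerBound(qx + a + 1)). As the text warns, the result is only approximate: hotels outside the window are never examined.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class XWindowKNearest {
    // Hypothetical hotel type used only for this sketch.
    record Hotel(String name, int x, int y) {}

    static long dist2(Hotel h, int qx, int qy) {
        long dx = (long) h.x() - qx, dy = (long) h.y() - qy;
        return dx * dx + dy * dy;
    }

    // Preprocessing, done once: sort by x in O(n log n).
    static Hotel[] sortByX(List<Hotel> hotels) {
        Hotel[] sorted = hotels.toArray(new Hotel[0]);
        Arrays.sort(sorted, Comparator.comparingInt(Hotel::x));
        return sorted;
    }

    // Index of the first hotel whose x >= target (lower-bound binary search).
    static int lowerBound(Hotel[] sorted, long target) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid].x() < target) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // Approximate k nearest: only hotels with x in [qx - a, qx + a] are examined.
    static List<Hotel> kNearestInWindow(Hotel[] sorted, int qx, int qy, int k, int a) {
        int from = lowerBound(sorted, (long) qx - a);
        int to = lowerBound(sorted, (long) qx + a + 1);   // exclusive end
        // Max-heap on distance, capped at k entries: O(M log k) for M candidates.
        PriorityQueue<Hotel> heap = new PriorityQueue<>(
            (h1, h2) -> Long.compare(dist2(h2, qx, qy), dist2(h1, qx, qy)));
        for (int i = from; i < to; i++) {
            heap.offer(sorted[i]);
            if (heap.size() > k) heap.poll();   // drop the farthest
        }
        List<Hotel> result = new ArrayList<>(heap);
        result.sort((h1, h2) -> Long.compare(dist2(h1, qx, qy), dist2(h2, qx, qy)));
        return result;
    }

    public static void main(String[] args) {
        Hotel[] sorted = sortByX(List.of(
            new Hotel("A", 0, 0), new Hotel("B", 5, 5),
            new Hotel("C", 1, 2), new Hotel("D", -3, 1)));
        System.out.println(kNearestInWindow(sorted, 1, 1, 2, 2));  // window x in [-1, 3]
        System.out.println(kNearestInWindow(sorted, 1, 1, 2, 0));  // window x in [1, 1]
    }
}
```

The second call illustrates problem (2): with a = 0 only hotel C falls inside the window, so fewer than k hotels are returned even though A is nearby.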

The second method sorts the data by x-coordinate, but the y-coordinates remain unordered. Can we organize the data further to make queries more efficient? A k-d tree is a data structure suited to this kind of problem. The following handout describes the concepts and principles of k-d trees in great detail.

http://web.stanford.edu/class/cs106l/handouts/assignment-3-kdtree.pdf

Our problem concerns points on a plane, so we only need to implement the two-dimensional case of the k-d tree. First we define the point and the k-d tree node.

    class Point2D {
        int x;
        int y;

        public Point2D(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    class KdTreeNode {
        Point2D val;
        KdTreeNode left;
        KdTreeNode right;

        public KdTreeNode(Point2D p) {
            this.val = p;
        }
    }

There are many ways to define a k-d tree. Some implementations store all the data in the leaf nodes, with internal nodes holding only the information that splits the space. Here we simply treat it like an ordinary binary search tree in which every node stores one point.

Next is the construction of the k-d tree. As the handout explains, the levels of the tree alternate between splitting on the x-axis and on the y-axis. In this problem, building the k-d tree is part of preprocessing and the data are static, so we do not need insert or delete operations. Starting with the x-coordinate, we choose one point as the node and divide the remaining data into two parts: every point on the left has an x-coordinate not greater than the node's, and every point on the right has an x-coordinate not less than the node's. We then recurse. Taking the root as level 0, even levels split on the x-coordinate and odd levels split on the y-coordinate. How should the splitting point be chosen? For queries to be efficient, the k-d tree needs to be balanced, so we choose the median of the x- or y-coordinates. With a selection algorithm (quickselect), the median can be found and the data split evenly in expected O(n) time.

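    private static final java.util.Random RANDOM = new java.util.Random();

    // Builds a balanced k-d tree over array[low..high] (inclusive).
    // Even depth splits on x, odd depth splits on y.
    public static KdTreeNode constructKdTree(Point2D[] array, int depth, int low, int high) {
        if (low > high) return null;
        if (low == high) return new KdTreeNode(array[low]);
        int mid = low + (high - low) / 2;
        // Move the median on the current axis to index mid.
        Point2D p = quickSelect(array, mid, low, high, depth % 2);
        KdTreeNode node = new KdTreeNode(p);
        node.left = constructKdTree(array, depth + 1, low, mid - 1);
        node.right = constructKdTree(array, depth + 1, mid + 1, high);
        return node;
    }

    // Rearranges array[low..high] so that array[k] holds the element that would be
    // at index k if the range were sorted by the given dimension (0 = x, 1 = y).
    public static Point2D quickSelect(Point2D[] array, int k, int low, int high, int dimension) {
        while (low <= high) {
            int pivot = low + RANDOM.nextInt(high - low + 1);
            int pivotIndex = partition(array, low, high, pivot, dimension);
            if (pivotIndex == k) return array[k];
            else if (pivotIndex < k) low = pivotIndex + 1;
            else high = pivotIndex - 1;
        }
        return null;
    }

    // Lomuto partition around array[pivot]: smaller values move to the left,
    // the pivot lands at the returned index, values >= pivot stay on the right.
    public static int partition(Point2D[] array, int low, int high, int pivot, int dimension) {
        int pivotVal = dimension == 0 ? array[pivot].x : array[pivot].y;
        swap(array, pivot, high);
        int index = low;
        for (int i = low; i < high; i++) {
            int curVal = dimension == 0 ? array[i].x : array[i].y;
            if (curVal < pivotVal) {
                swap(array, index, i);
                index++;
            }
        }
        swap(array, high, index);
        return index;
    }

    public static void swap(Point2D[] array, int i, int j) {
        if (i != j) {
            Point2D tmp = array[i];
            array[i] = array[j];
            array[j] = tmp;
        }
    }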
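For comparison, a simpler way to get the same kind of balanced tree is to sort each sub-range on the current axis and take the middle element instead of running quickselect. This is a different (slower) technique: it costs O(n log² n) rather than expected O(n log n), but it is easy to check by hand. The sketch assumes Java 16+, and the names `SortedMedianKdTree`, the `Point2D` record, `Node` and `height` are illustrative, not from the original.

```java
import java.util.Arrays;

public class SortedMedianKdTree {
    record Point2D(int x, int y) {}

    static final class Node {
        final Point2D val;
        Node left, right;
        Node(Point2D val) { this.val = val; }
    }

    // Builds a balanced 2-d tree over pts[lo..hi] (inclusive).
    // Even depth splits on x, odd depth on y, as in the article.
    static Node build(Point2D[] pts, int lo, int hi, int depth) {
        if (lo > hi) return null;
        // Sorting the range costs O(m log m); quickselect needs only O(m) expected.
        Arrays.sort(pts, lo, hi + 1, (a, b) -> depth % 2 == 0
            ? Integer.compare(a.x(), b.x())
            : Integer.compare(a.y(), b.y()));
        int mid = lo + (hi - lo) / 2;
        Node node = new Node(pts[mid]);
        node.left = build(pts, lo, mid - 1, depth + 1);
        node.right = build(pts, mid + 1, hi, depth + 1);
        return node;
    }

    // Number of levels in the tree (0 for an empty tree).
    static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    public static void main(String[] args) {
        Point2D[] pts = { new Point2D(2, 3), new Point2D(5, 4), new Point2D(9, 6),
                          new Point2D(4, 7), new Point2D(8, 1), new Point2D(7, 2) };
        Node root = build(pts, 0, pts.length - 1, 0);
        System.out.println(root.val + ", height " + height(root));
    }
}
```

For these six points, sorting by x gives (2,3), (4,7), (5,4), (7,2), (8,1), (9,6); the lower median (5,4) becomes the root, and each side is then split on y, giving a tree of height 3.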

This completes the construction of the k-d tree, with expected time complexity O(n log n) and space complexity O(n). Next is the query operation. Our problem asks for a list of the nearest hotels. The linked handout describes how to query both the nearest point and the k nearest points; here I only implement returning the single nearest point.
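The O(n log n) construction bound follows from a standard recurrence. Writing T(n) for the expected time to build a tree over n points (notation introduced here), each call runs quickselect in expected linear time on its range and then recurses on two halves of nearly equal size:

```latex
T(n) = 2\,T\!\left(\tfrac{n}{2}\right) + O(n)
\quad\Longrightarrow\quad
T(n) = O(n \log n)
```

The tree itself holds one node per point, which gives the O(n) space; the recursion depth of a balanced build is O(log n).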

The query works in two passes. First, descend the tree following the query coordinates, as in a binary search, and record the closest point seen along the path; call its distance R. Then search the k-d tree again, restricted to the square of half-width R centered on the query point (which contains the circle of radius R), to check whether any closer point exists.
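A short justification of why the second pass may skip subtrees, in notation introduced here: let q be the query, s the splitting coordinate of a node on axis d (x or y), and R the current best distance. Every point p in the subtree on the far side of the splitting line has its d-coordinate on the other side of s (or equal to s), so

```latex
\lVert p - q \rVert \;\ge\; |p_d - q_d| \;\ge\; |s - q_d|
```

Hence if |s - q_d| ≥ R, the far subtree cannot contain a point strictly closer than R and can be skipped. Equivalently, the far side needs to be searched only when s lies strictly inside [q_d - R, q_d + R], which is exactly what comparing the node's coordinate against rangeMin and rangeMax checks.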

    // Closest point found so far and its squared distance to the query.
    static Point2D nearestPoint = null;
    static long min = Long.MAX_VALUE;

    // Squared Euclidean distance; long avoids int overflow.
    static long squaredDistance(Point2D a, Point2D b) {
        long dx = (long) a.x - b.x, dy = (long) a.y - b.y;
        return dx * dx + dy * dy;
    }

    // Pass 1: follow the query down the tree, recording the closest node on the path.
    public static void queryHelper(KdTreeNode root, Point2D query, int depth) {
        if (root == null) return;
        long distance = squaredDistance(query, root.val);
        if (distance < min) {
            min = distance;
            nearestPoint = root.val;
        }
        int curVal = depth % 2 == 0 ? query.x : query.y;
        int nodeVal = depth % 2 == 0 ? root.val.x : root.val.y;   // splitting coordinate
        if (curVal > nodeVal) queryHelper(root.right, query, depth + 1);
        else if (curVal < nodeVal) queryHelper(root.left, query, depth + 1);
        else {
            queryHelper(root.right, query, depth + 1);
            queryHelper(root.left, query, depth + 1);
        }
    }

    // Pass 2: search again, entering the far subtree only when the splitting line
    // crosses the box [xMin, xMax] x [yMin, yMax] around the query.
    public static void queryNearestHelper(KdTreeNode root, Point2D query, int depth,
            double xMin, double xMax, double yMin, double yMax) {
        if (root == null) return;
        long distance = squaredDistance(query, root.val);
        if (distance < min) {
            min = distance;
            nearestPoint = root.val;
        }
        int curVal = depth % 2 == 0 ? query.x : query.y;
        int nodeVal = depth % 2 == 0 ? root.val.x : root.val.y;   // splitting coordinate
        double rangeMin = depth % 2 == 0 ? xMin : yMin;
        double rangeMax = depth % 2 == 0 ? xMax : yMax;
        if (curVal > nodeVal) {
            queryNearestHelper(root.right, query, depth + 1, xMin, xMax, yMin, yMax);
            if (nodeVal > rangeMin)
                queryNearestHelper(root.left, query, depth + 1, xMin, xMax, yMin, yMax);
        } else if (curVal < nodeVal) {
            queryNearestHelper(root.left, query, depth + 1, xMin, xMax, yMin, yMax);
            if (nodeVal < rangeMax)
                queryNearestHelper(root.right, query, depth + 1, xMin, xMax, yMin, yMax);
        } else {
            queryNearestHelper(root.right, query, depth + 1, xMin, xMax, yMin, yMax);
            queryNearestHelper(root.left, query, depth + 1, xMin, xMax, yMin, yMax);
        }
    }

    // Runs both passes and returns the nearest point (null for an empty tree).
    public static Point2D queryNearest(KdTreeNode root, Point2D query) {
        nearestPoint = null;          // reset state left over from a previous query
        min = Long.MAX_VALUE;
        queryHelper(root, query, 0);
        double r = Math.sqrt(min);
        queryNearestHelper(root, query, 0, query.x - r, query.x + r, query.y - r, query.y + r);
        return nearestPoint;
    }

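This completes the nearest-hotel query. For a balanced tree and reasonably well-distributed data the expected query time is O(log n), although in the worst case the search may have to visit O(n) nodes.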
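The original problem asks for a list of nearby hotels, and the linked handout also covers the k-nearest query. A minimal sketch of that extension, under stated assumptions: it replaces the two-pass box search with a single depth-first search that keeps the k best points in a max-heap (the same idea as the brute-force method) and skips the far subtree when the splitting line is at least as far away as the current k-th best point. The class and method names are hypothetical, the tree is built by sorting rather than quickselect for brevity, and Java 16+ is assumed for records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class KdTreeKNearest {
    record Point2D(int x, int y) {}

    static final class Node {
        final Point2D val;
        Node left, right;
        Node(Point2D val) { this.val = val; }
    }

    // Squared Euclidean distance, in long to avoid int overflow.
    static long dist2(Point2D a, Point2D b) {
        long dx = (long) a.x() - b.x(), dy = (long) a.y() - b.y();
        return dx * dx + dy * dy;
    }

    // Balanced build by sorting each range on the current axis (x at even depth).
    static Node build(Point2D[] pts, int lo, int hi, int depth) {
        if (lo > hi) return null;
        Arrays.sort(pts, lo, hi + 1, (a, b) -> depth % 2 == 0
            ? Integer.compare(a.x(), b.x()) : Integer.compare(a.y(), b.y()));
        int mid = lo + (hi - lo) / 2;
        Node node = new Node(pts[mid]);
        node.left = build(pts, lo, mid - 1, depth + 1);
        node.right = build(pts, mid + 1, hi, depth + 1);
        return node;
    }

    // Depth-first search keeping the k best points in a max-heap (farthest on top).
    static void search(Node node, Point2D q, int k, int depth, PriorityQueue<Point2D> heap) {
        if (node == null) return;
        if (heap.size() < k) {
            heap.offer(node.val);
        } else if (dist2(node.val, q) < dist2(heap.peek(), q)) {
            heap.poll();
            heap.offer(node.val);
        }
        // Signed offset of the query from this node's splitting line.
        long diff = depth % 2 == 0 ? (long) q.x() - node.val.x() : (long) q.y() - node.val.y();
        Node near = diff < 0 ? node.left : node.right;
        Node far = diff < 0 ? node.right : node.left;
        search(near, q, k, depth + 1, heap);
        // The far side can only help if the heap is not yet full or the splitting
        // line is strictly closer than the current k-th best point.
        if (heap.size() < k || diff * diff < dist2(heap.peek(), q)) {
            search(far, q, k, depth + 1, heap);
        }
    }

    // Returns up to k points nearest to q, closest first.
    static List<Point2D> kNearest(Node root, Point2D q, int k) {
        if (k <= 0) return List.of();
        PriorityQueue<Point2D> heap =
            new PriorityQueue<>((a, b) -> Long.compare(dist2(b, q), dist2(a, q)));
        search(root, q, k, 0, heap);
        List<Point2D> result = new ArrayList<>(heap);
        result.sort((a, b) -> Long.compare(dist2(a, q), dist2(b, q)));
        return result;
    }

    public static void main(String[] args) {
        Point2D[] pts = { new Point2D(2, 3), new Point2D(5, 4), new Point2D(9, 6),
                          new Point2D(4, 7), new Point2D(8, 1), new Point2D(7, 2) };
        Node root = build(pts, 0, pts.length - 1, 0);
        System.out.println(kNearest(root, new Point2D(6, 3), 3));
    }
}
```

With k = 1 this answers the same question as the two-pass nearest-point query; the heap only adds O(log k) per visited node.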

This article is an English version of an article originally published in Chinese on aliyun.com.
