Google Interview Question: k-nearest neighbor (k-d tree)

Source: Internet
Author: User

Question:

You are given information about hotels in a country/city. The X and Y coordinates of each hotel are known. You need to suggest the list of nearest hotels to a user who is querying from a particular point (the X and Y coordinates of the user are given). Distance is calculated as the straight-line distance between the user and the hotel coordinates.

Assume the data set contains n hotels and we need to find the k nearest. The most direct way is to calculate the distance from the query coordinates to every hotel and keep the k nearest hotels seen so far in a heap of size k (a max-heap keyed on distance, so that the farthest candidate can be evicted). The time complexity is O(n log(k)) and the space complexity is O(k).
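As a rough illustration of this brute-force approach, the sketch below keeps the k closest hotels in a max-heap keyed on squared distance. The class name `BruteForceNearest`, the method `kNearest`, and the representation of a hotel as an `int[]{x, y}` are assumptions made for this example, not part of the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class BruteForceNearest {
    // Squared Euclidean distance; long avoids int overflow for large coordinates.
    static long squaredDistance(int[] p, int[] q) {
        long dx = (long) p[0] - q[0];
        long dy = (long) p[1] - q[1];
        return dx * dx + dy * dy;
    }

    // Returns up to k hotels closest to the query point, nearest first.
    static List<int[]> kNearest(int[][] hotels, int[] query, int k) {
        List<int[]> result = new ArrayList<>();
        if (k <= 0) return result;
        // Max-heap on distance: the root is the farthest of the current candidates.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (p, q) -> Long.compare(squaredDistance(q, query), squaredDistance(p, query)));
        for (int[] hotel : hotels) {
            heap.offer(hotel);
            if (heap.size() > k) heap.poll(); // evict the farthest candidate
        }
        result.addAll(heap);
        result.sort((p, q) -> Long.compare(squaredDistance(p, query), squaredDistance(q, query)));
        return result;
    }
}
```

Comparing squared distances gives the same ordering as comparing straight-line distances, so the square root can be skipped.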

We can improve query efficiency by preprocessing the data. First, sort all hotels by x-coordinate. For the query coordinates (x, y) and a guessed value a, use binary search to locate the interval [x-a, x+a] and obtain every hotel whose x-coordinate falls inside it, then record the k nearest of those hotels with a heap as in the previous method. We are not very concerned about the cost of preprocessing, which is O(n log(n)). For the query, the binary search takes O(log(n)); assuming the binary search leaves m candidate hotels, the second step takes O(m log(k)). The time complexity of a query is therefore O(log(n)) + O(m log(k)).

The problem with this approach is how to choose a. An inappropriate choice causes either (1) m to be too large, which reduces query efficiency, or (2) an inaccurate result, because some of the true k nearest hotels may lie outside the x range. If an approximate result is acceptable, this approach is much more efficient than the previous one.
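The windowed search above could be sketched as follows. This is only an illustration under stated assumptions: the class `XWindowSearch`, its method names, and the `int[]{x, y}` hotel representation are hypothetical, and the window half-width `a` is left to the caller, since the text gives no rule for choosing it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class XWindowSearch {
    // Preprocessing: returns a copy of the hotels sorted by x-coordinate, O(n log n).
    static int[][] sortByX(int[][] hotels) {
        int[][] sorted = hotels.clone();
        Arrays.sort(sorted, (p, q) -> Integer.compare(p[0], q[0]));
        return sorted;
    }

    // Binary search: first index whose x-coordinate is >= x (sorted.length if none).
    static int lowerBound(int[][] sorted, long x) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (sorted[mid][0] < x) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    static long squaredDistance(int[] p, int[] q) {
        long dx = (long) p[0] - q[0];
        long dy = (long) p[1] - q[1];
        return dx * dx + dy * dy;
    }

    // Up to k nearest hotels among those with x in [query.x - a, query.x + a], nearest first.
    // May return fewer than k hotels, or miss closer hotels outside the window.
    static List<int[]> kNearestInWindow(int[][] sorted, int[] query, int k, int a) {
        List<int[]> result = new ArrayList<>();
        if (k <= 0) return result;
        int from = lowerBound(sorted, (long) query[0] - a);
        int to = lowerBound(sorted, (long) query[0] + a + 1); // exclusive end
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (p, q) -> Long.compare(squaredDistance(q, query), squaredDistance(p, query)));
        for (int i = from; i < to; i++) {
            heap.offer(sorted[i]);
            if (heap.size() > k) heap.poll(); // keep only the k closest candidates
        }
        result.addAll(heap);
        result.sort((p, q) -> Long.compare(squaredDistance(p, query), squaredDistance(q, query)));
        return result;
    }
}
```

For example, with hotels at (0, 0), (1, 1), (1, 100) and (-2, 0), a query at (0, 0) with k = 3 and a = 1 returns (1, 100) as the third result even though (-2, 0) is closer, which is exactly the inaccuracy described above.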

The second method sorts the data on the x-coordinate, but the y-coordinate is still unordered. Can we further optimize the structure of the data to improve query efficiency? A k-d tree is a data structure designed for this kind of problem. The following handout describes the concepts and principles of the k-d tree in great detail.

http://web.stanford.edu/class/cs106l/handouts/assignment-3-kdtree.pdf

First of all, our problem concerns points in a two-dimensional plane, so we only need to implement the two-dimensional case of the k-d tree. We first define the point and the k-d tree node.

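    // A point in the plane (a hotel or the query position).
    class Point2D {
        int x;
        int y;

        public Point2D(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // A k-d tree node: one point plus left and right subtrees.
    class KdTreeNode {
        Point2D val;
        KdTreeNode left;
        KdTreeNode right;

        public KdTreeNode(Point2D p) {
            this.val = p;
        }
    }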

There are many ways to define a k-d tree. Some implementations store all the data in the leaf nodes, and the internal nodes only store the information that divides the space. Here we simply treat it like an ordinary binary search tree, with a point stored in every node.

Next is the construction of the k-d tree. As the handout explains, the splitting axis alternates between X and Y from one level of the tree to the next. In this problem, building the k-d tree is data preprocessing on static data, so we do not need to consider insert and delete operations. We start by dividing on the x-coordinate: select one point as the node and split the data into two parts, so that the x-coordinate of every point on the left is not greater than the x-coordinate of the node, and the x-coordinate of every point on the right is not less than the x-coordinate of the node. Then recurse. With the root defined as level 0, we divide by the x-coordinate when the level is even and by the y-coordinate when the level is odd. So how do we choose the point to split on? For queries to be efficient the k-d tree needs to be balanced, so the selected point is the median along the current x/y axis. With a selection algorithm such as quickselect, we can find the median in expected O(n) time and divide the data evenly.

    // Requires: import java.util.Random;

    // Builds a balanced k-d tree from array[low..high]; depth selects the split axis.
    public static KdTreeNode constructKdTree(Point2D[] array, int depth, int low, int high) {
        if (low > high) return null;
        if (low == high) return new KdTreeNode(array[low]);
        int mid = low + (high - low) / 2;
        // Move the median along the current axis (0 = x, 1 = y) to index mid.
        Point2D p = quickSelect(array, mid, low, high, depth % 2);
        KdTreeNode node = new KdTreeNode(p);
        node.left = constructKdTree(array, depth + 1, low, mid - 1);
        node.right = constructKdTree(array, depth + 1, mid + 1, high);
        return node;
    }

    // Rearranges array[low..high] so that index k holds the k-th smallest point along the given dimension.
    public static Point2D quickSelect(Point2D[] array, int k, int low, int high, int dimension) {
        while (low <= high) {
            int pivot = new Random().nextInt(high - low + 1) + low;
            int pivotIndex = partition(array, low, high, pivot, dimension);
            if (pivotIndex == k) return array[k];
            else if (pivotIndex < k) low = pivotIndex + 1;
            else high = pivotIndex - 1;
        }
        return null;
    }

    // Lomuto partition around array[pivot]; returns the pivot's final index.
    public static int partition(Point2D[] array, int low, int high, int pivot, int dimension) {
        int pivotVal = dimension == 0 ? array[pivot].x : array[pivot].y;
        swap(array, pivot, high);
        int index = low;
        for (int i = low; i < high; i++) {
            int curVal = dimension == 0 ? array[i].x : array[i].y;
            if (curVal < pivotVal) {
                swap(array, index, i);
                index++;
            }
        }
        swap(array, high, index);
        return index;
    }

    public static void swap(Point2D[] array, int i, int j) {
        if (i != j) {
            Point2D tmp = array[i];
            array[i] = array[j];
            array[j] = tmp;
        }
    }

The construction of the k-d tree is now complete, with expected time complexity O(n log(n)) and space complexity O(n). Next is the query operation on the k-d tree. Our problem is to get a list of the nearest hotels. The linked handout describes how to query the nearest point and the k nearest points. For this problem, I only implement returning the nearest point.

First, descend the tree with the query coordinates as in an ordinary binary search tree lookup, recording the nearest point seen along the traversal path. Then, taking the distance r to that point as the radius of a search region centered on the query coordinates, query the k-d tree again to check whether any point inside that region is closer.

    // Best candidate found so far and its squared distance to the query.
    // Squared distances are kept as int, so coordinates are assumed small enough not to overflow.
    static Point2D nearestPoint = new Point2D(Integer.MAX_VALUE, Integer.MAX_VALUE);
    static int min = Integer.MAX_VALUE;

    // Pass 1: descend as in a binary search tree, recording the nearest point on the path.
    public static void queryHelper(KdTreeNode root, Point2D query, int depth) {
        if (root == null) return;
        int distance = (query.x - root.val.x) * (query.x - root.val.x)
                     + (query.y - root.val.y) * (query.y - root.val.y);
        if (distance < min) {
            min = distance;
            nearestPoint = root.val;
        }
        int curVal = depth % 2 == 0 ? query.x : query.y;
        // Compare against the node's coordinate, not the query's.
        int nodeVal = depth % 2 == 0 ? root.val.x : root.val.y;
        if (curVal > nodeVal) queryHelper(root.right, query, depth + 1);
        else if (curVal < nodeVal) queryHelper(root.left, query, depth + 1);
        else {
            queryHelper(root.right, query, depth + 1);
            queryHelper(root.left, query, depth + 1);
        }
    }

    // Pass 2: revisit the tree, entering the far side of a split only if the
    // splitting line crosses the search range [xMin, xMax] x [yMin, yMax].
    public static void queryNearestHelper(KdTreeNode root, Point2D query, int depth,
                                          double xMin, double xMax, double yMin, double yMax) {
        if (root == null) return;
        int distance = (query.x - root.val.x) * (query.x - root.val.x)
                     + (query.y - root.val.y) * (query.y - root.val.y);
        if (distance < min) {
            min = distance;
            nearestPoint = root.val;
        }
        int curVal = depth % 2 == 0 ? query.x : query.y;
        int nodeVal = depth % 2 == 0 ? root.val.x : root.val.y;
        double rangeMin = depth % 2 == 0 ? xMin : yMin;
        double rangeMax = depth % 2 == 0 ? xMax : yMax;
        if (curVal > nodeVal) {
            queryNearestHelper(root.right, query, depth + 1, xMin, xMax, yMin, yMax);
            if (nodeVal > rangeMin)
                queryNearestHelper(root.left, query, depth + 1, xMin, xMax, yMin, yMax);
        } else if (curVal < nodeVal) {
            queryNearestHelper(root.left, query, depth + 1, xMin, xMax, yMin, yMax);
            if (nodeVal < rangeMax)
                queryNearestHelper(root.right, query, depth + 1, xMin, xMax, yMin, yMax);
        } else {
            // The query lies on the splitting line: both sides may hold a closer point.
            queryNearestHelper(root.right, query, depth + 1, xMin, xMax, yMin, yMax);
            queryNearestHelper(root.left, query, depth + 1, xMin, xMax, yMin, yMax);
        }
    }

    public static void queryNearest(KdTreeNode root, Point2D query) {
        // Reset the shared state so repeated queries do not reuse an old result.
        min = Integer.MAX_VALUE;
        nearestPoint = new Point2D(Integer.MAX_VALUE, Integer.MAX_VALUE);
        queryHelper(root, query, 0);
        double r = Math.sqrt(min);
        queryNearestHelper(root, query, 0, query.x - r, query.x + r, query.y - r, query.y + r);
    }

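At this point we have completed the nearest-hotel query. On a balanced tree with reasonably distributed points, the time complexity of a query is O(log(n)) on average; in the worst case it can degrade to O(n).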
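Since the original question asks for the k nearest hotels, one possible extension is to combine the k-d tree with the size-k max-heap from the brute-force method. The sketch below is an assumption-laden illustration, not the author's implementation: the class `KdTreeKnn`, its nested `Node` type, and the `int[]{x, y}` point representation are hypothetical, and the tree is built by sorting each subrange (simpler than quickselect, at the cost of O(n log^2(n)) construction).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class KdTreeKnn {
    static class Node {
        final int[] point; // {x, y}
        Node left, right;
        Node(int[] point) { this.point = point; }
    }

    // Builds a balanced tree over pts[low..high] by sorting on the splitting axis
    // (0 = x on even depths, 1 = y on odd depths) and taking the median.
    static Node build(int[][] pts, int low, int high, int depth) {
        if (low > high) return null;
        int axis = depth % 2;
        Arrays.sort(pts, low, high + 1, (p, q) -> Integer.compare(p[axis], q[axis]));
        int mid = low + (high - low) / 2;
        Node node = new Node(pts[mid]);
        node.left = build(pts, low, mid - 1, depth + 1);
        node.right = build(pts, mid + 1, high, depth + 1);
        return node;
    }

    static long squaredDistance(int[] p, int[] q) {
        long dx = (long) p[0] - q[0];
        long dy = (long) p[1] - q[1];
        return dx * dx + dy * dy;
    }

    // heap is a max-heap on distance to the query and never holds more than k points.
    static void search(Node node, int[] query, int k, int depth, PriorityQueue<int[]> heap) {
        if (node == null) return;
        heap.offer(node.point);
        if (heap.size() > k) heap.poll(); // drop the farthest candidate
        int axis = depth % 2;
        long diff = (long) query[axis] - node.point[axis];
        Node near = diff < 0 ? node.left : node.right;
        Node far = diff < 0 ? node.right : node.left;
        search(near, query, k, depth + 1, heap);
        // Visit the far side only if fewer than k candidates are known, or the
        // splitting line is no farther away than the current k-th nearest point.
        if (heap.size() < k || diff * diff <= squaredDistance(heap.peek(), query)) {
            search(far, query, k, depth + 1, heap);
        }
    }

    // Returns up to k points nearest to the query, nearest first.
    static List<int[]> kNearest(Node root, int[] query, int k) {
        List<int[]> result = new ArrayList<>();
        if (k <= 0) return result;
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (p, q) -> Long.compare(squaredDistance(q, query), squaredDistance(p, query)));
        search(root, query, k, 0, heap);
        result.addAll(heap);
        result.sort((p, q) -> Long.compare(squaredDistance(p, query), squaredDistance(q, query)));
        return result;
    }
}
```

Unlike the two-pass nearest-point query above, this version prunes in a single traversal: the distance to the current k-th nearest candidate shrinks as better points are found, so the bound tightens during the search. Note that `build` reorders the array it is given, so pass a copy if the original order matters.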
