Pattern Recognition Learning Notes (27) -- Fast Nearest Neighbor Method Based on a Tree Search Algorithm


To improve the nearest neighbor method in these two respects, the branch and bound algorithm has been proposed. It avoids having to compute the distance from the unknown sample to every training sample, which the plain nearest neighbor method requires. The method has two stages: 1) the sample set X is divided hierarchically into a tree structure, using manual partitioning, the K-means clustering algorithm, or another dynamic clustering algorithm; 2) a tree search algorithm is then used to find the nearest neighbor of the unknown sample.

1. Hierarchical Division

1) The sample set X is divided into l subsets, each subset is further divided into l subsets, and so on, until a tree-like structure is formed;


After this division, each node of the tree is associated with a subset of the samples;

2) Denote a node of the tree by p, and compute the following parameters for each node:

The sample subset associated with node p: Xp

The number of samples in Xp: Np

The mean of all samples in Xp (which can be regarded as the sample centroid or center): Mp

The maximum distance from any sample in Xp to the center Mp, which can be viewed as the radius of a hypersphere centered at Mp that encloses all of Xp:

rp = max { D(xi, Mp) : xi in Xp }
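As a concrete illustration of this stage, here is a minimal sketch in Python of building such a tree and storing Np, Mp and rp at each node. It is only a sketch under assumed choices: the TreeNode class, the build_tree function, the branching factor l, the leaf-size and depth limits, and the use of scikit-learn's KMeans as the clustering method are all illustrative, not prescribed by the notes.

```python
import numpy as np
from dataclasses import dataclass, field
from sklearn.cluster import KMeans  # any dynamic clustering method would do

@dataclass
class TreeNode:
    """One node p of the search tree, holding the parameters from the notes."""
    Xp: np.ndarray               # sample subset associated with this node
    indices: np.ndarray          # indices of those samples in the original set X
    Mp: np.ndarray = None        # sample mean (center) of Xp
    rp: float = 0.0              # radius: max distance from a sample in Xp to Mp
    children: list = field(default_factory=list)

def build_tree(X, indices, l=3, min_leaf=10, depth=0, max_depth=4):
    """Recursively split X into l subsets with K-means, forming the tree."""
    node = TreeNode(Xp=X, indices=indices)
    node.Mp = X.mean(axis=0)
    node.rp = float(np.max(np.linalg.norm(X - node.Mp, axis=1)))  # rp = max D(xi, Mp)

    # Stop splitting when the subset is small or the maximum depth is reached.
    if len(X) <= min_leaf or depth >= max_depth:
        return node

    labels = KMeans(n_clusters=l, n_init=10).fit_predict(X)
    for c in range(l):
        mask = labels == c
        if mask.any():
            node.children.append(
                build_tree(X[mask], indices[mask], l, min_leaf, depth + 1, max_depth))
    return node
```

Np from the notes corresponds here to len(node.Xp); a call such as build_tree(X, np.arange(len(X))) would return the root of the tree.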


2. Determining whether the nearest neighbor of an unknown sample x lies in a node's subset

For this judgment, there are two rules:

1) Rule one: if D(x, Mp) > B + rp, then no sample in Xp can be the nearest neighbor of x;

where B is the distance from x (the sample to be recognized) to the nearest sample found so far in the search; it is given an initial value when the algorithm starts;

The interpretation of this rule: every sample xi in Xp lies within the hypersphere of radius rp centered at Mp, so by the triangle inequality D(x, xi) ≥ D(x, Mp) - rp. If D(x, Mp) > B + rp, then D(x, xi) > B for every xi in Xp, i.e. every sample in Xp is farther from x than the nearest distance B already found, so none of them can be the nearest neighbor of x.
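A minimal sketch of the rule one test in Python, assuming Euclidean distance for D; the function name rule1_prunes_node and its argument names are illustrative.

```python
import numpy as np

def rule1_prunes_node(x, Mp, rp, B):
    """Rule one: if D(x, Mp) > B + rp, no sample of Xp can be the
    nearest neighbor of x, so the whole node can be skipped."""
    return np.linalg.norm(x - Mp) > B + rp
```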


2) Rule two: if D(x, Mp) > B + D(xi, Mp), then xi cannot be the nearest neighbor of x;

This rule is intended for the final-level nodes of the tree: when the search reaches a final-level node, it can be used to discard individual samples, so that a distance does not have to be computed for every sample in that node;

The interpretation is similar to rule one: by the triangle inequality, D(x, xi) ≥ D(x, Mp) - D(xi, Mp), so if D(x, Mp) is larger than B + D(xi, Mp), then D(x, xi) > B, meaning xi is farther from x than the current nearest distance B and therefore cannot be the nearest neighbor of x.
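A matching sketch of the rule two test, applied per sample at a final-level node; it assumes the distance D(xi, Mp) is available, for example precomputed when the tree is built or computed once per leaf. The function name rule2_skips_sample is again an illustrative choice.

```python
import numpy as np

def rule2_skips_sample(x, Mp, d_xi_Mp, B):
    """Rule two: if D(x, Mp) > B + D(xi, Mp), xi cannot be the nearest
    neighbor of x, so D(x, xi) need not be computed."""
    return np.linalg.norm(x - Mp) > B + d_xi_Mp
```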

3. Tree Search algorithm

1) Start at the top level, i.e. L = 0 with the root as the current node p = 0, and set the initial nearest neighbor distance B = ∞;

2) Compute the center Mp and the radius rp for the current node;

3) Put all direct successor nodes of the current node p into a directory of saved nodes, and compute the distance D(x, Mp) from x to the center Mp of each of them;

4) Apply rule one to the nodes saved in the directory, and remove every node that satisfies it;

5) If the directory is now empty, fall back to the previous level, i.e. L = L - 1; if L = 0, stop; otherwise jump to 4). If nodes remain in the directory, jump to 6);

6) Among the saved nodes, find the node p1 with the smallest D(x, Mp), make it the current node, and remove it from the directory. If p1 is at the final level, jump to 7); otherwise set L = L + 1 and jump to 2);

7) Apply rule two to each sample xi in the current node p1: if xi satisfies rule two, skip it without computing D(x, xi); otherwise compute D(x, xi) and compare it with the current B. If it is smaller, a sample closer than the current nearest has been found, so update B = D(x, xi) and record the subscript of this sample: nn = i. After all samples in p1 have been examined, jump to 4);

8) Output the nearest neighbor x_nn of the unknown sample x, together with the distance D(x, x_nn) = B;
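To make these steps concrete, here is a minimal sketch of the search in Python, reusing the TreeNode structure from the sketch in the hierarchical division section. For brevity it is written recursively, visiting children nearest-center-first with the rule one and rule two tests inlined, rather than as a literal transcription of steps 1) through 8); the function name nn_search is an illustrative choice.

```python
import numpy as np

def nn_search(node, x, best=(np.inf, -1)):
    """Branch and bound nearest neighbor search over the tree.

    Returns (B, nn): the nearest distance found so far and the index of
    that sample in the original sample set X.
    """
    B, nn = best
    d_x_Mp = np.linalg.norm(x - node.Mp)

    # Rule one: if D(x, Mp) > B + rp, nothing in this node can beat B.
    if d_x_Mp > B + node.rp:
        return B, nn

    if not node.children:
        # Final level: test each sample, using rule two to skip hopeless ones.
        d_xi_Mp = np.linalg.norm(node.Xp - node.Mp, axis=1)
        for xi, idx, d_i in zip(node.Xp, node.indices, d_xi_Mp):
            if d_x_Mp > B + d_i:      # rule two: xi cannot be the nearest neighbor
                continue
            d = np.linalg.norm(x - xi)
            if d < B:                 # closer sample found: update B and nn
                B, nn = d, idx
        return B, nn

    # Visit children in order of increasing D(x, Mp), so B shrinks early
    # and rule one prunes more of the remaining subtrees.
    for child in sorted(node.children, key=lambda c: np.linalg.norm(x - c.Mp)):
        B, nn = nn_search(child, x, (B, nn))
    return B, nn
```

With the tree from build_tree, a call like nn_search(root, x) returns B and the index nn of the nearest neighbor of x.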

4. Improvements to K-nearest neighbors

The above improves the computation for a single nearest neighbor. For the K nearest neighbors the algorithm is basically the same as described above; only minor changes are needed:

1) Maintain a table of the current k nearest neighbor distances (with the corresponding sample subscripts), instead of a single nearest distance;

2) The value of B is now the distance from x to the farthest of the current k nearest neighbors, i.e. the largest entry in the table;

3) In step 7) above, after each distance is computed, compare it with the k entries in the table; if it is smaller than any of them, discard the largest entry (the current k-th nearest), put the new distance into the table, and update B accordingly;
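A minimal sketch of this bookkeeping, assuming the k current candidates are kept in a max-heap via Python's heapq with negated distances; the helper name update_knn_table is illustrative.

```python
import heapq

def update_knn_table(heap, k, dist, idx):
    """Keep the k smallest (distance, sample index) pairs seen so far.

    The heap stores (-distance, index), so the largest of the current
    distances sits at heap[0]. Returns the bound B to use in the rules:
    the k-th nearest distance, or infinity while the table is not full.
    """
    if len(heap) < k:
        heapq.heappush(heap, (-dist, idx))
    elif dist < -heap[0][0]:
        # Closer than the current k-th nearest: drop that entry, keep this one.
        heapq.heapreplace(heap, (-dist, idx))
    return -heap[0][0] if len(heap) == k else float("inf")
```

In step 7) above, each computed D(x, xi) would be passed to this helper and the returned value used as B in rules one and two; when the search finishes, the heap contains the k nearest neighbors.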




