Super Cool Algorithm: BK Tree

Source: Internet
Author: User

A few days ago I happened to come across a blog that I thought was very well written. I used to have a bad habit: whenever I found a good resource, my first reaction was to bookmark it, and then I rarely looked at it again. That habit has to change! So today I started reading it, and the second article was about the BK tree. I found it quite interesting and thought about writing a blog post on it, but then I discovered that someone had already translated it. Why reinvent the wheel? Since the original author's statement prohibits reposting, I will not reproduce it; if you want to read the original, see the source here.

Below is a brief explanation of the algorithm. It is not difficult, but the idea is quite clever.

Simply put, the problem a BK tree solves is finding similar strings. For example, "book" and "boon" differ by only one letter, so they are very similar.

Let's start with a definition of similarity. We use the edit distance to measure how similar two strings are. The edit distance between string A and string B is the minimum number of operations (deleting a letter, inserting a letter, or replacing a letter) needed to turn A into B. The edit distance between "book" and "boon" mentioned above is 1, because replacing the letter 'k' with 'n' is all it takes.
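For readers who want to experiment, here is a minimal Python sketch of this edit distance, using the classic Levenshtein dynamic program. The function name edit_distance is an illustrative choice, not something taken from the original post.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-letter deletions, insertions and
    replacements needed to turn a into b (Levenshtein distance)."""
    # prev[j] = distance between a[:i-1] and b[:j]; for i = 1 that is
    # the empty prefix of a, which needs j insertions to become b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # a[:i] -> empty string: delete i letters
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # replace ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("book", "boon"))  # 1
```

Only two rows of the table are kept at a time, so memory grows with the length of b rather than with the product of both lengths.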

Next, let's look at a property of edit distance. Write L(A, B) for the edit distance between strings A and B. Suppose we are looking for strings C within distance m of a string A, that is, L(A, C) <= m, and we already know L(A, B) for some string B. What can we say about the distance between B and C? The answer is L(A, B) - m <= L(B, C) <= L(A, B) + m. Why? A and C can be converted into each other in at most m steps, and B and C in L(B, C) steps, so A and B can be converted into each other in at most m + L(B, C) steps. Hence L(A, B) <= L(B, C) + m, which is the lower bound. By the same reasoning, going from B to C by way of A takes at most L(A, B) + m steps, so L(B, C) <= L(A, B) + m. (This is simply the triangle inequality for edit distance.)
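As a quick sanity check of this bound (an empirical check on a few words, not a proof), the snippet below tries every triple (A, B, C) from a small, arbitrarily chosen word list and every m from 0 to 3. It carries its own copy of a Levenshtein helper so it runs standalone; edit_distance and bound_holds are hypothetical names used only for illustration.

```python
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insert, delete or replace one letter per step."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def bound_holds(a: str, b: str, c: str, m: int) -> bool:
    """If L(a, c) <= m, check that L(a, b) - m <= L(b, c) <= L(a, b) + m."""
    if edit_distance(a, c) > m:
        return True  # the claim only concerns strings c within distance m of a
    lab, lbc = edit_distance(a, b), edit_distance(b, c)
    return lab - m <= lbc <= lab + m


# Arbitrary sample words; every (A, B, C) triple and every m from 0 to 3
words = ["book", "boon", "books", "cook", "cake", "boo", "cart"]
print(all(bound_holds(a, b, c, m)
          for a, b, c in product(words, repeat=3)
          for m in range(4)))  # True
```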

This is where the BK tree comes into play. Every edge of a BK tree is labeled with a number, and that number is the edit distance between the two nodes the edge connects.

We first pick one string z from the collection as the root node. Then we take the strings x out of the collection one at a time and insert each into the tree. The insertion rule is: compute the edit distance L(x, z) between x and the root z, then move to z's child along the edge labeled L(x, z) and repeat the same step from that child. When the current node has no edge with the required label, x is attached there as a new leaf through an edge carrying that label.
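A minimal Python sketch of the tree and this insertion rule, assuming an in-memory tree in which each node keeps its children in a dict keyed by edge label. The names BKNode, BKTree and insert are illustrative, the descent is written as a loop (equivalent to the recursion described), and skipping exact duplicates (distance 0) is an assumption the post does not address.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insert, delete or replace one letter per step."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


class BKNode:
    """One string in the tree; children are keyed by edge label."""

    def __init__(self, word: str) -> None:
        self.word = word
        self.children = {}  # edge label L(word, child.word) -> child BKNode


class BKTree:
    def __init__(self) -> None:
        self.root = None  # the first inserted string becomes the root z

    def insert(self, x: str) -> None:
        if self.root is None:
            self.root = BKNode(x)
            return
        node = self.root
        while True:
            d = edit_distance(x, node.word)
            if d == 0:
                return  # assumption: exact duplicates are not stored twice
            child = node.children.get(d)
            if child is None:
                node.children[d] = BKNode(x)  # x becomes a leaf on edge d
                return
            node = child  # follow the edge labeled d and repeat


tree = BKTree()
for w in ["book", "books", "cake", "boo", "boon"]:
    tree.insert(w)
print(sorted(tree.root.children))  # [1, 4]
```

Here "book" is the root, "books" and "cake" hang off edges 1 and 4, and "boo" and "boon" end up deeper under "books", because their distance to "book" is also 1.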

When we look for strings similar to a string a (say, strings within edit distance 2 of a), we start at the root z and compute L(z, a). If L(z, a) <= 2, z itself is a match. Every string in the subtree under the edge labeled k is at distance exactly k from z, so by the property above, any string within edit distance 2 of a can only lie in the subtrees of z whose edge labels run from L(z, a) - 2 to L(z, a) + 2. We search only those subtrees, applying the same test recursively at each node.
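A sketch of the search step under the same assumptions as a dict-of-children BK tree (illustrative names BKNode, BKTree, insert, search; exact duplicates skipped). At each visited node it computes d = L(node, query), reports the node if d <= max_dist, and descends only into children whose edge label lies in [d - max_dist, d + max_dist]. An explicit stack replaces recursion; the word list is arbitrary.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insert, delete or replace one letter per step."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


class BKNode:
    def __init__(self, word: str) -> None:
        self.word = word
        self.children = {}  # edge label (edit distance) -> child BKNode


class BKTree:
    def __init__(self) -> None:
        self.root = None

    def insert(self, word: str) -> None:
        if self.root is None:
            self.root = BKNode(word)
            return
        node = self.root
        while True:
            d = edit_distance(word, node.word)
            if d == 0:
                return  # assumption: exact duplicates are ignored
            if d not in node.children:
                node.children[d] = BKNode(word)
                return
            node = node.children[d]

    def search(self, query: str, max_dist: int) -> list:
        """Return sorted (distance, word) pairs with distance <= max_dist."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node = stack.pop()
            d = edit_distance(query, node.word)
            if d <= max_dist:
                results.append((d, node.word))
            # By L(A,B) - m <= L(B,C) <= L(A,B) + m, matches can only sit
            # under edges labeled d - max_dist .. d + max_dist.
            for label, child in node.children.items():
                if d - max_dist <= label <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for w in ["book", "books", "cake", "boo", "boon", "cook", "cape", "cart"]:
    tree.insert(w)
print(tree.search("bool", 1))  # [(1, 'boo'), (1, 'book'), (1, 'boon')]
```

The pruning is what makes the BK tree worthwhile: subtrees whose labels fall outside the window are skipped entirely instead of comparing the query against every string. How much work this saves depends on the data and on the threshold.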

