[Learning notes] String edit distance (string similarity)

String edit distance is a classic problem that often comes up in programming contests and in written tests for job interviews. Today I ran into it again in Section 3.3 of "The Beauty of Programming" and reviewed it, so I would like to summarize it here.


[Problem description]

Given a source string and a target string, you can perform the following operations on the source string:
1. Insert a character at a given position
2. Replace any character
3. Delete any character

Find the minimum number of operations needed to turn the source string into the target string using the operations above. (For a detailed description of the problem, see Section 3.3 of "The Beauty of Programming".)
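To make the three operations concrete, here is a small Python sketch. The helper names insert_char, replace_char and delete_char are hypothetical, chosen only for illustration, and the "kitten" to "sitting" pair is our own example. It shows that three edits suffice for this pair; by itself it does not prove that fewer edits are impossible.

```python
# Hypothetical helpers illustrating the three allowed edit operations.
def insert_char(s: str, pos: int, ch: str) -> str:
    """Insert ch so that it ends up at index pos."""
    return s[:pos] + ch + s[pos:]


def replace_char(s: str, pos: int, ch: str) -> str:
    """Replace the character at index pos with ch."""
    return s[:pos] + ch + s[pos + 1:]


def delete_char(s: str, pos: int) -> str:
    """Delete the character at index pos."""
    return s[:pos] + s[pos + 1:]


# "kitten" -> "sitting" in three edits.
s = "kitten"
s = replace_char(s, 0, "s")  # "sitten"
s = replace_char(s, 4, "i")  # "sittin"
s = insert_char(s, 6, "g")   # "sitting"
print(s)  # sitting
```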


[Solution Thinking]

Here is a brief outline of the idea. Let the source and target strings be str_a and str_b, with lengths la and lb. Define f[i, j] as the minimum edit distance between the prefixes str_a[0...i] and str_b[0...j]. A simple analysis shows that the minimum edit distance between str_a[0...i] and str_b[0...j] arises from one of three cases:
(1) Delete the last character of str_a[0...i], then match str_a[0...i-1] with str_b[0...j]. In this case f[i, j] = f[i-1, j] + 1.
(2) Remove the last character of str_b[0...j] (equivalently, insert str_b[j] at the end of str_a's prefix), then match str_a[0...i] with str_b[0...j-1]. In this case f[i, j] = f[i, j-1] + 1.
(3) Set aside the last characters of both str_a[0...i] and str_b[0...j] and match the remaining prefixes, which costs f[i-1, j-1]. Then check whether the two last characters are equal. If str_a[i] == str_b[j], this character needs no edit, so f[i, j] = f[i-1, j-1]. If str_a[i] != str_b[j], the character must be edited once (replace str_a[i] with str_b[j], or vice versa), so f[i, j] = f[i-1, j-1] + 1.
Since the problem asks for the minimum edit distance, f[i, j] is the smallest of these three values. Together with the boundary condition that turning an empty prefix into a prefix of length k (or the reverse) takes exactly k edits, this gives the recurrence:

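$$
f[i, j] = \min\bigl(f[i-1, j] + 1,\; f[i, j-1] + 1,\; f[i-1, j-1] + \delta_{ij}\bigr),
\qquad
\delta_{ij} = \begin{cases} 0 & \text{if } \mathrm{str\_a}[i] = \mathrm{str\_b}[j] \\ 1 & \text{otherwise} \end{cases}
$$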

Wikipedia [Reference 1] gives a recurrence that is essentially the same as the one above.
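For reference, the same recurrence is often written in a 1-based form with the base cases made explicit. Here lev(i, j) denotes the edit distance between the first i characters of a and the first j characters of b, so the answer is lev(|a|, |b|); this notation is our own and may differ in detail from the version on the Wikipedia page.

$$
\operatorname{lev}(i, j) =
\begin{cases}
\max(i, j) & \text{if } \min(i, j) = 0, \\[4pt]
\min\begin{cases}
\operatorname{lev}(i-1, j) + 1 \\
\operatorname{lev}(i, j-1) + 1 \\
\operatorname{lev}(i-1, j-1) + [a_i \ne b_j]
\end{cases} & \text{otherwise,}
\end{cases}
$$

where [a_i ≠ b_j] is 1 if the i-th character of a differs from the j-th character of b, and 0 otherwise. The three terms correspond one-to-one to cases (1), (2) and (3): delete, insert, and match or replace.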



[Solution]

With the recurrence above, the implementation is straightforward. There are usually two approaches.

The first is the recursive method (top-down dynamic programming): write a recursive function directly from the recurrence above. I won't post the source code here, since it is easy to find online. One issue deserves attention: when the recursion works downward from f[la-1, lb-1], the same f[x, y] (x < la-1, y < lb-1) gets computed many times, which seriously hurts efficiency, so the f values are usually cached. There are therefore two versions of the recursive program: the simplest, most primitive version without an f-value cache, and a version with the cache. Wikipedia [Reference 1] provides both versions.
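For readers who want something concrete anyway, here is a minimal Python sketch of both recursive versions (our own code, not the code from Reference 1). One assumption to note: the helper f(i, j) takes prefix lengths (0 to la, 0 to lb) rather than the last index, so the empty prefix is simply 0 and the base cases need no special -1 handling. The cached version uses functools.lru_cache as the f-value cache.

```python
from functools import lru_cache


def edit_distance_naive(a: str, b: str) -> int:
    """Plain recursion on the recurrence. Exponential time, because
    the same (i, j) subproblems are recomputed many times."""
    def f(i: int, j: int) -> int:
        # f(i, j): edit distance between a[:i] and b[:j]
        if i == 0:
            return j  # insert the j characters of b[:j]
        if j == 0:
            return i  # delete the i characters of a[:i]
        cost = 0 if a[i - 1] == b[j - 1] else 1
        return min(f(i - 1, j) + 1,         # case (1): delete a[i-1]
                   f(i, j - 1) + 1,         # case (2): insert b[j-1]
                   f(i - 1, j - 1) + cost)  # case (3): match or replace
    return f(len(a), len(b))


def edit_distance_memo(a: str, b: str) -> int:
    """Same recursion, but each f(i, j) is computed once and cached,
    giving O(la * lb) time."""
    @lru_cache(maxsize=None)
    def f(i: int, j: int) -> int:
        if i == 0:
            return j
        if j == 0:
            return i
        cost = 0 if a[i - 1] == b[j - 1] else 1
        return min(f(i - 1, j) + 1, f(i, j - 1) + 1, f(i - 1, j - 1) + cost)
    return f(len(a), len(b))


print(edit_distance_naive("kitten", "sitting"))  # 3
print(edit_distance_memo("kitten", "sitting"))   # 3
```

Even the cached version recurses roughly la + lb levels deep, so for long strings it can hit Python's default recursion limit; that is one practical reason to prefer the matrix method.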

The second is the iterative (bottom-up) method, also known as the matrix-filling method. f[i, j] can be computed in a two-dimensional matrix, with the recurrence above serving as the rule for computing each cell. Once the matrix is filled, the value of f[la-1, lb-1] is the required minimum edit distance. I won't go through the process step by step here; see [Reference 2] for details. There are also many implementations available online.
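A minimal Python sketch of the matrix method follows (our own code, not taken from the references). It assumes a table with one extra row and column for the empty prefix, so d[i][j] holds the distance between a[:i] and b[:j], and the answer sits in d[la][lb], which corresponds to f[la-1, lb-1] in the 0-based notation used above. Time and space are both O(la * lb).

```python
def edit_distance_table(a: str, b: str) -> int:
    """Fill the (la + 1) x (lb + 1) matrix row by row using the recurrence."""
    la, lb = len(a), len(b)
    # d[i][j] = edit distance between a[:i] and b[:j]
    d = [[0] * (lb + 1) for _ in range(la + 1)]
    for i in range(la + 1):
        d[i][0] = i  # delete all i characters of a[:i]
    for j in range(lb + 1):
        d[0][j] = j  # insert all j characters of b[:j]
    for i in range(1, la + 1):
        for j in range(1, lb + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete
                          d[i][j - 1] + 1,         # insert
                          d[i - 1][j - 1] + cost)  # match or replace
    return d[la][lb]


print(edit_distance_table("kitten", "sitting"))  # 3
```

Because each cell depends only on the row above and the cell to its left, the filling order guarantees every needed value is ready, and there is no recursion depth to worry about.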

[Reference 3] provides all three versions of the program mentioned above.


[Supplement]

Finally, a note on "string similarity" as defined in Section 3.3 of "The Beauty of Programming": similarity is 1 / (minimum edit distance + 1). For example, the minimum edit distance between abcdefg and abcdef is 1 (delete the g from abcdefg), so their similarity is 1/2 = 0.5.
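The similarity definition translates directly into code. In this sketch (our own), the distance is computed with a space-saving variant of the matrix method that keeps only two rows of the table at a time; similarity is simply 1 / (distance + 1), so identical strings score 1.0 and the score shrinks as more edits are needed.

```python
def edit_distance(a: str, b: str) -> int:
    """Matrix method keeping only the previous and current rows."""
    prev = list(range(len(b) + 1))  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)    # first column: delete i characters
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return prev[len(b)]


def similarity(a: str, b: str) -> float:
    """Similarity as defined above: 1 / (minimum edit distance + 1)."""
    return 1 / (edit_distance(a, b) + 1)


print(similarity("abcdefg", "abcdef"))  # 0.5
```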


[References]

[1] http://en.wikipedia.org/wiki/Levenshtein_distance

[2] http://www.cnitblog.com/ictfly/archive/2005/12/27/5828.aspx

[3] http://blog.163.com/kevinlee_2010/blog/static/16982082020111123111835146/