String edit distance is a classic problem that often comes up in programming contests and written interview tests. Today I ran into it again in section 3.3 of "The Beauty of Programming" and reviewed it, so I am summarizing it here.
[Problem description]
Given a source string and a target string, you can perform the following operations on the source string:
1. Insert a character at a given position
2. Replace any character
3. Delete any character
Find the minimum number of operations needed to make the source string identical to the target string. (For a detailed description of the problem, see section 3.3 of "The Beauty of Programming".)
[Solution idea]
Here is a brief description of the idea. Let the source string and target string be str_a and str_b, with lengths la and lb respectively. Define f[i, j] as the minimum edit distance between the substrings str_a[0...i] and str_b[0...j]. A simple analysis shows that the minimum edit distance between str_a[0...i] and str_b[0...j] can arise in one of three ways:
(1) Remove the last character of str_a[0...i] and match the rest with str_b[0...j]; then f[i, j] equals f[i-1, j] + 1;
(2) Remove the last character of str_b[0...j] and match the rest with str_a[0...i]; then f[i, j] equals f[i, j-1] + 1;
(3) Remove the last characters of both str_a[0...i] and str_b[0...j] and match what remains, which gives f[i-1, j-1]. When computing f[i, j], check whether the current characters are equal: if str_a[i] == str_b[j], this character needs no edit, so f[i, j] equals f[i-1, j-1]; if str_a[i] != str_b[j], the character needs one edit (replace either str_a[i] or str_b[j]), so f[i, j] equals f[i-1, j-1] + 1.
Because the problem asks for the minimum edit distance, take the minimum over the cases above. This gives the recurrence:
f[i, j] = min(f[i-1, j] + 1, f[i, j-1] + 1, f[i-1, j-1] + (str_a[i] == str_b[j] ? 0 : 1))
Wikipedia [Reference 1] gives a recurrence that is essentially the same as the one above.
[Solution]
With the recurrence above, you can implement it in code. There are usually two approaches.
First: the recursive method (also a dynamic programming method once a cache is added). Write a recursive program directly from the recurrence above. The full source code is not posted here, since it is easy to find online. One problem with the recursive method deserves attention: recursing down from f[la-1, lb-1] recomputes f[x, y] (x < la-1, y < lb-1) many times, which hurts efficiency badly, so the f values are usually cached. There are therefore two versions of the recursive program: the simplest, most primitive version without a cache for the f values, and a version with one. Both versions are given on Wikipedia [Reference 1].
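As a rough illustration of the cached recursive version, here is a minimal Python sketch. It is not taken from any of the references. The function name edit_distance_memo is made up for this example, and the boundary rule (an index of -1 stands for an empty prefix) is an assumption added here, because the text above does not spell out the base cases.

```python
from functools import lru_cache


def edit_distance_memo(str_a: str, str_b: str) -> int:
    """Top-down edit distance; lru_cache plays the role of the f-value cache."""

    @lru_cache(maxsize=None)
    def f(i: int, j: int) -> int:
        # f(i, j): minimum edit distance between str_a[0...i] and str_b[0...j].
        # Assumed boundary: an index of -1 denotes the empty prefix.
        if i < 0:
            return j + 1  # insert the j + 1 characters of str_b[0...j]
        if j < 0:
            return i + 1  # delete the i + 1 characters of str_a[0...i]
        cost = 0 if str_a[i] == str_b[j] else 1
        return min(
            f(i - 1, j) + 1,         # case (1): drop last char of str_a
            f(i, j - 1) + 1,         # case (2): drop last char of str_b
            f(i - 1, j - 1) + cost,  # case (3): match or replace
        )

    return f(len(str_a) - 1, len(str_b) - 1)
```

Removing the @lru_cache line gives the primitive version without a cache, which recomputes the same f values over and over. Note that the recursion depth grows with la + lb, so for long strings Python's default recursion limit may be reached.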
Second: the iterative method (also known as the matrix marking method). f[i, j] can be computed in a two-dimensional matrix, with the recurrence above serving as the rule for filling in each cell. Once the matrix is filled in, the value of f[la-1, lb-1] is the required minimum edit distance. The detailed process is not described here; see [Reference 2]. Many implementations are also available online.
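The matrix idea can be sketched in Python roughly as follows. This is only an illustrative sketch, not the code from [Reference 2]. The name edit_distance_table is hypothetical, and the table is shifted by one relative to the text: table[i][j] here holds the distance between the first i characters of str_a and the first j characters of str_b, so row 0 and column 0 represent empty prefixes.

```python
def edit_distance_table(str_a: str, str_b: str) -> int:
    """Bottom-up edit distance by filling a (la + 1) x (lb + 1) matrix."""
    la, lb = len(str_a), len(str_b)
    table = [[0] * (lb + 1) for _ in range(la + 1)]

    # Boundary cells: turning a prefix into the empty string (or back)
    # costs one operation per character.
    for i in range(la + 1):
        table[i][0] = i
    for j in range(lb + 1):
        table[0][j] = j

    # Fill the remaining cells row by row using the recurrence.
    for i in range(1, la + 1):
        for j in range(1, lb + 1):
            cost = 0 if str_a[i - 1] == str_b[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,         # delete from str_a
                table[i][j - 1] + 1,         # insert into str_a
                table[i - 1][j - 1] + cost,  # match or replace
            )

    # Corresponds to f[la-1, lb-1] in the notation of the text.
    return table[la][lb]
```

Each cell is computed exactly once, so the running time is proportional to la * lb, and there is no recursion depth to worry about.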
[Reference 3] provides all three versions of the program mentioned above.
[Supplement]
Finally, a note on the definition of "string similarity" given in section 3.3 of "The Beauty of Programming": similarity is defined as 1 / (minimum edit distance + 1). For example, the minimum edit distance between abcdefg and abcdef is 1 (delete g from abcdefg), so the similarity is 1/2 = 0.5.
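The similarity formula can be checked with a few lines of Python. This is a sketch under assumptions: the helper names edit_distance and similarity are invented for this example, and the distance is computed with a two-row variant of the matrix method, which is a space-saving choice made here rather than something the book prescribes.

```python
def edit_distance(str_a: str, str_b: str) -> int:
    """Edit distance that keeps only the previous and current matrix rows."""
    prev = list(range(len(str_b) + 1))  # row for the empty prefix of str_a
    for i, ch_a in enumerate(str_a, start=1):
        curr = [i]  # distance from str_a[:i] to the empty string
        for j, ch_b in enumerate(str_b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(str_a: str, str_b: str) -> float:
    """Similarity defined as 1 / (minimum edit distance + 1)."""
    return 1 / (edit_distance(str_a, str_b) + 1)


print(similarity("abcdefg", "abcdef"))  # prints 0.5
```

With this definition, identical strings have similarity 1, and the value shrinks toward 0 as the edit distance grows.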
[References]
[Reference 1] http://en.wikipedia.org/wiki/Levenshtein_distance
[Reference 2] http://www.cnitblog.com/ictfly/archive/2005/12/27/5828.aspx
[Reference 3] http://blog.163.com/kevinlee_2010/blog/static/16982082020111123111835146/