Dynamic Programming 4: The Edit Distance Problem

Source: Internet
Author: User

Given two strings S and T, we allow three operations on T:

(1) Insert any character at any position
(2) Delete any existing character
(3) Replace any character with another
Question: what is the minimum number of operations needed to turn T into S?
Example: S = "ABCF", T = "DBFG"
Then we can

(1) Change D to A
(2) Delete G
(3) Insert C
So the answer is 3.

Analysis: This minimum number of operations is usually called the edit distance. The definition itself already contains the notion of "shortest", and because of that keyword the first thing that comes to mind is BFS. Indeed, if the length of S is m and the length of T is n, we can bound the number of operations from above:
(1) Delete every character of T, then insert every character of S: n + m operations.
(2) Delete or insert |n - m| characters so that T has length m, then replace at most m characters: at most |n - m| + m operations.
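For very short strings, the BFS idea can be tried directly. The following is only a rough sketch, not part of the original course material: the function name edit_distance_bfs is made up for illustration, and it assumes that only characters occurring in S are ever worth inserting or substituting. It uses bound (1) to cap the search depth.

```python
from collections import deque


def edit_distance_bfs(s: str, t: str) -> int:
    """Brute-force BFS from t towards s; only practical for very short strings."""
    # Assumption: an optimal edit script only inserts/substitutes characters of s.
    alphabet = sorted(set(s))
    # Upper bound (1): delete all of t, then insert all of s.
    limit = len(s) + len(t)

    queue = deque([(t, 0)])
    seen = {t}
    while queue:
        cur, steps = queue.popleft()
        if cur == s:
            return steps
        if steps == limit:
            continue  # never search deeper than the known upper bound
        neighbours = []
        for i in range(len(cur) + 1):
            for c in alphabet:
                neighbours.append(cur[:i] + c + cur[i:])      # insert c at position i
        for i in range(len(cur)):
            neighbours.append(cur[:i] + cur[i + 1:])          # delete position i
            for c in alphabet:
                neighbours.append(cur[:i] + c + cur[i + 1:])  # replace position i with c
        for nxt in neighbours:
            if nxt not in seen and len(nxt) <= limit:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    return limit  # fallback; not expected to be reached, since s is always reachable


print(edit_distance_bfs("ABCF", "DBFG"))
```

Each state produces a number of neighbours proportional to its length times the alphabet size, so the number of strings visited grows very quickly with the distance.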

Even with such an upper bound, BFS is impractical: the search space grows exponentially, with a branching factor that depends on the number of distinct characters in S, and its exact size is hard to estimate. What makes the problem awkward is the "insert" and "delete" operations. Let's look at the problem from a different angle and treat it as a string alignment problem, a view that comes from bioinformatics.
Given the strings S and T, we use a special character to help align them. Our special character is "-". We may insert it anywhere in S and in T so that the two strings have the same length, and then line them up position by position. Every position where the two strings hold different characters costs 1 penalty point, and we want the total penalty to be as small as possible.

For example, the three operations above correspond to this alignment:

12345
ABCF-
DB-FG

Note: two "-" characters facing each other in the same position are meaningless, so we never allow that.
Now look at each position:
(1) S and T both have ordinary characters and they are the same: no penalty. For example, positions 2 and 4.
(2) S and T both have ordinary characters and they differ: 1 penalty point. For example, position 1.
(3) S has the special character and T has an ordinary character: 1 penalty point. For example, position 5.
(4) S has an ordinary character and T has the special character: 1 penalty point. For example, position 3.

Let's see which operation each case corresponds to:
(1) No penalty: the characters already match, so nothing is done
(2) Replace the character at that position in T
(3) Delete the character at that position from T
(4) Insert the corresponding character of S into T
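The scoring rules above can be sketched as a small function that takes two already-aligned strings and counts penalty points. The name alignment_penalty and the gap parameter are illustrative choices, not something the original text defines.

```python
def alignment_penalty(s_aligned: str, t_aligned: str, gap: str = "-") -> int:
    """Count penalty points of an alignment: 1 per position where the rows differ."""
    if len(s_aligned) != len(t_aligned):
        raise ValueError("aligned strings must have the same length")
    penalty = 0
    for a, b in zip(s_aligned, t_aligned):
        if a == gap and b == gap:
            # Two gaps facing each other are meaningless, so reject them.
            raise ValueError("a gap may not be aligned with a gap")
        if a != b:
            # Covers cases (2), (3) and (4): replace, delete, insert.
            penalty += 1
    return penalty


print(alignment_penalty("ABCF-", "DB-FG"))
```

For the alignment of the running example, positions 1, 3 and 5 each cost one point, which matches the three operations found earlier.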

Now the goal is clear. Doesn't it feel like LCS (longest common subsequence)? Let's try:
Let f(i, j) be the minimum penalty for aligning the first i characters of S with the first j characters of T.

Now consider the last position of the alignment. There are four cases:
(1) S[i] is aligned with T[j] and S[i] == T[j]. Then the first i-1 characters of S and the first j-1 characters of T are aligned, and that part must itself have minimum penalty. The minimum penalty in this case is f(i-1, j-1).
(2) As in (1), but S[i] ≠ T[j]. The minimum penalty in this case is f(i-1, j-1) + 1.
(3) T[j] is aligned with "-": the first i characters of S are aligned with the first j-1 characters of T, again with minimum penalty. The minimum penalty in this case is f(i, j-1) + 1.
(4) S[i] is aligned with "-": the first i-1 characters of S are aligned with the first j characters of T, again with minimum penalty. The minimum penalty in this case is f(i-1, j) + 1.

Which value f(i, j) takes is simply whichever case gives the smallest penalty. For convenience, define same(i, j) to be 0 if S[i] == T[j] and 1 otherwise.

Let's look at the recurrence:

f(i, j) = min(f(i-1, j-1) + same(i, j), f(i-1, j) + 1, f(i, j-1) + 1)

What are the initial values?
f(0, j) = j
f(i, 0) = i
This is because the first 0 characters of S can only be padded with "-", which means deleting all j characters of T. Similarly, against the first 0 characters of T we can only insert the i characters of S; there is no other choice.
Note that the two formulas overlap at f(0, 0) = 0, which also agrees with our definition, so there is no contradiction.
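The recurrence and its initial values can be sketched as a straightforward table fill. This is one possible reading of the formulas, with a made-up function name; note that Python strings are 0-indexed, so S[i] in the text becomes s[i - 1] in the code.

```python
def edit_distance_table(s: str, t: str) -> int:
    """Fill the full (m+1) x (n+1) table f, where f[i][j] aligns s[:i] with t[:j]."""
    m, n = len(s), len(t)
    f = [[0] * (n + 1) for _ in range(m + 1)]

    # Initial values: f(0, j) = j and f(i, 0) = i.
    for j in range(n + 1):
        f[0][j] = j
    for i in range(m + 1):
        f[i][0] = i

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = 0 if s[i - 1] == t[j - 1] else 1
            f[i][j] = min(
                f[i - 1][j - 1] + same,  # cases (1) and (2): match or replace
                f[i - 1][j] + 1,         # case (4): S[i] aligned with "-"
                f[i][j - 1] + 1,         # case (3): T[j] aligned with "-"
            )
    return f[m][n]


print(edit_distance_table("ABCF", "DBFG"))
```

Keeping the whole table is wasteful for computing the distance alone, but it makes every f(i, j) available for inspection.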

Time complexity? O(m * n). Space complexity? Also O(m * n). However, f(i, j) depends only on the current row and the previous row, so we can drop one dimension and bring the space down to O(n).
Optimized pseudo-code:
for j = 0 to n do
    f[j] = j
endfor
for i = 1 to m do
    last = f[0]
    f[0] = i
    for j = 1 to n do
        temp = f[j]
        f[j] = min(last + same(i, j), temp + 1, f[j-1] + 1)
        last = temp
    endfor
endfor
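A possible Python rendering of the one-row version is sketched below. The function name is an illustrative choice. The variable last holds f(i-1, j-1), so at the end of each inner step it is set to temp, the previous row's value at column j, rather than to the freshly computed entry.

```python
def edit_distance(s: str, t: str) -> int:
    """Edit distance using a single row of length n + 1 (O(n) extra space)."""
    m, n = len(s), len(t)
    f = list(range(n + 1))  # row 0: f(0, j) = j

    for i in range(1, m + 1):
        last = f[0]  # f(i-1, 0), the diagonal neighbour for j = 1
        f[0] = i     # f(i, 0) = i
        for j in range(1, n + 1):
            temp = f[j]  # f(i-1, j), about to be overwritten
            same = 0 if s[i - 1] == t[j - 1] else 1
            f[j] = min(last + same, temp + 1, f[j - 1] + 1)
            last = temp  # becomes f(i-1, j-1) for the next column
    return f[n]


print(edit_distance("kitten", "sitting"))
```

The time complexity is still O(m * n); only the memory use changes.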

Note: for each i we update j from small to large, so f[j-1] has already been overwritten with the new row by the time we reach j. That is why we keep the "old" value f(i-1, j-1) in last.

Finally, here is the input and output format. Write a program that implements this algorithm; only a correct program lets you continue with the rest of the course.
Input
Line 1: string A (length of A <= 1000). Line 2: string B (length of B <= 1000).
Output
The edit distance between A and B
Sample Input
kitten
sitting

Sample Output
3
