[Algorithm] String edit distance

Source: Internet
Author: User

A problem from The Beauty of Programming

Many programs make heavy use of strings. Given two different strings, we would like a way to measure how similar they are. We define a set of operations that turn two different strings into the same string. The operations are:

1. Modify a character (for example, replace "a" with "b");

2. Add a character (for example, change "abdd" to "aebdd");

3. Delete a character (for example, change "travelling" to "traveling");

For example, for the strings "abcdefg" and "abcdef", we can reach the goal either by adding a "g" to the second string or by deleting the "g" from the first. Either way, only one operation is needed. The minimum number of operations required is defined as the distance between the two strings, and the similarity is the reciprocal of "distance + 1". That is, the distance between "abcdefg" and "abcdef" is 1, and the similarity is 1/2 = 0.5. Here, we only need to consider the edit distance itself.
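The similarity formula can be written as a one-line helper. This is only a sketch: the name `similarity` is an assumption for illustration, and it takes an already computed distance as input.

```cpp
#include <cstddef>

// Hypothetical helper: similarity as defined in the text, 1 / (distance + 1).
// A distance of 0 (identical strings) gives a similarity of 1.
double similarity(std::size_t distance) {
    return 1.0 / (static_cast<double>(distance) + 1.0);
}
```

For the example in the text, a distance of 1 gives `similarity(1)`, which is 0.5.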


Analysis and solution from the original text

It is not hard to see that the distance between two strings never exceeds the sum of their lengths (we can turn both strings into empty strings using delete operations alone). Although this bound does not help compute the answer directly, it at least tells us that the distance between any two strings is finite.

We still need to work out how to reduce this problem to smaller subproblems. Suppose there are two strings A = xabcdae and B = xfdfa. Their first characters are the same, so we only need to compute the distance between A[2,...,7] = abcdae and B[2,...,5] = fdfa. If the first characters of the two strings differ, however, we can perform one of the following operations (lenA and lenB are the lengths of strings A and B respectively).

1. Delete the first character of string A, and then compute the distance between A[2,...,lenA] and B[1,...,lenB].

2. Delete the first character of string B, and then compute the distance between A[1,...,lenA] and B[2,...,lenB].

3. Modify the first character of string A to the first character of string B, and then compute the distance between A[2,...,lenA] and B[2,...,lenB].

4. Modify the first character of string B to the first character of string A, and then compute the distance between A[2,...,lenA] and B[2,...,lenB].

5. Add the first character of string B before the first character of string A, and then compute the distance between A[1,...,lenA] and B[2,...,lenB].

6. Add the first character of string A before the first character of string B, and then compute the distance between A[2,...,lenA] and B[1,...,lenB].

In this problem, we do not care what the two strings look like once they become equal. The six operations above can therefore be merged into three cases:

1. After one step, the remaining task is to turn A[2,...,lenA] and B[1,...,lenB] into the same string.

2. After one step, the remaining task is to turn A[2,...,lenA] and B[2,...,lenB] into the same string.

3. After one step, the remaining task is to turn A[1,...,lenA] and B[2,...,lenB] into the same string.
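The three merged cases translate almost directly into a recursive function. The following is a minimal sketch rather than the original author's code: the name `editDistanceRecursive` is an assumption, it uses `std::string`, and its indices are 0-based where the text counts from 1.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Naive recursion over the suffixes a[i..] and b[j..] (0-based indices).
// The function name is an illustrative assumption, not from the original text.
std::size_t editDistanceRecursive(const std::string& a, std::size_t i,
                                  const std::string& b, std::size_t j) {
    if (i == a.size()) return b.size() - j;  // a is exhausted: add the rest of b
    if (j == b.size()) return a.size() - i;  // b is exhausted: delete the rest of a
    if (a[i] == b[j]) {
        // First characters match: no operation is spent on them.
        return editDistanceRecursive(a, i + 1, b, j + 1);
    }
    std::size_t case1 = editDistanceRecursive(a, i + 1, b, j);      // A[2..] vs B[1..]
    std::size_t case2 = editDistanceRecursive(a, i + 1, b, j + 1);  // A[2..] vs B[2..]
    std::size_t case3 = editDistanceRecursive(a, i, b, j + 1);      // A[1..] vs B[2..]
    return 1 + std::min({case1, case2, case3});
}
```

Calling `editDistanceRecursive("abcdefg", 0, "abcdef", 0)` returns 1, matching the example given earlier. The running time grows exponentially with the string lengths, so this form is only practical for short inputs.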

If you are familiar with dynamic programming, it is easy to see that it is the best fit here. With plain recursion, many subproblems are computed repeatedly.
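One way to see the effect of the repeated subproblems is to cache each (i, j) result the first time it is computed. The sketch below is a top-down variant under stated assumptions: the class name `MemoEditDistance` and its members are hypothetical, and it uses 0-based suffix indices.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Top-down recursion with a memo table. All names here are illustrative
// assumptions; memo_[i][j] caches the distance between a_[i..] and b_[j..].
class MemoEditDistance {
public:
    MemoEditDistance(const std::string& a, const std::string& b)
        : a_(a), b_(b),
          memo_(a.size() + 1, std::vector<long>(b.size() + 1, -1)) {}

    std::size_t solve() { return solve(0, 0); }

private:
    std::size_t solve(std::size_t i, std::size_t j) {
        if (i == a_.size()) return b_.size() - j;
        if (j == b_.size()) return a_.size() - i;
        long& cached = memo_[i][j];
        if (cached >= 0) return static_cast<std::size_t>(cached);  // already solved
        std::size_t result;
        if (a_[i] == b_[j]) {
            result = solve(i + 1, j + 1);
        } else {
            result = 1 + std::min({solve(i + 1, j),
                                   solve(i + 1, j + 1),
                                   solve(i, j + 1)});
        }
        cached = static_cast<long>(result);
        return result;
    }

    std::string a_;
    std::string b_;
    std::vector<std::vector<long>> memo_;  // -1 marks "not computed yet"
};
```

Each pair (i, j) is solved at most once, so the work is proportional to the product of the two lengths instead of growing exponentially. For example, `MemoEditDistance("kitten", "sitting").solve()` returns 3.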

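    #include <iostream>
    using namespace std;

    const int maxSize = 50;
    // dp[i][j] is the distance between the first i characters of s1
    // and the first j characters of s2. Both lengths must be below maxSize.
    unsigned int dp[maxSize][maxSize];

    unsigned int dist(const char* s1, int len1, const char* s2, int len2) {
        if (!len1) {
            return len2;
        }
        if (!len2) {
            return len1;
        }
        // Base cases: against an empty prefix, the distance is the prefix length.
        for (int j = 0; j <= len2; ++j) {
            dp[0][j] = j;
        }
        for (int i = 0; i <= len1; ++i) {
            dp[i][0] = i;
        }
        for (int i = 0; i < len1; ++i) {
            for (int j = 0; j < len2; ++j) {
                unsigned int t;
                if (s1[i] == s2[j]) {
                    t = dp[i][j];              // characters match: no extra cost
                } else {
                    t = dp[i][j] + 1;          // modify s1[i] into s2[j]
                    if (t > dp[i][j+1] + 1) {
                        t = dp[i][j+1] + 1;    // delete s1[i]
                    }
                    if (t > dp[i+1][j] + 1) {
                        t = dp[i+1][j] + 1;    // add s2[j]
                    }
                }
                dp[i+1][j+1] = t;
            }
        }
        return dp[len1][len2];
    }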
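The fixed 50 x 50 table limits how long the input strings can be. As a possible alternative (not the original author's method), the same bottom-up recurrence can keep only two rows of the table, since each row depends only on the row before it. The name `editDistanceTwoRows` and the use of `std::string` are assumptions made for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Bottom-up edit distance that stores two rows instead of the full table.
// prev[j] holds the distance between the first i-1 characters of a and the
// first j characters of b; curr[j] holds the same for the first i characters.
std::size_t editDistanceTwoRows(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) {
        prev[j] = j;  // empty prefix of a: add j characters
    }
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;  // empty prefix of b: delete i characters
        for (std::size_t j = 1; j <= b.size(); ++j) {
            if (a[i - 1] == b[j - 1]) {
                curr[j] = prev[j - 1];
            } else {
                curr[j] = 1 + std::min({prev[j],        // delete a[i-1]
                                        curr[j - 1],    // add b[j-1]
                                        prev[j - 1]});  // modify a[i-1]
            }
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}
```

Memory use is proportional to the length of `b` rather than to the product of both lengths, and there is no fixed upper limit on input size.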

See http://www.cnblogs.com/zhengyuhong/p/3645059.html
