Calculating string similarity (distance): The Beauty of Programming

Source: Internet
Author: User

I have recently been reading The Beauty of Programming. This problem from the book is quite a good one.

Many programs make heavy use of strings. Given two different strings, we would like a way to measure how similar they are. To do so, we define a set of operations that turn two different strings into the same string. The operations are:

1. Modify a character (for example, replace "a" with "b").
2. Add a character (for example, change "abdd" to "aebdd").
3. Delete a character (for example, change "travelling" to "traveling").

For example, for the strings "abcdefg" and "abcdef", we can reach the goal either by adding a "g" to the second string or by deleting the "g" from the first. Either way requires only one operation. Define the minimum number of operations required as the distance between the two strings. Given any two strings, can you write an algorithm to calculate their distance? (Similarity is defined as the reciprocal of "distance + 1".)
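The similarity definition can be sketched as a one-line helper. The function name `similarity` is an assumption made for illustration and does not come from the book; it expects a distance that has already been computed.

```c
/* Hypothetical helper: similarity = 1 / (distance + 1).
   Identical strings (distance 0) give 1.0; larger distances approach 0. */
double similarity(int distance)
{
        return 1.0 / (distance + 1);
}
```

With this definition, "abcdefg" and "abcdef" (distance 1) would have a similarity of 0.5.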

Analysis and Solution

It is not hard to see that the distance between two strings never exceeds the sum of their lengths (we can turn both strings into empty strings using only delete operations). Although this observation does not help compute the result, it at least tells us that the distance between any two strings is finite.
We should still focus on how to reduce this problem to a smaller one of the same kind. Suppose there are two strings A = xabcdae and B = xfdfa. Their first characters are the same, so we only need to calculate the distance between A[2, ..., 7] = abcdae and B[2, ..., 5] = fdfa. If the first characters of the two strings differ, however, we can perform one of the following operations (lenA and lenB are the lengths of string A and string B respectively):
1. Delete the first character of string A, then calculate the distance between A[2, ..., lenA] and B[1, ..., lenB].
2. Delete the first character of string B, then calculate the distance between A[1, ..., lenA] and B[2, ..., lenB].
3. Modify the first character of string A to the first character of string B, then calculate the distance between A[2, ..., lenA] and B[2, ..., lenB].
4. Modify the first character of string B to the first character of string A, then calculate the distance between A[2, ..., lenA] and B[2, ..., lenB].
5. Add the first character of string B before the first character of string A, then calculate the distance between A[1, ..., lenA] and B[2, ..., lenB].
6. Add the first character of string A before the first character of string B, then calculate the distance between A[2, ..., lenA] and B[1, ..., lenB].

In this problem we do not care what the two strings look like once they have become equal. Therefore, the six operations above can be merged into three:
1. After one operation, turn A[2, ..., lenA] and B[1, ..., lenB] into the same string.
2. After one operation, turn A[1, ..., lenA] and B[2, ..., lenB] into the same string.
3. After one operation, turn A[2, ..., lenA] and B[2, ..., lenB] into the same string.

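This makes it easy to write a recursive program. Let f(i, j) denote the distance between A[i, ..., lenA] and B[j, ..., lenB]. If either substring is empty, the distance is the length of the other.

When the first characters are equal, no operation is needed:

f(i, j) = f(i + 1, j + 1);

When they differ, one operation is needed:

f(i, j) = min(f(i + 1, j + 1) + 1, f(i + 1, j) + 1, f(i, j + 1) + 1);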

The following is a simple implementation, which I wrote in C.

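#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Distance between a[astart..aend] and b[bstart..bend] (inclusive bounds). */
int calculate(char const *a, int astart, int aend, char const *b, int bstart, int bend)
{
        /* a is exhausted: the distance is the number of characters left in b. */
        if (astart > aend)
        {
                if (bstart > bend)
                        return 0;
                else
                        return bend - bstart + 1;
        }
        /* b is exhausted but a is not: the distance is what is left of a. */
        if (bstart > bend)
        {
                return aend - astart + 1;
        }
        if (a[astart] == b[bstart])
        {
                /* First characters match: no operation needed. */
                return calculate(a, astart + 1, aend, b, bstart + 1, bend);
        }
        else
        {
                /* First characters differ: try the three merged operations. */
                int t1 = calculate(a, astart + 1, aend, b, bstart, bend) + 1;
                int t2 = calculate(a, astart, aend, b, bstart + 1, bend) + 1;
                int t3 = calculate(a, astart + 1, aend, b, bstart + 1, bend) + 1;
                return t3 < (t1 < t2 ? t1 : t2) ? t3 : (t1 < t2 ? t1 : t2);
        }
}

int main(void)
{
        const char *a = "aqa";
        const char *b = "qaa";
        printf("a=%s\n", a);
        printf("b=%s\n", b);
        int dis = calculate(a, 0, (int)strlen(a) - 1, b, 0, (int)strlen(b) - 1);
        printf("distance between a and b is :%d\n", dis);
        return 0;
}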

In fact, the recursion above repeats a lot of computation, and this can be optimized away.
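To get a feel for how much work is repeated, the following sketch counts the calls made by an uncached recursion. It is an illustration under assumptions, not the author's code: `call_count`, `naive_distance` and `count_calls` are hypothetical names, and the recursion works on string suffixes via pointers rather than on index ranges.

```c
#include <string.h>

/* Hypothetical global counter: number of calls made by the naive recursion. */
static long call_count = 0;

/* Uncached recursion on the suffixes a and b, following the recurrence
   f(i, j) described in the text. */
static int naive_distance(const char *a, const char *b)
{
        call_count++;
        if (*a == '\0')
                return (int)strlen(b);
        if (*b == '\0')
                return (int)strlen(a);
        if (*a == *b)
                return naive_distance(a + 1, b + 1);
        int t1 = naive_distance(a + 1, b) + 1;
        int t2 = naive_distance(a, b + 1) + 1;
        int t3 = naive_distance(a + 1, b + 1) + 1;
        int m = t1 < t2 ? t1 : t2;
        return t3 < m ? t3 : m;
}

/* Resets the counter, runs the recursion, and returns the number of calls. */
long count_calls(const char *a, const char *b)
{
        call_count = 0;
        naive_distance(a, b);
        return call_count;
}
```

In this sketch, two three-character strings with no characters in common, such as "abc" and "xyz", trigger 94 calls, even though there are only 16 distinct pairs of suffixes to solve.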

The optimized version records the answers to subproblems that have already been computed in a result array, so that they are not computed again.

Version with a result matrix:

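#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int max(int a, int b)
{
        return a >= b ? a : b;
}

/* Distance between a[astart..aend] and b[bstart..bend] (inclusive bounds).
   result[i][j] caches the answer for the substrings starting at i and j;
   -1 means "not computed yet". */
int calculate(char const *a, int astart, int aend, char const *b, int bstart, int bend, int **result)
{
        /* One substring is empty: the distance is the length of the other. */
        if (astart > aend || bstart > bend)
        {
                return max(aend - astart + 1, bend - bstart + 1);
        }
        if (result[astart][bstart] >= 0)
                return result[astart][bstart];
        if (a[astart] == b[bstart])
        {
                return result[astart][bstart] = calculate(a, astart + 1, aend, b, bstart + 1, bend, result);
        }
        else
        {
                int t1, t2, t3;
                t1 = calculate(a, astart + 1, aend, b, bstart, bend, result) + 1;
                t2 = calculate(a, astart, aend, b, bstart + 1, bend, result) + 1;
                t3 = calculate(a, astart + 1, aend, b, bstart + 1, bend, result) + 1;
                return result[astart][bstart] = t3 < (t1 < t2 ? t1 : t2) ? t3 : (t1 < t2 ? t1 : t2);
        }
}

int main(void)
{
        int i, j;
        char a[100];
        char b[100];
        /* fgets replaces gets, which cannot limit the input length. */
        if (fgets(a, sizeof a, stdin) == NULL || fgets(b, sizeof b, stdin) == NULL)
        {
                printf("input error");
                return 1;
        }
        /* Strip the trailing newline that fgets keeps. */
        a[strcspn(a, "\n")] = '\0';
        b[strcspn(b, "\n")] = '\0';
        int lena = (int)strlen(a);
        int lenb = (int)strlen(b);
        printf("a=%s\n", a);
        printf("b=%s\n", b);
        /* One extra slot keeps the allocation size nonzero for empty input. */
        int **result = (int **)malloc((lena + 1) * sizeof(int *));
        if (result == NULL) { printf("malloc error"); return 1; }
        for (i = 0; i < lena; i++)
        {
                result[i] = (int *)malloc((lenb + 1) * sizeof(int));
                if (result[i] == NULL) { printf("malloc error"); return 1; }
        }
        /* Mark every subproblem as not yet computed. */
        for (i = 0; i < lena; i++)
        {
                for (j = 0; j < lenb; j++)
                {
                        result[i][j] = -1;
                }
        }
        int dis = calculate(a, 0, lena - 1, b, 0, lenb - 1, result);
        printf("\nresult:\n");
        for (i = 0; i < lena; i++)
        {
                for (j = 0; j < lenb; j++)
                {
                        printf("%4d", result[i][j]);
                }
                printf("\n");
        }
        printf("distance between a and b is :%d\n", dis);
        for (i = 0; i < lena; i++)
                free(result[i]);
        free(result);
        return 0;
}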
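The same recurrence can also be filled in iteratively instead of through cached recursion. The sketch below is a bottom-up variant, which is a different technique from the memoized version above and not taken from the book; the names `edit_distance` and `min3` and the flat table layout are assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

static int min3(int x, int y, int z)
{
        int m = x < y ? x : y;
        return z < m ? z : m;
}

/* Bottom-up sketch: f[i * cols + j] holds the distance between the
   suffixes a[i..] and b[j..] (0-based), in a flat (lena+1) x (lenb+1) table.
   Returns -1 if the allocation fails. */
int edit_distance(const char *a, const char *b)
{
        int lena = (int)strlen(a);
        int lenb = (int)strlen(b);
        int cols = lenb + 1;
        int *f = malloc((size_t)(lena + 1) * cols * sizeof *f);
        if (f == NULL)
                return -1;

        /* Base cases: one suffix is empty, so the distance is the other's length. */
        for (int i = 0; i <= lena; i++)
                f[i * cols + lenb] = lena - i;
        for (int j = 0; j <= lenb; j++)
                f[lena * cols + j] = lenb - j;

        /* Fill from the ends of the strings towards the front. */
        for (int i = lena - 1; i >= 0; i--) {
                for (int j = lenb - 1; j >= 0; j--) {
                        if (a[i] == b[j])
                                f[i * cols + j] = f[(i + 1) * cols + j + 1];
                        else
                                f[i * cols + j] = 1 + min3(f[(i + 1) * cols + j],
                                                           f[i * cols + j + 1],
                                                           f[(i + 1) * cols + j + 1]);
                }
        }

        int dis = f[0];
        free(f);
        return dis;
}
```

Each table entry is computed exactly once, so the work grows with the product of the two lengths. For the strings "aqa" and "qaa" used earlier, `edit_distance` returns 2.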
