Beauty of Programming: String Similarity Calculation

Source: Internet
Author: User

Many programs make heavy use of strings. Given two different strings, we would like a way to measure how similar they are. To do so, we define a set of operations that transform one string into the other. The operations are:

1. Modify a character (for example, replace "a" with "b");

2. Add a character (for example, change "abdd" to "aebdd");

3. Delete a character (for example, change "travelling" to "traveling").

For example, for the strings "abcdefg" and "abcdef", we can either add a "g" to the second string or delete the "g" from the first. Either way requires only one operation. The minimum number of operations required is defined as the distance between the two strings, and the similarity is the reciprocal of "distance + 1". That is to say, the distance between "abcdefg" and "abcdef" is 1, and the similarity is 1/2 = 0.5.
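The similarity definition can be written as a one-line helper. This is only a minimal sketch: the function name `similarity` is an assumption made for illustration, and the distance is assumed to have been computed elsewhere.

```c
#include <assert.h>

/* Hypothetical helper: similarity = 1 / (distance + 1). */
double similarity(int distance)
{
    assert(distance >= 0);        /* a distance is never negative */
    return 1.0 / (distance + 1);
}
```

With a distance of 1, as in the "abcdefg" and "abcdef" example, `similarity(1)` evaluates to 0.5; identical strings (distance 0) give 1.0.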

Given any two strings, can you write an algorithm to calculate their similarity?

Using the idea of LCS (longest common subsequence):

Suppose the two strings are A = {a1, a2, a3, ...} and B = {b1, b2, b3, ...}. Use a DP approach similar to LCS. Let C[i][j] be the distance between the prefixes a1...ai and b1...bj, with the base cases C[i][0] = i and C[0][j] = j. If ai = bj, then C[i][j] = C[i-1][j-1];

If ai != bj, then C[i][j] = min(C[i-1][j] + 1, C[i][j-1] + 1, C[i-1][j-1] + 1).
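To make the recurrence concrete, it can be transcribed almost literally into a recursive function. This is a sketch for checking small inputs, not an efficient implementation: without a table it recomputes subproblems and takes exponential time in the worst case. The names `dist`, `min3`, and `editdistance_recursive` are assumptions chosen for this example.

```c
#include <assert.h>
#include <string.h>

static int min3(int x, int y, int z)
{
    int m = x < y ? x : y;
    return m < z ? m : z;
}

/* Distance between the first i characters of a and the first j of b. */
static int dist(const char *a, int i, const char *b, int j)
{
    if (i == 0) return j;                      /* add the j remaining characters */
    if (j == 0) return i;                      /* delete the i remaining characters */
    if (a[i - 1] == b[j - 1])
        return dist(a, i - 1, b, j - 1);       /* last characters match: no cost */
    return 1 + min3(dist(a, i - 1, b, j),      /* delete */
                    dist(a, i, b, j - 1),      /* add */
                    dist(a, i - 1, b, j - 1)); /* modify */
}

int editdistance_recursive(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return dist(a, (int)strlen(a), b, (int)strlen(b));
}
```

For instance, `editdistance_recursive("abcdefg", "abcdef")` returns 1, which matches the example given earlier.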

The code is as follows:

 
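#include <stdlib.h>
#include <string.h>

static int min(int a, int b) { return a < b ? a : b; }

/* Returns the distance between dststr and srcstr, or -1 if allocation fails. */
int editdistance(const char *dststr, const char *srcstr)
{
    int I = (int)strlen(dststr);
    int J = (int)strlen(srcstr);
    int cols = J + 1;
    /* C[m * cols + n] holds the distance between the first m characters
       of dststr and the first n characters of srcstr. */
    int *C = malloc((size_t)(I + 1) * (size_t)cols * sizeof *C);
    int result;

    if (C == NULL)
        return -1;

    for (int m = 0; m <= I; ++m)
        C[m * cols] = m;                /* empty srcstr prefix: m deletions */
    for (int n = 0; n <= J; ++n)
        C[n] = n;                       /* empty dststr prefix: n additions */

    for (int m = 1; m <= I; ++m) {
        for (int n = 1; n <= J; ++n) {
            if (dststr[m - 1] == srcstr[n - 1]) {
                C[m * cols + n] = C[(m - 1) * cols + (n - 1)];
            } else {
                int best = min(C[(m - 1) * cols + (n - 1)], C[(m - 1) * cols + n]);
                best = min(best, C[m * cols + (n - 1)]);
                C[m * cols + n] = best + 1;
            }
        }
    }

    result = C[I * cols + J];
    free(C);
    return result;
}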

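Earlier, the longest increasing subsequence of an array was computed by sorting the array first and then finding the LCS of the original and sorted arrays. It is tempting to use LCS here as well: find the LCS of the two strings and take max(length of srcstr, length of dststr) - LCS. This agrees with the distance in simple cases such as "abcdefg" and "abcdef", but in general it only gives a lower bound on the distance. Still, it is surprising how useful LCS turns out to be.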
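The LCS-based formula can be checked with a short sketch. The names `lcs_length` and `lcs_estimate` and the `MAXLEN` limit are assumptions made for this example. Comparing the result with the DP distance on a few inputs shows where the two differ.

```c
#include <assert.h>
#include <string.h>

#define MAXLEN 64   /* assumed upper bound on string length for this sketch */

static int max2(int x, int y) { return x > y ? x : y; }

/* Length of the longest common subsequence of a and b. */
int lcs_length(const char *a, const char *b)
{
    int n = (int)strlen(a);
    int m = (int)strlen(b);
    int L[MAXLEN + 1][MAXLEN + 1];

    assert(n <= MAXLEN && m <= MAXLEN);

    for (int i = 0; i <= n; ++i) {
        for (int j = 0; j <= m; ++j) {
            if (i == 0 || j == 0)
                L[i][j] = 0;                        /* empty prefix */
            else if (a[i - 1] == b[j - 1])
                L[i][j] = L[i - 1][j - 1] + 1;      /* extend a common subsequence */
            else
                L[i][j] = max2(L[i - 1][j], L[i][j - 1]);
        }
    }
    return L[n][m];
}

/* max(len(a), len(b)) - LCS(a, b) */
int lcs_estimate(const char *a, const char *b)
{
    return max2((int)strlen(a), (int)strlen(b)) - lcs_length(a, b);
}
```

`lcs_estimate("abcdefg", "abcdef")` returns 1, the same as the distance. `lcs_estimate("ab", "ba")` also returns 1, although turning "ab" into "ba" takes two operations, so the value is best treated as a lower bound rather than the distance itself.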
