Levenshtein Distance Algorithm: Calculating the Difference Between Two Strings

Source: Internet
Author: User
Levenshtein distance (LD): an algorithm for calculating the similarity of two strings


There are many ways to calculate the similarity between two strings. Here we summarize a similarity calculation based on the Levenshtein distance algorithm.

Levenshtein distance (LD) measures how similar two strings are. The distance is the minimum number of single-character insertions, deletions, and substitutions needed to transform one string into the other.


Example:


If str1 = "test" and str2 = "test", then LD(str1, str2) = 0, because no edits are needed.

If str1 = "test" and str2 = "tent", then LD(str1, str2) = 1: changing the "s" in str1 to "n" is a single substitution.

The larger the distance, the more the two strings differ.


Levenshtein distance was introduced by the Russian scientist Vladimir Levenshtein in 1965 and is named after him. It is also known simply as edit distance.


Levenshtein distance can be used for:

Spell checking

Speech recognition

DNA analysis

Plagiarism detection

LD stores the intermediate distance values in a matrix. With n and m denoting the lengths of str1 and str2, the algorithm works roughly as follows:

If the length of str1 or str2 is 0, return the length of the other string.

Initialize an (n + 1) * (m + 1) matrix d, and fill its first column with 0, 1, 2, ..., n and its first row with 0, 1, 2, ..., m.

Scan the two strings (n * m comparisons). If the i-th character of str1 equals the j-th character of str2, set temp to 0; otherwise set temp to 1. Then assign to d[i][j] the minimum of d[i-1][j] + 1 (deletion), d[i][j-1] + 1 (insertion), and d[i-1][j-1] + temp (substitution, or no change when the characters match).

After the scan, return the last cell of the matrix, d[n][m].
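To see these steps on a concrete input, here is a minimal C# sketch that fills the matrix d for the earlier example (str1 = "test", str2 = "tent") and prints it row by row. The class MatrixDemo and its methods BuildMatrix and PrintMatrix are hypothetical names chosen for illustration; the recurrence itself is the one described above.

```csharp
using System;

// Hypothetical demo class, not part of the original article.
public static class MatrixDemo
{
    // Builds the (n + 1) * (m + 1) matrix d described in the steps above.
    public static int[,] BuildMatrix(string str1, string str2)
    {
        int n = str1.Length;
        int m = str2.Length;
        var d = new int[n + 1, m + 1];

        // First column and first row count up from 0.
        for (int i = 0; i <= n; i++) d[i, 0] = i;
        for (int j = 0; j <= m; j++) d[0, j] = j;

        for (int i = 1; i <= n; i++)
        {
            for (int j = 1; j <= m; j++)
            {
                // temp is 0 when the characters match, 1 otherwise.
                int temp = str1[i - 1] == str2[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                                   d[i - 1, j - 1] + temp);
            }
        }
        return d;
    }

    // Prints every row of the matrix, then the final distance d[n][m].
    public static void PrintMatrix(string str1, string str2)
    {
        int[,] d = BuildMatrix(str1, str2);
        for (int i = 0; i <= str1.Length; i++)
        {
            var row = new int[str2.Length + 1];
            for (int j = 0; j <= str2.Length; j++)
            {
                row[j] = d[i, j];
            }
            Console.WriteLine(string.Join(" ", row));
        }
        Console.WriteLine("LD = " + d[str1.Length, str2.Length]);
    }
}
```

Calling MatrixDemo.PrintMatrix("test", "tent") prints the matrix below; the bottom-right value 1 matches the example given earlier.

0 1 2 3 4
1 0 1 2 3
2 1 0 1 2
3 2 1 1 2
4 3 2 2 1
LD = 1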

The algorithm returns a distance, so how can we derive a similarity from it? The distance between two strings can never exceed the length of the longer string, so the raw distance says little on its own unless it is compared with the string lengths. I therefore define similarity as 1 - distance / (length of the longer string).
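The similarity formula can be sketched in C# as follows. This is a minimal illustration, not the author's code: the class name StringSimilarity is hypothetical, null inputs are treated as empty strings, and two empty strings are assumed to have similarity 1.0 (the formula would otherwise divide by zero).

```csharp
using System;

// Hypothetical helper class for illustration.
public static class StringSimilarity
{
    // Matrix-based Levenshtein distance, following the steps described above.
    public static int Distance(string a, string b)
    {
        a = a ?? string.Empty;
        b = b ?? string.Empty;
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;
        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                                   d[i - 1, j - 1] + cost);
            }
        }
        return d[a.Length, b.Length];
    }

    // Similarity = 1 - distance / length of the longer string.
    // Assumption: two empty strings count as identical (similarity 1.0).
    public static double Similarity(string a, string b)
    {
        a = a ?? string.Empty;
        b = b ?? string.Empty;
        int maxLength = Math.Max(a.Length, b.Length);
        if (maxLength == 0)
        {
            return 1.0;
        }
        return 1.0 - (double)Distance(a, b) / maxLength;
    }
}
```

For the earlier example, StringSimilarity.Similarity("test", "tent") returns 1 - 1/4 = 0.75, and comparing a string with itself returns 1.0.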

 

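// Returns the Levenshtein distance between strings a and b.
// Place inside a class; requires "using System;".
private Int32 Levenshtein(String a, String b)
{
    // If either string is empty, the distance is the length of the other one.
    if (String.IsNullOrEmpty(a))
    {
        if (!String.IsNullOrEmpty(b))
        {
            return b.Length;
        }
        return 0;
    }

    if (String.IsNullOrEmpty(b))
    {
        if (!String.IsNullOrEmpty(a))
        {
            return a.Length;
        }
        return 0;
    }

    Int32 cost;
    // d[i, j] holds the distance between the first i characters of a
    // and the first j characters of b.
    Int32[,] d = new Int32[a.Length + 1, b.Length + 1];
    Int32 min1;
    Int32 min2;
    Int32 min3;

    // First column: 0, 1, 2, ... (delete i characters of a).
    for (Int32 i = 0; i <= d.GetUpperBound(0); i += 1)
    {
        d[i, 0] = i;
    }

    // First row: 0, 1, 2, ... (insert j characters of b).
    for (Int32 j = 0; j <= d.GetUpperBound(1); j += 1)
    {
        d[0, j] = j;
    }

    for (Int32 i = 1; i <= d.GetUpperBound(0); i += 1)
    {
        for (Int32 j = 1; j <= d.GetUpperBound(1); j += 1)
        {
            // cost is 0 when the characters match, 1 otherwise.
            cost = Convert.ToInt32(a[i - 1] != b[j - 1]);

            min1 = d[i - 1, j] + 1;        // deletion
            min2 = d[i, j - 1] + 1;        // insertion
            min3 = d[i - 1, j - 1] + cost; // substitution or match
            d[i, j] = Math.Min(Math.Min(min1, min2), min3);
        }
    }

    // The bottom-right cell holds the distance between the full strings.
    return d[d.GetUpperBound(0), d.GetUpperBound(1)];
}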
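The method above keeps the whole (n + 1) * (m + 1) matrix, but each row of d depends only on the row directly above it. If memory matters for long strings, a common alternative (not part of the original article) keeps just two rows. The sketch below assumes a hypothetical class name, TwoRowLevenshtein, and treats null inputs as empty strings; it computes the same distances as the matrix version.

```csharp
using System;

// Hypothetical class name, for illustration only.
public static class TwoRowLevenshtein
{
    // Same recurrence as the matrix version, but only the previous and
    // current rows are stored, so memory is O(m) instead of O(n * m).
    public static int Distance(string a, string b)
    {
        a = a ?? string.Empty;
        b = b ?? string.Empty;

        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];

        // Row 0 of the matrix: 0, 1, 2, ...
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i; // column 0 of row i
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(Math.Min(previous[j] + 1, current[j - 1] + 1),
                                      previous[j - 1] + cost);
            }
            // The row just computed becomes "previous" for the next iteration.
            var swap = previous;
            previous = current;
            current = swap;
        }

        // After the final swap, previous holds row n of the matrix.
        return previous[b.Length];
    }
}
```

The trade-off is that only the distance survives; if you also need to recover the actual sequence of edits, the full matrix is still required.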

 

 

 
