C#: Comparing the similarity between two strings
Address: http://www.2cto.com/kf/201202/121170.html
Fuzzy search is common in data systems, but the fuzzy search that databases provide does not support sorting by relevance.
This article presents a method for comparing the similarity between two strings.
Once the similarity between two strings can be calculated, you can sort and filter the data in memory using LINQ and select the results most similar to the target string.
The similarity calculation formula used here is similarity = Kq * q / (Kq * q + Kr * r + Ks * s), where Kq > 0, Kr >= 0, Ks >= 0.
q is the number of characters that exist in both string 1 and string 2, s is the number of characters that exist in string 1 but not in string 2, and r is the number of characters that exist in string 2 but not in string 1. Kq, Kr, and Ks are the weights of q, r, and s respectively. In practice, we set Kq = 2 and Kr = Ks = 1.
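As a quick check of the formula, here is a worked example with two made-up strings, "abcd" and "abef". It reads q as the characters shared by both strings, s as those found only in string 1, and r as those found only in string 2, so q = 2 (a, b), s = 2 (c, d) and r = 2 (e, f), with Kq = 2 and Kr = Ks = 1:

```latex
\[
\text{similarity}
  = \frac{K_q\, q}{K_q\, q + K_r\, r + K_s\, s}
  = \frac{2 \cdot 2}{2 \cdot 2 + 1 \cdot 2 + 1 \cdot 2}
  = \frac{4}{8}
  = 0.5
\]
```

Two strings with no shared characters give q = 0 and a similarity of 0. When s = r = 0 the result is 1.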
Implementing this formula gives the following code:
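// Requires "using System.Linq;". As an extension method, it must be declared in a static class.
/// <summary>
/// Obtains the similarity between two strings.
/// </summary>
/// <param name="sourceString">The first string.</param>
/// <param name="str">The second string.</param>
/// <returns>A similarity score from 0 (no shared characters) to 1.</returns>
public static decimal GetSimilarityWith(this string sourceString, string str)
{
    // Weights for q, r and s.
    decimal Kq = 2;
    decimal Kr = 1;
    decimal Ks = 1;
    char[] ss = sourceString.ToCharArray();
    char[] st = str.ToCharArray();
    // Number of distinct characters that occur in both strings.
    int q = ss.Intersect(st).Count();
    // Characters of each string not counted in q. Length includes repeats, while
    // Intersect returns distinct values, so repeated characters lower the score.
    int s = ss.Length - q;
    int r = st.Length - q;
    // Throws DivideByZeroException when both strings are empty.
    return Kq * q / (Kq * q + Kr * r + Ks * s);
}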
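The method above can then drive the in-memory sorting and filtering mentioned earlier. The following is a minimal sketch of that idea, not part of the original code: the class names, the helper RankBySimilarity, its minSimilarity threshold and the sample strings are all made up for illustration, and the similarity method is repeated so that the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StringSimilarityExtensions
{
    // Same formula as in the article, with Kq = 2 and Kr = Ks = 1.
    public static decimal GetSimilarityWith(this string sourceString, string str)
    {
        decimal Kq = 2, Kr = 1, Ks = 1;
        int q = sourceString.Intersect(str).Count(); // distinct shared characters
        int s = sourceString.Length - q;
        int r = str.Length - q;
        return Kq * q / (Kq * q + Kr * r + Ks * s);
    }

    // Hypothetical helper: keeps candidates whose score is at least minSimilarity
    // and returns them ordered from most to least similar to the target.
    public static List<string> RankBySimilarity(
        IEnumerable<string> candidates, string target, decimal minSimilarity)
    {
        return candidates
            .Select(c => new { Text = c, Score = c.GetSimilarityWith(target) })
            .Where(x => x.Score >= minSimilarity)
            .OrderByDescending(x => x.Score)
            .Select(x => x.Text)
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up sample data; scores against "abcd" are 0, 0.5 and 1.
        var candidates = new[] { "wxyz", "abxy", "abcd" };
        foreach (var text in StringSimilarityExtensions.RankBySimilarity(candidates, "abcd", 0.3m))
        {
            Console.WriteLine(text); // prints "abcd", then "abxy"
        }
    }
}
```

The score is computed once per candidate inside Select, so filtering and ordering reuse it instead of recalculating the similarity.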
This is the method used to calculate string similarity. In real applications, however, you must also account for synonyms and near-synonyms, for example "The fastest update of love novels" and a rewording of it that uses a synonym. In a certain sense the two strings are the same, but the method above would score them inaccurately. In practice, therefore, synonyms and near-synonyms need to be replaced first, and the similarity calculated after the replacement.
For near-synonyms, you need to combine the similarity calculated before the replacement with the similarity calculated after it to obtain the actual similarity between the two strings.
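The replacement step could be sketched as follows. This is only an illustration under stated assumptions: the synonym table, its single "romance" to "love" entry, and the names SynonymSimilarity, ReplaceSynonyms and GetNormalizedSimilarity are all hypothetical, and the article does not say how the before and after scores for near-synonyms should be combined, so the sketch only computes the score after replacement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SynonymSimilarity
{
    // Hypothetical synonym table: each variant maps to one canonical form.
    private static readonly Dictionary<string, string> Synonyms =
        new Dictionary<string, string>
        {
            { "romance", "love" },
        };

    // Rewrites every known variant in the text to its canonical form.
    public static string ReplaceSynonyms(string text)
    {
        foreach (var pair in Synonyms)
        {
            text = text.Replace(pair.Key, pair.Value);
        }
        return text;
    }

    // Same formula as in the article, repeated so the sketch compiles on its own.
    public static decimal GetSimilarityWith(this string sourceString, string str)
    {
        decimal Kq = 2, Kr = 1, Ks = 1;
        int q = sourceString.Intersect(str).Count();
        int s = sourceString.Length - q;
        int r = str.Length - q;
        return Kq * q / (Kq * q + Kr * r + Ks * s);
    }

    // Similarity calculated after both strings have had their synonyms replaced.
    public static decimal GetNormalizedSimilarity(string a, string b)
    {
        return ReplaceSynonyms(a).GetSimilarityWith(ReplaceSynonyms(b));
    }
}

public static class Program
{
    public static void Main()
    {
        string a = "love novels";
        string b = "romance novels";
        Console.WriteLine(a.GetSimilarityWith(b));                           // before replacement
        Console.WriteLine(SynonymSimilarity.GetNormalizedSimilarity(a, b));  // after replacement
    }
}
```

With this table the two sample strings become identical after replacement, so the second score is higher than the first. A real application would need a proper synonym source and word-aware matching rather than plain substring replacement.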