Original address: http://www.2cto.com/kf/201202/121170.html
Fuzzy search comes up often when building data systems, but the fuzzy search that databases provide cannot sort results by relevance.
This article presents a way to measure the similarity of two strings.
Once the similarity of two strings can be calculated, the data can be sorted and filtered in memory with LINQ to pick the results most similar to the target string.
The similarity formula used here is: similarity = Kq*q / (Kq*q + Kr*r + Ks*s), where Kq > 0, Kr >= 0, Ks >= 0.
Here q is the number of characters present in both string 1 and string 2, s is the number of characters present in string 1 but not in string 2, and r is the number of characters present in string 2 but not in string 1. Kq, Kr and Ks are the weights of q, r and s respectively; based on practical results, we set Kq = 2 and Kr = Ks = 1.
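As a quick check of the formula, take the illustrative strings "abcd" and "bcde" (chosen here as an example; they are not from the original article). The shared characters are b, c and d, so q = 3; only a is unique to string 1, so s = 1; only e is unique to string 2, so r = 1. With Kq = 2 and Kr = Ks = 1:

```latex
\[
\text{similarity}
  = \frac{K_q\, q}{K_q\, q + K_r\, r + K_s\, s}
  = \frac{2 \cdot 3}{2 \cdot 3 + 1 \cdot 1 + 1 \cdot 1}
  = \frac{6}{8}
  = 0.75
\]
```

Identical strings give r = s = 0 and therefore a similarity of 1, while strings with no characters in common give q = 0 and a similarity of 0.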
Based on this similarity calculation formula, the following program code is obtained:
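// Requires "using System.Linq;" and must be declared inside a static class.
/// <summary>
/// Gets the similarity of two strings, from 0 (no shared characters) to 1 (same characters).
/// </summary>
/// <param name="sourceString">First string</param>
/// <param name="str">Second string</param>
/// <returns>Kq*q / (Kq*q + Kr*r + Ks*s), counted over distinct characters</returns>
public static decimal GetSimilarityWith(this string sourceString, string str)
{
    const decimal Kq = 2;
    const decimal Kr = 1;
    const decimal Ks = 1;
    char[] ss = sourceString.ToCharArray();
    char[] st = str.ToCharArray();
    // Number of characters present in both strings (Intersect yields distinct values)
    int q = ss.Intersect(st).Count();
    // Characters only in the first string, and only in the second.
    // Except is used instead of Length - q so that repeated characters
    // are not counted against the distinct count in q.
    int s = ss.Except(st).Count();
    int r = st.Except(ss).Count();
    decimal denominator = Kq * q + Kr * r + Ks * s;
    // Two empty strings would otherwise cause a division by zero
    return denominator == 0 ? 0 : Kq * q / denominator;
}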
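The in-memory LINQ sorting and filtering mentioned at the start could then look roughly like the sketch below. The class name StringSimilarity, the helper RankBySimilarity and its minSimilarity threshold are hypothetical names introduced for illustration, and the similarity method is repeated (counting distinct characters, which is an assumption) only so that the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StringSimilarity
{
    // Same formula as above with Kq = 2 and Kr = Ks = 1,
    // counted over distinct characters (an assumption of this sketch).
    public static decimal GetSimilarityWith(this string sourceString, string str)
    {
        const decimal Kq = 2, Kr = 1, Ks = 1;
        int q = sourceString.Intersect(str).Count(); // characters in both strings
        int s = sourceString.Except(str).Count();    // only in sourceString
        int r = str.Except(sourceString).Count();    // only in str
        decimal denominator = Kq * q + Kr * r + Ks * s;
        return denominator == 0 ? 0 : Kq * q / denominator;
    }

    // Hypothetical helper: keeps the candidates whose similarity to target
    // is at least minSimilarity, ordered from most to least similar.
    public static List<string> RankBySimilarity(
        IEnumerable<string> candidates, string target, decimal minSimilarity = 0.5m)
    {
        return candidates
            .Select(c => new { Text = c, Score = c.GetSimilarityWith(target) })
            .Where(x => x.Score >= minSimilarity)
            .OrderByDescending(x => x.Score)
            .Select(x => x.Text)
            .ToList();
    }
}
```

For example, StringSimilarity.RankBySimilarity(new[] { "abxy", "wxyz", "abce", "abcd" }, "abcd") returns "abcd", "abce", "abxy" in that order: the scores are 1, 0.75 and 0.5, and "wxyz" is dropped because it shares no characters with the target.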
The method above calculates string similarity, but in practice we also need to take synonyms and near-synonyms into account. For example, "the fastest-changing reading of love-making novels" and "the fastest-updated reading of love-making people" are, in a sense, the same string, yet the method above would rate them inaccurately. So in practical applications we need to replace synonyms and near-synonyms first, and calculate the similarity after substitution.
For a near-synonym, the result obtained after replacement has to be combined with the near-synonym itself to get the actual similarity between the two strings.
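A minimal sketch of the replace-then-compare idea follows. The synonym table, the class name SynonymSimilarity and the method names are all hypothetical; a real application would supply its own dictionary, and the plain substring replacement used here is only one possible substitution strategy.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SynonymSimilarity
{
    // Hypothetical synonym table: each key is rewritten to its canonical value.
    private static readonly Dictionary<string, string> Synonyms =
        new Dictionary<string, string>
        {
            { "quick", "fast" },
        };

    // Replaces every known synonym in the text with its canonical form.
    public static string Normalize(string text)
    {
        foreach (var pair in Synonyms)
        {
            text = text.Replace(pair.Key, pair.Value);
        }
        return text;
    }

    // Character-based similarity with Kq = 2 and Kr = Ks = 1,
    // counted over distinct characters (an assumption of this sketch).
    public static decimal Similarity(string a, string b)
    {
        const decimal Kq = 2, Kr = 1, Ks = 1;
        int q = a.Intersect(b).Count(); // characters in both strings
        int s = a.Except(b).Count();    // only in a
        int r = b.Except(a).Count();    // only in b
        decimal denominator = Kq * q + Kr * r + Ks * s;
        return denominator == 0 ? 0 : Kq * q / denominator;
    }

    // Normalizes both strings first, then compares them.
    public static decimal SimilarityWithSynonyms(string a, string b)
    {
        return Similarity(Normalize(a), Normalize(b));
    }
}
```

With this table, SynonymSimilarity.SimilarityWithSynonyms("quick car", "fast car") compares "fast car" with "fast car" and returns 1, whereas SynonymSimilarity.Similarity("quick car", "fast car") on the raw strings returns a value below 1.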