C#: Compare the similarity between two strings

Source: Internet
Author: User

Address: http://www.2cto.com/kf/201202/121170.html

Fuzzy search is common in data-driven systems. However, the fuzzy search that databases provide does not sort results by relevance.

This article presents a method for comparing the similarity between two strings.
Once the similarity between two strings is calculated, you can sort and filter the in-memory data with LINQ and select the results most similar to the target string.
 
The similarity formula used here is:

similarity = Kq * q / (Kq * q + Kr * r + Ks * s)    (Kq > 0, Kr >= 0, Ks >= 0)

Here q is the number of characters that appear in both string 1 and string 2, s is the number of characters that appear in string 1 but not in string 2, and r is the number of characters that appear in string 2 but not in string 1. Kq, Kr, and Ks are the weights of q, r, and s respectively. Based on practical results, we set Kq = 2 and Kr = Ks = 1.
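As a quick check of the formula, take two hypothetical strings, "abcd" and "abef" (chosen only for illustration). They share the characters a and b, so q = 2; string 1 has two characters that string 2 lacks (s = 2), and string 2 has two that string 1 lacks (r = 2). With Kq = 2 and Kr = Ks = 1:

```latex
\text{similarity}
  = \frac{K_q\, q}{K_q\, q + K_r\, r + K_s\, s}
  = \frac{2 \cdot 2}{2 \cdot 2 + 1 \cdot 2 + 1 \cdot 2}
  = \frac{4}{8}
  = 0.5
```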
Based on the similarity calculation formula, the following code is obtained:
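using System.Linq;

// The class name is illustrative; extension methods must live in a static class.
public static class StringSimilarityExtensions
{
    /// <summary>
    /// Gets the similarity between two strings.
    /// </summary>
    /// <param name="sourceString">The first string.</param>
    /// <param name="str">The second string.</param>
    /// <returns>A value from 0 to 1; higher means more similar.</returns>
    public static decimal GetSimilarityWith(this string sourceString, string str)
    {
        // Weights of q, r, and s.
        decimal Kq = 2;
        decimal Kr = 1;
        decimal Ks = 1;

        char[] ss = sourceString.ToCharArray();
        char[] st = str.ToCharArray();

        // q: characters present in both strings (Intersect yields each one once).
        int q = ss.Intersect(st).Count();
        // s and r: characters present in one string only. Except is used instead of
        // Length - q so that repeated characters are not counted as differences.
        int s = ss.Except(st).Count();
        int r = st.Except(ss).Count();

        decimal denominator = Kq * q + Kr * r + Ks * s;
        // Assumption: two empty strings count as identical (avoids division by zero).
        if (denominator == 0) return 1;

        return Kq * q / denominator;
    }
}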
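
To illustrate the sorting and filtering step mentioned earlier, here is a minimal sketch of ranking an in-memory list with LINQ. The candidate strings, the target, and the 0.5 threshold are hypothetical. The local function Similarity is a stand-in that follows the formula over distinct characters, repeated here only so that the sketch compiles on its own as a C# 9+ top-level program.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the similarity method: q, s, and r are counted over
// distinct characters, with Kq = 2 and Kr = Ks = 1.
decimal Similarity(string a, string b)
{
    const decimal Kq = 2, Kr = 1, Ks = 1;
    int q = a.Intersect(b).Count();   // characters in both strings
    int s = a.Except(b).Count();      // characters only in the first string
    int r = b.Except(a).Count();      // characters only in the second string
    decimal denominator = Kq * q + Kr * r + Ks * s;
    // Assumption: two empty strings count as identical.
    return denominator == 0 ? 1m : Kq * q / denominator;
}

// Hypothetical data standing in for rows already loaded into memory.
string target = "abcd";
var candidates = new List<string> { "wxyz", "abef", "abcd", "abcx" };

// Keep candidates at or above the threshold, most similar first.
List<string> ranked = candidates
    .Where(c => Similarity(c, target) >= 0.5m)
    .OrderByDescending(c => Similarity(c, target))
    .ToList();

Console.WriteLine(string.Join(", ", ranked));   // abcd, abcx, abef
```

For larger lists it may be worth computing each score once (for example, by projecting to a pair of string and score with Select) instead of calling the function in both Where and OrderByDescending.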

 
That is the method for calculating string similarity. In practice, however, you must also account for synonyms and near-synonyms. For example, two versions of "The fastest update of love novels" that differ only by a synonym are, in a sense, the same string, yet the method above gives an inaccurate result for them. In a real application, therefore, replace the synonyms first and calculate the similarity after the replacement.
For near-synonyms, combine the similarity calculated before the replacement with the similarity calculated after it to obtain the actual similarity between the two strings.
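
The replacement step could be sketched as follows. This is only an illustration under stated assumptions: the synonym table, the sample phrases, and the helper names Normalize and Similarity are all hypothetical, and Similarity is again a stand-in that follows the formula over distinct characters. The text does not say how the before and after scores should be combined for near-synonyms, so the sketch only computes both values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the similarity method (Kq = 2, Kr = Ks = 1).
decimal Similarity(string a, string b)
{
    const decimal Kq = 2, Kr = 1, Ks = 1;
    int q = a.Intersect(b).Count();
    int s = a.Except(b).Count();
    int r = b.Except(a).Count();
    decimal denominator = Kq * q + Kr * r + Ks * s;
    return denominator == 0 ? 1m : Kq * q / denominator;
}

// Hypothetical synonym table: each variant maps to one canonical form.
var synonyms = new Dictionary<string, string>
{
    { "quickest", "fastest" },
    { "romance", "love" },
};

// Replace every known variant with its canonical form.
string Normalize(string text) =>
    synonyms.Aggregate(text, (current, pair) => current.Replace(pair.Key, pair.Value));

string first = "fastest update of love novels";
string second = "quickest update of romance novels";

// Score the raw strings, then score them again after synonym replacement.
decimal before = Similarity(first, second);
decimal after = Similarity(Normalize(first), Normalize(second));

Console.WriteLine($"before: {before}, after: {after}");
```

With this table both phrases normalize to the same string, so the score after replacement is higher than the score before it.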
