I was recently working on text matching, thought of feature value algorithms, and wrote a text feature algorithm of my own. Criticism is welcome.

Source: Internet
Author: User

The requirement was to determine whether two texts are similar.

The most powerful approach would be semantic analysis followed by a comparison of the results. However, its performance is unimpressive, so I thought of the face recognition work I had done before and decided to use feature values instead.

 

In face recognition, the image is flattened into a one-dimensional vector, and a feature value (called the eigenvalue in the code) is computed from it. Applying the same idea to text, the input string is treated as a vector of char values, and the computation reduces to multiplying each char by itself and summing the results, which equals the squared norm (modulus) of the vector.

 

For example:

HI = (char 72)(char 73).

Feature value = 72*72 + 73*73 = 10513.

Computed this way, IH gives the same result as HI. Another problem is that a char ranges from 0 to 255 (assuming single-byte characters; a C# char can actually hold values up to 65535), so the sum grows quickly as the text gets longer. To address these problems, I came up with the following algorithm.
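
The collision is easy to reproduce. The following is a minimal sketch of the naive sum of squares described above, not the final algorithm; the names `NaiveFeatureDemo` and `SumOfSquares` are made up for illustration.

```csharp
using System;

public static class NaiveFeatureDemo
{
    // Sum of squared char codes, i.e. the squared norm of the char vector.
    public static double SumOfSquares(string value)
    {
        double sum = 0;
        foreach (char c in value)
        {
            double code = c;
            sum += code * code;
        }
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumOfSquares("HI")); // 72*72 + 73*73
        Console.WriteLine(SumOfSquares("IH")); // same terms in a different order
        Console.WriteLine(SumOfSquares("HI") == SumOfSquares("IH"));
    }
}
```

Both strings produce 10513, so the character order is lost, which is what the position weight in the algorithm below is meant to fix.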

 

// Requires: using System;
public static double GetStringEigenvalue(string value)
{
    if (string.IsNullOrEmpty(value))
        return 0;

    double length = value.Length;
    double eigenvalue = 0;
    int index = 1; // 1-based position weight

    foreach (char str in value)
    {
        double strNum = (double)str;
        // Scaled square of the char, weighted by its position.
        eigenvalue += (strNum * strNum) / 100000 * index++;
    }

    // Calculates how much the position weights increase the final value.
    double weightFactor = (length - 1) * 255 * 255 / 100000 + (length - 1) * (length - 2) * 255 * 255 / 2 / 100000;

    if (weightFactor == 0)
        weightFactor = 1;

    // Order of magnitude (power of 10) of that increase.
    weightFactor = Math.Ceiling(Math.Log10(weightFactor));

    if (weightFactor == 0)
        weightFactor = 1;

    eigenvalue /= weightFactor;
    eigenvalue /= length;

    return length + eigenvalue;
}

 

 

 

The principle is simple, even naive.

 

1. The char value range is 0 to 255, so the largest possible square is 255*255 = 65025. To keep each term in the fractional range, I divide it by 100000.

eigenvalue += (strNum * strNum) / 100000 * index++;

2. To account for character position, I multiply each character's term by its position weight index, so that strings containing the same characters in a different order give different results.

eigenvalue += (strNum * strNum) / 100000 * index++;

 

3. However, because a weight is applied to every character, the result is much larger than it would be without weights. How much larger? I estimate it with an arithmetic series based on the maximum term 65025:

n * 65025 + n * (n - 1) * 65025 / 2, where n = length - 1

double weightFactor = (length - 1) * 255 * 255 / 100000 + (length - 1) * (length - 2) * 255 * 255 / 2 / 100000;

4. Next, I compute the power of 10 of that additional value (the order of magnitude to scale down by):

weightFactor = Math.Ceiling(Math.Log10(weightFactor));

 

5. Then the feature value is scaled back down toward the fractional range, according to the magnification caused by the weights:

eigenvalue /= weightFactor;

 

6. Because the feature value is the sum of the squared chars over all positions, it is also magnified by up to 255*255/100000 * length, that is, by a factor proportional to length. So it is divided by length as well to bring it down to the fractional range:

eigenvalue /= length;

 

7. Strings of different lengths might otherwise produce the same result, so the length of the string is added to the final value.

return length + eigenvalue;
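
One way to read the formula in step 3 is as the extra amount the position weights add when every char has the maximum value 255: position i contributes (i - 1) * 65025 more than it would with a weight of 1, and the series sums to 65025 * n * (n + 1) / 2 with n = length - 1. The sketch below checks that reading against a brute-force sum and shows the divisor used in steps 4 and 5. The class and method names are hypothetical, and this interpretation is an assumption rather than something stated in the original code.

```csharp
using System;

public static class WeightFactorCheck
{
    // Largest scaled square: 255 * 255 / 100000 = 0.65025.
    const double MaxTerm = 255.0 * 255.0 / 100000;

    // Closed form from step 3, with n = length - 1.
    public static double ClosedForm(int length)
    {
        double n = length - 1;
        return n * MaxTerm + n * (n - 1) * MaxTerm / 2;
    }

    // Brute force: position i (1-based) adds (i - 1) * MaxTerm beyond weight 1.
    public static double BruteForce(int length)
    {
        double extra = 0;
        for (int i = 1; i <= length; i++)
            extra += (i - 1) * MaxTerm;
        return extra;
    }

    // Divisor from steps 4 and 5: power of 10 of the extra amount, at least 1.
    public static double Scale(int length)
    {
        double w = ClosedForm(length);
        if (w == 0) w = 1;
        double scale = Math.Ceiling(Math.Log10(w));
        return scale == 0 ? 1 : scale;
    }

    public static void Main()
    {
        foreach (int length in new[] { 1, 2, 3, 10, 100 })
        {
            Console.WriteLine(
                $"{length}: closed={ClosedForm(length)}, brute={BruteForce(length)}, scale={Scale(length)}");
        }
    }
}
```

For lengths 1 to 3 the divisor stays at 1; it becomes 2 at length 10 and 4 at length 100, so the scaling grows only logarithmically with the text length.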

 

The final value is the feature value of the text. The integer part is intended to be the length, and the fractional part is what gets compared for similarity.
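
To see how the final value behaves, here is a condensed restatement of the function with a small driver. `FeatureValueDemo` and `FeatureValue` are illustrative names, and the test inputs are arbitrary; treat it as a sketch for experimenting, not as a reference implementation.

```csharp
using System;

public static class FeatureValueDemo
{
    // Condensed restatement of steps 1 to 7.
    public static double FeatureValue(string value)
    {
        if (string.IsNullOrEmpty(value))
            return 0;

        double length = value.Length;
        double sum = 0;
        int index = 1; // 1-based position weight

        foreach (char c in value)
            sum += ((double)c * c) / 100000 * index++;

        double weightFactor = (length - 1) * 255 * 255 / 100000
                            + (length - 1) * (length - 2) * 255 * 255 / 2 / 100000;
        if (weightFactor == 0) weightFactor = 1;
        weightFactor = Math.Ceiling(Math.Log10(weightFactor));
        if (weightFactor == 0) weightFactor = 1;

        return length + sum / weightFactor / length;
    }

    public static void Main()
    {
        Console.WriteLine(FeatureValue("HI"));
        Console.WriteLine(FeatureValue("IH"));
        Console.WriteLine(FeatureValue(new string('a', 100)));
    }
}
```

With these inputs, HI and IH now give different values (about 2.07921 and 2.078485). The 100-character string gives about 101.19, so in this sketch the scaled part can exceed 1 and the integer part is not always exactly the length.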

 

Finally, fire away with your criticism. My math is not good, so please forgive this clumsy attempt.
