Open source: compute a fingerprint for each document, then use the fingerprints to calculate similarity (source code and executable program included)

Source: Internet
Author: User

The idea is to compute a fingerprint for each document and then compare fingerprints to calculate similarity. Usage looks like this:

TextSimilarity textSimilarity = new TextSimilarity();

// Compute a similarity fingerprint for each article
int sourceFingerprint = textSimilarity.calcTextFingerprint(sourceText);
int destFingerprint = textSimilarity.calcTextFingerprint(destText);

// Compare the fingerprints to calculate the similarity
var similarity = textSimilarity.calcTextSimilarity(sourceFingerprint, destFingerprint);

......
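The post does not say which algorithm sits behind the two calls above. One common technique with the same shape (an integer fingerprint per text, then a similarity score from two integers) is SimHash, so here is a minimal sketch under that assumption. The class name SimHashSketch, the CRC32 token hash, the 32-bit width, and the whitespace tokenizer are all illustrative choices and are not taken from the downloadable code. Chinese text in particular would need a word segmenter instead of splitting on whitespace.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical SimHash-style sketch; not the TextSimilarity class from the post. */
final class SimHashSketch {

    /** Hashes one token to 32 bits. CRC32 is only a stand-in for a token hash. */
    static int hashToken(String token) {
        CRC32 crc = new CRC32();
        crc.update(token.getBytes(StandardCharsets.UTF_8));
        return (int) crc.getValue();
    }

    /** Builds a 32-bit fingerprint: every token votes +1 or -1 on each bit position. */
    static int calcTextFingerprint(String text) {
        int[] votes = new int[32];
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty token for leading whitespace
            }
            int h = hashToken(token);
            for (int bit = 0; bit < 32; bit++) {
                votes[bit] += ((h >>> bit) & 1) == 1 ? 1 : -1;
            }
        }
        int fingerprint = 0;
        for (int bit = 0; bit < 32; bit++) {
            if (votes[bit] > 0) {
                fingerprint |= 1 << bit; // keep the bit when the majority of tokens set it
            }
        }
        return fingerprint;
    }

    /** Similarity in [0, 1]: the fraction of the 32 bits on which both fingerprints agree. */
    static double calcTextSimilarity(int a, int b) {
        return 1.0 - Integer.bitCount(a ^ b) / 32.0;
    }

    public static void main(String[] args) {
        int source = calcTextFingerprint("the quick brown fox jumps over the lazy dog");
        int dest = calcTextFingerprint("the quick brown fox jumped over the lazy dog");
        System.out.println(calcTextSimilarity(source, dest));
    }
}
```

Because each bit is decided by a majority vote over all tokens, changing a few words usually flips only a few bits, which is why texts that are nearly the same tend to get fingerprints that are close or equal.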

 

As an aside, let's take a look at Baidu News.

The figure shows how many identical news items there are.

Let's click it to see what search query it runs.

You can see that this query clearly decides whether documents are the same by comparing their fingerprints. If two fingerprints match, the news content must be roughly the same.


Now look at the fingerprint computed by the code from this post.

It looks much the same, doesn't it?
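The "identical news" lookup described above comes down to grouping documents whose fingerprints are equal. A minimal sketch of that grouping, assuming the fingerprints have already been computed, could look like the following. The class name DuplicateGrouper, the titles, and the fingerprint values are made up for illustration, and nothing here reflects how Baidu actually implements it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of exact-match duplicate grouping by fingerprint. */
final class DuplicateGrouper {

    /** Groups titles by fingerprint; titles that share a fingerprint end up in one list. */
    static Map<Integer, List<String>> groupByFingerprint(Map<String, Integer> fingerprintByTitle) {
        Map<Integer, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : fingerprintByTitle.entrySet()) {
            groups.computeIfAbsent(entry.getValue(), key -> new ArrayList<>())
                  .add(entry.getKey());
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up titles and fingerprint values, for illustration only.
        Map<String, Integer> fingerprints = new LinkedHashMap<>();
        fingerprints.put("Story A (site 1)", 0x5A17C3E9);
        fingerprints.put("Story A (site 2)", 0x5A17C3E9);
        fingerprints.put("Story B", 0x1234ABCD);

        groupByFingerprint(fingerprints).forEach((fp, titles) ->
            System.out.printf("%08x -> %d item(s): %s%n", fp, titles.size(), titles));
    }
}
```

A hash map lookup like this only catches documents whose fingerprints match exactly. Catching fingerprints that differ in a few bits would need a bit-distance comparison instead.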


If you can program, you can download the source code, compile it, and run it:

http://pan.baidu.com/share/link?shareid=314821&uk=201606611

If you are not familiar with programming, you can download the executable program, install it, and run it:

http://pan.baidu.com/share/link?shareid=314822&uk=201606611

Please contact me if you have any questions: 74965947, 72133568, 27236303, 16592133, 204725117, 204724518
