Open source: compute a fingerprint for each document, then compare the fingerprints to calculate similarity.
TextSimilarity textSimilarity = new TextSimilarity();
// Compute the fingerprint of each article
int sourceFingerprint = textSimilarity.CalcTextFingerprint(sourceText);
int destFingerprint = textSimilarity.CalcTextFingerprint(destText);
// Compare the two fingerprints to calculate the similarity
var similarity = textSimilarity.CalcTextSimilarity(sourceFingerprint, destFingerprint);
......
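The post does not say which algorithm the library uses to build its fingerprints. As a rough illustration of the same two-step flow, here is a minimal Python sketch that assumes SimHash-style fingerprints over plain word tokens and a similarity based on Hamming distance. The function names, the 32-bit width, and the MD5 token hash are all assumptions made for this example, not the library's actual code.

```python
import hashlib
import re

BITS = 32  # assumed fingerprint width, chosen to fit in a 32-bit int


def _token_hash(token: str) -> int:
    """Map a token to a stable BITS-wide integer (MD5 is an arbitrary choice)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def calc_text_fingerprint(text: str) -> int:
    """SimHash-style fingerprint: each token votes +1/-1 on every bit position."""
    votes = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        h = _token_hash(token)
        for bit in range(BITS):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit, vote in enumerate(votes):
        if vote > 0:  # a bit is set only when the positive votes win
            fingerprint |= 1 << bit
    return fingerprint


def calc_text_similarity(fp_a: int, fp_b: int) -> float:
    """Similarity in [0, 1]: the share of bit positions where the fingerprints agree."""
    hamming_distance = bin(fp_a ^ fp_b).count("1")
    return 1.0 - hamming_distance / BITS


if __name__ == "__main__":
    source = calc_text_fingerprint("Stocks rose sharply on Monday after the announcement")
    dest = calc_text_fingerprint("Stocks rose sharply on Monday following the announcement")
    print(calc_text_similarity(source, dest))
```

Because the fingerprint is built from per-word votes, word order does not affect it, and texts that share most of their words tend to get fingerprints that differ in only a few bits.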
While we are at it, let's take a look at Baidu News.
How many identical news entries are shown in the figure?
Let's click here to see how this retrieval command works.
You can see that this command evidently checks whether the documents' fingerprints are the same. If two fingerprints are the same, the news content must be roughly the same.
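The exact-match check described here can be sketched as a grouping step. The Python snippet below is a minimal illustration, assuming the fingerprints have already been computed and are stored in a dictionary keyed by document id; the function name and the sample values are hypothetical and are not taken from Baidu or from the library.

```python
from collections import defaultdict


def group_by_fingerprint(fingerprints):
    """Return lists of document ids that share exactly the same fingerprint.

    fingerprints: dict mapping a document id to its integer fingerprint.
    Only groups with at least two documents are returned.
    """
    groups = defaultdict(list)
    for doc_id, fingerprint in fingerprints.items():
        groups[fingerprint].append(doc_id)
    return [doc_ids for doc_ids in groups.values() if len(doc_ids) > 1]


if __name__ == "__main__":
    # Hypothetical fingerprints for four news items.
    sample = {"news-1": 1001, "news-2": 2002, "news-3": 1001, "news-4": 3003}
    print(group_by_fingerprint(sample))
```

A dictionary lookup keeps the grouping linear in the number of documents, which is why identical fingerprints are a cheap way to collapse repeated news items.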
Now look at the fingerprint computed by this code.
You may feel this way, right?
If you know programming, you can download the code, compile it, and run it.
http://pan.baidu.com/share/link?shareid=314821&uk=201606611
If you are not familiar with programming, you can download, install, and run it!
http://pan.baidu.com/share/link?shareid=314822&uk=201606611
Please contact me if you have any questions: 74965947,721 33568,27236303, 16592133,204725117, 204724518