Preface
HanLP previously used the "shortest edit distance" for its recommender, and the results left room for improvement. The main weakness is that recommendations are ranked by the edit distance between pinyin sequences: interleaved occurrences of the same words are very common, yet the edit distance in such cases is not particularly large. I was therefore looking for a complementary scoring algorithm to judge how similar two sentences are along the pinyin dimension.
The difference: the longest common substring (Longest Common Substring) is the longest string that occurs as a substring of both strings, and it must be contiguous. The longest common subsequence (Longest Common Subsequence) is the longest sequence common to both strings, and it does not need to be contiguous. Both are solved the same way as edit distance, by dynamic programming, trading space for ...
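To make the contiguous versus non-contiguous distinction concrete, here is a minimal Java sketch of the two standard dynamic-programming recurrences. It is not the code from the original post: the class name `LongestCommon` and both method names are hypothetical, and it compares plain strings character by character, whereas a pinyin scorer would presumably compare syllable sequences instead.

```java
// Hypothetical illustration; names are not taken from the original article or from HanLP.
final class LongestCommon {

    // Length of the longest common substring: matching characters must be contiguous.
    // dp[i][j] = length of the common suffix of a[0..i) and b[0..j).
    static int substringLength(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                    best = Math.max(best, dp[i][j]);
                }
                // On a mismatch dp[i][j] stays 0, because the run is broken.
            }
        }
        return best;
    }

    // Length of the longest common subsequence: matches may have gaps between them.
    // dp[i][j] = LCS length of a[0..i) and b[0..j).
    static int subsequenceLength(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    // Carry forward the better of skipping a character in a or in b.
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[a.length()][b.length()];
    }
}
```

For the inputs "abcde" and "abfce", `substringLength` returns 2 (the contiguous run "ab"), while `subsequenceLength` returns 4 ("abce", skipping over the mismatched characters). Both methods fill an (m+1) by (n+1) table, which is the extra space the dynamic-programming approach spends.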
Continue reading: 码农场 » Java implementation and NLP application of the longest common substring and the longest common subsequence
Original link: http://www.hankcs.com/program/algorithm/Implementation-and-application-of-nlp-longest-common-subsequence-longest-common-subsequence-of-java.html
Java implementation and NLP application of the longest common substring and the longest common subsequence