A Chinese word segmentation library based on the MMSeg algorithm
I recently implemented a search solution based on Lucene.Net that required Chinese word segmentation. After looking at many options I eventually chose mmseg4j, but mmseg4j is available only for Java. On cnblogs I found a port of the Java code by Wang Squire (http://www.cnblogs.com/land/archive/2011/07/19/mmseg4j.html), but it does not support Lucene.Net 3.0.3. I therefore upgraded that code to work with the latest version of Lucene.Net (≥ 3.0.3), converted most of the Java-style code to .NET style, and fixed several minor bugs.
To make it easier for everyone to use, I put the modified code on GitHub together with a simple sample. I also published it as a NuGet package, so you can install it by searching NuGet for Lucene.Net.Analysis.MMSeg.
Git address
https://github.com/JimLiu/Lucene.Net.Analysis.MMSeg
NuGet address
https://nuget.org/packages/Lucene.Net.Analysis.MMSeg/
PM> Install-Package Lucene.Net.Analysis.MMSeg
Usage
There are three analyzers to choose from, one for each segmentation mode:
SimpleAnalyzer
Analyzer analyzer = new SimpleAnalyzer();
MaxWordAnalyzer
Analyzer analyzer = new MaxWordAnalyzer();
ComplexAnalyzer
Analyzer analyzer = new ComplexAnalyzer();
For specific usage, refer to the examples in the code and to the Lucene.Net documentation.
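As a rough illustration of how one of these analyzers might be used, the sketch below prints the tokens produced for a short sentence. It relies on the standard Lucene.Net 3.0.3 token stream API (TokenStream, ITermAttribute). Some details are assumptions rather than facts taken from the project: that the analyzers live in a Lucene.Net.Analysis.MMSeg namespace (matching the package name), that the default dictionary shipped with the package is found at run time, and the field name and sample sentence, which are purely illustrative.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes;
// Assumption: the MMSeg analyzers are in this namespace (same as the package name).
// The alias avoids a name clash with Lucene.Net's own SimpleAnalyzer.
using MMSeg = Lucene.Net.Analysis.MMSeg;

class SegmentDemo
{
    static void Main()
    {
        // Swap in MMSeg.SimpleAnalyzer or MMSeg.ComplexAnalyzer to compare the modes.
        Analyzer analyzer = new MMSeg.MaxWordAnalyzer();

        // "content" is an arbitrary field name chosen for this sketch.
        TokenStream stream = analyzer.TokenStream("content", new StringReader("研究生命起源"));

        // The term attribute exposes the text of the current token.
        ITermAttribute term = stream.AddAttribute<ITermAttribute>();

        // Advance through the stream and print one token per line.
        while (stream.IncrementToken())
        {
            Console.WriteLine(term.Term);
        }
    }
}
```

The same analyzer instance would normally be passed to both the IndexWriter and the QueryParser, so that text is segmented the same way at index time and at query time.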