Source: http://blog.donews.com/windshow/category/70837.aspx
1. http://www.chinesecomputing.com/nlp/segment.html
This page lists many word segmentation resources. Look at the second item (a simplified Chinese segmenter written in Perl): it is a simplified Chinese word segmentation program available for Perl and Java, and it is completely free. I tried it and it worked well. Many people online use the ICTCLAS API from the Chinese Academy of Sciences to add Chinese word segmentation to Lucene. ICTCLAS is written in C++, so once it is wrapped with JNI the segmenter becomes error-prone and very unstable. I used such a wrapper for a small project in the lab. It was packaged by Chen Tian of Beijing Normal University, and segmentation often failed, although that was certainly not Chen Tian's fault. I also wrote an article on adding a Chinese word segmentation program to Lucene, which explains how to use ICTCLAS for this. Many readers later emailed me to discuss problems, and in fact I sometimes ran into problems myself. The segmenter I recommend here can replace the free but barely usable JNI-wrapped ICTCLAS.
However, I have not tested it with multithreading; I only used it in passing. If any expert tries it, please don't forget to tell me.
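For anyone who wants to try the multithreading check mentioned above, here is a minimal sketch of a smoke test in Python. It is not tied to any of the segmenters listed here: `toy_segment` is a hypothetical stand-in (a tiny forward maximum matching segmenter over an invented three-word lexicon), and `concurrent_mismatches` is an illustrative helper name. To test a real segmenter, the assumption is that you can pass its segmentation function in place of `toy_segment`.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in lexicon; a real segmenter ships its own dictionary.
TOY_LEXICON = {"中文", "分词", "程序"}

def toy_segment(text, lexicon=TOY_LEXICON, max_len=4):
    """Forward maximum matching: take the longest lexicon word at each position,
    falling back to a single character when nothing matches."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens

def concurrent_mismatches(segment, samples, workers=8, repeats=50):
    """Return the samples whose result under concurrent calls differs
    from the result of a single-threaded baseline run."""
    baseline = {s: segment(s) for s in samples}
    jobs = list(samples) * repeats
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(segment, jobs))
    return sorted({s for s, r in zip(jobs, results) if r != baseline[s]})

if __name__ == "__main__":
    samples = ["中文分词程序", "中文的分词"]
    print(concurrent_mismatches(toy_segment, samples))
```

An empty list only means no inconsistency showed up in this run; it does not prove that a segmenter is thread-safe, and a crash inside native code would not be caught by a harness like this.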
2. http://www.fajava.cn/products_01.asp
Someone recommended the third-generation intelligent word segmentation system donews (the 3rd Generation Word Segmenter). It is said to be the commercial version of ICTCLAS 3.0. See http://www.fajava.cn/products_01.asp, which provides APIs for Linux and Windows for trial use. This recommendation comes from a comment another reader left on the blog; I have never tried it myself.
3. Free version of Chinese word segmentation (a nice thing)
4. ICTCLAS, the Chinese Lexical Analysis System of the Institute of Computing Technology, Chinese Academy of Sciences
5. Massive intelligent word segmentation, research edition
6. CSW Chinese intelligent word segmentation component
7. Chinese word segmentation component written in C#