Free source code resources for Chinese Word Segmentation

Source: Internet
Author: User
Source: http://blog.donews.com/windshow/category/70837.aspx

1. http://www.chinesecomputing.com/nlp/segment.html

This page describes many word segmentation resources. See the second item ("A Simplified Chinese Segmenter written in Perl"), which offers a simplified Chinese word segmentation program in Perl and Java, completely free. I tried it and it worked well. Many people online use the ICTCLAS API from the Chinese Academy of Sciences to add Chinese word segmentation to Lucene. ICTCLAS is developed in C++, so once it is wrapped with JNI, segmentation fails frequently and is very unstable. I used this interface for a small project in the lab; the wrapper was written by Chen Tian of Beijing Normal University. Segmentation often ran into problems, although of course Chen Tian is not to blame. I also wrote an article, "How to add a Chinese word segmentation program in Lucene", explaining how to use ICTCLAS to add Chinese word segmentation to Lucene. Many readers later emailed me to discuss it, and in truth I sometimes hit problems with it myself. You can use the segmenter recommended here in place of the free but hard-to-use JNI-wrapped ICTCLAS.

I have not tested it under multithreading, though; I have only used it casually. If any expert out there has tried it, please let me know.
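As a rough illustration of what a dictionary-based segmenter does, and of how one might check it under multithreading, here is a minimal Python sketch. It is not the Perl/Java segmenter from the link above, nor ICTCLAS: it uses plain forward maximum matching, and the names `TOY_DICT`, `segment`, and `check_thread_consistency` are hypothetical, chosen only for this example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy dictionary; a real segmenter would load a large lexicon.
TOY_DICT = {"中文", "分词", "程序", "中文分词", "免费"}
MAX_WORD_LEN = max(len(w) for w in TOY_DICT)


def segment(text, dictionary=TOY_DICT, max_len=MAX_WORD_LEN):
    """Forward maximum matching: at each position take the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def check_thread_consistency(text, workers=8, runs=200):
    """Segment the same text from many threads and confirm every result matches."""
    expected = segment(text)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(segment, [text] * runs))
    return all(result == expected for result in results)
```

Because `segment` only reads the shared dictionary and keeps its working state in local variables, concurrent calls should agree with a single-threaded run. A wrapper around native code, such as a JNI package, may hold shared state and would need the same kind of check against the real library.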

2. http://www.fajava.cn/products_01.asp

One recommendation is the third-generation intelligent word segmentation system donews (the 3rd Generation Word Segmenter), said to be the commercial version of ICTCLAS 3.0. See http://www.fajava.cn/products_01.asp, which provides APIs for Linux and Windows for trial use. This came from a comment someone left on my blog; I have never tried it myself.

3. Free version of Chinese Word Segmentation (nice thing)

4. Chinese Lexical Analysis System of the Institute of Computing Technology, Chinese Academy of Sciences (ICTCLAS)

5. Massive smart word segmentation, research edition

6. CSW Chinese intelligent word segmentation component

7. C# Chinese Word Segmentation component written
