Which Python-based Chinese word segmentation solution is better?

Source: Internet
Author: User
Reply: "Jieba" (Chinese for "to stutter") Chinese text segmentation: built to be the best Python Chinese word segmentation module.


https://github.com/fxsjy/jieba
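
For anyone who wants to try it, here is a minimal sketch of calling jieba. The helper name segment_with_jieba is made up for this example; jieba.lcut and its cut_all flag come from the jieba package itself, which has to be installed separately (pip install jieba).

```python
def segment_with_jieba(text, cut_all=False):
    """Return the list of tokens jieba produces for the given text.

    segment_with_jieba is a hypothetical helper name used only in this sketch.
    """
    # Imported lazily so that defining this helper does not require jieba.
    import jieba  # third-party package: pip install jieba

    # cut_all=False is jieba's default "accurate" mode;
    # cut_all=True is "full" mode, which lists every word it can find.
    return jieba.lcut(text, cut_all=cut_all)
```

With jieba installed, segment_with_jieba("我来到北京清华大学") returns a list of word strings. The exact split depends on the dictionary shipped with the installed version.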
I am currently using PyNLPIR, a Python wrapper for ICTCLAS. Its speed and accuracy are both good.

I wrote two word segmentation programs: one is based on mmseg and the other on CRF. Both have been uploaded to PyPI:
pip install scseg
pip install genius

I have never used a word segmentation library in Python or any other language, but I have seen several Python Chinese word segmentation libraries on OSChina, some of which were already mentioned above. Here is the link:
  • http://www.oschina.net/project/tag/264/segment?sort=view&lang=25&os=0
Python can call C libraries, so you can use the word segmenter from the Chinese Academy of Sciences. It works reasonably well, except that importing a user-defined dictionary sometimes fails, and there is no way to debug the cause.

Today I ran a simple test of four Python Chinese word segmentation packages:
http://hi.baidu.com/fooying/item/6ae7a0e26087e8d7eb34c9e8

smallseg: lightweight and easy to use.

Jieba passed... The word segmentation packages from the Chinese Academy of Sciences and Harbin Institute of Technology are both quite good.
Someone posted a summary online; I am passing it along for your reference:
Comparison of several open-source word segmentation tools

There is an mmseg word segmentation program for Python, but I have not written Python myself.
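
Since mmseg comes up more than once in this thread: it builds on dictionary-based maximum matching and then adds rules to resolve ambiguous splits. The sketch below shows only the plain forward maximum matching step, not the full mmseg rule set. The function name, the toy dictionary and the max_word_len value are all made up for illustration.

```python
def forward_max_match(text, dictionary, max_word_len=4):
    """Greedy forward maximum matching over a set of known words."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the dictionary matches.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy dictionary, purely illustrative.
TOY_DICT = {"中文", "分词", "研究", "研究生", "生命", "起源"}

print(forward_max_match("中文分词", TOY_DICT))      # ['中文', '分词']
print(forward_max_match("研究生命起源", TOY_DICT))  # ['研究生', '命', '起源']
```

The second call shows the weakness of the greedy approach: it grabs "研究生" first and strands "命", where a reader would expect 研究 / 生命 / 起源. Handling cases like this is what the extra disambiguation rules in mmseg, or statistical models such as CRF, are for.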

Word segmentation is computation-intensive, and Python is not well suited to that kind of workload. Consider using Python to call a C library instead.
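
As a rough illustration of calling C from Python, here is a small sketch using the standard ctypes module. It calls strlen from the C standard library rather than a real segmenter, and it assumes a Linux or macOS system where that library can be located. The c_strlen wrapper name is made up for this example. A C segmentation library would be loaded the same way, with its own function names and argument types.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library (assumes Linux or macOS).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Return the length in bytes of text once encoded as UTF-8."""
    # C functions take bytes, so the Python str must be encoded first.
    return libc.strlen(text.encode("utf-8"))


print(c_strlen("中文分词"))  # 12: four characters, three UTF-8 bytes each
```

Note that the C side sees bytes, not characters, so any offsets a C segmenter returns may need converting back to character positions on the Python side.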
