Reply: Jieba (Chinese for "to stutter") Chinese text segmentation: built to be the best Python Chinese word segmentation module.
https://github.com/fxsjy/jieba
I am currently using PyNLPIR, a wrapper around ICTCLAS. Its speed and accuracy are both good.
I wrote two word segmentation programs: one based on MMSEG and the other based on CRF. Both have been uploaded to PyPI:
pip install scseg
pip install genius
I have never used a word segmentation library in Python or any other language, but I have seen several Python Chinese word segmentation libraries on OSChina, some of which were already mentioned in earlier replies. Here is the link:
- http://www.oschina.net/project/tag/264/segment?sort=view&lang=25&os=0
Python can call a C library, so you can use the Chinese Academy of Sciences word segmenter. It works reasonably well, except that importing a user-defined dictionary sometimes fails, and I have not been able to debug the cause.
Just today I ran a simple test of four Chinese word segmentation packages for Python:
http://hi.baidu.com/fooying/item/6ae7a0e26087e8d7eb34c9e8
smallseg: lightweight and easy to use.
Jieba passed... The two word segmentation packages from the Chinese Academy of Sciences and Harbin Institute of Technology are quite good.
Someone online wrote a summary. I am posting it here for your reference:
Comparison of several open-source word segmentation tools
There is also an mmseg word segmentation program for Python, though I have never written one in Python myself.
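For anyone wondering what the matching step behind mmseg-style segmenters looks like, here is a minimal sketch of plain forward maximum matching. It is not the mmseg package itself: full MMSEG adds chunk-based disambiguation rules on top of this step, and the function name `forward_max_match`, the `max_len` parameter, and the toy vocabulary below are all made up for illustration.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy forward maximum matching over a dictionary (illustrative only)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy vocabulary, not taken from any real dictionary file.
vocab = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", vocab))
```

With this toy vocabulary the greedy match takes "研究生" first and leaves "命" stranded, which is exactly the kind of ambiguity that the extra rules in MMSEG, or statistical models such as CRF, are meant to resolve.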
Word segmentation is computation-intensive, and pure Python is not well suited to that kind of workload. Consider doing the heavy lifting in a C library and calling it from Python.
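As a rough sketch of what calling a C library from Python can look like, the standard `ctypes` module is one option. The example below uses libc's `strlen` purely as a stand-in, since a real segmentation library would export its own functions with different names and signatures. It assumes a POSIX system where the C library can be located, and the wrapper name `c_strlen` is made up for illustration.

```python
import ctypes
import ctypes.util

# Locate the C standard library. On POSIX, passing None to CDLL falls back to
# the symbols already loaded into the current process (assumption: not Windows).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature so ctypes converts arguments and results correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Return the UTF-8 byte length of text, computed by the C function."""
    return libc.strlen(text.encode("utf-8"))


print(c_strlen("中文分词"))
```

Note that the text has to be encoded to bytes before it crosses into C, so the encoding expected by the C side matters. A mismatch there is one thing worth checking when something like a user dictionary fails to load.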