jieba is an excellent third-party library for Chinese word segmentation
Chinese text is written without spaces between words, so individual words have to be obtained through segmentation
jieba is not part of the Python standard library and must be installed separately (pip install jieba)
The jieba library provides three segmentation modes; for the simplest use, you only need to master one function
How jieba segmentation works
It uses a Chinese lexicon to determine the association probabilities between Chinese characters
Characters with a high association probability are grouped into words, which forms the segmentation result
In addition to the built-in lexicon, users can add custom words
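The idea above can be sketched with a toy example. This is not jieba's actual code: the lexicon, its frequency counts, and the function name segment are all made up for illustration. The sketch only shows the general technique of choosing the split whose words have the highest combined probability.

```python
import math

# Hypothetical toy lexicon: word -> frequency count (made-up numbers).
TOY_FREQ = {
    "中": 5, "国": 5, "中国": 50,
    "是": 40, "一": 10, "个": 10, "一个": 30,
    "伟": 1, "大": 10, "伟大": 20,
    "的": 60, "国家": 30, "家": 5,
}
TOTAL = sum(TOY_FREQ.values())


def segment(text, freq=TOY_FREQ, total=TOTAL, max_len=4):
    """Return the split of text whose words have the highest combined probability."""
    n = len(text)
    # best[i] = (best log-probability of text[i:], end index of the first word)
    best = [None] * (n + 1)
    best[n] = (0.0, n)
    for i in range(n - 1, -1, -1):
        candidates = []
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = text[i:j]
            # Unknown single characters get a count of 1 so a path always exists;
            # unknown multi-character strings are not considered words.
            count = freq.get(word, 1 if j == i + 1 else 0)
            if count > 0:
                candidates.append((math.log(count / total) + best[j][0], j))
        best[i] = max(candidates)
    # Follow the stored end indices to read off the chosen words.
    words, i = [], 0
    while i < n:
        j = best[i][1]
        words.append(text[i:j])
        i = j
    return words


print(segment("中国是一个伟大的国家"))
# ['中国', '是', '一个', '伟大', '的', '国家']
```

Adding a custom word corresponds to inserting a new entry into the lexicon with a count high enough that the whole word beats the product of its parts.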
The three jieba segmentation modes
Precise mode, full mode, search engine mode
Precise mode: splits the text exactly, with no redundant words (the most commonly used mode)
Full mode: scans out every possible word in the text, so the result contains redundancy
Search engine mode: starts from the precise-mode result and splits long words again
Common jieba functions:
jieba.lcut(s): precise mode; returns the segmentation result as a list (the l in lcut stands for list, cut for splitting into words)
jieba.lcut(s, cut_all=True): full mode; returns the segmentation result as a list, with redundancy
jieba.lcut_for_search(s): search engine mode; returns the segmentation result as a list, with redundancy
jieba.add_word(w): adds the new word w to the segmentation dictionary