jieba is a Chinese word segmentation library for Python. Someone has ported it to the .NET platform, where it can fully replace the combination of Lucene.Net and the Pangu segmenter.
The reason I am writing this is that in an interview yesterday I was asked how I would implement keyword search on a website. I only mentioned SQL fuzzy queries, SQL statement optimization, and caching. I had come across keyword segmentation before, but the .NET platform has no mature segmentation and search library. Java has Lucene, and although it has been ported to .NET, the port is updated slowly. When I was learning Python, I noticed its word segmentation and word cloud libraries, so I wondered whether a Python segmentation library had been ported to .NET. Looking into that led me to Python's jieba library.
Original introduction: jieba Chinese word segmentation, .NET version: jieba.NET
The segmentation component most commonly used on the .NET platform is the Pangu segmenter, but it has not been updated for a long time. The most obvious difference is the built-in dictionary: jieba's has 500,000 entries, while Pangu's has 170,000, which leads to noticeably different segmentation results. In addition, for out-of-vocabulary words (words not in the dictionary), jieba "uses an HMM model based on the word-forming ability of Chinese characters, decoded with the Viterbi algorithm", and the results look good.
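To make the HMM and Viterbi idea concrete, here is a minimal, self-contained sketch. It is not jieba.NET's code: every probability below is invented for illustration (a real model learns them from a corpus), and the class and method names are hypothetical. Each character is tagged B (word begin), M (middle), E (end), or S (single-character word), and Viterbi picks the most probable tag sequence.

```csharp
using System;
using System.Collections.Generic;

public static class HmmSegmentationSketch
{
    const string States = "BMES";

    // Invented start probabilities: a sentence can only begin with B or S.
    static readonly Dictionary<char, double> Start = new Dictionary<char, double>
    {
        { 'B', 0.6 }, { 'S', 0.4 }
    };

    // Invented transition probabilities; missing pairs (e.g. B->B) are impossible.
    static readonly Dictionary<string, double> Trans = new Dictionary<string, double>
    {
        { "BM", 0.3 }, { "BE", 0.7 }, { "MM", 0.4 }, { "ME", 0.6 },
        { "EB", 0.5 }, { "ES", 0.5 }, { "SB", 0.5 }, { "SS", 0.5 }
    };

    // Hypothetical emission model: each character has one "preferred" state.
    static readonly Dictionary<char, char> Preferred = new Dictionary<char, char>
    {
        { '杭', 'B' }, { '研', 'E' }, { '大', 'B' }, { '厦', 'E' }
    };

    static double Emit(char state, char ch)
    {
        char preferred;
        bool match = Preferred.TryGetValue(ch, out preferred) && preferred == state;
        return Math.Log(match ? 0.7 : 0.1);
    }

    // Viterbi decoding in log space; returns one tag per character.
    public static string BestTags(string text)
    {
        int n = text.Length;
        if (n == 0) return "";
        var score = new double[n, States.Length];
        var back = new int[n, States.Length];

        for (int s = 0; s < States.Length; s++)
        {
            double p;
            Start.TryGetValue(States[s], out p);          // p stays 0 if missing
            score[0, s] = Math.Log(p) + Emit(States[s], text[0]);
        }
        for (int t = 1; t < n; t++)
        {
            for (int s = 0; s < States.Length; s++)
            {
                score[t, s] = double.NegativeInfinity;
                for (int prev = 0; prev < States.Length; prev++)
                {
                    double p;
                    Trans.TryGetValue(States[prev].ToString() + States[s], out p);
                    double candidate = score[t - 1, prev] + Math.Log(p);
                    if (candidate > score[t, s]) { score[t, s] = candidate; back[t, s] = prev; }
                }
                score[t, s] += Emit(States[s], text[t]);
            }
        }

        // A valid segmentation must end in E (index 2) or S (index 3).
        int best = score[n - 1, 2] >= score[n - 1, 3] ? 2 : 3;
        var tags = new char[n];
        for (int t = n - 1; t >= 0; t--)
        {
            tags[t] = States[best];
            best = back[t, best];
        }
        return new string(tags);
    }

    // Cut the text after every E or S tag.
    public static List<string> Cut(string text)
    {
        string tags = BestTags(text);
        var words = new List<string>();
        int start = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (tags[i] == 'E' || tags[i] == 'S')
            {
                words.Add(text.Substring(start, i - start + 1));
                start = i + 1;
            }
        }
        return words;
    }

    public static void Main()
    {
        // With the toy numbers above the best tag sequence is BEBE.
        Console.WriteLine(string.Join("/ ", Cut("杭研大厦")));   // 杭研/ 大厦
    }
}
```

The point of the sketch is that no dictionary lookup is involved: the word boundaries come purely from per-character tag probabilities, which is why this kind of model can propose words that the dictionary does not contain.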
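Code address on GitHub: https://github.com/anderscui/jieba.net
You can search for the package and install it directly from the NuGet Package Manager in VS2013.
Someone in the comments there said that the sentence "工信处女干事每月经过下属科室都要亲口交代24口交换机等技术性器件的安装工作" (roughly: every month, when passing through the subordinate departments, the female officer of the Industry and Information Office must personally explain the installation of 24-port switches and other technical components) makes a good segmentation test and that jieba splits it well, so I tested it myself: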
// Requires: using System; using JiebaNet.Segmenter;
var segmenter = new JiebaSegmenter();

// The test sentence, roughly: "Every month, when passing through the subordinate departments,
// the female officer of the Industry and Information Office must personally explain the
// installation of 24-port switches and other technical components."
var text = "工信处女干事每月经过下属科室都要亲口交代24口交换机等技术性器件的安装工作";
Console.WriteLine("Original search statement: {0}", text);

var segments1 = segmenter.Cut(text, cutAll: true); // full mode
Console.WriteLine("[Full mode]: {0}", string.Join("/ ", segments1));

var segments2 = segmenter.Cut(text); // defaults to exact mode
Console.WriteLine("[Exact mode]: {0}", string.Join("/ ", segments2));

var segments3 = segmenter.Cut(text); // the default is exact mode, and the HMM model is also used
Console.WriteLine("[New word recognition]: {0}", string.Join("/ ", segments3));

var segments4 = segmenter.CutForSearch(text); // search engine mode
Console.WriteLine("[Search engine mode]: {0}", string.Join("/ ", segments4));

var segments5 = segmenter.Cut(text);
Console.WriteLine("[Ambiguity resolution]: {0}", string.Join("/ ", segments5));

Console.Read(); // keep the console window open
Result:
As you can see, except for full mode, every mode segments the sentence in the order we would naturally read it.
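Why full mode reads differently from the other modes can be illustrated with a small sketch. Full mode lists every dictionary word found at every position, so overlapping words appear side by side. Exact mode instead picks a single non-overlapping path, which jieba describes as the maximum-probability path based on word frequency. The code below is only an approximation of that idea under stated assumptions: the mini-dictionary, its frequencies, and all names are hypothetical, and it uses a different, shorter sample sentence.

```csharp
using System;
using System.Collections.Generic;

public static class ModeSketch
{
    // Hypothetical mini-dictionary: word -> frequency (invented numbers).
    static readonly Dictionary<string, int> Dict = new Dictionary<string, int>
    {
        { "我", 50 }, { "来到", 20 }, { "北京", 30 }, { "清华", 10 },
        { "华大", 2 }, { "大学", 25 }, { "清华大学", 8 }
    };
    const int MaxWordLength = 4;

    // Full mode: report every dictionary word starting at every position.
    public static List<string> CutAll(string text)
    {
        var words = new List<string>();
        for (int i = 0; i < text.Length; i++)
        {
            for (int len = 1; len <= MaxWordLength && i + len <= text.Length; len++)
            {
                string w = text.Substring(i, len);
                if (Dict.ContainsKey(w)) words.Add(w);
            }
        }
        return words;
    }

    // Exact mode: dynamic programming from right to left over log-probabilities,
    // keeping the single best non-overlapping segmentation.
    public static List<string> CutExact(string text)
    {
        double total = 0;
        foreach (int f in Dict.Values) total += f;

        int n = text.Length;
        var best = new double[n + 1];  // best[i]: best log-probability for text from i to the end
        var step = new int[n + 1];     // step[i]: length of the first word on that best path
        for (int i = n - 1; i >= 0; i--)
        {
            best[i] = double.NegativeInfinity;
            for (int len = 1; len <= MaxWordLength && i + len <= n; len++)
            {
                int freq;
                bool known = Dict.TryGetValue(text.Substring(i, len), out freq);
                if (!known && len > 1) continue;  // only single characters may be unknown
                if (!known) freq = 1;             // fallback frequency for unknown characters
                double score = Math.Log(freq / total) + best[i + len];
                if (score > best[i]) { best[i] = score; step[i] = len; }
            }
        }

        var words = new List<string>();
        for (int i = 0; i < n; i += step[i]) words.Add(text.Substring(i, step[i]));
        return words;
    }

    public static void Main()
    {
        string text = "我来到北京清华大学";
        Console.WriteLine(string.Join("/ ", CutAll(text)));    // 我/ 来到/ 北京/ 清华/ 清华大学/ 华大/ 大学
        Console.WriteLine(string.Join("/ ", CutExact(text)));  // 我/ 来到/ 北京/ 清华大学
    }
}
```

In the full-mode output, 清华, 清华大学, 华大 and 大学 all cover overlapping characters, which is why the result cannot be read straight through as a sentence, while the exact-mode result can.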
Chinese word segmentation search tool for ASP.NET: jieba.NET