Jieba is a Chinese word segmentation library for Python. Someone has ported it to the .NET platform, and the port can fully replace the Lucene.NET + Pangu segmentation combination.
I'm writing this because in an interview yesterday I was asked how I would implement keyword search for a website. I answered with SQL fuzzy queries, SQL statement optimization, and caching. I had come across keyword segmentation before, but the .NET platform has no mature segmentation and search library. Java has Lucene, and although it has been ported to .NET, the port is updated slowly. When I was learning Python earlier, I noticed Python's word segmentation and word cloud libraries, so I wondered whether a Python segmentation library had been ported to .NET, and that led me to the Python Jieba library.
From the project's own introduction: "Jieba Chinese word segmentation, .NET version: jieba.NET".
The most commonly used Chinese word segmentation component on the .NET platform is Pangu segmentation, but it has not been updated for a long time. The most obvious difference is the built-in dictionary: Jieba's dictionary has about 500,000 entries while Pangu's has about 170,000, which leads to noticeably different segmentation results. In addition, for out-of-vocabulary words (words not in the dictionary), Jieba "adopts an HMM model based on the word-forming ability of Chinese characters, using the Viterbi algorithm", and the results look good.
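To make the quoted HMM + Viterbi idea more concrete, here is a minimal C# sketch. It is not jieba.NET's code: it assumes the common B/M/E/S tag set (B begins a word, M is inside a word, E ends a word, S is a single-character word), and the class name `ToyHmmSegmenter`, the sample sentence 他来到北京 ("he came to Beijing") and every probability below are hypothetical values chosen for illustration, whereas Jieba uses a model trained on a corpus.

```csharp
using System;
using System.Collections.Generic;

// Toy character-based HMM segmenter (hypothetical, for illustration only).
// Each character is tagged B, M, E or S, and the Viterbi algorithm finds
// the most likely tag sequence. All probabilities are made-up values.
public static class ToyHmmSegmenter
{
    static readonly char[] States = { 'B', 'M', 'E', 'S' };

    // Log start probabilities: a sentence can only begin with B or S.
    static readonly Dictionary<char, double> Start = new Dictionary<char, double>
    {
        ['B'] = Math.Log(0.6), ['S'] = Math.Log(0.4),
        ['M'] = double.NegativeInfinity, ['E'] = double.NegativeInfinity,
    };

    // Log transition probabilities; pairs not listed (e.g. B -> B) are impossible.
    static readonly Dictionary<(char, char), double> Trans = new Dictionary<(char, char), double>
    {
        [('B', 'M')] = Math.Log(0.3), [('B', 'E')] = Math.Log(0.7),
        [('M', 'M')] = Math.Log(0.4), [('M', 'E')] = Math.Log(0.6),
        [('E', 'B')] = Math.Log(0.5), [('E', 'S')] = Math.Log(0.5),
        [('S', 'B')] = Math.Log(0.5), [('S', 'S')] = Math.Log(0.5),
    };

    // Hypothetical emission table: how likely a character is in a given position.
    static readonly Dictionary<(char, char), double> Emit = new Dictionary<(char, char), double>
    {
        [('他', 'S')] = Math.Log(0.9),
        [('来', 'B')] = Math.Log(0.8), [('到', 'E')] = Math.Log(0.8),
        [('北', 'B')] = Math.Log(0.8), [('京', 'E')] = Math.Log(0.8),
    };

    static double Transition(char from, char to) =>
        Trans.TryGetValue((from, to), out double p) ? p : double.NegativeInfinity;

    // Unlisted (character, state) pairs get a small smoothing probability.
    static double Emission(char c, char state) =>
        Emit.TryGetValue((c, state), out double p) ? p : Math.Log(0.01);

    public static List<string> Segment(string text)
    {
        int n = text.Length;
        var words = new List<string>();
        if (n == 0) return words;

        // score[t, s]: best log-probability of a tag path ending in state s at position t.
        // back[t, s]: the previous state on that best path.
        var score = new double[n, 4];
        var back = new int[n, 4];
        for (int s = 0; s < 4; s++)
            score[0, s] = Start[States[s]] + Emission(text[0], States[s]);

        for (int t = 1; t < n; t++)
            for (int s = 0; s < 4; s++)
            {
                score[t, s] = double.NegativeInfinity;
                for (int p = 0; p < 4; p++)
                {
                    double cand = score[t - 1, p] + Transition(States[p], States[s]);
                    if (cand > score[t, s]) { score[t, s] = cand; back[t, s] = p; }
                }
                score[t, s] += Emission(text[t], States[s]);
            }

        // A sentence must end with E (index 2) or S (index 3); trace back from the better one.
        var tags = new int[n];
        tags[n - 1] = score[n - 1, 2] >= score[n - 1, 3] ? 2 : 3;
        for (int t = n - 1; t > 0; t--)
            tags[t - 1] = back[t, tags[t]];

        // Close a word after every E or S tag.
        int begin = 0;
        for (int i = 0; i < n; i++)
            if (States[tags[i]] == 'E' || States[tags[i]] == 'S')
            {
                words.Add(text.Substring(begin, i - begin + 1));
                begin = i + 1;
            }
        return words;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.OutputEncoding = System.Text.Encoding.UTF8;
        Console.WriteLine(string.Join("/", ToyHmmSegmenter.Segment("他来到北京")));
    }
}
```

With these made-up numbers the program prints 他/来到/北京: the tag path S B E B E wins because every other path has to use at least one low-probability (0.01) emission.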
In VS2013 you can find and install it directly by searching for it in the NuGet Package Manager.
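In the comments there, someone said Jieba segments the test sentence "工信处女干事每月经过下属科室都要亲口交代24口交换机等技术性器件的安装工作" (roughly: "every month, the female officer of the Industry and Information Technology office personally briefs the subordinate sections on installing 24-port switches and other technical equipment") correctly, so I tested it myself: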
```csharp
using System;
using JiebaNet.Segmenter;

class Program
{
    static void Main()
    {
        // Roughly: "Every month, the female officer of the Industry and Information
        // Technology office personally briefs the subordinate sections on installing
        // 24-port switches and other technical equipment."
        const string text = "工信处女干事每月经过下属科室都要亲口交代24口交换机等技术性器件的安装工作";

        var segmenter = new JiebaSegmenter();
        Console.WriteLine("Input sentence: {0}", text);

        var segments1 = segmenter.Cut(text, cutAll: true); // full mode
        Console.WriteLine("[Full mode]: {0}", string.Join("/", segments1));

        var segments2 = segmenter.Cut(text); // exact mode is the default
        Console.WriteLine("[Exact mode]: {0}", string.Join("/", segments2));

        var segments3 = segmenter.Cut(text); // exact mode by default, and the HMM model is also used
        Console.WriteLine("[New word recognition]: {0}", string.Join("/", segments3));

        var segments4 = segmenter.CutForSearch(text); // search engine mode
        Console.WriteLine("[Search engine mode]: {0}", string.Join("/", segments4));

        var segments5 = segmenter.Cut(text);
        Console.WriteLine("[Ambiguity resolution]: {0}", string.Join("/", segments5));

        Console.Read(); // keep the console window open
    }
}
```
Output:
As you can see, every mode except full mode segments the sentence the way we would naturally read it.
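Why full mode reads differently: the Python jieba documentation describes building a directed acyclic graph (DAG) of every dictionary word that occurs in the sentence, then using dynamic programming to find the maximum-probability path based on word frequency. Below is a simplified C# sketch of that idea, not jieba.NET's actual implementation; the class name `ToyDagSegmenter`, the tiny dictionary, its frequencies and the sample 研究生命 ("study life") are all hypothetical. Full mode here lists every multi-character dictionary word in the DAG, so overlapping words such as 研究 ("research") and 研究生 ("graduate student") both appear, while exact mode keeps a single non-overlapping path.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified dictionary-based segmenter (hypothetical toy data).
// Builds a DAG of dictionary words, then either lists them all ("full mode")
// or picks the max-probability path through the DAG ("exact mode").
public static class ToyDagSegmenter
{
    static readonly Dictionary<string, int> Freq = new Dictionary<string, int>
    {
        ["研"] = 1, ["究"] = 1, ["生"] = 5, ["命"] = 2,
        ["研究"] = 50, ["研究生"] = 10, ["生命"] = 40,
    };
    static readonly double LogTotal = Math.Log(Freq.Values.Sum());

    // dag[i] lists every end index j such that text[i..j] is a dictionary word;
    // a lone character is always allowed so every position has an outgoing edge.
    static List<int>[] BuildDag(string text)
    {
        var dag = new List<int>[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            dag[i] = new List<int> { i };
            for (int j = i + 1; j < text.Length; j++)
                if (Freq.ContainsKey(text.Substring(i, j - i + 1)))
                    dag[i].Add(j);
        }
        return dag;
    }

    // Full mode: every multi-character word in the DAG, plus any character
    // that no multi-character word covers.
    public static List<string> CutAll(string text)
    {
        var dag = BuildDag(text);
        var result = new List<string>();
        int coveredUpTo = -1; // last index covered by an emitted multi-character word
        for (int i = 0; i < text.Length; i++)
        {
            var longer = dag[i].Where(j => j > i).ToList();
            if (longer.Count == 0)
            {
                if (i > coveredUpTo) result.Add(text[i].ToString());
                continue;
            }
            foreach (int j in longer)
            {
                result.Add(text.Substring(i, j - i + 1));
                coveredUpTo = Math.Max(coveredUpTo, j);
            }
        }
        return result;
    }

    // Exact mode: dynamic programming from right to left, choosing at each
    // position the word that maximises log P(word) + best score of the rest.
    public static List<string> Cut(string text)
    {
        int n = text.Length;
        var dag = BuildDag(text);
        var best = new double[n + 1]; // best[n] = 0 for the empty suffix
        var next = new int[n];        // end index of the chosen word at each start
        for (int i = n - 1; i >= 0; i--)
        {
            best[i] = double.NegativeInfinity;
            foreach (int j in dag[i])
            {
                string w = text.Substring(i, j - i + 1);
                // Characters missing from the dictionary get frequency 1 (simple smoothing).
                double logP = Math.Log(Freq.TryGetValue(w, out int f) ? f : 1) - LogTotal;
                double score = logP + best[j + 1];
                if (score > best[i]) { best[i] = score; next[i] = j; }
            }
        }
        var result = new List<string>();
        for (int i = 0; i < n; i = next[i] + 1)
            result.Add(text.Substring(i, next[i] - i + 1));
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.OutputEncoding = System.Text.Encoding.UTF8;
        const string text = "研究生命";
        Console.WriteLine("[Full mode]: " + string.Join("/", ToyDagSegmenter.CutAll(text)));
        Console.WriteLine("[Exact mode]: " + string.Join("/", ToyDagSegmenter.Cut(text)));
    }
}
```

With this toy dictionary the program prints `[Full mode]: 研究/研究生/生命` and `[Exact mode]: 研究/生命`. If 生命 is removed from the dictionary, exact mode switches to 研究生/命, a small illustration of how dictionary coverage changes the segmentation.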