Common skills (being updated): http://www.cnblogs.com/dunitian/p/4822808.html#skill
Skill master list (being updated): http://www.cnblogs.com/dunitian/p/5493793.html
Online demo: http://cppjieba-webdemo.herokuapp.com
Full demo: https://github.com/dunitian/tempcode/tree/master/2016-09-05
First, two things to watch out for: jieba does not de-duplicate the words it returns, so we have to do that ourselves; and the dictionary files have to be configured manually or set to be copied to the bin output directory.
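Because the segmenter can return the same word more than once, de-duplication is left to the caller. Below is a minimal sketch of one way to do it without depending on jieba.NET at all. The names `DedupDemo` and `JoinDistinct` are hypothetical and only for illustration. The optional comparer is an assumption added here so that case-insensitive matching is possible for mixed Chinese/English text.

```csharp
using System;
using System.Collections.Generic;

public static class DedupDemo
{
    // Hypothetical helper (not part of jieba.NET): drops blank tokens and
    // duplicates, keeps the first occurrence of each word in its original
    // position, then joins the remaining words with the separator.
    public static string JoinDistinct(
        IEnumerable<string> words,
        string separator = ",",
        IEqualityComparer<string> comparer = null)
    {
        if (words == null)
        {
            return string.Empty;
        }

        // HashSet.Add returns false when the word has already been seen.
        var seen = new HashSet<string>(comparer ?? StringComparer.Ordinal);
        var unique = new List<string>();
        foreach (var word in words)
        {
            if (string.IsNullOrWhiteSpace(word))
            {
                continue;
            }
            if (seen.Add(word))
            {
                unique.Add(word);
            }
        }
        return string.Join(separator, unique);
    }
}
```

Calling `DedupDemo.JoinDistinct(tokens)` on the segmenter's output gives a comma-separated string with each word listed once. Passing `StringComparer.OrdinalIgnoreCase` as the comparer would also treat "Bootstrap" and "bootstrap" as the same word.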
Example application scenarios (everyone already knows about search, so here are some others):
——————————————————————————————————————————————————
On to the main topic. First, a set of unofficial benchmark figures (these refer to the official version, not the .NET port):
The .NET ports of IKAnalyzer and Pangu segmentation have not been updated for years, so I chose jieba. (The name "jieba", which means "stutter", suits the idea of word segmentation rather well: isn't stuttering itself a way of splitting up a sentence?)
Here's a quick demo:
1. Install the package first:
2. Configure the dictionaries:
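As a hedged illustration of the dictionary step: to my understanding, the .NET Framework builds of jieba.NET look for their dictionary files in a `Resources` folder next to the executable by default, and the folder can be overridden with a `JiebaConfigFileDir` entry in `appSettings`. Treat both the key name and the default as assumptions to check against the README of the version you install. The path below is a placeholder.

```xml
<!-- app.config / web.config fragment; the path is a placeholder -->
<configuration>
  <appSettings>
    <!-- Assumed key read by jieba.NET to locate dict.txt, idf.txt, etc. -->
    <add key="JiebaConfigFileDir" value="C:\jiebanet\config" />
  </appSettings>
</configuration>
```

The alternative mentioned at the top of this post is to keep the dictionary files in the project and set them to be copied to the output (bin) directory, in which case no configuration entry should be needed.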
3. A simple wrapper helper class:
using System.Linq;
using JiebaNet.Segmenter;
using System.Collections.Generic;

namespace LoTLib.Word.Split
{
    #region Segmentation types
    public enum JiebaTypeEnum
    {
        /// <summary>
        /// Precise mode: the most basic and natural mode; cuts the sentence as precisely as possible, suitable for text analysis
        /// </summary>
        Default,
        /// <summary>
        /// Full mode: scans out every word that can be formed; fast, but cannot resolve ambiguity
        /// </summary>
        CutAll,
        /// <summary>
        /// Search engine mode: re-cuts long words on top of precise mode to improve recall; suitable for search engines
        /// </summary>
        CutForSearch,
        /// <summary>
        /// Precise mode without HMM
        /// </summary>
        Other
    }
    #endregion

    /// <summary>
    /// jieba segmentation helper
    /// </summary>
    public static partial class WordSplitHelper
    {
        /// <summary>
        /// Gets the collection of words produced by segmenting the string
        /// </summary>
        /// <param name="objStr"></param>
        /// <param name="type"></param>
        /// <returns></returns>
        public static IEnumerable<string> GetSplitWords(string objStr, JiebaTypeEnum type = JiebaTypeEnum.Default)
        {
            var jieba = new JiebaSegmenter();
            switch (type)
            {
                case JiebaTypeEnum.Default:
                    return jieba.Cut(objStr);                // precise mode, with HMM
                case JiebaTypeEnum.CutAll:
                    return jieba.Cut(objStr, cutAll: true);  // full mode
                case JiebaTypeEnum.CutForSearch:
                    return jieba.CutForSearch(objStr);       // search engine mode
                default:
                    return jieba.Cut(objStr, false, false);  // precise mode, without HMM
            }
        }

        /// <summary>
        /// Gets the segmented words as a single string
        /// </summary>
        /// <param name="objStr"></param>
        /// <param name="type"></param>
        /// <returns></returns>
        public static string GetSplitWordStr(this string objStr, JiebaTypeEnum type = JiebaTypeEnum.Default)
        {
            var words = GetSplitWords(objStr, type);
            // No result: return an empty string
            if (words == null || words.Count() < 1)
            {
                return string.Empty;
            }
            words = words.Distinct();       // the words are sometimes duplicated, so de-duplicate them ourselves
            return string.Join(",", words); // adjust the return format to your own needs
        }
    }
}
The call is simple:
string str = "Bootstrap-datetimepicker further follow ~ ~ ~ Start time and end time style display";

Console.WriteLine("\nPrecise mode, with HMM:\n");
Console.WriteLine(str.GetSplitWordStr());

Console.WriteLine("\nFull mode:\n");
Console.WriteLine(str.GetSplitWordStr(JiebaTypeEnum.CutAll));

Console.WriteLine("\nSearch engine mode:\n");
Console.WriteLine(str.GetSplitWordStr(JiebaTypeEnum.CutForSearch));

Console.WriteLine("\nPrecise mode, without HMM:\n");
Console.WriteLine(str.GetSplitWordStr(JiebaTypeEnum.Other));

Console.ReadKey();
Result:
--------------------------
Someone may ask: what about keyword extraction? Don't worry, read on:
The dictionary used for keyword extraction is idf.txt.
A quick word on the Constants class:
Result:
Full helper class (see GitHub for the latest version): https://github.com/dunitian/TempCode/tree/master/2016-09-05
using System.Linq;
using JiebaNet.Segmenter;
using System.Collections.Generic;
using JiebaNet.Analyser;

namespace LoTLib.Word.Split
{
    #region Segmentation types
    public enum JiebaTypeEnum
    {
        /// <summary>
        /// Precise mode: the most basic and natural mode; cuts the sentence as precisely as possible, suitable for text analysis
        /// </summary>
        Default,
        /// <summary>
        /// Full mode: scans out every word that can be formed; fast, but cannot resolve ambiguity
        /// </summary>
        CutAll,
        /// <summary>
        /// Search engine mode: re-cuts long words on top of precise mode to improve recall; suitable for search engines
        /// </summary>
        CutForSearch,
        /// <summary>
        /// Precise mode without HMM
        /// </summary>
        Other
    }
    #endregion

    /// <summary>
    /// jieba segmentation helper
    /// </summary>
    public static partial class WordSplitHelper
    {
        #region Common

        /// <summary>
        /// Gets the collection of words produced by segmenting the string
        /// </summary>
        /// <param name="objStr"></param>
        /// <param name="type"></param>
        /// <returns></returns>
        public static IEnumerable<string> GetSplitWords(string objStr, JiebaTypeEnum type = JiebaTypeEnum.Default)
        {
            var jieba = new JiebaSegmenter();
            switch (type)
            {
                case JiebaTypeEnum.Default:
                    return jieba.Cut(objStr);                // precise mode, with HMM
                case JiebaTypeEnum.CutAll:
                    return jieba.Cut(objStr, cutAll: true);  // full mode
                case JiebaTypeEnum.CutForSearch:
                    return jieba.CutForSearch(objStr);       // search engine mode
                default:
                    return jieba.Cut(objStr, false, false);  // precise mode, without HMM
            }
        }

        /// <summary>
        /// Extracts the collection of keywords from an article
        /// </summary>
        /// <param name="objStr"></param>
        /// <returns></returns>
        public static IEnumerable<string> GetArticleKeywords(string objStr)
        {
            var idf = new TfidfExtractor();
            return idf.ExtractTags(objStr, 10, Constants.NounAndVerbPos); // top 10 keywords, nouns and verbs only
        }

        /// <summary>
        /// Returns the words joined into a single string
        /// </summary>
        /// <param name="words"></param>
        /// <returns></returns>
        public static string JoinKeyWords(IEnumerable<string> words)
        {
            // No result: return an empty string
            if (words == null || words.Count() < 1)
            {
                return string.Empty;
            }
            words = words.Distinct();       // the words are sometimes duplicated, so de-duplicate them ourselves
            return string.Join(",", words); // adjust the return format to your own needs
        }

        #endregion

        #region Extension methods

        /// <summary>
        /// Gets the segmented words as a single string
        /// </summary>
        /// <param name="objStr"></param>
        /// <param name="type"></param>
        /// <returns></returns>
        public static string GetSplitWordStr(this string objStr, JiebaTypeEnum type = JiebaTypeEnum.Default)
        {
            var words = GetSplitWords(objStr, type);
            return JoinKeyWords(words);
        }

        /// <summary>
        /// Extracts the article keywords as a single string
        /// </summary>
        /// <param name="objStr"></param>
        /// <returns></returns>
        public static string GetArticleKeywordStr(this string objStr)
        {
            var words = GetArticleKeywords(objStr);
            return JoinKeyWords(words);
        }

        #endregion
    }
}
Links related to jieba Chinese word segmentation:
https://github.com/fxsjy/jieba
https://github.com/anderscui/jieba.NET
http://cppjieba-webdemo.herokuapp.com
Chinese word segmentation with jieba: usage scenarios + demo