In Lucene.Net, each analyzer (word breaker) is made up of an analyzer class plus a helper class that does most of the actual tokenizing. Analyzer class names usually end in Analyzer, and helper class names usually end in Tokenizer. All analyzers inherit from the abstract Analyzer class, and the helper classes in turn usually inherit from Tokenizer or one of its subclasses, such as CharTokenizer.
First, create two classes, EasyAnalyzer and EasyTokenizer, in the Analysis folder.
EasyTokenizer:

```csharp
using Lucene.Net.Analysis;
using System.IO;

namespace LuceneNetTest
{
    public class EasyTokenizer : CharTokenizer
    {
        // The base class keeps the reader, so no extra field is needed.
        public EasyTokenizer(TextReader reader)
            : base(reader)
        {
        }

        // Called for every input character. Returning false marks the
        // character as a delimiter that ends the current token.
        protected override bool IsTokenChar(char c)
        {
            //return true;      // whole input as one token
            //return c != ',';  // split on commas
            return c != ' ';    // split on spaces
        }
    }
}
```
EasyAnalyzer:

```csharp
using Lucene.Net.Analysis;
using System.IO;

namespace LuceneNetTest
{
    public class EasyAnalyzer : Analyzer
    {
        // Every field is tokenized with EasyTokenizer.
        public override TokenStream TokenStream(string fieldName, TextReader reader)
        {
            return new EasyTokenizer(reader);
        }
    }
}
```
The key to the tokenizer is the IsTokenChar method of the helper class. The core tokenizing logic calls it for each character and splits the text according to its return value: if it returns false, the text is split at that character, and the character itself is not included in any token.
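To make the splitting rule concrete, here is a minimal, Lucene-free sketch of the kind of loop CharTokenizer runs around IsTokenChar. The class name CharTokenizerModel and its Split method are hypothetical, and the model deliberately leaves out details of the real class, such as its maximum token length and the Normalize hook.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical, simplified model of CharTokenizer's splitting behavior.
// Not the Lucene.Net implementation: it ignores the maximum token length
// and the Normalize() hook of the real class.
public static class CharTokenizerModel
{
    public static List<string> Split(string text, Func<char, bool> isTokenChar)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (isTokenChar(c))
            {
                // Token character: extend the current token.
                current.Append(c);
            }
            else if (current.Length > 0)
            {
                // Delimiter: close the current token; the delimiter is dropped.
                tokens.Add(current.ToString());
                current.Clear();
            }
            // A delimiter with an empty buffer (e.g. two spaces in a row) is skipped.
        }
        // Emit the last token if the text did not end with a delimiter.
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }
        return tokens;
    }
}
```

Passing `c => c != ' '` mirrors the space-delimited rule in EasyTokenizer and turns "Hello, I can speak chinese!" into "Hello,", "I", "can", "speak" and "chinese!". Passing `c => c != ','` yields "Hello" and " I can speak chinese!" (with a leading space), and `c => true` returns the whole string as one token.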
The test code is as follows:
Program:

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes;
using System;
using System.IO;

namespace LuceneNetTest
{
    class Program
    {
        static void Main(string[] args)
        {
            const string testWords = "Hello, I can speak chinese!";
            EasyAnalyzer analyzer = new EasyAnalyzer();
            TokenStream ts = analyzer.ReusableTokenStream("", new StringReader(testWords));

            // The stream owns a single ITermAttribute instance that is updated
            // on every IncrementToken() call, so fetch it once up front.
            ITermAttribute termAttr = ts.GetAttribute<ITermAttribute>();
            while (ts.IncrementToken())
            {
                Console.WriteLine(termAttr.Term);
            }
        }
    }
}
```
ITermAttribute: the term text of a token.
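The attribute-based API hands out one ITermAttribute instance per stream and overwrites its contents on every IncrementToken() call, which is why a consumer can fetch the attribute once, before the loop, and read its Term after each call. The following Lucene-free model sketches that reuse pattern. The class names TermAttributeModel and TokenStreamModel are hypothetical; this is an illustration, not Lucene.Net's actual implementation.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for ITermAttribute: holds the text of the current token.
public sealed class TermAttributeModel
{
    public string Term { get; internal set; } = "";
}

// Hypothetical stand-in for a TokenStream over a fixed list of tokens.
// Like Lucene's attribute API, it exposes ONE attribute instance and
// overwrites its contents on every IncrementToken() call.
public sealed class TokenStreamModel
{
    private readonly IEnumerator<string> tokens;
    private readonly TermAttributeModel termAttr = new TermAttributeModel();

    public TokenStreamModel(IEnumerable<string> tokens)
    {
        this.tokens = tokens.GetEnumerator();
    }

    // Always returns the same instance, as GetAttribute does for a real stream.
    public TermAttributeModel GetTermAttribute() => termAttr;

    // Advances to the next token; returns false when the stream is exhausted.
    public bool IncrementToken()
    {
        if (!tokens.MoveNext())
        {
            return false;
        }
        termAttr.Term = tokens.Current;
        return true;
    }
}
```

Because the instance is shared, code that stores the attribute object itself (rather than copying its Term string) would see only the last token once the loop finishes.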