04_Java Lucene Learning - Analyzers: Lucene 4.0, Learning to Write a Simple Chinese Synonym Analyzer

Source: Internet
Author: User

Rant:

1. This week I have been dog-tired and neglected my studies. I am still copying other people's code, and I can't say why that feels awkward...


Description

1. Data flow of analysis: Reader -> Tokenizer -> multiple TokenFilters -> TokenStream

2. Chinese synonym handling needs the mmseg4j jar. It mainly relies on that project's tokenizer (the MMSegTokenizer class), which I use as-is.

I then write a custom TokenFilter, and finally a custom Chinese analyzer, MySynonymAnalyzer.

3. The implementation needs a dictionary on disk, such as the directory D:\test\dictory used in the code, which holds the Chinese word library. The MMSegTokenizer object is constructed from this word library.

Dictionary files can be found on the Internet.
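The chain in point 1 can be pictured as a series of wrappers, each one pulling tokens from the stage before it. The following is a minimal plain-Java sketch of that idea only; `PipelineSketch`, `tokenize`, `lowerCase` and `minLength` are hypothetical names, and none of this is Lucene's or mmseg4j's API. A whitespace splitter stands in for the real Chinese tokenizer.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for Reader -> Tokenizer -> TokenFilters -> TokenStream.
class PipelineSketch {

    // "Tokenizer" stage: reads everything from the Reader and splits on whitespace.
    static Iterator<String> tokenize(Reader reader) {
        StringBuilder text = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String trimmed = text.toString().trim();
        if (trimmed.isEmpty()) {
            return Collections.<String>emptyList().iterator();
        }
        return Arrays.asList(trimmed.split("\\s+")).iterator();
    }

    // First "filter" stage: wraps the upstream iterator and lower-cases each token lazily.
    static Iterator<String> lowerCase(final Iterator<String> input) {
        return new Iterator<String>() {
            @Override public boolean hasNext() { return input.hasNext(); }
            @Override public String next() { return input.next().toLowerCase(Locale.ROOT); }
            @Override public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    // Second "filter" stage: keeps only tokens with at least min characters.
    static Iterator<String> minLength(Iterator<String> input, int min) {
        List<String> kept = new ArrayList<String>();
        while (input.hasNext()) {
            String token = input.next();
            if (token.length() >= min) {
                kept.add(token);
            }
        }
        return kept.iterator();
    }

    // Assembles the chain and drains the final stream into a list.
    static List<String> analyze(String text) {
        Iterator<String> stream = minLength(lowerCase(tokenize(new StringReader(text))), 2);
        List<String> out = new ArrayList<String>();
        while (stream.hasNext()) {
            out.add(stream.next());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("I Love Lucene")); // prints [love, lucene]
    }
}
```

In the real code, MMSegTokenizer plays the tokenizer role and the custom synonym filter is one link in the filter chain.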



Code:

1. Custom synonym filter

package synonym;

import java.io.IOException;
import java.util.Queue;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.util.AttributeSource;

/**
 * Synonym filter
 *
 * @version 2014-8-25 PM 01:56:08
 */
public class MySynonymFilter extends TokenFilter {

	private CharTermAttribute cta;

	private PositionIncrementAttribute pia;

	private AttributeSource.State current;

	// Holds the synonyms waiting to be emitted
	private Queue<String> synonymQueue;

	// Custom interface for looking up synonyms
	private SynonymContext ynonymContext;

	protected MySynonymFilter(TokenStream input, SynonymContext ynonymContext) {
		super(input);
		cta = this.addAttribute(CharTermAttribute.class);
		pia = this.addAttribute(PositionIncrementAttribute.class);
		synonymQueue = new LinkedBlockingQueue<String>();
		this.ynonymContext = ynonymContext;
	}

	@Override
	public boolean incrementToken() throws IOException {
		// Emit pending synonyms before reading the next token
		if (synonymQueue.size() > 0) {
			String sysnonymStr = synonymQueue.poll();
			// Restore the saved state first
			restoreState(current);
			// Replace the term text with the synonym
			cta.setEmpty();
			cta.append(sysnonymStr);
			// Place the synonym at the same position as the original term
			pia.setPositionIncrement(0);
			return true;
		}
		if (!this.input.incrementToken()) {
			return false;
		} else {
			if (addSynonym(cta.toString())) {
				// If synonyms exist, save the current state
				current = captureState();
			}
			return true;
		}
	}

	// Queues the synonyms of source; returns true if any were found
	private boolean addSynonym(String source) {
		String[] allStrs = ynonymContext.getSameWords(source);
		if (allStrs != null && allStrs.length > 0) {
			for (String string : allStrs) {
				synonymQueue.add(string);
			}
			return true;
		}
		return false;
	}

}
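The queue logic of the filter can be tried out without Lucene. The following is a minimal sketch, assuming a plain list of tokens and a plain map of synonyms in place of the real TokenStream and SynonymContext; `SynonymInjectionSketch` and `expand` are hypothetical names. Each output entry is written as term/positionIncrement, so an increment of 0 marks a synonym stacked on the position of the term before it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical simulation of the filter's control flow; not Lucene code.
class SynonymInjectionSketch {

    // Returns "term/positionIncrement" entries in the order the filter would emit them.
    static List<String> expand(List<String> tokens, Map<String, String[]> synonyms) {
        List<String> out = new ArrayList<String>();
        Queue<String> pending = new ArrayDeque<String>();
        int next = 0;
        while (true) {
            // Pending synonyms are emitted first, with increment 0.
            if (!pending.isEmpty()) {
                out.add(pending.poll() + "/0");
                continue;
            }
            // No more upstream tokens: the stream is exhausted.
            if (next >= tokens.size()) {
                break;
            }
            // Emit the original term with increment 1 and queue its synonyms.
            String term = tokens.get(next++);
            out.add(term + "/1");
            String[] same = synonyms.get(term);
            if (same != null) {
                pending.addAll(Arrays.asList(same));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String[]> synonyms = new HashMap<String, String[]>();
        synonyms.put("中国", new String[] {"天朝", "大陆"});
        // prints [我/1, 来自/1, 中国/1, 天朝/0, 大陆/0]
        System.out.println(expand(Arrays.asList("我", "来自", "中国"), synonyms));
    }
}
```

Because the synonyms share the position of the original term, a phrase query that uses any of them can match the same spot in the text.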


2. Custom synonym analyzer

package synonym;

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;

import com.chenlb.mmseg4j.Dictionary;
import com.chenlb.mmseg4j.MaxWordSeg;
import com.chenlb.mmseg4j.analysis.MMSegTokenizer;

/**
 * Simple synonym analyzer
 *
 * @version 2014-8-25 AM 10:46:21
 */
public class MySynonymAnalyzer extends Analyzer {

	// Dictionary loaded from the word library directory, e.g. D:\test\dictory
	protected Dictionary dic;

	public MySynonymAnalyzer(String path) {
		dic = Dictionary.getInstance(path);
	}

	@Override
	protected TokenStreamComponents createComponents(String fieldName,
			Reader reader) {
		// Reader -> Tokenizer -> multiple TokenFilters -> TokenStream
		Tokenizer source = new MMSegTokenizer(new MaxWordSeg(dic), reader);
		TokenStream filter = new MySynonymFilter(source, new SimpleSynonymContext());

		return new TokenStreamComponents(source, filter);

	}

}
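The listings use `SynonymContext` and `SimpleSynonymContext` but never show them. The following is a guess at what they might look like, inferred only from how the filter calls `getSameWords`: an interface with one lookup method and an in-memory implementation backed by a map. The sample entries and the `SynonymContextDemo` class are made up for illustration; in the real project these types would sit in the `synonym` package.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo entry point; not part of the original article.
class SynonymContextDemo {
    public static void main(String[] args) {
        SynonymContext context = new SimpleSynonymContext();
        System.out.println(Arrays.toString(context.getSameWords("中国"))); // prints [天朝, 大陆]
        System.out.println(Arrays.toString(context.getSameWords("lucene"))); // prints null
    }
}

// Assumed shape of the lookup interface, inferred from the call in MySynonymFilter.
interface SynonymContext {
    // Returns the synonyms of name, or null when none are known.
    String[] getSameWords(String name);
}

// Assumed in-memory implementation with a few made-up sample entries.
class SimpleSynonymContext implements SynonymContext {

    private final Map<String, String[]> maps = new HashMap<String, String[]>();

    SimpleSynonymContext() {
        maps.put("中国", new String[] {"天朝", "大陆"});
        maps.put("我", new String[] {"咱", "俺"});
    }

    @Override
    public String[] getSameWords(String name) {
        return maps.get(name);
    }
}
```

Keeping the lookup behind an interface means the map could later be swapped for a file or database without touching the filter.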



