04_Java Lucene Learning -- Analyzers: Lucene 4.0, Learning to Write a Simple Chinese Synonym Analyzer


Last Update:2018-07-27 Source: Internet

Author: User


Rant:

1. This week I was dog-tired from work and forgot to study. I am still copying other people's code, and I don't know why it feels so awkward ...

Description

1. Data flow of tokenization: Reader -> Tokenizer -> multiple TokenFilters -> TokenStream

2. Chinese synonym handling needs the mmseg4j jar. It mainly relies on someone else's tokenizer (the MMSegTokenizer class), which I use as-is.

On top of that I write a custom TokenFilter (MySynonymFilter) and finally a custom Chinese analyzer, MySynonymAnalyzer.

3. The implementation needs a dictionary directory, such as D:\test\dictory in the code, which holds the Chinese word lists. Constructing the MMSegTokenizer object requires reading this dictionary information.

Dictionary files can be found on the Internet.
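The chain in item 1 can be pictured without Lucene at all. The following is a minimal stand-alone sketch, not Lucene code: TokenSource, WhitespaceSource and LowerCaseStage are hypothetical names invented for illustration. It only shows the wrapping idea, where a tokenizer reads characters from a Reader and each filter wraps the stage before it, so the outermost object is the stream the caller drains.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PipelineSketch {

    /** Hypothetical stand-in for a token stream: returns null when exhausted. */
    interface TokenSource {
        String next() throws IOException;
    }

    /** Plays the tokenizer role: splits the reader's characters on whitespace. */
    static class WhitespaceSource implements TokenSource {
        private final Reader reader;

        WhitespaceSource(Reader reader) {
            this.reader = reader;
        }

        @Override
        public String next() throws IOException {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (sb.length() > 0) {
                        break; // end of the current token
                    }
                } else {
                    sb.append((char) c);
                }
            }
            return sb.length() > 0 ? sb.toString() : null;
        }
    }

    /** Plays the filter role: wraps another source and rewrites each token. */
    static class LowerCaseStage implements TokenSource {
        private final TokenSource input;

        LowerCaseStage(TokenSource input) {
            this.input = input;
        }

        @Override
        public String next() throws IOException {
            String token = input.next();
            return token == null ? null : token.toLowerCase(Locale.ROOT);
        }
    }

    /** Builds reader -> tokenizer -> filter and drains the resulting stream. */
    static List<String> analyze(String text) {
        TokenSource stream = new LowerCaseStage(new WhitespaceSource(new StringReader(text)));
        List<String> out = new ArrayList<String>();
        try {
            for (String t = stream.next(); t != null; t = stream.next()) {
                out.add(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Lucene  Synonym Filter")); // prints [lucene, synonym, filter]
    }
}
```

In Lucene the same roles are played by Tokenizer and TokenFilter, both of which are TokenStream subclasses, which is why filters can be stacked in any number.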

Code:

1. Custom synonym TokenFilter

package synonym;

import java.io.IOException;
import java.util.Queue;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.util.AttributeSource;

/**
 * Synonym filter
 *
 * @version 2014-8-25 PM 01:56:08
 */
public class MySynonymFilter extends TokenFilter {

	private CharTermAttribute cta;
	private PositionIncrementAttribute pia;
	private AttributeSource.State current;

	// Synonyms waiting to be emitted for the most recent token
	private Queue<String> synonymQueue;

	// Custom interface for looking up synonyms
	private SynonymContext ynonymContext;

	protected MySynonymFilter(TokenStream input, SynonymContext ynonymContext) {
		super(input);
		cta = this.addAttribute(CharTermAttribute.class);
		pia = this.addAttribute(PositionIncrementAttribute.class);
		synonymQueue = new LinkedBlockingQueue<String>();
		this.ynonymContext = ynonymContext;
	}

	// Declared final because Lucene asserts that incrementToken() (or the class) is final
	@Override
	public final boolean incrementToken() throws IOException {
		// Emit pending synonyms before advancing the input
		if (synonymQueue.size() > 0) {
			String sysnonymStr = synonymQueue.poll();
			// Restore the state captured for the original token first
			restoreState(current);
			// Replace the term text with the synonym
			cta.setEmpty();
			cta.append(sysnonymStr);
			// Put the synonym at the same position as the original token
			pia.setPositionIncrement(0);
			return true;
		}
		if (!this.input.incrementToken()) {
			return false;
		} else {
			if (addSynonym(cta.toString())) {
				// If synonyms exist, save the current state
				current = captureState();
			}
		}
		return true;
	}

	// Queues the synonyms of source; returns true if there were any
	private boolean addSynonym(String source) {
		String[] allStrs = ynonymContext.getSameWords(source);
		if (allStrs != null && allStrs.length > 0) {
			for (String string : allStrs) {
				synonymQueue.add(string);
			}
			return true;
		}
		return false;
	}

}
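The effect of the queue and of the position increment of 0 can be traced without Lucene. The sketch below is a simplified model of the filter's loop, not the filter itself: SynonymPositionSketch and expand are hypothetical names, ASCII placeholder words stand in for Chinese terms, and positions are counted from 0. Each original token advances the position by 1, while its queued synonyms are emitted at the same position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class SynonymPositionSketch {

    /** Expands tokens the way the filter does and returns "term@position" strings. */
    static List<String> expand(List<String> tokens, Map<String, String[]> synonyms) {
        List<String> out = new ArrayList<String>();
        Queue<String> pending = new LinkedList<String>();
        int position = -1; // the first original token lands on position 0
        int i = 0;
        while (true) {
            if (!pending.isEmpty()) {
                // Queued synonym: position increment 0, so the position is unchanged.
                out.add(pending.poll() + "@" + position);
                continue;
            }
            if (i >= tokens.size()) {
                break; // upstream exhausted and nothing queued
            }
            String term = tokens.get(i++);
            position += 1; // an original token advances the position by 1
            out.add(term + "@" + position);
            String[] same = synonyms.get(term);
            if (same != null) {
                pending.addAll(Arrays.asList(same));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String[]> synonyms = new HashMap<String, String[]>();
        synonyms.put("china", new String[] {"prc", "mainland"});
        // prints [i@0, love@1, china@2, prc@2, mainland@2]
        System.out.println(expand(Arrays.asList("i", "love", "china"), synonyms));
    }
}
```

Because the synonyms share the position of the original word, a query for any of them can match at that spot, including inside a phrase query.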

2. Custom synonym analyzer

package synonym;

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;

import com.chenlb.mmseg4j.Dictionary;
import com.chenlb.mmseg4j.MaxWordSeg;
import com.chenlb.mmseg4j.analysis.MMSegTokenizer;

/**
 * Simple synonym analyzer
 *
 * @version 2014-8-25 AM 10:46:21
 */
public class MySynonymAnalyzer extends Analyzer {

	// Dictionary loaded from the word-list directory, e.g. D:\test\dictory
	protected Dictionary dic;

	public MySynonymAnalyzer(String path) {
		dic = Dictionary.getInstance(path);
	}

	@Override
	protected TokenStreamComponents createComponents(String fieldName,
			Reader reader) {
		// Reader -> Tokenizer -> multiple TokenFilters -> TokenStream
		Tokenizer source = new MMSegTokenizer(new MaxWordSeg(dic), reader);
		TokenStream filter = new MySynonymFilter(source, new SimpleSynonymContext());

		return new TokenStreamComponents(source, filter);

	}

}
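The listings use SynonymContext and SimpleSynonymContext, but the article does not show them. What follows is only a guess at their shape, based on the single call getSameWords(String) that returns a String[] and is checked for null. The map-based storage and the word lists are assumptions for illustration, with ASCII placeholder words standing in for the Chinese terms that mmseg4j would produce. In the article's project each type would presumably be public, in its own file, in package synonym.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Assumed shape of the lookup interface used by the filter. */
interface SynonymContext {
    /** Returns the synonyms of the given word, or null when there are none. */
    String[] getSameWords(String word);
}

/** Assumed in-memory implementation; the word lists are illustrative only. */
class SimpleSynonymContext implements SynonymContext {
    private final Map<String, String[]> map = new HashMap<String, String[]>();

    SimpleSynonymContext() {
        map.put("china", new String[] {"prc", "mainland"});
        map.put("i", new String[] {"me"});
    }

    @Override
    public String[] getSameWords(String word) {
        return map.get(word); // null for words without an entry
    }
}

public class SynonymContextDemo {
    public static void main(String[] args) {
        SynonymContext ctx = new SimpleSynonymContext();
        System.out.println(Arrays.toString(ctx.getSameWords("china"))); // prints [prc, mainland]
        System.out.println(ctx.getSameWords("lucene") == null);         // prints true
    }
}
```

Keeping the lookup behind an interface means the hard-coded map could later be swapped for a file-backed or database-backed source without touching the filter.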

This article is an English version of an article originally published in Chinese on aliyun.com.
