Neo4j pitfalls: Chinese full-text indexing in Neo4j

Source: Internet
Author: User
Tags: reset, neo4j

The Neo4j version used here is the current latest, 3.1.0, which comes with a number of pitfalls. This article explains how to use a Chinese full-text index in Neo4j 3.1.0, with IKAnalyzer chosen as the tokenizer.


1. First, refer to this article:

https://segmentfault.com/a/1190000005665612

It roughly describes how to build an index with IKAnalyzer, but not very clearly. In fact, the article assumes embedded Neo4j, i.e. Neo4j must be embedded in your Java application (https://neo4j.com/docs/java-reference/current/#tutorials-java-embedded); keep this in mind, because otherwise you cannot use a custom Analyzer. Secondly, the method described there no longer works: Neo4j 3.1.0 ships with Lucene 5.5, so the official IKAnalyzer cannot be used as-is.
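For reference, "embedded" means the full database engine runs inside your own JVM process rather than as a separate server. A minimal sketch of pulling it in, assuming a Maven build (these are the standard coordinates for the Neo4j 3.1.0 embedded artifact):

```xml
<!-- Embedded Neo4j: the whole database engine runs inside your JVM. -->
<dependency>
    <groupId>org.neo4j</groupId>
    <artifactId>neo4j</artifactId>
    <version>3.1.0</version>
</dependency>
```

With this on the classpath, the database is opened via `new GraphDatabaseFactory().newEmbeddedDatabase(...)` as shown in the linked tutorial, and a custom Analyzer class becomes loadable by the index provider.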


2. Correction

Switch to IKAnalyzer2012FF_u1.jar, which can be downloaded from Google Code (https://code.google.com/archive/p/ik-analyzer/downloads). This version of IKAnalyzer was patched by a community member to fix IKAnalyzer's incompatibility with newer Lucene versions. But using this package still fails, with the error message:

Caused by: java.lang.AbstractMethodError: org.apache.lucene.analysis.Analyzer.createComponents(Ljava/lang/String;)Lorg/apache/lucene/analysis/Analyzer$TokenStreamComponents;

That is, the IKAnalyzer Analyzer class still does not match the current Lucene version.

Solution: add two new classes.

package com.uc.wa.function;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;

public class IKAnalyzer5x extends Analyzer {

	private boolean useSmart;

	public boolean useSmart() {
		return useSmart;
	}

	public void setUseSmart(boolean useSmart) {
		this.useSmart = useSmart;
	}

	public IKAnalyzer5x() {
		this(false);
	}

	public IKAnalyzer5x(boolean useSmart) {
		super();
		this.useSmart = useSmart;
	}

	/**
	// The pre-5.x signature, kept for reference:
	protected TokenStreamComponents createComponents(String fieldName, final Reader in) {
		Tokenizer ikTokenizer = new IKTokenizer(in, this.useSmart());
		return new TokenStreamComponents(ikTokenizer);
	}
	**/

	/**
	 * Override the latest createComponents signature: in Lucene 5.x
	 * the Reader parameter was removed from the Analyzer API.
	 * Builds the token stream with the patched tokenizer below.
	 */
	@Override
	protected TokenStreamComponents createComponents(String fieldName) {
		Tokenizer ikTokenizer = new IKTokenizer5x(this.useSmart());
		return new TokenStreamComponents(ikTokenizer);
	}
}

package com.uc.wa.function;

import java.io.IOException;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
import org.wltea.analyzer.core.IKSegmenter;
import org.wltea.analyzer.core.Lexeme;

public class IKTokenizer5x extends Tokenizer {

	// the IK segmenter implementation
	private IKSegmenter _IKImplement;

	// term text attribute
	private final CharTermAttribute termAtt;
	// term offset attribute
	private final OffsetAttribute offsetAtt;
	// term type attribute (see the type constants in org.wltea.analyzer.core.Lexeme)
	private final TypeAttribute typeAtt;

	// end position of the most recent lexeme
	private int endPosition;

	// Note: the pre-5.x constructor took a Reader parameter
	// (public IKTokenizer(Reader in, boolean useSmart)); in Lucene 5.x
	// the Reader was removed from the Tokenizer API.

	/**
	 * Lucene 5.x Tokenizer constructor.
	 *
	 * @param useSmart whether to use IK's smart segmentation mode
	 */
	public IKTokenizer5x(boolean useSmart) {
		super();
		offsetAtt = addAttribute(OffsetAttribute.class);
		termAtt = addAttribute(CharTermAttribute.class);
		typeAtt = addAttribute(TypeAttribute.class);
		_IKImplement = new IKSegmenter(input, useSmart);
	}

	@Override
	public boolean incrementToken() throws IOException {
		// clear all attributes left over from the previous term
		clearAttributes();
		Lexeme nextLexeme = _IKImplement.next();
		if (nextLexeme != null) {
			// copy the Lexeme into the token attributes: term text
			termAtt.append(nextLexeme.getLexemeText());
			termAtt.setLength(nextLexeme.getLength());
			// term offsets
			offsetAtt.setOffset(nextLexeme.getBeginPosition(), nextLexeme.getEndPosition());
			// record the end position of this lexeme
			endPosition = nextLexeme.getEndPosition();
			// term type
			typeAtt.setType(nextLexeme.getLexemeTypeString());
			// return true to signal that another term may follow
			return true;
		}
		// return false to signal the end of the stream
		return false;
	}

	@Override
	public void reset() throws IOException {
		super.reset();
		_IKImplement.reset(input);
	}

	@Override
	public final void end() {
		// set the final offset
		int finalOffset = correctOffset(this.endPosition);
		offsetAtt.setOffset(finalOffset, finalOffset);
	}
}

This resolves the incompatibility between IKAnalyzer2012FF_u1.jar and Lucene 5. When using it, replace IKAnalyzer with IKAnalyzer5x.


3. Finally

Example of creating and searching a Neo4j Chinese full-text index:

	/** Create a full-text index for a single node. */
	public static void createFullTextIndex(long id, List<String> propKeys) {
		log.info("Method[createFullTextIndex] begin. propKeys<" + propKeys + ">");

		Index<Node> entityIndex = null;
		try (Transaction tx = Neo4j.graphDb.beginTx()) {
			entityIndex = Neo4j.graphDb.index().forNodes("NodeFullTextIndex",
					MapUtil.stringMap(IndexManager.PROVIDER, "lucene",
							"analyzer", IKAnalyzer5x.class.getName()));
			Node node = Neo4j.graphDb.getNodeById(id);
			log.info("Method[createFullTextIndex] get node id<" + node.getId()
					+ "> name<" + node.getProperty("knowledge_name") + ">");
			/** Get the node's properties */
			Set<Map.Entry<String, Object>> properties =
					node.getProperties(propKeys.toArray(new String[0])).entrySet();
			for (Map.Entry<String, Object> property : properties) {
				log.info("Method[createFullTextIndex] index prop<"
						+ property.getKey() + ":" + property.getValue() + ">");
				entityIndex.add(node, property.getKey(), property.getValue());
			}
			tx.success();
		}
	}

	/** Query using the full-text index. */
	public static List<Map<String, Object>> selectByFullTextIndex(String[] fields, String query)
			throws IOException {
		List<Map<String, Object>> ret = Lists.newArrayList();
		try (Transaction tx = Neo4j.graphDb.beginTx()) {
			IndexManager index = Neo4j.graphDb.index();
			/** Query */
			Index<Node> addressNodeFullTextIndex = index.forNodes("NodeFullTextIndex",
					MapUtil.stringMap(IndexManager.PROVIDER, "lucene",
							"analyzer", IKAnalyzer5x.class.getName()));

			Query q = IKQueryParser.parseMultiField(fields, query);

			IndexHits<Node> foundNodes = addressNodeFullTextIndex.query(q);
			for (Node n : foundNodes) {
				Map<String, Object> m = n.getAllProperties();
				if (!Float.isNaN(foundNodes.currentScore())) {
					m.put("score", foundNodes.currentScore());
				}
				log.info("Method[selectByIndex] score<" + foundNodes.currentScore() + ">");
				ret.add(m);
			}
			tx.success();
		} catch (IOException e) {
			log.error("Method[selectByIndex] fields<" + Joiner.on(",").join(fields)
					+ "> query<" + query + ">", e);
			throw e;
		}
		return ret;
	}

Notice that I used IKQueryParser here, which automatically constructs a Query from the search terms and the fields to be queried. This bypasses another pit: writing a Lucene query string directly is problematic. For example, the query "address:Nanchang" would match every address containing any single character of "Nanchang", which is clearly unreasonable. Using IKQueryParser fixes the problem. IKQueryParser is a utility that ships with IKAnalyzer but was removed from IKAnalyzer2012FF_u1.jar, so I re-introduced the original IKAnalyzer jar; the project ended up with the two jars coexisting.
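To see why the raw query string misbehaves, here is a standalone sketch (plain Java, no Lucene dependency; the class and method names are made up for illustration) of the single-character fallback tokenization that a CJK-unaware analyzer applies: every Chinese character becomes its own token, so a two-character city name like "南昌" (Nanchang) degenerates into an OR over its characters.

```java
import java.util.ArrayList;
import java.util.List;

public class CjkTokenDemo {
    /**
     * Mimics per-character CJK tokenization: without a Chinese-aware
     * analyzer like IKAnalyzer, each character is emitted as its own
     * token, so the query "南昌" effectively becomes "南 OR 昌" and
     * matches any address containing either character.
     */
    static List<String> perCharTokens(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
            .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    public static void main(String[] args) {
        // "南昌" splits into two single-character tokens
        System.out.println(perCharTokens("南昌"));
    }
}
```

A proper Chinese word segmenter instead keeps "南昌" as one token, which is why IKQueryParser produces sensible queries.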


And that covers most of the pits.


