Lucene in Practice (II): The Lucene Index



Lucene is a toolkit that provides search; it does not fetch content. Acquiring the content is left entirely to your own application or to third-party tools. Solr, a subproject of Apache Lucene, can import raw data from a relational database. Once you have the raw text data, Lucene is responsible for building the index on it.


Create an index

1. Field.Store.YES (NO): storage options

Field.Store.YES means the full content of the field is stored in the index files, so the original text can be restored.

Field.Store.NO means the content of the field is not stored in the index files. The field can still be indexed, but its original content cannot be fully restored.

2. Field.Index: index options


Field.Index.ANALYZED: tokenize and index the field. Suitable for titles, body content, and so on.

Field.Index.NOT_ANALYZED: index the field without tokenizing it. Suitable for values that need exact-match search, such as ID card numbers and names.

Field.Index.ANALYZED_NO_NORMS: tokenize the field but do not store norms. Norms hold information such as index-time weights (boosts) and field length normalization.

Field.Index.NOT_ANALYZED_NO_NORMS: neither tokenize the field nor store norms.

Field.Index.NO: do not index the field.
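
The difference between the analyzed and not-analyzed options can be illustrated with a small, self-contained sketch. This is a toy model that uses only the Java standard library; the class and method names (FieldIndexSketch, analyzed, notAnalyzed) are made up for illustration and are not Lucene APIs. Unlike StandardAnalyzer, this toy tokenizer does not remove stop words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Toy model only: mimics the effect of ANALYZED vs NOT_ANALYZED,
// it is not how Lucene's analyzers are implemented.
final class FieldIndexSketch {

    // ANALYZED: lowercase the value and split it into one term per word.
    static List<String> analyzed(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // NOT_ANALYZED: the whole value becomes a single term, unchanged.
    static List<String> notAnalyzed(String value) {
        return Arrays.asList(value);
    }

    public static void main(String[] args) {
        System.out.println(analyzed("Lucene in Action")); // [lucene, in, action]
        System.out.println(notAnalyzed("ID-0001"));       // [ID-0001]
    }
}
```

With a not-analyzed field, only a query for the complete value matches, which is why it suits identifiers.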


    /**
     * Create the index - cfl, May 26, 2015
     */
    public void createIndex() {
        // Writer object used to write to the index files
        IndexWriter writer = null;
        try {
            // Load the writer configuration
            writer = new IndexWriter(directory,
                    new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35)));
            Document doc = null;
            // Iterate over the arrays and write the field information
            for (int i = 0; i < ids.length; i++) {
                doc = new Document();
                doc.add(new Field("id", ids[i], Field.Store.YES, Field.Index.NOT_ANALYZED));
                doc.add(new Field("name", names[i], Field.Store.YES, Field.Index.NOT_ANALYZED));
                doc.add(new Field("mail", mails[i], Field.Store.YES, Field.Index.ANALYZED_NO_NORMS));
                doc.add(new Field("content", contents[i], Field.Store.NO, Field.Index.ANALYZED_NO_NORMS));
                // Extract the mail domain (the part after "@")
                String strMail = mails[i].substring(mails[i].lastIndexOf("@") + 1);
                System.out.println(strMail);
                // Set the weight before the document is added, otherwise it is not indexed
                if (scoreMap.containsKey(strMail)) {
                    doc.setBoost(scoreMap.get(strMail));
                } else {
                    doc.setBoost(0.5f);
                }
                writer.addDocument(doc);
            }
            System.out.println("Index creation succeeded!");
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (LockObtainFailedException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            closeWriter(writer);
        }
    }


When configuring IndexWriterConfig, note that the version number must be specified explicitly. This reflects the mature design approach of the Lucene developers: users are given full freedom to choose which version's behavior they want.


Creating an index is similar to inserting records into a relational database: creating a Document corresponds to creating a record (row), and creating a Field corresponds to creating a column of that record.



Query

1. Basic query

    /**
     * Query index statistics - cfl, May 26, 2015
     */
    public void queryIndex() {
        // 1. Create the IndexReader
        IndexReader reader = null;
        try {
            // The document counts can be obtained efficiently through the reader
            reader = IndexReader.open(directory);
            System.out.println("maxDocs:" + reader.maxDoc());
            System.out.println("numDocs:" + reader.numDocs());
            System.out.println("deleteDocs:" + reader.numDeletedDocs());
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            closeReader(reader);
        }
    }


Use IndexReader to get maxDoc (an upper bound on document numbers, which still counts documents marked as deleted), numDocs (the number of live documents), and numDeletedDocs (the number of deleted documents).



2. Query by keyword (Term)


    /**
     * Exact query against the index by keyword - cfl, May 27, 2015
     */
    public void queryIndexByTerm() {
        IndexReader reader = null;
        try {
            reader = IndexReader.open(directory);
            IndexSearcher searcher = new IndexSearcher(reader);
            TermQuery query = new TermQuery(new Term("content", "bootst"));
            TopDocs topDocs = searcher.search(query, 10);
            for (ScoreDoc td : topDocs.scoreDocs) {
                System.out.println(td.toString());
                System.out.println(searcher.doc(td.doc).get("name"));
            }
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            closeReader(reader);
        }
    }

This query returns the documents whose content field contains the keyword bootst.


    IndexSearcher searcher = new IndexSearcher(reader);
    TermQuery query = new TermQuery(new Term("content", "bootst"));
    TopDocs topDocs = searcher.search(query, 10);

This process is similar to querying a relational database:


1. Build an IndexSearcher from the reader

2. Build the query from a Term (field name plus keyword)

3. Fetch the top 10 matching documents
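
To make the three steps concrete, here is a minimal sketch of an exact term lookup against a toy inverted index. It uses only the Java standard library; TermQuerySketch, addDocument, and search are hypothetical names chosen for illustration, and the data structure is far simpler than what Lucene actually maintains (no scoring, no fields, whitespace tokenization only).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy inverted index (hypothetical, not Lucene's data structure).
final class TermQuerySketch {
    // term -> ids of the documents whose content contains that term
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // stored "name" value of each document, addressed by document id
    private final List<String> names = new ArrayList<>();

    // Index one document: store its name and map each content word to its id.
    int addDocument(String name, String content) {
        int docId = names.size();
        names.add(name);
        for (String term : content.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            List<Integer> docs = postings.computeIfAbsent(term, k -> new ArrayList<>());
            // Record each document at most once per term.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
        return docId;
    }

    // Exact term lookup: returns the stored names of at most n matching documents.
    List<String> search(String term, int n) {
        List<String> hits = new ArrayList<>();
        for (int docId : postings.getOrDefault(term, new ArrayList<>())) {
            if (hits.size() == n) {
                break;
            }
            hits.add(names.get(docId));
        }
        return hits;
    }

    public static void main(String[] args) {
        TermQuerySketch index = new TermQuerySketch();
        index.addDocument("alice", "hello lucene index");
        index.addDocument("bob", "hello world");
        System.out.println(index.search("hello", 10));
        System.out.println(index.search("hel", 10));
    }
}
```

The lookup is exact: a partial keyword such as "hel" matches nothing, which mirrors how a term query only matches terms exactly as they were indexed.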



Delete

1. Delete specified documents


    /**
     * Delete the specified documents - cfl, May 26, 2015
     */
    public void deleteIndex() {
        IndexWriter writer = null;
        try {
            writer = new IndexWriter(directory,
                    new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35)));
            // The Term matches the field value exactly
            writer.deleteDocuments(new Term("id", "01"));
            System.out.println("The specified documents have been deleted!");
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (LockObtainFailedException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            closeWriter(writer);
        }
    }

Use a Query or a Term to delete the documents that match.


2. Delete all

    /**
     * Delete all documents - cfl, May 26, 2015
     */
    public void deleteAllIndex() {
        IndexWriter writer = null;
        try {
            writer = new IndexWriter(directory,
                    new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35)));
            writer.deleteAll();
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (LockObtainFailedException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            closeWriter(writer);
        }
    }

To delete everything, call deleteAll() on the writer object directly. Note that deletion in Lucene resembles deletion in Windows: deleted documents are first marked as deleted, as if moved to the Recycle Bin, rather than being physically removed right away.


3. Force delete (empty the Recycle Bin)

    /**
     * Force delete - cfl, May 27, 2015
     */
    public void forceDelete() {
        IndexWriter writer = null;
        try {
            writer = new IndexWriter(directory,
                    new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35)));
            writer.forceMergeDeletes();
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (LockObtainFailedException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            closeWriter(writer);
        }
    }

Forcing the removal costs some machine performance and is not recommended. Lucene merges segments automatically as needed while the index grows. Documents removed this way cannot be restored.



Recovery

Restoring deleted documents from the Recycle Bin

    /**
     * Restore deleted documents - cfl, May 26, 2015
     */
    public void undeleteIndex() {
        IndexReader reader = null;
        try {
            // Open the specified directory with readOnly set to false
            reader = IndexReader.open(directory, false);
            reader.undeleteAll();
            System.out.println("All deleted documents have been restored!");
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (LockObtainFailedException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            closeReader(reader);
        }
    }


Simply call the reader's undeleteAll() method.
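
The Recycle Bin behavior of delete, force delete, and restore can be summarized with a small model. The sketch below is a toy written against the Java standard library only; SoftDeleteSketch and its methods are hypothetical names, and the comments only point at the Lucene call each method is loosely analogous to. It also shows how the three counters reported by the reader relate to each other in this model.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model of mark-then-purge deletion (hypothetical, not Lucene's implementation).
final class SoftDeleteSketch {
    private List<String> docs = new ArrayList<>();
    private BitSet deleted = new BitSet();

    void add(String id) {
        docs.add(id);
    }

    // Loosely analogous to deleteDocuments(term): only marks matching documents.
    void delete(String id) {
        for (int i = 0; i < docs.size(); i++) {
            if (docs.get(i).equals(id)) {
                deleted.set(i);
            }
        }
    }

    // Loosely analogous to undeleteAll(): clears every deletion mark.
    void undeleteAll() {
        deleted.clear();
    }

    // Loosely analogous to forceMergeDeletes(): rewrites storage without marked documents.
    void purge() {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) {
                live.add(docs.get(i));
            }
        }
        docs = live;
        deleted = new BitSet();
    }

    int maxDoc() {
        return docs.size();
    }

    int numDeletedDocs() {
        return deleted.cardinality();
    }

    int numDocs() {
        return maxDoc() - numDeletedDocs();
    }

    public static void main(String[] args) {
        SoftDeleteSketch index = new SoftDeleteSketch();
        index.add("01");
        index.add("02");
        index.delete("01");
        System.out.println(index.maxDoc() + " " + index.numDocs() + " " + index.numDeletedDocs());
        index.purge();       // after this, "01" is gone for good
        index.undeleteAll(); // nothing left to restore
        System.out.println(index.maxDoc() + " " + index.numDocs() + " " + index.numDeletedDocs());
    }
}
```

In this model a marked document still occupies a slot until the purge, which is why restoring works before a forced delete and not after it.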



Set Weights

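Setting a weight (boost) raises a document's ranking in search results, much like SEO does for Baidu. In Lucene 3.x the boost is recorded in the norms, so it does not affect fields indexed with one of the NO_NORMS options.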


    // Add a weight to the document, based on its mail domain
    if (scoreMap.containsKey(strMail)) {
        doc.setBoost(scoreMap.get(strMail));
    } else {
        doc.setBoost(0.5f);
    }
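
As a rough illustration of why a weight changes the ranking, the sketch below multiplies a raw relevance score by a per-document boost and sorts by the result. This is a deliberately simplified assumption: Lucene's real scoring formula has more factors, and BoostSketch, Hit, and rank are hypothetical names, not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Toy ranking model: final score = raw score * boost (a simplification of real scoring).
final class BoostSketch {

    static final class Hit {
        final String name;
        final float rawScore;
        final float boost;

        Hit(String name, float rawScore, float boost) {
            this.name = name;
            this.rawScore = rawScore;
            this.boost = boost;
        }

        float score() {
            return rawScore * boost;
        }
    }

    // Returns the document names ordered by boosted score, highest first.
    static List<String> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Float.compare(b.score(), a.score()));
        List<String> names = new ArrayList<>();
        for (Hit hit : sorted) {
            names.add(hit.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("a", 1.0f, 0.5f)); // default weight used in the article
        hits.add(new Hit("b", 0.8f, 2.0f)); // hypothetical higher weight
        System.out.println(rank(hits));
    }
}
```

Here document "b" has the lower raw score but overtakes "a" because of its higher weight.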



