Lucene in Action (2): The Lucene Index

Last Update:2015-05-27 Source: Internet

Author: User


Lucene is a search library; it does not fetch content. Acquiring the content is left entirely to your own application or to third-party tools. Solr, a subproject under Apache Lucene, can pull raw data from a relational database. Once you have the raw text, Lucene is responsible for building the index over it.

Create an index

1. Field.Store.YES (NO): storage options

Field.Store.YES means the field's content is stored in full in the index files, so the original text can be restored.

Field.Store.NO means the field's content is not stored in the index files. The field can still be indexed, but its content cannot be fully restored.

2. Field.Index: index options

Field.Index.ANALYZED: tokenize and index. Use it for titles, content, and similar text.

Field.Index.NOT_ANALYZED: index without tokenizing. Use it for values such as ID card numbers, names, and IDs; it is suited to exact-match search.

Field.Index.ANALYZED_NO_NORMS: tokenize, but do not store norms. Norms hold index-time information such as boosts (weights) and length normalization.

Field.Index.NOT_ANALYZED_NO_NORMS: neither tokenize nor store norms.

Field.Index.NO: do not index.
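The difference between analyzed and not-analyzed fields can be sketched without Lucene. The toy class below is an assumption-laden illustration, not Lucene's implementation: `FieldTermsSketch`, `analyzedTerms` and `notAnalyzedTerms` are hypothetical names, and the tokenizer simply lowercases and splits on non-alphanumeric characters, whereas a real analyzer such as StandardAnalyzer does more (stop-word removal, for example).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy illustration only, not Lucene code.
class FieldTermsSketch {

    // Rough stand-in for an ANALYZED field: lowercase the value and
    // split it into one term per run of letters or digits.
    static List<String> analyzedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Rough stand-in for a NOT_ANALYZED field: the whole value is kept
    // unchanged as a single term, so only an exact match can find it.
    static List<String> notAnalyzedTerms(String value) {
        List<String> terms = new ArrayList<>();
        terms.add(value);
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyzedTerms("Welcome to Lucene indexing"));
        System.out.println(notAnalyzedTerms("ID-0042"));
    }
}
```

Running an identifier such as "ID-0042" through the analyzed path would split it into two terms, which is why identifiers are usually indexed without analysis.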

    /**
     * Create the index - cfl, 26 May 2015
     * Assumes directory, ids, names, mails, contents, scoreMap and
     * closeWriter() are defined elsewhere in the class.
     */
    public void createIndex() {
        // Writer object used to write to the index files
        IndexWriter writer = null;
        try {
            // Load the writer configuration
            writer = new IndexWriter(directory,
                    new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35)));
            Document doc = null;
            // Traverse the arrays and write the field information
            for (int i = 0; i < ids.length; i++) {
                doc = new Document();
                doc.add(new Field("id", ids[i], Field.Store.YES, Field.Index.NOT_ANALYZED));
                doc.add(new Field("name", names[i], Field.Store.YES, Field.Index.NOT_ANALYZED));
                doc.add(new Field("mail", mails[i], Field.Store.YES, Field.Index.ANALYZED_NO_NORMS));
                doc.add(new Field("content", contents[i], Field.Store.NO, Field.Index.ANALYZED_NO_NORMS));
                // Extract the mail domain (the part after "@")
                String strMail = mails[i].substring(mails[i].lastIndexOf("@") + 1);
                System.out.println(strMail);
                // Set the boost before adding the document so that it is indexed
                if (scoreMap.containsKey(strMail)) {
                    doc.setBoost(scoreMap.get(strMail));
                } else {
                    doc.setBoost(0.5f);
                }
                writer.addDocument(doc);
            }
            System.out.println("Index creation succeeded!");
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (LockObtainFailedException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            closeWriter(writer);
        }
    }

When configuring IndexWriterConfig, note that you must specify a version number. This reflects a mature design on the part of the Lucene developers: users are free to choose which version's behavior they want.

Creating an index is similar to creating records in a relational database. Creating a Document corresponds to creating a record (row), and creating a Field corresponds to creating a column.

Query

1. Basic query

    /**
     * Query index statistics - cfl, 26 May 2015
     */
    public void queryIndex() {
        // 1. Create the IndexReader
        IndexReader reader = null;
        try {
            // The document counts can be obtained efficiently through the reader
            reader = IndexReader.open(directory);
            System.out.println("maxDocs: " + reader.maxDoc());
            System.out.println("numDocs: " + reader.numDocs());
            System.out.println("deletedDocs: " + reader.numDeletedDocs());
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            closeReader(reader);
        }
    }

Use IndexReader to get the total number of documents in the index (maxDoc), the number of live documents (numDocs), and the number of deleted documents (numDeletedDocs).

2. Query by Term keyword

    /**
     * Exact index query by keyword - cfl, 27 May 2015
     */
    public void queryIndexByTerm() {
        IndexReader reader = null;
        try {
            reader = IndexReader.open(directory);
            IndexSearcher searcher = new IndexSearcher(reader);
            TermQuery query = new TermQuery(new Term("content", "bootst"));
            // Fetch at most the top 10 hits
            TopDocs topDocs = searcher.search(query, 10);
            for (ScoreDoc td : topDocs.scoreDocs) {
                System.out.println(td.toString());
                System.out.println(searcher.doc(td.doc).get("name"));
            }
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            closeReader(reader);
        }
    }

This queries for documents whose content field contains the term bootst.

    IndexSearcher searcher = new IndexSearcher(reader);
    TermQuery query = new TermQuery(new Term("content", "bootst"));
    TopDocs topDocs = searcher.search(query, 10);

This process is similar to querying a relational database:

1. Build the IndexSearcher object from the reader

2. Configure the Term keyword

3. Retrieve the top 10 results
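The three steps above can be illustrated with a toy inverted index in plain Java. This is only a sketch of the idea behind an exact term lookup, not how Lucene stores or scores anything; `TermLookupSketch` and its methods are hypothetical names, and the "analysis" is just lowercasing and splitting on whitespace.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only, not Lucene code.
class TermLookupSketch {

    // term -> ids of the documents containing it, in insertion order
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // "Analyze" the content (lowercase, split on whitespace) and record
    // each resulting term against the document id.
    void addDocument(int docId, String content) {
        for (String token : content.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            List<Integer> ids = postings.computeIfAbsent(token, k -> new ArrayList<>());
            if (!ids.contains(docId)) {
                ids.add(docId);
            }
        }
    }

    // Exact term lookup: the query term is NOT analyzed, so it must match
    // an indexed term character for character. Returns at most n ids.
    List<Integer> search(String term, int n) {
        List<Integer> ids = postings.getOrDefault(term, new ArrayList<>());
        return new ArrayList<>(ids.subList(0, Math.min(n, ids.size())));
    }
}
```

In this sketch a lookup for "Search" or "boot" finds nothing even when "search" and "bootstrap" are indexed, which mirrors the fact that a TermQuery matches whole indexed terms rather than prefixes or differently cased text.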

Delete

1. Delete specified documents

    /**
     * Delete the specified documents - cfl, 26 May 2015
     */
    public void deleteIndex() {
        IndexWriter writer = null;
        try {
            writer = new IndexWriter(directory,
                    new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35)));
            // The Term matches the exact field value
            writer.deleteDocuments(new Term("id", "01"));
            System.out.println("The specified documents have been deleted!");
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (LockObtainFailedException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            closeWriter(writer);
        }
    }

Use a Term keyword to delete the documents that match the specified field value.

2. Delete all

    /**
     * Delete all documents - cfl, 26 May 2015
     */
    public void deleteAllIndex() {
        IndexWriter writer = null;
        try {
            writer = new IndexWriter(directory,
                    new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35)));
            writer.deleteAll();
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (LockObtainFailedException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            closeWriter(writer);
        }
    }

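To delete everything, call deleteAll() on the writer directly. Note that a Term-based delete works like the Recycle Bin in Windows: documents are marked as deleted rather than physically removed right away. deleteAll(), by contrast, is documented as dropping all segments, so do not count on undeleting documents once it has been committed.

3. Force delete (empty the Recycle Bin)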

    /**
     * Force delete - cfl, 27 May 2015
     */
    public void forceDelete() {
        IndexWriter writer = null;
        try {
            writer = new IndexWriter(directory,
                    new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35)));
            // Physically remove documents that are marked as deleted
            writer.forceMergeDeletes();
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (LockObtainFailedException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            closeWriter(writer);
        }
    }

Forcing the merge costs some performance and is not recommended. Lucene merges segments automatically as the index grows. Documents removed this way cannot be restored.

Recovery

Recovering deleted documents from the Recycle Bin

    /**
     * Restore deleted documents - cfl, 26 May 2015
     */
    public void undeleteIndex() {
        IndexReader reader = null;
        try {
            // Open the directory with readOnly set to false
            reader = IndexReader.open(directory, false);
            reader.undeleteAll();
            System.out.println("All deleted documents have been recovered!");
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (LockObtainFailedException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            closeReader(reader);
        }
    }

Simply call the reader's undeleteAll() method.
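The delete, force-delete and recovery behavior described in this section can be modeled with a small plain-Java sketch. It is only a toy under simplifying assumptions, not Lucene's segment machinery: `SoftDeleteSketch` is a hypothetical class whose method names merely echo the Lucene calls used above, and "merging" here just drops the marked entries from two lists.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only, not Lucene code.
class SoftDeleteSketch {

    private final List<String> ids = new ArrayList<>();
    private final List<Boolean> deleted = new ArrayList<>();

    void add(String id) {
        ids.add(id);
        deleted.add(false);
    }

    // Mark matching documents as deleted (the "Recycle Bin" step).
    void delete(String id) {
        for (int i = 0; i < ids.size(); i++) {
            if (ids.get(i).equals(id)) {
                deleted.set(i, true);
            }
        }
    }

    // Clear every deletion mark that has not been purged yet.
    void undeleteAll() {
        for (int i = 0; i < deleted.size(); i++) {
            deleted.set(i, false);
        }
    }

    // Physically drop marked documents; they cannot be restored afterwards.
    void forceMergeDeletes() {
        for (int i = ids.size() - 1; i >= 0; i--) {
            if (deleted.get(i)) {
                ids.remove(i);
                deleted.remove(i);
            }
        }
    }

    // Live plus marked documents.
    int maxDoc() {
        return ids.size();
    }

    int numDeletedDocs() {
        int count = 0;
        for (boolean mark : deleted) {
            if (mark) {
                count++;
            }
        }
        return count;
    }

    // Live documents only.
    int numDocs() {
        return maxDoc() - numDeletedDocs();
    }
}
```

In this model a marked document still counts toward maxDoc() until it is purged, which matches the three counters printed by the basic query earlier.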

Set Weights

Setting a boost (weight) raises a document's ranking in search results, much as SEO does for a site's ranking on Baidu.

    // Add a boost for the specified mail domain
    if (scoreMap.containsKey(strMail)) {
        doc.setBoost(scoreMap.get(strMail));
    } else {
        doc.setBoost(0.5f);
    }
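As a rough illustration of the boost lookup, the sketch below picks a boost from a map keyed by mail domain and applies it as a simple multiplier. The names `BoostSketch`, `mailDomain`, `boostFor` and `boostedScore` are hypothetical, the map contents are made up, and treating the boost as a plain multiplier on a raw score is a simplification of how Lucene folds boosts into its scoring.

```java
import java.util.Map;

// Toy illustration only, not Lucene code.
class BoostSketch {

    // Fallback boost for domains without an entry, as in the snippet above.
    static final float DEFAULT_BOOST = 0.5f;

    // Return the part of the address after the last "@".
    static String mailDomain(String mail) {
        return mail.substring(mail.lastIndexOf('@') + 1);
    }

    // Look up the boost for the address's domain, or use the default.
    static float boostFor(String mail, Map<String, Float> scoreMap) {
        Float boost = scoreMap.get(mailDomain(mail));
        return boost != null ? boost : DEFAULT_BOOST;
    }

    // Simplified model: the boost scales the raw relevance score.
    static float boostedScore(float rawScore, float boost) {
        return rawScore * boost;
    }
}
```

With a map entry of 2.0f for one domain, two documents with the same raw score would rank differently in this model: the boosted one scores four times higher than one that falls back to 0.5f.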


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
