Lucene Tutorial (iv) Update and deletion of indexes

Source: Internet
Author: User

This article builds on the previous one and reuses its IndexUtil class. The examples below show only the relevant methods, not the whole class.

Version 3.5:

First, write a check() method to observe how the index changes:

    /**
     * Check the state of the index.
     */
    public static void check() {
        IndexReader indexReader = null;
        try {
            Directory directory = FSDirectory.open(new File("F:/test/lucene/index"));
            indexReader = IndexReader.open(directory);
            // The reader reports the document counts.
            // Valid (live) documents
            System.out.println("Valid index documents: " + indexReader.numDocs());
            // Total documents
            System.out.println("Total index documents: " + indexReader.maxDoc());
            // "Deleted" is not quite accurate: these documents are in the recycle bin
            System.out.println("Deleted index documents: " + indexReader.numDeletedDocs());
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (indexReader != null) {
                    indexReader.close();
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

Next, run the indexing method from the previous article, then run check() to see the result:

    Valid index documents: 3
    Total index documents: 3
    Deleted index documents: 0

Now delete a document from the index, as in the following example:

    /**
     * Delete documents from the index.
     */
    public static void delete() {
        IndexWriter indexWriter = null;
        try {
            Directory directory = FSDirectory.open(new File("F:/test/lucene/index"));
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
            IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_35, analyzer);
            indexWriter = new IndexWriter(directory, indexWriterConfig);
            /**
             * The parameter can be a Query or a Term; a Term is an exact value to look up.
             *
             * A deleted document is not removed completely. It is kept in a
             * recycle bin and can be restored.
             */

            // Method one: delete by Term.
            // The first constructor argument is the field, the second is the field's value.
            indexWriter.deleteDocuments(new Term("id", "1"));

            // Method two: delete by Query (commented out for the first run).
            // Build a query and delete every document it finds.
            // QueryParser queryParser = new QueryParser(Version.LUCENE_35, "content", analyzer);
            // This query matches documents whose content field contains "Lucene".
            // Query query = queryParser.parse("Lucene");
            // indexWriter.deleteDocuments(query);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (indexWriter != null) {
                    indexWriter.close();
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
Here is the test:

    @Test
    public void testDelete() {
        IndexUtil.delete();
        IndexUtil.check();
    }

After execution:

    Valid index documents: 2
    Total index documents: 3
    Deleted index documents: 1

The deleted document has moved to the recycle bin; it has not been removed completely. That run deleted by Term. To delete by Query instead, swap the comments so that the Query code is active:

            // indexWriter.deleteDocuments(new Term("id", "1"));

            // Method two: delete by Query.
            // Build a query and delete every document it finds.
            QueryParser queryParser = new QueryParser(Version.LUCENE_35, "content", analyzer);
            // This query matches documents whose content field contains "Lucene".
            Query query = queryParser.parse("Lucene");
            indexWriter.deleteDocuments(query);

Run the test method again:

    Valid index documents: 1
    Total index documents: 3
    Deleted index documents: 2

One more document is now in the recycle bin, because the document matched by the query is not the document with id 1. That covers both ways of deleting.
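
The counters from the two delete runs can be reproduced with a small stand-alone model. This is a sketch, not Lucene code: `ToyIndex` and its methods are hypothetical names, and the three sample documents are invented, since the article does not show the indexed data. It assumes only what the text describes: a delete marks a document instead of removing it, a Term matches an exact field value, and a query on content matches a word inside the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy stand-in for an index with a recycle bin; NOT the Lucene API. */
class ToyIndex {
    private final List<String[]> docs = new ArrayList<>();   // each entry: {id, content}
    private final List<Boolean> deleted = new ArrayList<>(); // true = in the recycle bin

    void add(String id, String content) {
        docs.add(new String[] {id, content});
        deleted.add(false);
    }

    /** Term-style delete: the id must equal the value exactly. */
    void deleteById(String id) {
        for (int i = 0; i < docs.size(); i++) {
            if (docs.get(i)[0].equals(id)) {
                deleted.set(i, true);
            }
        }
    }

    /** Query-style delete: the content must contain the word as a token (case-insensitive). */
    void deleteByContentWord(String word) {
        for (int i = 0; i < docs.size(); i++) {
            List<String> tokens = Arrays.asList(docs.get(i)[1].toLowerCase().split("\\W+"));
            if (tokens.contains(word.toLowerCase())) {
                deleted.set(i, true);
            }
        }
    }

    int maxDoc() { return docs.size(); }

    int numDeletedDocs() {
        int count = 0;
        for (boolean flag : deleted) {
            if (flag) {
                count++;
            }
        }
        return count;
    }

    int numDocs() { return maxDoc() - numDeletedDocs(); }

    /** Counters formatted as valid/total/deleted. */
    String summary() { return numDocs() + "/" + maxDoc() + "/" + numDeletedDocs(); }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("1", "hello world");      // sample data invented for this sketch
        index.add("2", "I like Lucene");
        index.add("3", "goodbye");
        System.out.println(index.summary()); // 3/3/0
        index.deleteById("1");
        System.out.println(index.summary()); // 2/3/1
        index.deleteByContentWord("lucene");
        System.out.println(index.summary()); // 1/3/2
    }
}
```

In this model the total never shrinks on delete, which is why the total stays at 3 in both runs while the valid count drops.
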
Suppose the deletion was a mistake and the documents need to come back. Here is how to restore deleted documents:

    /**
     * Restore deleted documents.
     */
    public static void undelete() {
        // Restoring is done through an IndexReader
        IndexReader indexReader = null;
        try {
            Directory directory = FSDirectory.open(new File("F:/test/lucene/index"));
            // To restore, the reader's readOnly flag must be false.
            // true is fine when the index does not change, but restoring deleted
            // documents clearly changes it, so it has to be false here.
            indexReader = IndexReader.open(directory, false);
            indexReader.undeleteAll();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (indexReader != null) {
                    indexReader.close();
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
Run the test:

    @Test
    public void testUndelete() {
        IndexUtil.undelete();
        IndexUtil.check();
    }

The result is:

    Valid index documents: 3
    Total index documents: 3
    Deleted index documents: 0

Everything has been restored.

Now suppose the deletion was correct after all and the documents should be removed for good. Run the delete method again with both deletion methods uncommented, which gives this result once more:

    Valid index documents: 1
    Total index documents: 3
    Deleted index documents: 2

Here is the code that removes them permanently:

    /**
     * Force delete: permanently remove deleted documents.
     */
    public static void forceDelete() {
        IndexWriter indexWriter = null;
        try {
            Directory directory = FSDirectory.open(new File("F:/test/lucene/index"));
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
            IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_35, analyzer);
            indexWriter = new IndexWriter(directory, indexWriterConfig);
            indexWriter.forceMergeDeletes();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (indexWriter != null) {
                    indexWriter.close();
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
Execute the test code:

    @Test
    public void testForceDelete() {
        IndexUtil.forceDelete();
        IndexUtil.check();
    }

The results are as follows:

    Valid index documents: 1
    Total index documents: 1
    Deleted index documents: 0

The two deleted documents are now gone for good. So much for deletion. Lucene can also update the index; the next example shows how:
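
The difference between restoring and permanently removing can be sketched without Lucene. The model below is an illustration only: `RecycleBinDemo` and its methods are hypothetical names, and it assumes deletions are kept as per-document marks, which is how the recycle bin described in this article behaves. It does not show how Lucene stores anything on disk.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy model of restore versus permanent removal; NOT the Lucene API. */
class RecycleBinDemo {
    private List<String> ids = new ArrayList<>();
    private BitSet deleted = new BitSet(); // bit i set = document i is in the recycle bin

    void add(String id) { ids.add(id); }

    void markDeleted(int docNumber) {
        if (docNumber < 0 || docNumber >= ids.size()) {
            throw new IndexOutOfBoundsException("no document " + docNumber);
        }
        deleted.set(docNumber);
    }

    /** Restore: clear every mark. This works because marked documents are still present. */
    void undeleteAll() { deleted.clear(); }

    /** Permanent removal: keep only unmarked documents, then start with an empty bitmap. */
    void purgeDeleted() {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            if (!deleted.get(i)) {
                live.add(ids.get(i));
            }
        }
        ids = live;
        deleted = new BitSet();
    }

    int maxDoc() { return ids.size(); }
    int numDeletedDocs() { return deleted.cardinality(); }
    int numDocs() { return maxDoc() - numDeletedDocs(); }

    /** Counters formatted as valid/total/deleted. */
    String summary() { return numDocs() + "/" + maxDoc() + "/" + numDeletedDocs(); }

    public static void main(String[] args) {
        RecycleBinDemo index = new RecycleBinDemo();
        index.add("1");
        index.add("2");
        index.add("3");
        index.markDeleted(0);
        index.markDeleted(1);
        System.out.println(index.summary()); // 1/3/2
        index.undeleteAll();
        System.out.println(index.summary()); // 3/3/0
        index.markDeleted(0);
        index.markDeleted(1);
        index.purgeDeleted();
        System.out.println(index.summary()); // 1/1/0
        index.undeleteAll();                 // nothing left to restore
        System.out.println(index.summary()); // 1/1/0
    }
}
```

The last two lines of the demo show the practical consequence: once the marked documents have been purged, a restore has nothing to bring back.
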

Note: before running this example, delete the index files and rebuild the index.

    /**
     * Update the index.
     */
    public static void update() {
        IndexWriter indexWriter = null;
        try {
            Directory directory = FSDirectory.open(new File("F:/test/lucene/index"));
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
            IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_35, analyzer);
            indexWriter = new IndexWriter(directory, indexWriterConfig);
            /**
             * Lucene has no in-place update. The update operation here is really
             * two operations combined: delete first, then add.
             */
            Document document = new Document();
            document.add(new Field("id", "11", Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
            document.add(new Field("author", authors[0], Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.add(new Field("title", titles[0], Field.Store.YES, Field.Index.ANALYZED));
            document.add(new Field("content", contents[1], Field.Store.NO, Field.Index.ANALYZED));
            indexWriter.updateDocument(new Term("id", "1"), document);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (indexWriter != null) {
                    indexWriter.close();
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
Note that the code above uses the content of the document with id 2, which contains "Lucene". It is used for a test further down, so watch the results.

Now run the update:

    @Test
    public void testUpdate() {
        IndexUtil.update();
        IndexUtil.check();
    }

The result is:

    Valid index documents: 3
    Total index documents: 4
    Deleted index documents: 1

This may look surprising, so work through the numbers. One valid document was deleted and one was added, so the valid count is still 3. The total is 3 plus 1, because the deleted document still counts: it has not been removed completely and sits in the recycle bin. Now run the search() method to see the results:

    /**
     * Search.
     */
    public static void search() {
        IndexReader indexReader = null;
        try {
            // 1. Create the Directory
            Directory directory = FSDirectory.open(new File("F:/test/lucene/index"));
            // 2. Create the IndexReader
            indexReader = IndexReader.open(directory);
            // 3. Create the IndexSearcher from the IndexReader
            IndexSearcher indexSearcher = new IndexSearcher(indexReader);
            // 4. Create the search Query
            // Use the default standard analyzer
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
            // Search the content field for "Lucene".
            // Create the parser; the second argument is the field to search
            QueryParser queryParser = new QueryParser(Version.LUCENE_35, "content", analyzer);
            // This query matches documents whose content field contains "Lucene"
            Query query = queryParser.parse("Lucene");
            // 5. Search with the searcher and get the TopDocs
            TopDocs topDocs = indexSearcher.search(query, 10);
            // 6. Get the ScoreDoc objects from the TopDocs
            ScoreDoc[] scoreDocs = topDocs.scoreDocs;
            for (ScoreDoc scoreDoc : scoreDocs) {
                // 7. Get the Document from the searcher and the ScoreDoc
                Document document = indexSearcher.doc(scoreDoc.doc);
                // 8. Read the required values from the Document
                System.out.println("id:" + document.get("id"));
                System.out.println("author:" + document.get("author"));
                System.out.println("title:" + document.get("title"));
                /**
                 * See whether the content can be printed, and why.
                 */
                System.out.println("content:" + document.get("content"));
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (indexReader != null) {
                    indexReader.close();
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    @Test
    public void testSearch() {
        IndexUtil.search();
    }

The output is:

    id:2
    author:tony
    title:hello Lucene
    content:null
    id:11
    author:darren
    title:hello World
    content:null

Two documents were found, which shows that the update succeeded.
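
The search output prints content:null even though both documents were found through their content field. The update code adds content with Field.Store.NO, so the text is indexed for searching but its original value is not kept for retrieval. The sketch below models that split in plain Java; `ToyDocument` and its methods are hypothetical names and the sample text is invented, so treat it as an illustration rather than Lucene's implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of stored versus indexed fields; NOT the Lucene API. */
class ToyDocument {
    private final Map<String, String> stored = new HashMap<>();
    private final Map<String, Set<String>> indexedTokens = new HashMap<>();

    /** Index the value so it is searchable; keep the original text only if store is true. */
    void addField(String name, String value, boolean store) {
        String[] tokens = value.toLowerCase().split("\\W+");
        indexedTokens.put(name, new HashSet<>(Arrays.asList(tokens)));
        if (store) {
            stored.put(name, value);
        }
    }

    /** True when the field's indexed tokens contain the word (case-insensitive). */
    boolean matches(String field, String word) {
        Set<String> tokens = indexedTokens.get(field);
        return tokens != null && tokens.contains(word.toLowerCase());
    }

    /** Returns the stored text, or null when the field was indexed without being stored. */
    String get(String field) { return stored.get(field); }

    public static void main(String[] args) {
        ToyDocument doc = new ToyDocument();
        doc.addField("title", "hello World", true);      // stored and indexed
        doc.addField("content", "I like Lucene", false); // indexed only; invented sample text
        System.out.println(doc.matches("content", "Lucene")); // true
        System.out.println("title:" + doc.get("title"));      // title:hello World
        System.out.println("content:" + doc.get("content"));  // content:null
    }
}
```

Storing a field costs index space, so leaving a large body field unstored is a common trade-off when only the id and title need to be shown in results.
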

Next, update the document with id 3 as well:

            Document document = new Document();
            document.add(new Field("id", "11", Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
            document.add(new Field("author", authors[0], Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.add(new Field("title", titles[0], Field.Store.YES, Field.Index.ANALYZED));
            document.add(new Field("content", contents[1], Field.Store.NO, Field.Index.ANALYZED));
            indexWriter.updateDocument(new Term("id", "3"), document);

Run the update() method and look at the result:

    Valid index documents: 3
    Total index documents: 5
    Deleted index documents: 2
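
The counters after the two update runs follow from simple arithmetic: each update marks the matching document as deleted and adds one new document. The sketch below only does that bookkeeping; `UpdateCounters` and `afterUpdate` are hypothetical names, not part of Lucene.

```java
/** Toy arithmetic for the counters reported by check(); NOT the Lucene API. */
class UpdateCounters {
    final int numDocs; // valid documents
    final int maxDoc;  // total documents, including those in the recycle bin

    UpdateCounters(int numDocs, int maxDoc) {
        this.numDocs = numDocs;
        this.maxDoc = maxDoc;
    }

    int numDeletedDocs() { return maxDoc - numDocs; }

    /**
     * One update: mark the matching valid documents as deleted, then add one document.
     * matched is how many valid documents the update term hits.
     */
    UpdateCounters afterUpdate(int matched) {
        if (matched < 0 || matched > numDocs) {
            throw new IllegalArgumentException("matched out of range: " + matched);
        }
        return new UpdateCounters(numDocs - matched + 1, maxDoc + 1);
    }

    /** Counters formatted as valid/total/deleted. */
    String summary() { return numDocs + "/" + maxDoc + "/" + numDeletedDocs(); }

    public static void main(String[] args) {
        UpdateCounters start = new UpdateCounters(3, 3);
        UpdateCounters first = start.afterUpdate(1);  // update of id 1
        UpdateCounters second = first.afterUpdate(1); // update of id 3
        System.out.println(first.summary());  // 3/4/1
        System.out.println(second.summary()); // 3/5/2
    }
}
```

When the term matches nothing, `afterUpdate(0)` is just an add: both the valid count and the total grow by one and nothing lands in the recycle bin.
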
Here is the problem: as the index is updated again and again, the index files keep multiplying. Lucene offers a way to merge and optimize them; here is how Lucene merges index files:

    /**
     * Merge the index.
     */
    public static void merge() {
        IndexWriter indexWriter = null;
        try {
            Directory directory = FSDirectory.open(new File("F:/test/lucene/index"));
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
            IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_35, analyzer);
            indexWriter = new IndexWriter(directory, indexWriterConfig);
            /**
             * Special note:
             * Since 3.5, Lucene does not recommend calling this, because it is
             * very expensive. Lucene handles merging automatically as needed.
             */
            // Merge the index into 2 segments; deleted data in the merged segments is purged
            indexWriter.forceMerge(2);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (indexWriter != null) {
                    indexWriter.close();
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
Run the test:

    @Test
    public void testMerge() {
        IndexUtil.merge();
        IndexUtil.check();
    }

The result is:

    Valid index documents: 3
    Total index documents: 3
    Deleted index documents: 0

The document counts are back to normal. One caveat: Lucene does not recommend calling the merge or optimize methods by hand, because they consume a lot of resources. Lucene optimizes the index automatically, so there is no need to worry about the index files growing a great deal.

Version 4.5:

Version 5.0:

The sections for versions 4.5 and 5.0 will be added later.
























