Optimizing Lucene for Full-Text Search: Creating the Index Library


Building on the previous HelloWorld example, create a directory package and add a DirectoryTest test class. The class creates the index storage (a Lucene Directory) at a specified index path.


The code of the DirectoryTest class is shown below. It is essentially the HelloWorld code with a few changes.

There are three methods in total: testDirectory() tests creating an index library; testDirectoryFsAndRam() combines the two creation approaches from the first method (file system and memory) as an optimization; and testDirectoryOptimize() builds on the second method to explore optimizing index creation by reducing the number of index files produced.

    package com.lucene.directory;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriter.MaxFieldLength;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.RAMDirectory;
    import org.junit.Test;

    import com.lucene.units.File2DocumentUtils;

    /**
     * Create an index library
     * @author Liu
     */
    public class DirectoryTest {
        // Path of the file to be indexed and searched
        String filePath = "f:\\users\\liuyanling\\workspace\\Lucenedemo\\datasource\\peoplewhocannot.txt";
        // Path where the index is stored
        String indexPath = "f:\\users\\liuyanling\\workspace\\Lucenedemo\\luceneindex";
        // Use the standard analyzer
        Analyzer analyzer = new StandardAnalyzer();

        /**
         * Test: build the index library automatically
         * @throws Exception if index creation fails
         */
        @Test
        public void testDirectory() throws Exception {
            // Build the index in the file system (pro: data is persisted; con: slow)
            Directory dir = FSDirectory.getDirectory(indexPath);
            /*
            // Build the index in memory (pro: fast; con: data is lost when the program exits)
            Directory dir = new RAMDirectory();
            */
            // The two can be combined: 1. build in memory; 2. save on exit
            // (see testDirectoryFsAndRam below)
            Document doc = File2DocumentUtils.file2Document(filePath);
            IndexWriter indexWriter = new IndexWriter(dir, analyzer, true, MaxFieldLength.LIMITED);
            indexWriter.addDocument(doc);
            indexWriter.close();
        }

        /**
         * Test: load into memory at startup, save on exit
         * @throws Exception if index creation fails
         */
        @Test
        public void testDirectoryFsAndRam() throws Exception {
            // Create the file system index library
            Directory fsDir = FSDirectory.getDirectory(indexPath);

            // 1. Load at startup
            // Build the in-memory index library from the file system one
            Directory ramDir = new RAMDirectory(fsDir);
            // While the program runs, operate on ramDir through its own index writer
            IndexWriter ramIndexWriter = new IndexWriter(ramDir, analyzer, MaxFieldLength.LIMITED);
            // Add the document to the in-memory index library
            Document doc = File2DocumentUtils.file2Document(filePath);
            ramIndexWriter.addDocument(doc);
            ramIndexWriter.close();

            // 2. Save on exit
            // Build the file system index writer. true overwrites the index already in the
            // index directory; false appends to the index that is already there.
            IndexWriter fsIndexWriter = new IndexWriter(fsDir, analyzer, true, MaxFieldLength.LIMITED);
            // Copy the in-memory index into the file system index without optimizing
            fsIndexWriter.addIndexesNoOptimize(new Directory[] { ramDir });
            // fsIndexWriter.flush();
            // fsIndexWriter.optimize();
            fsIndexWriter.close();
        }

        /**
         * Optimize the index: merge index files to reduce their number
         * @throws Exception if optimization fails
         */
        @Test
        public void testDirectoryOptimize() throws Exception {
            // Create the file system index library
            Directory fsDir = FSDirectory.getDirectory(indexPath);
            // Build the index writer
            IndexWriter fsIndexWriter = new IndexWriter(fsDir, analyzer, MaxFieldLength.LIMITED);
            // Optimize the index through the writer
            fsIndexWriter.optimize();
            fsIndexWriter.close();
        }
    }

Now look at the results of running the tests.

1. First, test testDirectory() with Directory dir = FSDirectory.getDirectory(indexPath);. Delete any existing index files before running it.
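
Deleting the old index files by hand before every run gets tedious. A minimal sketch of doing it in plain Java follows; it uses only the standard library, and the class name IndexDirCleaner, the method name deleteRecursively and the demo file name are hypothetical, not part of the original project.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: clears an index directory before a test run.
final class IndexDirCleaner {

    /** Deletes indexDir and everything under it; does nothing if it is absent. */
    static void deleteRecursively(Path indexDir) throws IOException {
        if (!Files.exists(indexDir)) {
            return;
        }
        List<Path> ordered;
        try (Stream<Path> paths = Files.walk(indexDir)) {
            // Reverse order lists children before their parent directory.
            ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : ordered) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo on a throwaway directory; the file name is only a placeholder.
        Path demo = Files.createTempDirectory("luceneindex-demo");
        Files.write(demo.resolve("placeholder-index-file"), new byte[] {1, 2, 3});
        deleteRecursively(demo);
        System.out.println("index directory still exists: " + Files.exists(demo));
    }
}
```

In the test class, such a helper could be called with the index path at the start of a test, assuming no index writer still holds the directory open.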


After the test passes, refresh the project and you will see the newly generated index files.


2. Next, test testDirectory() again, this time with Directory dir = new RAMDirectory();. Delete the index files again and run the unit test. It passes, but no visible index library is generated in the file system. Comparing the execution times of the two variants shows that creating the index in memory is noticeably faster.
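
To put numbers on that comparison rather than reading them off the test runner, the elapsed time can be measured directly. The sketch below is plain Java with no Lucene dependency; TestTimer and elapsedMillis are hypothetical names, and the sleeping task only stands in for the real indexing work.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: measures how long one piece of work takes.
final class TestTimer {

    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    static long elapsedMillis(Callable<?> task) throws Exception {
        long start = System.nanoTime();
        task.call();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in workload; in the test class this would be the indexing code.
        long ms = elapsedMillis(() -> {
            Thread.sleep(20);
            return null;
        });
        System.out.println("task took about " + ms + " ms");
    }
}
```

A single measurement like this is noisy, so it is only a rough guide; running each variant several times gives a fairer comparison.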


3. The file system approach persists the data but is slow; the memory approach is fast, but the data is lost when the program exits. The testDirectoryFsAndRam method therefore combines the two. Run it to see the effect; since no index files exist at this point, there is nothing to delete first.

The test passes. It takes longer than either the memory-only or the file-system-only run, but here only a single Document is indexed. With many documents, this approach is more efficient.


You can also see that the index library has been generated on disk.


4. Now run the optimized index creation, testDirectoryOptimize. First create a few more index entries: here, testDirectoryFsAndRam has been run until 4 have been created.

Running the search method from HelloWorld shows 4 matching documents.
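
It may look surprising that the documents accumulate when testDirectoryFsAndRam opens the file system writer with create set to true. As the code reads, the in-memory directory is first loaded from the existing file system index, so the overwrite puts back the old documents plus the new one. The sketch below is a plain-Java toy model of that load, append and overwrite cycle, not Lucene code; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: lists of strings stand in for the index libraries.
final class LoadAppendOverwriteModel {

    /** One run: copy the persisted documents into memory, add one, replace the persisted copy. */
    static List<String> runOnce(List<String> persisted, String newDoc) {
        // Models new RAMDirectory(fsDir): memory starts as a copy of what is on disk.
        List<String> inMemory = new ArrayList<>(persisted);
        // Models ramIndexWriter.addDocument(doc).
        inMemory.add(newDoc);
        // Models overwriting the on-disk index with the full in-memory contents.
        return new ArrayList<>(inMemory);
    }

    public static void main(String[] args) {
        List<String> persisted = new ArrayList<>();
        for (int run = 1; run <= 4; run++) {
            persisted = runOnce(persisted, "doc-" + run);
        }
        System.out.println(persisted.size() + " documents after 4 runs");
    }
}
```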


Then run the optimization method, testDirectoryOptimize().


The index goes from the original 4 files of 8 KB each to a single 24 KB file. Looking at the result, two things changed: first, the total size is smaller (24 KB instead of 32 KB); second, there are fewer files.
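
The file count and total size can also be checked programmatically instead of in the file explorer. The following is a small standard-library sketch; IndexDirStats is a hypothetical name, and the directory to inspect is passed as a command-line argument rather than hard-coded.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: counts the files in an index directory and sums their sizes.
final class IndexDirStats {
    final long fileCount;
    final long totalBytes;

    private IndexDirStats(long fileCount, long totalBytes) {
        this.fileCount = fileCount;
        this.totalBytes = totalBytes;
    }

    /** Collects stats for the regular files directly inside indexDir (subdirectories are skipped). */
    static IndexDirStats of(Path indexDir) throws IOException {
        long count = 0;
        long bytes = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(indexDir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    count++;
                    bytes += Files.size(entry);
                }
            }
        }
        return new IndexDirStats(count, bytes);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java IndexDirStats <index directory>");
            return;
        }
        IndexDirStats stats = of(Paths.get(args[0]));
        System.out.println(stats.fileCount + " files, " + stats.totalBytes + " bytes");
    }
}
```

Running it on the index directory once before and once after testDirectoryOptimize() would show the change described above.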


5. The takeaway: operating on documents through an in-memory index writer first and saving to the file system at the end pays off most when there are many files, while the optimize() method reduces both the number of index files and their total size. Combining the two produces an optimized index. Note that fsIndexWriter.flush() is deprecated; I tried it only to see what happens, and the operation still succeeded with the same effect.

That covers creating the index library in Lucene: specify the index location, build the index in memory first, then save it to the file system, optimizing and merging the index files along the way. The next article looks at analyzers (word breakers), trying several of them to compare how each one segments text: "Optimizing Lucene for Full-Text Search: Analyzers".
