Lucene in Practice: Building an Index

Source: Internet
Author: User

I won't describe the Lucene setup in detail. It amounts to downloading the relevant jar files, creating a new Java project in Eclipse, and adding those jars to the project's build path.
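If the jars were added correctly, the Lucene classes used in this article should be loadable. The following is a minimal sketch for checking that. It assumes the Lucene 3.x core jar, which the programs in this article target (they use Version.LUCENE_30). `ClasspathCheck` and `isOnClasspath` are hypothetical names, and the check relies only on the standard `Class.forName`.

```java
// Hypothetical helper (not part of Lucene): reports whether a class can be
// loaded from the current classpath, e.g. to confirm the Lucene jars were added.
class ClasspathCheck {
    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: only locate and load the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Classes used by the indexer and searcher programs in this article (Lucene 3.x layout)
        String[] required = {
            "org.apache.lucene.index.IndexWriter",
            "org.apache.lucene.analysis.standard.StandardAnalyzer",
            "org.apache.lucene.queryParser.QueryParser"
        };
        for (String name : required) {
            System.out.println(name + (isOnClasspath(name) ? " found" : " MISSING"));
        }
    }
}
```

Run it with the same build path as the indexer. Any line that reports MISSING points to a jar that was not added.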

This article works hands-on with Lucene before analyzing its source code, using practice to drive the later study.

Build an index

The following program shows how to use the indexer:

package com.wuyudong.mylucene;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.io.FileReader;

public class IndexerTest {

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      throw new IllegalArgumentException("Usage: java " + IndexerTest.class.getName()
          + " <index dir> <data dir>");
    }
    String indexDir = args[0];  // directory in which to create the index
    String dataDir = args[1];   // directory whose *.txt files will be indexed

    long start = System.currentTimeMillis();
    IndexerTest indexer = new IndexerTest(indexDir);
    int numIndexed;
    try {
      numIndexed = indexer.index(dataDir, new TextFilesFilter());
    } finally {
      indexer.close();
    }
    long end = System.currentTimeMillis();

    System.out.println("Indexing " + numIndexed + " files took "
        + (end - start) + " milliseconds");
  }

  private IndexWriter writer;

  public IndexerTest(String indexDir) throws IOException {
    Directory dir = FSDirectory.open(new File(indexDir));
    // Create the IndexWriter; true = create a new index, overwriting any existing one
    writer = new IndexWriter(dir,
        new StandardAnalyzer(Version.LUCENE_30),
        true,
        IndexWriter.MaxFieldLength.UNLIMITED);
  }

  public void close() throws IOException {
    writer.close();  // close the IndexWriter, committing pending changes
  }

  public int index(String dataDir, FileFilter filter) throws Exception {
    File[] files = new File(dataDir).listFiles();
    if (files == null) {  // listFiles() returns null if dataDir is not a readable directory
      throw new IOException(dataDir + " is not a readable directory");
    }
    for (File f : files) {
      if (!f.isDirectory() && !f.isHidden() && f.exists() && f.canRead()
          && (filter == null || filter.accept(f))) {
        indexFile(f);
      }
    }
    return writer.numDocs();  // number of documents in the index
  }

  // Only index *.txt files, selected with a FileFilter
  private static class TextFilesFilter implements FileFilter {
    public boolean accept(File path) {
      return path.getName().toLowerCase().endsWith(".txt");
    }
  }

  protected Document getDocument(File f) throws Exception {
    Document doc = new Document();
    // File contents: tokenized and indexed, but not stored
    doc.add(new Field("contents", new FileReader(f)));
    // File name: stored, and indexed as a single untokenized term
    doc.add(new Field("filename", f.getName(),
        Field.Store.YES, Field.Index.NOT_ANALYZED));
    // Full path: stored, and indexed as a single untokenized term
    doc.add(new Field("fullpath", f.getCanonicalPath(),
        Field.Store.YES, Field.Index.NOT_ANALYZED));
    return doc;
  }

  private void indexFile(File f) throws Exception {
    System.out.println("Indexing " + f.getCanonicalPath());
    Document doc = getDocument(f);
    writer.addDocument(doc);  // add the document to the Lucene index
  }
}
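The indexer records two kinds of information: the file contents are tokenized into searchable terms, while the file name and path are kept verbatim so they can be read back later. The following is a deliberately simplified, plain-Java model of that idea. It is not how Lucene stores data on disk. `ToyIndex` and its methods are hypothetical. `analyze` is only a crude stand-in for StandardAnalyzer: it lowercases and splits on non-alphanumeric characters but, unlike StandardAnalyzer, does not drop English stop words.

```java
import java.util.*;

// Hypothetical, simplified model of what the indexer builds (not Lucene's on-disk format).
class ToyIndex {
    // Inverted index for the analyzed contents: term -> ids of documents containing it
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    // Stored values (like a Field.Store.YES path field): document id -> original value
    private final List<String> storedPaths = new ArrayList<>();

    // Crude analyzer stand-in: lowercase, split on anything that is not a letter or digit
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Adds one document: the path is stored as-is, the contents are tokenized into terms
    int addDocument(String fullPath, String contents) {
        int docId = storedPaths.size();
        storedPaths.add(fullPath);
        for (String term : analyze(contents)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Single-term search: analyzes the query word the same way, then returns stored paths
    List<String> search(String word) {
        List<String> hits = new ArrayList<>();
        List<String> terms = analyze(word);
        if (terms.size() != 1) return hits;  // this sketch only supports one-term queries
        for (int docId : postings.getOrDefault(terms.get(0), Collections.emptySortedSet())) {
            hits.add(storedPaths.get(docId));
        }
        return hits;
    }

    int numDocs() {
        return storedPaths.size();
    }
}
```

Because the query word goes through the same `analyze` step as the contents, a search for "Patent" finds documents containing "patent". Using the same analyzer for indexing and querying matters in the real programs for the same reason.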

Set the program arguments in Eclipse (the index directory, then the data directory):

E:\luceneinaction\index E:\luceneinaction\lia2e\src\lia\meetlucene\data

The output is as follows:

Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\apache1.0.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\apache1.1.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\apache2.0.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\cpl1.0.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\epl1.0.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\freebsd.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\gpl1.0.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\gpl2.0.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\gpl3.0.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\lgpl2.1.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\lgpl3.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\lpgl2.0.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\mit.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\mozilla1.1.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\mozilla_eula_firefox3.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\mozilla_eula_thunderbird2.txt
indexing files took 888 milliseconds

After the run, the index files are generated in the index directory.

The indexed files are small and there are only a few of them, yet indexing took 888 ms, which is somewhat worrying.
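The 888 ms figure is a single measurement of the whole run, taken with System.currentTimeMillis(). It may therefore include one-time costs such as class loading or opening the index directory as well as the per-file work. This is an assumption, and the run above does not isolate it. One way to check is to time several repetitions. The following is a minimal sketch of a hypothetical `Timing.timeRunsMillis` helper that uses System.nanoTime(), which the JDK documents for measuring elapsed time.

```java
import java.util.Arrays;

// Hypothetical helper (not part of Lucene): times repeated runs of a task so that
// one-time costs in the first run can be told apart from the steady-state cost.
class Timing {
    static long[] timeRunsMillis(Runnable task, int runs) {
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();  // monotonic clock intended for elapsed time
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000;  // nanoseconds -> milliseconds
        }
        return millis;
    }

    public static void main(String[] args) {
        // Placeholder workload; in the article's setting this would re-run the indexing step
        Runnable work = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 200_000; i++) sb.append(i);
        };
        System.out.println(Arrays.toString(timeRunsMillis(work, 5)));
    }
}
```

If the first figure is much larger than the rest, one-time costs dominate the single measurement.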

In general, search speed matters more than indexing speed, because an index is typically built once but searched many times.

Search the index

Next, we create a program that searches the index built above:

import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

import java.io.File;
import java.io.IOException;

public class SearcherTest {

  public static void main(String[] args) throws IllegalArgumentException,
      IOException, ParseException {
    if (args.length != 2) {
      throw new IllegalArgumentException("Usage: java " + SearcherTest.class.getName()
          + " <index dir> <query>");
    }
    String indexDir = args[0];  // path of the index to search
    String q = args[1];         // query string
    search(indexDir, q);
  }

  public static void search(String indexDir, String q)
      throws IOException, ParseException {
    // Open the index
    Directory dir = FSDirectory.open(new File(indexDir));
    IndexSearcher is = new IndexSearcher(dir);
    try {
      // Parse the query string against the "contents" field, using the same analyzer as the indexer
      QueryParser parser = new QueryParser(Version.LUCENE_30, "contents",
          new StandardAnalyzer(Version.LUCENE_30));
      Query query = parser.parse(q);

      long start = System.currentTimeMillis();
      TopDocs hits = is.search(query, 10);  // search the index, keeping the top 10 hits
      long end = System.currentTimeMillis();

      // Report search statistics
      System.err.println("Found " + hits.totalHits + " document(s) (in "
          + (end - start) + " milliseconds) that matched query '" + q + "':");

      for (ScoreDoc scoreDoc : hits.scoreDocs) {
        Document doc = is.doc(scoreDoc.doc);      // retrieve the matching document
        System.out.println(doc.get("fullpath"));  // print its stored full path
      }
    } finally {
      is.close();  // close the IndexSearcher
    }
  }
}
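The call `is.search(query, 10)` returns a TopDocs whose totalHits counts every match, while scoreDocs holds at most the 10 best, highest score first. The following is a rough plain-Java illustration of that contract. It is not Lucene's actual collector code. A hypothetical `TopN.search` keeps the n best scores in a size-bounded min-heap. Each array index stands in for a document number, and a score of 0 means "no match". Both are assumptions of the sketch, as is breaking ties in favor of the lower document number.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of "total hits + top n" (not Lucene's implementation).
class TopN {
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    static final class Result {
        final int totalHits;   // every matching document
        final List<Hit> hits;  // at most n of them, best first
        Result(int totalHits, List<Hit> hits) { this.totalHits = totalHits; this.hits = hits; }
    }

    // Orders hits from worst to best; on equal scores the higher doc number counts as worse.
    static final Comparator<Hit> WORST_FIRST = (a, b) -> {
        int c = Float.compare(a.score, b.score);
        return c != 0 ? c : Integer.compare(b.doc, a.doc);
    };

    // scores[doc] is the score of document `doc`; a score <= 0 means "no match" here.
    static Result search(float[] scores, int n) {
        PriorityQueue<Hit> heap = new PriorityQueue<>(WORST_FIRST);  // head = worst kept hit
        int total = 0;
        for (int doc = 0; doc < scores.length; doc++) {
            if (scores[doc] <= 0f) continue;
            total++;
            heap.offer(new Hit(doc, scores[doc]));
            if (heap.size() > n) heap.poll();  // evict the worst so at most n remain
        }
        List<Hit> hits = new ArrayList<>(heap);
        hits.sort(WORST_FIRST.reversed());  // best first
        return new Result(total, hits);
    }
}
```

In the article's run, 8 documents matched, fewer than the limit of 10, so all of them were printed.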

Set the program arguments: E:\luceneinaction\index patent

The output is as follows:

Found 8 document(s) (in milliseconds) that matched query 'patent':
E:\luceneinaction\lia2e\src\lia\meetlucene\data\cpl1.0.txt
E:\luceneinaction\lia2e\src\lia\meetlucene\data\mozilla1.1.txt
E:\luceneinaction\lia2e\src\lia\meetlucene\data\epl1.0.txt
E:\luceneinaction\lia2e\src\lia\meetlucene\data\gpl3.0.txt
E:\luceneinaction\lia2e\src\lia\meetlucene\data\apache2.0.txt
E:\luceneinaction\lia2e\src\lia\meetlucene\data\gpl2.0.txt
E:\luceneinaction\lia2e\src\lia\meetlucene\data\lpgl2.0.txt
E:\luceneinaction\lia2e\src\lia\meetlucene\data\lgpl2.1.txt

The search is fast (12 ms). The absolute path of each matching file is printed because the indexer stored each file's absolute path as a field.
