The steps for setting up Lucene are not covered in detail here: simply download the relevant jar files, create a new Java project in Eclipse, and add the jars to the project.
This article gets hands-on with Lucene before analyzing its source code, using practice to drive further study.
Build an index
The following program shows how to use the indexer:
```java
package com.wuyudong.mylucene;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.io.FileReader;

public class IndexerTest {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            throw new IllegalArgumentException("Usage: java " + IndexerTest.class.getName()
                    + " <index dir> <data dir>");
        }
        String indexDir = args[0];   // Directory in which to create the index
        String dataDir = args[1];    // Directory whose *.txt files will be indexed

        long start = System.currentTimeMillis();
        IndexerTest indexer = new IndexerTest(indexDir);
        int numIndexed;
        try {
            numIndexed = indexer.index(dataDir, new TextFilesFilter());
        } finally {
            indexer.close();
        }
        long end = System.currentTimeMillis();

        System.out.println("Indexing " + numIndexed + " files took "
                + (end - start) + " milliseconds");
    }

    private IndexWriter writer;

    public IndexerTest(String indexDir) throws IOException {
        Directory dir = FSDirectory.open(new File(indexDir));
        writer = new IndexWriter(dir,                        // Create the IndexWriter
                new StandardAnalyzer(Version.LUCENE_30),
                true,                                        // true = create a new index
                IndexWriter.MaxFieldLength.UNLIMITED);
    }

    public void close() throws IOException {
        writer.close();                                      // Close the IndexWriter
    }

    public int index(String dataDir, FileFilter filter) throws Exception {
        File[] files = new File(dataDir).listFiles();
        for (File f : files) {
            if (!f.isDirectory() && !f.isHidden() && f.exists() && f.canRead()
                    && (filter == null || filter.accept(f))) {
                indexFile(f);
            }
        }
        return writer.numDocs();                             // Number of documents in the index
    }

    private static class TextFilesFilter implements FileFilter {
        public boolean accept(File path) {
            // Index only *.txt files, selected with a FileFilter
            return path.getName().toLowerCase().endsWith(".txt");
        }
    }

    protected Document getDocument(File f) throws Exception {
        Document doc = new Document();
        doc.add(new Field("contents", new FileReader(f)));   // Index the file contents
        doc.add(new Field("filename", f.getName(),           // Index the file name
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("fullpath", f.getCanonicalPath(),  // Index the full file path
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        return doc;
    }

    private void indexFile(File f) throws Exception {
        System.out.println("Indexing " + f.getCanonicalPath());
        Document doc = getDocument(f);
        writer.addDocument(doc);                             // Add the document to the Lucene index
    }
}
```
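The file-selection rules in index() and TextFilesFilter can be exercised on their own, without Lucene on the classpath. The sketch below is a hypothetical helper (TextFileSelection, TXT_FILTER and selectFiles are illustrative names, not part of the program above) that applies the same checks to a temporary directory. Unlike index(), it returns an empty list instead of throwing a NullPointerException when listFiles() returns null for a missing directory.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: reproduces the indexer's file-selection rules without Lucene. */
public class TextFileSelection {

    /** Same test as TextFilesFilter.accept: case-insensitive ".txt" suffix. */
    static final FileFilter TXT_FILTER =
            path -> path.getName().toLowerCase().endsWith(".txt");

    /** Mirrors the checks in index(): regular, visible, readable files that pass the filter. */
    static List<String> selectFiles(File dataDir, FileFilter filter) {
        List<String> selected = new ArrayList<>();
        File[] files = dataDir.listFiles();
        if (files == null) {             // dataDir is missing or not a directory
            return selected;
        }
        for (File f : files) {
            if (!f.isDirectory() && !f.isHidden() && f.exists() && f.canRead()
                    && (filter == null || filter.accept(f))) {
                selected.add(f.getName());
            }
        }
        return selected;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("lucene-data").toFile();
        new File(dir, "mit.txt").createNewFile();
        new File(dir, "NOTES.TXT").createNewFile();
        new File(dir, "readme.md").createNewFile();
        new File(dir, "sub.txt").mkdir();    // a directory, even though its name ends in .txt
        List<String> picked = selectFiles(dir, TXT_FILTER);
        Collections.sort(picked);
        System.out.println(picked);          // prints [NOTES.TXT, mit.txt]
    }
}
```

Only the two regular .txt files survive: readme.md fails the suffix test, and sub.txt is rejected by the isDirectory() check even though its name matches.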
Configure the program arguments in Eclipse (the index directory, then the data directory):
E:\luceneinaction\index E:\luceneinaction\lia2e\src\lia\meetlucene\data
The output is as follows:
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\apache1.0.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\apache1.1.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\apache2.0.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\cpl1.0.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\epl1.0.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\freebsd.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\gpl1.0.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\gpl2.0.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\gpl3.0.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\lgpl2.1.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\lgpl3.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\lpgl2.0.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\mit.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\mozilla1.1.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\mozilla_eula_firefox3.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\mozilla_eula_thunderbird2.txt
Indexing 16 files took 888 milliseconds
The index files are generated in the index directory.
Although the indexed files are few and small, indexing them still took 888 ms, which is rather worrying.
In general, search performance matters more than indexing performance, because an index is built once but searched many times.
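Both programs measure elapsed time as the difference between two System.currentTimeMillis() calls. The JDK documents System.nanoTime() as the method intended for measuring elapsed time, because it is not affected by adjustments to the system clock. A minimal sketch of a reusable timing helper, assuming Java 8 or later; Stopwatch, Timed and time are hypothetical names, and the loop is only a stand-in workload:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: times one call with the monotonic System.nanoTime() clock. */
public class Stopwatch {

    /** What a timed call returned, plus how long it took in milliseconds. */
    static final class Timed<T> {
        final T value;
        final long millis;

        Timed(T value, long millis) {
            this.value = value;
            this.millis = millis;
        }
    }

    /** Runs the task once; Callable lets it throw checked exceptions, as index() does. */
    static <T> Timed<T> time(Callable<T> task) throws Exception {
        long start = System.nanoTime();
        T value = task.call();
        long elapsedNanos = System.nanoTime() - start;
        return new Timed<>(value, TimeUnit.NANOSECONDS.toMillis(elapsedNanos));
    }

    public static void main(String[] args) throws Exception {
        // Stand-in workload. With the indexer it could be:
        //   time(() -> indexer.index(dataDir, new TextFilesFilter()))
        Timed<Integer> run = time(() -> {
            int sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i % 7;
            }
            return sum;
        });
        System.out.println("Result " + run.value + " took " + run.millis + " milliseconds");
    }
}
```

Using Callable rather than Supplier is a deliberate choice here: the indexer's index() method declares throws Exception, which a Supplier lambda could not propagate.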
Search the index
Next, we create a program that searches the index created above:
```java
import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

import java.io.File;
import java.io.IOException;

public class SearcherTest {

    public static void main(String[] args) throws IllegalArgumentException,
            IOException, ParseException {
        if (args.length != 2) {
            throw new IllegalArgumentException("Usage: java " + SearcherTest.class.getName()
                    + " <index dir> <query>");
        }
        String indexDir = args[0];   // Index directory to search
        String q = args[1];          // Query string
        search(indexDir, q);
    }

    public static void search(String indexDir, String q)
            throws IOException, ParseException {
        Directory dir = FSDirectory.open(new File(indexDir));   // Open the index
        IndexSearcher is = new IndexSearcher(dir);

        QueryParser parser = new QueryParser(Version.LUCENE_30, // Parse the query string
                "contents",                                      // against the "contents" field
                new StandardAnalyzer(Version.LUCENE_30));
        Query query = parser.parse(q);

        long start = System.currentTimeMillis();
        TopDocs hits = is.search(query, 10);                     // Search the index (top 10 hits)
        long end = System.currentTimeMillis();

        System.err.println("Found " + hits.totalHits             // Report search statistics
                + " document(s) (in " + (end - start)
                + " milliseconds) that matched query '" + q + "':");

        for (ScoreDoc scoreDoc : hits.scoreDocs) {
            Document doc = is.doc(scoreDoc.doc);                 // Retrieve the matching document
            System.out.println(doc.get("fullpath"));             // Print its stored full path
        }

        is.close();                                              // Close the IndexSearcher
    }
}
```
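QueryParser runs the query text through the same StandardAnalyzer that was used at indexing time, so the query and the indexed text are broken into terms the same way; this is why a query such as PATENT can match text containing "patent". The sketch below is a deliberately simplified stand-in for an analyzer, assuming only three steps: split on characters that are not letters or digits, lowercase, and drop a small hypothetical stop-word list. It is not Lucene's StandardAnalyzer, whose tokenization rules are more involved; ToyAnalyzer and analyze are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy analyzer (hypothetical): approximates, but is NOT, Lucene's StandardAnalyzer. */
public class ToyAnalyzer {

    // Small illustrative stop-word list; Lucene ships its own, larger English list.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Split on any run of characters that are not letters or digits.
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue;                            // a leading separator yields ""
            }
            String term = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Patent grant, and LICENSE terms"));
        // prints [patent, grant, license, terms]
    }
}
```

Because the same analysis is applied on both sides, case differences between the query and the documents disappear before any terms are compared.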
Set the program arguments: E:\luceneinaction\index patent
The output is as follows:
Found 8 document(s) (in 12 milliseconds) that matched query 'patent':
E:\luceneinaction\lia2e\src\lia\meetlucene\data\cpl1.0.txt
E:\luceneinaction\lia2e\src\lia\meetlucene\data\mozilla1.1.txt
E:\luceneinaction\lia2e\src\lia\meetlucene\data\epl1.0.txt
E:\luceneinaction\lia2e\src\lia\meetlucene\data\gpl3.0.txt
E:\luceneinaction\lia2e\src\lia\meetlucene\data\apache2.0.txt
E:\luceneinaction\lia2e\src\lia\meetlucene\data\gpl2.0.txt
E:\luceneinaction\lia2e\src\lia\meetlucene\data\lpgl2.0.txt
E:\luceneinaction\lia2e\src\lia\meetlucene\data\lgpl2.1.txt
As you can see, the search is fast (12 ms), and each matching file's absolute path is printed, because the indexer stores each file's absolute path in the index.
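Only paths are printed because of the difference between indexed and stored fields: the contents field is built from a Reader, which Lucene 3.0 indexes but does not store, while the file name and full path are added with Field.Store.YES, so doc.get can return them. The toy model below illustrates that split, and also why a single-term search is cheap once the index exists: it is one map lookup followed by fetching stored values. It is a conceptual sketch only (ToyIndex, add and search are hypothetical names, and the document texts are made-up samples), not how Lucene lays out its index files.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy in-memory index (hypothetical): a conceptual model, not Lucene's on-disk format. */
public class ToyIndex {

    // Inverted index: term -> ids of documents containing it ("indexed, not stored").
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // Stored values: document id -> full path ("stored", returned with each hit).
    private final List<String> storedPaths = new ArrayList<>();

    /** Adds one document; its text is turned into postings, but the text itself is not kept. */
    void add(String fullPath, String contents) {
        int docId = storedPaths.size();
        storedPaths.add(fullPath);
        for (String token : contents.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(token.toLowerCase(Locale.ROOT), k -> new TreeSet<>())
                    .add(docId);
        }
    }

    /** Single-term search: one map lookup, then fetch each hit's stored path. */
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        Set<Integer> docIds =
                postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
        for (int docId : docIds) {
            hits.add(storedPaths.get(docId));
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("data/mit.txt", "Permission is hereby granted, free of charge");
        index.add("data/apache2.0.txt", "grant of Patent License");
        index.add("data/gpl3.0.txt", "patent claims and patent license");
        System.out.println(index.search("patent"));
        // prints [data/apache2.0.txt, data/gpl3.0.txt]
    }
}
```

The cost of splitting and lowercasing every document is paid once in add(); each later search touches only the postings for one term, which mirrors the build-once, search-many trade-off described above.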