Lucene in Practice: Building an Index

Last Update:2016-04-14 Source: Internet

Author: User

Setting up Lucene is not covered in detail here. It amounts to downloading the relevant jar files, creating a new Java project in Eclipse, and adding the jars to the project.

This article works through a hands-on example before any analysis of the Lucene source code, so that practice can drive the later study.

Build an index

The following program shows how to use the indexer:

package com.wuyudong.mylucene;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.io.FileReader;

public class IndexerTest {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            throw new IllegalArgumentException("Usage: java "
                    + IndexerTest.class.getName()
                    + " <index dir> <data dir>");
        }
        String indexDir = args[0];  // directory in which to create the index
        String dataDir = args[1];   // directory whose *.txt files are indexed

        long start = System.currentTimeMillis();
        IndexerTest indexer = new IndexerTest(indexDir);
        int numIndexed;
        try {
            numIndexed = indexer.index(dataDir, new TextFilesFilter());
        } finally {
            indexer.close();
        }
        long end = System.currentTimeMillis();

        System.out.println("Indexing " + numIndexed + " files took "
                + (end - start) + " milliseconds");
    }

    private IndexWriter writer;

    public IndexerTest(String indexDir) throws IOException {
        Directory dir = FSDirectory.open(new File(indexDir));
        // Create the IndexWriter; "true" creates a new index or overwrites an existing one
        writer = new IndexWriter(dir,
                new StandardAnalyzer(Version.LUCENE_30),
                true,
                IndexWriter.MaxFieldLength.UNLIMITED);
    }

    public void close() throws IOException {
        writer.close();  // close the IndexWriter
    }

    public int index(String dataDir, FileFilter filter) throws Exception {
        File[] files = new File(dataDir).listFiles();
        if (files == null) {  // listFiles() returns null if dataDir is not a readable directory
            throw new IOException("Cannot list files in " + dataDir);
        }
        for (File f : files) {
            if (!f.isDirectory() && !f.isHidden() && f.exists() && f.canRead()
                    && (filter == null || filter.accept(f))) {
                indexFile(f);
            }
        }
        return writer.numDocs();  // number of documents in the index
    }

    private static class TextFilesFilter implements FileFilter {
        public boolean accept(File path) {
            // index *.txt files only
            return path.getName().toLowerCase().endsWith(".txt");
        }
    }

    protected Document getDocument(File f) throws Exception {
        Document doc = new Document();
        // index the file contents (read from a Reader, so not stored)
        doc.add(new Field("Contents", new FileReader(f)));
        // index and store the file name
        doc.add(new Field("filename", f.getName(),
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        // index and store the full path
        doc.add(new Field("FullPath", f.getCanonicalPath(),
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        return doc;
    }

    private void indexFile(File f) throws Exception {
        System.out.println("Indexing " + f.getCanonicalPath());
        Document doc = getDocument(f);
        writer.addDocument(doc);  // add the document to the Lucene index
    }
}
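The file-selection step of the indexer can be tried on its own, without Lucene on the classpath. The following is a minimal sketch under a few assumptions: the class name TxtFilterSketch and the method selectFiles are hypothetical and do not appear in the program above, and the lambda syntax requires Java 8 or later. It applies the same checks as the indexer (not a directory, not hidden, readable, name ending in .txt) and returns an empty list when the directory cannot be listed.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class TxtFilterSketch {

    /** Accepts files whose name ends in ".txt", ignoring case. */
    static final FileFilter TXT_ONLY =
            path -> path.getName().toLowerCase().endsWith(".txt");

    /** Returns the files in dataDir that pass the same checks the indexer applies. */
    static List<File> selectFiles(File dataDir, FileFilter filter) {
        List<File> selected = new ArrayList<>();
        File[] files = dataDir.listFiles();
        if (files == null) {  // not a directory, or it could not be read
            return selected;
        }
        for (File f : files) {
            if (!f.isDirectory() && !f.isHidden() && f.exists() && f.canRead()
                    && (filter == null || filter.accept(f))) {
                selected.add(f);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Uses the first argument as the data directory, or the current directory.
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (File f : selectFiles(dir, TXT_ONLY)) {
            System.out.println("Would index " + f.getPath());
        }
    }
}
```

Running it against the data directory before indexing is a cheap way to check which files the indexer will pick up.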

Set the program arguments in Eclipse:

E:\luceneinaction\index E:\luceneinaction\lia2e\src\lia\meetlucene\data

The output is as follows:

Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\apache1.0.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\apache1.1.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\apache2.0.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\cpl1.0.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\epl1.0.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\freebsd.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\gpl1.0.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\gpl2.0.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\gpl3.0.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\lgpl2.1.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\lgpl3.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\lpgl2.0.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\mit.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\mozilla1.1.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\mozilla_eula_firefox3.txt
Indexing E:\luceneinaction\lia2e\src\lia\meetlucene\data\mozilla_eula_thunderbird2.txt
Indexing 16 files took 888 milliseconds

The index files are generated in the index directory.

The indexed files are small and there are only a few of them, yet indexing still took 888 ms, which is somewhat worrying.

In general, searching the index matters more than building it, because an index is built once but searched many times.
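Why the up-front cost pays off can be illustrated with a toy inverted index in plain Java. This is only a sketch of the general idea, not Lucene's actual data structure. The class ToyInvertedIndex, the document names, and the sample sentences are all made up for the example. Each document is tokenized once when it is added, after which every search is a single map lookup.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy illustration only; Lucene's real index format is far more elaborate.
final class ToyInvertedIndex {

    // term -> names of the documents that contain it
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Tokenizes the text once and records which document each term occurs in. */
    void add(String docName, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docName);
            }
        }
    }

    /** Looks a single term up; no document text is re-read at search time. */
    Set<String> search(String term) {
        Set<String> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null
                ? Collections.<String>emptySet()
                : Collections.unmodifiableSet(hits);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        // Made-up sample text, not the contents of any real file.
        index.add("doc1.txt", "Permission is granted free of charge");
        index.add("doc2.txt", "Grant of Patent License");
        index.add("doc3.txt", "Each contributor grants a patent license");

        System.out.println(index.search("patent"));  // prints [doc2.txt, doc3.txt]
    }
}
```

The work in add is paid once per document, while search can be called any number of times at roughly constant cost per term.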

Search the index

Next, create a program to search the index built above:

import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

import java.io.File;
import java.io.IOException;

public class SearcherTest {

    public static void main(String[] args)
            throws IllegalArgumentException, IOException, ParseException {
        if (args.length != 2) {
            throw new IllegalArgumentException("Usage: java "
                    + SearcherTest.class.getName()
                    + " <index dir> <query>");
        }
        String indexDir = args[0];  // path of the index directory
        String q = args[1];         // query string
        search(indexDir, q);
    }

    public static void search(String indexDir, String q)
            throws IOException, ParseException {
        // open the index
        Directory dir = FSDirectory.open(new File(indexDir));
        IndexSearcher is = new IndexSearcher(dir);

        // parse the query string against the "Contents" field
        QueryParser parser = new QueryParser(Version.LUCENE_30,
                "Contents",
                new StandardAnalyzer(Version.LUCENE_30));
        Query query = parser.parse(q);

        long start = System.currentTimeMillis();
        TopDocs hits = is.search(query, 10);  // search the index, keeping the top 10 hits
        long end = System.currentTimeMillis();

        // report the search statistics
        System.err.println("Found " + hits.totalHits
                + " document(s) (in " + (end - start)
                + " milliseconds) that matched query '"
                + q + "':");

        for (ScoreDoc scoreDoc : hits.scoreDocs) {
            Document doc = is.doc(scoreDoc.doc);      // retrieve the matching document
            System.out.println(doc.get("FullPath"));  // print its stored full path
        }

        is.close();  // close the IndexSearcher
    }
}
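The call is.search(query, 10) asks for at most the ten best hits, while hits.totalHits reports how many documents matched in total, so the two numbers can differ. The idea of keeping only the best n results can be sketched in plain Java with a bounded min-heap. This is an illustration of the technique, not Lucene's own collector. The class TopNSketch, the document names, and the scores are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch of top-n selection; not Lucene code.
final class TopNSketch {

    /** Returns the names of the n highest-scoring documents, best first. */
    static List<String> topN(Map<String, Double> scores, int n) {
        // Min-heap on score: the weakest of the current top n sits at the head.
        Comparator<Map.Entry<String, Double>> byScore =
                (a, b) -> Double.compare(a.getValue(), b.getValue());
        PriorityQueue<Map.Entry<String, Double>> heap = new PriorityQueue<>(byScore);

        for (Map.Entry<String, Double> e : scores.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll();  // drop the lowest score so at most n entries remain
            }
        }

        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());  // poll yields the lowest score first
        }
        Collections.reverse(result);           // so reverse to get best first
        return result;
    }

    public static void main(String[] args) {
        // Made-up scores for illustration.
        Map<String, Double> scores = new HashMap<>();
        scores.put("doc1.txt", 0.2);
        scores.put("doc2.txt", 0.9);
        scores.put("doc3.txt", 0.5);
        scores.put("doc4.txt", 0.1);

        System.out.println("Total hits: " + scores.size());
        System.out.println("Top 2: " + topN(scores, 2));
    }
}
```

The heap never holds more than n + 1 entries, so memory stays bounded no matter how many documents match.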

Set the arguments: E:\luceneinaction\index patent

The output is as follows:

Found 8 document(s) (in 12 milliseconds) that matched query 'patent':
E:\luceneinaction\lia2e\src\lia\meetlucene\data\cpl1.0.txt
E:\luceneinaction\lia2e\src\lia\meetlucene\data\mozilla1.1.txt
E:\luceneinaction\lia2e\src\lia\meetlucene\data\epl1.0.txt
E:\luceneinaction\lia2e\src\lia\meetlucene\data\gpl3.0.txt
E:\luceneinaction\lia2e\src\lia\meetlucene\data\apache2.0.txt
E:\luceneinaction\lia2e\src\lia\meetlucene\data\gpl2.0.txt
E:\luceneinaction\lia2e\src\lia\meetlucene\data\lpgl2.0.txt
E:\luceneinaction\lia2e\src\lia\meetlucene\data\lgpl2.1.txt

The search is fast (12 ms). The absolute path of each matching file is printed because the indexer stored the absolute path in the index.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
