Lucene in Action "Hello Lucene World"

Last Update: 2015-01-13 Source: Internet

Author: User


Indexer:

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.io.FileReader;

// From chapter 1

/**
 * This code was originally written for
 * Erik's Lucene intro java.net article
 */
public class Indexer {

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      throw new IllegalArgumentException("Usage: java " + Indexer.class.getName()
        + " <index dir> <data dir>");
    }
    String indexDir = args[0];   // create the index in this directory
    String dataDir = args[1];    // index the *.txt files found in this directory

    long start = System.currentTimeMillis();
    Indexer indexer = new Indexer(indexDir);
    int numIndexed;
    try {
      numIndexed = indexer.index(dataDir, new TextFilesFilter());
    } finally {
      indexer.close();
    }
    long end = System.currentTimeMillis();

    System.out.println("Indexing " + numIndexed + " files took "
      + (end - start) + " milliseconds");
  }

  private IndexWriter writer;

  public Indexer(String indexDir) throws IOException {
    Directory dir = FSDirectory.open(new File(indexDir));
    // Create the Lucene IndexWriter; "true" means create a new index
    writer = new IndexWriter(dir,
                 new StandardAnalyzer(Version.LUCENE_30),
                 true,
                 IndexWriter.MaxFieldLength.UNLIMITED);
  }

  public void close() throws IOException {
    writer.close();              // close the IndexWriter
  }

  public int index(String dataDir, FileFilter filter)
    throws Exception {

    File[] files = new File(dataDir).listFiles();

    for (File f : files) {
      if (!f.isDirectory() &&
          !f.isHidden() &&
          f.exists() &&
          f.canRead() &&
          (filter == null || filter.accept(f))) {
        indexFile(f);
      }
    }

    return writer.numDocs();     // return the number of documents indexed
  }

  // Accept .txt files only
  private static class TextFilesFilter implements FileFilter {
    public boolean accept(File path) {
      return path.getName().toLowerCase()
             .endsWith(".txt");
    }
  }

  protected Document getDocument(File f) throws Exception {
    Document doc = new Document();
    doc.add(new Field("contents", new FileReader(f)));        // index the file content
    doc.add(new Field("filename", f.getName(),                // index the file name
                Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new Field("fullpath", f.getCanonicalPath(),       // index the full path
                Field.Store.YES, Field.Index.NOT_ANALYZED));
    return doc;
  }

  private void indexFile(File f) throws Exception {
    System.out.println("Indexing " + f.getCanonicalPath());
    Document doc = getDocument(f);
    writer.addDocument(doc);     // add the document to the Lucene index
  }
}
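The file filter used by the Indexer can be tried on its own, without Lucene on the classpath. The sketch below is only an illustration: the class name TextFilesFilterDemo and the file names are made up, and the filter applies the same rule as TextFilesFilter (accept names ending in ".txt", ignoring case).

```java
import java.io.File;
import java.io.FileFilter;
import java.util.Locale;

/** Standalone demo of the ".txt only" rule; names here are hypothetical. */
public class TextFilesFilterDemo {

    /** Accepts a file when its name ends in ".txt", ignoring case. */
    public static final FileFilter TXT_ONLY = new FileFilter() {
        @Override
        public boolean accept(File path) {
            return path.getName().toLowerCase(Locale.ROOT).endsWith(".txt");
        }
    };

    public static void main(String[] args) {
        // accept() looks only at the name here, so the files need not exist.
        System.out.println(TXT_ONLY.accept(new File("notes.TXT")));   // prints true
        System.out.println(TXT_ONLY.accept(new File("report.pdf")));  // prints false
    }
}
```

Because the filter checks only the file name, the Indexer still needs its own checks for directories, hidden files, and unreadable files before calling indexFile.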

Core classes of the indexing process:

IndexWriter

Creates a new index or opens an existing one, and adds, deletes, or updates documents in it. Its constructor typically takes a Directory and an Analyzer.

Directory

An abstract class that describes where the index is stored. The Indexer above uses FSDirectory, which stores the index in a file-system directory.

Analyzer

Extracts the tokens (lexical units) from the text to be indexed. An analyzer works only on plain text; content in any other format must be converted to plain text first (for example, with Tika).
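To make "extracting tokens" concrete, here is a minimal sketch in plain Java. It is a toy stand-in, not Lucene's StandardAnalyzer: the class SimpleTokenizerSketch and its splitting rule (break on anything that is not a letter or digit, then lowercase) are assumptions chosen for illustration, and a real analyzer does considerably more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy tokenizer for illustration only; not Lucene's StandardAnalyzer. */
public class SimpleTokenizerSketch {

    /** Splits text on runs of non-letter, non-digit characters and lowercases each token. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {              // skip the empty piece before a leading delimiter
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, Lucene World!")); // prints [hello, lucene, world]
    }
}
```

Each token produced this way is what ends up in the index as a searchable unit, which is why the same analysis should be applied to both the indexed text and the query text.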

Document

A Document object represents a collection of fields.

Field

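Each field has a name and a value. Lucene handles only text, so the text of a binary document must be extracted first and then added as a field. The document's metadata (such as the file name and full path in the Indexer above) is stored and indexed separately, as additional fields of the same document.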

Aside: the Lucene core itself handles only java.lang.String, java.io.Reader, and native numeric types (int, float, and so on).

Searcher:

import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

import java.io.File;
import java.io.IOException;

// From chapter 1

/**
 * This code was originally written for
 * Erik's Lucene intro java.net article
 */
public class Searcher {

  public static void main(String[] args) throws IllegalArgumentException,
        IOException, ParseException {
    if (args.length != 2) {
      throw new IllegalArgumentException("Usage: java " + Searcher.class.getName()
        + " <index dir> <query>");
    }

    String indexDir = args[0];   // directory of the index created by Indexer
    String q = args[1];          // query string to parse

    search(indexDir, q);
  }

  public static void search(String indexDir, String q)
    throws IOException, ParseException {

    // Open the index
    Directory dir = FSDirectory.open(new File(indexDir));
    IndexSearcher is = new IndexSearcher(dir);

    // Parse the query against the "contents" field
    QueryParser parser = new QueryParser(Version.LUCENE_30,
                                         "contents",
                     new StandardAnalyzer(Version.LUCENE_30));
    Query query = parser.parse(q);

    long start = System.currentTimeMillis();
    TopDocs hits = is.search(query, 10);   // search the index, keep the top 10 hits
    long end = System.currentTimeMillis();

    // Write search statistics
    System.err.println("Found " + hits.totalHits +
      " document(s) (in " + (end - start) +
      " milliseconds) that matched query '" +
      q + "':");

    for (ScoreDoc scoreDoc : hits.scoreDocs) {
      Document doc = is.doc(scoreDoc.doc);       // retrieve the matching document
      System.out.println(doc.get("fullpath"));   // display its full path
    }

    is.close();                                  // close the IndexSearcher
  }
}

Core classes of the search process:

IndexSearcher

Searches an index created by IndexWriter. Its constructor takes the Directory that holds the index, and the class then provides the search methods.

Term

A Term object is the basic unit of search. Like a Field, it consists of a field name and a text value.

Query q = new TermQuery(new Term("contents", "lucene"));
TopDocs hits = searcher.search(q, 10);

Query

Query is the base class of all query classes, such as TermQuery and BooleanQuery.

TermQuery

TermQuery is the most basic and simplest query type. It matches documents that contain the specified term in the specified field.
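The matching rule of a term query (this exact term, in this field) can be sketched with a toy inverted index in plain Java. This is not how Lucene is implemented internally; the class TermLookupSketch, its methods, and the sample documents are all hypothetical and serve only to show the idea.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index illustrating term matching; not Lucene's implementation. */
public class TermLookupSketch {

    // Key "field:text" maps to the sorted ids of the documents containing that term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Records that the given document contains these tokens in the given field. */
    public void add(int docId, String field, String... tokens) {
        for (String token : tokens) {
            postings.computeIfAbsent(field + ":" + token, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Returns the ids of documents whose field contains exactly this term. */
    public Set<Integer> termQuery(String field, String text) {
        return postings.getOrDefault(field + ":" + text, Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        TermLookupSketch index = new TermLookupSketch();
        index.add(0, "contents", "hello", "lucene", "world");
        index.add(1, "contents", "hello", "search");
        index.add(2, "filename", "lucene");
        System.out.println(index.termQuery("contents", "lucene")); // prints [0]
        System.out.println(index.termQuery("contents", "hello"));  // prints [0, 1]
    }
}
```

Document 2 is not returned for ("contents", "lucene") because its term lives in a different field, which is the sense in which a term pairs a field name with a text value.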

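TopDocs

TopDocs is a simple container of pointers to the top-ranked search results. As the Searcher above shows, it exposes the total hit count (totalHits) and an array of ScoreDoc entries (scoreDocs), each of which refers to a matching document by its id.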
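The idea of a "container of pointers" can be shown with a small plain-Java sketch: the result holds only document ids and scores, and the documents themselves are fetched later by id. The class TopHitsSketch, the nested Hit type, and the sample scores are hypothetical; Lucene's own collection of top hits is more elaborate than this sort-and-truncate approach.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy top-N result container; a hypothetical stand-in for TopDocs and ScoreDoc. */
public class TopHitsSketch {

    /** Pointer to a document (its id) plus a relevance score. */
    public static final class Hit {
        public final int doc;
        public final float score;

        public Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Keeps the n best-scoring documents; a score of 0 means "did not match". */
    public static List<Hit> topN(float[] scoreByDocId, int n) {
        List<Hit> hits = new ArrayList<>();
        for (int doc = 0; doc < scoreByDocId.length; doc++) {
            if (scoreByDocId[doc] > 0f) {
                hits.add(new Hit(doc, scoreByDocId[doc]));
            }
        }
        // Highest score first.
        hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(hits.subList(0, Math.min(n, hits.size())));
    }

    public static void main(String[] args) {
        float[] scores = {0.2f, 0f, 0.9f, 0.5f};   // made-up scores for documents 0..3
        for (Hit hit : topN(scores, 2)) {
            System.out.println(hit.doc);           // prints 2, then 3
        }
    }
}
```

Returning ids rather than whole documents keeps the result small, which is why the Searcher calls is.doc(scoreDoc.doc) only for the hits it actually prints.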

