Lucene in Action "Hello Lucene World"


Indexer:

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.io.FileReader;

// From chapter 1

/**
 * This code was originally written for
 * Erik's Lucene intro java.net article
 */
public class Indexer {

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      throw new IllegalArgumentException("Usage: java " + Indexer.class.getName()
        + " <index dir> <data dir>");
    }
    String indexDir = args[0];  // create the index in this directory
    String dataDir = args[1];   // index the *.txt files from this directory

    long start = System.currentTimeMillis();
    Indexer indexer = new Indexer(indexDir);
    int numIndexed;
    try {
      numIndexed = indexer.index(dataDir, new TextFilesFilter());
    } finally {
      indexer.close();
    }
    long end = System.currentTimeMillis();

    System.out.println("Indexing " + numIndexed + " files took "
      + (end - start) + " milliseconds");
  }

  private IndexWriter writer;

  public Indexer(String indexDir) throws IOException {
    Directory dir = FSDirectory.open(new File(indexDir));
    // Create the Lucene IndexWriter
    writer = new IndexWriter(dir,
                 new StandardAnalyzer(Version.LUCENE_30),
                 true,
                 IndexWriter.MaxFieldLength.UNLIMITED);
  }

  public void close() throws IOException {
    writer.close();  // close the IndexWriter
  }

  public int index(String dataDir, FileFilter filter) throws Exception {
    File[] files = new File(dataDir).listFiles();

    for (File f : files) {
      if (!f.isDirectory() &&
          !f.isHidden() &&
          f.exists() &&
          f.canRead() &&
          (filter == null || filter.accept(f))) {
        indexFile(f);
      }
    }

    return writer.numDocs();  // number of documents indexed
  }

  // Accept .txt files only
  private static class TextFilesFilter implements FileFilter {
    public boolean accept(File path) {
      return path.getName().toLowerCase().endsWith(".txt");
    }
  }

  protected Document getDocument(File f) throws Exception {
    Document doc = new Document();
    // Index the file content
    doc.add(new Field("contents", new FileReader(f)));
    // Index the file name
    doc.add(new Field("filename", f.getName(),
                Field.Store.YES, Field.Index.NOT_ANALYZED));
    // Index the full path of the file
    doc.add(new Field("fullpath", f.getCanonicalPath(),
                Field.Store.YES, Field.Index.NOT_ANALYZED));
    return doc;
  }

  private void indexFile(File f) throws Exception {
    System.out.println("Indexing " + f.getCanonicalPath());
    Document doc = getDocument(f);
    writer.addDocument(doc);  // add the document to the Lucene index
  }
}
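The file-selection rule in the Indexer can be tried without Lucene on the classpath. The following is a minimal sketch using only java.io; the class name TxtFilterDemo and the sample file names are made up for illustration and are not part of the book's code.

```java
import java.io.File;
import java.io.FileFilter;

// Stand-alone sketch of the ".txt only" rule; the class name is illustrative.
final class TxtFilterDemo {

  // Same rule as the Indexer's filter: the lower-cased name must end in ".txt".
  static final FileFilter TXT_ONLY =
      path -> path.getName().toLowerCase().endsWith(".txt");

  public static void main(String[] args) {
    // The File objects are used only as names; nothing has to exist on disk.
    for (String name : new String[] {"notes.txt", "README.TXT", "photo.png"}) {
      System.out.println(name + " -> " + TXT_ONLY.accept(new File(name)));
    }
  }
}
```

Because the name is lower-cased before the comparison, README.TXT is accepted as well as notes.txt, while photo.png is rejected.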

Core classes of the indexing process:

IndexWriter

Creates a new index or opens an existing one, and adds, deletes, or updates documents in it. Its constructor typically takes a Directory and an Analyzer.

Directory

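An abstract class that describes where the index is stored. The Indexer above obtains one from FSDirectory.open, which keeps the index in a directory on the file system.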

Analyzer

Extracts tokens (lexical units) from the text to be indexed. An analyzer works only on plain text; content in any other format must first be converted to plain text (for example, with Tika).
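To give a feel for what "extracting tokens" means, here is a toy sketch in plain Java. It is an assumption-laden illustration, not how StandardAnalyzer works internally: it lower-cases the text and splits on anything that is not a letter or digit. The class name ToyAnalyzer is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy tokenizer for illustration only; NOT Lucene's StandardAnalyzer.
final class ToyAnalyzer {

  // Lower-case the text, then split it on runs of non-letter, non-digit characters.
  static List<String> tokenize(String text) {
    List<String> tokens = new ArrayList<>();
    for (String piece : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
      if (!piece.isEmpty()) {  // split can yield an empty leading piece
        tokens.add(piece);
      }
    }
    return tokens;
  }

  public static void main(String[] args) {
    System.out.println(tokenize("Hello, Lucene World!"));  // prints [hello, lucene, world]
  }
}
```

The tokens produced at indexing time are what searches are later matched against, which is why the same analyzer is usually used on both sides.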

Document

A Document object represents a collection of fields.

Field

Lucene works only with text that has been extracted from the source document and supplied as fields. Each piece of the document's metadata is stored and indexed separately, as its own field of the document.

Note: the Lucene core itself handles only java.lang.String, java.io.Reader, and native numeric types (int, float, and so on).
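The idea of a document as a collection of named fields can be pictured with a small plain-Java model. This is only a sketch under simplifying assumptions: ToyDocument is a hypothetical class, not Lucene's Document, and it keeps a single text value per field name, whereas a real Lucene document may hold several fields with the same name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy stand-in for the idea of a document; NOT Lucene's Document class.
final class ToyDocument {

  // Field name -> text value, kept in insertion order.
  private final Map<String, String> fields = new LinkedHashMap<>();

  // Adds (or overwrites) a field and returns this document for chaining.
  ToyDocument add(String name, String value) {
    fields.put(name, value);
    return this;
  }

  // Returns the field's value, or null when the field is absent.
  String get(String name) {
    return fields.get(name);
  }

  public static void main(String[] args) {
    int sizeInBytes = 2048;  // hypothetical metadata value
    ToyDocument doc = new ToyDocument()
        .add("contents", "Hello Lucene World")      // extracted text
        .add("filename", "hello.txt")               // metadata as its own field
        .add("size", String.valueOf(sizeInBytes));  // converted to text first
    System.out.println(doc.get("filename"));
  }
}
```

The body text and each piece of metadata sit side by side as separate fields, which mirrors how getDocument in the Indexer adds contents, filename, and fullpath.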

Searcher:

import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

import java.io.File;
import java.io.IOException;

// From chapter 1

/**
 * This code was originally written for
 * Erik's Lucene intro java.net article
 */
public class Searcher {

  public static void main(String[] args) throws IllegalArgumentException,
        IOException, ParseException {
    if (args.length != 2) {
      throw new IllegalArgumentException("Usage: java " + Searcher.class.getName()
        + " <index dir> <query>");
    }

    String indexDir = args[0];  // index directory given on the command line
    String q = args[1];         // query string given on the command line

    search(indexDir, q);
  }

  public static void search(String indexDir, String q)
      throws IOException, ParseException {

    // Open the index
    Directory dir = FSDirectory.open(new File(indexDir));
    IndexSearcher is = new IndexSearcher(dir);

    // Parse the query
    QueryParser parser = new QueryParser(Version.LUCENE_30,
                                         "contents",
                                         new StandardAnalyzer(Version.LUCENE_30));
    Query query = parser.parse(q);

    long start = System.currentTimeMillis();
    TopDocs hits = is.search(query, 10);  // search the index
    long end = System.currentTimeMillis();

    // Write the search statistics
    System.err.println("Found " + hits.totalHits +
      " document(s) (in " + (end - start) +
      " milliseconds) that matched query '" +
      q + "':");

    for (ScoreDoc scoreDoc : hits.scoreDocs) {
      Document doc = is.doc(scoreDoc.doc);      // retrieve the matching document
      System.out.println(doc.get("fullpath"));  // display its full path
    }

    is.close();  // close the IndexSearcher
  }
}

Core classes of the search process:

IndexSearcher

Searches an index created by IndexWriter. Its constructor takes the Directory that holds the index, and the class then provides the search methods.

Term

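A Term object is the basic unit of search. Like a Field, it consists of a pair of strings: the name of the field and the text value in that field. For example, the following asks for the top 10 documents whose contents field contains the term lucene:

Query q = new TermQuery(new Term("contents", "lucene"));
TopDocs hits = searcher.search(q, 10);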

Query

Query is the base class of all query classes, such as TermQuery and BooleanQuery.

TermQuery

TermQuery is one of the most basic and simplest query types. It matches documents that contain the specified term in the specified field.
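What "contains the specified term in the specified field" amounts to can be sketched with a toy inverted index in plain Java. This is an illustration under stated assumptions, not Lucene's data structures: ToyIndex and its methods are hypothetical, and tokenizing is reduced to lower-casing and splitting on non-alphanumeric characters.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index for illustration only; NOT Lucene's index format.
final class ToyIndex {

  // field name -> term text -> IDs of the documents containing that term
  private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

  // Tokenizes the text and records the document ID under each token.
  void add(int docId, String field, String text) {
    for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
      if (token.isEmpty()) {
        continue;
      }
      postings.computeIfAbsent(field, f -> new HashMap<>())
              .computeIfAbsent(token, t -> new TreeSet<>())
              .add(docId);
    }
  }

  // Exact lookup of one term in one field; the term itself is not analyzed.
  Set<Integer> termQuery(String field, String term) {
    Map<String, Set<Integer>> byTerm = postings.get(field);
    if (byTerm == null) {
      return Collections.<Integer>emptySet();
    }
    Set<Integer> docs = byTerm.get(term);
    return docs == null ? Collections.<Integer>emptySet() : docs;
  }

  public static void main(String[] args) {
    ToyIndex index = new ToyIndex();
    index.add(0, "contents", "Hello Lucene world");
    index.add(1, "contents", "Hello Java");
    System.out.println(index.termQuery("contents", "lucene"));
  }
}
```

In this sketch the lookup is an exact match against the stored tokens, so searching for "Lucene" with a capital L finds nothing once the indexed text has been lower-cased. That is the reason the term in the TermQuery example is written in lower case.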

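TopDocs

TopDocs is a simple container of pointers to the top-ranked search results. Each entry is a ScoreDoc whose doc value is the document ID that the Searcher passes to IndexSearcher.doc to load the matching document.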
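The "container of pointers" idea can be sketched in plain Java: keep the total number of matches, plus the IDs and scores of only the best n. The class below is a hypothetical toy, not Lucene's TopDocs, and the scores passed to it are made-up inputs rather than anything Lucene computed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy result container for illustration only; NOT Lucene's TopDocs.
final class ToyTopDocs {

  // A pointer to one result: the document ID and its score, not the document itself.
  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  final int totalHits;   // number of matching documents overall
  final List<Hit> hits;  // at most n best matches, highest score first

  ToyTopDocs(Map<Integer, Float> scoresByDoc, int n) {
    totalHits = scoresByDoc.size();
    List<Hit> all = new ArrayList<>();
    for (Map.Entry<Integer, Float> e : scoresByDoc.entrySet()) {
      all.add(new Hit(e.getKey(), e.getValue()));
    }
    all.sort((a, b) -> {
      int byScore = Float.compare(b.score, a.score);  // higher score first
      return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
    });
    hits = new ArrayList<>(all.subList(0, Math.min(n, all.size())));
  }
}
```

A caller would loop over hits and use each doc value to fetch the actual document, in the same way the Searcher loops over hits.scoreDocs.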
