Lucene 2.9.1 Usage Summary
Category: Search engine 2009-11-26 15:15
++yong's blog. Address: http://blog.csdn.net/qjyong
A summary of how to use the open-source full-text search toolkit Lucene 2.9.1.
1. Set up the Lucene development environment: add lucene-core-2.9.1.jar to the classpath.
2. Full-text search involves two tasks: building the index and searching the index.
3. Logical structure of a Lucene index
1) An index (Index) consists of several segments (Segment).
★2) A segment consists of several documents (Document): a file is mapped to one document, and a record in a database table is mapped to one document.
★3) A document consists of several fields (Field): each attribute of a file (its path, its contents) is mapped to one field, and each column of a record is mapped to one field.
☆4) A field consists of several terms (Term): each string extracted from the content of a file attribute is mapped to one term.
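The hierarchy above can be pictured with a small toy model. This is only a sketch in plain Java: the class names (ToyIndex, ToySegment, ToyDocument) are made up for illustration and are not Lucene classes, and splitting a field value on whitespace is an assumed stand-in for real analysis.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the hierarchy: index -> segments -> documents -> fields -> terms.
// All names are illustrative; none of these are Lucene classes.
class ToyIndexStructure {

    // A document maps each field name to the terms derived from the field value.
    static class ToyDocument {
        final Map<String, List<String>> fields = new LinkedHashMap<>();

        void addField(String name, String value) {
            // Naive assumption: terms are the whitespace-separated pieces of the value.
            fields.put(name, Arrays.asList(value.trim().split("\\s+")));
        }
    }

    // A segment groups several documents.
    static class ToySegment {
        final List<ToyDocument> documents = new ArrayList<>();
    }

    // An index groups several segments.
    static class ToyIndex {
        final List<ToySegment> segments = new ArrayList<>();

        // Walks the whole hierarchy and counts every term in every field.
        int countTerms() {
            int total = 0;
            for (ToySegment segment : segments) {
                for (ToyDocument document : segment.documents) {
                    for (List<String> terms : document.fields.values()) {
                        total += terms.size();
                    }
                }
            }
            return total;
        }
    }

    // Builds an index with one segment holding one document with two fields.
    static ToyIndex buildSample() {
        ToyDocument doc = new ToyDocument();
        doc.addField("path", "/tmp/a.txt");
        doc.addField("contents", "hello full text search");
        ToySegment segment = new ToySegment();
        segment.documents.add(doc);
        ToyIndex index = new ToyIndex();
        index.segments.add(segment);
        return index;
    }

    public static void main(String[] args) {
        ToyIndex index = buildSample();
        System.out.println("segments: " + index.segments.size());
        System.out.println("terms: " + index.countTerms());
    }
}
```

A real Lucene index stores this information in an inverted form on disk; the nesting here only mirrors the logical containment described in the list.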
4. Lucene package structure
1) analysis module: responsible for lexical analysis and language processing, which produce the terms (Term). It provides a number of built-in analyzers; the most commonly used is StandardAnalyzer.
2) index module: responsible for reading and writing the index. The IndexWriter class writes, merges, and optimizes the segments of the index; the IndexReader class reads the index and deletes documents from it.
3) store module: responsible for storing the index. It provides the storage classes for indexes: FSDirectory, RAMDirectory, and so on.
4) document module: encapsulates the underlying storage structure inside the index, for example the Document class and the Field class.
5) search module: responsible for searching the index. It provides the index searcher class IndexSearcher and the various query classes, such as TermQuery and BooleanQuery.
6) queryParser module: responsible for parsing the syntax of query strings. It provides the QueryParser class for this purpose.
7) util module: contains common utility classes.
5. Create an index
1) IndexWriter: the index writer.
a) Constructors:
IndexWriter(Directory d, Analyzer a, IndexWriter.MaxFieldLength mfl)
If the index does not exist, it is created; if it exists, new documents are appended to it.
IndexWriter(Directory d, Analyzer a, boolean create, IndexWriter.MaxFieldLength mfl)
When create is true, the index is created if it does not exist and overwritten if it does.
When create is false, an error is raised if the index does not exist, and new documents are appended if it does.
b) Common methods:
void addDocument(Document doc); adds the specified document to the index writer.
void close(); closes the index writer, at which point the index is written to the destination store.
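The create/append rules of the two constructors can be written down as a small decision table. The sketch below is plain Java and does not call Lucene; the class CreateFlagSketch and the Outcome enum are hypothetical names used only to restate the rules given above.

```java
// Restates the constructor rules as a decision table.
// CreateFlagSketch and Outcome are illustrative names, not part of Lucene.
class CreateFlagSketch {

    enum Outcome { CREATE_NEW, OVERWRITE, APPEND, ERROR }

    // Constructor without the create flag: create when missing, append otherwise.
    static Outcome withoutFlag(boolean indexExists) {
        return indexExists ? Outcome.APPEND : Outcome.CREATE_NEW;
    }

    // Constructor with an explicit create flag.
    static Outcome withFlag(boolean indexExists, boolean create) {
        if (create) {
            // create == true: start from scratch, replacing any existing index.
            return indexExists ? Outcome.OVERWRITE : Outcome.CREATE_NEW;
        }
        // create == false: an existing index is required.
        return indexExists ? Outcome.APPEND : Outcome.ERROR;
    }

    public static void main(String[] args) {
        boolean[] states = { false, true };
        for (boolean exists : states) {
            System.out.println("exists=" + exists + ", no flag -> " + withoutFlag(exists));
            for (boolean create : states) {
                System.out.println("exists=" + exists + ", create=" + create
                        + " -> " + withFlag(exists, create));
            }
        }
    }
}
```

In practice this means passing create=true is the right choice for a full rebuild, while create=false suits incremental updates to an index that is known to exist.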
2) Directory: where the index is stored.
a) File system: FSDirectory: FSDirectory.open(File file);
b) Memory: RAMDirectory: new RAMDirectory();
3) Analyzer: the tokenizer (word breaker).
a) StandardAnalyzer: the standard analyzer. English text is split on whitespace and punctuation; Chinese text is split into single characters.
b) SmartChineseAnalyzer: an intelligent Chinese analyzer. (LUCENE_HOME/contrib/analyzers/smartcn/lucene-smartcn-2.9.1.jar)
c) Third-party Chinese analyzers, such as PaodingAnalyzer and IKAnalyzer.
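To make the splitting behaviour of the standard analyzer concrete, here is a minimal sketch of a tokenizer that follows the same two rules: words are cut at whitespace and punctuation, and each Chinese (Han) character becomes its own token. It is plain Java, not StandardAnalyzer itself; ToyTokenizer is a hypothetical name, and the lowercasing step is an assumption of this sketch (stop-word removal is left out entirely).

```java
import java.util.ArrayList;
import java.util.List;

// A toy tokenizer that imitates two rules only; it is NOT Lucene's StandardAnalyzer.
class ToyTokenizer {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.UnicodeScript.of(c) == Character.UnicodeScript.HAN) {
                // Each Han character is emitted as a single-character token.
                flush(word, tokens);
                tokens.add(String.valueOf(c));
            } else if (Character.isLetterOrDigit(c)) {
                // Letters and digits extend the current word (lowercased by assumption).
                word.append(Character.toLowerCase(c));
            } else {
                // Whitespace and punctuation end the current word.
                flush(word, tokens);
            }
        }
        flush(word, tokens);
        return tokens;
    }

    // Emits the pending word, if any, and resets the buffer.
    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        // "\u5168\u6587" is the two-character Chinese word for "full text".
        System.out.println(tokenize("Hello, Lucene \u5168\u6587"));
    }
}
```

Single-character splitting is why a dictionary-based analyzer such as SmartChineseAnalyzer or a third-party one is usually preferred for Chinese text: it keeps multi-character words together.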
4) IndexWriter.MaxFieldLength: specifies the maximum length of a field value, that is, the maximum number of terms indexed per field.
a) UNLIMITED: no limit.
b) LIMITED: limited, with a value of 10000.
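The effect of the limit can be sketched as a simple truncation. The code below is plain Java, not Lucene; MaxFieldLengthSketch is a hypothetical name, and it assumes the limit counts terms per field, with terms beyond the limit silently left out of the index.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrates a per-field term limit; this is a sketch, not Lucene's implementation.
class MaxFieldLengthSketch {

    static final int LIMITED = 10000;               // the value quoted for LIMITED
    static final int UNLIMITED = Integer.MAX_VALUE; // effectively no limit

    // Returns the terms that would be indexed: only the first maxFieldLength of them.
    static List<String> termsToIndex(List<String> terms, int maxFieldLength) {
        int kept = Math.min(terms.size(), maxFieldLength);
        return new ArrayList<>(terms.subList(0, kept));
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < 12000; i++) {
            terms.add("term" + i);
        }
        System.out.println("LIMITED keeps " + termsToIndex(terms, LIMITED).size());
        System.out.println("UNLIMITED keeps " + termsToIndex(terms, UNLIMITED).size());
    }
}
```

With LIMITED, text near the end of a very long file would not be searchable, which is why the example in this section passes UNLIMITED.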
5) Document: the unit that makes up the index; a collection of Field objects.
a) Constructor: Document();
b) Common method: void add(Field f); adds the specified field to this document.
6) Field: represents one field of a document in the index.
a) Constructor: Field(String name, String value, Field.Store.YES, Field.Index.ANALYZED)
name: the name of the field; it must be a string.
value: the value of the field; it must be a string.
Field.Store: specifies whether the field value is stored. NO (not stored), YES (stored), COMPRESS (stored after compression).
Field.Index: specifies whether and how the field is indexed. NO (not indexed), ANALYZED (indexed after tokenization), NOT_ANALYZED (indexed as a single term, without tokenization).
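The difference between storing and indexing a field is easy to mix up, so here is a minimal sketch of the idea. It is plain Java with hypothetical names (StoreVsIndexSketch and its three boolean flags stand in for the Field.Store and Field.Index options); the analysis step is assumed to be lowercasing plus splitting on non-word characters.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy document showing that "stored" (retrievable) and "indexed" (searchable)
// are independent choices. Not Lucene code; all names are illustrative.
class StoreVsIndexSketch {

    private final Map<String, String> stored = new HashMap<>();
    private final Set<String> searchable = new HashSet<>(); // entries look like "field:term"

    void addField(String name, String value, boolean store, boolean index, boolean analyze) {
        if (store) {
            stored.put(name, value); // like Field.Store.YES: original value can be read back
        }
        if (!index) {
            return;                  // like Field.Index.NO: never matches a search
        }
        if (analyze) {
            // like Field.Index.ANALYZED: every token is searchable on its own
            for (String token : value.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) {
                    searchable.add(name + ":" + token);
                }
            }
        } else {
            // like Field.Index.NOT_ANALYZED: only the exact, whole value matches
            searchable.add(name + ":" + value);
        }
    }

    boolean matches(String field, String term) {
        return searchable.contains(field + ":" + term);
    }

    // Returns the stored value, or null when the field was not stored.
    String get(String field) {
        return stored.get(field);
    }

    public static void main(String[] args) {
        StoreVsIndexSketch doc = new StoreVsIndexSketch();
        doc.addField("path", "/tmp/A.txt", true, true, false);
        doc.addField("contents", "Hello Lucene", false, true, true);
        System.out.println(doc.matches("path", "/tmp/A.txt"));
        System.out.println(doc.matches("contents", "lucene"));
        System.out.println(doc.get("contents"));
    }
}
```

This is why a path is typically indexed NOT_ANALYZED (it should match only as a whole) while body text is ANALYZED; whether to store the body depends on whether search results need to display it.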
7) Example code:

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    // src: the file to index; destDir: the directory where the index is stored
    public static void createIndex(File src, File destDir) {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT); // create an analyzer
        IndexWriter iwriter = null;
        Directory directory = null;
        try {
            directory = FSDirectory.open(destDir); // store the index files in a directory on disk
            // create an IndexWriter (index directory, analyzer, create flag, maximum field length)
            iwriter = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
            iwriter.setUseCompoundFile(true); // use the compound file format

            Document doc = new Document(); // create a Document object
            // file path as the "path" field: not tokenized, indexed, stored
            doc.add(new Field("path", src.getCanonicalPath(), Field.Store.YES, Field.Index.NOT_ANALYZED));

            StringBuilder sb = new StringBuilder();
            BufferedReader br = new BufferedReader(new FileReader(src));
            try {
                for (String str; (str = br.readLine()) != null;) {
                    sb.append(str).append(System.getProperty("line.separator"));
                }
            } finally {
                br.close(); // release the file handle even if reading fails
            }
            // file contents as the "contents" field: tokenized, indexed, stored
            doc.add(new Field("contents", sb.toString(), Field.Store.YES, Field.Index.ANALYZED));

            iwriter.addDocument(doc); // add the document to the IndexWriter
            iwriter.optimize();       // optimize the index
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (iwriter != null) {
                try {
                    iwriter.close(); // closing the IndexWriter writes the in-memory data to the files
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            if (directory != null) {
                try {
                    directory.close(); // close the index directory
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
6. Query the index
1) IndexSearcher: the index searcher.
a) Constructor: IndexSearcher(Directory path, boolean readOnly)
b) Common methods:
TopDocs search(Query query, Filter filter, int n); executes the query. n is the maximum number of documents to return.
Document doc(int docId); gets a document by its internal number.
void close(); closes the searcher.
2) Query: the query object. It wraps the query string entered by the user in a form that Lucene can recognize.
3) Filter: the object used to filter the search results.
4) TopDocs: the object representing the query result set. It has two properties:
a) totalHits: the number of documents hit by the query.
b) scoreDocs: the query result entries. Each entry contains the internal number (doc) and the score (score) of a document that matches the query.
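The relationship between n, totalHits, and scoreDocs can be illustrated with a toy search. This sketch is plain Java and does not use Lucene; ToyTopDocs, Result, and ToyScoreDoc are hypothetical names, and the score is simply the raw term frequency, which is an assumption made for illustration and not Lucene's scoring formula.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Toy top-n search: totalHits counts every match, scoreDocs holds at most n of them.
class ToyTopDocs {

    static class ToyScoreDoc {
        final int doc;     // internal document number (here: the array position)
        final float score; // here: how often the term occurs in the document
        ToyScoreDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    static class Result {
        final int totalHits;
        final List<ToyScoreDoc> scoreDocs;
        Result(int totalHits, List<ToyScoreDoc> scoreDocs) {
            this.totalHits = totalHits;
            this.scoreDocs = scoreDocs;
        }
    }

    static Result search(String[] docs, String term, int n) {
        List<ToyScoreDoc> hits = new ArrayList<>();
        for (int id = 0; id < docs.length; id++) {
            int freq = 0;
            for (String token : docs[id].toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.equals(term)) {
                    freq++;
                }
            }
            if (freq > 0) {
                hits.add(new ToyScoreDoc(id, freq));
            }
        }
        // Best score first; ties are broken by the smaller document number.
        Collections.sort(hits, new Comparator<ToyScoreDoc>() {
            public int compare(ToyScoreDoc a, ToyScoreDoc b) {
                int byScore = Float.compare(b.score, a.score);
                return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
            }
        });
        int total = hits.size();
        List<ToyScoreDoc> top = new ArrayList<>(hits.subList(0, Math.min(n, total)));
        return new Result(total, top);
    }

    public static void main(String[] args) {
        String[] docs = { "lucene index", "search lucene lucene", "nothing here", "lucene" };
        Result result = search(docs, "lucene", 2);
        System.out.println("totalHits: " + result.totalHits);
        for (ToyScoreDoc sd : result.scoreDocs) {
            System.out.println("doc=" + sd.doc + " score=" + sd.score);
        }
    }
}
```

The point to take away is that totalHits can be larger than scoreDocs.length: the former counts all matching documents, the latter is capped by n.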
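5) Example code:

    import java.io.File;
    import java.io.IOException;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    // keyword: the keyword to search for; indexDir: the directory where the index is stored
    public static void searcher(String keyword, File indexDir) {
        IndexSearcher isearcher = null;
        Directory directory = null;
        try {
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
            directory = FSDirectory.open(indexDir);

            // create the query parser
            QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "contents", analyzer);
            Query query = parser.parse(keyword); // get the Query object

            // Alternative 1: combine two term queries with a BooleanQuery
            // Query query1 = new TermQuery(new Term("contents", keyword));
            // Query query2 = new TermQuery(new Term("contents", keyword2));
            // BooleanQuery query = new BooleanQuery();
            // query.add(query1, Occur.SHOULD);
            // query.add(query2, Occur.SHOULD);

            // Alternative 2: search several fields at once
            // QueryParser parser = new MultiFieldQueryParser(Version.LUCENE_CURRENT, new String[]{"path", "contents"}, analyzer);
            // Query query = parser.parse(keyword);

            isearcher = new IndexSearcher(directory, true);    // create a read-only index searcher
            TopDocs ts = isearcher.search(query, null, 100);   // run the search and get the result set object
            int totalHits = ts.totalHits;                      // number of hits
            System.out.println("Hits: " + totalHits);

            ScoreDoc[] hits = ts.scoreDocs; // information objects for the documents that were hit
            for (int i = 0; i < hits.length; i++) {
                Document hitDoc = isearcher.doc(hits[i].doc); // get the document by its internal number
                System.out.println(hitDoc.getField("contents").stringValue()); // print the value of the given field
            }
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParseException e) {
            e.printStackTrace();
        } finally {
            if (isearcher != null) {
                try {
                    isearcher.close(); // close the searcher
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            // The original listing is cut off here; closing the directory mirrors createIndex above.
            if (directory != null) {
                try {
                    directory.close(); // close the index directory
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }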