Lucene Usage Summary

Source: Internet
Author: User

1. Set up the Lucene development environment: add lucene-core-2.9.1.jar to the classpath.

2. Full-text search consists of two steps: creating an index and searching that index.

3. Logical structure of a Lucene index
1) An index is composed of one or more segments.
★2) A file maps to a document; likewise, a record in a database table maps to a document.
★3) A document consists of several fields: each attribute of a file (file path, file content) maps to a field, and each column of a database record maps to a field.
☆4) A field is composed of several terms (keywords): each word extracted from the value of a file attribute maps to a term.
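To make the mapping concrete, here is a minimal plain-Java sketch (no Lucene dependency) of the logical structure above: an index holds documents, each document maps field names to values, and each field value is broken into terms that point back to the documents containing them. ToyIndex and its methods are hypothetical names for illustration only; every field is tokenized the same naive way, and segments are not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model of Lucene's logical structure (NOT Lucene's API):
// index -> documents -> fields -> terms, plus an inverted map term -> documents.
class ToyIndex {
    // Each document is a map from field name to field value.
    private final List<Map<String, String>> docs = new ArrayList<Map<String, String>>();
    // Inverted index: "field:term" -> internal numbers of the documents containing it.
    private final Map<String, SortedSet<Integer>> postings = new HashMap<String, SortedSet<Integer>>();

    // Adds a document and returns its internal number (its position in the list).
    int addDocument(Map<String, String> fields) {
        int id = docs.size();
        docs.add(fields);
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // Naive tokenization: lowercase, split on anything that is not a letter or digit.
            for (String term : field.getValue().toLowerCase().split("[^\\p{L}\\p{N}]+")) {
                if (term.isEmpty()) continue;
                String key = field.getKey() + ":" + term;
                SortedSet<Integer> ids = postings.get(key);
                if (ids == null) {
                    ids = new TreeSet<Integer>();
                    postings.put(key, ids);
                }
                ids.add(id);
            }
        }
        return id;
    }

    // Returns the internal numbers of documents whose field contains the term.
    SortedSet<Integer> lookup(String field, String term) {
        SortedSet<Integer> ids = postings.get(field + ":" + term);
        return ids == null ? new TreeSet<Integer>() : ids;
    }

    // Returns the stored document for an internal number.
    Map<String, String> doc(int id) {
        return docs.get(id);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        Map<String, String> file = new HashMap<String, String>();
        file.put("path", "/tmp/a.txt");
        file.put("contents", "Lucene is a search library");
        int id = index.addDocument(file);
        System.out.println(index.lookup("contents", "search")); // [0]
        System.out.println(index.doc(id).get("path"));         // /tmp/a.txt
    }
}
```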

4. Lucene package structure
1) analysis module (org.apache.lucene.analysis): performs lexical analysis and language processing to produce terms (words). Provides several built-in analyzers; StandardAnalyzer is the most commonly used.
2) index module (org.apache.lucene.index): reads and writes indexes. The IndexWriter class writes, merges, and optimizes the segments of an index; the IndexReader class reads indexes and deletes documents from them.
3) store module (org.apache.lucene.store): stores indexes. Provides the index storage classes, such as FSDirectory and RAMDirectory.
4) document module (org.apache.lucene.document): encapsulates the basic storage structures inside an index, such as the Document and Field classes.
5) search module (org.apache.lucene.search): searches indexes. Provides the IndexSearcher class and various query classes, such as TermQuery and BooleanQuery.
6) queryParser module (org.apache.lucene.queryParser): parses query strings. Provides the QueryParser class.
7) util module (org.apache.lucene.util): contains common utility classes.

5. Create an index
1) IndexWriter: the index writer
A) Constructors:
IndexWriter(Directory d, Analyzer a, IndexWriter.MaxFieldLength mfl)
Creates the index if it does not exist; otherwise appends to the existing index.
IndexWriter(Directory d, Analyzer a, boolean create, IndexWriter.MaxFieldLength mfl)
If create is true, a new index is created, overwriting any existing index.
If create is false, an exception is thrown if the index does not exist; if it exists, documents are appended to it.
B) Common methods:
void addDocument(Document doc); // adds the specified document to the index
void close(); // closes the index writer; pending changes are then written to the index's storage location

2) Directory: the storage location of the index.
A) File system: FSDirectory.open(File path);
B) Memory: new RAMDirectory();

3) Analyzer: the analyzer.
A) StandardAnalyzer: the standard analyzer. Splits text on whitespace and punctuation; Chinese text is split into individual characters.
B) SmartChineseAnalyzer: an intelligent Chinese analyzer (LUCENE_HOME/contrib/analyzers/smartcn/lucene-smartcn-2.9.1.jar).
C) Third-party Chinese analyzers, such as PaodingAnalyzer and IKAnalyzer.
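As a rough illustration of what StandardAnalyzer does to text, the following sketch lowercases the input, splits it on whitespace and punctuation, drops a few English stop words, and emits each Chinese (Han) character as its own token. It is a hypothetical simplification: ToyAnalyzer is not a Lucene class, and the real StandardAnalyzer uses a full grammar and a longer stop-word list. The Chinese sample is written with Unicode escapes so the file compiles under any source encoding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified imitation of StandardAnalyzer output (NOT its real grammar):
// lowercase, split on whitespace/punctuation, drop a few English stop words,
// and emit every Han (Chinese) character as a separate token.
class ToyAnalyzer {
    // A small subset of Lucene's English stop words, for illustration only.
    private static final Set<String> STOP =
            new HashSet<String>(Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder word = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                flush(word, tokens);
                tokens.add(new String(Character.toChars(cp))); // one token per Han character
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                flush(word, tokens); // whitespace or punctuation ends the current word
            }
        }
        flush(word, tokens);
        return tokens;
    }

    // Emits the buffered word unless it is empty or a stop word, then clears the buffer.
    private static void flush(StringBuilder word, List<String> tokens) {
        String w = word.toString();
        if (!w.isEmpty() && !STOP.contains(w)) tokens.add(w);
        word.setLength(0);
    }

    public static void main(String[] args) {
        // The four escaped characters spell the Chinese phrase for "full-text search".
        System.out.println(tokenize("The Lucene search engine, \u5168\u6587\u68c0\u7d22!"));
        // lucene, search, engine, then the four Han characters, one token each
    }
}
```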

4) IndexWriter.MaxFieldLength: the maximum number of terms indexed per field; terms beyond the limit are ignored.
A) UNLIMITED: no limit.
B) LIMITED: a limit of 10,000 terms.
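The effect of the limit can be sketched in a few lines: only the first maxFieldLength terms of a field are indexed and the rest are silently dropped, so with LIMITED a very long document may not be findable by words near its end. The class below is a hypothetical illustration, not Lucene code; its constants mirror LIMITED (10,000) and UNLIMITED (Integer.MAX_VALUE).

```java
import java.util.ArrayList;
import java.util.List;

// Illustrates the effect of IndexWriter.MaxFieldLength (NOT Lucene code): only the
// first maxFieldLength terms of a field are indexed; the rest are silently ignored.
class MaxFieldLengthDemo {
    static final int LIMITED = 10000;               // value used by MaxFieldLength.LIMITED
    static final int UNLIMITED = Integer.MAX_VALUE; // value used by MaxFieldLength.UNLIMITED

    // Returns the terms that would actually be indexed for one field.
    static List<String> indexedTerms(List<String> tokens, int maxFieldLength) {
        return tokens.subList(0, Math.min(tokens.size(), maxFieldLength));
    }

    public static void main(String[] args) {
        List<String> tokens = new ArrayList<String>();
        for (int i = 0; i < 12000; i++) tokens.add("term" + i);
        System.out.println(indexedTerms(tokens, LIMITED).size());                // 10000
        System.out.println(indexedTerms(tokens, UNLIMITED).size());              // 12000
        System.out.println(indexedTerms(tokens, LIMITED).contains("term11999")); // false
    }
}
```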

5) Document: the unit of indexing; a set of fields.
A) Constructor: Document();
B) Common method: void add(Fieldable field); // adds the specified field (for example, a Field) to this document

6) Field: a field of a document.
A) Constructor: Field(String name, String value, Field.Store store, Field.Index index), for example Field(name, value, Field.Store.YES, Field.Index.ANALYZED)
name: the field name, a String.
value: the field value; in this constructor, a String.
Field.Store: whether and how the field value is stored: NO (not stored), YES (stored), COMPRESS (stored compressed; deprecated as of 2.9).
Field.Index: whether and how the field is indexed: NO (not indexed), ANALYZED (tokenized by the analyzer, then indexed), NOT_ANALYZED (indexed as a single term without tokenization).
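The two options answer different questions: Store controls whether the original value can be read back from a hit, and Index controls which terms, if any, can be searched. The hypothetical sketch below models that split with plain enums (not Lucene's Field classes) and a naive tokenizer standing in for the analyzer; it shows why a file path is usually NOT_ANALYZED (one exact term) while file contents are ANALYZED. COMPRESS is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model of what Field.Store and Field.Index control (NOT Lucene's classes).
class FieldOptionsDemo {
    enum Store { YES, NO }
    enum Index { NO, ANALYZED, NOT_ANALYZED }

    // Terms a field contributes to the inverted index.
    static List<String> terms(String value, Index index) {
        switch (index) {
            case NO:
                return Collections.emptyList();          // not searchable at all
            case NOT_ANALYZED:
                return Collections.singletonList(value); // whole value is one exact term
            default:                                     // ANALYZED: naive tokenization
                List<String> out = new ArrayList<String>();
                for (String t : value.toLowerCase().split("\\W+")) {
                    if (!t.isEmpty()) out.add(t);
                }
                return out;
        }
    }

    // Value returned when reading the field from a hit document (null if not stored).
    static String storedValue(String value, Store store) {
        return store == Store.YES ? value : null;
    }

    public static void main(String[] args) {
        String path = "/data/Readme.txt";
        System.out.println(terms(path, Index.NOT_ANALYZED)); // [/data/Readme.txt]
        System.out.println(terms(path, Index.ANALYZED));     // [data, readme, txt]
        System.out.println(storedValue(path, Store.NO));     // null
    }
}
```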
7) Sample code:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// src: the file to index; destDir: the directory where the index is stored.
public static void createIndex(File src, File destDir) {
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT); // create the analyzer
    IndexWriter iwriter = null;
    Directory directory = null;
    try {
        directory = FSDirectory.open(destDir); // store the index in a directory on disk
        // Create an IndexWriter (index directory, analyzer, create = true overwrites any existing index, maximum field length)
        iwriter = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
        // iwriter.setUseCompoundFile(true); // use the compound file format

        Document doc = new Document(); // create a Document
        // The file path becomes the "path" field: not analyzed, indexed, stored
        doc.add(new Field("path", src.getCanonicalPath(), Field.Store.YES, Field.Index.NOT_ANALYZED));

        StringBuilder sb = new StringBuilder();
        BufferedReader br = new BufferedReader(new FileReader(src));
        try {
            for (String str = null; (str = br.readLine()) != null;) {
                sb.append(str).append(System.getProperty("line.separator"));
            }
        } finally {
            br.close(); // release the file handle even if reading fails
        }
        // The file content becomes the "contents" field: analyzed, indexed, stored
        doc.add(new Field("contents", sb.toString(), Field.Store.YES, Field.Index.ANALYZED));

        iwriter.addDocument(doc); // add the document to the index
        iwriter.optimize(); // optimize the index
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (iwriter != null) {
            try {
                iwriter.close(); // buffered changes are written to the index when the writer is closed
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        if (directory != null) {
            try {
                directory.close(); // close the index directory
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

6. Query an index
1) IndexSearcher: the index searcher
A) Constructor: IndexSearcher(Directory path, boolean readOnly)
B) Common methods:
TopDocs search(Query query, Filter filter, int n); // runs the query; n is the maximum number of documents returned
Document doc(int i); // returns the document with the given internal document number
void close(); // closes the searcher
2) Query: the query object. The query string entered by the user is turned into a Query object that Lucene understands.
3) Filter: an object used to filter (restrict) the search results; pass null for no filtering.
4) TopDocs: the result set of a query. It has two attributes:
A) totalHits: the total number of matching documents.
B) scoreDocs: the top hits; each ScoreDoc holds the internal document number (doc) and the score (score) of a matching document.
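To illustrate the relationship between totalHits and scoreDocs, the sketch below collects the n best-scoring matches with a size-bounded min-heap while counting every match. It is a hypothetical stand-in with its own small TopDocs and ScoreDoc classes, not Lucene's; scores are supplied up front instead of being computed from a query, a score of zero means "no match", and tie-breaking is not modeled.

```java
import java.util.PriorityQueue;

// Toy version of what search(query, filter, n) returns (NOT Lucene code).
class TopDocsDemo {
    static final class ScoreDoc {
        final int doc;
        final float score;
        ScoreDoc(int doc, float score) { this.doc = doc; this.score = score; }
        public String toString() { return "doc=" + doc + " score=" + score; }
    }

    static final class TopDocs {
        final int totalHits;        // every matching document is counted
        final ScoreDoc[] scoreDocs; // only the n best, highest score first
        TopDocs(int totalHits, ScoreDoc[] scoreDocs) { this.totalHits = totalHits; this.scoreDocs = scoreDocs; }
    }

    // scores[i] is the score of document i; a score <= 0 means "no match".
    static TopDocs search(float[] scores, int n) {
        // Min-heap bounded to n entries: the weakest of the current top n sits at the head.
        PriorityQueue<ScoreDoc> heap = new PriorityQueue<ScoreDoc>(Math.max(1, n),
                (a, b) -> Float.compare(a.score, b.score));
        int totalHits = 0;
        for (int doc = 0; doc < scores.length; doc++) {
            if (scores[doc] <= 0) continue;
            totalHits++;
            heap.offer(new ScoreDoc(doc, scores[doc]));
            if (heap.size() > n) heap.poll(); // drop the lowest score
        }
        ScoreDoc[] top = new ScoreDoc[heap.size()];
        for (int i = top.length - 1; i >= 0; i--) top[i] = heap.poll(); // fill best first
        return new TopDocs(totalHits, top);
    }

    public static void main(String[] args) {
        TopDocs ts = search(new float[] {0.2f, 0f, 0.9f, 0.5f}, 2);
        System.out.println("hits: " + ts.totalHits); // hits: 3
        for (ScoreDoc sd : ts.scoreDocs) System.out.println(sd); // doc=2 score=0.9, then doc=3 score=0.5
    }
}
```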
5) Sample code:
import java.io.File;
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// keyword: the keyword to search for; indexDir: the index directory
public static void searcher(String keyword, File indexDir) {
    IndexSearcher isearcher = null;
    Directory directory = null;
    try {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(indexDir);

        // Create a query parser whose default field is "contents"
        QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "contents", analyzer);
        Query query = parser.parse(keyword); // obtain the Query object

        // Alternative: a BooleanQuery combining two TermQuery objects
        // Query query1 = new TermQuery(new Term("contents", keyword));
        // Query query2 = new TermQuery(new Term("contents", keyword2));
        // BooleanQuery query = new BooleanQuery();
        // query.add(query1, BooleanClause.Occur.SHOULD);
        // query.add(query2, BooleanClause.Occur.SHOULD);

        // Alternative: search several fields at once
        // QueryParser parser = new MultiFieldQueryParser(Version.LUCENE_CURRENT, new String[] {"path", "contents"}, analyzer);
        // Query query = parser.parse(keyword);

        isearcher = new IndexSearcher(directory, true); // create a read-only index searcher
        TopDocs ts = isearcher.search(query, null, 100); // run the search, keeping at most 100 hits

        int totalHits = ts.totalHits; // number of matching documents
        System.out.println("hits: " + totalHits);

        ScoreDoc[] hits = ts.scoreDocs; // information about the top hits
        for (int i = 0; i < hits.length; i++) {
            Document hitDoc = isearcher.doc(hits[i].doc); // fetch the document by its internal number
            System.out.println(hitDoc.get("contents")); // print the stored value of the "contents" field
        }
    } catch (IOException e) {
        e.printStackTrace();
    } catch (ParseException e) {
        e.printStackTrace();
    } finally {
        if (isearcher != null) {
            try {
                isearcher.close(); // close the searcher
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        if (directory != null) {
            try {
                directory.close(); // close the index directory
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

7. Delete documents from an index
IndexWriter provides deleteDocuments(Term term); // deletes every document in the index that contains the specified term
IndexReader also provides deleteDocuments(Term term).

8. Update an index
IndexWriter provides updateDocument(Term term, Document doc); // first deletes the documents containing term, then adds doc
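The delete and update semantics can be modeled with a plain list of documents. In this hypothetical sketch (UpdateDemo is not a Lucene class), a "term" is simply an exact field/value pair, and updateDocument deletes every matching document before adding the new one. Because Lucene matches the term against indexed terms, a key field used this way is normally indexed with Field.Index.NOT_ANALYZED.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy store illustrating deleteDocuments(Term) and updateDocument(Term, Document)
// semantics (NOT Lucene code).
class UpdateDemo {
    private final List<Map<String, String>> docs = new ArrayList<Map<String, String>>();

    void addDocument(Map<String, String> doc) {
        docs.add(doc);
    }

    // Deletes all documents whose field equals value; returns how many were removed.
    int deleteDocuments(String field, String value) {
        int before = docs.size();
        docs.removeIf(d -> value.equals(d.get(field)));
        return before - docs.size();
    }

    // Delete-then-add, as described for IndexWriter.updateDocument.
    void updateDocument(String field, String value, Map<String, String> doc) {
        deleteDocuments(field, value);
        addDocument(doc);
    }

    int numDocs() {
        return docs.size();
    }

    // Helper that builds a document with hypothetical "id" and "name" fields.
    static Map<String, String> doc(String id, String name) {
        Map<String, String> d = new HashMap<String, String>();
        d.put("id", id);
        d.put("name", name);
        return d;
    }

    public static void main(String[] args) {
        UpdateDemo w = new UpdateDemo();
        w.addDocument(doc("42", "old name"));
        w.updateDocument("id", "42", doc("42", "new name"));
        System.out.println(w.numDocs()); // 1: the old version was replaced, not duplicated
    }
}
```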

9. Common query types
1) TermQuery: queries by a single term (keyword). Constructor: TermQuery(Term t)
Query query = new TermQuery(new Term("contents", keyword));
isearcher = new IndexSearcher(FSDirectory.open(indexDir), true);
TopDocs ts = isearcher.search(query, null, 100);
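One pitfall worth noting: TermQuery does not run the analyzer, so its text must match the indexed term exactly. With StandardAnalyzer, indexed terms are lowercased, so new TermQuery(new Term("contents", "Lucene")) finds nothing, while the query that QueryParser builds from "Lucene" does match. The hypothetical sketch below models only that difference; lowercasing stands in for the full analysis QueryParser performs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Models why TermQuery needs the exact indexed term (NOT Lucene code).
class TermQueryDemo {
    // TermQuery: the text is compared as-is, with no analysis.
    static boolean termQueryMatches(Set<String> indexedTerms, String text) {
        return indexedTerms.contains(text);
    }

    // QueryParser: the text is analyzed (here only lowercased) before matching.
    static boolean parsedQueryMatches(Set<String> indexedTerms, String text) {
        return indexedTerms.contains(text.toLowerCase());
    }

    public static void main(String[] args) {
        // Terms StandardAnalyzer would index for "Lucene in Action" ("in" is a stop word).
        Set<String> indexed = new HashSet<String>(Arrays.asList("lucene", "action"));
        System.out.println(termQueryMatches(indexed, "Lucene"));   // false
        System.out.println(parsedQueryMatches(indexed, "Lucene")); // true
    }
}
```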

2) BooleanQuery: a Boolean query that combines several queries. Each clause is added with BooleanClause.Occur.MUST (required), SHOULD (optional), or MUST_NOT (excluded).
Query query1 = new TermQuery(new Term("contents", keyword));
Query query2 = new TermQuery(new Term("contents", keyword2));
BooleanQuery query = new BooleanQuery();
query.add(query1, BooleanClause.Occur.SHOULD);
query.add(query2, BooleanClause.Occur.SHOULD);

isearcher = new IndexSearcher(directory, true);

TopDocs ts = isearcher.search(query, null, 100);
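The matching rules of BooleanQuery clauses can be sketched with set operations on the documents each clause matches: MUST clauses are intersected, MUST_NOT clauses are subtracted, and with only SHOULD clauses at least one of them must match. When a MUST clause is present, SHOULD clauses in Lucene only raise the score, so they do not change the match set. BooleanDemo and its types are hypothetical, and scoring is not modeled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy evaluation of BooleanQuery clause semantics over matched-document sets (NOT Lucene code).
class BooleanDemo {
    enum Occur { MUST, SHOULD, MUST_NOT }

    static final class Clause {
        final Set<Integer> docs; // documents the sub-query matches
        final Occur occur;
        Clause(Set<Integer> docs, Occur occur) { this.docs = docs; this.occur = occur; }
    }

    static Set<Integer> match(List<Clause> clauses) {
        Set<Integer> must = null; // null means no MUST clause seen yet
        Set<Integer> should = new TreeSet<Integer>();
        Set<Integer> mustNot = new TreeSet<Integer>();
        for (Clause c : clauses) {
            switch (c.occur) {
                case MUST:
                    if (must == null) must = new TreeSet<Integer>(c.docs); else must.retainAll(c.docs);
                    break;
                case SHOULD:
                    should.addAll(c.docs);
                    break;
                default: // MUST_NOT
                    mustNot.addAll(c.docs);
            }
        }
        // With MUST clauses, SHOULD is optional; otherwise at least one SHOULD must match.
        Set<Integer> result = (must != null) ? must : should;
        result.removeAll(mustNot);
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> k1 = new TreeSet<Integer>(Arrays.asList(1, 2, 3)); // docs containing keyword
        Set<Integer> k2 = new TreeSet<Integer>(Arrays.asList(3, 4));    // docs containing keyword2
        System.out.println(match(Arrays.asList(new Clause(k1, Occur.SHOULD), new Clause(k2, Occur.SHOULD)))); // [1, 2, 3, 4]
        System.out.println(match(Arrays.asList(new Clause(k1, Occur.MUST), new Clause(k2, Occur.MUST))));     // [3]
    }
}
```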

3) MultiFieldQueryParser: queries several fields at once.
QueryParser parser = new MultiFieldQueryParser(Version.LUCENE_CURRENT, new String[] {"path", "contents"}, analyzer);
Query query = parser.parse(keyword);
isearcher = new IndexSearcher(FSDirectory.open(indexDir), true);
TopDocs ts = isearcher.search(query, null, 100);
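Conceptually, MultiFieldQueryParser turns a single keyword into one optional clause per field, so a document matches when the keyword appears in any of the listed fields; for the two fields above, the parsed query reads like path:keyword contents:keyword. The sketch below is a hypothetical model of that expansion and of the resulting union of matches, with lowercasing standing in for the analyzer; it is not Lucene code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Models what MultiFieldQueryParser does with one keyword (NOT Lucene code).
class MultiFieldDemo {
    // Builds the equivalent query text in Lucene's field:term syntax.
    static String expand(String keyword, String[] fields) {
        StringBuilder sb = new StringBuilder();
        for (String f : fields) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(f).append(':').append(keyword.toLowerCase());
        }
        return sb.toString();
    }

    // postings: field -> (term -> documents containing that term in that field).
    static Set<Integer> match(Map<String, Map<String, Set<Integer>>> postings, String keyword, String[] fields) {
        Set<Integer> hits = new TreeSet<Integer>();
        for (String f : fields) {
            Map<String, Set<Integer>> byTerm = postings.get(f);
            if (byTerm == null) continue;
            Set<Integer> docs = byTerm.get(keyword.toLowerCase());
            if (docs != null) hits.addAll(docs); // union across fields (OR semantics)
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] fields = {"path", "contents"};
        Map<String, Map<String, Set<Integer>>> postings = new HashMap<String, Map<String, Set<Integer>>>();
        postings.put("path", new HashMap<String, Set<Integer>>());
        postings.put("contents", new HashMap<String, Set<Integer>>());
        postings.get("path").put("lucene", new TreeSet<Integer>(Collections.singleton(1)));
        postings.get("contents").put("lucene", new TreeSet<Integer>(Collections.singleton(2)));
        System.out.println(expand("Lucene", fields));          // path:lucene contents:lucene
        System.out.println(match(postings, "Lucene", fields)); // [1, 2]
    }
}
```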

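10. Highlighter: highlight search hits on a web page.
1) Add contrib/highlighter/lucene-highlighter-2.9.1.jar to the classpath.
2) Sample code (query and analyzer come from the search code above):
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleFragmenter;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;

SimpleHTMLFormatter shf = new SimpleHTMLFormatter("<span style=\"color:red\">", "</span>"); // the default tags are <B> and </B>
// Construct the highlighter: the highlight format plus a scorer for the query
Highlighter highlighter = new Highlighter(shf, new QueryScorer(query));
// Set the text fragmenter; Integer.MAX_VALUE keeps the whole field value as one fragment
highlighter.setTextFragmenter(new SimpleFragmenter(Integer.MAX_VALUE));
// Throws IOException and InvalidTokenOffsetsException; returns null if no query term is found
String content = highlighter.getBestFragment(analyzer, "fieldname", "fieldvalue");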
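The core idea of SimpleHTMLFormatter can be illustrated without Lucene: find the query terms in the text and wrap each occurrence in the pre and post tags. The hypothetical sketch below uses a case-insensitive, whole-word regular expression; unlike the real Highlighter, it does not run an analyzer, score fragments, or choose the best fragment.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy highlighter (NOT Lucene's Highlighter).
class ToyHighlighter {
    // Wraps each whole-word, case-insensitive occurrence of any term in preTag/postTag.
    static String highlight(String text, Collection<String> terms, String preTag, String postTag) {
        if (terms.isEmpty()) return text;
        StringBuilder alternatives = new StringBuilder();
        for (String t : terms) {
            if (alternatives.length() > 0) alternatives.append('|');
            alternatives.append(Pattern.quote(t)); // treat each term literally
        }
        Pattern p = Pattern.compile("\\b(?:" + alternatives + ")\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher m = p.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Keep the original casing of the matched text inside the tags.
            m.appendReplacement(out, Matcher.quoteReplacement(preTag + m.group() + postTag));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = highlight("Lucene in Action covers lucene basics", Arrays.asList("lucene"),
                "<span style=\"color:red\">", "</span>");
        System.out.println(html);
        // <span style="color:red">Lucene</span> in Action covers <span style="color:red">lucene</span> basics
    }
}
```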

11. Optimization
1) Using IndexWriter
After the index is modified, call commit() (flush() in older releases) or close() for the changes to take effect.
2) Points to note when using IndexSearcher
Once an IndexSearcher is opened, documents added to the index afterwards are not visible to it.
IndexSearcher is thread-safe, so a single instance can serve multiple threads.
3) Best practices
Share one IndexSearcher among threads, and reopen it only after the index has been modified.
Share one IndexWriter among threads; it is thread-safe, but only one writer can be open on an index at a time, so access to the index must be coordinated strictly.
Modify the index asynchronously (for example, via JMS) to improve performance.
Create a separate index directory for each type of document.
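The first and third practices can be sketched with standard Java concurrency tools. In the hypothetical example below, a single writer thread applies every change taken from a BlockingQueue (an in-process stand-in for JMS), and readers share one immutable snapshot (standing in for a shared IndexSearcher) that is replaced only after the index changes. It illustrates the pattern only; no Lucene classes are involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

// Generic concurrency sketch (NOT Lucene code): one writer thread, shared reader snapshot.
class AsyncIndexDemo {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<String>();
    private final AtomicReference<List<String>> snapshot =
            new AtomicReference<List<String>>(Collections.<String>emptyList());
    private final List<String> index = new ArrayList<String>(); // touched only by the writer thread
    private final Thread writer = new Thread(this::runWriter, "index-writer");

    AsyncIndexDemo() {
        writer.setDaemon(true); // do not keep the JVM alive just for the writer
        writer.start();
    }

    // Any thread: enqueue a change and return immediately.
    void submit(String doc) {
        queue.add(doc);
    }

    // Any thread: all readers share the current immutable snapshot.
    List<String> searcher() {
        return snapshot.get();
    }

    private void runWriter() {
        try {
            while (true) {
                index.add(queue.take()); // apply one change
                // "Reopen": publish a new immutable snapshot only after the index changed.
                snapshot.set(Collections.unmodifiableList(new ArrayList<String>(index)));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncIndexDemo demo = new AsyncIndexDemo();
        demo.submit("doc-1");
        demo.submit("doc-2");
        while (demo.searcher().size() < 2) Thread.sleep(10); // wait for the writer to catch up
        System.out.println(demo.searcher()); // [doc-1, doc-2]
    }
}
```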

12. Integrate Lucene into the emall project to provide full-text search over product IDs, names, and descriptions.

13. Use Compass to simplify Lucene operations.
