Case Development Based on Lucene: Creating Indexes


When reprinting, please credit the source: http://blog.csdn.net/xiaojimanman/article/details/42872711

Starting with this post, both the API introductions and the subsequent case development are based on Lucene 4.3.1. Lucene 4.3.1 and other Lucene releases can be downloaded from the Apache Lucene project, which also publishes the official API documentation for Lucene 4.3.1.


Index creation demo

Before getting started, let's first look at a simple index creation demo program:

/**
 * @Description: index creation demo
 */
package com.lulei.lucene.study;

import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class IndexCreate {
    public static void main(String[] args) {
        // Analyzer (word segmentation) used for indexing; here, the standard analyzer
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_43);
        // IndexWriter configuration
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_43, analyzer);
        // Open mode: create the index if none exists, otherwise open the existing one
        indexWriterConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);
        Directory directory = null;
        IndexWriter indexWrite = null;
        try {
            // Path on disk where the index is stored
            directory = FSDirectory.open(new File("D://study/index/testindex"));
            // If the index is locked, unlock it
            if (IndexWriter.isLocked(directory)) {
                IndexWriter.unlock(directory);
            }
            // Create the IndexWriter that performs the index operations
            indexWrite = new IndexWriter(directory, indexWriterConfig);
        } catch (Exception e) {
            e.printStackTrace();
        }

        // Create document 1
        Document doc1 = new Document();
        // Set the name field to "test title" and store the field value
        doc1.add(new TextField("name", "test title", Store.YES));
        // Set the content field to "Test content" and store the field value
        doc1.add(new TextField("content", "Test content", Store.YES));
        try {
            // Write the document to the index
            indexWrite.addDocument(doc1);
        } catch (Exception e) {
            e.printStackTrace();
        }

        // Create document 2
        Document doc2 = new Document();
        doc2.add(new TextField("name", "lucene-based case development: Index mathematical model", Store.YES));
        doc2.add(new TextField("content", "lucene divides a document into several domains, "
                + "and each domain is divided into several tokens, the document is converted to "
                + "an n-dimensional space vector by the importance of the word element in the document, "
                + "and the similarity between the two documents is calculated by calculating "
                + "the cosine of the angle between the two vectors", Store.YES));
        try {
            // Write the document to the index
            indexWrite.addDocument(doc2);
        } catch (Exception e) {
            e.printStackTrace();
        }

        // Commit the IndexWriter; without a commit, the operations above are not saved to disk
        try {
            // Committing is expensive, so commits should follow a deliberate policy
            indexWrite.commit();
            // Release resources
            indexWrite.close();
            directory.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The program above is commented in detail, so the role of each statement is not repeated here. Next, let's look at the index files created after the main method runs:

[Figure: index files created in the index directory]


The index viewing tool Luke makes it easy to inspect the contents of the index:

[Figure: index contents viewed in Luke]



From the two figures above, we can see that the index contains two documents; the content field has 50 terms and the name field has 18 terms. The index also stores the field values of each document.


Core classes for creating an index

Creating an index involves several core classes: IndexWriter, Directory, Analyzer, Document, and Field.

IndexWriter

IndexWriter (the index writer) is the core component of the indexing process. It creates a new index or opens an existing one, and adds, deletes, and updates documents in the index. IndexWriter needs somewhere to store the index; providing that storage is the job of Directory.
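To make the idea of "adding documents to an index" more concrete, here is a minimal in-memory sketch that uses only the Java standard library. The class TinyIndex and its methods are hypothetical names invented for this illustration; they are not part of Lucene, and Lucene's real on-disk index structure is far more elaborate. The sketch assumes that terms are produced simply by lowercasing and splitting on whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory inverted index; an illustration only, not Lucene API. */
final class TinyIndex {
    // term -> IDs of the documents that contain it, in the order they were added
    private final Map<String, List<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    /** Adds one document's text to the index and returns the ID assigned to it. */
    int addDocument(String text) {
        int docId = nextDocId++;
        // Simplified analysis (an assumption of this sketch): lowercase, split on whitespace
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue; // split() can yield an empty first element for leading whitespace
            }
            List<Integer> docs = postings.computeIfAbsent(term, k -> new ArrayList<>());
            // Record each document at most once per term
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
        return docId;
    }

    /** Returns the IDs of the documents that contain the term (empty if none). */
    List<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new ArrayList<>());
    }

    /** Number of documents added so far. */
    int numDocs() {
        return nextDocId;
    }
}
```

Looking up a term then only requires reading its posting list rather than scanning every document, which is the basic reason an index is built before searching.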

Directory

The Directory class describes where a Lucene index is stored. It is an abstract class, and its subclasses determine how and where the index is stored. In the example above, the FSDirectory.open method returns a Directory backed by a real path in the file system, which is then passed to the IndexWriter constructor.

Analyzer

Document text must be processed by an Analyzer before it is indexed. The example above uses the standard analyzer (StandardAnalyzer). Later posts will introduce other analyzers and the scenarios each one suits.
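As a rough picture of what analysis does to text, the following sketch splits a string into terms, lowercases them, and drops stop words. It is a simplified stand-in written with the Java standard library only, not Lucene's StandardAnalyzer, whose tokenization rules are more sophisticated. The class name AnalyzerSketch and the tiny stop-word list are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical stand-in for an analyzer; not Lucene's StandardAnalyzer. */
final class AnalyzerSketch {
    // A tiny sample stop-word list chosen for this illustration only
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "of", "to"));

    /** Splits text into lowercase terms and drops stop words. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Split on any run of characters that are neither letters nor digits
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            String term = token.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }
}
```

For example, AnalyzerSketch.analyze("The Title of a Test") yields the two terms title and test; only terms that survive analysis end up in the index and can be matched at search time.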

Document

The Document object has a fairly simple structure: it is a container for multiple Field objects. Each Document in the example above contains two fields, name and content.

Field

Each document in the index contains one or more named fields. Each field has a name, a corresponding value, and a set of options that precisely control how Lucene indexes that value. A document may contain several fields with the same name; at search time it is as if the text of those same-named fields were concatenated and handled as a single text field.
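The relationship between a document and its fields, including several fields that share one name, can be pictured with the small model below. It is a sketch under stated assumptions: DocumentSketch, add, and textOf are hypothetical names, not Lucene classes or methods, and joining values with a single space is a simplification of how Lucene treats same-named fields.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a document as a list of named fields; not Lucene API. */
final class DocumentSketch {
    // Each entry is a {name, value} pair, kept in the order it was added
    private final List<String[]> fields = new ArrayList<>();

    /** Adds a field; the same name may be added more than once. */
    void add(String name, String value) {
        fields.add(new String[] {name, value});
    }

    /** Joins the values of all fields with the given name, in insertion order. */
    String textOf(String name) {
        StringBuilder joined = new StringBuilder();
        for (String[] field : fields) {
            if (field[0].equals(name)) {
                if (joined.length() > 0) {
                    joined.append(' ');
                }
                joined.append(field[1]);
            }
        }
        return joined.toString();
    }
}
```

Adding "first part" and then "second part" under the name content makes textOf("content") return "first part second part", which mirrors the idea that same-named fields behave like one longer text field.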


The preceding core classes are very important and commonly used in Lucene operations. For more information, see the official API documentation.

