Lucene-Based Case Development: Creating Indexes

Last Update:2015-01-20 Source: Internet

Author: User

When reprinting, please cite the source: http://blog.csdn.net/xiaojimanman/article/details/42872711

Starting with this post, both the API introductions and the subsequent case development are based on Lucene 4.3.1. The original post links to the download page for Lucene 4.3.1 and other Lucene versions, and to the official Lucene 4.3.1 API documentation.

Index creation demo

Before getting started, let's first look at a simple index creation demo program:

/**
 * @Description: index creation demo
 */
package com.lulei.lucene.study;

import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class IndexCreate {

    public static void main(String[] args) {
        // Analyzer used for indexing; here, the standard analyzer
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_43);
        // IndexWriter configuration
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_43, analyzer);
        // Open mode: create the index if none exists, otherwise open the existing one
        indexWriterConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);
        Directory directory = null;
        IndexWriter indexWrite = null;
        try {
            // Path on disk where the index is stored
            directory = FSDirectory.open(new File("D://study/index/testindex"));
            // If the index is locked, unlock it
            if (IndexWriter.isLocked(directory)) {
                IndexWriter.unlock(directory);
            }
            // Create the IndexWriter that performs the index operations
            indexWrite = new IndexWriter(directory, indexWriterConfig);
        } catch (Exception e) {
            e.printStackTrace();
        }

        // Create document 1
        Document doc1 = new Document();
        // Set the name field to "test title" and store the field value
        doc1.add(new TextField("name", "test title", Store.YES));
        // Set the content field to "Test content" and store the field value
        doc1.add(new TextField("content", "Test content", Store.YES));
        try {
            // Write the document to the index
            indexWrite.addDocument(doc1);
        } catch (Exception e) {
            e.printStackTrace();
        }

        // Create document 2
        Document doc2 = new Document();
        doc2.add(new TextField("name", "lucene-based case development: Index mathematical model", Store.YES));
        doc2.add(new TextField("content", "lucene divides a document into several domains, "
                + "and each domain is divided into several tokens, "
                + "the document is converted to an n-dimensional space vector "
                + "by the importance of the word element in the document, "
                + "and the similarity between the two documents is calculated "
                + "by calculating the cosine of the angle between the two vectors", Store.YES));
        try {
            // Write the document to the index
            indexWrite.addDocument(doc2);
        } catch (Exception e) {
            e.printStackTrace();
        }

        // Commit the IndexWriter operations; without a commit,
        // the preceding operations are not saved to disk
        try {
            // Committing consumes system resources, so commits should follow a deliberate policy
            indexWrite.commit();
            // Release resources
            indexWrite.close();
            directory.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The program above is commented in detail, so the role of each statement is not described again here. Next, let's look at the index files created after the main function runs, as shown below:

The index viewing tool Luke makes it easy to inspect the contents of the index, as shown below:

The two figures above show that the index holds two documents, with 50 terms in the content field and 18 terms in the name field. The index also stores the field values of each document, because both fields are added with Store.YES.

Core classes for creating an index

Creating an index involves several core classes: IndexWriter, Directory, Analyzer, Document, and Field.

IndexWriter

IndexWriter is the core component of the indexing process. It creates a new index or opens an existing one, and it adds, deletes, and updates documents in the index. IndexWriter needs somewhere to store the index, and that storage is provided by Directory.
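The add, delete, and update operations, together with the commit step from the demo, can be pictured with a toy in-memory model. The sketch below is plain Java and does not use Lucene: the class ToyWriter, its methods, and the doc helper are hypothetical names chosen only to mirror the operations described above, and matching is done on an exact field value rather than on an indexed term. It illustrates two ideas under those assumptions: an update behaves like a delete followed by an add, and changes become visible only after a commit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a writer with buffered changes; hypothetical, not Lucene's IndexWriter. */
final class ToyWriter {
    // Documents visible to readers (the "saved" state).
    private final List<Map<String, String>> committed = new ArrayList<>();
    // Working copy that receives adds, deletes, and updates until commit() is called.
    private final List<Map<String, String>> pending = new ArrayList<>();

    /** Hypothetical helper: builds a document with an id field and a name field. */
    static Map<String, String> doc(String id, String name) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", id);
        fields.put("name", name);
        return fields;
    }

    void addDocument(Map<String, String> document) {
        pending.add(new LinkedHashMap<>(document));
    }

    /** Removes every pending document whose field equals the given value. */
    void deleteDocuments(String field, String value) {
        pending.removeIf(d -> value.equals(d.get(field)));
    }

    /** Update = delete the matching documents, then add the new one. */
    void updateDocument(String field, String value, Map<String, String> document) {
        deleteDocuments(field, value);
        addDocument(document);
    }

    /** Publishes the working copy; before this call, readers still see the old state. */
    void commit() {
        committed.clear();
        committed.addAll(pending);
    }

    int committedCount() {
        return committed.size();
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();
        writer.addDocument(doc("1", "test title"));
        System.out.println(writer.committedCount()); // 0: nothing committed yet
        writer.commit();
        System.out.println(writer.committedCount()); // 1
        writer.updateDocument("id", "1", doc("1", "new title"));
        writer.commit();
        System.out.println(writer.committedCount()); // 1: the old version was replaced
    }
}
```

In the real demo the same roles are played by addDocument and commit on IndexWriter; the toy model only makes the buffering visible.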

Directory

The Directory class describes where a Lucene index is stored. It is an abstract class, and its subclasses determine how and where the index is actually stored. The example above uses the FSDirectory.open method to obtain a Directory backed by real files in the file system, and then passes it to the IndexWriter constructor.

Analyzer

Document content must be processed by an Analyzer before it is indexed. The example above uses StandardAnalyzer. Later posts will introduce the various analyzers and the scenarios they suit.
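To give a rough idea of what analysis means, the sketch below splits text into lowercase tokens using plain Java. It is a simplified stand-in and not the algorithm used by StandardAnalyzer; the class name ToyAnalyzer and the splitting rule (break on anything that is not a letter or digit) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy stand-in for an analyzer; hypothetical, NOT Lucene's StandardAnalyzer. */
final class ToyAnalyzer {

    /** Splits text on any run of non-letter, non-digit characters and lowercases each token. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            // A leading separator produces one empty piece; skip it.
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [lucene, based, case, development, index, mathematical, model]
        System.out.println(analyze("lucene-based case development: Index mathematical model"));
    }
}
```

The tokens produced by the analyzer are what end up in the index as terms, which is why the choice of analyzer affects what a search can match.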

Document

The Document object has a relatively simple structure: it is a container for multiple Field objects. Each Document in the example above contains two fields, name and content.

Field

Each document in the index contains one or more named fields. Each field has a name, a corresponding value, and a set of options that precisely control how Lucene indexes that value. A document may also contain several fields with the same name; at search time they behave as if their text were concatenated and processed as a single text field.
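The relationship between documents, fields, and searchable terms can be sketched as a small per-field inverted index. The code below is plain Java and does not use Lucene: ToyFieldIndex, its methods, and the doc helper are hypothetical, the tokenization rule is a simplification, and the real index format is far more elaborate. It shows, under those assumptions, how each field keeps its own set of terms and how a term lookup in one field returns the matching document ids.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy per-field inverted index; hypothetical, not Lucene's index structure. */
final class ToyFieldIndex {
    // field name -> term -> ids of the documents containing that term in that field
    private final Map<String, Map<String, Set<Integer>>> postings = new TreeMap<>();
    private int nextDocId = 0;

    /** Hypothetical helper: a document with a name field and a content field. */
    static Map<String, String> doc(String name, String content) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", name);
        fields.put("content", content);
        return fields;
    }

    /** Indexes every field of the document separately and returns the new document id. */
    int addDocument(Map<String, String> fields) {
        int docId = nextDocId++;
        for (Map.Entry<String, String> field : fields.entrySet()) {
            Map<String, Set<Integer>> terms =
                    postings.computeIfAbsent(field.getKey(), k -> new TreeMap<>());
            String lowered = field.getValue().toLowerCase(Locale.ROOT);
            for (String token : lowered.split("[^\\p{L}\\p{N}]+")) {
                if (!token.isEmpty()) {
                    terms.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
                }
            }
        }
        return docId;
    }

    /** Returns the ids of the documents whose given field contains the term. */
    Set<Integer> search(String field, String term) {
        Map<String, Set<Integer>> terms = postings.get(field);
        if (terms == null) {
            return Collections.<Integer>emptySet();
        }
        Set<Integer> ids = terms.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.<Integer>emptySet() : ids;
    }

    /** Number of distinct terms recorded for a field. */
    int termCount(String field) {
        Map<String, Set<Integer>> terms = postings.get(field);
        return terms == null ? 0 : terms.size();
    }

    public static void main(String[] args) {
        ToyFieldIndex index = new ToyFieldIndex();
        index.addDocument(doc("test title", "Test content"));
        index.addDocument(doc("lucene-based case development", "lucene divides a document into fields"));
        System.out.println(index.search("name", "title"));     // [0]
        System.out.println(index.search("content", "lucene")); // [1]
        System.out.println(index.termCount("name"));           // 6
    }
}
```

Keeping the terms of each field separate is what allows a search to target one field, such as name, without matching text that appears only in content.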

The preceding core classes are very important and commonly used in Lucene operations. For more information, see the official API documentation.
