Lucene: Full-Text Index and Incremental Index

Source: Internet
Author: User


I have recently been getting familiar with Lucene, a tool many people have already heard of. Out of curiosity, I started to study how it works. What impressed me most is how heavily it relies on index tables; this is the main reason the tool is fast. Today I will only walk through an example I recently learned: creating an index.
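To make the idea of an index table concrete, here is a minimal sketch of an inverted index in plain Java. It is a toy, not Lucene's actual data structure: the class name InvertedIndexSketch and its methods are made up for illustration. It only shows why a lookup by term can avoid scanning every document.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each term to the sorted ids of the documents containing it.
// Hypothetical illustration only; Lucene's real index structures are far more elaborate.
class InvertedIndexSketch {
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private final List<String> documents = new ArrayList<>();

    // Stores the text and records every lowercased term under the new document id.
    int add(String text) {
        int docId = documents.size();
        documents.add(text);
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Looks the term up directly in the map instead of scanning all documents.
    Set<Integer> search(String term) {
        TreeSet<Integer> hits = postings.get(term.toLowerCase());
        return hits == null
                ? Collections.<Integer>emptySet()
                : Collections.unmodifiableSet(hits);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add("Lucene is a library for information retrieval");
        index.add("Lucene case");
        index.add("geek College");
        System.out.println(index.search("lucene"));  // prints [0, 1]
        System.out.println(index.search("college")); // prints [2]
    }
}
```

The cost of a search here depends on the map lookup rather than on the number of stored documents, which is the intuition behind the speed described above.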

Lucene is a library for information retrieval. You can use it to add indexing and search functions to your applications. Lucene users do not need a deep understanding of full-text search; using the classes in the library is enough to implement full-text search in an application. However, do not think of Lucene as a search engine like Google. Lucene is not even an application. It is just a tool, a library. You can also think of it as a simple, easy-to-use API that encapsulates indexing and searching. With this API you can build many kinds of search, and it is very convenient.

So what can Lucene do? Lucene can index and search any data that can be converted into text, regardless of the source format. Whether it is MS Word, HTML, PDF, or another kind of file, as long as you can extract the text content from it, Lucene can index it and search it. The following is a small example that creates an index:
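As a rough illustration of "extracting text content" before indexing, the sketch below strips tags from an HTML string in plain Java. The class TextExtractionSketch and the method htmlToText are hypothetical names, and regex-based tag stripping is a crude assumption that only suits simple, well-formed input; a real project would normally rely on a dedicated parser for each format.

```java
import java.util.regex.Pattern;

// Crude HTML-to-text conversion for illustration only.
// Hypothetical helper; it does not handle scripts, comments, or entities.
class TextExtractionSketch {
    private static final Pattern TAGS = Pattern.compile("<[^>]*>");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    // Replaces every tag with a space, then collapses runs of whitespace.
    static String htmlToText(String html) {
        String withoutTags = TAGS.matcher(html).replaceAll(" ");
        return SPACES.matcher(withoutTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Lucene case</h1>"
                + "<p>Full-text <b>search</b></p></body></html>";
        // The resulting plain text is what would be handed to the indexer.
        System.out.println(htmlToText(html)); // prints Lucene case Full-text search
    }
}
```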

    package com.jikexueyuan.study;

    import java.io.File;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field.Store;
    import org.apache.lucene.document.IntField;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.IndexWriterConfig.OpenMode;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class IndexCreate {

        public static void main(String[] args) {
            // StandardAnalyzer splits English text on spaces and punctuation;
            // each Chinese character is treated as a separate word.
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_46);

            // The writer uses this analyzer to split incoming text into words
            // and stores those words in the index files, which makes searching fast.
            IndexWriterConfig indexWriterConfig =
                    new IndexWriterConfig(Version.LUCENE_46, analyzer);
            // Create the index if it does not exist yet, otherwise append to it.
            indexWriterConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);

            Directory directory = null;
            IndexWriter indexWriter = null;
            try {
                // Directory is the abstract class describing where the index is stored.
                // FSDirectory keeps it in the file system, RAMDirectory keeps it in memory.
                directory = FSDirectory.open(new File("E://index/test"));
                // isLocked and unlock are static methods of IndexWriter.
                if (IndexWriter.isLocked(directory)) {
                    IndexWriter.unlock(directory);
                }
                indexWriter = new IndexWriter(directory, indexWriterConfig);

                Document doc = new Document();
                doc.add(new StringField("id", "abcde", Store.YES));
                doc.add(new TextField("content", "geek College", Store.YES));
                doc.add(new IntField("num", 1, Store.YES));
                indexWriter.addDocument(doc); // add the document (insert)

                Document doc1 = new Document();
                doc1.add(new StringField("id", "sdfsd", Store.YES));
                doc1.add(new TextField("content", "Lucene case", Store.YES));
                doc1.add(new IntField("num", 1, Store.YES));
                indexWriter.addDocument(doc1);

                indexWriter.commit();
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                // Release the writer and the directory even if indexing failed.
                try {
                    if (indexWriter != null) {
                        indexWriter.close();
                    }
                    if (directory != null) {
                        directory.close();
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }
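The comment in the example says that StandardAnalyzer splits English text on spaces and punctuation and treats each Chinese character as a word. The plain-Java sketch below imitates only that described behavior so it can be tried without Lucene. TokenizerSketch and tokenize are hypothetical names, and the real analyzer applies further rules that this sketch does not attempt to reproduce.

```java
import java.util.ArrayList;
import java.util.List;

// Rough imitation of the splitting behavior described in the example's comment.
// Hypothetical illustration; this is not Lucene's StandardAnalyzer.
class TokenizerSketch {

    // Letters and digits are grouped into lowercased words; every CJK ideograph
    // becomes its own token; anything else acts as a separator.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isIdeographic(cp)) {
                flush(word, tokens);
                tokens.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                flush(word, tokens);
            }
        }
        flush(word, tokens);
        return tokens;
    }

    // Emits the pending word, if any, and resets the buffer.
    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        // "\u6781\u5ba2" is two CJK ideographs, written as escapes to stay encoding-safe.
        System.out.println(tokenize("Lucene case, \u6781\u5ba2"));
    }
}
```

Running it on the sample string yields the two English words in lowercase followed by one token per ideograph, which is the kind of word list that ends up in the index.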


Running the program generates a series of index-related files in the index directory.

From the example above, we can see that five elements are required to create an index:

1. IndexWriter

2. Directory

3. Analyzer

4. Document

5. Field
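How these pieces relate to each other can be sketched with a toy model in plain Java. Every class below (ToyField, ToyDocument, ToyAnalyzer, ToyDirectory, ToyWriter) is hypothetical and only borrows the role names from Lucene; the tokenized flag is an assumption used to mimic the difference between a TextField, whose value is split into words, and a StringField, whose value is indexed as a single term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy model of the five roles; not Lucene code, only an illustration of how they cooperate.
class FiveRolesSketch {

    // Field: a named value. 'tokenized' mimics TextField (true) vs StringField (false).
    static final class ToyField {
        final String name;
        final String value;
        final boolean tokenized;

        ToyField(String name, String value, boolean tokenized) {
            this.name = name;
            this.value = value;
            this.tokenized = tokenized;
        }
    }

    // Document: a collection of fields.
    static final class ToyDocument {
        final List<ToyField> fields = new ArrayList<>();

        ToyDocument add(ToyField field) {
            fields.add(field);
            return this;
        }
    }

    // Analyzer: turns text into a list of terms.
    interface ToyAnalyzer {
        List<String> analyze(String text);
    }

    // Directory: where index data lives; here an in-memory map "field:term" -> document ids.
    static final class ToyDirectory {
        final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    }

    // Writer: runs each field through the analyzer if needed and records terms in the directory.
    static final class ToyWriter {
        private final ToyDirectory directory;
        private final ToyAnalyzer analyzer;
        private int nextDocId = 0;

        ToyWriter(ToyDirectory directory, ToyAnalyzer analyzer) {
            this.directory = directory;
            this.analyzer = analyzer;
        }

        int addDocument(ToyDocument doc) {
            int docId = nextDocId++;
            for (ToyField field : doc.fields) {
                List<String> terms = field.tokenized
                        ? analyzer.analyze(field.value)
                        : Arrays.asList(field.value);
                for (String term : terms) {
                    directory.postings
                            .computeIfAbsent(field.name + ":" + term, k -> new TreeSet<>())
                            .add(docId);
                }
            }
            return docId;
        }
    }

    // Rebuilds the two sample documents from the article with the toy classes.
    static ToyDirectory buildSample() {
        ToyAnalyzer analyzer = text -> Arrays.asList(text.toLowerCase().split("\\s+"));
        ToyDirectory directory = new ToyDirectory();
        ToyWriter writer = new ToyWriter(directory, analyzer);
        writer.addDocument(new ToyDocument()
                .add(new ToyField("id", "abcde", false))
                .add(new ToyField("content", "geek College", true)));
        writer.addDocument(new ToyDocument()
                .add(new ToyField("id", "sdfsd", false))
                .add(new ToyField("content", "Lucene case", true)));
        return directory;
    }

    public static void main(String[] args) {
        // Sorted view of the terms that were recorded for each field.
        System.out.println(new TreeSet<>(buildSample().postings.keySet()));
    }
}
```

In this model the content values end up as separate lowercased words, while each id is kept whole, which is why an id field suits exact lookups and a content field suits word search.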

I will keep sharing what I learn about Lucene, and I hope more and more people will join in.

Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger.
