Lucene Study Summary II: Overall Lucene Architecture

Last Update: 2018-12-08 Source: Internet

Author: User

Lucene is:

An efficient, scalable full-text retrieval library.
It is implemented entirely in Java and requires no configuration.
It supports indexing and searching of plain text only.
It is not responsible for extracting plain text from files in other formats, or for fetching files from the network.

Lucene in Action describes Lucene's architecture and process as follows:

Lucene has two processes, indexing and searching, which together cover index creation, the index itself, and searching.

Let's take a closer look at Lucene's components:

A Document object represents a document to be indexed.
IndexWriter adds documents to the index through its addDocument function; this is how the index is created.
Lucene's index is an inverted index.
When a user makes a request, a Query object represents the user's query statement.
IndexSearcher searches the Lucene index through its search function.
IndexSearcher calculates term weights and scores, and returns the results to the user.
The set of documents returned to the user is represented by TopDocsCollector.
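
To make the idea of an inverted index concrete, here is a minimal sketch in plain Python. It is a toy model, not Lucene's API or its on-disk format: the function name `build_inverted_index` and the sample documents are made up for illustration. An inverted index maps each term to the documents that contain it, so a lookup starts from the term rather than scanning every document.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


# Hypothetical sample documents, keyed by document id.
docs = {
    0: "Lucene is a search library",
    1: "An index makes search fast",
}
index = build_inverted_index(docs)
print(index["search"])  # [0, 1]
print(index["lucene"])  # [0]
```

A real index also stores term frequencies and positions so that results can be scored, but the term-to-documents mapping is the core structure.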

So how are these components used?

Let's look in more detail at the Lucene API calls made during indexing and searching.

The indexing process is as follows:
- Create an IndexWriter, which is used to write the index files. It takes several parameters: index_dir is the location where the index files are stored, and the Analyzer performs lexical analysis and language processing on the document.
- Create a Document to represent the document to be indexed.
- Add different Fields to the document. A document contains several kinds of information, such as the title, author, modification time, and content. Each kind of information is represented by a different Field. In this example, two kinds of information are indexed: the file path and the file content. The FileReader's src_file is the source file to be indexed.
- IndexWriter calls the addDocument function to write the index to the index folder.
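
The steps above can be mirrored in a small in-memory model. This is a sketch under stated assumptions, not Lucene code: `ToyAnalyzer` and `ToyIndexWriter` are hypothetical stand-ins, the index lives in memory instead of an index directory, and the two fields (`path` and `contents`) follow the example described above.

```python
import re
from collections import defaultdict


class ToyAnalyzer:
    """Hypothetical stand-in for an analyzer: lowercases text and splits it into terms."""

    def tokens(self, text):
        return re.findall(r"[a-z0-9]+", text.lower())


class ToyIndexWriter:
    """Hypothetical stand-in for an index writer that keeps the index in memory."""

    def __init__(self, analyzer):
        self.analyzer = analyzer
        self.stored_paths = []            # doc id -> stored "path" field
        self.postings = defaultdict(set)  # term -> ids of docs whose contents contain it

    def add_document(self, doc):
        doc_id = len(self.stored_paths)
        self.stored_paths.append(doc["path"])  # stored as-is, not analyzed
        for term in self.analyzer.tokens(doc["contents"]):
            self.postings[term].add(doc_id)
        return doc_id


writer = ToyIndexWriter(ToyAnalyzer())
writer.add_document({"path": "notes/a.txt", "contents": "Lucene builds an inverted index."})
writer.add_document({"path": "notes/b.txt", "contents": "Searching the index is fast."})
print(sorted(writer.postings["index"]))  # [0, 1]
```

The split between a field that is only stored (the path) and a field that is analyzed into terms (the contents) is the design point to notice: only analyzed text ends up in the postings.
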
The search process is as follows:
- IndexReader reads the index information from disk into memory; index_dir is the location where the index files are stored.
- Create an IndexSearcher to prepare for searching.
- Create an Analyzer to perform lexical analysis and language processing on the query statement.
- Create a QueryParser to perform syntax analysis on the query statement.
- QueryParser calls parse to perform syntax analysis, forming a query syntax tree that is stored in a Query.
- IndexSearcher calls search to run the Query syntax tree against the index. The result is a TopScoreDocCollector.
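
The search steps can be sketched the same way in plain Python. Everything here is an illustrative assumption rather than Lucene's API: `INDEX` is a tiny hand-written index standing in for one read from disk, `parse` produces a flat OR query instead of a full syntax tree, and the score is simply the summed term frequency.

```python
import heapq
import re

# A tiny prebuilt index standing in for what would be read from disk:
# term -> {doc id: term frequency}
INDEX = {
    "lucene": {0: 2, 2: 1},
    "index": {0: 1, 1: 1},
    "search": {1: 3, 2: 1},
}


def analyze(text):
    """Stand-in for the analyzer: lowercase and split into terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def parse(query_text):
    """Stand-in for query parsing: an OR query over the analyzed terms."""
    return ("OR", analyze(query_text))


def search(index, query, n):
    """Score each doc by summed term frequency and keep the n best hits."""
    _, terms = query
    scores = {}
    for term in terms:
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] = scores.get(doc_id, 0) + tf
    # Highest score first; ties broken by lower doc id.
    return heapq.nsmallest(n, scores.items(), key=lambda hit: (-hit[1], hit[0]))


hits = search(INDEX, parse("Lucene search"), n=2)
print(hits)  # [(1, 3), (0, 2)]
```

Keeping only the best n hits, rather than sorting every match, is the role a top-documents collector plays in the process described above.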

These are simple calls to Lucene API functions.

However, once we look into the source code, we find that Lucene has many packages, and the relationships among them are complicated.

Even so, it is not hard to see that each of Lucene's source code modules implements a part of the general indexing and search process.

This figure shows the Lucene package structure corresponding to the full-text search process described in the previous section. (See the article "Open Source Full-Text Search Engine Lucene" at http://www.e.com.cn/about.htm.)

Lucene's analysis module is mainly responsible for lexical analysis and language processing, which produce Terms.
Lucene's index module is mainly responsible for creating the index; it contains IndexWriter.
Lucene's store module is mainly responsible for reading and writing the index.
Lucene's queryParser module is mainly responsible for syntax analysis.
Lucene's search module is mainly responsible for searching the index.
Lucene's similarity module is mainly responsible for relevance scoring.
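
As a rough illustration of what relevance scoring involves, the sketch below computes a textbook TF-IDF weight in plain Python. This is an assumption chosen for illustration: it is the classic formula, not Lucene's actual similarity implementation, and the function name `tf_idf` is hypothetical.

```python
import math


def tf_idf(term_freq, doc_freq, num_docs):
    """Textbook TF-IDF weight: term frequency times log inverse document frequency."""
    return term_freq * math.log(num_docs / doc_freq)


# A term that appears in every document carries no weight ...
print(tf_idf(term_freq=3, doc_freq=4, num_docs=4))  # 0.0
# ... while a rarer term weighs more than a common one.
print(tf_idf(term_freq=3, doc_freq=1, num_docs=4) > tf_idf(term_freq=3, doc_freq=2, num_docs=4))  # True
```

The intuition is that a term is a stronger signal of relevance when it occurs often in one document but rarely across the collection.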

Now that we understand the overall structure of Lucene, we can begin our journey through the Lucene source code.

In addition:

CSDN link to this article: http://blog.csdn.net/forfuture1978/archive/2009/10/30/4745802.aspx

JavaEye link to this article: http://forfuture1978.javaeye.com/blog/546808

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
