Lucene Learning Summary Two: The overall structure of Lucene

Last Update:2014-12-23 Source: Internet

Author: User

This article was reproduced from: http://www.cnblogs.com/forfuture1978/archive/2009/12/14/1623596.html

In general, Lucene is:

An efficient, extensible full-text retrieval library.
Implemented entirely in Java, with no configuration required.
Limited to indexing and searching plain text.
Not responsible for extracting plain text from other file formats or for fetching files from the network.

Lucene in Action illustrates the architecture and workflow of Lucene with a figure (not reproduced here).

It shows that Lucene consists of two processes, indexing and searching, which cover three key points: index creation, the index itself, and search.

Let's look at the various components of Lucene in more detail:

The Document object represents a document to be indexed.
IndexWriter adds documents to the index through its addDocument method, which implements index creation.
Lucene's index is an inverted index.
When a user issues a request, a Query object represents the user's query.
IndexSearcher searches the Lucene index through its search method.
IndexSearcher computes term weights and scores and returns the results to the user.
The set of documents returned to the user is represented by TopDocsCollector.
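
To make the inverted-index idea concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class InvertedIndexSketch and its methods are hypothetical names chosen for illustration, and the tokenizer is only a crude stand-in for a real analyzer. It shows the core data structure, a map from each term to the ids of the documents that contain it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: maps each term to the sorted set of ids of documents containing it.
// This is NOT Lucene code; all names are hypothetical.
class InvertedIndexSketch {
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    // Crude stand-in for an analyzer: lowercase, then split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Adds a document to the index and returns the id assigned to it.
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Returns the ids of the documents that contain the term (empty if none).
    SortedSet<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument("Lucene is a full-text retrieval library");
        index.addDocument("An inverted index maps terms to documents");
        index.addDocument("Lucene builds an inverted index");
        System.out.println(index.lookup("lucene"));   // prints [0, 2]
        System.out.println(index.lookup("inverted")); // prints [1, 2]
    }
}
```

Looking up a term is a single map access, which is why searching an inverted index does not require scanning every document.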

So how do you apply these components?

Let's walk through the Lucene API calls made during indexing and searching in more detail.

The indexing process is as follows:
- Create an IndexWriter to write the index files. It takes several parameters: INDEX_DIR is the location where the index files are stored, and the Analyzer performs lexical analysis and language processing on the documents.
- Create a Document to represent the document we want to index.
- Add different Fields to the Document. A document carries various kinds of information, such as title, author, modification time, and content, and each kind is represented by a different Field. In this example, two kinds of information are indexed: the file path and the file content. The SRC_FILE passed to FileReader is the source file to be indexed.
- IndexWriter calls addDocument to write the index to the index folder.
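
The indexing steps above can be modeled with a small, self-contained sketch. This is a toy in plain Java, not Lucene's API: the classes Field, Doc, and Writer and the "field:term" key format are assumptions made for illustration. The choice to index the path as a single unanalyzed term is also an assumption, since the original example code is not shown here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy model of the indexing steps. This is NOT Lucene's API:
// every class, field, and method name here is hypothetical.
class IndexingSketch {

    // One piece of information about a document, e.g. its path or its contents.
    static final class Field {
        final String name;
        final String value;
        final boolean analyzed; // true: split into terms; false: index the value as one term

        Field(String name, String value, boolean analyzed) {
            this.name = name;
            this.value = value;
            this.analyzed = analyzed;
        }
    }

    // A document is simply a list of fields.
    static final class Doc {
        final List<Field> fields = new ArrayList<>();

        Doc add(Field field) {
            fields.add(field);
            return this;
        }
    }

    // Crude stand-in for an analyzer: lowercase, then split on non-alphanumerics.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Stand-in for an index writer: postings are keyed by "field:term".
    static final class Writer {
        final Map<String, List<Integer>> postings = new TreeMap<>();
        private int docCount = 0;

        // Indexes every field of the document and returns the assigned document id.
        int addDocument(Doc doc) {
            int docId = docCount++;
            for (Field f : doc.fields) {
                List<String> terms = f.analyzed ? analyze(f.value) : Collections.singletonList(f.value);
                // De-duplicate so a document is listed once per term within a field.
                for (String term : new LinkedHashSet<>(terms)) {
                    postings.computeIfAbsent(f.name + ":" + term, k -> new ArrayList<>()).add(docId);
                }
            }
            return docId;
        }
    }

    public static void main(String[] args) {
        Writer writer = new Writer();
        writer.addDocument(new Doc()
                .add(new Field("path", "/docs/a.txt", false))
                .add(new Field("contents", "Lucene creates an index", true)));
        writer.addDocument(new Doc()
                .add(new Field("path", "/docs/b.txt", false))
                .add(new Field("contents", "Search the index", true)));
        System.out.println(writer.postings.get("contents:index"));   // prints [0, 1]
        System.out.println(writer.postings.get("path:/docs/b.txt")); // prints [1]
    }
}
```

Keeping the field name in the key is what lets a later query target one kind of information, such as the contents, without matching the path.
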
The search process is as follows:
- IndexReader reads the index information from disk into memory; INDEX_DIR is the location where the index files are stored.
- Create an IndexSearcher to prepare for searching.
- Create an Analyzer for lexical analysis and language processing of the query.
- Create a QueryParser to parse the query.
- QueryParser calls parse to perform syntax analysis and build the query syntax tree, which is stored in a Query.
- IndexSearcher calls search to run the query syntax tree (the Query) against the index and collects the results in a TopScoreDocCollector.
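
The search steps can be sketched the same way. The code below is a toy in plain Java rather than Lucene's API: SearchSketch, Hit, parse, and search are hypothetical names. The scoring rule (the sum of the matched terms' frequencies) is a deliberately simple assumption that stands in for Lucene's real relevance scoring, and the "query parser" only turns the query text into a set of OR-ed terms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model of the search steps. NOT Lucene's API; all names are hypothetical.
class SearchSketch {

    // One search result: a document id and its score.
    static final class Hit {
        final int docId;
        final int score;

        Hit(int docId, int score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Inverted index: term -> (document id -> term frequency in that document).
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private int docCount = 0;

    // Crude stand-in for an analyzer: lowercase, then split on non-alphanumerics.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    void addDocument(String text) {
        int docId = docCount++;
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new HashMap<>()).merge(docId, 1, Integer::sum);
        }
    }

    // Stand-in for query parsing: the query becomes a set of distinct, OR-ed terms.
    static Set<String> parse(String queryText) {
        return new LinkedHashSet<>(analyze(queryText));
    }

    // Scores each matching document by summed term frequency and returns the top N hits.
    List<Hit> search(Set<String> queryTerms, int topN) {
        Map<Integer, Integer> scores = new HashMap<>();
        for (String term : queryTerms) {
            Map<Integer, Integer> docs = postings.getOrDefault(term, Collections.emptyMap());
            for (Map.Entry<Integer, Integer> e : docs.entrySet()) {
                scores.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        List<Hit> hits = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : scores.entrySet()) {
            hits.add(new Hit(e.getKey(), e.getValue()));
        }
        // Highest score first; ties are broken by the lower document id.
        hits.sort(Comparator.comparingInt((Hit h) -> -h.score).thenComparingInt(h -> h.docId));
        return new ArrayList<>(hits.subList(0, Math.min(topN, hits.size())));
    }

    public static void main(String[] args) {
        SearchSketch index = new SearchSketch();
        index.addDocument("lucene index lucene search");
        index.addDocument("search engine");
        index.addDocument("lucene");
        for (Hit hit : index.search(parse("Lucene search"), 2)) {
            System.out.println("doc=" + hit.docId + " score=" + hit.score);
        }
        // prints "doc=0 score=3" then "doc=1 score=1"
    }
}
```

The same analysis is applied to the query and to the documents, which mirrors why the search side also needs an Analyzer: query terms must be normalized the same way as indexed terms in order to match.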

These are the simple calls to Lucene API functions.

However, once you dive into the Lucene source code, you find that Lucene has many packages with complex relationships among them.

Even so, it is not hard to see that each of Lucene's source modules implements part of the general indexing and search process.

The figure shows the package structure with which Lucene implements the full-text retrieval process described in the previous section. (See the article "Open Source Full-Text Search Engine Lucene" at http://www.lucene.com.cn/about.htm.)

A comparison will reveal the functions of each module.

Lucene's analysis module is mainly responsible for lexical analysis and language processing, which produce Terms.
Lucene's index module is mainly responsible for creating the index; it contains IndexWriter.
Lucene's store module is mainly responsible for reading and writing the index.
Lucene's QueryParser is mainly responsible for syntax analysis.
Lucene's search module is mainly responsible for searching the index.
Lucene's similarity module is mainly responsible for implementing relevance scoring.

With the overall structure of Lucene understood, we can begin our journey through the Lucene source code.
