Lucene Study Summary II: Overall Lucene Architecture

Source: Internet
Author: User

Lucene is:

    • An efficient, scalable, full-text retrieval library.
    • Implemented entirely in Java, with no configuration required.
    • It supports indexing and searching of plain text only.
    • It is not responsible for extracting plain text from files in other formats, or for fetching files from the network.

In Lucene in Action, Lucene's architecture and process are shown as follows:

The figure shows that Lucene consists of two processes, indexing and searching, and covers three key points: index creation, the index itself, and search.

Let's take a closer look at Lucene's components:

    • A document to be indexed is represented by a Document object.
    • IndexWriter adds documents to the index through its addDocument function, which is how the index is created.
    • Lucene's index is an inverted index.
    • When a user makes a request, a Query object represents the user's query statement.
    • IndexSearcher searches the Lucene index through its search function.
    • IndexSearcher calculates term weights and scores, and returns the results to the user.
    • The set of documents returned to the user is represented by TopDocsCollector.
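To make the "term weight and score" step more concrete, here is a minimal sketch of TF-IDF style scoring that uses only the JDK. It is an illustration under assumptions, not Lucene's actual similarity formula: the class name ToyScorer, its methods, and the smoothed IDF variant are all hypothetical choices made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy scorer for illustration; not part of Lucene.
class ToyScorer {
    // Term frequency: how many times the term occurs in the document's tokens.
    static int termFrequency(String term, List<String> docTokens) {
        int count = 0;
        for (String t : docTokens) {
            if (t.equals(term)) count++;
        }
        return count;
    }

    // Inverse document frequency (smoothed): terms found in fewer documents weigh more.
    static double inverseDocumentFrequency(String term, List<List<String>> docs) {
        int docFreq = 0;
        for (List<String> d : docs) {
            if (d.contains(term)) docFreq++;
        }
        return Math.log((double) (docs.size() + 1) / (docFreq + 1)) + 1.0;
    }

    // Score of one document for a query: sum of tf * idf over the query terms.
    static double score(List<String> queryTerms, List<String> doc, List<List<String>> docs) {
        double total = 0.0;
        for (String term : queryTerms) {
            total += termFrequency(term, doc) * inverseDocumentFrequency(term, docs);
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> docs = new ArrayList<>();
        docs.add(Arrays.asList("lucene", "is", "a", "search", "library"));
        docs.add(Arrays.asList("java", "is", "a", "language"));
        List<String> query = Arrays.asList("lucene", "search");
        for (int i = 0; i < docs.size(); i++) {
            System.out.println("doc " + i + " score = " + score(query, docs.get(i), docs));
        }
    }
}
```

A document that contains none of the query terms scores zero here, and a document that contains them scores higher the more often they occur and the rarer they are across the collection.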

So how are these components applied?

Let's look in more detail at the Lucene API calls involved in the indexing and search processes.

  • The indexing process is as follows:
    • Create an IndexWriter, which is used to write the index files. It takes several parameters: index_dir is the location where the index files are stored, and the Analyzer performs lexical analysis and language processing on the documents.
    • Create a Document, which represents the document to be indexed.
    • Add different Fields to the Document. A document carries several kinds of information, such as title, author, modification time, and content. Each kind of information is represented by a different Field. In this example, two kinds are indexed: the file path and the file content. The FileReader on src_file is the source file to be indexed.
    • IndexWriter calls the addDocument function to write the index to the index folder.
  • The search process is as follows:
    • IndexReader reads the index information on disk into memory; index_dir is the location where the index files are stored.
    • Create an IndexSearcher to prepare for searching.
    • Create an Analyzer to perform lexical analysis and language processing on the query statement.
    • Create a QueryParser to perform syntax analysis on the query statement.
    • QueryParser calls parse to perform the syntax analysis, forming a query syntax tree that is placed in a Query.
    • IndexSearcher calls search to search the query syntax tree held in the Query; the results are collected in a TopScoreDocCollector.

The indexing and search steps listed above amount to a few simple Lucene API calls.
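The order of those steps can be mirrored in a small JDK-only sketch. The classes ToyIndex, ToyIndexWriter and ToyIndexSearcher below are hypothetical stand-ins, not Lucene classes; exact Lucene constructors and signatures differ between versions, so this sketch deliberately avoids them. It also assumes an in-memory map instead of index files under index_dir, and ranks hits simply by the number of distinct query terms they contain.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical stand-in for an index: term -> ids of the documents containing it.
class ToyIndex {
    final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    final List<Map<String, String>> storedDocs = new ArrayList<>();

    // Stand-in for an analyzer: lowercase, then split on non-alphanumerics.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }
}

// Hypothetical writer: mirrors "create a document, add fields, addDocument".
class ToyIndexWriter {
    private final ToyIndex index;

    ToyIndexWriter(ToyIndex index) { this.index = index; }

    // A document here is just a map of field name -> field value.
    void addDocument(Map<String, String> doc) {
        int docId = index.storedDocs.size();
        index.storedDocs.add(doc);
        // Only the "contents" field is analyzed; "path" is stored as-is.
        for (String term : ToyIndex.analyze(doc.getOrDefault("contents", ""))) {
            index.postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }
}

// Hypothetical searcher: mirrors "parse the query, search, collect the top hits".
class ToyIndexSearcher {
    private final ToyIndex index;

    ToyIndexSearcher(ToyIndex index) { this.index = index; }

    // Returns the paths of the best hits, ranked by number of distinct matching query terms.
    List<String> search(String queryText, int topN) {
        Map<Integer, Integer> hits = new HashMap<>();
        for (String term : new LinkedHashSet<>(ToyIndex.analyze(queryText))) {
            for (int docId : index.postings.getOrDefault(term, new TreeSet<>())) {
                hits.merge(docId, 1, Integer::sum);
            }
        }
        List<Integer> ranked = new ArrayList<>(hits.keySet());
        // Higher match count first; ties are broken by lower document id.
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(hits.get(b), hits.get(a));
            return byCount != 0 ? byCount : Integer.compare(a, b);
        });
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < Math.min(topN, ranked.size()); i++) {
            paths.add(index.storedDocs.get(ranked.get(i)).get("path"));
        }
        return paths;
    }
}
```

To use it, build a map with "path" and "contents" entries for each file, pass it to ToyIndexWriter.addDocument, then call ToyIndexSearcher.search with the query text. Real Lucene calls follow the same order but persist the index under index_dir.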

However, once you step into the Lucene source code, you find that Lucene has many packages, and the relationships between them are complicated.

Even so, it is not difficult to see that each of Lucene's source code modules is an implementation of the general indexing and search process.

This figure shows the Lucene package structure corresponding to the full-text search process described in the previous section. (See the article "Open Source Full-Text Search Engine Lucene" at http://www.e.com.cn/about.htm.)

    • Lucene's analysis module is mainly responsible for lexical analysis and language processing, which produce Terms.
    • Lucene's index module is mainly responsible for creating indexes; it contains IndexWriter.
    • Lucene's store module is mainly responsible for reading and writing indexes.
    • Lucene's queryParser module is mainly responsible for syntax analysis.
    • Lucene's search module is mainly responsible for searching indexes.
    • Lucene's similarity module is mainly responsible for relevance scoring.
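As a rough illustration of what "lexical analysis and language processing to form Terms" means, here is a toy analyzer written with the JDK only. Everything in it is an assumption made for the example: the class name ToyAnalyzer, the tiny stop word list, and the naive rule that strips a single trailing "s" bear no relation to how Lucene's real analyzers are implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical toy analyzer; Lucene's real analyzers are far more capable.
class ToyAnalyzer {
    // A tiny, made-up stop word list for illustration only.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    // Lexical analysis: split raw text into lowercase tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Language processing: a deliberately naive stemmer that strips one trailing "s".
    static String stem(String token) {
        if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Full pipeline: tokenize, drop stop words, stem what is left.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            if (!STOP_WORDS.contains(token)) terms.add(stem(token));
        }
        return terms;
    }
}
```

The same pipeline has to be applied to both documents and queries; otherwise a query term such as "Books" would never match the indexed term "book".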

With an understanding of Lucene's overall structure, we can begin the journey through the Lucene source code.

 

In addition:

CSDN link to this article: http://blog.csdn.net/forfuture1978/archive/2009/10/30/4745802.aspx

JavaEye link to this article: http://forfuture1978.javaeye.com/blog/546808
