Lucene Learning Summary Two: The overall structure of Lucene

This article was reproduced from: http://www.cnblogs.com/forfuture1978/archive/2009/12/14/1623596.html

In general, Lucene is:

    • An efficient, extensible full-text retrieval library.
    • Implemented entirely in Java and usable without configuration.
    • Limited to indexing and searching plain text.
    • Not responsible for extracting plain text from files in other formats or for fetching files from the network.

Lucene in Action describes the structure and process of Lucene as two procedures, indexing and searching, built around three key points: index creation, the index itself, and search.

Let's look at the various components of Lucene in more detail:

    • A Document object represents a document to be indexed.
    • IndexWriter adds a document to the index through its addDocument function, which implements the process of creating the index.
    • The Lucene index is an inverted index.
    • When the user makes a request, a Query object represents the user's query statement.
    • IndexSearcher searches the Lucene index through its search function.
    • IndexSearcher calculates term weights and scores, and returns the results to the user.
    • The collection of documents returned to the user is represented by TopDocsCollector.
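
To make the inverted index and the roles of these components more concrete, here is a minimal sketch in plain Java. It is a toy model and not Lucene code: the class ToyInvertedIndex and its methods are hypothetical names chosen for illustration, and a real Lucene index stores far more than this (term frequencies, positions, on-disk files). It only shows the core idea that an inverted index maps each term to the documents that contain it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: this class is hypothetical and is not part of Lucene.
class ToyInvertedIndex {
    // term -> sorted set of ids of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    // Loosely plays the role of IndexWriter.addDocument: split the text into
    // lowercase terms and record the new document id under each term.
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Loosely plays the role of IndexSearcher.search for a single-term query:
    // look the term up and return the ids of the matching documents.
    List<Integer> search(String term) {
        SortedSet<Integer> docs = postings.get(term.toLowerCase(Locale.ROOT));
        return docs == null ? new ArrayList<Integer>() : new ArrayList<Integer>(docs);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("Lucene is a full-text retrieval library");   // doc 0
        index.addDocument("An inverted index maps terms to documents"); // doc 1
        index.addDocument("Lucene builds an inverted index");           // doc 2
        System.out.println(index.search("lucene"));   // [0, 2]
        System.out.println(index.search("inverted")); // [1, 2]
    }
}
```

Because the lookup goes from term to documents, a search never has to scan the text of every document, which is the reason full-text engines use this structure.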

So how do you apply these components?

Let's walk through the Lucene API calls made during indexing and searching in more detail.

  • The indexing process is as follows:
    • Create an IndexWriter to write the index files. It takes several parameters: INDEX_DIR is the location where the index files are stored, and the Analyzer performs lexical analysis and language processing on the documents.
    • Create a Document to represent the document we want to index.
    • Add different Field objects to the Document. A document carries several kinds of information, such as title, author, modification time, and content, and each kind is represented by a separate Field. In this example, two kinds of information are indexed: the file path and the file content. The SRC_FILE passed to FileReader represents the source file to be indexed.
    • Call the IndexWriter function addDocument to write the index to the index folder.
  • The search process is as follows:
    • IndexReader reads the index information on disk into memory; INDEX_DIR is the location where the index files are stored.
    • Create an IndexSearcher to prepare for the search.
    • Create an Analyzer for lexical analysis and language processing of the query statement.
    • Create a QueryParser to parse the query statement.
    • QueryParser calls parse to analyze the syntax and form a query syntax tree, which is stored in a Query.
    • IndexSearcher calls search to run the query syntax tree Query against the index and collects the results in a TopScoreDocCollector.
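
The search-side steps, parsing a query statement into a syntax tree and then running that tree against the index, can be illustrated with another small sketch in plain Java. Everything here is an assumption made for illustration: the names ToyQueryDemo, QueryNode, TermNode, BoolNode, and docs are hypothetical, the grammar (terms joined by uppercase AND and OR, with AND binding tighter) is far simpler than the syntax Lucene's QueryParser accepts, and the index is just an in-memory map from term to document ids.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: every name here is hypothetical; none of this is Lucene code.
class ToyQueryDemo {
    // A node of the query syntax tree; it can report which documents match it.
    interface QueryNode {
        Set<Integer> matches(Map<String, Set<Integer>> index);
    }

    // Leaf node: a single term, matched by looking it up in the index.
    static final class TermNode implements QueryNode {
        final String term;
        TermNode(String term) { this.term = term; }
        @Override public Set<Integer> matches(Map<String, Set<Integer>> index) {
            Set<Integer> docs = index.get(term);
            return docs == null ? new TreeSet<Integer>() : new TreeSet<Integer>(docs);
        }
        @Override public String toString() { return term; }
    }

    // Inner node: two sub-queries joined by AND (intersection) or OR (union).
    static final class BoolNode implements QueryNode {
        final boolean isAnd;
        final QueryNode left;
        final QueryNode right;
        BoolNode(boolean isAnd, QueryNode left, QueryNode right) {
            this.isAnd = isAnd;
            this.left = left;
            this.right = right;
        }
        @Override public Set<Integer> matches(Map<String, Set<Integer>> index) {
            Set<Integer> result = left.matches(index); // fresh, mutable copy
            if (isAnd) {
                result.retainAll(right.matches(index));
            } else {
                result.addAll(right.matches(index));
            }
            return result;
        }
        @Override public String toString() {
            return "(" + left + (isAnd ? " AND " : " OR ") + right + ")";
        }
    }

    // Syntax analysis: split on OR first, then on AND, so AND binds tighter.
    static QueryNode parse(String text) {
        QueryNode result = null;
        for (String clause : text.trim().split("\\s+OR\\s+")) {
            QueryNode conjunction = null;
            for (String term : clause.trim().split("\\s+AND\\s+")) {
                QueryNode leaf = new TermNode(term.toLowerCase(Locale.ROOT));
                conjunction = (conjunction == null) ? leaf : new BoolNode(true, conjunction, leaf);
            }
            result = (result == null) ? conjunction : new BoolNode(false, result, conjunction);
        }
        return result;
    }

    // Helper for building the posting set of one term.
    static Set<Integer> docs(Integer... ids) {
        return new TreeSet<Integer>(Arrays.asList(ids));
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index = new HashMap<>();
        index.put("lucene", docs(0, 2));
        index.put("index", docs(1, 2));
        index.put("search", docs(3));

        QueryNode tree = parse("lucene AND index OR search");
        System.out.println(tree);                // ((lucene AND index) OR search)
        System.out.println(tree.matches(index)); // [2, 3]
    }
}
```

Keeping the parser and the tree evaluation separate mirrors the split described above: one component turns text into a query object, and another runs that object against the index.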

The above are the basic calls to the Lucene API.

However, once you enter the Lucene source code, you find that Lucene has many packages with complex relationships.

Still, it is not hard to see that each Lucene source module implements one part of the ordinary indexing and search process.

This figure is the package structure of Lucene's implementation of the full-text retrieval process described in the previous section. (See the article "Open Source Full-Text Search Engine Lucene" at http://www.lucene.com.cn/about.htm.)

Comparing the two reveals the function of each module:

    • Lucene's analysis module is mainly responsible for lexical analysis and language processing, which produce Terms.
    • Lucene's index module is responsible for creating the index; it contains IndexWriter.
    • Lucene's store module is mainly responsible for reading and writing the index.
    • Lucene's queryParser module is mainly responsible for syntax analysis.
    • Lucene's search module is mainly responsible for searching the index.
    • Lucene's similarity module is mainly responsible for implementing relevance scoring.
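
As a rough illustration of what the analysis and similarity modules are responsible for, the following plain Java sketch turns text into terms and then scores a term against a document. It is a toy under stated assumptions and not Lucene's implementation: ToyAnalysisAndScoring, analyze, and tfIdf are hypothetical names, the stop-word list is made up, language processing such as stemming is left out, and the score is a textbook TF-IDF (term frequency times the natural log of total documents over documents containing the term) rather than Lucene's own scoring formula.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: names and formula are illustrative, not taken from Lucene.
class ToyAnalysisAndScoring {
    // Hypothetical stop-word list; a real analyzer would use a fuller one.
    private static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    // Analysis: lowercase the text, split on non-alphanumerics, drop stop words.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<String>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Similarity: textbook TF-IDF, tf * ln(N / df), or 0 if the term is absent.
    static double tfIdf(String term, List<String> doc, List<List<String>> corpus) {
        int tf = Collections.frequency(doc, term); // occurrences in this document
        int df = 0;                                // documents containing the term
        for (List<String> other : corpus) {
            if (other.contains(term)) {
                df++;
            }
        }
        if (tf == 0 || df == 0) {
            return 0.0;
        }
        return tf * Math.log((double) corpus.size() / df);
    }

    public static void main(String[] args) {
        List<List<String>> corpus = new ArrayList<List<String>>();
        corpus.add(analyze("Lucene is a full-text retrieval library"));
        corpus.add(analyze("The store module reads and writes the index"));
        corpus.add(analyze("Lucene, Lucene, and more Lucene search"));

        System.out.println(corpus.get(0)); // [lucene, full, text, retrieval, library]
        for (List<String> doc : corpus) {
            // The third document mentions the term most often and scores highest.
            System.out.println(tfIdf("lucene", doc, corpus));
        }
    }
}
```

The same analyze step would be applied to both documents and query statements, so that the terms produced at search time line up with the terms stored at indexing time.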

Now that we understand the overall structure of Lucene, we can begin our journey through the Lucene source code.
