Learning Lucene step by step (Step 3: Indexing)

Source: Internet
Author: User

After the overview of Lucene, let's look at its modules one by one. This part goes deeper into Lucene's index: the concepts behind it and the process of building it.

Lucene and relational databases

Lucene and a relational database can be compared from two perspectives: indexing and fuzzy queries. Full-text search can itself be regarded as one kind of comparison (matching).

1. Index comparison

Each item below compares the full-text retrieval library (Lucene) with a relational database.

Core functions
    Lucene: Mainly text retrieval. Inserting, deleting, and modifying data is cumbersome. Suited to querying large blocks of text.
    Relational database: Inserting, deleting, and modifying data is convenient and has dedicated SQL commands, but searching large blocks of text is inefficient.

Library
    Lucene: An index library is similar to a database. Multiple libraries can be created, and each can be stored in a different location.
    Relational database: Multiple databases can be created. Each database generally has control files and data files, which makes it more complex.

Table
    Lucene: There is no strict table concept. What Lucene stores is a loose collection of fields defined at storage time.
    Relational database: Strict table structures, with primary keys and field types.

Record
    Lucene: Because there is no strict table concept, a record is simply an object. The corresponding class is Document.
    Relational database: A record (row), which must conform to the table structure.

Field
    Lucene: Field values can only be text or dates. Fields generally do not support computation and have no functions. The corresponding class is Field.
    Relational database: Rich field types and powerful functions.

Query result set
    Lucene: The result set class is Hits, for example Hits(doc1, doc2, doc3, ...).
    Relational database: ResultSet in JDBC.
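To make the "no strict table concept" point concrete, here is a minimal sketch in plain Java. ToyDocument is a hypothetical stand-in written for this article, not Lucene's Document class; it only illustrates that each record carries its own loosely defined set of named text fields, with no shared schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of a schemaless record; NOT Lucene's Document class.
class ToyDocument {
    // Field name -> text value. Each document decides for itself which fields it has.
    private final Map<String, String> fields = new LinkedHashMap<>();

    // Adds (or overwrites) a field and returns this document for chaining.
    ToyDocument add(String name, String value) {
        fields.put(name, value);
        return this;
    }

    // Returns the field value, or null when this document has no such field.
    String get(String name) {
        return fields.get(name);
    }

    public static void main(String[] args) {
        // Two records in the same "library" with different field sets.
        ToyDocument article = new ToyDocument()
                .add("title", "Learning Lucene")
                .add("body", "How the index is built");
        ToyDocument note = new ToyDocument()
                .add("body", "A note without a title")
                .add("created", "2010-01-01");
        System.out.println(article.get("title"));
        System.out.println(note.get("title")); // this record never declared a title
    }
}
```

A relational table would reject the second record unless the schema allowed a missing title. Here nothing enforces a schema at all, which is the trade-off the comparison describes.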

2. Comparison of fuzzy search

Each item below compares Lucene full-text search with a fuzzy query in a database.

Index
    Lucene: An inverted index is built from the data in the data source, so lookups are fast.
    Database: A fuzzy query (such as SQL LIKE '%keyword%') cannot use the database index. Every record has to be traversed and matched, which makes the query slower by several orders of magnitude.

Matching effect
    Lucene: Matching is done on terms. Text is split into terms through the language analysis interface, which makes support for Chinese possible.
    Database: Fuzzy matching is imprecise, so irrelevant records may be returned and relevant ones missed.

Matching degree
    Lucene: A relevance algorithm scores the results, and results with a higher degree of match are ranked first.
    Database: No relevance control. A record matches the same way no matter how many times the keyword appears in it.

Result output
    Lucene: A special algorithm outputs the 100 results with the highest matching degree first. The result set is read in small buffered batches, so system overhead is low.
    Database: Returns the entire result set. When there are many matches, a large amount of memory is needed to hold the temporary result set, so system overhead is high.

Customization
    Lucene: The API lets you customize sorting rules to meet your search requirements.
    Database: Not customizable.

Applicability
    Lucene: High-load fuzzy search applications that index a large volume of data and need fast, accurate matching.
    Database: Low usage, simple fuzzy matching rules, or a small amount of data to query.
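The differences in index use, matching effect, matching degree, and result output can be sketched in a few lines of plain Java. ToySearch and its methods are hypothetical names chosen for this illustration, and ranking by raw term count is a simplifying assumption rather than Lucene's actual scoring. The sketch contrasts a full scan with substring matching (the database-style fuzzy query) against a lookup in an inverted index that ranks by term count and returns only the top n results.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy, not Lucene code: compares a full scan with an inverted index lookup.
class ToySearch {
    // term -> (document id -> number of occurrences of the term in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    // Stores the text, indexes its terms, and returns the new document id.
    int add(String text) {
        int id = docs.size();
        docs.add(text);
        // Naive analysis: lowercase and split on non-word characters (ASCII only).
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) continue;
            postings.computeIfAbsent(term, t -> new HashMap<>()).merge(id, 1, Integer::sum);
        }
        return id;
    }

    // Database-style fuzzy match: every record is visited and tested for the substring.
    List<Integer> scan(String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<Integer> matches = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            if (docs.get(id).toLowerCase(Locale.ROOT).contains(needle)) matches.add(id);
        }
        return matches;
    }

    // Index lookup: one map access, then rank by term count and keep the top n ids.
    List<Integer> search(String term, int n) {
        Map<Integer, Integer> counts =
                postings.getOrDefault(term.toLowerCase(Locale.ROOT), Map.of());
        List<Integer> ids = new ArrayList<>(counts.keySet());
        ids.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a)); // higher count first
            return byCount != 0 ? byCount : Integer.compare(a, b);       // ties: lower id first
        });
        return new ArrayList<>(ids.subList(0, Math.min(n, ids.size())));
    }

    public static void main(String[] args) {
        ToySearch s = new ToySearch();
        s.add("net net net framework");
        s.add("the Netherlands");
        s.add("a net for fishing");
        System.out.println("scan:   " + s.scan("net"));
        System.out.println("search: " + s.search("net", 10));
    }
}
```

With these three sample texts, the scan also returns the "Netherlands" record because it merely contains the substring, and it gives no ordering. The index lookup matches only the whole term and puts the document that mentions it most often first.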

Index creation process

Creating an index consists of three steps: converting the original document into text, analyzing the text, and saving the analyzed text to the index.
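The three steps can be sketched as a small pipeline in plain Java. This is an illustration under stated assumptions, not Lucene's implementation: the class ToyIndexPipeline, its method names, the tag-stripping regular expression, and the stop-word list are all hypothetical. In Lucene itself the analysis step is handled by an analyzer and the saving step by an index writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy pipeline: convert -> analyze -> save to index. Not Lucene code.
class ToyIndexPipeline {
    // Assumed stop-word list, chosen only for this example.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "is", "of");

    // Step 1: convert the original document (here a trivial HTML snippet) to plain text.
    static String toText(String html) {
        return html.replaceAll("<[^>]*>", " ").trim();
    }

    // Step 2: analyze the text: lowercase it, split it into terms, drop stop words.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) terms.add(t);
        }
        return terms;
    }

    // Step 3: save the analyzed terms to the index (term -> sorted document ids).
    static void addToIndex(Map<String, Set<Integer>> index, int docId, List<String> terms) {
        for (String t : terms) {
            index.computeIfAbsent(t, k -> new TreeSet<>()).add(docId);
        }
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        String[] originals = {
            "<p>Lucene is a search library</p>",
            "<h1>The index of Lucene</h1>"
        };
        for (int id = 0; id < originals.length; id++) {
            addToIndex(index, id, analyze(toText(originals[id])));
        }
        System.out.println(index);
    }
}
```

Keeping the steps as separate functions mirrors the description above: the conversion step depends on the source format (HTML, PDF, Word, and so on), while analysis and index writing stay the same for every format.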

 
