Learning Lucene step by step -- (Step 3: indexing)

Last Update:2018-12-07 Source: Internet

Author: User


With the basics of Lucene covered, we can look at its modules one by one. This step focuses on Lucene's index: the concepts behind it and the process of building it.

Lucene and relational databases

Lucene can be compared with a relational database from two perspectives: indexing and fuzzy queries. In effect, this is a comparison between a full-text search library and a database.

1. Index comparison

Item	Full-text retrieval library (Lucene)	Relational database
Core functions	Mainly text retrieval; inserting, deleting, and modifying data is cumbersome. Well suited to querying large blocks of text.	Inserting, deleting, and modifying data is convenient and supported by dedicated SQL statements, but searching large blocks of text is inefficient.
Library	Similar to a database: multiple index libraries can be created, and each one can be stored in a different location.	Multiple databases can be created. Each database generally has control files and data files, which makes it more complex.
Table	No strict table concept. A Lucene "table" is only a loosely defined set of fields, determined when the data is stored.	Strict table structures with primary keys and field types.
Record	Because there is no strict table concept, a record is represented as an object; the corresponding class is Document.	A record (row) that conforms to the table structure.
Field	Field values are essentially text or dates. Fields generally do not support computation and have no functions. The corresponding class is Field.	Rich field types and powerful functions.
Query result set	The query result set class is Hits, for example Hits(doc1, doc2, doc3, ...). Hits belongs to older Lucene releases; newer releases return TopDocs instead.	ResultSet in JDBC.

2. Comparison of fuzzy search

Item	Lucene full-text search	Database fuzzy query
Index	Builds an inverted index over the data in the data source, so lookups are fast.	A fuzzy match (for example LIKE '%keyword%') cannot use the database index and must traverse all records, so queries can be orders of magnitude slower.
Matching effect	Matches on terms (tokens). Language analysis interfaces split the text into terms, which makes support for Chinese possible.	Matching is inexact, so irrelevant records may be returned or relevant ones missed.
Matching degree	A relevance scoring algorithm ranks results with a higher degree of matching first.	No relevance scoring: a record matches in the same way no matter how many times the keyword appears in it.
Result output	A dedicated algorithm outputs the 100 results with the highest matching degree first. The result set is read in small buffered batches, so system overhead is low.	Returns the entire result set. When there are many matches, a large amount of memory is needed to hold the temporary result set, causing high system overhead.
Customization	The API can be used to customize sorting rules to fit the search and ranking requirements.	Not customizable.
Applicability	High-load fuzzy search applications with a large volume of indexed data and strict requirements on speed and match quality.	Low usage, simple fuzzy matching rules, or a small amount of data to query.
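
The differences in the table can be illustrated with a small, self-contained Java sketch. It is not Lucene's API: the class SearchComparison and its methods add, scan, and search are hypothetical names invented for this illustration, and the "matching degree" is simply the number of times a term occurs, which is far simpler than Lucene's real scoring. The sketch assumes ASCII text split on non-word characters, so it does not cover Chinese analysis.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy comparison of a full scan and an inverted-index lookup (hypothetical, not Lucene's API). */
class SearchComparison {
    // term -> (document id -> number of occurrences of the term in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    /** Stores the text and records each lower-cased word in the inverted index. */
    int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : text.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) continue; // split can yield a leading empty token
            postings.computeIfAbsent(term, t -> new HashMap<>()).merge(id, 1, Integer::sum);
        }
        return id;
    }

    /** Database-style fuzzy match: visits every record, like LIKE '%keyword%'. */
    List<Integer> scan(String keyword) {
        String needle = keyword.toLowerCase();
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            if (docs.get(id).toLowerCase().contains(needle)) hits.add(id);
        }
        return hits;
    }

    /** Index lookup: reads one postings list, ranks by term count, returns at most topN ids. */
    List<Integer> search(String keyword, int topN) {
        Map<Integer, Integer> counts = postings.getOrDefault(keyword.toLowerCase(), Map.of());
        List<Integer> ids = new ArrayList<>(counts.keySet());
        ids.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a)); // higher count first
            return byCount != 0 ? byCount : Integer.compare(a, b);        // ties: lower id first
        });
        return new ArrayList<>(ids.subList(0, Math.min(topN, ids.size())));
    }

    public static void main(String[] args) {
        SearchComparison sc = new SearchComparison();
        sc.add("Lucene is a search library");                  // id 0
        sc.add("A relational database can also index text");   // id 1
        sc.add("Lucene index, Lucene search, Lucene scoring"); // id 2
        sc.add("Reindexing takes time");                       // id 3

        System.out.println(sc.scan("index"));        // [1, 2, 3]: substring match also hits "Reindexing"
        System.out.println(sc.search("index", 10));  // [1, 2]: exact term match only
        System.out.println(sc.search("lucene", 10)); // [2, 0]: document 2 mentions the term most often
    }
}
```

The scan touches every stored document on each query, while search reads a single postings list and can stop at topN results, which mirrors the "Index", "Matching degree", and "Result output" rows above.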

Index creation process

Creating an index can be divided into three steps: converting the original document into text, analyzing the text, and saving the analyzed text to the index.
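
The three steps can be sketched in plain Java as follows. This is only a toy model under stated assumptions, not Lucene code: the class IndexingPipeline, its methods extractText, analyze, addDocument, and lookup, the HTML input format, and the small stop-word list are all hypothetical choices made for illustration. In Lucene itself, the analysis step is handled by an Analyzer and the saving step by an IndexWriter, while text extraction is usually done by a parser outside Lucene.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy three-step indexing pipeline (hypothetical, not Lucene's API). */
class IndexingPipeline {
    // Assumed stop-word list, chosen only for this example.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "is", "of", "to");

    // term -> sorted ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> index = new HashMap<>();
    private int nextId = 0;

    /** Step 1: convert the original document (here an HTML snippet) into plain text. */
    static String extractText(String html) {
        return html.replaceAll("<[^>]*>", " ").trim();
    }

    /** Step 2: analyze the text by lower-casing, splitting into tokens, and dropping stop words. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) terms.add(token);
        }
        return terms;
    }

    /** Step 3: save the analyzed terms to the index; returns the id assigned to the document. */
    int addDocument(String rawHtml) {
        int id = nextId++;
        for (String term : analyze(extractText(rawHtml))) {
            index.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
        }
        return id;
    }

    /** Returns the ids of the documents that contain the given term, in ascending order. */
    List<Integer> lookup(String term) {
        return new ArrayList<>(index.getOrDefault(term.toLowerCase(), new TreeSet<>()));
    }

    public static void main(String[] args) {
        IndexingPipeline pipeline = new IndexingPipeline();
        pipeline.addDocument("<h1>Learning Lucene</h1><p>Lucene builds the index.</p>"); // id 0
        pipeline.addDocument("<p>Searching an index is fast</p>");                       // id 1

        System.out.println(pipeline.lookup("lucene")); // [0]
        System.out.println(pipeline.lookup("index"));  // [0, 1]
        System.out.println(pipeline.lookup("the"));    // []: stop words are never indexed
    }
}
```

Keeping the three steps as separate methods makes it clear that only the analyzed terms, not the raw document, end up in the index.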

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
