Lucene optimization measures and research inspiration

4.4.1 Index Process Optimization

Indexing generally falls into two cases: small-batch index extension and large-batch index rebuilding. During indexing, Lucene does not rewrite the index files every time a new Document is added, because file I/O is a very resource-consuming operation.

Lucene builds the index in memory and writes files out in batches. The larger the batch interval, the fewer file writes, but the more memory is used; conversely, a small interval uses little memory but makes file I/O frequent and indexing slow. IndexWriter has a MERGE_FACTOR parameter that lets you make full use of memory and reduce file operations according to your application environment. In my experience, the default indexer writes to disk once every 20 records; increasing MERGE_FACTOR fifty-fold roughly doubles indexing speed.
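As a rough illustration, the sketch below tunes these knobs through the older IndexWriter API this article describes; the index path and parameter values are example assumptions, and newer Lucene releases expose the same settings through IndexWriterConfig and LogMergePolicy instead.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        // Create a fresh index on disk; the path is only an example.
        IndexWriter writer = new IndexWriter("/tmp/demo-index", new StandardAnalyzer(), true);

        // Trade memory for fewer flushes: a larger merge factor keeps more
        // work buffered in memory before the writer touches the file system.
        writer.setMergeFactor(50);          // example value, tune for your heap
        writer.setMaxBufferedDocs(1000);    // buffer more documents per flush

        Document doc = new Document();
        doc.add(new Field("body", "sample text to index",
                          Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc);

        writer.optimize();  // merge segments once, after the whole batch
        writer.close();
    }
}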

4.4.2 Search Process Optimization

Lucene supports in-memory indexes: such searches are an order of magnitude faster than file-based I/O. It is also worth minimizing the creation of IndexSearcher objects and caching search results at the front end.
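A minimal sketch of both ideas, assuming the older Lucene API and an example index path: the on-disk index is loaded into a RAMDirectory so queries avoid file I/O, and a single IndexSearcher is built once and reused rather than recreated per query.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.RAMDirectory;

public class MemorySearch {
    // Build once and share across requests instead of creating a new
    // IndexSearcher for every query.
    private static IndexSearcher searcher;

    public static void main(String[] args) throws Exception {
        // Load the on-disk index into memory; searches then avoid file I/O.
        RAMDirectory ramDir = new RAMDirectory("/tmp/demo-index");
        searcher = new IndexSearcher(ramDir);

        Query query = new QueryParser("body", new StandardAnalyzer()).parse("sample");
        Hits hits = searcher.search(query);
        System.out.println("matches: " + hits.length());
    }
}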

Lucene's optimization for full-text search is that after the first search of the index it does not read out the full content of every matching record (Document); it only puts the IDs of the 100 best-matching results (TopDocs) into a result-set cache and returns them. Compare this with a database search: for a result set of 10,000 records, the database must fetch the content of all of them before returning the result set to the application. So even when the total number of matches is large, Lucene's result set does not occupy much memory. Typical fuzzy-search applications never use that many results anyway; the first 100 entries already satisfy more than 90% of search needs.

When the first batch of cached results is used up and results further down are needed, the Searcher searches again and builds a cache twice the size of the previous one, then fetches forward again. So if you construct a Searcher to read results 1-120, the Searcher actually performs two searches: after the first 100 entries are consumed the cache is exhausted, so the Searcher re-searches and builds a cache of 200 results, and so on to 400 and then 800 cached entries. Because these caches become unreachable once each Searcher object is discarded, you may want to cache the result records yourself. Keep the number of cached records at 100 or below to make full use of the first result cache and keep Lucene from wasting effort on repeated searches; results can also be cached in tiers.
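A minimal sketch of that advice, again assuming the older Hits-based API (class and method names such as cacheTopIds are illustrative only): fetch at most the first 100 document IDs from a query, cache them on the application side, and load the stored fields of a Document only when it is actually displayed.

import org.apache.lucene.document.Document;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

import java.util.ArrayList;
import java.util.List;

public class FirstPageCache {
    // Staying at or below 100 entries keeps us inside the first internal
    // cache fill, so Lucene never has to re-run the search.
    private static final int PAGE_LIMIT = 100;

    static List<Integer> cacheTopIds(IndexSearcher searcher, Query query) throws Exception {
        Hits hits = searcher.search(query);
        int n = Math.min(hits.length(), PAGE_LIMIT);

        List<Integer> ids = new ArrayList<Integer>(n);
        for (int i = 0; i < n; i++) {
            ids.add(hits.id(i));          // cache document ids, not full Documents
        }
        return ids;
    }

    static Document load(IndexSearcher searcher, int docId) throws Exception {
        return searcher.doc(docId);       // read stored fields only when displaying
    }
}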

4.5 Research Inspiration from Lucene

Lucene is a model of object-oriented design, which shows mainly in the following aspects:

Every concern is wrapped in an extra abstraction layer for later extension and reuse: you can reach your goal by re-implementing a component, without touching the other modules;

Simple application entry points (Searcher, Indexer) call a series of underlying components that cooperate to complete the search task;

Every object has a very specific task. In the search process, for example, QueryParser parses a query statement into a combination of precise queries (Query), the low-level index-reading structure IndexReader reads the index, and the corresponding Scorer scores and sorts the search results. All functional modules are highly atomic, so each can be re-implemented without modifying the other modules.
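To make that division of labor concrete, here is a sketch of a complete search pass through those objects, assuming the older Lucene API and an example index path:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class SearchPipeline {
    public static void main(String[] args) throws Exception {
        // 1. QueryParser turns the user's query string into a Query tree.
        Query query = new QueryParser("body", new StandardAnalyzer())
                .parse("lucene AND optimization");

        // 2. IndexReader is the low-level structure that reads the index files.
        IndexReader reader = IndexReader.open("/tmp/demo-index");

        // 3. IndexSearcher runs the Query against the reader; scoring and
        //    sorting of matches happen inside the search call.
        IndexSearcher searcher = new IndexSearcher(reader);
        Hits hits = searcher.search(query);
        System.out.println("hits: " + hits.length());

        searcher.close();
        reader.close();
    }
}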

In addition to its flexible application interface design, Lucene also provides language analyzer implementations suitable for most applications (SimpleAnalyzer, StandardAnalyzer), which is one of the main reasons new users can get started quickly. These strengths are well worth learning from in future development. As a general-purpose toolkit, Lucene is indeed convenient for developers who need to embed full-text search in an application.

In addition, learning and using Lucene gave me a deeper understanding of why many database optimization guidelines exist. For example, indexing fields improves query speed, but too many indexes slow down updates to database tables, and sort conditions over large result sets are often performance killers. Many commercial databases provide optimization parameters for bulk insert operations whose role is similar to the indexer's merge_factor. A larger number of results does not mean better quality: especially for a large returned result set, optimizing the quality of the first few results is usually what matters most. And try to have the application fetch a small result set from the database, because even for large databases, random access to a result set is a resource-consuming operation.
