Improvement of Apache Lucene 2.9

Last Update:2018-12-07 Source: Internet

Author: User

Most of the work in Lucene 2.9 focuses on performance, with improvements ranging from low-level internal infrastructure to the way indexes are managed. A Lucene index is made up of a series of separate "segments", each stored separately on disk. Adding documents to an index creates new segments, which can later be merged. Lucene caches field information in the FieldCache. Loading the field cache carried significant overhead in Lucene 2.4 and earlier; in version 2.4 in particular, the entire field cache was reloaded again and again. While preparing the 2.9 release, the Lucene team recognized that recently created segments change frequently through merges and deletions, whereas older segments tend to remain unchanged. The reworked field cache therefore loads only the segments that have changed.

Loading the FieldCache across segments was inefficient, so version 2.9 manages the FieldCache for each segment separately and avoids loading it across segments. The effect is significant. Mark Miller of Lucid Imagination ran a simple performance test which showed that, with 5,000,000 unique strings, Lucene 2.9 was about 15 times faster than Lucene 2.4:
Lucene 2.4: 150.726 s
Lucene 2.9: 9.695 s
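
The per-segment idea can be modeled with a small sketch. This is not Lucene's FieldCache implementation; the class SegmentCacheSketch, its nested Segment type, and the load counter are hypothetical names used only to show why keying the cache by segment avoids reloading data for segments that did not change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SegmentCacheSketch {
    // Hypothetical stand-in for an immutable index segment.
    public static final class Segment {
        final String name;
        final String[] fieldValues;

        public Segment(String name, String... fieldValues) {
            this.name = name;
            this.fieldValues = fieldValues;
        }
    }

    // Cache entries are keyed by segment name, not by the whole index.
    private final Map<String, String[]> cache = new HashMap<>();
    private int loads = 0;

    // Returns the field values of every segment, loading only uncached ones.
    public List<String[]> open(List<Segment> segments) {
        List<String[]> result = new ArrayList<>();
        Set<String> live = new HashSet<>();
        for (Segment s : segments) {
            live.add(s.name);
            String[] values = cache.get(s.name);
            if (values == null) {
                values = s.fieldValues.clone(); // stands in for the costly load
                cache.put(s.name, values);
                loads++;
            }
            result.add(values);
        }
        // Drop entries for segments that were merged away or deleted.
        cache.keySet().retainAll(live);
        return result;
    }

    public int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        SegmentCacheSketch cache = new SegmentCacheSketch();
        Segment s0 = new Segment("_0", "apple", "pear");
        Segment s1 = new Segment("_1", "plum");
        cache.open(Arrays.asList(s0, s1));      // loads both segments
        Segment s2 = new Segment("_2", "fig");
        cache.open(Arrays.asList(s0, s1, s2));  // loads only the new segment
        System.out.println("segment loads: " + cache.loadCount());
    }
}
```

In this model, reopening the index after one new segment appears costs one additional load rather than a reload of every segment, which is the behavior the paragraph above attributes to version 2.9.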

Another notable performance improvement concerns reopening an index for search. Lucene 2.9 introduces a new IndexWriter.getReader() method, which returns a reader over the complete current index, including changes not yet committed in the current IndexWriter session. This brings near-real-time search capabilities. In addition, you can call IndexWriter.setMergedSegmentWarmer() to "warm" newly merged segments so that they are immediately ready for use.
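
The visibility difference behind near-real-time search can be illustrated with a toy model. The class below does not use Lucene; NearRealTimeSketch and openCommittedReader() are hypothetical names, and getReader() here only imitates the behavior described above, namely that the returned view includes changes that have not been committed yet.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NearRealTimeSketch {
    private final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    // Buffers a document in the current writer session.
    public void addDocument(String doc) {
        pending.add(doc);
    }

    // Makes all buffered documents durable and visible to every reader.
    public void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    // Models a reader opened on the last commit point.
    public List<String> openCommittedReader() {
        return Collections.unmodifiableList(new ArrayList<>(committed));
    }

    // Models the near-real-time view: committed plus uncommitted documents.
    public List<String> getReader() {
        List<String> snapshot = new ArrayList<>(committed);
        snapshot.addAll(pending);
        return Collections.unmodifiableList(snapshot);
    }

    public static void main(String[] args) {
        NearRealTimeSketch writer = new NearRealTimeSketch();
        writer.addDocument("doc-1");
        writer.commit();
        writer.addDocument("doc-2"); // not committed yet
        System.out.println("committed view: " + writer.openCommittedReader());
        System.out.println("near-real-time view: " + writer.getReader());
    }
}
```

Each call returns a point-in-time snapshot, so documents added afterwards only show up in a reader obtained later.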

Another major change is the way numbers are handled, especially in range queries (for example, "show me all CDs priced between 0.5 and 9.99"). Prior to version 2.9, Lucene's queries were entirely text-based, so numbers had to be encoded as full-precision strings. This approach tends to produce a large number of distinct terms, all of which Lucene has to traverse to build the result set. Many developers previously worked around this with custom encoding schemes, but Lucene 2.9 ships with built-in numeric handling. The NumericField and NumericRangeQuery classes index and search values at appropriate precisions, which greatly reduces the number of terms that must be visited and significantly improves query response times.
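
To see why indexing at several precisions reduces the number of terms, consider the following simplified model. It is a sketch of the general multi-precision idea, not Lucene's actual term encoding or range-splitting code. It assumes non-negative values, prices stored as integer cents, and hypothetical parameters step (bits dropped per precision level) and maxShift (coarsest level indexed).

```java
import java.util.ArrayList;
import java.util.List;

public class PrecisionStepSketch {
    // Splits [lo, hi] into aligned blocks. Each entry {shift, prefix} stands
    // for one indexed term that covers the values
    // (prefix << shift) .. (prefix << shift) + 2^shift - 1.
    public static List<long[]> cover(long lo, long hi, int step, int maxShift) {
        List<long[]> terms = new ArrayList<>();
        long cur = lo;
        while (cur <= hi) {
            int shift = 0;
            // Grow the block while it stays aligned and inside the range.
            while (shift + step <= maxShift) {
                int next = shift + step;
                long size = 1L << next;
                if ((cur & (size - 1)) != 0 || cur + size - 1 > hi) {
                    break;
                }
                shift = next;
            }
            terms.add(new long[] {shift, cur >>> shift});
            cur += 1L << shift;
        }
        return terms;
    }

    public static void main(String[] args) {
        // Prices 0.50 to 9.99 expressed in cents: 50..999.
        List<long[]> terms = cover(50, 999, 4, 8);
        System.out.println("distinct full-precision values: " + (999 - 50 + 1));
        System.out.println("multi-precision terms: " + terms.size());
    }
}
```

For the price range in the example, this model needs 50 terms to cover all 950 possible cent values: coarse terms cover the middle of the range, and full-precision terms are needed only near the two boundaries.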

Version 2.9 also introduces new query types and broader multi-term queries (wildcard, prefix, and so on), as well as new analyzers for Persian, Arabic, and Chinese. The release also includes better Unicode support, a new query and analysis framework, and geographic search, which allows documents to be filtered and sorted by distance (for example, "find all dry cleaners within 5 miles of my home"). You can find the complete improvement list here.
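
Distance-based filtering and sorting of the kind mentioned above can be sketched with the haversine great-circle formula. This is a generic illustration and makes no claim about how Lucene's geographic search computes distances; the class name, the within() helper, and the Earth radius constant of 3958.8 miles are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class DistanceFilterSketch {
    // Mean Earth radius in miles (assumed value for this sketch).
    static final double EARTH_RADIUS_MILES = 3958.8;

    // Great-circle distance in miles between two latitude/longitude points.
    public static double miles(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a));
    }

    // Keeps the points within radiusMiles of home, nearest first.
    // Each point is a {latitude, longitude} pair.
    public static List<double[]> within(double[] home, List<double[]> points,
                                        double radiusMiles) {
        List<double[]> result = new ArrayList<>();
        for (double[] p : points) {
            if (miles(home[0], home[1], p[0], p[1]) <= radiusMiles) {
                result.add(p);
            }
        }
        result.sort(Comparator.comparingDouble(
                (double[] p) -> miles(home[0], home[1], p[0], p[1])));
        return result;
    }

    public static void main(String[] args) {
        double[] home = {40.0, -75.0};
        List<double[]> cleaners = new ArrayList<>();
        cleaners.add(new double[] {40.05, -75.0});
        cleaners.add(new double[] {41.0, -75.0});
        cleaners.add(new double[] {40.01, -75.0});
        System.out.println("matches within 5 miles: " + within(home, cleaners, 5.0).size());
    }
}
```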

In general, Lucene maintains backward compatibility between major versions, but the "backward compatibility policy" section of CHANGES.txt lists a number of places where Lucene 2.9 breaks compatibility. Upgrading to version 2.9 may therefore require recompiling, along with a full regression test and similar effort. Recompiling against version 2.9 also produces deprecation warnings for every deprecated method, so developers can update their applications and prepare for version 3.0. This is a wise practice, because Lucene 3.0 will drop support for Java 1.4 and remove all features marked as "deprecated" in version 2.9.

Apache Lucene 2.9 released

