Improvements to Lucene 3.x

Last Update: 2018-12-05 Source: Internet

Author: User

December 10, April 20-hikrock

I. Overview

Lucene 3.0 (hereinafter referred to as 3.0) was released on. Version 3.0 is a major release with many changes. It makes extensive API adjustments, removes many deprecated methods and classes, and adopts several Java 5 features: generics, varargs, enums, and autoboxing.

As a result, 3.0 is not compatible with 2.x. If you want to use 3.0, it is best to adopt it in a new project rather than upgrade an existing project built on 2.x or earlier.

II. Version 2.9

Because the new version changes so much, upgrading an existing project from an old version is not recommended: the migration effort would be large.
In fact, version 2.9 already contains large changes, since it was the preparatory release for 3.0. To stay backward compatible, however, 2.9 only deprecates the old methods instead of removing them, so existing code keeps working. The main focus of 2.9 is performance, including improvements to Lucene's internal structure and to the way indexes are managed at the lower layers.

1. Index file improvements

Lucene's index is stored in separate files that make up the "segments" of the index. As documents are added, Lucene keeps creating new segments, which are later merged. Because reading and writing files is expensive, Lucene does not write each document's field data to the index files immediately; it buffers the data in memory and flushes it once a certain amount has accumulated. Starting with 2.9, Lucene maintains the FieldCache per segment instead of loading one FieldCache across all segments. This removes the inefficiency of loading the FieldCache over a multi-segment index and greatly improves performance. Mark Miller of Lucid Imagination ran a simple benchmark showing that, with 5,000,000 distinct strings, Lucene 2.9 is about 15 times faster than Lucene 2.4:
Lucene 2.4: 150.726 s
Lucene 2.9: 9.695 s
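To make the per-segment idea concrete, here is a minimal sketch in plain Java, not Lucene code. `Segment`, `TopLevelFieldCache` and `SegmentFieldCache` are hypothetical classes, "reading a value" stands in for loading it from disk, and "the index changed" is simplified to "the number of segments changed". It contrasts a cache that rebuilds one array for the whole index whenever a segment is added with one that caches each segment separately, so only the new segment has to be loaded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an index segment: one field value per document.
class Segment {
    final String name;
    final String[] fieldValues;

    Segment(String name, String... fieldValues) {
        this.name = name;
        this.fieldValues = fieldValues;
    }
}

// Mimics a top-level cache: one big array for the whole index, rebuilt when the index changes.
class TopLevelFieldCache {
    private int cachedSegmentCount = -1; // simplification: "index changed" == segment count changed
    private String[] cached;
    int valuesRead = 0;

    String[] get(List<Segment> index) {
        if (index.size() != cachedSegmentCount) {
            List<String> all = new ArrayList<>();
            for (Segment seg : index) {
                for (String v : seg.fieldValues) {
                    all.add(v);
                    valuesRead++; // every value of every segment is read again
                }
            }
            cached = all.toArray(new String[0]);
            cachedSegmentCount = index.size();
        }
        return cached;
    }
}

// Mimics a per-segment cache: unchanged segments are never reloaded.
class SegmentFieldCache {
    private final Map<String, String[]> cache = new HashMap<>();
    int valuesRead = 0;

    String[] get(Segment seg) {
        return cache.computeIfAbsent(seg.name, name -> {
            valuesRead += seg.fieldValues.length; // only a segment not seen before is read
            return seg.fieldValues.clone();
        });
    }

    // Touches every segment, as a per-segment search would.
    void warm(List<Segment> index) {
        for (Segment seg : index) {
            get(seg);
        }
    }
}
```

With two segments holding three values, followed by one more single-value segment, the per-segment cache reads 4 values in total while the top-level cache reads 7; the gap grows with the size of the index.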

2. Near-real-time search

The new version introduces the IndexWriter.getReader() method, which returns a reader over the complete current index, including changes not yet committed in the current IndexWriter session; this brings near-real-time search capability. In addition, you can call IndexWriter.setMergedSegmentWarmer() to "warm" newly merged segments so that they can be put into use immediately.
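The visibility rule behind getReader() can be modelled with a toy writer. This sketch assumes a hypothetical `NrtWriter` class that holds documents as strings; it is not Lucene's IndexWriter and ignores segments, flushing and merging. It only shows that a reader obtained from the writer sees uncommitted documents, while a reader opened on the committed index does not.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical toy writer, not Lucene's IndexWriter.
class NrtWriter {
    private final List<String> committed = new ArrayList<>(); // visible to ordinary readers
    private final List<String> buffered = new ArrayList<>();  // added in this session, not committed

    void addDocument(String doc) {
        buffered.add(doc);
    }

    void commit() {
        committed.addAll(buffered);
        buffered.clear();
    }

    // Like opening a reader on the directory: only committed documents are visible.
    List<String> openCommittedReader() {
        return Collections.unmodifiableList(new ArrayList<>(committed));
    }

    // Like getReader(): a point-in-time snapshot that also includes uncommitted documents.
    List<String> getReader() {
        List<String> snapshot = new ArrayList<>(committed);
        snapshot.addAll(buffered);
        return Collections.unmodifiableList(snapshot);
    }
}
```

Each call returns a snapshot, so documents added afterwards become visible only when getReader() is called again, much as a near-real-time reader has to be reopened.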

3. Numeric handling

Before 2.9, Lucene searched everything as text, which made numbers a headache. Many problems in our own projects came from numbers being treated as text, for example: 1. searching for the price 5 also returned entries containing ".5"; 2. when sorting in descending order, 800 was ranked ahead of 5000. All of these come from Lucene handling every value as text. Lucene 2.9 and later provide dedicated numeric support: the numeric field and query classes (NumericField and NumericRangeQuery) index and search values at several levels of precision, which greatly reduces the number of terms a range query has to visit and markedly improves query response time.
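Both the sorting problem and the idea of searching at several precisions can be illustrated in plain Java. This is a sketch under simplifying assumptions: values are non-negative, precision levels are powers of ten for readability (Lucene's NumericRangeQuery works on binary prefixes controlled by a `precisionStep`), and `NumericTermsSketch` with its `Term` class is hypothetical. `cover` splits a range into aligned blocks, each of which would correspond to one coarser-precision term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class NumericTermsSketch {
    // Sorting numbers as strings compares characters, not magnitudes.
    static List<String> sortDescendingAsText(String... values) {
        List<String> list = new ArrayList<>(Arrays.asList(values));
        list.sort(Collections.reverseOrder());
        return list;
    }

    // Zero-padding to a fixed width makes text order match numeric order (non-negative values).
    static String pad(long value, int width) {
        return String.format("%0" + width + "d", value);
    }

    // Hypothetical trie "term": all values v with v / unit == prefix.
    static final class Term {
        final long unit;
        final long prefix;

        Term(long unit, long prefix) {
            this.unit = unit;
            this.prefix = prefix;
        }

        long size() {
            return unit; // number of distinct values this term stands for
        }
    }

    // Covers [lo, hi] with decimal-aligned blocks, greedily from the left.
    static List<Term> cover(long lo, long hi) {
        List<Term> terms = new ArrayList<>();
        long x = lo;
        while (x <= hi) {
            long unit = 1;
            // Grow the block while x stays aligned to it and the block stays inside the range.
            while (x % (unit * 10) == 0 && x + unit * 10 - 1 <= hi) {
                unit *= 10;
            }
            terms.add(new Term(unit, x / unit));
            x += unit;
        }
        return terms;
    }
}
```

Sorting "800" and "5000" in descending order as text puts "800" first, while the padded forms "0800" and "5000" sort correctly. For the range 123 to 4567, `cover` returns 44 blocks (for example 200 to 999 as eight hundred-wide blocks and 1000 to 3999 as three thousand-wide blocks), whereas enumerating every value would need 4,445 terms.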

4. Other Optimizations

New query types and broader multi-term queries (wildcard, prefix, and so on) were introduced, along with new analyzers for Persian, Arabic, and Chinese. This update also brings better Unicode support, a new query parsing framework, and geographic search, which allows documents to be filtered and sorted by distance (for example, "find all supermarkets within 5 km of my home").
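Distance filtering of this kind can be sketched with the haversine great-circle formula. This is not the Lucene spatial API: `DistanceFilterSketch` and `Place` are hypothetical, and the code assumes a spherical Earth with a mean radius of 6371 km, which is accurate enough for "within 5 km" style queries.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class DistanceFilterSketch {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius (spherical model)

    static final class Place {
        final String name;
        final double lat; // degrees
        final double lon; // degrees

        Place(String name, double lat, double lon) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
        }
    }

    // Great-circle distance in km using the haversine formula.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp guards against rounding pushing the argument of asin above 1.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Keeps places within maxKm of (lat, lon) and sorts them nearest first.
    static List<Place> within(List<Place> places, double lat, double lon, double maxKm) {
        List<Place> hits = new ArrayList<>();
        for (Place p : places) {
            if (distanceKm(lat, lon, p.lat, p.lon) <= maxKm) {
                hits.add(p);
            }
        }
        hits.sort(Comparator.comparingDouble(p -> distanceKm(lat, lon, p.lat, p.lon)));
        return hits;
    }
}
```

A real engine would first narrow the candidates with an indexed bounding box and only compute exact distances for those; the sketch checks every place for simplicity.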

III. Comparison between version 2.9 and version 3.0

Although 2.9 was the preparatory release for 3.0, the two versions still differ significantly, mainly in the following ways:

1. 3.0 removes the methods deprecated in 2.9, so it is not backward compatible;
2. 3.0 drops support for Java 1.4 and requires Java 1.5 or later and Ant 1.7.0;
3. Some other core changes, such as Lock.isLocked() now throwing an IOException, and changes to some static variables.

IV. Changes to the main methods in 3.0

Here we look at how creating an index and searching differ in the new version.

1. Create an index

When creating indexes, the new version drops many obsolete methods; for example, every IndexWriter constructor that was marked as deprecated has been removed in 3.0.

IndexWriter constructors in 3.0:

When adding documents to the index, the constants used for each field have also changed, as shown in the following code:
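The original code for this step has not survived. One concrete example of such a constant change: the Field.Index values TOKENIZED, UN_TOKENIZED and NO_NORMS were renamed to ANALYZED, NOT_ANALYZED and NOT_ANALYZED_NO_NORMS during the 2.x series, and the old names were removed in 3.0. The hypothetical enums below only mirror those names to show the mapping; they are not Lucene's classes.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical migration helper; the enums mirror Lucene's Field.Index names but are not Lucene types.
class FieldIndexMigration {
    // Field.Index names that 2.x code may still use.
    enum OldIndex { NO, TOKENIZED, UN_TOKENIZED, NO_NORMS }

    // Names available in 3.0.
    enum NewIndex { NO, ANALYZED, NOT_ANALYZED, NOT_ANALYZED_NO_NORMS, ANALYZED_NO_NORMS }

    private static final Map<OldIndex, NewIndex> RENAMES = new EnumMap<>(OldIndex.class);

    static {
        RENAMES.put(OldIndex.NO, NewIndex.NO);                          // unchanged
        RENAMES.put(OldIndex.TOKENIZED, NewIndex.ANALYZED);             // run through the analyzer
        RENAMES.put(OldIndex.UN_TOKENIZED, NewIndex.NOT_ANALYZED);      // indexed as a single term
        RENAMES.put(OldIndex.NO_NORMS, NewIndex.NOT_ANALYZED_NO_NORMS); // single term, no norms
    }

    static NewIndex migrate(OldIndex old) {
        return RENAMES.get(old);
    }
}
```

In practice, a 2.x call such as `new Field("name", value, Field.Store.YES, Field.Index.TOKENIZED)` has to use `Field.Index.ANALYZED` instead before it compiles against 3.0.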

2. Query

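```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.*;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// field: default field to search, q: query string, indexDir: java.io.File of the index directory
QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, field,
        new StandardAnalyzer(Version.LUCENE_CURRENT));
Query query = parser.parse(q);
// keep the 100 best hits; true = documents are scored in doc-id order
TopScoreDocCollector collector = TopScoreDocCollector.create(100, true);
IndexSearcher is = new IndexSearcher(FSDirectory.open(indexDir), true); // read-only searcher
is.search(query, collector);
ScoreDoc[] docs = collector.topDocs().scoreDocs;
for (int i = 0; i < docs.length; i++) {
    Document doc = is.doc(docs[i].doc); // load the stored fields of each hit
    System.out.println(doc.getField("name") + " " + docs[i].toString());
}
is.close();
```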

3.0 search constructor:

Constructor before 3.0:

V. Overall diagram of 3.0

Compared with versions before 2.9, the only structural change is an additional message package, which handles internationalization.

As the diagram shows, 3.0, like earlier versions, consists of eight modules (packages), grouped into the external interface, the index core, and the infrastructure; see Appendix 1 for details. The diagram also shows the call sequence during a Lucene search. When we query a word, the search module (search) first calls the query parser (queryParser) to parse the query string. The parser calls the analysis module (analysis) to perform lexical analysis on the search keywords, such as tokenization and filtering, and the analysis module calls the message module when needed for internationalization. Once this preparation is done, the actual search begins: the index module (index) reads the data in the index files through the underlying storage module (store) and returns the results to the search module. The remaining modules serve as common utilities throughout the search process.
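The call sequence described above can be mirrored in a toy pipeline. This is a deliberately simplified sketch, not Lucene's implementation: `MiniSearchPipeline` is hypothetical, its "analyzer" only lowercases, splits on non-alphanumeric characters and drops a few stop words, the "index" and "store" are one in-memory map from terms to document ids, query terms are combined with AND, and the message (internationalization) module is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical miniature of the analyze -> parse -> index lookup flow.
class MiniSearchPipeline {
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "an", "the", "of"));

    // "analysis": lowercase, split on non-letters/digits, drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // "index" + "store": an in-memory inverted index, term -> sorted doc ids.
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    void addDocument(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // "queryParser" + "search": analyze the query, then intersect the posting lists.
    SortedSet<Integer> search(String queryString) {
        SortedSet<Integer> result = null;
        for (String term : analyze(queryString)) {
            SortedSet<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        if (result == null) {
            return new TreeSet<>(); // query contained only stop words
        }
        return result;
    }
}
```

For example, after indexing "The Lucene index" as document 1 and "Searching a Lucene index" as document 2, the query "lucene INDEX" matches both, because the same analysis is applied at index time and at query time.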

Original article: http://www.ourys.com/post/lucene3-0_about.html

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
