Lucene Index Merge Policy

Source: Internet
Author: User


Once the indexing algorithm is fixed, three IndexWriter parameters have the greatest effect on Lucene indexing speed: mergeFactor, maxBufferedDocs, and RAMBufferSizeMB. All three control how often data moves between memory and disk and how often index segments are merged, and that is how they speed up indexing. The right values depend on the hardware, so tune them accordingly.

maxBufferedDocs
This parameter determines how many documents are written to the in-memory index before that index is flushed to the hard disk and a new index segment file is generated. It should not be confused with maxMergeDocs, which caps the size, in documents, of segments that may be merged.
It effectively sets the size of the memory buffer; in general, the larger the buffer, the faster the indexing.
maxBufferedDocs is disabled by default, because Lucene uses another parameter (RAMBufferSizeMB) to bound how much is held in this buffer.
maxBufferedDocs and RAMBufferSizeMB can also be used together. In that case, whichever limit is reached first triggers the write to the hard disk and generates a new index segment file.

RAMBufferSizeMB
Sets the upper limit on the memory used to buffer indexed documents; when the buffered documents reach that limit, they are written to the hard disk. Again, the larger the buffer, the faster the indexing.
This parameter is especially useful when document sizes are not known in advance, because bounding memory directly helps avoid an OutOfMemoryError.
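The interaction between the two flush triggers can be illustrated with a small toy model. This is only a sketch of the "whichever limit is hit first" rule described above, not Lucene code: the class FlushSimulator, its methods, and the per-document byte sizes are all hypothetical, and real Lucene accounts for buffer memory in a more detailed way.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the two flush triggers (hypothetical, not part of Lucene). */
public class FlushSimulator {
    private final int maxBufferedDocs;   // <= 0 means the doc-count trigger is disabled
    private final long ramBufferBytes;   // <= 0 means the RAM trigger is disabled
    private int bufferedDocs = 0;
    private long bufferedBytes = 0;
    private final List<Integer> flushedSegmentSizes = new ArrayList<>();

    public FlushSimulator(int maxBufferedDocs, double ramBufferSizeMB) {
        this.maxBufferedDocs = maxBufferedDocs;
        this.ramBufferBytes = (long) (ramBufferSizeMB * 1024 * 1024);
    }

    /** Buffers one document and flushes if either limit has been reached. */
    public void addDocument(long docBytes) {
        bufferedDocs++;
        bufferedBytes += docBytes;
        boolean docLimitHit = maxBufferedDocs > 0 && bufferedDocs >= maxBufferedDocs;
        boolean ramLimitHit = ramBufferBytes > 0 && bufferedBytes >= ramBufferBytes;
        if (docLimitHit || ramLimitHit) {
            flush();
        }
    }

    /** Writes the buffered documents out as one new segment (recorded by doc count). */
    public void flush() {
        if (bufferedDocs == 0) {
            return;
        }
        flushedSegmentSizes.add(bufferedDocs);
        bufferedDocs = 0;
        bufferedBytes = 0;
    }

    public List<Integer> segmentSizes() {
        return flushedSegmentSizes;
    }

    public static void main(String[] args) {
        // 512-byte documents: the 1000-document limit is reached before 1 MB is.
        FlushSimulator sim = new FlushSimulator(1000, 1.0);
        for (int i = 0; i < 2500; i++) {
            sim.addDocument(512);
        }
        sim.flush(); // flush the remainder, as closing the writer would
        System.out.println(sim.segmentSizes()); // prints [1000, 1000, 500]
    }
}
```

With larger documents the RAM limit fires first: at 4096 bytes per document, the 1 MB buffer fills after 256 documents, long before the 1000-document limit is reached.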

mergeFactor
This parameter controls the merging of sub-indexes (segments).
Lucene generally indexes as follows: documents are first written to memory and, once a limit is reached, written to the hard disk as a separate sub-index, which Lucene calls a segment. These sub-indexes usually need to be merged into one index, that is, optimize(); otherwise retrieval slows down and a "too many open files" error may occur.
mergeFactor controls how many segments may accumulate on the hard disk before they are merged into a slightly larger index.
mergeFactor must not be set too large, especially when maxBufferedDocs is small (which produces more segments); otherwise it causes "too many open files" errors and can even cause errors outside the Java virtual machine.

Note: Lucene's default merge mechanism is not a pairwise (two-by-two) merge; several segments are merged at once into a larger index. A larger mergeFactor therefore consumes more memory and makes indexing faster, but in my experience a very large value, for example 300, makes the final merge very slow. For batch indexing, use mergeFactor > 10.
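The trade-off behind mergeFactor can be sketched with a simplified level-based model: every flush adds one small segment, and whenever mergeFactor segments of the same level exist they are merged into one segment of the next level. This is an assumption-laden illustration, not Lucene's actual merge policy (which works on segment sizes); the class MergeFactorSimulator and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified level-based merge model (hypothetical, not Lucene's merge policy). */
public class MergeFactorSimulator {
    private final int mergeFactor;
    private final List<Integer> levels = new ArrayList<>(); // segments per level
    private int merges = 0;

    public MergeFactorSimulator(int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        this.mergeFactor = mergeFactor;
    }

    /** Adds one freshly flushed segment and cascades merges upward as needed. */
    public void flush() {
        int level = 0;
        while (true) {
            if (levels.size() <= level) {
                levels.add(0);
            }
            int count = levels.get(level) + 1;
            if (count < mergeFactor) {
                levels.set(level, count);
                return;
            }
            // mergeFactor segments at this level: merge them into one at the next level.
            levels.set(level, 0);
            merges++;
            level++;
        }
    }

    /** Number of segments currently on disk in this model. */
    public int segmentCount() {
        int total = 0;
        for (int c : levels) {
            total += c;
        }
        return total;
    }

    /** Number of merge operations performed so far. */
    public int mergeCount() {
        return merges;
    }

    public static void main(String[] args) {
        for (int mf : new int[] {10, 100}) {
            MergeFactorSimulator sim = new MergeFactorSimulator(mf);
            for (int i = 0; i < 999; i++) {
                sim.flush();
            }
            System.out.println("mergeFactor=" + mf
                    + " segments=" + sim.segmentCount()
                    + " merges=" + sim.mergeCount());
        }
        // prints:
        // mergeFactor=10 segments=27 merges=108
        // mergeFactor=100 segments=108 merges=9
    }
}
```

In this model a larger mergeFactor performs far fewer merges during indexing, which is why indexing gets faster, but it leaves more segments on disk at any moment, which is where the pressure on open file handles comes from.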

setUseCompoundFile(true)

When the index is created, the multiple files of each segment are combined into a single .cfs file, which reduces the number of index files and the number of files that are open at the same time.
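A rough back-of-the-envelope estimate shows why the compound format matters for file handles. The sketch below is hypothetical: the class OpenFileEstimate is not part of Lucene, and the figure of 8 files per non-compound segment is an assumed illustrative value, since the real number depends on the Lucene version and on which index features are in use.

```java
/** Rough open-file estimate (hypothetical helper; the per-segment count is an assumption). */
public class OpenFileEstimate {
    // Assumed number of files per segment without the compound format (illustrative only).
    static final int ASSUMED_FILES_PER_SEGMENT = 8;

    /** Estimates how many index files a reader would hold open for the given segment count. */
    public static int estimate(int segments, boolean useCompoundFile) {
        if (segments < 0) {
            throw new IllegalArgumentException("segments must be >= 0");
        }
        // With the compound format, each segment is counted as a single .cfs file.
        int perSegment = useCompoundFile ? 1 : ASSUMED_FILES_PER_SEGMENT;
        return segments * perSegment;
    }

    public static void main(String[] args) {
        System.out.println(estimate(108, false)); // prints 864
        System.out.println(estimate(108, true));  // prints 108
    }
}
```

Under these assumptions, an index with many small segments stays well below typical per-process file limits once the compound format is enabled, whereas the non-compound layout multiplies the segment count by the number of files per segment.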

You can use JProfiler to monitor the Lucene process, see which operations run when and how long each one takes, and then optimize Lucene in a targeted way that is based on measurements.
