Lucene-based case development: fundamentals of real-time indexing

Last Update: 2015-02-28 Source: Internet

Author: User


When reprinting, please credit the source: http://blog.csdn.net/xiaojimanman/article/details/43982653

http://www.llwjy.com/blogd.php?id=63d4c488a2cccb5851c0498d374951c9

My personal blog site is now up as well, at www.llwjy.com/blog.php. Comments and criticism are welcome ~

Basic principle

As mentioned in the previous post, loading the index files when the program first starts consumes a lot of system resources. A real-time index therefore cannot be implemented by modifying the index files and reloading them after every change; memory has to be used instead. In Lucene 4.3.1 (earlier versions have them too, but the NRT* classes were removed in later versions), the NRT* classes provide a way to build a real-time index, or more precisely a pseudo-real-time index. IndexWriter operations are delegated to TrackingIndexWriter, which combines the memory index with the hard disk index, and NRTManager exposes the searchable index to callers.

Until a commit is executed (commit was described in the earlier post on creating an index), the data from these operations exists only in memory, so it is lost if the machine goes down or the service restarts. A daemon thread is therefore needed to run commit periodically. Commit consumes a lot of system resources, so it cannot be run every time the index is modified. The following steps, each illustrated by a diagram in the original post, outline how the real-time index works:

When the system starts, there are two indexes: the memory index and the hard disk index. At this point the memory index contains no data.

While the system is running, every add, delete, or update is applied to the memory index, not to the hard disk index.

When the program actively performs a commit, a copy of the memory index is made, which we call the merge index, and the memory index is emptied so that it can take subsequent index operations. The system now holds three indexes: the memory index, the merge index, and the hard disk index. Meanwhile, the data in the merge index is written to the hard disk.

Once all the data in the merge index has been written to the hard disk, the program rereads the hard disk index to form a new IndexReader. When the new hard disk IndexReader replaces the old one, the IndexReader of the merge index is deleted. The system then returns to its initial state (although the memory index may by now contain new data).
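The cycle described above can be modelled in a few lines of plain Java. This is only a toy sketch under simplifying assumptions: the class `ThreeStageIndex` and its methods are hypothetical names, documents are plain strings, and the "hard disk index" is just another list. It is not how Lucene implements the mechanism internally; it only shows which index each operation touches.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the memory / merge / hard disk index cycle. Hypothetical, not Lucene code. */
final class ThreeStageIndex {
    private final List<String> memory = new ArrayList<>();  // receives every write
    private final List<String> merging = new ArrayList<>(); // snapshot being written out
    private final List<String> disk = new ArrayList<>();    // stands in for the hard disk index

    /** Writes only ever touch the memory index. */
    synchronized void add(String doc) {
        memory.add(doc);
    }

    /** Commit, step 1: move the memory index into the merge index and empty the memory index. */
    synchronized void beginCommit() {
        merging.addAll(memory);
        memory.clear();
    }

    /** Commit, step 2: the merge index is fully on disk, so it can be dropped. */
    synchronized void finishCommit() {
        disk.addAll(merging);
        merging.clear();
    }

    /** Searches see all three indexes, so uncommitted documents are visible too. */
    synchronized List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        collect(disk, term, hits);
        collect(merging, term, hits);
        collect(memory, term, hits);
        return hits;
    }

    /** What would survive a crash: only the hard disk index. */
    synchronized List<String> durableDocs() {
        return new ArrayList<>(disk);
    }

    private static void collect(List<String> index, String term, List<String> hits) {
        for (String doc : index) {
            if (doc.contains(term)) {
                hits.add(doc);
            }
        }
    }
}
```

In this model a document added between `beginCommit()` and `finishCommit()` is searchable immediately but is not returned by `durableDocs()` until the next commit completes, which is exactly the window in which a crash loses data.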

This completes one cycle of the real-time index. There is some risk: if the machine goes down, part of the data may be lost. If the accuracy requirements on the data are not very strict, this can be ignored, since the probability of it happening is very small. If the accuracy requirements are particularly high, the gap can be closed by additionally writing an operation log.
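The post does not show what such a log looks like. One possible shape, given here only as a sketch under assumptions, is an append-only text file: each operation is recorded before it is applied to the memory index, the file is cleared after a successful commit, and whatever remains after a restart is replayed. The class `OperationLog`, its method names, and the one-operation-per-line format are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical append-only operation log; not part of Lucene. */
final class OperationLog {
    private final Path file;

    OperationLog(Path file) {
        this.file = file;
    }

    /** Record one operation (a single line of text) before applying it to the memory index. */
    synchronized void append(String operation) {
        try {
            // Note: this does not force the data to the physical disk; a stricter
            // version could add StandardOpenOption.SYNC at a performance cost.
            Files.write(file, Collections.singletonList(operation), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** After a restart: the operations logged since the last successful commit. */
    synchronized List<String> pending() {
        if (!Files.exists(file)) {
            return new ArrayList<>();
        }
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Call after a successful commit: everything logged so far is now in the hard disk index. */
    synchronized void clear() {
        try {
            Files.deleteIfExists(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On startup the application would read `pending()`, reapply each operation to the index, commit, and then call `clear()`. Whether the extra disk write per operation is acceptable depends on how much data loss the application can tolerate.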

PS: Lucene's internal logic is considerably more complex than described above; this is only a brief introduction to the principle. For a deeper understanding, please read the relevant books and the source code.
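The daemon commit thread mentioned at the start of this section can be sketched with the standard library alone. The class `PeriodicCommitter` below is a hypothetical helper, not a Lucene class, and the commit itself is passed in as a plain `Runnable`; in a real application that action would wrap the writer's commit call and its exception handling.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not a Lucene class): runs a commit action periodically on a daemon thread. */
final class PeriodicCommitter {
    private final ScheduledExecutorService scheduler;

    PeriodicCommitter(Runnable commitAction, long interval, TimeUnit unit) {
        scheduler = Executors.newSingleThreadScheduledExecutor(task -> {
            Thread thread = new Thread(task, "index-commit");
            thread.setDaemon(true); // a daemon thread does not keep the JVM alive
            return thread;
        });
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                commitAction.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so report it and continue.
                System.err.println("commit failed: " + e);
            }
        }, interval, interval, unit);
    }

    /** Stops scheduling further commits. */
    void stop() {
        scheduler.shutdownNow();
    }
}
```

Because the thread is a daemon, it will not run a final commit when the JVM exits; an application that needs one would have to trigger it explicitly during shutdown. The interval is the trade-off described above: a shorter interval loses less data on a crash but spends more resources on commits.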

Configuration class

This post first introduces the configuration classes used by the real-time index code in this series; later posts will not introduce them again.

ConfigBean

The ConfigBean class defines the basic properties of an index, such as the index name, the storage location on the hard disk, the analyzer to use, the commit frequency, and the reopen frequency of the memory index. The code is as follows:

    /**
     * @Description: Basic index configuration properties
     */
    package com.lulei.lucene.index.model;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.util.Version;

    public class ConfigBean {
        // Analyzer
        private Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_43);
        // Index location
        private String indexPath = "/index/";
        private double indexReopenMaxStaleSec = 10;
        private double indexReopenMinStaleSec = 0.025;
        // Index commit interval, in seconds
        private int indexCommitSeconds = 60;
        // Index name
        private String indexName = "index";
        // Whether to print related information on commit
        private boolean bprint = true;

        public Analyzer getAnalyzer() {
            return analyzer;
        }

        public void setAnalyzer(Analyzer analyzer) {
            this.analyzer = analyzer;
        }

        public String getIndexPath() {
            return indexPath;
        }

        public void setIndexPath(String indexPath) {
            // Make sure the path ends with a separator
            if (!(indexPath.endsWith("\\") || indexPath.endsWith("/"))) {
                indexPath += "/";
            }
            this.indexPath = indexPath;
        }

        public double getIndexReopenMaxStaleSec() {
            return indexReopenMaxStaleSec;
        }

        public void setIndexReopenMaxStaleSec(double indexReopenMaxStaleSec) {
            this.indexReopenMaxStaleSec = indexReopenMaxStaleSec;
        }

        public double getIndexReopenMinStaleSec() {
            return indexReopenMinStaleSec;
        }

        public void setIndexReopenMinStaleSec(double indexReopenMinStaleSec) {
            this.indexReopenMinStaleSec = indexReopenMinStaleSec;
        }

        public int getIndexCommitSeconds() {
            return indexCommitSeconds;
        }

        public void setIndexCommitSeconds(int indexCommitSeconds) {
            this.indexCommitSeconds = indexCommitSeconds;
        }

        public String getIndexName() {
            return indexName;
        }

        public void setIndexName(String indexName) {
            this.indexName = indexName;
        }

        public boolean isBprint() {
            return bprint;
        }

        public void setBprint(boolean bprint) {
            this.bprint = bprint;
        }
    }

IndexConfig

A system does not necessarily contain only one index; there may be several. An IndexConfig class is therefore added to hold the configuration of each of them. The code is as follows:

    /**
     * @Description: Index-related configuration parameters
     */
    package com.lulei.lucene.index.model;

    import java.util.HashSet;

    public class IndexConfig {
        // Configuration parameters
        private static HashSet<ConfigBean> configBean = null;

        // Default configuration
        private static class LazyLoadIndexConfig {
            private static final HashSet<ConfigBean> configBeanDefault = new HashSet<ConfigBean>();
            static {
                ConfigBean configBean = new ConfigBean();
                configBeanDefault.add(configBean);
            }
        }

        public static HashSet<ConfigBean> getConfigBean() {
            // If IndexConfig has not been initialized, use the default configuration
            if (configBean == null) {
                configBean = LazyLoadIndexConfig.configBeanDefault;
            }
            return configBean;
        }

        public static void setConfigBean(HashSet<ConfigBean> configBean) {
            IndexConfig.configBean = configBean;
        }
    }
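To show how the two classes are meant to be used together without pulling in Lucene, the sketch below uses trimmed stand-ins. `MiniConfigBean` and `MiniIndexConfig` are hypothetical names that keep only the index name, the index path, and the lazy default; the index names and paths in the usage example are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Trimmed stand-in for ConfigBean: only the name and the path, no Lucene types. */
final class MiniConfigBean {
    private String indexName = "index";
    private String indexPath = "/index/";

    String getIndexName() {
        return indexName;
    }

    void setIndexName(String indexName) {
        this.indexName = indexName;
    }

    String getIndexPath() {
        return indexPath;
    }

    /** Same normalization as ConfigBean.setIndexPath: the path always ends with a separator. */
    void setIndexPath(String indexPath) {
        if (!(indexPath.endsWith("\\") || indexPath.endsWith("/"))) {
            indexPath += "/";
        }
        this.indexPath = indexPath;
    }
}

/** Trimmed stand-in for IndexConfig with the same lazily created default. */
final class MiniIndexConfig {
    private static Set<MiniConfigBean> configBeans = null;

    // The default set is only built when this holder class is first used.
    private static class DefaultHolder {
        static final Set<MiniConfigBean> DEFAULT = new HashSet<>();
        static {
            DEFAULT.add(new MiniConfigBean());
        }
    }

    static Set<MiniConfigBean> getConfigBeans() {
        if (configBeans == null) {
            configBeans = DefaultHolder.DEFAULT;
        }
        return configBeans;
    }

    static void setConfigBeans(Set<MiniConfigBean> beans) {
        configBeans = beans;
    }
}
```

A system with two indexes would build one bean per index and register both, for example a bean named `blog` with path `/data/index/blog` and a bean named `news` with path `/data/index/news/`; after `setIndexPath` both paths end in `/`. If nothing is registered, `getConfigBeans()` falls back to a single default bean named `index`.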

PS: I recently found that other sites may be reprinting these posts without a link to the source. To see more of the Lucene-based case development series, please visit http://blog.csdn.net/xiaojimanman/article/category/2841877 or http://www.llwjy.com/blog.php

