Please cite the source when reprinting: http://blog.csdn.net/xiaojimanman/article/details/43982653
http://www.llwjy.com/blogd.php?id=63d4c488a2cccb5851c0498d374951c9
My personal blog site is now up as well: www.llwjy.com/blog.php. Comments and criticism are welcome.
Basic principle
As mentioned in the previous post, opening an index file for the first time consumes a lot of system resources. A real-time index therefore cannot be built by modifying the index file on disk and reloading it after every change; the changes have to be held in memory instead.
Lucene 4.3.1 provides the NRT* classes for building such a real-time (strictly speaking, pseudo-real-time) index. These classes are also available in earlier versions but were removed in later ones. The operations of IndexWriter are delegated to TrackingIndexWriter, which combines the memory index with the hard disk index, and NRTManager exposes the currently available index to callers.
Until a commit is executed (commit was described in the earlier post on creating an index), the modified data exists only in memory and is lost if the machine goes down or the service restarts. A daemon thread is therefore needed to perform commits at regular intervals; a commit is too expensive to run after every single modification. The rest of this section walks through the principle of the real-time index step by step; the original post illustrates each state with a simple figure.
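The periodic commit described above could be sketched roughly as follows. This is only a minimal illustration, not the code from this series: Committable, CommitDaemon and CommitDaemonDemo are hypothetical names, and Committable stands in for whatever object actually performs the Lucene commit.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the object that performs the real commit. */
interface Committable {
    void commit() throws Exception;
}

/** Daemon thread that calls commit() at a fixed interval until it is shut down. */
class CommitDaemon extends Thread {
    private final Committable target;
    private final long intervalMillis;
    private volatile boolean running = true;

    CommitDaemon(Committable target, long intervalMillis) {
        super("index-commit-daemon");
        this.target = target;
        this.intervalMillis = intervalMillis;
        setDaemon(true); // the thread must not keep the JVM alive on its own
    }

    @Override
    public void run() {
        while (running) {
            try {
                Thread.sleep(intervalMillis);
                target.commit();
            } catch (InterruptedException e) {
                // shutDown() interrupts the sleep; the loop condition decides whether to exit
            } catch (Exception e) {
                // a failed commit must not kill the thread; it is retried on the next cycle
                System.err.println("commit failed: " + e.getMessage());
            }
        }
    }

    /** Stops the loop and wakes the thread if it is sleeping. */
    void shutDown() {
        running = false;
        interrupt();
    }
}

class CommitDaemonDemo {
    public static void main(String[] args) throws Exception {
        CountDownLatch threeCommits = new CountDownLatch(3);
        // The "commit" here only counts down a latch; a real one would commit the index.
        CommitDaemon daemon = new CommitDaemon(threeCommits::countDown, 50);
        daemon.start();
        boolean done = threeCommits.await(5, TimeUnit.SECONDS);
        daemon.shutDown();
        System.out.println("three commits observed: " + done);
    }
}
```

In a real system the interval would be much longer than in this demo (the configuration class later in this post defaults to 60 seconds), since everything written after the last commit is what a crash can lose.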
When the system starts, there are two indexes: the memory index and the hard disk index. The memory index contains no data yet.
While the system is running, every index operation (add, delete, update and so on) is applied to the memory index, not to the hard disk index.
When the program performs a commit, the memory index is copied; the copy is called the merge index. The memory index is then emptied so that it can take subsequent index operations. At this point the system holds three indexes: the memory index, the merge index and the hard disk index. Meanwhile, the data in the merge index is written to the hard disk.
Once all data in the merge index has been written to the hard disk, the program rereads the hard disk index and creates a new IndexReader. When the new hard disk IndexReader replaces the old one, the IndexReader of the merge index is deleted. The system is then back in its initial state, except that the memory index may already contain new data.
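The four states above can be modelled with a toy example. This is a sketch of the idea only and not how Lucene stores data: LayeredIndexSketch and its methods are hypothetical, plain maps stand in for the three indexes, and only adds and overwrites are modelled (deletes are left out).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of the memory / merge / hard disk layers; not Lucene's real data structures. */
class LayeredIndexSketch {
    private Map<String, String> memory = new LinkedHashMap<>();     // uncommitted changes
    private Map<String, String> merging = new LinkedHashMap<>();    // snapshot being written out
    private final Map<String, String> disk = new LinkedHashMap<>(); // durable data

    /** All writes go to the memory layer only. */
    void add(String id, String doc) {
        memory.put(id, doc);
    }

    /** Lookups consult the newest layer first, so fresh documents are visible immediately. */
    String get(String id) {
        if (memory.containsKey(id)) return memory.get(id);
        if (merging.containsKey(id)) return merging.get(id);
        return disk.get(id);
    }

    /** Commit, step 1: the memory layer becomes the merge layer; an empty memory layer takes new writes. */
    void beginCommit() {
        if (!merging.isEmpty()) {
            throw new IllegalStateException("previous commit has not finished");
        }
        merging = memory;
        memory = new LinkedHashMap<>();
    }

    /** Commit, step 2: the snapshot is now on disk, so the merge layer can be dropped. */
    void finishCommit() {
        disk.putAll(merging);
        merging = new LinkedHashMap<>();
    }

    /** Simulates a crash: everything that is not on disk yet is lost. */
    void crash() {
        memory = new LinkedHashMap<>();
        merging = new LinkedHashMap<>();
    }

    int diskSize() {
        return disk.size();
    }
}

class LayeredIndexDemo {
    public static void main(String[] args) {
        LayeredIndexSketch index = new LayeredIndexSketch();
        index.add("1", "first doc");
        System.out.println(index.get("1")); // visible before any commit
        index.beginCommit();
        index.add("2", "second doc");       // lands in the fresh memory layer
        index.finishCommit();
        index.crash();
        System.out.println(index.get("1")); // committed, so it survives the crash
        System.out.println(index.get("2")); // never committed, so it is gone (null)
    }
}
```

The point of the model is that a document can be found as soon as it is added, while durability only arrives with the next commit.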
This gives the system a real-time index. There is still some risk: a crash may lose part of the data. If the accuracy requirements for the data are not very strict, this can be ignored, because such a crash is rare. If they are strict, the loss can be prevented by additionally writing the operations to a log.
PS: Lucene's internal logic is more complex than described above; this is only a brief introduction to the principle. For a deeper understanding, please read the relevant books and the source code.
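One way the log mentioned above might look is sketched below. It is an assumption for illustration, not part of this series: IndexOperationLog, its methods and the tab-separated line format are all hypothetical. Each operation is appended to a file before it is applied to the memory index; after a restart, the remaining lines are replayed; after a successful commit, the file is cleared.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;

/** Hypothetical operation log: one line per index operation that is not committed yet. */
class IndexOperationLog {
    private final Path file;

    IndexOperationLog(Path file) {
        this.file = file;
    }

    /** Appends an operation; call this before applying it to the memory index. */
    void record(String operation) throws IOException {
        // The operation text must not contain line breaks, because one line is one operation.
        Files.write(file,
                Collections.singletonList(operation),
                StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** After a restart: the operations that were never committed and have to be replayed. */
    List<String> pending() throws IOException {
        if (!Files.exists(file)) {
            return Collections.emptyList();
        }
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    /** Call right after a successful commit: the logged operations are now stored in the index. */
    void clear() throws IOException {
        Files.deleteIfExists(file);
    }
}

class IndexOperationLogDemo {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("index-ops", ".log");
        IndexOperationLog log = new IndexOperationLog(file);
        log.record("ADD\t1\tfirst doc");
        log.record("DELETE\t7");
        System.out.println(log.pending());           // what a restart would have to replay
        log.clear();                                 // pretend a commit just succeeded
        System.out.println(log.pending().isEmpty()); // nothing left to replay
    }
}
```

The sketch ignores two things a production version would have to handle: operations recorded while a commit is still in progress must not be cleared along with the committed ones, and the log file itself is only as durable as the operating system's write buffering allows.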
Configuration class
This post first introduces the configuration classes used throughout this real-time index series; later posts will not cover them again.
ConfigBean
The ConfigBean class defines the basic properties of an index, such as the index name, the storage location on the hard disk, the analyzer (word breaker) to use, the commit interval and the reopen interval of the memory index. The code is as follows:
/**
 * @Description: Index base configuration properties
 */
package com.lulei.lucene.index.model;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

public class ConfigBean {
    // Analyzer (word breaker)
    private Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_43);
    // Index address
    private String indexPath = "/index/";
    private double indexReopenMaxStaleSec = 10;
    private double indexReopenMinStaleSec = 0.025;
    // Index commit time
    private int indexCommitSeconds = 60;
    // Index name
    private String indexName = "index";
    // Whether to print related information on commit
    private boolean bprint = true;

    public Analyzer getAnalyzer() {
        return analyzer;
    }

    public void setAnalyzer(Analyzer analyzer) {
        this.analyzer = analyzer;
    }

    public String getIndexPath() {
        return indexPath;
    }

    public void setIndexPath(String indexPath) {
        if (!(indexPath.endsWith("\\") || indexPath.endsWith("/"))) {
            indexPath += "/";
        }
        this.indexPath = indexPath;
    }

    public double getIndexReopenMaxStaleSec() {
        return indexReopenMaxStaleSec;
    }

    public void setIndexReopenMaxStaleSec(double indexReopenMaxStaleSec) {
        this.indexReopenMaxStaleSec = indexReopenMaxStaleSec;
    }

    public double getIndexReopenMinStaleSec() {
        return indexReopenMinStaleSec;
    }

    public void setIndexReopenMinStaleSec(double indexReopenMinStaleSec) {
        this.indexReopenMinStaleSec = indexReopenMinStaleSec;
    }

    public int getIndexCommitSeconds() {
        return indexCommitSeconds;
    }

    public void setIndexCommitSeconds(int indexCommitSeconds) {
        this.indexCommitSeconds = indexCommitSeconds;
    }

    public String getIndexName() {
        return indexName;
    }

    public void setIndexName(String indexName) {
        this.indexName = indexName;
    }

    public boolean isBprint() {
        return bprint;
    }

    public void setBprint(boolean bprint) {
        this.bprint = bprint;
    }
}
IndexConfig
A system does not necessarily have only one index; there may be several. The IndexConfig class was therefore added. The code is as follows:
/**
 * @Description: Index related configuration parameters
 */
package com.lulei.lucene.index.model;

import java.util.HashSet;

public class IndexConfig {
    // Configuration parameters
    private static HashSet<ConfigBean> configBean = null;

    // Default configuration
    private static class LazyLoadIndexConfig {
        private static final HashSet<ConfigBean> configBeanDefault = new HashSet<ConfigBean>();
        static {
            ConfigBean configBean = new ConfigBean();
            configBeanDefault.add(configBean);
        }
    }

    public static HashSet<ConfigBean> getConfigBean() {
        // If IndexConfig has not been initialized, use the default configuration
        if (configBean == null) {
            configBean = LazyLoadIndexConfig.configBeanDefault;
        }
        return configBean;
    }

    public static void setConfigBean(HashSet<ConfigBean> configBean) {
        IndexConfig.configBean = configBean;
    }
}
PS: I recently found that other sites reproduce this blog without a link to the source. For more posts on Lucene-based case development, visit http://blog.csdn.net/xiaojimanman/article/category/2841877 or http://www.llwjy.com/blog.php
Lucene-based case development: basic principles of real-time indexing