Lucene has several index storage modes. Anyone who has used it may remember these four: SimpleFSDirectory, MMapDirectory, NIOFSDirectory and RAMDirectory. In newer versions, FSDirectory.open automatically picks the best one:
public static FSDirectory open(File path, LockFactory lockFactory) throws IOException {
    // 64-bit JRE on Windows or Solaris with unmapping supported: use memory mapping
    if ((Constants.WINDOWS || Constants.SUN_OS)
            && Constants.JRE_IS_64BIT && MMapDirectory.UNMAP_SUPPORTED) {
        return new MMapDirectory(path, lockFactory);
    } else if (Constants.WINDOWS) {
        // any other Windows setup: plain file access
        return new SimpleFSDirectory(path, lockFactory);
    } else {
        // everything else: NIO-based access
        return new NIOFSDirectory(path, lockFactory);
    }
}
From this code we can see under which conditions each directory is considered the best choice.
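To make the branch order easier to follow, here is a minimal sketch that mirrors the same conditions as a plain function. The class DirectoryChoice and its method choose are hypothetical names made up for illustration, not part of Lucene; the platform flags are passed in as parameters instead of being read from Lucene's Constants class.

```java
// Hypothetical helper, not a Lucene class: it mirrors the branch order of the
// open method quoted above and returns the name of the directory that would be chosen.
final class DirectoryChoice {
    static String choose(boolean windows, boolean sunOs,
                         boolean jre64Bit, boolean unmapSupported) {
        if ((windows || sunOs) && jre64Bit && unmapSupported) {
            return "MMapDirectory";
        } else if (windows) {
            return "SimpleFSDirectory";
        } else {
            return "NIOFSDirectory";
        }
    }

    public static void main(String[] args) {
        System.out.println(choose(true, false, true, true));   // 64-bit Windows: MMapDirectory
        System.out.println(choose(true, false, false, true));  // 32-bit Windows: SimpleFSDirectory
        System.out.println(choose(false, false, true, true));  // neither Windows nor Solaris: NIOFSDirectory
    }
}
```

Note that Windows is tested twice: once together with the 64-bit and unmap checks, and once on its own as the fallback, so NIOFSDirectory is never returned on Windows.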
MMapDirectory: memory-mapped index mode. Part of the index is held in memory and part stays on disk. It needs operating system support, preferably a 64-bit OS and a 64-bit JVM, so that the largest possible address space is available for mapping. It is generally usable on both Linux and Windows.
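The mechanism behind this mode is the JDK's memory-mapped file support. The following is a minimal sketch using only the JDK, not Lucene's actual implementation; the class MmapReadSketch and the method readMapped are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: reads a slice of a file through a read-only memory mapping.
final class MmapReadSketch {
    static byte[] readMapped(Path file, int offset, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // Map the whole file; the OS pages data in from disk on demand.
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte[] out = new byte[length];
            mapped.position(offset);
            mapped.get(out);  // copy `length` bytes starting at `offset`
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-sketch", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, "lucene-index-bytes".getBytes(StandardCharsets.US_ASCII));
        byte[] slice = readMapped(file, 7, 5);
        System.out.println(new String(slice, StandardCharsets.US_ASCII)); // prints "index"
    }
}
```

Because a mapping consumes virtual address space rather than heap, a 32-bit JVM can only map a limited amount of index data at once, which is why a 64-bit JVM is preferred.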
SimpleFSDirectory: simple disk storage. According to the conditions in the method above, this is the recommended disk storage mode on Windows when MMapDirectory cannot be used. This mode generates a lot of disk I/O, so index creation and retrieval depend heavily on disk performance.
NIOFSDirectory: reads and writes the index using Java NIO. Its condition is the more curious one: the Windows check comes before it, which means that on Windows it is not considered the optimal choice. The main reason is a bug in Java NIO on Windows.
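The difference between the two disk-based modes comes down to how bytes are read from a file. The sketch below uses only the JDK and is an illustration under that assumption, not Lucene's code; FileReadStyles, readWithSeek and readPositional are hypothetical names. The first method follows the seek-then-read style of RandomAccessFile, the second the positional read of FileChannel.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch contrasting two JDK file-reading styles.
final class FileReadStyles {
    // Seek-then-read: the file pointer is shared state, so threads sharing one
    // RandomAccessFile would have to synchronize around seek + read.
    static byte[] readWithSeek(Path file, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            byte[] out = new byte[length];
            synchronized (raf) {
                raf.seek(offset);
                raf.readFully(out);
            }
            return out;
        }
    }

    // Positional read: the offset is an argument, so no shared file pointer is moved.
    static byte[] readPositional(Path file, long offset, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(length);
            long pos = offset;
            while (buf.hasRemaining()) {
                int n = channel.read(buf, pos);
                if (n < 0) {
                    throw new EOFException("file ended before " + length + " bytes were read");
                }
                pos += n;
            }
            return buf.array();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("read-styles", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, "abcdefghij".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(readWithSeek(file, 2, 3), StandardCharsets.US_ASCII));   // prints "cde"
        System.out.println(new String(readPositional(file, 2, 3), StandardCharsets.US_ASCII)); // prints "cde"
    }
}
```

Both methods return the same bytes; what differs is how well each style copes with many threads reading the same file at once.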
RAMDirectory: in-memory storage. It does not appear in the method above. It is mainly used for non-persistent indexes, meaning that the index is lost once the program exits.
NRTCachingDirectory: a combined memory-plus-disk storage mode, mainly used in near-real-time search scenarios. It is only available in later versions of Lucene, and it is the default index storage directory in Solr.
That is a general introduction to the various index storage modes. One more point: all of them can be used on current operating systems, but each has its own use cases and its own shortcomings.
Having covered Lucene's index directories, let's turn to the directories Solr uses to store its index. Solr is built on top of Lucene, so whatever Lucene has, Solr has too; but Solr also adds implementations of its own. Below I mainly introduce HdfsDirectory and BlockDirectory:
HdfsDirectory: stores the index on HDFS. Its use case is a very large index. Unless you are dealing with big data, you would generally not put the index on HDFS.
BlockDirectory: as the name suggests, it splits the index into blocks, following the distributed-storage idea that all data is stored as blocks. It was introduced in Solr 4.x and may be replaced in later versions. It is not commonly used.
About Lucene and Solr index storage directories