Lucene.Net 2.3.1 Development Introduction: Part 3, Indexing (II)
2. Core classes used in indexing
Index development with Lucene.Net involves only a handful of classes, and these are the core classes of the indexing process. The Analyzer is the basis for indexing, the Directory is the medium in which the index is stored, the Document and Field classes form the core of the logical structure, and IndexWriter is the core of the operation. The use of all other classes is hidden, which is why Lucene.Net is so convenient to use.
2.1 Analyzer
The Analyzer was explained in detail in the previous section: it breaks a piece of text into tokens. How IndexWriter uses these tokens involves a very important class, DocumentsWriter. This class is critical and can be called the true core of the indexing code; IndexWriter is just a wrapper around it. Since this series focuses on application, it is not covered in depth here. Inside DocumentsWriter, tokens are pushed into fields by its most important method, invertField. This completes the step of feeding the tokenizer's output into the logical structure.
2.2 Directory
Strictly speaking, Directory is not part of the index itself. It represents the Lucene.Net storage medium, that is, where the index is actually stored. It did not appear explicitly in the previous two examples because the path passed in is automatically converted to a Directory. Directory has two subclasses: RAMDirectory, which keeps the index in memory, and FSDirectory, which stores the index on the hard disk. Even when FSDirectory is used, RAMDirectory still comes into play: IndexWriter first puts the index it builds into a RAMDirectory and writes the data to the hard disk only once a certain condition is met.
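The memory-first, disk-later behaviour described above can be sketched as a toy model. This is only an illustration in Java (the language Lucene.Net was ported from), not the Lucene.Net API: the class BufferedIndexSketch, its methods, and the count-based threshold maxBufferedDocs are all hypothetical, and a plain list stands in for the hard disk.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; none of these names come from Lucene.Net. */
public class BufferedIndexSketch {
    private final int maxBufferedDocs;                        // hypothetical flush threshold
    private final List<String> ramBuffer = new ArrayList<>(); // in-memory stage
    private final List<List<String>> flushedSegments = new ArrayList<>(); // stands in for the disk

    public BufferedIndexSketch(int maxBufferedDocs) {
        if (maxBufferedDocs < 1) {
            throw new IllegalArgumentException("maxBufferedDocs must be >= 1");
        }
        this.maxBufferedDocs = maxBufferedDocs;
    }

    /** Buffers one document in memory; flushes once the threshold is reached. */
    public void addDocument(String doc) {
        ramBuffer.add(doc);
        if (ramBuffer.size() >= maxBufferedDocs) {
            flush();
        }
    }

    /** Moves everything buffered in memory into one new "on-disk" segment. */
    public void flush() {
        if (ramBuffer.isEmpty()) {
            return;
        }
        flushedSegments.add(new ArrayList<>(ramBuffer));
        ramBuffer.clear();
    }

    public int bufferedCount() { return ramBuffer.size(); }

    public int segmentCount() { return flushedSegments.size(); }

    public static void main(String[] args) {
        BufferedIndexSketch writer = new BufferedIndexSketch(2);
        writer.addDocument("doc 1");
        writer.addDocument("doc 2"); // threshold reached: first flush
        writer.addDocument("doc 3"); // stays in memory for now
        System.out.println(writer.segmentCount() + " segment(s), "
                + writer.bufferedCount() + " buffered");
        writer.flush();              // roughly what closing the writer would do
        System.out.println(writer.segmentCount() + " segment(s), "
                + writer.bufferedCount() + " buffered");
    }
}
```

With a threshold of 2, the third document remains in memory until the explicit flush, which is the point of the sketch: adding a document and writing it to disk are separate events.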
2.3 IndexWriter
IndexWriter is the core of the indexing operation. It is responsible for writing the index files to the storage medium, and is the link that turns the logical structure into physical storage.
IndexWriter offers 10 constructors in total, but they draw on only a few parameter types. All of them are listed here:
(1) Directory d;
(2) Analyzer a;
(3) bool create;
(4) FileInfo path;
(5) string path;
(6) bool autoCommit;
(7) IndexDeletionPolicy deletionPolicy;
Of these, (6) and (7) are rarely used. The FileInfo path and the string path are both eventually turned into a Directory, and since both are disk paths, the resulting Directory is always an FSDirectory. bool create indicates whether a new index is created; if it is false, the existing index is updated incrementally. The default is false. bool autoCommit is not commonly used; it specifies whether changes are committed to the index before the writer is closed, and if it is false, they are committed only when the writer is closed. IndexDeletionPolicy deletionPolicy specifies whether earlier commits are removed. Two implementations are available, KeepOnlyLastCommitDeletionPolicy and SnapshotDeletionPolicy; the default is KeepOnlyLastCommitDeletionPolicy.
2.4 Document
Document is a virtual record that can be thought of as a row in a database table. It is what lets us work with the index files conveniently and in an understandable way. It generally holds the properties of a document that need to be used later, and it is of course used together with Field.
2.5 Field
The Field class corresponds to a column in a database. If a document has four attributes, such as title, content, author, and creation time, you need four Field objects to hold them; add those four fields to a Document and you have one row. A query always returns the whole row, regardless of which column matched. Quite similar to a database, isn't it?
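The row-and-column analogy can be made concrete with a small toy model. The sketch below is written in Java and does not use the Lucene.Net classes; SimpleField, SimpleDocument, and the sample values are hypothetical stand-ins chosen only to mirror the four attributes mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of "a document is a row, a field is a column"; not the Lucene.Net API. */
public class DocumentRowSketch {

    /** One column value: a name plus its text. */
    public static final class SimpleField {
        public final String name;
        public final String value;

        public SimpleField(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    /** One row: an ordered collection of fields. */
    public static final class SimpleDocument {
        private final List<SimpleField> fields = new ArrayList<>();

        public void add(SimpleField field) { fields.add(field); }

        /** Returns the value of the first field with this name, or null if absent. */
        public String get(String name) {
            for (SimpleField f : fields) {
                if (f.name.equals(name)) {
                    return f.value;
                }
            }
            return null;
        }

        public int size() { return fields.size(); }
    }

    /** Builds one "row" with the four attributes used in the text (sample values are made up). */
    public static SimpleDocument sampleDocument() {
        SimpleDocument doc = new SimpleDocument();
        doc.add(new SimpleField("title", "Indexing basics"));
        doc.add(new SimpleField("content", "How documents become searchable"));
        doc.add(new SimpleField("author", "someone"));
        doc.add(new SimpleField("created", "2008-01-01"));
        return doc;
    }

    public static void main(String[] args) {
        SimpleDocument doc = sampleDocument();
        // A hit hands back the whole row, so every column is available.
        System.out.println(doc.get("title") + " by " + doc.get("author"));
    }
}
```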
A Field itself has some attributes, just like a column in a database. These attributes are set through its three inner classes. They could well have been enumerations, but the Java original was written without enumerations, so the port does not use them either.
Field also has quite a few constructors, more than 7. In them, Store, Index, and TermVector are specified through the inner classes.
(1) Store has three options: Field.Store.COMPRESS means the value is stored compressed; Field.Store.YES means it is stored; Field.Store.NO means it is not stored.
(2) Index has four options: Field.Index.NO means the field is not indexed; Field.Index.TOKENIZED means it is indexed after tokenization; Field.Index.UN_TOKENIZED means it is indexed without tokenization; Field.Index.NO_NORMS means it is indexed without tokenization and without storing norms.
(3) TermVector is also rarely used. It has five options: Field.TermVector.NO means no term vector is stored; Field.TermVector.YES means the term vector is stored; Field.TermVector.WITH_POSITIONS additionally stores each token's position; Field.TermVector.WITH_OFFSETS additionally stores each token's start and end offsets; Field.TermVector.WITH_POSITIONS_OFFSETS additionally stores both positions and offsets.
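The inner-class options listed above follow what is often called the typesafe-constant pattern: a final class with a private constructor and a fixed set of public instances. The Java sketch below shows the idea for the three Store options next to an equivalent enum. It is a simplified illustration, not the actual Lucene or Lucene.Net source; FieldOptionSketch, StoreEnum, and isStored are hypothetical names.

```java
/** Simplified illustration of inner-class constants versus an enum; not the Lucene source. */
public class FieldOptionSketch {

    /** Typesafe-constant pattern: no instances can exist beyond the three declared here. */
    public static final class Store {
        public static final Store COMPRESS = new Store("COMPRESS");
        public static final Store YES = new Store("YES");
        public static final Store NO = new Store("NO");

        private final String name;

        private Store(String name) { this.name = name; }

        @Override
        public String toString() { return name; }
    }

    /** The same three options expressed as a language-level enumeration. */
    public enum StoreEnum { COMPRESS, YES, NO }

    /** Hypothetical helper: both YES and COMPRESS keep the value, NO does not. */
    public static boolean isStored(Store store) {
        return store != Store.NO;
    }

    public static void main(String[] args) {
        System.out.println(Store.YES + " stored? " + isStored(Store.YES));
        System.out.println(Store.NO + " stored? " + isStored(Store.NO));
    }
}
```

Because the constructor is private, callers can only pass one of the predefined instances, which gives the same compile-time safety as an enum and allows comparison by reference.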
2.6 Workflow of the core indexing classes
Figure 2.6.1
Figure 2.6.1 shows the entire data flow of the Lucene.Net indexing process. Note that, although the flowchart may suggest otherwise, the tokenizer does not directly produce a Field object. Instead, the analyzer is passed to the IndexWriter instance, and when a document is added, IndexWriter invokes the tokenizer to generate the data the field needs (inside the DocumentsWriter class). The figure only reflects how the data flows, not the actual call sequence.
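The call order described here, where the writer holds the analyzer and invokes it only when a document is added, can be sketched with a toy inverted index. The Java code below is an assumption-laden illustration rather than Lucene.Net's implementation: SimpleAnalyzer, WhitespaceLowercaseAnalyzer, and SimpleWriter are hypothetical, tokenization is a plain whitespace split, and each document is reduced to a single text field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy model of the indexing data flow; not the Lucene.Net API. */
public class IndexFlowSketch {

    /** Hypothetical stand-in for an analyzer: text in, tokens out. */
    public interface SimpleAnalyzer {
        List<String> tokenize(String text);
    }

    /** Splits on whitespace and lowercases each token. */
    public static final class WhitespaceLowercaseAnalyzer implements SimpleAnalyzer {
        @Override
        public List<String> tokenize(String text) {
            List<String> tokens = new ArrayList<>();
            for (String t : text.trim().split("\\s+")) {
                if (!t.isEmpty()) {
                    tokens.add(t.toLowerCase(Locale.ROOT));
                }
            }
            return tokens;
        }
    }

    /** Hypothetical writer: receives the analyzer up front, calls it per added document. */
    public static final class SimpleWriter {
        private final SimpleAnalyzer analyzer;
        // term -> ids of the documents that contain it, in insertion order
        private final Map<String, List<Integer>> postings = new LinkedHashMap<>();
        private int nextDocId = 0;

        public SimpleWriter(SimpleAnalyzer analyzer) {
            this.analyzer = analyzer;
        }

        /** Tokenizes the text here, inside the writer, and records each term once per document. */
        public int addDocument(String fieldText) {
            int docId = nextDocId++;
            for (String term : analyzer.tokenize(fieldText)) {
                List<Integer> docs = postings.computeIfAbsent(term, k -> new ArrayList<>());
                if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                    docs.add(docId);
                }
            }
            return docId;
        }

        public List<Integer> docsFor(String term) {
            return postings.getOrDefault(term, new ArrayList<>());
        }
    }

    public static void main(String[] args) {
        SimpleWriter writer = new SimpleWriter(new WhitespaceLowercaseAnalyzer());
        writer.addDocument("Hello World");
        writer.addDocument("hello Lucene");
        System.out.println("hello -> " + writer.docsFor("hello"));
    }
}
```

The caller never touches the tokens directly: it hands over raw text, and the writer decides when to run the analyzer, which matches the distinction between data flow and call sequence made above.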