Lucene.Net 2.3.1 Development Introduction, Part III: Indexing (2)

Source: Internet
Author: User

2. Core classes used in indexing

Lucene.Net index development relies on only a handful of classes, and they form the core of the indexing process. Analyzer is the basis for indexing, Directory is the storage medium for the index, the Document and Field classes form the core of the logical structure, and IndexWriter is the core of the operations. The other classes are used behind the scenes, which is why Lucene.Net is so convenient to use.

 

2.1 Analyzer

Analyzer was explained in detail earlier: an analyzer breaks a piece of text into tokens. How IndexWriter uses these tokens involves a very important class, DocumentsWriter. It can fairly be called the core class of indexing, and IndexWriter is largely a wrapper around it. Because this series focuses on application, it is not covered in detail here. Inside DocumentsWriter, its most important method, invertField, attaches the tokens to the field. This completes the step of adding the analyzed text to the logical structure.
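The tokenize-then-invert idea can be sketched in a few lines of Java. This is a self-contained toy, not Lucene or Lucene.Net code: ToyAnalyzer, tokenize, and invert are hypothetical names, and the simple lowercase splitter only stands in for a real Analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy illustration only: none of these names belong to Lucene or Lucene.Net.
class ToyAnalyzer {

    // Break text into lowercase tokens, splitting on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // "Invert" one field value: map each token to the positions at which it occurs.
    static Map<String, List<Integer>> invert(String fieldText) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        List<String> tokens = tokenize(fieldText);
        for (int position = 0; position < tokens.size(); position++) {
            postings.computeIfAbsent(tokens.get(position), t -> new ArrayList<>()).add(position);
        }
        return postings;
    }

    public static void main(String[] args) {
        System.out.println(invert("The quick fox. The lazy dog."));
    }
}
```

The real invertField does considerably more work; the sketch only shows why the result is called an inverted structure: lookups go from token to position rather than from position to token.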

 

2.2 Directory

Strictly speaking, Directory is not specific to indexing. It represents the storage medium of Lucene.Net, that is, the concrete location where an index is stored. It does not appear in the first two examples because the path you pass in is automatically converted to a Directory. Directory has two subclasses: RAMDirectory, which stores the index in memory, and FSDirectory, which stores the index on the hard disk. RAMDirectory is still involved when FSDirectory is used to store data on disk: IndexWriter first places the newly created index in a RAMDirectory and writes the data to the hard disk only when certain conditions are met.
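The buffer-in-memory-then-write-to-disk behavior can be illustrated with a small toy. Everything in it is an assumption made for illustration: ToyBufferedWriter and its methods are hypothetical, the flush condition is a plain document count, and the "segment" files are just text files. It does not reproduce how IndexWriter or RAMDirectory actually work.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: not the Lucene.Net IndexWriter, RAMDirectory, or FSDirectory.
class ToyBufferedWriter {
    private final Path directory;              // stands in for the on-disk store
    private final int maxBufferedDocs;         // hypothetical flush threshold
    private final List<String> ramBuffer = new ArrayList<>(); // stands in for the in-memory store
    private int segmentsOnDisk = 0;

    ToyBufferedWriter(Path directory, int maxBufferedDocs) {
        this.directory = directory;
        this.maxBufferedDocs = maxBufferedDocs;
    }

    // Convenience factory that writes into a fresh temporary directory.
    static ToyBufferedWriter inTempDirectory(int maxBufferedDocs) {
        try {
            return new ToyBufferedWriter(Files.createTempDirectory("toy-index"), maxBufferedDocs);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // New documents go to memory first; disk is touched only when the threshold is reached.
    void addDocument(String text) {
        ramBuffer.add(text);
        if (ramBuffer.size() >= maxBufferedDocs) {
            flush();
        }
    }

    // Write the buffered documents to one new file and empty the buffer.
    void flush() {
        if (ramBuffer.isEmpty()) {
            return;
        }
        Path segment = directory.resolve("segment_" + segmentsOnDisk + ".txt");
        try {
            Files.write(segment, ramBuffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        segmentsOnDisk++;
        ramBuffer.clear();
    }

    // Closing flushes whatever is still buffered.
    void close() {
        flush();
    }

    int getBufferedDocs() {
        return ramBuffer.size();
    }

    int getSegmentsOnDisk() {
        return segmentsOnDisk;
    }

    public static void main(String[] args) {
        ToyBufferedWriter writer = ToyBufferedWriter.inTempDirectory(2);
        writer.addDocument("first");
        writer.addDocument("second"); // threshold reached, buffer is flushed
        writer.addDocument("third");
        writer.close();               // remaining document is flushed
        System.out.println(writer.getSegmentsOnDisk());
    }
}
```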

 

2.3 IndexWriter

IndexWriter is the core of the indexing operations. It writes the index files to the storage medium and is the link that controls the conversion of the logical structure into physical storage.

IndexWriter has 10 constructors, but they draw on only a few parameter types:

 

(1) Directory d;

(2) Analyzer a;

(3) bool create;

(4) FileInfo path;

(5) string path;

(6) bool autoCommit;

(7) IndexDeletionPolicy deletionPolicy;

 

Parameters (6) and (7) are not commonly used. FileInfo path and string path are both eventually converted into a Directory; because both are disk paths, the resulting Directory is always an FSDirectory. bool create indicates whether a new index is created; if it is false, the existing index is updated incrementally. When the parameter is omitted, an existing index is likewise updated incrementally, as with false. bool autoCommit is rarely used. It specifies when changes to the index are committed: if it is false, changes are committed only when the writer is closed. IndexDeletionPolicy deletionPolicy specifies whether earlier commits are removed. Two implementations are available, KeepOnlyLastCommitDeletionPolicy and SnapshotDeletionPolicy, and the default is KeepOnlyLastCommitDeletionPolicy.

 

 

2.4 Document

A Document is a virtual record and can be thought of as one row of data. It is what lets us work with index files so easily. It generally holds the attributes of the item being indexed, and it must of course be used together with Field.

 

2.5 Field

The Field class is like a column in a database. If a document has four attributes (title, content, author, and creation time), you need four fields to hold them, and you then add the four fields to the Document, which forms one row. At query time, no matter which column is searched, the whole row can always be retrieved. Quite similar to a database, isn't it?
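The row and column analogy can be made concrete with a toy model. ToyDocument and ToyIndex are hypothetical classes written for this illustration, not the Lucene.Net Document and Field types, and the search is a plain linear scan with substring matching rather than a real index lookup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model of the analogy only; not the Lucene.Net Document or Field classes.
class ToyDocument {
    // Field name -> field value, kept in insertion order like the columns of a row.
    private final Map<String, String> fields = new LinkedHashMap<>();

    ToyDocument add(String name, String value) {
        fields.put(name, value);
        return this;
    }

    String get(String name) {
        return fields.get(name);
    }
}

class ToyIndex {
    private final List<ToyDocument> documents = new ArrayList<>();

    void addDocument(ToyDocument doc) {
        documents.add(doc);
    }

    // Match on a single "column", but hand back the whole "row".
    List<ToyDocument> search(String fieldName, String word) {
        String needle = word.toLowerCase(Locale.ROOT);
        List<ToyDocument> hits = new ArrayList<>();
        for (ToyDocument doc : documents) {
            String value = doc.get(fieldName);
            if (value != null && value.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument(new ToyDocument()
                .add("title", "Indexing basics")
                .add("content", "How documents become index entries")
                .add("author", "Alice")
                .add("created", "2008-05-01"));
        // Searching the author column still gives access to every other column.
        for (ToyDocument hit : index.search("author", "alice")) {
            System.out.println(hit.get("title") + " / " + hit.get("created"));
        }
    }
}
```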

A Field has some attributes, just like a column in a database. These attributes are set through its three nested classes. Enumerations would be a natural fit here, but the Java code that Lucene.Net was ported from did not use them (Java only gained enums in version 5, and Lucene at that time still supported older Java versions), so the port kept the nested classes rather than converting them to enumerations.

Field also has quite a few constructors, as many as seven. Store, Index, and TermVector are specified through the nested classes.

(1) Store has three options: Field.Store.COMPRESS stores the value in compressed form; Field.Store.YES stores the value; Field.Store.NO does not store the value.

(2) Index has four options: Field.Index.NO means the field is not indexed; Field.Index.TOKENIZED means the value is tokenized by the analyzer and then indexed; Field.Index.UN_TOKENIZED means the value is indexed as a single term without tokenization; Field.Index.NO_NORMS means the value is indexed without tokenization and without storing norms.

(3) TermVector is not commonly used. It has five options: Field.TermVector.NO means no term vector is stored; Field.TermVector.YES means the term vector is stored; Field.TermVector.WITH_POSITIONS additionally stores the position of each token; Field.TermVector.WITH_OFFSETS additionally stores the start and end offsets of each token; Field.TermVector.WITH_POSITIONS_OFFSETS additionally stores both positions and offsets.
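The three option groups above can be modeled with Java enums to show how they combine. This is a sketch under stated assumptions: ToyStore, ToyIndexOption, ToyTermVector, and ToyFieldOptions are hypothetical names, not the Lucene.Net nested classes, and the two validity checks are a reasonable reading of the option meanings rather than a statement of what the real Field constructor enforces.

```java
// Toy stand-ins for the three option groups; these are NOT the Lucene.Net types.
enum ToyStore { YES, NO, COMPRESS }

enum ToyIndexOption { NO, TOKENIZED, UN_TOKENIZED, NO_NORMS }

enum ToyTermVector { NO, YES, WITH_POSITIONS, WITH_OFFSETS, WITH_POSITIONS_OFFSETS }

class ToyFieldOptions {
    final ToyStore store;
    final ToyIndexOption index;
    final ToyTermVector termVector;

    ToyFieldOptions(ToyStore store, ToyIndexOption index, ToyTermVector termVector) {
        // Assumed rule: a field that is neither stored nor indexed serves no purpose.
        if (store == ToyStore.NO && index == ToyIndexOption.NO) {
            throw new IllegalArgumentException("a field must be stored, indexed, or both");
        }
        // Assumed rule: term vectors describe indexed terms, so they need an indexed field.
        if (index == ToyIndexOption.NO && termVector != ToyTermVector.NO) {
            throw new IllegalArgumentException("term vectors require an indexed field");
        }
        this.store = store;
        this.index = index;
        this.termVector = termVector;
    }

    boolean isStored() {
        return store != ToyStore.NO;
    }

    boolean isIndexed() {
        return index != ToyIndexOption.NO;
    }

    boolean isTokenized() {
        return index == ToyIndexOption.TOKENIZED;
    }

    public static void main(String[] args) {
        // A typical "title" field: stored for display and tokenized for full-text search.
        ToyFieldOptions title =
                new ToyFieldOptions(ToyStore.YES, ToyIndexOption.TOKENIZED, ToyTermVector.NO);
        // A typical "id" field: stored and indexed as one exact term.
        ToyFieldOptions id =
                new ToyFieldOptions(ToyStore.YES, ToyIndexOption.UN_TOKENIZED, ToyTermVector.NO);
        System.out.println(title.isTokenized() + " " + id.isTokenized());
    }
}
```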
 

 

 

2.6 Core indexing workflow

 

 

Figure 2.6.1

 

 

Figure 2.6.1 shows the overall flow of data through the Lucene.Net indexing process. Note that, despite what the flowchart might suggest, the Analyzer does not directly generate Field objects. In practice, the Analyzer is assigned to the IndexWriter instance, and when a document is added, IndexWriter calls the tokenizer to generate the data the field needs (inside the DocumentsWriter class). The figure reflects only how data flows, not the actual call sequence.

 

 
