Lucene.Net 2.3.1 Development Introduction - III. Index (2)

Last Update: 2018-12-07 Source: Internet

Author: User

2. Core classes used in indexing

Lucene.Net index development uses only a handful of classes, and these are the core of the indexing process. Analyzer is the basis for indexing, Directory is the storage medium for the index, Document and Field form the core of the logical structure, and IndexWriter is the core of the operations. The other classes are used behind the scenes, which is why Lucene.Net is so convenient.

2.1 Analyzer

Analyzer was explained in detail earlier: an analyzer breaks text into tokens. How IndexWriter consumes these tokens involves a very important class, DocumentsWriter. It can fairly be called the core class of indexing, and IndexWriter is largely a wrapper around it. Because this series focuses on application, it is not covered in detail here. Inside DocumentsWriter, tokens are pushed into the field through its most important method, invertField. This is the step that adds the segmented words to the logical structure.

2.2 Directory

Strictly speaking, Directory is not specific to indexing. It represents the storage medium of Lucene.Net, that is, the concrete location where an index is stored. It does not appear in the first two examples because the path you pass in is automatically converted to a Directory. Directory has two subclasses: RAMDirectory stores the index in memory, and FSDirectory stores it on the hard disk. RAMDirectory still plays a role when FSDirectory is used: IndexWriter first places the newly created index in a RAMDirectory and writes the data to disk only when certain conditions are met.
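
The two storage choices described above can be sketched as follows. This is a minimal illustration, assuming the Lucene.Net 2.3.1 assembly is referenced; the folder name "index" and the class name DirectoryDemo are placeholders that do not come from the original article.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;

class DirectoryDemo
{
    static void Main()
    {
        // In-memory index: its contents are lost when the process exits.
        RAMDirectory ramDir = new RAMDirectory();
        IndexWriter ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
        ramWriter.Close();

        // On-disk index in the placeholder folder "index".
        // Passing true asks FSDirectory to start from an empty index folder.
        FSDirectory fsDir = FSDirectory.GetDirectory("index", true);
        IndexWriter fsWriter = new IndexWriter(fsDir, new StandardAnalyzer(), true);
        fsWriter.Close();
        fsDir.Close();

        // Shortcut: pass the path and let IndexWriter build the FSDirectory itself.
        IndexWriter pathWriter = new IndexWriter("index", new StandardAnalyzer(), true);
        pathWriter.Close();
    }
}
```

The concrete types RAMDirectory and FSDirectory are used for the variables so that the name Directory does not clash with System.IO.Directory if that namespace is also imported.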

2.3 IndexWriter

IndexWriter is the core of indexing operations. It writes index files to the storage medium and is the link that converts logical storage into physical storage.

IndexWriter has 10 constructors, but they draw on only a few parameter types:

(1) Directory d;

(2) Analyzer a;

(3) bool create;

(4) FileInfo path;

(5) string path;

(6) bool autoCommit;

(7) IndexDeletionPolicy deletionPolicy;

Items 6 and 7 are not commonly used. FileInfo path and string path are both eventually turned into a Directory; because both are disk paths, the resulting Directory is always an FSDirectory. bool create indicates whether to create a new index (true) or to update an existing one incrementally (false). When the parameter is omitted, the writer appends to an existing index and creates one only if none exists yet. bool autoCommit controls when changes are committed: if it is false, changes are committed only when the writer is closed. IndexDeletionPolicy deletionPolicy controls whether earlier commits are removed. Two implementations are available, KeepOnlyLastCommitDeletionPolicy and SnapshotDeletionPolicy; the default is KeepOnlyLastCommitDeletionPolicy.
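
A few of the constructor combinations listed above might look like this in practice. This is a sketch under the assumption that the Lucene.Net 2.3.1 overloads mirror the parameter order of the Java original; the folder name "index" and the class name WriterConstructorDemo are illustrative only.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;

class WriterConstructorDemo
{
    static void Main()
    {
        Analyzer analyzer = new StandardAnalyzer();

        // string path + analyzer + create: true builds a new, empty index.
        IndexWriter w1 = new IndexWriter("index", analyzer, true);
        w1.Close();

        // create = false opens the index built above for incremental updates.
        IndexWriter w2 = new IndexWriter("index", analyzer, false);
        w2.Close();

        // Full form: Directory, autoCommit, analyzer, create, deletion policy.
        // The policy shown is the default one, spelled out explicitly.
        RAMDirectory dir = new RAMDirectory();
        IndexWriter w3 = new IndexWriter(dir, false, analyzer, true,
            new KeepOnlyLastCommitDeletionPolicy());
        w3.Close();
    }
}
```

Each writer is closed before the next one is opened on the same folder, because only one IndexWriter may hold the write lock on an index at a time.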

2.4 Document

Document is a virtual record that can be thought of as one row of data. It is what makes index files so easy to work with. A Document generally holds the attributes of the item being indexed, and it is always used together with Field.

2.5 Field

The Field class corresponds to a column in a database. If a document has four attributes (title, content, author, and creation time), you need four fields to hold them; adding the four fields to a Document gives you one row. At query time, whichever column is searched, the whole row can always be retrieved. Very much like a database, isn't it?

Like a database column, Field has attributes of its own. They are set through its three nested classes. Enumerations would be a natural fit here, but Java had no enumerations when Lucene was written, so the port did not convert these classes to enumerations.

Field also has many constructors, seven in all. Store, Index, and TermVector are specified through the nested classes.

(1) Store has three options: Field.Store.COMPRESS stores the value in compressed form; Field.Store.YES stores the value; Field.Store.NO does not store it.

(2) Index has four options: Field.Index.NO creates no index for the field; Field.Index.TOKENIZED indexes the value after word segmentation; Field.Index.NO_NORMS indexes the value without segmentation and without storing norms; Field.Index.UN_TOKENIZED indexes the value as a single term without segmentation.

(3) TermVector is not commonly used. It has five options: Field.TermVector.NO stores no term vector; Field.TermVector.YES stores the term vector; Field.TermVector.WITH_POSITIONS additionally stores token positions; Field.TermVector.WITH_OFFSETS additionally stores token start and end offsets; Field.TermVector.WITH_POSITIONS_OFFSETS stores both positions and offsets.
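
To make the row-and-column analogy concrete, the four-attribute document from the text could be assembled roughly as below. This is an illustrative sketch, assuming Lucene.Net 2.3.1; the field names, the sample values, the date format, and the helper BuildDocument are assumptions made for this example, not part of the library or the original article.

```csharp
using System;
using Lucene.Net.Documents;

class FieldDemo
{
    // Builds one "row" made of the four "columns" named in the text.
    static Document BuildDocument(string title, string content,
                                  string author, DateTime created)
    {
        Document doc = new Document();

        // Stored and tokenized: searchable by word and returned with hits.
        doc.Add(new Field("title", title,
            Field.Store.YES, Field.Index.TOKENIZED));

        // Tokenized but not stored; the term vector keeps positions and offsets.
        doc.Add(new Field("content", content,
            Field.Store.NO, Field.Index.TOKENIZED,
            Field.TermVector.WITH_POSITIONS_OFFSETS));

        // Stored and indexed as a single term, with no word segmentation.
        doc.Add(new Field("author", author,
            Field.Store.YES, Field.Index.UN_TOKENIZED));

        // Stored only: can be displayed with a hit but cannot be searched.
        doc.Add(new Field("created", created.ToString("yyyyMMdd"),
            Field.Store.YES, Field.Index.NO));

        return doc;
    }

    static void Main()
    {
        Document doc = BuildDocument("Hello", "Some body text", "User", DateTime.Now);
        Console.WriteLine(doc.Get("title"));
    }
}
```

Which Store and Index option suits each field depends on whether the value must be searchable, displayable, or both; the choices above are only one plausible combination.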

2.6 Core indexing workflow

Figure 2.6.1

Figure 2.6.1 shows the entire flow of data through the Lucene.Net indexing process. Note that in this flowchart the analyzer does not directly generate Field objects. In practice, the analyzer is assigned to the IndexWriter instance, and when a document is added, IndexWriter calls the tokenizer to generate the data the field needs (inside the DocumentsWriter class). The figure reflects only how data flows, not the actual call sequence.
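
In code, the data flow described above might be sketched as follows. This is a minimal example, assuming Lucene.Net 2.3.1 and an in-memory index; the field name, the sample text, and the class name IndexFlowDemo are placeholders for illustration.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

class IndexFlowDemo
{
    static void Main()
    {
        // 1. The analyzer is handed to the writer, not to the fields.
        Analyzer analyzer = new StandardAnalyzer();
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, analyzer, true);

        // 2. Raw data goes into fields, and the fields go into a document.
        Document doc = new Document();
        doc.Add(new Field("title", "Lucene.Net index workflow",
            Field.Store.YES, Field.Index.TOKENIZED));

        // 3. AddDocument is where the writer analyzes and inverts the fields.
        writer.AddDocument(doc);
        Console.WriteLine(writer.DocCount());

        // 4. Close flushes any buffered data to the Directory.
        writer.Close();
    }
}
```

The calling code never touches the tokenizer or DocumentsWriter directly, which matches the note that the flowchart shows data flow and not the call sequence.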

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
