Lucene's file structure is hierarchical.
From top to bottom, the Lucene file hierarchy contains: index, segment, document, field, and term.
To draw an analogy with a relational database:
An index is comparable to a database table. When the number of records in a table reaches a certain size, we partition the table.
A segment is comparable to a partition of that table.
In other words, an index can be broken down into multiple segments.
Just as the records of a table are stored in separate partitions, documents, which correspond to the records of a table, are stored in separate segments.
Unlike table partitions, however, segments can be merged into a new segment.
An index contains multiple segments, each segment contains one or more documents, each document contains multiple fields, and each field can be split into one or more terms.
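The containment hierarchy and the merge step can be sketched with a few plain Python classes. This is only a toy in-memory model to illustrate the relationships described above. It is not Lucene's on-disk format or its API, and every name in it (Index, Segment, Document, merge_segments, build_example) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    # Maps a field name to the terms that field was split into.
    fields: Dict[str, List[str]]


@dataclass
class Segment:
    name: str
    documents: List[Document] = field(default_factory=list)


@dataclass
class Index:
    segments: List[Segment] = field(default_factory=list)

    def doc_count(self) -> int:
        # Total number of documents across all segments.
        return sum(len(s.documents) for s in self.segments)


def merge_segments(index: Index, names: List[str], merged_name: str) -> Segment:
    """Replace the named segments with one new segment holding all their documents."""
    to_merge = [s for s in index.segments if s.name in names]
    merged = Segment(merged_name, [d for s in to_merge for d in s.documents])
    index.segments = [s for s in index.segments if s.name not in names] + [merged]
    return merged


def build_example() -> Index:
    # Hypothetical sample data: two segments holding three documents in total.
    return Index([
        Segment("seg_0", [Document({"title": ["lucene", "index"]})]),
        Segment("seg_1", [Document({"title": ["segment", "merge"]}),
                          Document({"body": ["lucene", "segment"]})]),
    ])
```

Calling merge_segments(index, ["seg_0", "seg_1"], "seg_2") on the example leaves the index with a single segment that still holds all three documents, which is the difference from table partitions noted above.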
A Lucene index stores both forward information and inverted information.
Forward information: the containment relationships from the index down to the term, stored hierarchically.
Index-Segment-Document-Field-Term
That is, which segments the index contains, which documents each segment contains, which fields each document contains, and which terms each field is split into.
Inverted information: the mapping from terms to documents.
In an earlier article in this series (1), we defined this mapping from term to document as the inverted table. With the inverted table, we can look up which documents each term appears in.
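To make the difference between the two directions concrete, here is a minimal sketch that takes forward information (document to terms) and inverts it into an inverted table (term to documents). The function name build_inverted_table and the sample data are assumptions made for illustration. Lucene's real postings format stores much more, and this does not reproduce it.

```python
from collections import defaultdict
from typing import Dict, List


def build_inverted_table(docs: Dict[int, List[str]]) -> Dict[str, List[int]]:
    """Invert doc_id -> terms into term -> sorted list of doc_ids."""
    table = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            table[term].add(doc_id)
    return {term: sorted(ids) for term, ids in table.items()}


# Forward information: which terms each (hypothetical) document contains.
forward = {
    0: ["lucene", "index"],
    1: ["segment", "merge"],
    2: ["lucene", "segment"],
}

# Inverted information: which documents each term appears in.
inverted = build_inverted_table(forward)
print(inverted["lucene"])   # [0, 2]
```

Answering "which documents contain this term" from the forward data would mean scanning every document. The inverted table answers it with a single lookup.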
Lucene Notes Series (3): The File Structure of Lucene