The following describes how to create an index.
As the example above shows, three classes are involved in creating an index: Document, IndexWriter, and Field.
The simplest procedure is:
First, create a Document, an IndexWriter, and the Field objects you need.
Then add each Field to the Document with Document.add().
Next, add the Document to the index with IndexWriter.addDocument().
Finally, call IndexWriter.close() to close the index writer. This step is very important: the index is only reliably written to the index directory once close() has been called, a point that many beginners overlook.
There is not much to say about Document. It can be thought of as one row (record) in a database table.
Field is important and complex:
Let's look at its five constructors:
Field(String name, byte[] value, Field.Store store)
Field(String name, Reader reader)
Field(String name, Reader reader, Field.TermVector termVector)
Field(String name, String value, Field.Store store, Field.Index index)
Field(String name, String value, Field.Store store, Field.Index index, Field.TermVector termVector)
Field has three inner classes, Field.Index, Field.Store, and Field.TermVector, which are used as constructor parameters.
Note: termVector was added in Lucene 1.4. It provides a term vector mechanism for fuzzy queries and is not commonly used. The default value is false, which does not affect ordinary queries.
Different combinations of Field.Index and Field.Store play different roles in full-text search. Let's look at the following table:
Field.Index | Field.Store | Description
TOKENIZED (word segmentation) | YES | Searchable and stored. Suits the title or content of an article, if the content is not too long.
TOKENIZED | NO | Searchable but not stored. Suits the title or content of an article when the content can be very long.
NO | YES | Not searchable. Stored only as an attachment to the search results, such as a URL.
UN_TOKENIZED | YES/NO | Not segmented. The value is searched as a whole; searching for only part of it finds nothing.
NO | NO | No such usage.
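To make the combinations in the table concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: ToyField, ToyIndex, and their methods are hypothetical names invented for this illustration, and "word segmentation" is reduced to lower-casing and splitting on whitespace.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a field; only the enum names mirror the table.
class ToyField {
    enum Store { YES, NO }
    enum Index { TOKENIZED, UN_TOKENIZED, NO }

    final String name;
    final String value;
    final Store store;
    final Index index;

    ToyField(String name, String value, Store store, Index index) {
        this.name = name;
        this.value = value;
        this.store = store;
        this.index = index;
    }
}

// Hypothetical in-memory index: postings for searching, stored values for display.
class ToyIndex {
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private final Map<String, String> stored = new HashMap<>();

    void add(int docId, ToyField f) {
        if (f.store == ToyField.Store.YES) {
            stored.put(docId + "/" + f.name, f.value); // kept for later display
        }
        if (f.index == ToyField.Index.TOKENIZED) {
            // Assumed segmentation: lower-case, then split on whitespace.
            for (String term : f.value.toLowerCase().split("\\s+")) {
                post(f.name, term, docId);
            }
        } else if (f.index == ToyField.Index.UN_TOKENIZED) {
            post(f.name, f.value, docId); // the whole value is one term
        }
        // Index.NO: nothing is posted, so the field cannot be searched.
    }

    private void post(String field, String term, int docId) {
        postings.computeIfAbsent(field + ":" + term, k -> new ArrayList<>()).add(docId);
    }

    List<Integer> search(String field, String term) {
        return postings.getOrDefault(field + ":" + term, new ArrayList<>());
    }

    String storedValue(int docId, String field) {
        return stored.get(docId + "/" + field); // null when the field was not stored
    }
}

class StoreIndexDemo {
    public static void main(String[] args) {
        ToyIndex idx = new ToyIndex();
        idx.add(1, new ToyField("content", "a long article body",
                ToyField.Store.NO, ToyField.Index.TOKENIZED));
        idx.add(1, new ToyField("url", "http://example.com/a",
                ToyField.Store.YES, ToyField.Index.NO));
        idx.add(1, new ToyField("id", "DOC 42",
                ToyField.Store.YES, ToyField.Index.UN_TOKENIZED));

        System.out.println(idx.search("content", "article"));  // found by one word
        System.out.println(idx.storedValue(1, "content"));     // not stored
        System.out.println(idx.search("url", "http://example.com/a")); // not indexed
        System.out.println(idx.search("id", "DOC"));           // partial value
        System.out.println(idx.search("id", "DOC 42"));        // whole value
    }
}
```

In this sketch the content field can be found by a single word but comes back as null when read, the url field can be read but never found, and the id field matches only when the whole value is given.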
For the constructors
Field(String name, Reader reader)
Field(String name, Reader reader, Field.TermVector termVector)
the field is always Field.Index.TOKENIZED and Field.Store.NO. This is why the content in the above example is null: it is indexed but not stored. If you want to see the content of the article, you can get it through the path of the article, since the path is returned with the search results as a stored attachment. In web development, we usually keep large data in the database rather than in the file system, let alone in the index directory, because handling data that large there would increase the burden on the server.
The following describes IndexWriter:
It is the index writer, and its tasks are relatively simple:
1. Use addDocument() to add the documents that are ready to be written to the index.
2. Call close() to write the index to the index directory.
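The two tasks can be pictured with a small plain-Java sketch. This is a toy model under simplifying assumptions, not Lucene's implementation: ToyDirectory and ToyWriter are hypothetical names, a document is just a String, and the real IndexWriter is considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the index directory.
class ToyDirectory {
    final List<String> documents = new ArrayList<>();
}

// Hypothetical writer that imitates only the two tasks listed above.
class ToyWriter {
    private final ToyDirectory directory;
    private final List<String> pending = new ArrayList<>();
    private boolean closed = false;

    ToyWriter(ToyDirectory directory) {
        this.directory = directory;
    }

    // Task 1: queue a document that is ready to be indexed.
    void addDocument(String document) {
        if (closed) {
            throw new IllegalStateException("writer already closed");
        }
        pending.add(document);
    }

    // Task 2: write everything that was queued into the directory.
    void close() {
        if (closed) {
            return;
        }
        directory.documents.addAll(pending);
        pending.clear();
        closed = true;
    }
}

class ToyWriterDemo {
    public static void main(String[] args) {
        ToyDirectory dir = new ToyDirectory();
        ToyWriter writer = new ToyWriter(dir);
        try {
            writer.addDocument("first article");
            writer.addDocument("second article");
            System.out.println("before close: " + dir.documents.size()); // prints "before close: 0"
        } finally {
            writer.close(); // without this call the toy directory stays empty
        }
        System.out.println("after close: " + dir.documents.size()); // prints "after close: 2"
    }
}
```

Placing close() in a finally block is one way to make sure the step is not skipped when an exception is thrown while documents are being added.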
Let's take a look at its constructor:
IndexWriter(Directory d, Analyzer a, boolean create)
(Unfinished)