Lucene 6.6.0 Examples and Learning Route

Source: Internet
Author: User

My boss asked me to learn Lucene, the full-text search library, to lay the groundwork for building a search engine for our project. It took me a week to get familiar with the basic Lucene API and write a few examples, so here I would like to share what I learned about Lucene.

1. First, I recommend this concise article, which explains how Lucene creates and searches an index: https://www.ibm.com/developerworks/cn/java/j-lo-lucene1/ (note that it uses the older Lucene 2.0.0 jar).

2. Then read a more detailed article: http://blog.csdn.net/csh624366188/article/details/6823209 (note that the Lucene search application demo in its first part uses Lucene 3.0.3, while most of the later examples use Lucene 2.0.0).

3. If your English is good enough, also read the documentation on the Apache Lucene website and then study the core API at https://lucene.apache.org/core/6_6_0/index.html (note that this is for version 6.6.0). I have put two Lucene 6.6.0 examples on GitHub: https://github.com/Jethu1/lucene6.git.

Fundamentals and Main Packages

Once your documents are indexed, you can search those indexes. The search engine first analyzes the keywords being searched, then looks them up in the index it has built, and finally returns the documents that match the keywords the user entered.
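To make the "analyze, look up, return" flow concrete, here is a toy inverted index in plain Java. It is only a sketch of the idea, not how Lucene actually stores its index: the class name, the lowercase-and-split stand-in for an Analyzer, and the choice to AND multiple keywords together are all hypothetical simplifications.

```java
import java.util.*;

// A toy inverted index: illustrates the idea only, NOT Lucene's implementation.
class ToyInvertedIndex {
    // term -> sorted set of ids of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Hypothetical stand-in for an Analyzer: lowercase, then split on anything
    // that is not a letter or digit.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Indexing: analyze the text and record docId under every term it produces.
    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Searching: analyze the keywords, look each term up, and keep only the
    // documents that contain all of them.
    SortedSet<Integer> search(String keywords) {
        SortedSet<Integer> result = null;
        for (String term : analyze(keywords)) {
            SortedSet<Integer> docs = postings.getOrDefault(term, Collections.emptySortedSet());
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(1, "Lucene is a full-text search library");
        index.add(2, "Elasticsearch is built on Lucene");
        index.add(3, "Search engines rank documents");
        System.out.println(index.search("lucene"));        // [1, 2]
        System.out.println(index.search("Search LUCENE")); // [1]
    }
}
```

Because the keywords go through the same analysis as the documents, "Search LUCENE" still finds document 1 even though the case differs.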

Lucene Package Analysis

Lucene is distributed as jar files. Let's go through the main Java packages inside them to give the reader a preliminary understanding.

Package: org.apache.lucene.document

This package provides the classes needed to wrap the documents to be indexed, such as Document and Field. Each document is ultimately wrapped as a Document object.

Package: org.apache.lucene.analysis

The main job of this package is to tokenize documents. A document must be split into terms by an analyzer before it is indexed, so this package can be seen as the preparation step for indexing.

Package: org.apache.lucene.index

This package provides the classes for creating indexes and updating existing ones. Its two basic classes are IndexWriter and IndexReader. IndexWriter creates an index and adds documents to it. In Lucene 2.x and 3.x, IndexReader was also used to delete documents; since Lucene 4.0 IndexReader is read-only, and in 6.6.0 documents are deleted with IndexWriter.deleteDocuments().
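A sketch of adding, updating, and deleting documents in Lucene 6.6.0, assuming lucene-core and lucene-analyzers-common 6.6.0 are on the classpath. In this version all three operations go through IndexWriter. The field names id and content, the helper doc(), and the class name are hypothetical; RAMDirectory keeps the example in memory.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

class UpdateDeleteDemo {
    // Hypothetical helper: an exact-match "id" field plus an analyzed "content" field.
    static Document doc(String id, String content) {
        Document d = new Document();
        d.add(new StringField("id", id, Field.Store.YES));
        d.add(new TextField("content", content, Field.Store.YES));
        return d;
    }

    // Adds three documents, updates one, deletes one, and returns the live document count.
    static int run() throws IOException {
        try (Directory dir = new RAMDirectory();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            writer.addDocument(doc("1", "first version"));
            writer.addDocument(doc("2", "to be removed"));
            writer.addDocument(doc("3", "untouched"));
            // updateDocument deletes every document matching the term, then adds the new one
            writer.updateDocument(new Term("id", "1"), doc("1", "second version"));
            // deletions are done by IndexWriter (IndexReader is read-only since 4.0)
            writer.deleteDocuments(new Term("id", "2"));
            writer.commit();
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return reader.numDocs(); // counts only live (non-deleted) documents
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("live docs: " + run()); // live docs: 2
    }
}
```

Using an un-analyzed StringField for the id is what makes the Term("id", ...) lookups in updateDocument and deleteDocuments match exactly.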

Package: org.apache.lucene.search

This package provides the classes needed to search an existing index, such as IndexSearcher and Hits. IndexSearcher defines the methods for searching a given index, and Hits held the search results. (Hits no longer exists in Lucene 6.x: IndexSearcher.search() returns a TopDocs object whose scoreDocs array contains one ScoreDoc per result.)

Build an index

To index a document, Lucene provides five basic classes: Document, Field, IndexWriter, Analyzer, and Directory. Let's look at the purpose of each one.

Document

Document describes a document, which can be an HTML page, an e-mail message, or a text file. A Document object is made up of multiple Field objects. You can think of a Document object as a record in a database, with each Field object as one of the record's fields.

Field

A Field object describes one property of a document. For example, the title and the body of an e-mail message can each be described by its own Field object.
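A minimal sketch of turning an e-mail into a Document with Lucene 6.6.0's field classes, assuming lucene-core 6.6.0 is on the classpath. The field names (from, title, content) and the helper method are illustrative, not part of Lucene.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

class EmailDocumentDemo {
    // One Document per e-mail (a "record"); each Field is one "column" of that record.
    static Document emailToDocument(String from, String title, String content) {
        Document doc = new Document();
        // StringField: indexed as one un-analyzed token, suited to exact matches such as addresses or ids
        doc.add(new StringField("from", from, Field.Store.YES));
        // TextField: passed through the Analyzer and indexed as separate terms
        doc.add(new TextField("title", title, Field.Store.YES));
        // Store.NO: searchable, but the original text is not kept in the index
        doc.add(new TextField("content", content, Field.Store.NO));
        return doc;
    }

    public static void main(String[] args) {
        Document doc = emailToDocument("alice@example.com", "Lucene notes", "Index first, then search.");
        System.out.println(doc.get("title"));        // Lucene notes
        System.out.println(doc.getFields().size());  // 3
    }
}
```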

Analyzer

Before a document is indexed, its content must first be tokenized, and this is the job of the Analyzer. Analyzer is an abstract class with multiple implementations, so choose the Analyzer that suits your language and application. The tokens produced by the Analyzer are what IndexWriter actually indexes.
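A sketch that shows what an Analyzer actually hands to IndexWriter, assuming lucene-core and lucene-analyzers-common 6.6.0 are on the classpath (StandardAnalyzer lives in the latter). The helper tokens() and the sample sentence are hypothetical; the reset(), incrementToken(), end(), close() sequence is the standard TokenStream usage contract.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

class AnalyzerDemo {
    // Returns the terms the analyzer produces for `text` in the given field.
    static List<String> tokens(Analyzer analyzer, String field, String text) throws IOException {
        List<String> result = new ArrayList<>();
        try (TokenStream stream = analyzer.tokenStream(field, text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();                  // required before the first incrementToken()
            while (stream.incrementToken()) {
                result.add(term.toString()); // text of the current token
            }
            stream.end();                    // required after the last token
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer analyzer = new StandardAnalyzer()) {
            System.out.println(tokens(analyzer, "content", "Lucene is a Full-Text search library"));
        }
    }
}
```

With StandardAnalyzer the terms should come out lowercased, with "Full-Text" split into "full" and "text"; in 6.6.0 its default English stop-word list should also drop words such as "is" and "a".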

IndexWriter

IndexWriter is the core class Lucene uses to create an index; its job is to add Document objects to the index.

Directory

This class represents where a Lucene index is stored. It is an abstract class with two commonly used implementations: FSDirectory, which stores the index in the file system, and RAMDirectory, which stores the index in memory.
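Putting Document, Field, Analyzer, IndexWriter, and Directory together, here is a sketch of building an index, assuming the Lucene 6.6.0 jars (lucene-core, lucene-analyzers-common) are on the classpath. The path index-dir, the field names, and buildIndex() are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

class IndexingDemo {
    // Writes one Document per text into dir and returns how many documents the index holds.
    static int buildIndex(Directory dir, String... texts) throws IOException {
        try (Analyzer analyzer = new StandardAnalyzer();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer)
                     .setOpenMode(IndexWriterConfig.OpenMode.CREATE))) { // replace any existing index
            for (int i = 0; i < texts.length; i++) {
                Document doc = new Document();
                doc.add(new StringField("id", Integer.toString(i), Field.Store.YES));
                doc.add(new TextField("content", texts[i], Field.Store.YES));
                writer.addDocument(doc); // the Analyzer tokenizes the TextField here
            }
        } // closing the writer commits the pending changes (default behavior)
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return reader.numDocs();
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory index: fast, but gone when the JVM exits.
        try (Directory ram = new RAMDirectory()) {
            System.out.println(buildIndex(ram, "hello lucene", "hello search")); // 2
        }
        // On-disk index; "index-dir" is only an example path.
        try (Directory fs = FSDirectory.open(Paths.get("index-dir"))) {
            System.out.println(buildIndex(fs, "hello lucene")); // 1
        }
    }
}
```

RAMDirectory is handy for tests and small demos; for real data, FSDirectory.open() picks a file-system implementation suited to the platform.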

Search for documents

Searching with Lucene is just as easy as building an index. In the previous section we indexed the text documents in a directory; now we search that index to find the documents that contain a keyword or phrase. Lucene provides several basic classes for this: IndexSearcher, Term, Query, TermQuery, and Hits. The role of each is described below.

Query

This is an abstract class with multiple implementations, such as TermQuery, BooleanQuery, and PrefixQuery. Its purpose is to wrap the query string entered by the user into a query that Lucene can understand.
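A sketch of two other Query implementations, BooleanQuery (built with BooleanQuery.Builder, the 6.x style) and PrefixQuery, assuming the Lucene 6.6.0 jars are on the classpath. The sample texts, the content field, and the helper names are made up for illustration.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

class QueryTypesDemo {
    // Made-up sample texts, indexed in a "content" field.
    static Directory sampleIndex() throws IOException {
        Directory dir = new RAMDirectory();
        try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            for (String t : new String[] {"lucene search engine", "lucene index writer", "solr search server"}) {
                Document doc = new Document();
                doc.add(new TextField("content", t, Field.Store.YES));
                w.addDocument(doc);
            }
        }
        return dir;
    }

    // Number of documents in dir that match q.
    static int count(Directory dir, Query q) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return new IndexSearcher(reader).count(q);
        }
    }

    // content contains "lucene" AND "search" (both clauses are MUST).
    static Query luceneAndSearch() {
        return new BooleanQuery.Builder()
                .add(new TermQuery(new Term("content", "lucene")), BooleanClause.Occur.MUST)
                .add(new TermQuery(new Term("content", "search")), BooleanClause.Occur.MUST)
                .build();
    }

    // content contains any term starting with "se" (here: "search" and "server").
    static Query startsWithSe() {
        return new PrefixQuery(new Term("content", "se"));
    }

    public static void main(String[] args) throws IOException {
        try (Directory dir = sampleIndex()) {
            System.out.println(count(dir, luceneAndSearch())); // 1
            System.out.println(count(dir, startsWithSe()));    // 2
        }
    }
}
```

Changing both clauses from MUST to SHOULD would turn the AND into an OR.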

Term

Term is the basic unit of search. A Term object consists of two String fields. You can create one with the statement Term term = new Term("FieldName", "Queryword"); where the first parameter is the document Field to search and the second is the keyword to look for.

TermQuery

TermQuery is a subclass of the abstract class Query and is the most basic query type Lucene supports. You create a TermQuery object with the statement TermQuery termQuery = new TermQuery(new Term("FieldName", "Queryword")); its constructor takes a single parameter, a Term object.
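A sketch of Term and TermQuery in use, assuming the Lucene 6.6.0 jars are on the classpath. It also shows a pitfall worth knowing: the word in a TermQuery is not analyzed, so it must match the indexed term exactly, here the lowercased form StandardAnalyzer produced. The helper names and the sample text are hypothetical.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

class TermQueryDemo {
    // Indexes a single document whose "content" field is analyzed by StandardAnalyzer.
    static Directory indexOf(String text) throws IOException {
        Directory dir = new RAMDirectory();
        try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            doc.add(new TextField("content", text, Field.Store.YES));
            w.addDocument(doc);
        }
        return dir;
    }

    // Runs a TermQuery for (field, word) and returns the number of matching documents.
    static int hits(Directory dir, String field, String word) throws IOException {
        Term term = new Term(field, word);         // field to search + exact term to match
        TermQuery termQuery = new TermQuery(term); // the word itself is NOT analyzed
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return new IndexSearcher(reader).count(termQuery);
        }
    }

    public static void main(String[] args) throws IOException {
        try (Directory dir = indexOf("Lucene in Action")) {
            System.out.println(hits(dir, "content", "lucene")); // 1: matches the lowercased indexed term
            System.out.println(hits(dir, "content", "Lucene")); // 0: no analysis, so the case must match
        }
    }
}
```

When the query text comes straight from a user, running it through the same Analyzer first (which is what the classic QueryParser does) avoids this mismatch.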

IndexSearcher

IndexSearcher is used to search an existing index. It opens the index read-only, so multiple IndexSearcher instances can work on the same index at the same time.

Hits

Hits held the results of a search. (Since Lucene 3.0 there is no Hits class; in 6.x the results come back as a TopDocs object whose scoreDocs array holds ScoreDoc objects.)
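A sketch of reading search results the 6.x way, with TopDocs and ScoreDoc in place of Hits, assuming the Lucene 6.6.0 jars are on the classpath. The sample texts and helper names are hypothetical; the content field must be stored (Field.Store.YES) for doc.get() to return its text.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

class SearchResultsDemo {
    // Builds a small in-memory index from made-up sample texts.
    static Directory sampleIndex(String... texts) throws IOException {
        Directory dir = new RAMDirectory();
        try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            for (String t : texts) {
                Document doc = new Document();
                doc.add(new TextField("content", t, Field.Store.YES)); // stored, so it can be read back
                w.addDocument(doc);
            }
        }
        return dir;
    }

    // Prints score and text of the top n hits for `word` and returns their stored text, best first.
    static List<String> search(Directory dir, String word, int n) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader); // read-only view of the index
            TopDocs top = searcher.search(new TermQuery(new Term("content", word)), n);
            List<String> results = new ArrayList<>();
            for (ScoreDoc hit : top.scoreDocs) {      // takes the place of the old Hits object
                Document doc = searcher.doc(hit.doc); // hit.doc is Lucene's internal document number
                System.out.println(hit.score + "\t" + doc.get("content"));
                results.add(doc.get("content"));
            }
            return results;
        }
    }

    public static void main(String[] args) throws IOException {
        try (Directory dir = sampleIndex("lucene search engine", "lucene index writer", "solr search server")) {
            System.out.println(search(dir, "search", 10));
        }
    }
}
```

The second argument to search() caps how many top-scoring ScoreDoc entries are returned, so asking for 1 returns only the best match even when more documents contain the term.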
