Search Engine Series Two: Lucene (Introduction, Architecture, Integration)

Source: Internet
Author: User

I. Introduction to Lucene

1. About Lucene

Lucene is one of the most popular open source full-text search engine toolkits for Java. It provides a complete query engine and indexing engine, plus text analyzers (word breakers) for a limited set of languages, such as English and German. Lucene's goal is to give software developers an easy-to-use toolkit for adding full-text retrieval to a target system, or for building a complete full-text search engine on top of it. Lucene is an Apache Software Foundation project; its home page is http://lucene.apache.org/

2. What Lucene is used for

Lucene gives software developers an easy-to-use toolkit for adding full-text indexing to a target system, or for building a complete full-text search engine on top of it.

3. Lucene application scenarios

Providing full-text search over the database data in your application.

Developing standalone search engine services and systems.

4. Characteristics of Lucene

1. Stable, high-performance indexing

Can index more than 150GB of data per hour.

Low memory requirements: only 1MB of heap memory is required.

Incremental indexing is as fast as batch indexing.

The index size is roughly 20-30% of the size of the indexed text.

2. Efficient, accurate, high-performance search algorithms

Ranked searching: the best results are returned first.

Many powerful query types: phrase queries, wildcard queries, proximity queries, range queries, and so on.

Fielded searching (such as title, author, content).

Sorting by any field.

Searching multiple indexes with merged results.

Simultaneous updating and searching.

Highlighting, joins, and result grouping.

Fast searching.

Pluggable ranking models, including the built-in vector space model and Okapi BM25.

Configurable storage engine.
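To make the BM25 ranking model mentioned above more concrete, here is a minimal sketch of the textbook Okapi BM25 score for a single query term, in plain Java. It is not Lucene's implementation: the class name Bm25Sketch and its method signatures are hypothetical, and the values k1 = 1.2 and b = 0.75 are the commonly used defaults, assumed here for illustration.

```java
public class Bm25Sketch {
    // Inverse document frequency (non-negative variant):
    // idf = ln(1 + (N - n + 0.5) / (n + 0.5))
    // docCount = N (documents in the index), docFreq = n (documents containing the term)
    public static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // BM25 score of one term in one document.
    // tf: term frequency in the document; docLen / avgDocLen: length normalization.
    public static double score(double tf, double docLen, double avgDocLen,
                               long docCount, long docFreq, double k1, double b) {
        double norm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf(docCount, docFreq) * tf * (k1 + 1.0) / (tf + norm);
    }

    public static void main(String[] args) {
        double k1 = 1.2, b = 0.75; // assumed defaults
        // Same term frequency, but one document is shorter than average
        // and the other is longer than average.
        double shortDoc = score(2, 50, 100, 1000, 10, k1, b);
        double longDoc = score(2, 200, 100, 1000, 10, k1, b);
        System.out.println(shortDoc > longDoc); // prints true
    }
}
```

With equal term frequency, the shorter document scores higher, which is the length normalization that distinguishes BM25 from a raw term-frequency count.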

3. Cross-platform

Written in pure Java.

It is an open source project under the Apache License, so you can use it in both commercial and open source projects.

Ports of Lucene are available in other programming languages (e.g. C, C++, Python), not just Java.

II. Lucene architecture

1. Data collection

2. Create an index

3. Index Storage

4. Search (using index)
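The four stages above can be illustrated with a toy in-memory inverted index. This is a minimal sketch in plain Java, not Lucene's API or implementation: the class TinyInvertedIndex and its methods are hypothetical names chosen for illustration, and real analysis, on-disk index storage, and ranking are far more involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class TinyInvertedIndex {
    // Index storage: term -> sorted ids of the documents that contain it.
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    // Stored documents, addressed by document id.
    private final List<String> documents = new ArrayList<>();

    // Analysis (word breaking): lowercase, then split on non-alphanumerics.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Steps 2 and 3: index one collected document and keep it in memory.
    public int add(String text) {
        int docId = documents.size();
        documents.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Step 4: ids of the documents that contain every query term (AND search).
    public SortedSet<Integer> search(String query) {
        SortedSet<Integer> result = null;
        for (String term : tokenize(query)) {
            SortedSet<Integer> docs =
                postings.getOrDefault(term, Collections.<Integer>emptySortedSet());
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        // Step 1: data collection, hard-coded here for brevity.
        index.add("Lucene is a full-text search library");
        index.add("A database stores rows");
        index.add("Full-text search in a database");
        System.out.println(index.search("full text search")); // prints [0, 2]
        System.out.println(index.search("database search"));  // prints [2]
    }
}
```

The key design point is that searching looks terms up in the index instead of scanning the original documents, which is why the index is built and stored before any query runs.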

III. Lucene integration

1. Choosing a Lucene version

Use the latest version at the time of writing, 7.3.0: https://lucene.apache.org/

2. System Requirements

JDK 1.8 or later

3. Integration: add the Lucene core JARs to your application

Option one: download the zip, unzip it, and copy the JARs into your project.

Option two: add the dependency with Maven.

4. Lucene Module Description

core: Lucene core library (analysis, indexing, querying)

analyzers-*: analyzers (word breakers)

facet: faceted indexing and search capabilities

grouping: collectors for grouping search results

highlighter: highlights search keywords in results

join: index-time and query-time joins for normalized content

queries: filters and queries that add to core Lucene

queryparser: query parsers and parsing framework

spatial: geospatial search

suggest: auto-suggest and spellchecking

5. First, add the Lucene core module

<!-- Lucene core module -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>7.3.0</version>
</dependency>

6. Understanding the composition of the core module
