Lucene in Action: A First Look at Lucene

Source: Internet
Author: User
Tags: solr

1.3 Components of a Search Application

Lucene is a class library that provides the core modules of a search application: the indexing module and the search module.

Solr is built on Lucene and adds a richer UI and APIs; it can be deployed and used directly.

The figure shows the basic architecture of a search application. The dark part in the middle is the functionality Lucene provides, and it is also the core of a search engine.

Criteria for evaluating a search engine:

Basic functionality: search results are returned and displayed correctly

Search response time

Extended features: syntax correction, keyword highlighting, etc.

1.3.1 Indexing Components

How a search engine works:

Naive approach: sequential search, scanning all of the content for every query

Problem: too slow

Solution: index the text content in advance and answer queries from the index
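To make the cost of the naive approach concrete, here is a minimal sketch of sequential search in plain Python. The function name `sequential_search` and the sample documents are made up for illustration; nothing here is Lucene code.

```python
def sequential_search(docs, term):
    """Return ids of documents containing `term`, by scanning every document."""
    term = term.lower()
    hits = []
    for doc_id, text in docs.items():
        # Every query re-reads every document, so the cost of a single
        # search grows with the total amount of text.
        if term in text.lower().split():
            hits.append(doc_id)
    return hits


# Hypothetical sample data.
docs = {
    1: "Lucene is a search library",
    2: "Solr is built on Lucene",
    3: "Nutch is a web crawler",
}
print(sequential_search(docs, "lucene"))  # prints [1, 2]
```

With three short strings this is instant, but the loop touches all of the content on every query, which is the problem an index is meant to remove.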

1. Acquire content:

Web content: use a crawler

A specific file system directory or database content: easy to access

Scattered content (file systems, LAN content, etc.): difficult to obtain

Content under access control: more complex; the acquisition program needs root permissions and must fetch the permission lists so that search can enforce access control

Content acquisition should run incrementally, so that the index can be updated in near real time
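One simple way to acquire content incrementally from a file system is to remember each file's modification time and pick up only files that are new or have changed. The sketch below assumes that approach; `changed_files` and the `seen` dictionary are hypothetical names, and real crawlers usually also have to handle deletions and permissions.

```python
import os


def changed_files(root, seen):
    """Return paths under `root` that are new or modified since the last call.

    `seen` maps path -> last observed modification time (in nanoseconds)
    and is updated in place, so it can be reused across runs.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.stat(path).st_mtime_ns
            if seen.get(path) != mtime:
                seen[path] = mtime
                changed.append(path)
    return sorted(changed)
```

On the first call with an empty `seen`, every file is returned; on later calls only the files whose modification time differs are returned, and only those need to be re-indexed.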

Lucene does not provide content acquisition; that is left entirely to your own program or to third-party tools:

Solr: supports databases and XML, and integrates Tika

Nutch: a web crawler

Lily: a distributed search system built on Solr and Hadoop

2. Create a document:

Convert content of any format (a file, a database record, etc.) into the document class that Lucene understands: Document. A Document consists mainly of fields with values, such as title, body text, and author. You can define your own fields, and you can also use a semantic analyzer to extract information from the body text and write it to a separate new field.
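The idea of a document as a set of named fields can be sketched in a few lines of Python. This is only an illustration of the concept, not Lucene's Java API: the `Document` class below, its `add` and `get` methods, and `from_db_row` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A bag of named fields, loosely modeled on the idea of a search document."""
    fields: dict = field(default_factory=dict)

    def add(self, name, value):
        # A field name may hold several values (for example, several authors).
        self.fields.setdefault(name, []).append(value)

    def get(self, name):
        return self.fields.get(name, [])


def from_db_row(row):
    """Map a database record (here a plain dict) onto document fields."""
    doc = Document()
    doc.add("title", row["title"])
    doc.add("author", row["author"])
    doc.add("content", row["body"])
    return doc


# Hypothetical record, as it might come back from a database query.
row = {
    "title": "A first look at Lucene",
    "author": "User",
    "body": "Lucene provides indexing and search.",
}
doc = from_db_row(row)
print(doc.get("title"))  # prints ['A first look at Lucene']
```

A file, a web page, or any other source would get its own small conversion function that fills the same kind of fields.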

3. Analyze the document:

The field values of the document are analyzed to produce the terms that will be indexed.

This is mainly the job of tokenizers and filters, which perform operations such as word segmentation, case normalization, and stemming.
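The tokenizer-plus-filters pipeline can be sketched as a chain of small functions. The sketch below is a toy written for illustration: the function names are made up, and `crude_stem` strips only a few English suffixes, so it is a stand-in for a real stemming algorithm rather than an implementation of one.

```python
import re


def tokenize(text):
    """Split text into word tokens made of ASCII letters and digits."""
    return re.findall(r"[A-Za-z0-9]+", text)


def lowercase(tokens):
    """Filter: normalize case so 'Lucene' and 'lucene' become the same term."""
    return [t.lower() for t in tokens]


def crude_stem(tokens):
    """Filter: strip a few common suffixes (toy stand-in for a real stemmer)."""
    out = []
    for t in tokens:
        for suffix in ("ing", "es", "s"):
            # Keep at least three characters so short words are left alone.
            if t.endswith(suffix) and len(t) - len(suffix) >= 3:
                t = t[: -len(suffix)]
                break
        out.append(t)
    return out


def analyze(text):
    """Run the tokenizer, then each filter in order."""
    return crude_stem(lowercase(tokenize(text)))


print(analyze("Indexing Documents: Lucene indexes searches"))
# prints ['index', 'document', 'lucene', 'index', 'search']
```

The same `analyze` function has to be applied to queries as well, otherwise a query for "Indexing" would never match the stored term "index".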

4. Index the document: the results of the analysis are indexed and added to the index

Lucene stores the index as an inverted index
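An inverted index maps each term to the documents that contain it, so a query looks up terms instead of scanning text. Below is a minimal in-memory sketch; the names `build_inverted_index` and `search` and the sample data are hypothetical, and terms are produced by a plain lowercase-and-split step rather than a full analysis chain.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return ids of documents that contain every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample data.
docs = {
    1: "Lucene is a search library",
    2: "Solr is built on Lucene",
    3: "Nutch is a web crawler",
}
index = build_inverted_index(docs)
print(index["lucene"])       # prints [1, 2]
print(search(index, "is a"))  # prints [1, 3]
```

The index is built once, when documents are added; each query then costs a few dictionary lookups and set intersections, independent of how much text the documents contain.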

  

  
