
Lucene: basic concepts and the index-building process

Last Update:2015-03-22 Source: Internet

Author: User


Before document content can be searched, documents from a variety of sources (web pages, databases, e-mail, and so on) must be indexed. Indexing means extracting the content, normalizing it (by modeling the content in a uniform structure), and storing it.

Indexing involves a few basic concepts. Here they are, as I understand them:

Document:

A document is the basic unit of indexing and searching, similar to a record in a relational database table. For example, if we index and search web content, every page crawled from the Internet is eventually analyzed: its meaningful parts (such as the page title, URL, keywords, and release date) are extracted and stored as a document. At search time, the query is matched against this content; once a matching document is found, the required content is read from that document and returned.

Field:

A field is the part of a document that is actually matched against during a search. A document is made up of one or more fields. This is analogous to a relational table, where a record consists of multiple fields, each with a type and a corresponding value; a Lucene document is likewise made up of fields, each with a name, a type, and a value. For the field options in Lucene, refer to my previous article: Field options in Lucene
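To make the document/field relationship concrete, here is a minimal sketch in plain Python. It is a toy model, not Lucene's API: the `Field` and `Document` classes and the `build_page_document` helper are hypothetical names invented for illustration, and the field names (url, title, keywords, published) simply mirror the web page example above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, List


@dataclass
class Field:
    """A named value inside a document (toy model, not Lucene's Field class)."""
    name: str
    value: Any

    @property
    def type_name(self) -> str:
        # The "type" of the field, derived here from the Python value.
        return type(self.value).__name__


@dataclass
class Document:
    """A document is simply a collection of fields."""
    fields: List[Field] = field(default_factory=list)

    def add(self, new_field: Field) -> None:
        self.fields.append(new_field)

    def get(self, name: str) -> Any:
        # Return the value of the first field with this name, or None.
        for f in self.fields:
            if f.name == name:
                return f.value
        return None


def build_page_document(url, title, keywords, published) -> Document:
    """Hypothetical helper: model one crawled web page as a document."""
    doc = Document()
    doc.add(Field("url", url))
    doc.add(Field("title", title))
    doc.add(Field("keywords", keywords))
    doc.add(Field("published", published))
    return doc


page = build_page_document(
    "http://example.com/", "Example page", ["lucene", "index"], date(2015, 3, 22)
)
print([(f.name, f.type_name) for f in page.fields])
```

Each field carries a name, a type, and a value, which is the same shape the paragraph above describes for a record in a relational table.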

Analyzer / Term:

The analyzer is used for both indexing and searching. It breaks the original document (or the user's input) into individual words, called terms. A Lucene index is an inverted index: it stores the mapping from each term to the documents that contain it. At index time, the analyzer converts the original document into terms, and the relationships between terms and documents are stored in the index. At search time, the analyzer converts the user's input into terms, which are then looked up in the index to find the matching documents.
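The idea of an analyzer feeding an inverted index can be sketched in a few lines of plain Python. This is a simplified illustration, not how Lucene implements it: the `analyze` function (lowercase, split on non-alphanumeric characters) and the `InvertedIndex` class are assumptions made for this example, and real analyzers do considerably more.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each term to the ids of the documents that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Index time: analyze the document and record term -> document.
        for term in analyze(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        # Search time: analyze the query with the same analyzer, then
        # return the documents that contain every query term.
        terms = analyze(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add(1, "Lucene builds an inverted index")
index.add(2, "An index maps terms to documents")
print(index.search("Inverted INDEX"))  # only document 1 contains both terms
```

Note that the same `analyze` function is applied to documents and to queries. If the two sides were analyzed differently, the query terms would not line up with the terms stored in the index.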

Lucene's indexing process is divided into three main steps:

1. Convert documents from their various source formats into text

2. Analyze the text with an analyzer

3. Save the analyzed text to the index
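The three steps above can be sketched end to end in plain Python. This is only an illustration under stated assumptions, not Lucene code: the converters `html_to_text` and `row_to_text`, the toy `analyze` function, and the `(doc_id, kind, raw)` input format are all hypothetical choices made for this example.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text between tags (does not skip script or style content)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    # Step 1 for web pages: drop the markup, keep the text.
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def row_to_text(row):
    # Step 1 for database rows: join the column values into one string.
    return " ".join(str(value) for value in row.values())


def analyze(text):
    # Step 2: toy analyzer, lowercase alphanumeric terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(sources):
    """Build a term -> set of document ids mapping from (doc_id, kind, raw) tuples."""
    converters = {"html": html_to_text, "row": row_to_text}
    index = {}
    for doc_id, kind, raw in sources:
        text = converters[kind](raw)   # step 1: convert the source to text
        terms = analyze(text)          # step 2: analyze the text into terms
        for term in terms:             # step 3: save the terms to the index
            index.setdefault(term, set()).add(doc_id)
    return index


sources = [
    ("page1", "html", "<html><body><h1>Lucene Index</h1><p>Fast search</p></body></html>"),
    ("row1", "row", {"title": "Search basics", "body": "Index first"}),
]
index = build_index(sources)
print(sorted(index["index"]))
```

Keeping the conversion step separate from analysis means a new source type only needs a new converter; the analysis and storage steps stay the same.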

The following diagram, which I found on the Internet, illustrates the Lucene indexing process (and the search process) well:

[Figure: Lucene indexing and search process]
