[Tutorial 4 of Lucene 4.8] Analysis

Source: Internet
Author: User


1. Basic Content

(1) Related Concepts

Analysis is the process of converting field text into its most basic indexed representation unit, the term. During a search, these terms are used to determine which documents match the query.

An Analyzer encapsulates the analysis operations. It converts text into tokens by performing several operations on it. This process is called tokenization, and the chunks of text extracted from the input are called tokens. A token combined with its field name forms a term.
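The relationship between tokens and terms can be sketched in plain Java. This is only an illustration, not Lucene code: the class TermSketch, its methods, and the "field:token" notation are hypothetical, and the whitespace split stands in for whatever tokenization a real analyzer performs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Lucene.
class TermSketch {

    // Splits text into tokens on runs of whitespace (a deliberately naive tokenizer).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Pairs each token with its field name, written here as "field:token".
    static List<String> toTerms(String field, String text) {
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            terms.add(field + ":" + token);
        }
        return terms;
    }

    public static void main(String[] args) {
        // Prints [contents:The, contents:quick, contents:brown, contents:fox]
        System.out.println(toTerms("contents", "The quick brown fox"));
    }
}
```

The same token in two different fields yields two different terms, which is why a search always targets a specific field.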

(2) When to use Analyzer

  • During index creation:
      Directory returnIndexDir = FSDirectory.open(indexDir);
      IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_48, new StandardAnalyzer(Version.LUCENE_48));
      IndexWriter writer = new IndexWriter(returnIndexDir, iwc);
  • When a QueryParser object is used for a search:
      QueryParser parser = new QueryParser(Version.LUCENE_48, "contents", new SimpleAnalyzer(Version.LUCENE_48));
  • When search results are highlighted

(3) Four commonly used analyzers:
  • WhitespaceAnalyzer, as the name implies, simply splits text into tokens on whitespace characters and makes no other effort to normalize the tokens.
  • SimpleAnalyzer first splits tokens at non-letter characters, then lowercases each token. Be careful! This analyzer quietly discards numeric characters.
  • StopAnalyzer is the same as SimpleAnalyzer, except it removes common words (called stop words, described more in section XXX). By default it removes common words in the English language (the, a, etc.), though you can pass in your own set.
  • StandardAnalyzer is Lucene's most sophisticated core analyzer. It has quite a bit of logic to identify certain kinds of tokens, such as company names,
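
The differences between the first three analyzers can be approximated in plain Java. This is a rough sketch of the behavior described above, not Lucene's implementation: the class AnalyzerSketch and its methods are hypothetical, the stop-word set is an illustrative assumption rather than Lucene's default list, and StandardAnalyzer is left out because its rules are too involved to imitate in a few lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical approximation of the described behavior; not Lucene code.
class AnalyzerSketch {

    // Illustrative stop words only; NOT Lucene's actual default set.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "and", "of"));

    // Returns the non-empty pieces produced by splitting on the given regex.
    static List<String> split(String text, String regex) {
        List<String> pieces = new ArrayList<>();
        for (String piece : text.split(regex)) {
            if (!piece.isEmpty()) {
                pieces.add(piece);
            }
        }
        return pieces;
    }

    // Like WhitespaceAnalyzer: split on whitespace, no normalization.
    static List<String> whitespace(String text) {
        return split(text, "\\s+");
    }

    // Like SimpleAnalyzer: split at non-letter characters, then lowercase.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : split(text, "[^\\p{L}]+")) {
            tokens.add(token.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // Like StopAnalyzer: the simple() tokens minus the stop words.
    static List<String> stop(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : simple(text)) {
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The XY&Z Corporation - room 42";
        System.out.println(whitespace(text)); // [The, XY&Z, Corporation, -, room, 42]
        System.out.println(simple(text));     // [the, xy, z, corporation, room]
        System.out.println(stop(text));       // [xy, z, corporation, room]
    }
}
```

Note how "42" disappears in the second and third outputs, matching the warning that SimpleAnalyzer discards numeric characters.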

IV. Other content

When creating an IndexWriter, you must specify the analyzer, for example:
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_48, new StandardAnalyzer(Version.LUCENE_48));
    writer = new IndexWriter(returnIndexDir, iwc);
You can also specify an analyzer for an individual document each time you add one to the writer:
    writer.addDocument(doc, new SimpleAnalyzer(Version.LUCENE_48));


