[Lucene 4.8 Tutorial, Part 4] Analysis



1. Basics

(1) Key concepts

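Analysis, in Lucene, is the process of converting the text of a field (Field) into its most basic indexed representation, the term (Term). During a search, these terms determine which documents match the query conditions.

An analyzer encapsulates the analysis operation. It performs a series of operations that turn text into tokens, a process also called tokenization; the chunks of text extracted from the text stream are called tokens. A token combined with its field name forms a term.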
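
To make the token/term relationship concrete, here is a minimal plain-Java sketch. It deliberately does not use Lucene's classes: `TermSketch`, `tokenize`, and `toTerms` are hypothetical names, and splitting on whitespace only stands in for whatever a real analyzer would do.

```java
import java.util.ArrayList;
import java.util.List;

public class TermSketch {
    // Tokenization: break the field text into tokens (naively, on whitespace).
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // A term is a token paired with the name of the field it came from.
    public static List<String> toTerms(String fieldName, String text) {
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            terms.add(fieldName + ":" + token);
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [contents:quick, contents:brown, contents:fox]
        System.out.println(toTerms("contents", "quick brown fox"));
    }
}
```

The same token text in two different fields therefore yields two different terms, which is why a query has to name the field it searches.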

(2) When analyzers are used

  • During indexing
// The analyzer given to IndexWriterConfig is applied to documents as they are indexed
Directory returnIndexDir = FSDirectory.open(indexDir);
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_48,
        new StandardAnalyzer(Version.LUCENE_48));
IndexWriter writer = new IndexWriter(returnIndexDir, iwc);
  • When searching with a QueryParser object
QueryParser parser = new QueryParser(Version.LUCENE_48, "contents",
        new SimpleAnalyzer(Version.LUCENE_48));
  • When highlighting results in a search
(3) Four commonly used analyzers:
  • WhitespaceAnalyzer, as the name implies, simply splits text into tokens on whitespace characters and makes no other effort to normalize the tokens.
  • SimpleAnalyzer first splits tokens at non-letter characters, then lowercases each token. Be careful! This analyzer quietly discards numeric characters.
  • StopAnalyzer is the same as SimpleAnalyzer, except it removes common words (called stop words, described more in section XXX). By default it removes common words in the English language (the, a, etc.), though you can pass in your own set.
  • StandardAnalyzer is Lucene’s most sophisticated core analyzer. It has quite a bit of logic to identify certain kinds of tokens, such as company names,
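
The differences between the first three analyzers can be sketched in plain Java. This is only an approximation of the behaviour described in the list, not Lucene's implementation: `AnalyzerSketch` and its methods are hypothetical names, the stop-word set is a small illustrative one rather than Lucene's default list, and StandardAnalyzer is left out because its rules are too involved for a short example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzerSketch {
    // Illustrative stop-word set only; an assumption, not Lucene's default list.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "and", "of"));

    // Split text on a regex and drop empty pieces.
    private static List<String> split(String text, String regex) {
        List<String> out = new ArrayList<>();
        for (String piece : text.split(regex)) {
            if (!piece.isEmpty()) {
                out.add(piece);
            }
        }
        return out;
    }

    // WhitespaceAnalyzer-like: split on whitespace, no normalization.
    public static List<String> whitespace(String text) {
        return split(text, "\\s+");
    }

    // SimpleAnalyzer-like: split at non-letters, then lowercase (digits vanish).
    public static List<String> simple(String text) {
        List<String> out = new ArrayList<>();
        for (String token : split(text, "[^\\p{L}]+")) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // StopAnalyzer-like: SimpleAnalyzer behaviour plus stop-word removal.
    public static List<String> stop(String text) {
        List<String> out = new ArrayList<>();
        for (String token : simple(text)) {
            if (!STOP_WORDS.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The Quick-Brown fox, v2 jumped";
        System.out.println(whitespace(text)); // [The, Quick-Brown, fox,, v2, jumped]
        System.out.println(simple(text));     // [the, quick, brown, fox, v, jumped]
        System.out.println(stop(text));       // [quick, brown, fox, v, jumped]
    }
}
```

Note how "v2" survives the whitespace split intact but is reduced to "v" by the letter-based split, which is the "quietly discards numeric characters" behaviour warned about above.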

4. Other topics

When creating an IndexWriter, you must specify an analyzer, for example:
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_48,
        new StandardAnalyzer(Version.LUCENE_48));
writer = new IndexWriter(returnIndexDir, iwc);
However, each time you add a document to the writer, you can specify an analyzer for that particular document, for example:
writer.addDocument(doc, new SimpleAnalyzer(Version.LUCENE_48));


