Different analyzers produce different keywords and take different amounts of time.
When Chinese word segmentation is needed, IKAnalyzer is a good choice, but it is comparatively slow: on my machine segmentation took 800+ ms.
Analyzer workflow:
Input text (What's your name?)
→ Tokenization (What's ; your ; name), different analyzers split differently
→ Stop-word removal ()
→ Morphological reduction (What's -> What)
→ Lowercasing (What -> what)
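The four stages above can be sketched in plain Java without Lucene, just to make the flow concrete. The stop-word list and the "'s"-stripping rule below are simplified stand-ins for what a real analyzer does, not any actual analyzer's behavior:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PipelineSketch {
    // toy stop-word list; real analyzers ship much larger ones
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("your", "the", "a", "an", "is"));

    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // 1. tokenization: split on whitespace and punctuation
        for (String token : text.split("[\\s?!.,;]+")) {
            if (token.isEmpty()) continue;
            // 2. stop-word removal
            if (STOP_WORDS.contains(token.toLowerCase())) continue;
            // 3. crude morphological reduction: strip a trailing "'s"
            if (token.endsWith("'s")) {
                token = token.substring(0, token.length() - 2);
            }
            // 4. lowercasing
            terms.add(token.toLowerCase());
        }
        return terms;
    }

    public static void main(String[] args) {
        // "What's your name?" -> [what, name]
        System.out.println(analyze("What's your name?"));
    }
}
```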
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.wltea.analyzer.lucene.IKAnalyzer;

private long stime;
private long etime;
private Analyzer analyzer;

@Before
public void s() {
    stime = System.currentTimeMillis();
}

@After
public void e() {
    etime = System.currentTimeMillis();
    System.out.println("Tokenizing with " + analyzer.getClass().getName()
            + " took " + (etime - stime) + "ms");
}

@Test
public void test() throws Exception {
    // analyzer = new SimpleAnalyzer(Version.LUCENE_35);
    // analyzer = new StandardAnalyzer(Version.LUCENE_35);
    analyzer = new IKAnalyzer();
    analyze(analyzer, "hTTp://www.baidu.com/s?wd=Lucene中文分詞");
}

private void analyze(Analyzer analyzer, String text) throws Exception {
    TokenStream tokens = analyzer.reusableTokenStream("content", new StringReader(text));
    OffsetAttribute offsetAttr = tokens.getAttribute(OffsetAttribute.class);
    CharTermAttribute charTermAttr = tokens.getAttribute(CharTermAttribute.class);
    while (tokens.incrementToken()) {
        // use the term's own length, not the offset span: after stemming or
        // other rewriting the term text can differ from the original span
        String term = new String(charTermAttr.buffer(), 0, charTermAttr.length());
        System.out.println(term + ", " + offsetAttr.startOffset() + ", " + offsetAttr.endOffset());
    }
    tokens.close();
    // Pre-3.x style, now deprecated:
    // while (ts.incrementToken()) {
    //     TermAttribute ta = ts.getAttribute(TermAttribute.class);
    //     System.out.println(ta.term());
    // }
}