Lucene: extracting the terms produced by an analyzer


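Different analyzers produce different terms from the same text, and they take different amounts of time to do it.

When Chinese word segmentation is needed, IKAnalyzer is a good choice, but it is comparatively slow: on my machine a tokenization run takes roughly 800+ ms.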

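Analyzer workflow:

Input text (What's your name?)
    → Tokenization (What's ; your ; name); different analyzers split the text differently
        → Stop-word removal ()
            → Stemming / normalization (What's -> What)
                → Lowercasing (What -> what)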
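The stages above can be sketched in plain Java. This is only a toy illustration, not how Lucene or IKAnalyzer implement analysis: the class name `ToyAnalyzer`, the stop-word list, and the rule that strips a trailing "'s" are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ToyAnalyzer {

    // Hypothetical stop-word list, chosen only for this illustration.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is"));

    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // 1. Tokenization: split on whitespace, strip punctuation at both ends.
        for (String raw : text.trim().split("\\s+")) {
            String token = raw.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (token.isEmpty()) {
                continue;
            }
            // 2. Stop-word removal (compared case-insensitively).
            if (STOP_WORDS.contains(token.toLowerCase(Locale.ROOT))) {
                continue;
            }
            // 3. Crude normalization: drop a trailing "'s" (What's -> What).
            if (token.endsWith("'s")) {
                token = token.substring(0, token.length() - 2);
            }
            // 4. Lowercasing (What -> what).
            terms.add(token.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("What's your name?")); // prints [what, your, name]
        System.out.println(analyze("The name is Lucene")); // prints [name, lucene]
    }
}
```

With this stop-word list nothing is dropped from "What's your name?", while "The" and "is" are dropped from the second input. A real analyzer chains the same kinds of steps as a tokenizer followed by token filters.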

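import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class AnalyzerTest {

    private long stime;
    private long etime;
    private Analyzer analyzer;

    // Record the start time before each test.
    @Before
    public void s() {
        stime = System.currentTimeMillis();
    }

    // Report which analyzer was used and how long the test took.
    @After
    public void e() {
        etime = System.currentTimeMillis();
        System.out.println("Tokenized with " + analyzer.getClass().getName()
                + ", took " + (etime - stime) + "ms");
    }

    @Test
    public void test() throws Exception {
        // Swap in another analyzer to compare output and timing.
        // analyzer = new SimpleAnalyzer(Version.LUCENE_35);
        // analyzer = new StandardAnalyzer(Version.LUCENE_35);
        analyzer = new IKAnalyzer();
        analyze(analyzer, "hTTp://www.baidu.com/s?wd=Lucene中文分詞");
    }

    // Print every term with its start and end offsets in the input text.
    private void analyze(Analyzer analyzer, String text) throws Exception {
        TokenStream tokens = analyzer.reusableTokenStream("content", new StringReader(text));
        OffsetAttribute offsetAttr = tokens.addAttribute(OffsetAttribute.class);
        CharTermAttribute charTermAttr = tokens.addAttribute(CharTermAttribute.class);
        tokens.reset();
        while (tokens.incrementToken()) {
            // Use the term's own length: filters can change the term text,
            // so it need not equal endOffset - startOffset.
            String term = new String(charTermAttr.buffer(), 0, charTermAttr.length());
            System.out.println(term + ", " + offsetAttr.startOffset() + ", " + offsetAttr.endOffset());
        }
        tokens.end();
        tokens.close();
        // The older TermAttribute / ta.term() approach is deprecated.
    }
}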
