Using Stanford CoreNLP in Eclipse


Source code: the official CoreNLP website.

The current release, CoreNLP 3.5.0, supports only Java 1.8 and later, so you may first need to add a JDK 1.8 configuration to Eclipse. To configure it:

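  • First, download Java 1.8 from the Oracle website (the Java download page) and install it.
  • Open Eclipse and choose Window -> Preferences -> Java -> Installed JREs.
  • Click "Add" on the right side of the dialog, select "Standard VM" (presumably short for standard virtual machine), then click "Next".
  • On the "JRE home" line, click "Directory…" on the right and browse to your Java installation path, for example "C:\Program Files\Java\jdk1.8".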

Eclipse now supports JDK 1.8.
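To confirm that a project really runs on the newly configured JRE, a small program can print the runtime version. The following is a minimal sketch using only the standard library; the class name JavaVersionCheck and the helper isAtLeastJava8 are made up for this illustration.

```java
// Minimal sketch: report which Java runtime is executing this program.
// JavaVersionCheck and isAtLeastJava8 are hypothetical names for illustration.
class JavaVersionCheck {

    /** Returns true if a "java.specification.version" value denotes Java 8 or later. */
    static boolean isAtLeastJava8(String specVersion) {
        // Java 8 and earlier report "1.x"; Java 9 and later report "9", "10", "11", ...
        String[] parts = specVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        if (major == 1 && parts.length > 1) {
            major = Integer.parseInt(parts[1]);
        }
        return major >= 8;
    }

    public static void main(String[] args) {
        String specVersion = System.getProperty("java.specification.version");
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("Java 8 or later: " + isAtLeastJava8(specVersion));
    }
}
```

If this reports a version older than 1.8 when run from Eclipse, the project is probably still bound to a different entry in Installed JREs.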

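1. Create a Java project, making sure to select 1.8 as the compiler version.

2. Extract the package downloaded from the official website into the project and import the required JARs.

For example, import stanford-corenlp-3.5.0.jar, stanford-corenlp-3.5.0-javadoc.jar, stanford-corenlp-3.5.0-models.jar, stanford-corenlp-3.5.0-sources.jar, and xom.jar. (The -javadoc and -sources JARs are only needed for documentation and source lookup, not for running the code.)

To import the JARs, right-click the project -> Properties -> Java Build Path -> Libraries, click "Add JARs", and select the relevant JAR files.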
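One way to check that the build path is set up is to ask the class loader whether a class from each JAR can be found. The sketch below uses only the standard library; ClasspathCheck and isOnClasspath are hypothetical names, and the two class names probed are edu.stanford.nlp.pipeline.StanfordCoreNLP (used in the code of step 3) and nu.xom.Document (assumed here to be the class that xom.jar provides).

```java
// Minimal sketch: check whether classes from the imported JARs are visible.
// ClasspathCheck and isOnClasspath are hypothetical names for illustration.
class ClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "edu.stanford.nlp.pipeline.StanfordCoreNLP", // expected in stanford-corenlp-3.5.0.jar
            "nu.xom.Document"                            // assumed to come from xom.jar
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

The models JAR contains data files rather than classes, so this check does not cover it; a missing models JAR usually shows up only when the pipeline tries to load a model.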

3. Create the class TestCoreNLP with the following code:

    package Test;

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import edu.stanford.nlp.dcoref.CorefChain;
    import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.semgraph.SemanticGraph;
    import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
    import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
    import edu.stanford.nlp.trees.Tree;
    import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
    import edu.stanford.nlp.util.CoreMap;

    public class TestCoreNLP {
        public static void main(String[] args) {
            // creates a StanfordCoreNLP object, with POS tagging, lemmatization, NER, parsing, and coreference resolution
            Properties props = new Properties();
            props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

            // read some text in the text variable
            String text = "Add your text here:Beijing sings Lenovo";

            // create an empty Annotation just with the given text
            Annotation document = new Annotation(text);

            // run all Annotators on this text
            pipeline.annotate(document);

            // these are all the sentences in this document
            // a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
            List<CoreMap> sentences = document.get(SentencesAnnotation.class);

            System.out.println("word\tpos\tlemma\tner");
            for (CoreMap sentence : sentences) {
                // traversing the words in the current sentence
                // a CoreLabel is a CoreMap with additional token-specific methods
                for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
                    // this is the text of the token
                    String word = token.get(TextAnnotation.class);
                    // this is the POS tag of the token
                    String pos = token.get(PartOfSpeechAnnotation.class);
                    // this is the NER label of the token
                    String ne = token.get(NamedEntityTagAnnotation.class);
                    // this is the lemma of the token
                    String lemma = token.get(LemmaAnnotation.class);

                    System.out.println(word + "\t" + pos + "\t" + lemma + "\t" + ne);
                }
                // this is the parse tree of the current sentence
                Tree tree = sentence.get(TreeAnnotation.class);

                // this is the Stanford dependency graph of the current sentence
                SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
            }
            // This is the coreference link graph
            // Each chain stores a set of mentions that link to each other,
            // along with a method for getting the most representative mention
            // Both sentence and token offsets start at 1!
            Map<Integer, CorefChain> graph = document.get(CorefChainAnnotation.class);
        }
    }

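PS: The idea of this code is to hand the text string to Stanford CoreNLP, whose components (annotators) process it in the following order: tokenize (tokenization), ssplit (sentence splitting), pos (part-of-speech tagging), lemma (lemmatization), ner (named entity recognition), parse (syntactic parsing), dcoref (coreference resolution).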
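The order matters because later annotators rely on the output of earlier ones. The sketch below checks an "annotators" string against a prerequisite table before it is handed to the pipeline. It uses only the standard library. The class AnnotatorOrderCheck is hypothetical, and the prerequisite table is an assumption based on my reading of the CoreNLP annotator documentation, so verify it against the version you use.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Minimal sketch: verify that each annotator is listed after its prerequisites.
// AnnotatorOrderCheck is a hypothetical helper, not part of CoreNLP.
class AnnotatorOrderCheck {

    // ASSUMPTION: prerequisite table as I understand it from the CoreNLP docs.
    static final Map<String, List<String>> REQUIRES = new HashMap<>();
    static {
        REQUIRES.put("tokenize", Collections.<String>emptyList());
        REQUIRES.put("ssplit", Arrays.asList("tokenize"));
        REQUIRES.put("pos", Arrays.asList("tokenize", "ssplit"));
        REQUIRES.put("lemma", Arrays.asList("tokenize", "ssplit", "pos"));
        REQUIRES.put("ner", Arrays.asList("tokenize", "ssplit", "pos", "lemma"));
        REQUIRES.put("parse", Arrays.asList("tokenize", "ssplit"));
        REQUIRES.put("dcoref", Arrays.asList("tokenize", "ssplit", "pos", "lemma", "ner", "parse"));
    }

    /** Returns true if every listed annotator is known and appears after all its prerequisites. */
    static boolean isValidOrder(String annotators) {
        Set<String> seen = new HashSet<>();
        for (String raw : annotators.split(",")) {
            String name = raw.trim();
            List<String> required = REQUIRES.get(name);
            if (required == null || !seen.containsAll(required)) {
                return false;
            }
            seen.add(name);
        }
        return true;
    }

    public static void main(String[] args) {
        String fromArticle = "tokenize, ssplit, pos, lemma, ner, parse, dcoref";
        System.out.println(fromArticle + " -> " + isValidOrder(fromArticle));
        String shuffled = "pos, tokenize, ssplit";
        System.out.println(shuffled + " -> " + isValidOrder(shuffled));
    }
}
```

A check like this only catches ordering mistakes early; the pipeline itself remains the authority on which combinations it accepts.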

After processing, List<CoreMap> sentences = document.get(SentencesAnnotation.class); holds the analysis results for every sentence; iterate over the list to read them.
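The calls of the form get(SomeAnnotation.class) work because, as the code comment puts it, a CoreMap is essentially a map that uses class objects as keys and has values with custom types. The toy sketch below illustrates that idea with the standard library only. It is not CoreNLP's implementation, and every name in it (TypedMapSketch, Key, WordKey, IndexKey, TypedMap) is invented for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of a map keyed by class objects, where each key class
// fixes the type of its value. All names here are hypothetical.
class TypedMapSketch {

    /** Marker interface: a key class declares the type V of the value it maps to. */
    interface Key<V> {}

    static final class WordKey implements Key<String> {}
    static final class IndexKey implements Key<Integer> {}

    static final class TypedMap {
        private final Map<Class<?>, Object> values = new HashMap<>();

        /** Stores a value; the compiler checks it against the key's declared type. */
        <V> void set(Class<? extends Key<V>> key, V value) {
            values.put(key, value);
        }

        /** Returns the value for the key, already typed, or null if absent. */
        @SuppressWarnings("unchecked")
        <V> V get(Class<? extends Key<V>> key) {
            return (V) values.get(key);
        }
    }

    public static void main(String[] args) {
        TypedMap token = new TypedMap();
        token.set(WordKey.class, "Beijing");
        token.set(IndexKey.class, Integer.valueOf(1));

        // No casts needed at the call site: the key class determines the result type.
        String word = token.get(WordKey.class);
        Integer index = token.get(IndexKey.class);
        System.out.println(word + "\t" + index);
    }
}
```

This is why token.get(TextAnnotation.class) can be assigned straight to a String in the listing above without a cast.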

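Here the code simply prints each word with its part of speech, lemma, and named-entity label. For other uses (such as sentiment, parse, and relation), see the official website.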

4. Output:

[Screenshot of the program output; image not preserved]
