Three Ways to Compile and Run the Stanford CoreNLP Open-Source Project



1. Introduction to Stanford CoreNLP

"Stanford CoreNLP, integrating our NER, POS tagger, and parser with a new coreference system"

That is how the official website describes it. Stanford CoreNLP is an open-source project from the Stanford NLP Group that combines several natural language processing components: it integrates Stanford's NER, POS tagger, and parser with a new coreference system into a complete natural language processing platform.

If you need to do natural language processing, it is worth considering.

Official website: http://nlp.stanford.edu/software/corenlp.shtml

Download: http://nlp.stanford.edu/software/stanford-corenlp-v1.1.0.tgz

2. Simple Usage of Stanford CoreNLP

For the average Java developer who wants to use Stanford CoreNLP, it is usually enough to download the Stanford CoreNLP jar packages from the official website and put them on the project classpath.

1. Download and decompress the package.


 

2. Create a project named SimpleNLP in Eclipse.

3. Create a lib folder in the project. From the decompressed Stanford CoreNLP folder, copy fastutil.jar, jgraph.jar, jgrapht.jar, stanford-corenlp-2011-06-19.jar, stanford-corenlp-models-2011-06-19.jar, and xom.jar into lib, then add them to the build path (Add to Build Path...).

 

4. Test: write TestCoreNLP.java as follows:

 

import java.util.List;
import java.util.Map;
import java.util.Properties;

import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.ling.CorefCoreAnnotations.CorefChainAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.semgraph.SemanticGraph;
import edu.stanford.nlp.trees.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
import edu.stanford.nlp.util.CoreMap;

public class TestCoreNLP {

    public static void main(String[] args) {
        // creates a StanfordCoreNLP object, with POS tagging, lemmatization,
        // NER, parsing, and coreference resolution
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // read some text in the text variable
        String text = "Add your text here";

        // create an empty Annotation just with the given text
        Annotation document = new Annotation(text);

        // run all annotators on this text
        pipeline.annotate(document);

        // these are all the sentences in this document
        // a CoreMap is essentially a Map that uses class objects as keys
        // and has values with custom types
        List<CoreMap> sentences = document.get(SentencesAnnotation.class);

        for (CoreMap sentence : sentences) {
            // traversing the words in the current sentence
            // a CoreLabel is a CoreMap with additional token-specific methods
            for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
                // this is the text of the token
                String word = token.get(TextAnnotation.class);
                // this is the POS tag of the token
                String pos = token.get(PartOfSpeechAnnotation.class);
                // this is the NER label of the token
                String ne = token.get(NamedEntityTagAnnotation.class);
                System.out.println(word + ", " + pos + ", " + ne);
            }

            // this is the parse tree of the current sentence
            Tree tree = sentence.get(TreeAnnotation.class);

            // this is the Stanford dependency graph of the current sentence
            SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
        }

        // this is the coreference link graph
        // each chain stores a set of mentions that link to each other,
        // along with a method for getting the most representative mention
        // both sentence and token offsets start at 1!
        Map<Integer, CorefChain> graph =
                document.get(CorefChainAnnotation.class);
    }
}

 

This code hands the string text to StanfordCoreNLP for processing. StanfordCoreNLP runs its components (annotators) in the order "tokenize, ssplit, pos, lemma, ner, parse, dcoref", that is: tokenization, sentence splitting, part-of-speech tagging, lemmatization, named entity recognition, syntactic parsing, and coreference resolution, seven components in all.
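If you only need part of this pipeline, you can list fewer annotators, which loads fewer models and starts much faster; note that later annotators depend on earlier ones (pos requires tokenize and ssplit, for example). A minimal sketch under the same jar setup as above (the class name PosOnly is mine):

import java.util.Properties;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class PosOnly {

    public static void main(String[] args) {
        // stop after POS tagging: neither the NER classifiers nor the parser
        // models are loaded, so startup is much cheaper
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation document = new Annotation("Add your text here");
        pipeline.annotate(document);

        // the tokenizer also stores the token list on the document itself,
        // so token-level annotations can be read without iterating sentences
        for (CoreLabel token : document.get(TokensAnnotation.class)) {
            System.out.println(token.get(TextAnnotation.class)
                    + "/" + token.get(PartOfSpeechAnnotation.class));
        }
    }
}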

Returning to the full example: after processing, List<CoreMap> sentences = document.get(SentencesAnnotation.class) holds all of the analysis results, which you can read out by traversing the list.

Here we simply print each word, its part-of-speech tag, and its named-entity label.

Execution result:

 

Adding annotator tokenize
Adding annotator ssplit
Adding annotator pos
Loading POS model [edu/stanford/nlp/models/pos-tagger/wsj3t0-18-left3words/left3words-distsim-wsj-0-18.tagger] ... Loading default properties from trained tagger edu/stanford/nlp/models/pos-tagger/wsj3t0-18-left3words/left3words-distsim-wsj-0-18.tagger
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/wsj3t0-18-left3words/left3words-distsim-wsj-0-18.tagger ... done [4.4 sec].
done [4.5 sec].
Adding annotator lemma
Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/all.3class.distsim.crf.ser.gz ... done [38.4 sec].
Loading classifier from edu/stanford/nlp/models/ner/muc.distsim.crf.ser.gz ... done [10.9 sec].
Loading classifier from edu/stanford/nlp/models/ner/conll.distsim.crf.ser.gz ... done [18.2 sec].
Adding annotator parse
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [11.5 sec].
Adding annotator dcoref
Add, VB, O
your, PRP$, O
text, NN, O
here, RB, O
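The example stores the coreference result in the graph map but never prints it. To inspect it, a few lines can be appended at the end of main in TestCoreNLP above; a minimal sketch (CorefChain.toString() lists the mentions in each chain):

// print every coreference chain that dcoref found; each chain groups
// the mentions that refer to the same entity
for (CorefChain chain : graph.values()) {
    System.out.println(chain);
}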

 

Remember to set the -Xmx512m JVM parameter when running; otherwise you will be greeted with a Java heap space error. Stanford actually recommends setting 1800m on 32-bit machines and 3 GB on 64-bit machines. This memory requirement is staggering.
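Before loading any models, you can verify how much heap the JVM actually received; a small self-contained sketch using only the JDK (the 1800 MB threshold mirrors the 32-bit recommendation above):

public class HeapCheck {

    public static void main(String[] args) {
        // maxMemory() reports the -Xmx ceiling in bytes
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap: " + maxMb + " MB");
        if (maxMb < 1800) {
            System.out.println("Heap is probably too small for the full "
                    + "CoreNLP pipeline; rerun with e.g. -Xmx1800m");
        }
    }
}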

 

3. Compiling and Running the Project Source Code in Eclipse

Using open-source code is not limited to the jar-package usage of Section 2. More often we need to download the source code, bring it into our own project with a development tool, modify it, compile it, and finally package it into a jar for deployment. This is more troublesome than the approach in Section 2, but it makes the open-source framework easy to modify. This is the second method.

1. Download and decompress the package. The source code is in stanford-corenlp-2011-06-19-sources.jar; decompress this file to get a src directory.

2. Create a new project named NLPTest in Eclipse. Create a lib directory under the project, copy the decompressed fastutil.jar, jgraph.jar, jgrapht.jar, stanford-corenlp-models-2011-06-19.jar, and xom.jar into lib, and add them to the build path (Add to Build Path...).

Note: step 3 of Section 2 also used stanford-corenlp-2011-06-19.jar. That jar is simply the compiled class files of the CoreNLP source code packaged into a jar. Here we just replace the jar with the source code itself.

3. Copy all the source files from the src directory decompressed in step 1 into the project's src directory, and wait for Eclipse to build them.

4. Copy input.txt from the original decompressed CoreNLP folder into the project root directory.

 

The resulting project directory is shown below.

5. Test: under the edu.stanford.nlp.pipeline package in the project there is a class StanfordCoreNLP.java. It is the entry and test class of the whole project, and alongside it sits StanfordCoreNLP.properties, the project's configuration file.

Set the run parameters of the StanfordCoreNLP class so that the whole project runs normally. The parameter settings are as follows:

 

After running, an input.txt.xml file will be generated under the project root directory; it contains all of the analysis results.
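As a side note, the same run can also be reproduced programmatically instead of through an Eclipse run configuration. A minimal sketch, assuming StanfordCoreNLP.properties and input.txt sit in the working directory (the class name RunFromProps is mine; xmlPrint writes the same XML format as the file above):

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class RunFromProps {

    public static void main(String[] args) throws IOException {
        // load the same configuration file the main class uses
        Properties props = new Properties();
        props.load(new FileInputStream("StanfordCoreNLP.properties"));
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // annotate the whole contents of input.txt
        String text = new String(Files.readAllBytes(Paths.get("input.txt")));
        Annotation document = new Annotation(text);
        pipeline.annotate(document);

        // write the results as XML, like the input.txt.xml produced above
        pipeline.xmlPrint(document, new FileOutputStream("input.txt.xml"));
    }
}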

 

 

 
