Stanford CoreNLP, an open-source project: compilation and running modes
1. Introduction to Stanford CoreNLP
"Stanford CoreNLP, integrating our NER, POS tagger, and parser with a new coreference system"
That is the official description of Stanford CoreNLP. It is an open-source project from the Stanford NLP Group that combines several natural language processing components: it integrates Stanford's named entity recognizer (NER), POS tagger, and parser with a new coreference system, forming a complete natural language processing platform.
If you want to perform natural language processing, consider using it.
Official website: http://nlp.stanford.edu/software/corenlp.shtml
Download: http://nlp.stanford.edu/software/stanford-corenlp-v1.1.0.tgz
2. Basic usage of Stanford CoreNLP
For an ordinary Java developer who wants to use Stanford CoreNLP, it is usually enough to download the Stanford CoreNLP jar packages from the official website and put them on the project classpath.
1. Download and decompress the package.
2. Create a project named SimpleNLP in Eclipse.
3. Create a lib folder, copy fastutil.jar, jgraph.jar, jgrapht.jar, stanford-corenlp-2011-06-19.jar, stanford-corenlp-models-2011-06-19.jar, and xom.jar from the decompressed Stanford CoreNLP folder into lib, and add them to the build path.
4. Test: write TestCoreNLP.java as follows:
import java.util.List;
import java.util.Map;
import java.util.Properties;

import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.trees.semgraph.SemanticGraph;
import edu.stanford.nlp.trees.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
import edu.stanford.nlp.util.CoreMap;

public class TestCoreNLP {
    public static void main(String[] args) {
        // Creates a StanfordCoreNLP object, with POS tagging, lemmatization,
        // NER, parsing, and coreference resolution
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // Read some text in the text variable
        String text = "Add your text here";

        // Create an empty Annotation just with the given text
        Annotation document = new Annotation(text);

        // Run all annotators on this text
        pipeline.annotate(document);

        // These are all the sentences in this document.
        // A CoreMap is essentially a Map that uses class objects as keys
        // and has values with custom types.
        List<CoreMap> sentences = document.get(SentencesAnnotation.class);

        for (CoreMap sentence : sentences) {
            // Traversing the words in the current sentence.
            // A CoreLabel is a CoreMap with additional token-specific methods.
            for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
                // This is the text of the token
                String word = token.get(TextAnnotation.class);
                // This is the POS tag of the token
                String pos = token.get(PartOfSpeechAnnotation.class);
                // This is the NER label of the token
                String ne = token.get(NamedEntityTagAnnotation.class);
                System.out.println(word + ", " + pos + ", " + ne);
            }

            // This is the parse tree of the current sentence
            Tree tree = sentence.get(TreeAnnotation.class);

            // This is the Stanford dependency graph of the current sentence
            SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
        }

        // This is the coreference link graph.
        // Each chain stores a set of mentions that link to each other,
        // along with a method for getting the most representative mention.
        // Both sentence and token offsets start at 1!
        Map<Integer, CorefChain> graph = document.get(CorefChainAnnotation.class);
    }
}
This code hands the text to StanfordCoreNLP for processing. StanfordCoreNLP runs its components (annotators) in the order given by "tokenize, ssplit, pos, lemma, ner, parse, dcoref": tokenization, sentence splitting, part-of-speech tagging, lemmatization, named-entity recognition, syntactic parsing, and coreference resolution, seven components in all.
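The annotator list is just a comma-separated value in a java.util.Properties object. The following sketch uses only the plain JDK (no CoreNLP dependency) to show how that property string decomposes into the seven stage names; the class name AnnotatorListDemo is made up for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class AnnotatorListDemo {
    public static void main(String[] args) {
        // The same property the pipeline reads; whitespace around commas is allowed.
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");

        // Split on commas and trim surrounding whitespace to get the stage names in order.
        List<String> stages = Arrays.asList(props.getProperty("annotators").split("\\s*,\\s*"));

        // prints: 7 annotators: [tokenize, ssplit, pos, lemma, ner, parse, dcoref]
        System.out.println(stages.size() + " annotators: " + stages);
    }
}
```

Note that order matters: later stages such as parse and dcoref depend on the output of earlier ones such as tokenize and ssplit.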
After processing, List<CoreMap> sentences = document.get(SentencesAnnotation.class); holds all of the analysis results, which you can retrieve by iterating over the list.
Here we simply print each word, its POS tag, and its named-entity label.
Execution result:
Adding annotator tokenize
Adding annotator ssplit
Adding annotator pos
Loading POS model [edu/stanford/nlp/models/pos-tagger/wsj3t0-18-left3words/left3words-distsim-wsj-0-18.tagger] ... Loading default properties from trained tagger edu/stanford/nlp/models/pos-tagger/wsj3t0-18-left3words/left3words-distsim-wsj-0-18.tagger
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/wsj3t0-18-left3words/left3words-distsim-wsj-0-18.tagger ... done [4.4 sec].
done [4.5 sec].
Adding annotator lemma
Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/all.3class.distsim.crf.ser.gz ... done [38.4 sec].
Loading classifier from edu/stanford/nlp/models/ner/muc.distsim.crf.ser.gz ... done [10.9 sec].
Loading classifier from edu/stanford/nlp/models/ner/conll.distsim.crf.ser.gz ... done [18.2 sec].
Adding annotator parse
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [11.5 sec].
Adding annotator dcoref
Add, VB, O
your, PRP$, O
text, NN, O
here, RB, O
Remember to set the -Xmx512m JVM parameter when running; otherwise you will get a "Java heap space" error. Stanford recommends setting 1800m on 32-bit machines and 3g on 64-bit machines, which is a staggering memory requirement.
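Since model loading is what exhausts the heap, one defensive trick is to check the maximum heap size before constructing the pipeline. This is a plain-JDK sketch, not part of CoreNLP; the HeapCheck class name and the 1800m threshold (taken from the 32-bit recommendation above) are illustrative:

```java
public class HeapCheck {
    // Rough minimum heap in bytes, based on the 1800m suggested above for 32-bit machines.
    static final long MIN_HEAP = 1800L * 1024 * 1024;

    public static void main(String[] args) {
        // maxMemory() reports the -Xmx limit the JVM was started with.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + (maxHeap / (1024 * 1024)) + "m");
        if (maxHeap < MIN_HEAP) {
            System.err.println("Heap is likely too small for the CoreNLP models; rerun with e.g. -Xmx1800m");
        }
    }
}
```

Failing fast with a clear message is friendlier than waiting half a minute for the NER classifiers to load and then dying with an OutOfMemoryError.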
3. Compiling and running the project source code in Eclipse
Using an open-source project means more than just consuming its jar packages as in section 2. More often we need to download the source code, add it to our own project in a development tool, modify it, compile it, and finally package it into a jar. Although this is more trouble than the approach in section 2, it makes the open-source framework easy to modify. This second approach is described below.
1. Download and decompress the package. The source code is in stanford-corenlp-2011-06-19-sources.jar; decompressing it yields a src directory.
2. Create a new project NLPTest in Eclipse. Create a lib directory under the project, copy fastutil.jar, jgraph.jar, jgrapht.jar, stanford-corenlp-models-2011-06-19.jar, and xom.jar into lib, and add them to the build path.
Note: compared with step 3 of section 2, stanford-corenlp-2011-06-19.jar is missing here. That jar is in fact just the compiled class files of the CoreNLP source code packaged as a jar; we are replacing the jar with the source code itself.
3. Copy all the source files from the src directory decompressed in step 1 into the project's src directory, and wait for Eclipse to finish building.
4. Copy input.txt from the original decompressed CoreNLP folder to the project root directory.
The obtained project directory is as follows.
5. Test: under the edu.stanford.nlp.pipeline package there is StanfordCoreNLP.java, the entry class for the whole project, along with StanfordCoreNLP.properties, the project's configuration file.
Set the run parameters for the StanfordCoreNLP class so that the whole project runs normally. The parameter settings are as follows:
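The original screenshot with the exact run parameters is not reproduced in this text. Purely as a hypothetical sketch, a minimal properties file driving this pipeline could look like the following; only the annotators line is taken from this article, and the file key is an assumption about how the -file argument maps to a property:

```properties
# Hypothetical sketch; only the annotators list comes from this article.
annotators = tokenize, ssplit, pos, lemma, ner, parse, dcoref
# Assumption: the input file can be named via a property as well as on the command line.
file = input.txt
```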
After running, an input.txt.xml file is generated under the project root directory; it contains all the parsing results.
As follows: