Stanford Parser Instructions for use

Source: Internet
Author: User
Tags: nltk

Preface: For a recent project I wanted to try the Stanford Parser: parse sentences into parse trees, analyze the subtrees, and combine them with a tree kernel for training. I downloaded the Stanford Parser and it works, but using it is a real pain. There are plenty of instructions, yet no convenient, quick general introduction.

I. Prerequisites

Stanford Parser Home: http://nlp.stanford.edu/software/lex-parser.shtml

Stanford Parser Download: http://nlp.stanford.edu/software/lex-parser.shtml#Download

Other tools: Java, Python, and so on, depending on what your own project needs.

II. Usage 1 (Stanford Parser)

After downloading and extracting the archive, follow the README.txt file. I was on Ubuntu 15.04 with Java 7, which is not enough, so I installed Java 8 with the four commands from an earlier blog post:

    $ sudo add-apt-repository ppa:webupd8team/java
    $ sudo apt-get update
    $ sudo apt-get install oracle-java8-installer
    $ java -version
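
As a quick sanity check before running the parser, the version string printed by java -version can be inspected from Python. This is only a sketch: parse_java_major and is_java8_or_newer are hypothetical helper names, and the code assumes the usual output shapes such as 'java version "1.8.0_45"' or 'openjdk version "11.0.2"'.

```python
import re


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None.

    Hypothetical helper: assumes the version appears as: version "X.Y..."
    e.g. 'java version "1.7.0_79"' gives 7, 'openjdk version "11.0.2"' gives 11.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Releases up to Java 8 report themselves as 1.x (1.7, 1.8, ...).
    if first == 1 and second is not None:
        return int(second)
    return first


def is_java8_or_newer(version_output):
    """True if the reported major version is 8 or higher."""
    major = parse_java_major(version_output)
    return major is not None and major >= 8


print(is_java8_or_newer('java version "1.7.0_79"'))
print(is_java8_or_newer('java version "1.8.0_45"'))
```

Note that java -version writes to standard error, so when calling it through subprocess the text to inspect is in stderr rather than stdout.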

Once Java 8 is ready, you can go on to use the Stanford Parser under Ubuntu. According to the instructions, you run the lexparser.sh script with a file name as its argument. testsent.txt contains 5 sentences in English.

On a Unix system you should be able to parse the English test file with the following command:

    ./lexparser.sh data/testsent.txt

This uses the PCFG parser, which is quick to load and run, and quite accurate. [Notes: it takes a few seconds to load the parser data before parsing begins; continued parsing is quicker. To use the lexicalized parser, replace englishPCFG.ser.gz with englishFactored.ser.gz in the lexparser.sh script and use the flag -mx600m to give more memory to Java.]

Open a terminal in the folder containing lexparser.sh and run ./lexparser.sh data/testsent.txt. The results are as follows (partial):

    Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.5 sec].
    Parsing file: data/testsent.txt
    Parsing [sent. 1 len. ...]: Scores of properties are under extreme fire threat as a huge blaze continues to advance through Sydney's north-western suburbs.
    (ROOT
      (S
        (NP
          (NP (NNS Scores))
          (PP (IN of)
            (NP (NNS properties))))
        (VP (VBP are)
          (PP (IN under)
            (NP (JJ extreme) (NN fire) (NN threat)))
          (SBAR (IN as)
            (S
              (NP (DT a) (JJ huge) (NN blaze))
              (VP (VBZ continues)
                (S
                  (VP (TO to)
                    (VP (VB advance)
                      (PP (IN through)
                        (NP
                          (NP (NNP Sydney) (POS 's))
                          (JJ north-western) (NNS suburbs))))))))))
        (. .)))

    nsubj(threat-8, Scores-1)
    case(properties-3, of-2)
    nmod:of(Scores-1, properties-3)
    cop(threat-8, are-4)
    case(threat-8, under-5)
    amod(threat-8, extreme-6)
    compound(threat-8, fire-7)
    root(ROOT-0, threat-8)
    mark(continues-13, as-9)
    det(blaze-12, a-10)
    amod(blaze-12, huge-11)
    nsubj(continues-13, blaze-12)
    nsubj(advance-15, blaze-12)
    advcl(threat-8, continues-13)
    mark(advance-15, to-14)
    xcomp(continues-13, advance-15)
    case(suburbs-20, through-16)
    nmod:poss(suburbs-20, Sydney-17)
    case(Sydney-17, 's-18)
    amod(suburbs-20, north-western-19)
    nmod:through(advance-15, suburbs-20)

As can be seen, the Stanford Parser handles English well, and it gives two kinds of analysis: a phrase-structure tree and a list of typed dependencies. Other English data is parsed just as well. If you think that is the end of the story, you are too young, too naive.
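
Since the goal stated in the preface is to analyze subtrees, here is a minimal sketch of reading the bracketed tree output into nested Python lists and listing every constituent with its words. It uses only the standard library; read_tree, leaves, and subtrees are illustrative names and are not part of the Stanford Parser or NLTK. It assumes well-formed, balanced brackets and a label on every node.

```python
import re


def read_tree(text):
    """Parse a bracketed tree such as '(NP (DT a) (NN blaze))'.

    Returns nested lists of the form [label, child, child, ...],
    where leaf words are plain strings.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        node = [tokens[pos]]              # the node label, e.g. 'NP'
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(parse())      # nested constituent
            else:
                node.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1                          # consume ')'
        return node

    tree = parse()
    if pos != len(tokens):
        raise ValueError("unexpected text after the tree")
    return tree


def leaves(node):
    """Return the words under a node, left to right."""
    words = []
    for child in node[1:]:
        if isinstance(child, list):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


def subtrees(node):
    """Yield every constituent: the node itself first, then its descendants."""
    yield node
    for child in node[1:]:
        if isinstance(child, list):
            yield from subtrees(child)


tree = read_tree("(NP (DT a) (JJ huge) (NN blaze))")
for sub in subtrees(tree):
    print(sub[0], leaves(sub))
```

Nested lists keep the sketch dependency-free; a tree kernel would typically compare the fragments produced by something like subtrees across two sentences.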

The main thing is Chinese. In the same way, change "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz" in the lexparser.sh file to "edu/stanford/nlp/models/lexparser/chineseFactored.ser.gz" and switch the data to Chinese. I thought this would parse as well, but it is extremely slow. And whatever I tried, everything was parsed as a single sentence, probably because the input had not been word-segmented, or perhaps because the parameters were not set properly. I did not find any other blog post that gets this right.
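
Instead of editing lexparser.sh for every model, the same Java invocation can be assembled from Python. This is a sketch under stated assumptions: build_parser_command is a hypothetical helper, the classpath "*" assumes the command is run from the unpacked parser folder, and the default memory value is simply the one from the README note quoted earlier (it may well be too small for the Chinese factored model). The class name, model paths, and the -encoding and -outputFormat flags are the ones that appear elsewhere in this article.

```python
def build_parser_command(model, input_file, memory="600m",
                         output_format="penn,typedDependenciesCollapsed",
                         encoding="utf-8", classpath="*"):
    """Build the argument list for running LexicalizedParser on one file.

    Hypothetical helper; classpath="*" assumes the working directory is
    the unpacked parser folder that holds the jar files.
    """
    return [
        "java", "-mx" + memory,
        "-cp", classpath,
        "edu.stanford.nlp.parser.lexparser.LexicalizedParser",
        "-encoding", encoding,
        "-outputFormat", output_format,
        model,
        input_file,
    ]


ENGLISH_PCFG = "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"
CHINESE_FACTORED = "edu/stanford/nlp/models/lexparser/chineseFactored.ser.gz"

command = build_parser_command(CHINESE_FACTORED, "data/chinese-onesent-utf8.txt")
print(" ".join(command))

# To actually run it (needs Java and the parser jars; the path is a placeholder):
#   import subprocess
#   subprocess.run(command, cwd="/path/to/stanford-parser-folder", check=True)
```

Passing the command as a list avoids shell quoting problems with the comma in the output format.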

To be continued...

III. Usage 2 (NLTK + stanford-parser.jar)

A colleague saw me struggling with the Stanford Parser and said NLTK already includes an interface to it, then gave a quick demonstration of how to use it from NLTK. I was speechless: the tool had been right beside me and I neither knew how to use it nor knew NLTK had this feature. The results, however, only come back as a list:


    In [8]: from nltk.parse import stanford

    In [9]: stanford.StanfordParser?
    Type:            type
    String form:     <class 'nltk.parse.stanford.StanfordParser'>
    File:            /home/shifeng/anaconda/lib/python2.7/site-packages/nltk/parse/stanford.py
    Init definition: stanford.StanfordParser(self, path_to_jar=None, path_to_models_jar=None, model_path=u'edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz', encoding=u'UTF-8', verbose=False, java_options=u'-mx1000m')
    Docstring:
    Interface to the Stanford Parser

    >>> parser=StanfordParser(
    ...     model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"
    ... )
    >>> parser.raw_parse_sents((
    ...     "the quick brown fox jumps over the lazy dog",
    ...     "the quick grey wolf jumps over the lazy fox"
    ... ))
    [Tree('ROOT', [Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['quick']), Tree('JJ', ['brown']),
    Tree('NN', ['fox'])]), Tree('NP', [Tree('NP', [Tree('NNS', ['jumps'])]), Tree('PP', [Tree('IN', ['over']),
    Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['lazy']), Tree('NN', ['dog'])])])])])]), Tree('ROOT', [Tree('NP',
    [Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['quick']), Tree('JJ', ['grey']), Tree('NN', ['wolf'])]), Tree('NP',
    [Tree('NP', [Tree('NNS', ['jumps'])]), Tree('PP', [Tree('IN', ['over']), Tree('NP', [Tree('DT', ['the']),
    Tree('JJ', ['lazy']), Tree('NN', ['fox'])])])])])])]

At first I could not get this to work. My colleague guessed that the jar package had not been downloaded and tried to fetch it through nltk.download, which did not work either. While he was still puzzling over it, I told him I had already got it working from instructions found online. Following a blog post, NLTK combined with stanford-parser.jar parses sentences like this:

    In [ ]: import os

    In [ ]: os.environ["STANFORD_PARSER"] = "stanford-parser.jar"

    In [ ]: os.environ["STANFORD_MODELS"] = "stanford-parser-3.5.2-models.jar"

    In [ ]: parser = stanford.StanfordParser(model_path=u'edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz')

    In [ ]: sentences = parser.raw_parse_sents(("the quick brown fox jumps over the lazy dog", "the quick grey wolf jumps over the lazy fox"))

    In [17]: sentences
    Out[17]:
    [Tree('ROOT', [Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['quick']), Tree('JJ', ['brown']),
    Tree('NN', ['fox'])]), Tree('NP', [Tree('NP', [Tree('NNS', ['jumps'])]), Tree('PP', [Tree('IN', ['over']),
    Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['lazy']), Tree('NN', ['dog'])])])])])]), Tree('ROOT', [Tree('NP',
    [Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['quick']), Tree('JJ', ['grey']), Tree('NN', ['wolf'])]), Tree('NP',
    [Tree('NP', [Tree('NNS', ['jumps'])]), Tree('PP', [Tree('IN', ['over']), Tree('NP', [Tree('DT', ['the']),
    Tree('JJ', ['lazy']), Tree('NN', ['fox'])])])])])])]

    In [18]: sentences = parser.raw_parse_sents(("Hello, My name is Melroy.", "What is your name?"))

    In [19]: sentences
    Out[19]:
    [Tree('ROOT', [Tree('S', [Tree('INTJ', [Tree('UH', ['Hello'])]), Tree(',', [',']), Tree('NP', [Tree('PRP$', ['My']),
    Tree('NN', ['name'])]), Tree('VP', [Tree('VBZ', ['is']), Tree('ADJP', [Tree('JJ', ['Melroy'])])]),
    Tree('.', ['.'])])]), Tree('ROOT', [Tree('SBARQ', [Tree('WHNP', [Tree('WP', ['What'])]), Tree('SQ',
    [Tree('VBZ', ['is']), Tree('NP', [Tree('PRP$', ['your']), Tree('NN', ['name'])])]), Tree('.', ['?'])])])]
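
NLTK finds the jars through the STANFORD_PARSER and STANFORD_MODELS environment variables. A small helper can locate the jars in the unpacked parser folder and set both variables before StanfordParser is constructed. This is a sketch, not part of NLTK: configure_stanford_env is a hypothetical name, and it assumes the jar naming seen in this article (stanford-parser.jar plus a stanford-parser-<version>-models.jar).

```python
import glob
import os


def configure_stanford_env(parser_dir):
    """Point STANFORD_PARSER / STANFORD_MODELS at the jars in parser_dir.

    Hypothetical helper. Assumes the folder contains stanford-parser.jar
    and at least one stanford-parser-*-models.jar; if there are several
    model jars, the last one in sorted order is used.
    Returns (parser_jar, models_jar) or raises IOError if a jar is missing.
    """
    parser_jar = os.path.join(parser_dir, "stanford-parser.jar")
    pattern = os.path.join(parser_dir, "stanford-parser-*-models.jar")
    models = sorted(glob.glob(pattern))
    if not os.path.isfile(parser_jar):
        raise IOError("stanford-parser.jar not found in %s" % parser_dir)
    if not models:
        raise IOError("no stanford-parser-*-models.jar found in %s" % parser_dir)
    models_jar = models[-1]
    os.environ["STANFORD_PARSER"] = parser_jar
    os.environ["STANFORD_MODELS"] = models_jar
    return parser_jar, models_jar
```

Failing early with a clear message is easier to debug than the lookup error NLTK raises when it cannot find a jar.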

IV. Usage 3 (Eclipse + Java)

Originally I did not want to use Java, let alone Eclipse on Ubuntu, but after seeing a senior labmate do syntactic analysis in Eclipse I decided to give it a try. It works, but calling the parser directly yields only the tree structure, probably because the object it returns is a Tree. Passing an argument array to LexicalizedParser.main should also work.

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.UnsupportedEncodingException;
    import java.util.ArrayList;
    import java.util.List;

    import edu.stanford.nlp.ling.Word;
    import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
    import edu.stanford.nlp.trees.Tree;

    public class Parser {

        public static void main(String[] args) throws IOException {
            // String grammar = "edu/stanford/nlp/models/lexparser/chineseFactored.ser.gz";
            String grammar = "edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz";
            String[] options = {};
            LexicalizedParser lp = LexicalizedParser.loadModel(grammar, options);

            // Way 1: parse one sentence and print the tree in Penn bracketed form.
            // (The sentence was Chinese in the original post; it is shown here in translation.)
            String line = "My name is Xiao Ming?";
            Tree parse = lp.parse(line);
            parse.pennPrint();

            // Way 2: pass command-line style arguments to LexicalizedParser.main.
            String[] arg2 = {"-encoding", "utf-8",
                             "-outputFormat", "penn,typedDependenciesCollapsed",
                             "edu/stanford/nlp/models/lexparser/chineseFactored.ser.gz",
                             "/home/shifeng/shifengworld/study/tool/stanford_parser/stanford-parser-full-2015-04-20/data/chinese-onesent-utf8.txt"};
            LexicalizedParser.main(arg2);
        }
    }

Output:

    Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar
    Loading parser from serialized file edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz ... done [0.8 sec].
    (ROOT
      (IP
        (NP
          (DNP
            (NP (PN))
            (DEG))
          (NP (NN name)))
        (VP (VV called)
          (NP (NN xiaoming)))
        (PU ?)))
    Loading parser from serialized file edu/stanford/nlp/models/lexparser/chineseFactored.ser.gz ... done [4.1 sec].
    Parsing file: /home/shifeng/shifengworld/study/tool/stanford_parser/stanford-parser-full-2015-04-20/data/chinese-onesent-utf8.txt
    Parsing [sent. 1 len. 8]: Russia hopes that Iran will not make a nuclear weapons program.
    (ROOT
      (NP (NR Russia))
      (VP (VV Hope)
        (IP
          (NP (NR Iran))
          (VP
            (ADVP (AD not))
            (VP (VV Manufacturing)
              (NP (NN nuclear weapon) (NN plan))))))
      (PU .))

    nsubj(Hope-2, Russia-1)
    root(ROOT-0, Hope-2)
    nsubj(Manufacturing-5, Iran-3)
    neg(Manufacturing-5, without-4)
    ccomp(Hope-2, Manufacturing-5)
    nn(plan-7, nuclear weapon-6)
    dobj(Manufacturing-5, plan-7)

    Parsed file: /home/shifeng/shifengworld/study/tool/stanford_parser/stanford-parser-full-2015-04-20/data/chinese-onesent-utf8.txt [1 sentences].
    Parsed 8 words in 1 sentences (30.42 wds/sec; 3.80 sents/sec).
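
The typed-dependency lines printed by the parser have a regular shape, relation(head-index, dependent-index), so they can be turned into tuples for further processing. A minimal sketch, assuming one dependency per line; Dependency, parse_dependency, and parse_dependencies are illustrative names, and the sample lines are taken from the English run earlier in this article.

```python
import re
from collections import namedtuple

Dependency = namedtuple(
    "Dependency", "relation head head_index dependent dependent_index")

# relation(head-N, dependent-M); the relation may contain ':' as in nmod:poss
_DEP_LINE = re.compile(r"^\s*([\w:]+)\((.+)-(\d+), (.+)-(\d+)\)\s*$")


def parse_dependency(line):
    """Parse one line such as 'nsubj(continues-13, blaze-12)'.

    Returns a Dependency, or None if the line is not a dependency line.
    """
    match = _DEP_LINE.match(line)
    if match is None:
        return None
    relation, head, head_index, dependent, dependent_index = match.groups()
    return Dependency(relation, head, int(head_index),
                      dependent, int(dependent_index))


def parse_dependencies(text):
    """Parse every dependency line in a block of output, skipping other lines."""
    parsed = (parse_dependency(line) for line in text.splitlines())
    return [dep for dep in parsed if dep is not None]


sample = """root(ROOT-0, threat-8)
nsubj(continues-13, blaze-12)
amod(suburbs-20, north-western-19)"""

for dep in parse_dependencies(sample):
    print(dep.relation, dep.head, "->", dep.dependent)
```

Lines that do not match the pattern (tree lines, "Loading parser ..." messages) are skipped, so the whole captured output can be passed in as one string.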


Java has never been my strong suit, so I will keep looking for other ways...


V. Lessons learned

Look up more documentation, and push yourself to read it through even when it is in English.

References:

1. StackOverflow: http://stackoverflow.com/questions/13883277/stanford-parser-and-nltk

2. NLTK official website: http://www.nltk.org/api/nltk.parse.html

3. NLTK official website: http://www.nltk.org/_modules/nltk/parse/stanford.html

4. Stanford Parser official website: http://nlp.stanford.edu/software/parser-faq.shtml

5. Stanford Parser download: http://nlp.stanford.edu/software/lex-parser.shtml#Download

6. Blog post: http://blog.sina.com.cn/s/blog_8e037f440101eg93.html

7. Blog post: http://www.cnblogs.com/stGeekpower/p/3457746.html

8. Baidu Wenku: http://wenku.baidu.com/link?url=Kzdygjdnme7yidocoppnclv1z95yiyf5n2yit4bd-6entvcpm8sptymx5qxajsx6sngtgpahucsb0oi2w2jqoac2nwdzukdvkmwnehqp0jg


Copyright notice: This is an original article by the blog author and may not be reproduced without the author's permission.
