Preface: For a recent project I wanted to use the Stanford Parser to generate parse trees for sentences, then analyze the subtrees and combine them with a tree kernel for training. The Stanford Parser is a powerful tool and easy to download and run, but getting it to do what I wanted was painful. There is plenty of documentation, yet no convenient, quick overview.
I. Prerequisites
Stanford Parser Home: http://nlp.stanford.edu/software/lex-parser.shtml
Stanford Parser Download: http://nlp.stanford.edu/software/lex-parser.shtml#Download
Other tools: Java, Python, and so on, depending on what each project needs.
II. Usage 1 (Stanford Parser)
After downloading and extracting the archive, follow README.txt. I was on Ubuntu 15.04 with Java 7, which was not sufficient, so I installed Java 8 with the four commands from an earlier blog post:
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer
$ java -version
Once Java 8 is installed, the Stanford Parser can be run on Ubuntu. Following the instructions, run lexparser.sh with the input file name as its argument. testsent.txt contains five English sentences. The README says:
On a Unix system you should be able to parse the English test file with the following command:
./lexparser.sh data/testsent.txt
This uses the PCFG parser, which is quick to load and run, and quite accurate.
[Notes: it takes a few seconds to load the parser data before parsing begins; continued parsing is quicker. To use the lexicalized parser, replace englishPCFG.ser.gz with englishFactored.ser.gz in the lexparser.sh script and use the flag -mx600m to give more memory to java.]
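The README note about swapping the model and raising the memory limit can be sketched as a small Python helper that only assembles the java command line; it does not run anything. The helper name build_parser_command, the default 150m heap, and the "*" classpath are assumptions made for illustration, so check them against the lexparser.sh shipped with your download.

```python
import shlex

PARSER_CLASS = "edu.stanford.nlp.parser.lexparser.LexicalizedParser"
MODEL_DIR = "edu/stanford/nlp/models/lexparser/"


def build_parser_command(input_file, model="englishPCFG.ser.gz",
                         memory="150m", classpath="*",
                         output_format="penn,typedDependencies"):
    """Return the java argument list for one parser run (nothing is executed)."""
    return [
        "java", "-mx" + memory,           # heap size, e.g. -mx600m for the factored model
        "-cp", classpath,                 # assumed: all jars in the parser directory
        PARSER_CLASS,
        "-outputFormat", output_format,
        MODEL_DIR + model,                # model path inside the models jar
        input_file,
    ]


# Default PCFG model versus the lexicalized (factored) model with more memory.
pcfg_cmd = build_parser_command("data/testsent.txt")
factored_cmd = build_parser_command("data/testsent.txt",
                                    model="englishFactored.ser.gz",
                                    memory="600m")
print(" ".join(shlex.quote(arg) for arg in factored_cmd))
```

The resulting list could be handed to subprocess.run with the parser directory as the working directory, which keeps the model name and memory flag in one place instead of editing the shell script by hand.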
Open a terminal in the folder that contains lexparser.sh and run ./lexparser.sh data/testsent.txt. The results are as follows (partial):
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.5 sec].
Parsing file: data/testsent.txt
Parsing [sent. 1 len. 21]: Scores of properties are under extreme fire threat as a huge blaze continues to advance through Sydney 's north-western suburbs .
(ROOT
  (S
    (NP
      (NP (NNS Scores))
      (PP (IN of)
        (NP (NNS properties))))
    (VP (VBP are)
      (PP (IN under)
        (NP (JJ extreme) (NN fire) (NN threat)))
      (SBAR (IN as)
        (S
          (NP (DT a) (JJ huge) (NN blaze))
          (VP (VBZ continues)
            (S
              (VP (TO to)
                (VP (VB advance)
                  (PP (IN through)
                    (NP
                      (NP (NNP Sydney) (POS 's))
                      (JJ north-western) (NNS suburbs))))))))))
    (. .)))

nsubj(threat-8, Scores-1)
case(properties-3, of-2)
nmod:of(Scores-1, properties-3)
cop(threat-8, are-4)
case(threat-8, under-5)
amod(threat-8, extreme-6)
compound(threat-8, fire-7)
root(ROOT-0, threat-8)
mark(continues-13, as-9)
det(blaze-12, a-10)
amod(blaze-12, huge-11)
nsubj(continues-13, blaze-12)
nsubj(advance-15, blaze-12)
advcl(threat-8, continues-13)
mark(advance-15, to-14)
xcomp(continues-13, advance-15)
case(suburbs-20, through-16)
nmod:poss(suburbs-20, Sydney-17)
case(Sydney-17, 's-18)
amod(suburbs-20, north-western-19)
nmod:through(advance-15, suburbs-20)
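The typed-dependency lines at the end of this output follow a regular relation(head-index, dependent-index) pattern, so they can be read back into Python with a regular expression. The following is a minimal sketch under that assumption; the names Dependency and parse_dependencies are made up for this example and are not part of the Stanford Parser.

```python
import re
from collections import namedtuple

Dependency = namedtuple(
    "Dependency", "relation head head_index dependent dependent_index")

# relation(head-INDEX, dependent-INDEX); tokens may themselves contain hyphens,
# so the head is matched lazily and the dependent greedily.
DEP_LINE = re.compile(r"^([\w:]+)\((.+?)-(\d+), (.+)-(\d+)\)$")


def parse_dependencies(text):
    """Collect every line of text that looks like a typed dependency."""
    deps = []
    for line in text.splitlines():
        match = DEP_LINE.match(line.strip())
        if match:
            rel, head, h_idx, dep, d_idx = match.groups()
            deps.append(Dependency(rel, head, int(h_idx), dep, int(d_idx)))
    return deps


sample = """\
nsubj(threat-8, Scores-1)
case(properties-3, of-2)
nmod:of(Scores-1, properties-3)
root(ROOT-0, threat-8)
amod(suburbs-20, north-western-19)
"""
deps = parse_dependencies(sample)
root = [d.dependent for d in deps if d.relation == "root"]
print(root)  # ['threat']
```

Lines that do not match the pattern, such as the bracketed tree or the loading messages, are simply skipped, so the whole console output can be passed in unchanged.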
As the output shows, the Stanford Parser handles English well and produces two kinds of analysis: a phrase-structure tree and typed dependencies. Other English text parses just as well. But if you think that is the end of the story, you are too naive.
The real goal is Chinese. In the same way, I changed "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz" in lexparser.sh to "edu/stanford/nlp/models/lexparser/chineseFactored.ser.gz" and switched the data to Chinese. I expected it to parse just the same, but it was extremely slow. Worse, whatever I tried, the whole input was parsed as a single unit. That is probably because the text had not been word-segmented, or perhaps because the parameters were not set properly. I have not found another blog post that gets this right.
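If missing word segmentation is indeed the cause, one thing to try is to segment the text first and write it out with one sentence per line and spaces between words, since the chinesePCFG and chineseFactored models work on segmented text. The sketch below assumes the tokens come from a separate word segmenter that is not shown; the helper name write_segmented, the file name, and the example sentence are all illustrative.

```python
import io
import os
import tempfile


def write_segmented(sentences, path, encoding="utf-8"):
    """Write one sentence per line with tokens separated by single spaces."""
    with io.open(path, "w", encoding=encoding) as handle:
        for tokens in sentences:
            handle.write(u" ".join(tokens) + u"\n")


# Illustrative tokens; in practice they would come from a word segmenter.
segmented = [
    [u"我", u"的", u"名字", u"叫", u"小明", u"？"],
]

path = os.path.join(tempfile.mkdtemp(), "chinese-segmented.txt")
write_segmented(segmented, path)

with io.open(path, encoding="utf-8") as handle:
    written = handle.read()
print(written)
```

The resulting file can then be passed to lexparser.sh in place of the unsegmented one. Whether this also cures the slowness is untested here.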
To be continued...
III. Usage 2 (NLTK + stanford-parser.jar)
A colleague saw me struggling with the Stanford Parser and mentioned that NLTK has an interface to it, then gave a quick demonstration of how to call it from NLTK. I was floored: the tool had been sitting right next to me and I had no idea NLTK could do this. The results, however, come back only as a list:
In [8]: from nltk.parse import stanford

In [9]: stanford.StanfordParser?
Type:            type
String form:     <class 'nltk.parse.stanford.StanfordParser'>
File:            /home/shifeng/anaconda/lib/python2.7/site-packages/nltk/parse/stanford.py
Init definition: stanford.StanfordParser(self, path_to_jar=None, path_to_models_jar=None, model_path=u'edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz', encoding=u'UTF-8', verbose=False, java_options=u'-mx1000m')
Docstring:
Interface to the Stanford Parser

>>> parser=StanfordParser(
...     model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"
... )
>>> parser.raw_parse_sents((
...     "the quick brown fox jumps over the lazy dog",
...     "the quick grey wolf jumps over the lazy fox"
... ))
[Tree('ROOT', [Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['quick']), Tree('JJ', ['brown']), Tree('NN', ['fox'])]), Tree('NP', [Tree('NP', [Tree('NNS', ['jumps'])]), Tree('PP', [Tree('IN', ['over']), Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['lazy']), Tree('NN', ['dog'])])])])])]),
 Tree('ROOT', [Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['quick']), Tree('JJ', ['grey']), Tree('NN', ['wolf'])]), Tree('NP', [Tree('NP', [Tree('NNS', ['jumps'])]), Tree('PP', [Tree('IN', ['over']), Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['lazy']), Tree('NN', ['fox'])])])])])])]
At first I could not get this to work myself. My colleague guessed that the jar packages had not been downloaded and tried to fetch them through nltk.download, which did not work either. While he was still looking on, puzzled, I told him I had already found the answer online. Following a blog post, this is how to parse sentences with NLTK together with stanford-parser.jar:
In [12]: import os

In [13]: os.environ["STANFORD_PARSER"] = "stanford-parser.jar"

In [14]: os.environ["STANFORD_MODELS"] = "stanford-parser-3.5.2-models.jar"

In [15]: parser = stanford.StanfordParser(model_path=u'edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz')

In [16]: sentences = parser.raw_parse_sents(("the quick brown fox jumps over the lazy dog", "the quick grey wolf jumps over the lazy fox"))

In [17]: sentences
Out[17]:
[Tree('ROOT', [Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['quick']), Tree('JJ', ['brown']), Tree('NN', ['fox'])]), Tree('NP', [Tree('NP', [Tree('NNS', ['jumps'])]), Tree('PP', [Tree('IN', ['over']), Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['lazy']), Tree('NN', ['dog'])])])])])]),
 Tree('ROOT', [Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['quick']), Tree('JJ', ['grey']), Tree('NN', ['wolf'])]), Tree('NP', [Tree('NP', [Tree('NNS', ['jumps'])]), Tree('PP', [Tree('IN', ['over']), Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['lazy']), Tree('NN', ['fox'])])])])])])]

In [18]: sentences = parser.raw_parse_sents(("Hello, My name is Melroy.", "What is your name?"))

In [19]: sentences
Out[19]:
[Tree('ROOT', [Tree('S', [Tree('INTJ', [Tree('UH', ['Hello'])]), Tree(',', [',']), Tree('NP', [Tree('PRP$', ['My']), Tree('NN', ['name'])]), Tree('VP', [Tree('VBZ', ['is']), Tree('ADJP', [Tree('JJ', ['Melroy'])])]), Tree('.', ['.'])])]),
 Tree('ROOT', [Tree('SBARQ', [Tree('WHNP', [Tree('WP', ['What'])]), Tree('SQ', [Tree('VBZ', ['is']), Tree('NP', [Tree('PRP$', ['your']), Tree('NN', ['name'])])]), Tree('.', ['?'])])])]
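The two environment variables in the session above are set to bare file names, which only works when Python is started from the parser directory. A slightly more robust setup, sketched below, resolves the jars from a directory and fails early when one is missing. The helper name configure_stanford_jars and the glob pattern for the models jar are assumptions for illustration. The demo uses empty placeholder files in a temporary directory, so no parser is run.

```python
import glob
import os
import tempfile


def configure_stanford_jars(parser_dir):
    """Set STANFORD_PARSER and STANFORD_MODELS from the jars in parser_dir."""
    parser_jar = os.path.join(parser_dir, "stanford-parser.jar")
    model_jars = sorted(
        glob.glob(os.path.join(parser_dir, "stanford-parser-*-models.jar")))
    if not os.path.isfile(parser_jar):
        raise IOError("missing " + parser_jar)
    if not model_jars:
        raise IOError("no stanford-parser-*-models.jar in " + parser_dir)
    os.environ["STANFORD_PARSER"] = parser_jar
    os.environ["STANFORD_MODELS"] = model_jars[-1]  # last one in sorted order
    return parser_jar, model_jars[-1]


# Demo with empty placeholder files standing in for the real jars.
demo_dir = tempfile.mkdtemp()
for name in ("stanford-parser.jar", "stanford-parser-3.5.2-models.jar"):
    open(os.path.join(demo_dir, name), "w").close()

parser_jar, models_jar = configure_stanford_jars(demo_dir)
print(os.path.basename(models_jar))  # stanford-parser-3.5.2-models.jar
```

With a real installation, pass the extracted stanford-parser-full directory instead of demo_dir before constructing stanford.StanfordParser.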
IV. Usage 3 (Eclipse + Java)
Originally I did not want to use Java, or Eclipse on Ubuntu, but after seeing a senior labmate do syntactic analysis in Eclipse I decided to give it a try. It works, but I only get the tree structure, probably because the object I initialize is a Tree; converting between that and the list form should also be possible.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.List;

import edu.stanford.nlp.ling.Word;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.Tree;

public class Parser {
    public static void main(String[] args) throws IOException {
        // String grammar = "edu/stanford/nlp/models/lexparser/chineseFactored.ser.gz";
        String grammar = "edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz";
        String[] options = {};
        LexicalizedParser lp = LexicalizedParser.loadModel(grammar, options);

        // Parse a single sentence and print its Penn-style tree.
        String line = "My name is Xiao Ming?";
        Tree parse = lp.parse(line);
        parse.pennPrint();

        // Parse a whole file through the command-line entry point.
        String[] arg2 = {
            "-encoding", "utf-8",
            "-outputFormat", "penn,typedDependenciesCollapsed",
            "edu/stanford/nlp/models/lexparser/chineseFactored.ser.gz",
            "/home/shifeng/shifengworld/study/tool/stanford_parser/stanford-parser-full-2015-04-20/data/chinese-onesent-utf8.txt"
        };
        LexicalizedParser.main(arg2);
    }
}
Output:
Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar
Loading parser from serialized file edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz ... done [0.8 sec].
(ROOT
  (IP
    (NP
      (DNP
        (NP (PN))
        (DEG))
      (NP (NN name)))
    (VP (VV called)
      (NP (NN xiaoming)))
    (PU ?)))
Loading parser from serialized file edu/stanford/nlp/models/lexparser/chineseFactored.ser.gz ... done [4.1 sec].
Parsing file: /home/shifeng/shifengworld/study/tool/stanford_parser/stanford-parser-full-2015-04-20/data/chinese-onesent-utf8.txt
Parsing [sent. 1 len. 8]: Russia hopes that Iran will not make a nuclear weapons program.
(ROOT
  (NP (NR Russia))
  (VP (VV hope)
    (IP
      (NP (NR Iran))
      (VP
        (ADVP (AD not))
        (VP (VV manufacturing)
          (NP (NN nuclear weapon) (NN plan))))))
  (PU .))

nsubj(hope-2, Russia-1)
root(ROOT-0, hope-2)
nsubj(manufacturing-5, Iran-3)
neg(manufacturing-5, not-4)
ccomp(hope-2, manufacturing-5)
nn(plan-7, nuclear weapon-6)
dobj(manufacturing-5, plan-7)

Parsed file: /home/shifeng/shifengworld/study/tool/stanford_parser/stanford-parser-full-2015-04-20/data/chinese-onesent-utf8.txt [1 sentences].
Parsed 8 words in 1 sentences (30.42 wds/sec; 3.80 sents/sec).
Java has never been my strong suit, so I will keep looking for other approaches...
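Whichever route produces the bracketed trees, the subtree analysis mentioned in the preface needs them in a structured form. The sketch below reads a Penn-style bracketed string into nested (label, children) tuples using plain Python and then enumerates subtrees. The function names read_tree, subtrees and leaves are made up for this example, the sample string is a shortened fragment in the style of the English output, and nodes without a word are not handled. With NLTK installed, its Tree class offers similar subtrees() and leaves() methods.

```python
def read_tree(text):
    """Parse one Penn-style bracketed string into (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())       # nested constituent
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def subtrees(tree):
    """Yield every subtree in pre-order, the root included."""
    yield tree
    for child in tree[1]:
        if isinstance(child, tuple):
            for sub in subtrees(child):
                yield sub


def leaves(tree):
    """Return the words under a tree, left to right."""
    words = []
    for child in tree[1]:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


penn = "(ROOT (S (NP (DT a) (JJ huge) (NN blaze)) (VP (VBZ continues))))"
tree = read_tree(penn)
noun_phrases = [" ".join(leaves(t)) for t in subtrees(tree) if t[0] == "NP"]
print(noun_phrases)  # ['a huge blaze']
```

Each yielded subtree is an ordinary tuple, so it can be counted, hashed after conversion, or compared, which is the kind of input a tree-kernel computation would start from.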
V. Lessons learned
Consult more sources, and push through the English documentation even when it is hard going.
References:
1. StackOverflow: http://stackoverflow.com/questions/13883277/stanford-parser-and-nltk
2. NLTK Official website: http://www.nltk.org/api/nltk.parse.html
3. NLTK official website: http://www.nltk.org/_modules/nltk/parse/stanford.html
4. Stanford Parser Official website: http://nlp.stanford.edu/software/parser-faq.shtml
5. Stanford Parser Download: http://nlp.stanford.edu/software/lex-parser.shtml#Download
6. Blog post: http://blog.sina.com.cn/s/blog_8e037f440101eg93.html
7. Blog post: http://www.cnblogs.com/stGeekpower/p/3457746.html
8. Baidu Wenku: http://wenku.baidu.com/link?url=Kzdygjdnme7yidocoppnclv1z95yiyf5n2yit4bd-6entvcpm8sptymx5qxajsx6sngtgpahucsb0oi2w2jqoac2nwdzukdvkmwnehqp0jg
Copyright notice: This is an original article by the blog author and may not be reproduced without the author's permission.
Stanford Parser Instructions for use