So what exactly is ANTLR? Quoting the official website:
What is ANTLR?
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees.
From a compiler's point of view, ANTLR can do part of the compiler front end's work for us: lexical analysis, syntax analysis (parsing), generating an abstract syntax tree (AST), and so on. Semantic analysis, such as type checking, we have to implement ourselves. One more remark: ANTLR 4 can no longer generate an AST (in ANTLR 3 an AST could be requested with the option output=AST); it generates a parse tree (PT) instead. For the pros and cons of ASTs and parse trees, take a look at the answer above.
To have ANTLR generate a lexer and a parser for our language, we need to tell ANTLR the grammar of that language. ANTLR is based on context-free grammars, which are described in a notation similar to BNF. There are two common kinds of parser for context-free grammars, LL parsers and LR parsers, and ANTLR generates the former.
Alright, enough talk. Let's try a simple example and see how ANTLR is used.
The grammar of a simple calculator:
grammar Cal;

options {
    // ANTLR will generate Java lexer and parser
    language = Java;
}

@lexer::header {
package me.kisimple.just4fun.antlr;
}

@parser::header {
package me.kisimple.just4fun.antlr;
}

///////////// Lexer rules:
PLUS     : '+' ;
MINUS    : '-' ;
MULTIPLY : '*' ;
DIVIDE   : '/' ;
LPAREN   : '(' ;
RPAREN   : ')' ;
// Note: on a tie the first matching lexer rule wins, so a lone 0 is
// tokenized as ZERO rather than NUMBER.
ZERO     : '0' ;
NUMBER   : ZERO | [1-9][0-9]* ;
WS       : [ \t\r\n]+ -> skip ;

///////////// Parser rules:
program  : calExpr ;
calExpr  : multExpr ((PLUS | MINUS) multExpr)* ;
multExpr : atom ((MULTIPLY | DIVIDE) atom)* ;
atom     : LPAREN calExpr RPAREN | NUMBER ;
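To get a feel for what an LL parser for this grammar does, here is a hand-written recursive-descent sketch in plain Java. It is not the code ANTLR generates; MiniCal and its methods are names made up for illustration, and the toy lexer is looser than the grammar (it accepts leading zeros). Each parser rule becomes one method, and each method decides what to do by looking at the next token only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written LL(1) evaluator mirroring the Cal grammar (illustrative, not ANTLR output). */
final class MiniCal {
    private final List<String> tokens;
    private int pos = 0;

    private MiniCal(List<String> tokens) { this.tokens = tokens; }

    /** Toy lexer: numbers and single-character operators, whitespace skipped. */
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                out.add(input.substring(start, i));
            } else if ("+-*/()".indexOf(c) >= 0) {
                out.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return out;
    }

    /** program : calExpr ; here we additionally insist that all input is consumed. */
    public static long eval(String input) {
        MiniCal p = new MiniCal(tokenize(input));
        long v = p.calExpr();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("extraneous input: " + p.peek());
        }
        return v;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    // calExpr : multExpr ((PLUS | MINUS) multExpr)* ;
    private long calExpr() {
        long v = multExpr();
        while (peek().equals("+") || peek().equals("-")) {
            String op = tokens.get(pos++);
            long r = multExpr();
            v = op.equals("+") ? v + r : v - r;
        }
        return v;
    }

    // multExpr : atom ((MULTIPLY | DIVIDE) atom)* ;
    private long multExpr() {
        long v = atom();
        while (peek().equals("*") || peek().equals("/")) {
            String op = tokens.get(pos++);
            long r = atom();
            v = op.equals("*") ? v * r : v / r;
        }
        return v;
    }

    // atom : LPAREN calExpr RPAREN | NUMBER ;
    private long atom() {
        String t = peek();
        if (t.equals("(")) {
            pos++;
            long v = calExpr();
            if (!peek().equals(")")) {
                throw new IllegalArgumentException("expected ')' but found " + peek());
            }
            pos++;
            return v;
        }
        if (Character.isDigit(t.charAt(0))) {
            pos++;
            return Long.parseLong(t);
        }
        throw new IllegalArgumentException("expected number or '(' but found " + t);
    }

    public static void main(String[] args) {
        System.out.println(eval("((7 + 7) * (7 + 7) - 7) * 7 - 77 / 7")); // prints 1312
    }
}
```

Unlike the ANTLR version, this sketch evaluates while parsing and never builds a tree; with ANTLR the grammar stays declarative and the evaluation lives in a visitor or listener.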
Use the tool provided by ANTLR to generate the lexer and parser:
> java org.antlr.v4.Tool -visitor Cal.g4
The -visitor option tells ANTLR to also generate a parse tree visitor. We can access the parse tree through a visitor, and we can also traverse it through a listener. The difference between the two is as follows.
The biggest difference between the listener and visitor mechanisms is that listener methods are called independently by an ANTLR-provided walker object, whereas visitor methods must walk their children with explicit visit calls. Forgetting to invoke visitor methods on a node's children means those subtrees don't get visited.
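The difference can be sketched on a toy tree in plain Java, without the ANTLR runtime. Node, Listener, walk and visit below are hypothetical names for illustration only; ANTLR's real classes are richer. The point is just who drives the traversal: the walker does it for a listener, while a visitor has to descend on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WalkDemo {
    /** A bare-bones tree node standing in for a parse tree node. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    /** Listener style: callbacks only, the walker decides the traversal. */
    interface Listener {
        void enter(Node n);
        void exit(Node n);
    }

    /** Toy walker: visits every node depth-first and fires the callbacks itself. */
    static void walk(Listener listener, Node n) {
        listener.enter(n);
        for (Node c : n.children) walk(listener, c);
        listener.exit(n);
    }

    /** Visitor style: the visit method itself must descend into the children. */
    static void visit(Node n, List<String> seen, boolean descend) {
        seen.add(n.label);
        if (descend) {
            for (Node c : n.children) visit(c, seen, true);
        }
        // With descend == false, the subtrees under n are never visited.
    }

    static Node sample() {
        return new Node("calExpr",
                new Node("multExpr", new Node("atom")),
                new Node("multExpr", new Node("atom")));
    }

    static List<String> listenerOrder() {
        final List<String> seen = new ArrayList<>();
        walk(new Listener() {
            @Override public void enter(Node n) { seen.add(n.label); }
            @Override public void exit(Node n) { /* nothing to do on exit */ }
        }, sample());
        return seen;
    }

    static List<String> visitorOrder(boolean descend) {
        List<String> seen = new ArrayList<>();
        visit(sample(), seen, descend);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(listenerOrder());     // all five nodes
        System.out.println(visitorOrder(true));  // all five nodes
        System.out.println(visitorOrder(false)); // only the root
    }
}
```

The upside of the visitor style is control: a visit method can return a value and choose the order in which it descends, which is what the calculator in this post relies on.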
ANTLR also generates a token file for us, Cal.tokens, which reads as follows:
PLUS=1
MINUS=2
MULTIPLY=3
DIVIDE=4
LPAREN=5
RPAREN=6
ZERO=7
NUMBER=8
WS=9
'+'=1
'-'=2
'*'=3
'/'=4
'('=5
')'=6
'0'=7
ANTLR assigns an integer type to every token that appears in our grammar file. These types are defined as constants in the parser, and we cannot avoid using them when doing semantic analysis.
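The file format is simple enough to read by hand. As a small sketch (TokensFile is a made-up helper, not part of ANTLR), the following plain Java parses such NAME=type lines into a map, which makes it easy to see that a token name and its literal share the same integer type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TokensFile {
    /** Parses the NAME=type lines of a .tokens file into an insertion-ordered map. */
    static Map<String, Integer> parse(String content) {
        Map<String, Integer> types = new LinkedHashMap<>();
        for (String line : content.split("\\R")) {
            line = line.trim();
            if (line.isEmpty()) continue;
            // Split on the last '=' so that a literal such as '=' would still parse.
            int eq = line.lastIndexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("not a NAME=type line: " + line);
            }
            String name = line.substring(0, eq);
            int type = Integer.parseInt(line.substring(eq + 1).trim());
            types.put(name, type);
        }
        return types;
    }

    public static void main(String[] args) {
        Map<String, Integer> types = parse("PLUS=1\nMINUS=2\n'+'=1\n'-'=2");
        // The name PLUS and the literal '+' map to the same token type.
        System.out.println(types.get("PLUS").equals(types.get("'+'")));
    }
}
```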
Next we implement a visitor to walk the parse tree. We do the calculation as we visit, so once the visit is finished the result is ready (so what we have here is an interpreter :)
package me.kisimple.just4fun.antlr;

import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.tree.TerminalNode;

import java.util.List;
import java.util.Stack;

public class CalVisitorImpl extends CalBaseVisitor<Long> {

    @Override
    public Long visitCalExpr(CalParser.CalExprContext ctx) {
        List<ParseTree> children = ctx.children;
        Stack<Long> atomStack = new Stack<Long>();
        Stack<TerminalNode> opStack = new Stack<TerminalNode>();
        for (ParseTree child : children) {
            if (child instanceof CalParser.MultExprContext) {
                Long current = visitMultExpr((CalParser.MultExprContext) child);
                if (!opStack.isEmpty()) {
                    // Combine with the previous operand, left to right
                    TerminalNode op = opStack.pop();
                    Long last = atomStack.pop();
                    switch (op.getSymbol().getType()) {
                        case CalParser.PLUS:
                            atomStack.push(last + current);
                            break;
                        case CalParser.MINUS:
                            atomStack.push(last - current);
                            break;
                    }
                } else {
                    atomStack.push(current);
                }
            } else {
                // Anything else is a PLUS or MINUS terminal
                opStack.push((TerminalNode) child);
            }
        }
        return atomStack.pop();
    }

    @Override
    public Long visitMultExpr(CalParser.MultExprContext ctx) {
        List<ParseTree> children = ctx.children;
        Stack<Long> atomStack = new Stack<Long>();
        Stack<TerminalNode> opStack = new Stack<TerminalNode>();
        for (ParseTree child : children) {
            if (child instanceof CalParser.AtomContext) {
                Long current = visitAtom((CalParser.AtomContext) child);
                if (!opStack.isEmpty()) {
                    TerminalNode op = opStack.pop();
                    Long last = atomStack.pop();
                    switch (op.getSymbol().getType()) {
                        case CalParser.MULTIPLY:
                            atomStack.push(last * current);
                            break;
                        case CalParser.DIVIDE:
                            atomStack.push(last / current);
                            break;
                    }
                } else {
                    atomStack.push(current);
                }
            } else {
                // Anything else is a MULTIPLY or DIVIDE terminal
                opStack.push((TerminalNode) child);
            }
        }
        return atomStack.pop();
    }

    @Override
    public Long visitAtom(CalParser.AtomContext ctx) {
        TerminalNode node = ctx.NUMBER();
        if (node != null) {
            String text = ctx.getText();
            return Long.valueOf(text);
        }
        // Ignore '(' and ')'
        return visitCalExpr(ctx.calExpr());
    }
}
Use it as follows:
public static void main(String[] args) throws Throwable {
    System.out.println(((7 + 7) * (7 + 7) - 7) * 7 - 77 / 7);
    String expr = "((7 + 7) * (7 + 7) - 7) * 7 - 77 / 7";
    ANTLRInputStream input = new ANTLRInputStream(expr);
    CalLexer lexer = new CalLexer(input);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    tokens.fill();
    CalParser parser = new CalParser(tokens);
    ParserRuleContext tree = parser.program();
    CalVisitorImpl visitor = new CalVisitorImpl();
    System.out.println(visitor.visit(tree));
}
The parse tree can be viewed with the IDEA plugin provided by ANTLR.
Error Handling
Note also that, by default, when the parser generated by ANTLR runs into an error while parsing, it simply skips the offending token. Take String expr = "7 77 + 7"; for example.
The output looks like this:
line 1:2 extraneous input '77' expecting {<EOF>, '+', '-', '*', '/'}
14
The 77 is simply ignored, 7 + 7 is evaluated, and we get the result above. But as you can see, the parser is in fact aware of the error, since it reported extraneous input beforehand. So we can take the default ANTLRErrorStrategy, that is, DefaultErrorStrategy, as a reference and re-implement the error handling logic:
package me.kisimple.just4fun.antlr;

import org.antlr.v4.runtime.BailErrorStrategy;
import org.antlr.v4.runtime.InputMismatchException;
import org.antlr.v4.runtime.Parser;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Token;
import org.antlr.v4.runtime.TokenStream;
import org.antlr.v4.runtime.atn.ATNState;
import org.antlr.v4.runtime.misc.ParseCancellationException;

public class IErrorStrategy extends BailErrorStrategy {

    @Override
    public Token recoverInline(Parser recognizer) throws RecognitionException {
        // SINGLE TOKEN DELETION
        Token matchedSymbol = singleTokenDeletion(recognizer);
        if (matchedSymbol != null) return super.recoverInline(recognizer);
        // SINGLE TOKEN INSERTION
        singleTokenInsertion(recognizer);
        return super.recoverInline(recognizer);
    }

    @Override
    public void sync(Parser recognizer) throws RecognitionException {
        ATNState s = recognizer.getInterpreter().atn.states.get(recognizer.getState());
        if (inErrorRecoveryMode(recognizer)) return;
        TokenStream tokens = recognizer.getInputStream();
        int la = tokens.LA(1);
        // try cheaper subset first; might get lucky. seems to shave a wee bit off
        if (recognizer.getATN().nextTokens(s).contains(la) || la == Token.EOF) return;
        // Return but don't end recovery. only do that upon valid token match
        if (recognizer.isExpectedToken(la)) return;
        // Report the unexpected token, then abort instead of recovering
        singleTokenDeletion(recognizer);
        throw new ParseCancellationException(new InputMismatchException(recognizer));
    }
}
It has to be set on the parser before parsing: parser.setErrorHandler(new IErrorStrategy());
Now the error is reported and parsing stops with an exception :)
line 1:2 extraneous input '77' expecting {<EOF>, '+', '-', '*', '/'}
Exception in thread "main" org.antlr.v4.runtime.misc.ParseCancellationException
    at me.kisimple.just4fun.antlr.IErrorStrategy.sync(IErrorStrategy.java)
    at me.kisimple.just4fun.antlr.CalParser.multExpr(CalParser.java:251)
    at me.kisimple.just4fun.antlr.CalParser.calExpr(CalParser.java:172)
    at me.kisimple.just4fun.antlr.CalParser.program(CalParser.java)
    at me.kisimple.just4fun.Main.main(Main.java)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Caused by: org.antlr.v4.runtime.InputMismatchException
    ... 10 more
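The two behaviours, recovering by dropping one token versus bailing out, can be compared in a toy model that does not use ANTLR at all. The sketch below assumes a much smaller language than the calculator, NUMBER ('+' NUMBER)*, with tokens already split and numbers well formed; RecoveryDemo and sum are made-up names, and the message text only imitates ANTLR's.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of two error strategies for the rule NUMBER ('+' NUMBER)*; not ANTLR code. */
final class RecoveryDemo {

    /**
     * Sums the numbers in a non-empty token list. When an unexpected token shows up,
     * either bails out (bail == true) or deletes that single token and carries on,
     * provided the token after it is one we expected.
     */
    static long sum(List<String> tokens, boolean bail, List<String> errors) {
        int pos = 0;
        long total = Long.parseLong(tokens.get(pos++));
        while (pos < tokens.size()) {
            String t = tokens.get(pos);
            if (t.equals("+")) {
                pos++; // consume '+'
                if (pos >= tokens.size()) {
                    throw new IllegalStateException("missing number at <EOF>");
                }
                total += Long.parseLong(tokens.get(pos++));
                continue;
            }
            String msg = "extraneous input '" + t + "' expecting {<EOF>, '+'}";
            errors.add(msg);
            boolean nextIsExpected =
                    pos + 1 >= tokens.size() || tokens.get(pos + 1).equals("+");
            if (bail || !nextIsExpected) {
                throw new IllegalStateException(msg);
            }
            pos++; // single-token deletion: skip the offending token
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("7", "77", "+", "7");
        List<String> errors = new ArrayList<>();
        System.out.println(sum(tokens, false, errors)); // prints 14, the 77 was dropped
        System.out.println(errors);
    }
}
```

Calling sum with bail set to true on the same tokens throws instead of returning 14, which is the behaviour the custom strategy above is after: a wrong answer computed from silently repaired input is usually worse than a failure.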
Resources
- ANTLR 4 Documentation
- What is the difference between ANTLR 3 and 4?
- https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Lexer+Rules
- https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Parser+Rules
- https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Parse+Tree+Listeners
- http://www.ibm.com/developerworks/cn/java/j-lo-antlr/index.html
- http://meri-stuff.blogspot.com/2011/08/antlr-tutorial-hello-word.html
ANTLR #1: Describing a simple calculator