JJTree Tutorial for Advanced Java parsing

Source: Internet
Author: User

The problem

JJTree is part of JavaCC, a parser/scanner generator for Java. JJTree is a preprocessor for JavaCC that inserts parse-tree-building actions at various places in the JavaCC source. To follow along you need to understand the core concepts of parsing. Also review the basic JJTree documentation and the samples provided in the JavaCC distribution (version 4.0).

JJTree is magically powerful, but it is just as complex. We used it quite successfully at my startup www.moola.com. After working through the basic grammar rules, lookaheads, node annotations and some prototyping, I felt quite comfortable with the tool. However, just recently, when I had to use JJTree again, I hit the same steep learning curve as if I had never seen JJTree before.

How do you write a tutorial that gets you back in shape quickly without forcing a full relearning?

The solution

Here I capture my notes in a specific form, so that I do not have to face the same learning curve again in the future. You can think of my approach as a layered improvement to a grammar that follows these steps:

    • Get Lexer
    • Complete grammar
    • Optimize produced AST
    • Define Custom Node
    • Define actions
    • Write evaluator

I always start simple and then move to the more complex, and this is exactly how I'll document it. In each example I start with a trivial portion of the grammar and then add something to it to force a specific behavior. New code is all in green. Let's hope this saves all of us the relearning.

Reorder tokens from more specific to less specific

The tokens in the TOKEN sections can be declared in any order. But you have to pay very close attention to the order: when several tokens match the same stretch of input, matching goes from the top down the list and the first matching token wins. For example, notice how "interface" or "exception" is defined before STRING_LITERAL. If we had defined "interface" after STRING_LITERAL, "interface" would never get matched; STRING_LITERAL would.

Ordering is also the reason we can't just use "interface" inline in the definition of productions: STRING_LITERAL would always match first.
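
To make the ordering rule concrete, here is a small plain-Java sketch of the matching rule as I understand it: the longest match wins, and on a tie the token declared first wins. This is not JavaCC code. The class name, the firstToken helper and the identifier-like pattern used for STRING_LITERAL are assumptions made for illustration, since the real token definitions are not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenOrderDemo {

    // Returns the name of the rule that wins at the start of the input:
    // the longest match wins, and on a tie the rule declared first wins.
    static String firstToken(Map<String, String> rules, String input) {
        String winner = null;
        int best = 0;
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            Matcher m = Pattern.compile(rule.getValue()).matcher(input);
            // lookingAt() anchors the match at the start of the input.
            // The strict '>' keeps the earlier rule when lengths are equal.
            if (m.lookingAt() && m.end() > best) {
                best = m.end();
                winner = rule.getKey();
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        // Assumed pattern: the real STRING_LITERAL definition is not shown in the text.
        String generic = "[a-zA-Z_][a-zA-Z0-9_]*";

        // LinkedHashMap preserves declaration order, like a TOKEN section.
        Map<String, String> keywordFirst = new LinkedHashMap<>();
        keywordFirst.put("INTERFACE", "interface");
        keywordFirst.put("STRING_LITERAL", generic);

        Map<String, String> keywordLast = new LinkedHashMap<>();
        keywordLast.put("STRING_LITERAL", generic);
        keywordLast.put("INTERFACE", "interface");

        System.out.println(firstToken(keywordFirst, "interface Foo")); // INTERFACE
        System.out.println(firstToken(keywordLast, "interface Foo"));  // STRING_LITERAL
        System.out.println(firstToken(keywordFirst, "interfaces"));    // STRING_LITERAL (longer match)
    }
}
```

With the keyword declared last, it never wins, which is exactly the trap described above.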

Remove some nodes from final AST

Some nodes do not have any special meaning and should be excluded from the final AST. This is done by using #void like this:

void InterfaceDecl() #void : {} { ExceptionClause() | EnumClause() | StructClause() | MethodDecl() }
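
The effect on the tree shape can be pictured with a tiny plain-Java model. This is only a sketch of the idea, not of JJTree's internals: the Node class, the hypothetical "File" root and the isVoid flag are all made up for illustration. A non-void production wraps its children in a node of its own; a void one hands them straight to the enclosing node.

```java
import java.util.ArrayList;
import java.util.List;

public class VoidNodeDemo {

    // Minimal tree node, used only for this illustration.
    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        // Renders the tree as name(child,child,...).
        String dump() {
            if (children.isEmpty()) return name;
            StringBuilder sb = new StringBuilder(name).append('(');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append(children.get(i).dump());
            }
            return sb.append(')').toString();
        }
    }

    // Models a production that matched a single MethodDecl.
    // With isVoid == true the production adds no node of its own.
    static void interfaceDecl(Node parent, boolean isVoid) {
        Node child = new Node("MethodDecl");
        if (isVoid) {
            parent.children.add(child);
        } else {
            Node self = new Node("InterfaceDecl");
            self.children.add(child);
            parent.children.add(self);
        }
    }

    static String build(boolean isVoid) {
        Node root = new Node("File"); // hypothetical enclosing node
        interfaceDecl(root, isVoid);
        return root.dump();
    }

    public static void main(String[] args) {
        System.out.println(build(false)); // File(InterfaceDecl(MethodDecl))
        System.out.println(build(true));  // File(MethodDecl)
    }
}
```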

Add an action to a production

You'll definitely need to add actions to the productions for your parser to be useful. Here I capture the text of the current token (t.image) and put it into the jjtThis node, which resolves to my custom node class ASTTypeDecl. You bind a variable "t" to a token using "="; the action itself goes in curly braces right after the matched token and can refer to the current token as "t" and to the current AST node as "jjtThis".

void TypeDecl() : { Token t; } { <VOID> | t=<TERM> { jjtThis.name = t.image; } ("[]")? }

Here I further set the isArray property to true only if "[]" is found after the <TERM>:

void TypeDecl() : { Token t; } { <VOID> | t=<TERM> { jjtThis.name = t.image; } ("[]" { jjtThis.isArray = true; })? }
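
If it helps to see where each action fires, here is a hand-written plain-Java analogue of the TypeDecl production. It is a sketch, not what JavaCC generates: the TypeDeclNode class stands in for the custom node, the token stream is simplified to a list of strings, and any string other than "void" is assumed to be a TERM.

```java
import java.util.Arrays;
import java.util.List;

public class TypeDeclDemo {

    // Stand-in for the custom node class; the two fields mirror
    // the ones assigned in the grammar actions.
    static class TypeDeclNode {
        String name;
        boolean isArray;
    }

    // Analogue of: <VOID> | t=<TERM> { name = t.image } ( "[]" { isArray = true } )?
    static TypeDeclNode parseTypeDecl(List<String> tokens) {
        if (tokens.isEmpty()) {
            throw new IllegalArgumentException("expected a type");
        }
        TypeDeclNode node = new TypeDeclNode(); // plays the role of jjtThis
        int pos = 0;
        String t = tokens.get(pos++);
        if (t.equals("void")) {
            return node; // <VOID> branch: no action attached, name stays null
        }
        node.name = t; // action right after t=<TERM>
        if (pos < tokens.size() && tokens.get(pos).equals("[]")) {
            node.isArray = true; // action inside the optional ( "[]" )?
        }
        return node;
    }

    public static void main(String[] args) {
        TypeDeclNode a = parseTypeDecl(Arrays.asList("int", "[]"));
        System.out.println(a.name + " " + a.isArray); // int true
        TypeDeclNode b = parseTypeDecl(Arrays.asList("int"));
        System.out.println(b.name + " " + b.isArray); // int false
    }
}
```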

Multiple actions inside one production rule

Just as we have seen earlier, you can access the values of multiple tokens in one production rule. Notice how I declare the two separate tokens "t" and "n" here:

void ConstDecl() : { Token t; Token n; } { LOOKAHEAD(2) t=<TERM> { jjtThis.name = t.image; } "=" n=<NUMBER> { jjtThis.value = Integer.valueOf(n.image); } | <TERM> }

Lookaheads

There are certain points in complex grammars that might not get parsed unambiguously using just one token of lookahead. If you are writing a high-performance parser you might need to rewrite the grammar. But if you do not care about performance, you can force a lookahead of more than one token.

The generator will give you a warning about ambiguities (JavaCC reports them as choice conflicts). Go to the rule it refers to and set a lookahead of 2 or more like this:

void EnumDeclItem() : {} { LOOKAHEAD(2) <TERM> "=" <NUMBER> | <TERM> }
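
Why two tokens are needed here is easy to see in a hand-written sketch. Both alternatives start with <TERM>, so one token cannot tell them apart; the second token ("=" or something else) decides. The code below is plain Java for illustration only: token kinds are simplified to strings, and peek and chooseAlternative are hypothetical helpers, not JavaCC API.

```java
import java.util.Arrays;
import java.util.List;

public class LookaheadDemo {

    // Returns the kind of the k-th token ahead of pos (k = 1 is the next token),
    // or "EOF" when the input is exhausted.
    static String peek(List<String> kinds, int pos, int k) {
        int i = pos + k - 1;
        return i < kinds.size() ? kinds.get(i) : "EOF";
    }

    // Decides which alternative of
    //   LOOKAHEAD(2) <TERM> "=" <NUMBER> | <TERM>
    // to take, by inspecting the next two tokens.
    static String chooseAlternative(List<String> kinds, int pos) {
        if (peek(kinds, pos, 1).equals("TERM") && peek(kinds, pos, 2).equals("=")) {
            return "TERM = NUMBER";
        }
        if (peek(kinds, pos, 1).equals("TERM")) {
            return "TERM";
        }
        throw new IllegalStateException("unexpected token " + peek(kinds, pos, 1));
    }

    public static void main(String[] args) {
        System.out.println(chooseAlternative(Arrays.asList("TERM", "=", "NUMBER"), 0)); // TERM = NUMBER
        System.out.println(chooseAlternative(Arrays.asList("TERM", ","), 0));           // TERM
    }
}
```

With only one token of lookahead, the first alternative would always be picked for any input starting with a TERM, and a bare TERM would then fail on the missing "=".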

Node return values

It is possible to return nodes from productions, just like function return values. Here I declare that an ASTTypeDecl will be returned:

ASTTypeDecl TypeDecl() : { Token t; } { <VOID> | t=<TERM> { jjtThis.name = t.image; } ("[]" { jjtThis.isArray = true; })? { return jjtThis; } }

Once you start having a lot of expansions in one production, it's better to group them together so that the return statement applies to all of them. The above example would actually result in a bug, due to the fact that the return statement is attached to one branch of the "|" production and not to both branches. We can easily fix the issue using parentheses to force the order of precedence:

ASTTypeDecl TypeDecl() : { Token t; } { ( <VOID> | t=<TERM> { jjtThis.name = t.image; } ("[]" { jjtThis.isArray = true; })? ) { return jjtThis; } }
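
In plain Java the difference between the two versions looks roughly like this. It is a hand-written analogue, not generated code: TypeDeclNode is a stand-in for the node class, the input is reduced to a single token string, and the exception thrown on the forgotten branch is my assumption about how such a parser would fail.

```java
public class ReturnBranchDemo {

    // Stand-in for the custom node class.
    static class TypeDeclNode {
        String name;
    }

    // Analogue of the first version: the return belongs to the <TERM> branch only.
    static TypeDeclNode buggy(String token) {
        TypeDeclNode node = new TypeDeclNode();
        if (token.equals("void")) {
            // <VOID> branch: nothing returns the node here.
        } else {
            node.name = token;
            return node;
        }
        // Assumption: fail loudly when the branch without a return is taken.
        throw new IllegalStateException("missing return for the <VOID> branch");
    }

    // Analogue of the fixed version: both branches are grouped,
    // and a single return follows the whole group.
    static TypeDeclNode fixed(String token) {
        TypeDeclNode node = new TypeDeclNode();
        if (!token.equals("void")) {
            node.name = token;
        }
        return node;
    }

    public static void main(String[] args) {
        System.out.println(fixed("void").name); // null
        System.out.println(fixed("int").name);  // int
        try {
            buggy("void");
        } catch (IllegalStateException e) {
            System.out.println("buggy version failed: " + e.getMessage());
        }
    }
}
```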

Build abstract syntax tree as you go

After you have all productions returning values, you can build the AST on the fly while parsing. Just provide overloaded add() methods (one per clause type) in the ASTInterfaceDecl class and call them like this:

void InterfaceDecl() : { ASTExceptionClause ex; ASTEnumClause en; ASTStructClause st; ASTMethodDecl me; }
{
    ex=ExceptionClause() { jjtThis.add(ex); }
  | en=EnumClause() { jjtThis.add(en); }
  | st=StructClause() { jjtThis.add(st); }
  | me=MethodDecl() { jjtThis.add(me); }
}
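
The add() overloads themselves might look something like the sketch below. The nested classes are empty stand-ins so that the example compiles on its own (in a real JJTree build the AST classes are generated and would normally extend SimpleNode), and the four list fields are hypothetical names I picked for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class InterfaceDeclDemo {

    // Empty stand-ins for the generated node classes.
    static class ASTExceptionClause {}
    static class ASTEnumClause {}
    static class ASTStructClause {}
    static class ASTMethodDecl {}

    // Stand-in for ASTInterfaceDecl; the list fields are hypothetical.
    static class ASTInterfaceDecl {
        final List<ASTExceptionClause> exceptions = new ArrayList<>();
        final List<ASTEnumClause> enums = new ArrayList<>();
        final List<ASTStructClause> structs = new ArrayList<>();
        final List<ASTMethodDecl> methods = new ArrayList<>();

        // One overload per clause type; the compiler picks the overload
        // from the static type of the argument.
        void add(ASTExceptionClause ex) { exceptions.add(ex); }
        void add(ASTEnumClause en) { enums.add(en); }
        void add(ASTStructClause st) { structs.add(st); }
        void add(ASTMethodDecl me) { methods.add(me); }
    }

    public static void main(String[] args) {
        ASTInterfaceDecl decl = new ASTInterfaceDecl();
        decl.add(new ASTEnumClause());
        decl.add(new ASTMethodDecl());
        decl.add(new ASTMethodDecl());
        // prints: 1 enum clause(s), 2 method(s)
        System.out.println(decl.enums.size() + " enum clause(s), " + decl.methods.size() + " method(s)");
    }
}
```

Keeping one typed list per clause kind means the evaluator can later walk methods or enums directly, without instanceof checks.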

Use <EOF>

Quite often you get your grammar written and start celebrating, when you notice that part of the file is not being parsed... This happens because you did not tell the parser to read all content until the end of the file, so it feels free to stop parsing at will. Force parsing to reach the end of the file by demanding the <EOF> token at the end of the topmost production:

void InterfaceDecl() #void : {} { ( ExceptionClause() | EnumClause() | StructClause() | MethodDecl() ) <EOF> }
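
The silent-stop problem is easy to reproduce in a hand-written sketch. The code below is plain Java for illustration, under heavy simplifications: each clause is reduced to a single keyword token, and the parse method with its demandEof flag is hypothetical. Without the end-of-input check the parser happily returns after the first clause and ignores the rest.

```java
import java.util.Arrays;
import java.util.List;

public class EofDemo {

    // Simplification: each clause is represented by a single keyword token.
    static final List<String> CLAUSES = Arrays.asList("exception", "enum", "struct", "method");

    // Parses exactly one clause and returns the number of tokens consumed.
    // With demandEof == true, any leftover input is an error.
    static int parse(List<String> tokens, boolean demandEof) {
        int pos = 0;
        String t = pos < tokens.size() ? tokens.get(pos) : "<EOF>";
        if (!CLAUSES.contains(t)) {
            throw new IllegalArgumentException("unexpected " + t);
        }
        pos++;
        if (demandEof && pos < tokens.size()) {
            throw new IllegalArgumentException("expected <EOF> but found " + tokens.get(pos));
        }
        return pos;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("enum", "garbage");
        // Without the check, the trailing token is silently ignored.
        System.out.println("consumed " + parse(input, false) + " of " + input.size() + " tokens");
        try {
            parse(input, true);
        } catch (IllegalArgumentException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```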

The Final Word

JJTree works incredibly well. There is no excuse for regex parsing any more... Don't even try to convince me!

Drop me a line if you need help with JJTree; I will be glad to share my experience with you.

