Open Source Parser--ANTLR

Source: Internet
Author: User

Preface

Sometimes I really wonder whether studying these "principles" courses is a waste of time. After a course on operating-system principles we still cannot build an operating system ourselves, and after a database course we cannot produce a decent DBMS. After studying compiler principles, we seem able only to stare blankly at a pile of symbol tables and pushdown automata, and then, with deep reverence, salute the predecessors who actually built compilers. Any irritation I once felt about the odd compiler bug has disappeared. A few years ago, while building a simulated DBMS, one part required compiling the WHERE clause of SQL: extracting a usable form while making sure the operators (AND, OR and parentheses) kept their proper precedence. At the time I hand-wrote a small compiler with what I had learned in compiler class. It took several weeks and left me exhausted.

One of my recent projects requires compiling a query language of my own design (a simplified version of SQL) and sending the result to a sensor network as query packets. This time I had learned my lesson and no longer wanted to build every wheel from scratch :) At first I intended to use Yacc/Lex as the underlying compiler. However, the whole application is pure Java, so portability was a concern. I also did not want to deal with the pile of pushdown-automaton code that yacc/lex generates. So I chose an open-source LL(k) parser/lexer generator: ANTLR.

1.1 Lexical Analyzer (Lexer)
1.2 Parser
1.3 ANTLR
2.1 Installation and Use
2.2 ANTLR Grammar File Analysis
2.3 ANTLR Rule Analysis
4.1 ANTLR Studio Plug-in Installation
4.2 Feature Overview
1 ANTLR Brief Introduction

ANTLR (ANother Tool for Language Recognition) succeeds PCCTS. From a grammar description, it automatically constructs recognizers, parsers (compilers) and translators for a language you define, and it targets Java, C++ and C#. ANTLR can resolve conflicts with predicates and supports actions and return values. It can also generate a syntax tree from the input and visualize it (this is demonstrated in the following example).

ANTLR has made translating computer languages an ordinary task. Before it, yacc/lex seemed too academic. Although the LL(k)-based ANTLR is still slightly less efficient, recent upgrades have made it capable of handling most existing applications. Thanks go to Dr. Terence Parr and his colleagues for more than ten years of excellent work. They laid much of the groundwork in compiler theory and language-tool construction, and that work led directly to ANTLR.

1.1 Lexical Analyzer (Lexer)

A lexical analyzer is also known as a scanner, lexical analyser, or tokenizer. Programming languages are usually made up of keywords and strictly defined grammatical structures.

The goal of compilation is to translate the high-level instructions of a programming language into instructions that a physical or virtual machine can execute. On their own, the input characters carry no meaning. The lexer analyzes and classifies them, grouping them into discrete units (tokens) such as keywords, identifiers, symbols and operators, and passes these to the parser.

1.2 Parser

A parser is also known as a syntactic analyser. While scanning the character stream, the lexer does not care about the syntactic meaning of each token it produces or about the token's relationship to its context; that is the parser's job.
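The lexer's side of this division of labor can be sketched in a few lines of hand-written Java. This is not ANTLR output. It is a minimal illustration that assumes a made-up token set loosely modeled on a WHERE clause:

- the keywords SELECT, FROM, WHERE, AND and OR;
- identifiers and integers;
- the operators <, >, =, <= and >=;
- parentheses.

Note that it only classifies characters. It never checks whether the resulting token sequence makes sense.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Minimal hand-written lexer sketch (not ANTLR output). It turns characters
// into typed tokens and never looks at how the tokens relate to each other.
// The token types and keyword set are illustrative assumptions.
class WhereLexer {
    enum Type { KEYWORD, IDENT, NUMBER, OPERATOR, LPAREN, RPAREN }

    static final class Token {
        final Type type;
        final String text;

        Token(Type type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + "(" + text + ")";
        }
    }

    private static final Set<String> KEYWORDS = Set.of("SELECT", "FROM", "WHERE", "AND", "OR");

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace separates tokens but produces none
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                String word = input.substring(start, i);
                String upper = word.toUpperCase();
                // keywords are case-insensitive; any other word is an identifier
                tokens.add(KEYWORDS.contains(upper) ? new Token(Type.KEYWORD, upper) : new Token(Type.IDENT, word));
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add(new Token(Type.NUMBER, input.substring(start, i)));
            } else if (c == '(' || c == ')') {
                tokens.add(new Token(c == '(' ? Type.LPAREN : Type.RPAREN, String.valueOf(c)));
                i++;
            } else if (c == '<' || c == '>' || c == '=') {
                // match "<=" and ">=" greedily before falling back to one character
                int len = (c != '=' && i + 1 < input.length() && input.charAt(i + 1) == '=') ? 2 : 1;
                tokens.add(new Token(Type.OPERATOR, input.substring(i, i + len)));
                i += len;
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }
}
```

For example, `WhereLexer.tokenize("temp >= 30 AND (light < 5 OR x = 1)")` yields 13 tokens, starting with IDENT(temp), OPERATOR(>=), NUMBER(30), KEYWORD(AND). Whether `30 AND (` is a sensible sequence is left entirely to the parser.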

The parser organizes the tokens it receives into the sequences permitted by the grammar of the target language. The lexer and the parser are both recognizers: the lexer recognizes character sequences and the parser recognizes token sequences. They are essentially the same kind of thing and differ only in their division of labor.

1.3 ANTLR

ANTLR combines the two. It lets us define lexical rules for recognizing character streams and parsing rules for interpreting token streams.

ANTLR then automatically generates the corresponding lexer and parser from the user-supplied grammar file. With them, the user can compile input text and convert it into other forms, such as an AST (abstract syntax tree).
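In the same spirit, the parser side can be sketched as a recognizer over token sequences that also builds an AST. The grammar is hypothetical: a SELECT list followed by FROM. The class and node names are also hypothetical. None of this is SensorSQL's actual grammar or code generated by ANTLR. The token list is written out by hand to stand in for lexer output, and the tree is printed in a LISP-style (root child child ...) notation.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch of a parser as a token-sequence recognizer that builds an
// AST. The grammar and names are hypothetical, not SensorSQL's real grammar.
//   selectStatement : "SELECT" IDENT ("," IDENT)* "FROM" IDENT EOF
class SelectParser {
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label) { this.label = label; }

        Node add(Node child) {
            children.add(child);
            return this;
        }

        // LISP-style rendering: (root child1 child2 ...)
        @Override
        public String toString() {
            if (children.isEmpty()) return label;
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node c : children) sb.append(' ').append(c);
            return sb.append(')').toString();
        }
    }

    private final List<String> tokens;
    private int pos = 0;

    SelectParser(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    private void match(String expected) {
        if (!peek().equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + peek());
        }
        pos++;
    }

    private String identifier() {
        String t = peek();
        boolean isName = t.matches("[A-Za-z_][A-Za-z0-9_]*") && !t.equals("SELECT") && !t.equals("FROM");
        if (!isName) throw new IllegalStateException("expected identifier but found " + t);
        pos++;
        return t;
    }

    Node selectStatement() {
        match("SELECT");
        Node columns = new Node("COLUMNS").add(new Node(identifier()));
        while (peek().equals(",")) {          // ("," IDENT)*
            match(",");
            columns.add(new Node(identifier()));
        }
        match("FROM");
        Node from = new Node("FROM").add(new Node(identifier()));
        if (pos != tokens.size()) throw new IllegalStateException("unexpected token " + peek());
        return new Node("SELECT").add(columns).add(from);
    }
}
```

Given the tokens SELECT, temp, ",", light, FROM, sensors, it returns the tree (SELECT (COLUMNS temp light) (FROM sensors)). The sequence SELECT FROM sensors is rejected, even though every token in it is individually valid.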

2 Using ANTLR

2.1 Installation and Use

Download the latest ANTLR development package and source code (for example, version 2.7.5) from http://www.antlr.org/. Add antlr-2.7.5.jar to your CLASSPATH environment variable. Write a grammar file (for example, SensorSQL.g) and run the command "java antlr.Tool SensorSQL.g" to generate the parser and lexer automatically.

2.2 ANTLR Grammar File Analysis

Below we analyze in detail the ANTLR grammar file shown in the figure.

To make ANTLR easier to use, you can also download ANTLR's Eclipse plug-in.
1. Header section: everything here appears at the top of the Java files that ANTLR generates. In this example, you can put the package name and similar information here; the result is shown in the corresponding code on the right.
2. The content of this section is specific to each grammar in the file.

It is placed just before the actual class definition. That is, the two imports belong only to class CalcParser; other classes defined in the same file (such as CalcLexer) do not get them.
3. This is the grammar definition section; you can think of it as a class definition.
4. In the options section you can set options for the grammar,

for example, whether to build an abstract syntax tree by default, or the value of k in LL(k) (default 1). See the ANTLR manual for more details.
5. The tokens section declares "imaginary" tokens that the lexer does not define. These are often used to specify "imaginary" nodes in a TreeParser.

6. This is another action section. ANTLR copies its contents verbatim into the class definition, where they become members of the class. It mainly gives users a way to add their own methods to the parser.

2.3 ANTLR Rule Analysis

In an ANTLR grammar file, each rule corresponds to a method in the Java source file that ANTLR generates.
1, 2, 3, 4: As shown in the figure, a rule definition can do everything a function can. We can give the rule a parameter (such as int a above), set a return value (int c), and even throw an exception. The right side of the figure shows clearly that everything defined in the rule is translated faithfully to the corresponding place in the Java source file.

5: This optional section lets us specify certain options. The option shown in the figure tells ANTLR not to generate its default error-handling code, so error handling becomes the user's responsibility.
7: In the exception-handling section we can specify our own exception-handling code; here it simply prints the stack trace.

3 ANTLR Grammar Example: SensorSQL

SensorSQL is a simplified version of SQL that I defined myself. For reasons of space, the grammar it supports is not listed in detail here; I only give a sample query.

Usually, the purpose of compiling a query is to turn it into a form that the query processor can understand. There are two common ways to write the grammar rules. The first is the approach described in the previous section: once ANTLR has generated the corresponding Java files, the results of execution can be used directly. There are many such examples, the most typical being the evaluation of arithmetic expressions. Take an expression such as 1+2-3*4/5^6. With well-written grammar rules, you get the result during parsing itself. ANTLR first turns the expression into the prefix (Polish) structure (- (+ 1 2) (/ (* 3 4) (^ 5 6))). It then computes the value while building the syntax tree, much like the expression calculation shown in section 2.3.
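The first approach, computing the value while parsing, can be sketched by hand in Java. Each grammar rule becomes one method, mirroring how section 2.3 describes ANTLR turning rules into methods with return values. Here every method returns both the computed value and its subtree in prefix notation. This is a hand-written recursive-descent sketch, not code generated by ANTLR. The rule names and the Result class are assumptions, and ^ is treated as right-associative exponentiation that binds tighter than * and /.

```java
// Hand-written recursive-descent sketch (not ANTLR output). Each grammar rule
// is one method; its return value carries the computed number and the subtree
// in prefix (Lisp-style) notation.
class ExprEvaluator {
    static final class Result {
        final double value;
        final String tree;

        Result(double value, String tree) {
            this.value = value;
            this.tree = tree;
        }
    }

    private final String src;
    private int pos = 0;

    private ExprEvaluator(String src) { this.src = src.replace(" ", ""); }

    static Result evaluate(String src) {
        ExprEvaluator p = new ExprEvaluator(src);
        Result r = p.expr();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected '" + p.src.charAt(p.pos) + "' at " + p.pos);
        }
        return r;
    }

    private boolean at(char c) { return pos < src.length() && src.charAt(pos) == c; }

    // expr : term (('+' | '-') term)*      left-associative
    private Result expr() {
        Result left = term();
        while (at('+') || at('-')) {
            char op = src.charAt(pos++);
            Result right = term();
            double v = op == '+' ? left.value + right.value : left.value - right.value;
            left = new Result(v, "(" + op + " " + left.tree + " " + right.tree + ")");
        }
        return left;
    }

    // term : power (('*' | '/') power)*    left-associative
    private Result term() {
        Result left = power();
        while (at('*') || at('/')) {
            char op = src.charAt(pos++);
            Result right = power();
            double v = op == '*' ? left.value * right.value : left.value / right.value;
            left = new Result(v, "(" + op + " " + left.tree + " " + right.tree + ")");
        }
        return left;
    }

    // power : atom ('^' power)?            right-associative, binds tightest
    private Result power() {
        Result base = atom();
        if (at('^')) {
            pos++;
            Result exp = power();
            return new Result(Math.pow(base.value, exp.value), "(^ " + base.tree + " " + exp.tree + ")");
        }
        return base;
    }

    // atom : NUMBER | '(' expr ')'
    private Result atom() {
        if (at('(')) {
            pos++;
            Result inner = expr();
            if (!at(')')) throw new IllegalArgumentException("expected ')' at " + pos);
            pos++;
            return inner;
        }
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("expected a number at " + pos);
        String digits = src.substring(start, pos);
        return new Result(Double.parseDouble(digits), digits);
    }
}
```

For 1+2-3*4/5^6, `ExprEvaluator.evaluate` produces the tree (- (+ 1 2) (/ (* 3 4) (^ 5 6))) and a value of approximately 2.999232. The sketch uses double arithmetic on purpose: with int division, 12/15625 would truncate to 0.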

The result is shown in the figure. This approach has one drawback. In many cases you may not yet know what you will do with the parse results, so when you actually start writing code, you are confined to the code in the existing parser/lexer. Every change means recompiling the grammar file and regenerating the Java code, which is cumbersome. Whenever something goes wrong, you must repeat the cycle of debugging, editing and regenerating with ANTLR. The generated code is not well structured, so debugging it is also troublesome. So, if efficiency allows, there is no need to make ANTLR do extra work. Let it concentrate on syntax analysis. Leave the rest until the syntax tree has been generated, then traverse or manipulate it however you like. The figure shows the result of the SensorSQL syntax analysis I just demonstrated.

After producing this result, I need to translate each syntax element into a byte sequence and send it to the sensor network as a packet. To preserve operator precedence in the WHERE clause, you can build a tree structure like mine by following the chapter on building syntax trees in the ANTLR documentation. A pre-order traversal of the WHERE part of the syntax tree is then enough to achieve the goal; for the rest, a simple sequential traversal will do.
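The second approach can be sketched in Java. First, parse the WHERE clause into a tree whose shape encodes precedence: OR binds more loosely than AND, and parentheses override both. Then serialize the tree with a pre-order traversal. Every operator has a fixed number of operands, so the pre-order byte sequence can be decoded unambiguously without parentheses. The opcode values, the attribute-ID table and the one-byte value encoding are all hypothetical; the article does not describe the real SensorSQL packet format.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: parse a WHERE clause into a precedence-respecting tree, then emit it
// as bytes with a pre-order traversal. Opcodes, attribute IDs and the one-byte
// value encoding are hypothetical, not the real SensorSQL packet format.
class WhereCompiler {
    static final Map<String, Integer> OPCODES = Map.of("AND", 0x01, "OR", 0x02, "=", 0x10, "<", 0x11, ">", 0x12);
    static final Map<String, Integer> ATTRIBUTE_IDS = Map.of("temp", 1, "light", 2, "humidity", 3);

    static final class Node {
        final String op;          // "AND", "OR" or a comparison operator
        final Node left, right;   // operands of AND / OR; null for comparisons
        final String attr;        // attribute of a comparison; null for AND / OR
        final int value;          // constant of a comparison

        Node(String op, Node left, Node right) { this.op = op; this.left = left; this.right = right; this.attr = null; this.value = 0; }
        Node(String op, String attr, int value) { this.op = op; this.left = null; this.right = null; this.attr = attr; this.value = value; }

        @Override
        public String toString() {
            return left != null ? "(" + op + " " + left + " " + right + ")" : "(" + op + " " + attr + " " + value + ")";
        }
    }

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private WhereCompiler(String where) {
        String s = where.trim();
        Matcher m = Pattern.compile("\\s*([()<>=]|[A-Za-z_]\\w*|\\d+)").matcher(s);
        for (int at = 0; at < s.length(); at = m.end()) {
            m.region(at, s.length());
            if (!m.lookingAt()) throw new IllegalArgumentException("bad input at index " + at);
            tokens.add(m.group(1));
        }
    }

    private boolean peekIs(String s) { return pos < tokens.size() && tokens.get(pos).equalsIgnoreCase(s); }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("unexpected end of input");
        return tokens.get(pos++);
    }

    // orExpr : andExpr ("OR" andExpr)*      OR binds most loosely
    private Node orExpr() {
        Node left = andExpr();
        while (peekIs("OR")) { pos++; left = new Node("OR", left, andExpr()); }
        return left;
    }

    // andExpr : primary ("AND" primary)*
    private Node andExpr() {
        Node left = primary();
        while (peekIs("AND")) { pos++; left = new Node("AND", left, primary()); }
        return left;
    }

    // primary : "(" orExpr ")" | IDENT ("<" | ">" | "=") NUMBER
    private Node primary() {
        if (peekIs("(")) {
            pos++;
            Node inner = orExpr();
            if (!next().equals(")")) throw new IllegalArgumentException("expected )");
            return inner;
        }
        String attr = next();
        String op = next();
        if (!op.equals("<") && !op.equals(">") && !op.equals("=")) throw new IllegalArgumentException("bad operator " + op);
        return new Node(op, attr, Integer.parseInt(next()));
    }

    static Node parse(String where) {
        WhereCompiler p = new WhereCompiler(where);
        Node tree = p.orExpr();
        if (p.pos != p.tokens.size()) throw new IllegalArgumentException("trailing input: " + p.tokens.get(p.pos));
        return tree;
    }

    // Pre-order: write the node's opcode first, then its operands left to right.
    private static void encode(Node n, ByteArrayOutputStream out) {
        out.write(OPCODES.get(n.op));
        if (n.left != null) {
            encode(n.left, out);
            encode(n.right, out);
        } else {
            Integer id = ATTRIBUTE_IDS.get(n.attr);
            if (id == null || n.value < 0 || n.value > 255) throw new IllegalArgumentException("cannot encode " + n);
            out.write(id);
            out.write(n.value);
        }
    }

    static byte[] compile(String where) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        encode(parse(where), out);
        return out.toByteArray();
    }
}
```

For example, temp > 30 AND (light < 5 OR humidity = 80) parses to (AND (> temp 30) (OR (< light 5) (= humidity 80))) and compiles to the bytes 01 12 01 1E 02 11 02 05 10 03 50. Without the parentheses, temp > 30 OR light < 5 AND humidity = 80 parses to (OR (> temp 30) (AND (< light 5) (= humidity 80))), because AND binds more tightly.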


4 ANTLR Studio

With the basics covered, we can start real work. "Notepad or EditPlus plus the command line", or simply an Ant script, is perfectly workable. In the era of integrated IDEs, though, this always feels a bit primitive. Fortunately, Placid Systems provides an Eclipse plug-in that lets us leave that primitive society behind. It is available at http://www.placidsystems.com/, and the latest version is currently 1.1.0.

The only regret is that this plug-in, despite its excellent functionality, is commercial; without a license you get only an 11-day trial.

4.1 ANTLR Studio Plug-in Installation

Installing a plug-in in Eclipse needs no explanation. One thing to note concerns the license file provided by the Placid Systems site. After download it is named license.lic.txt. Remove the .txt suffix and put the file in the eclipse_dir/plugins/antlrstudio_x.x.x folder, where x.x.x is the version (for example, 1.1.0).

After a successful installation, a lexer navigation button appears on the Eclipse toolbar. When you right-click your project, you will find a switch that controls whether ANTLR Studio is used. When you open a grammar file, you see an interface like the following:

- On the right, the Outline view lists all parser and lexer elements. Protected tokens (such as number) are displayed differently from ordinary tokens.
- On the left, different areas are distinguished by highlighting with blocks of different colors.

4.2 Feature Overview

ANTLR Studio's documentation in Eclipse Help gives a detailed description, so here I only introduce some new features of version 1.1.0:
- Full support for ANTLR 2.7.6, and automatic upgrading of earlier projects to version 1.1.0.
- A Syntax Diagram view, which gives a convenient view of the structure of your grammar.
- Improved debugging, including the ability to debug larger grammar files.

  Previously, ANTLR Studio would throw an exception on very large grammar files.
- Support for automatic code completion, with comprehensive hints for ANTLR grammar files (as shown below).

4.2.1 Syntax Diagram View

Open this view via Window -> Show View -> Other, and you can use this very cool feature. It lets you see the structure of the grammar you defined at a glance. For example, my SELECT statement is defined as follows. Place the cursor anywhere in the selectstatement rule, and the Syntax Diagram view displays the complete grammatical structure clearly. Now place the cursor on a caret (^). (Note: when the syntax tree is built, the element marked with a caret becomes the root of the tree or subtree.) The corresponding subrule is highlighted in pink, and if the cursor is on a token, the token turns blue. Very cool.
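To see what the caret does to the shape of the tree, here is a small, simplified Java model. It assumes the ANTLR 2 tree-construction behavior described in the ANTLR 2 documentation:

- A token without a suffix is added as a sibling of what has been built so far. Once a root exists, it is added as a child of that root instead.
- A token marked with ^ becomes the new root, and everything built so far becomes its children.

The model covers token references only. Rule references, the ! suffix and everything else ANTLR does are left out. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (an assumption, not ANTLR code) of how ANTLR 2 shapes a tree
// from matched token references: a plain token is appended as a sibling, or as
// a child once a root exists; a token suffixed with '^' becomes the new root
// and everything built so far becomes its children.
class CaretTreeModel {
    static final class Ast {
        final String text;
        final List<Ast> children = new ArrayList<>();

        Ast(String text) { this.text = text; }

        @Override
        public String toString() {
            if (children.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Ast c : children) sb.append(' ').append(c);
            return sb.append(')').toString();
        }
    }

    // Each element is a matched token's text, with a trailing '^' if the grammar marks it as root.
    static String build(List<String> matched) {
        List<Ast> top = new ArrayList<>();   // sibling list built so far at the top level
        boolean rooted = false;              // true once some '^' token has become the root
        for (String element : matched) {
            boolean makeRoot = element.length() > 1 && element.endsWith("^");
            Ast node = new Ast(makeRoot ? element.substring(0, element.length() - 1) : element);
            if (makeRoot) {
                node.children.addAll(top);   // the old tree (all siblings) becomes the children
                top = new ArrayList<>(List.of(node));
                rooted = true;
            } else if (rooted) {
                top.get(0).children.add(node);
            } else {
                top.add(node);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (Ast a : top) sb.append(sb.length() == 0 ? "" : " ").append(a);
        return sb.toString();
    }
}
```

Consider the rule expr : INT (PLUS^ INT)* applied to the input 1 + 2 + 3. The matched sequence 1, +^, 2, +^, 3 yields the left-associative tree (+ (+ 1 2) 3). Without the caret, the same tokens remain a flat sibling list, 1 + 2 + 3.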


4.2.2 Enhanced Debugging

To turn ANTLR Studio's debugging on or off, do the following:
- Enable or disable ANTLR Studio in the project.
- Right-click the project and open the ANTLR Studio tab in Properties.
- Select or clear "Enable debugging in grammar files".
After that, we can happily use its debugging features.

Just as when debugging other Java files, we can set breakpoints anywhere in the grammar file. When execution reaches a breakpoint, we can use "Step Over", "Resume" and so on, exactly as in Java application debug mode. Very convenient and handy.


5 Concluding Remarks

In this article I did not introduce ANTLR's various uses or language details. There is plenty of Chinese and English material on these topics online; my focus was on the overall direction and the core content. Some of you, like me, do not want to dig into ANTLR's implementation or study its code, and simply want it working in your project as soon as possible. I wish you a happy journey with ANTLR :)
