Open-source syntax analyzer: ANTLR

Source: Internet
Author: User
Preface

Sometimes I really wonder whether the principles I learned in my undergraduate courses were a waste of time. After studying operating system principles, we still cannot implement an operating system ourselves; after studying database principles, we still cannot build a decent DBMS. Likewise, after studying compiler principles, all we seem able to do is stare blankly at piles of symbols, tables, and pushdown automata, and then pay devout tribute to the predecessors who actually implement compilers, the same compilers whose minor bugs we used to complain about.

A few years ago, when I was building a simulated DBMS, one requirement was to compile the WHERE clause of SQL, extract it into a usable form, and preserve the precedence of the operators (AND, OR, and parentheses). At the time I wrote a small compiler by hand using what I had learned in the compiler course. It took weeks and left me exhausted. One of my most recent projects requires compiling a custom query language (a simplified, modified SQL) and sending it to a sensor network in the form of query packets. This time I knew better than to reinvent the wheel :). At first I wanted to use YACC/LEX as the underlying compiler tool, but because the whole application is pure Java, I chose another open-source LL(k) syntax/lexical analyzer generator instead: ANTLR.


1.1 Lexical analyzer (Lexer)
1.2 Syntax analyzer (Parser)
1.3 ANTLR
2.1 Installation and use
2.2 ANTLR grammar file walkthrough
2.3 ANTLR rule walkthrough
4.1 Installing the ANTLR Studio plug-in
4.2 Features


1 Introduction to ANTLR

ANTLR (ANother Tool for Language Recognition), formerly known as PCCTS, provides a framework for automatically constructing recognizers, compilers (parsers), and interpreters (translators) for custom languages from grammar descriptions, with target languages including Java, C++, and C#. ANTLR can resolve recognition conflicts through predicates, and it supports actions and return values. Better still, it can automatically build a syntax tree from the input and display it visually (this is demonstrated in the example later on). Translating a computer language thus becomes an ordinary task. Before ANTLR, YACC/LEX seemed too academic. And although the LL(k) approach that ANTLR is based on is somewhat less efficient, the upgrades and revisions of recent years have made ANTLR good enough for the vast majority of existing applications. Thanks go to Dr. Terence Parr and his colleagues for their excellent work over the past decade: they laid much of the groundwork in compiler theory and language tool construction, which led directly to the creation of ANTLR.

1.1 Lexical analyzer (Lexer)

A lexical analyzer is also called a scanner, lexical analyser, or tokenizer. A programming language generally consists of keywords and strictly defined syntactic structures, and the ultimate purpose of compilation is to translate the high-level instructions of the programming language into instructions that a physical or virtual machine can execute. The lexer's job is to analyze the otherwise meaningless character stream and split it into discrete groups of characters (tokens), including keywords, identifiers, symbols, and operators, for the parser to use.

1.2 Syntax analyzer (Parser)

A parser is also called a syntactical analyser. While analyzing the character stream, the lexer does not care about the syntactic meaning of the individual tokens it produces or about their relationship to the context.
That is the parser's job. The parser organizes the tokens it receives and assembles them into sequences permitted by the grammar of the target language. Both the lexer and the parser are recognizers: the lexer recognizes sequences of characters, and the parser recognizes sequences of tokens. They are similar in nature and differ only in their division of labor.

1.3 ANTLR

ANTLR combines the two. It lets us define the lexical rules for recognizing the character stream and the syntax rules for interpreting the token stream. ANTLR then automatically generates the corresponding lexical and syntax analyzers from the grammar file the user provides. Users can use them to compile input text and convert it into other forms (such as an AST, an Abstract Syntax Tree).

2 Using ANTLR

2.1 Installation and use

Go to http://www.antlr.org/ to download the latest version of the ANTLR development package and source code (for example, version 2.7.5). Add the location of antlr-2.7.5.jar to the CLASSPATH environment variable, write a grammar file (such as SensorSQL.g), and run "java antlr.Tool SensorSQL.g" to obtain the automatically generated syntax and lexical analyzers.

2.2 ANTLR grammar file walkthrough

Next, we look in some detail at the ANTLR grammar file shown in the figure. To make working with ANTLR easier, you can also download an Eclipse plug-in to help.

1. Header section: everything placed here appears at the top of the Java files generated by ANTLR. In this example, the package name and similar information can go here. The generated result is shown in the corresponding code section.
2. The content of this section is specific to one grammar in the file and appears immediately before the actual class definition. In other words, the two imports belong only to the class CalcParser, not to other classes defined in the same file (such as CalcLexer).
3. This is the grammar definition part. You can also regard it as a class definition.
4. Options section: here you can set options for your grammar, for example whether to build a default abstract syntax tree, or the value of k in LL(k) (default: 1). For the full list of options, see the manual provided with ANTLR.
5. Tokens section: used to declare "imaginary" tokens that are not defined in the lexer. This information is usually used in a TreeParser to designate "imaginary" nodes.
6. This is another action section. ANTLR copies its content faithfully into the class definition, where it is equivalent to class members. It mainly gives users a way to add their own methods to the Parser.

2.3 ANTLR rule walkthrough

In an ANTLR grammar file, each rule definition corresponds to a method in the Java source file generated by ANTLR.

1, 2, 3, 4: As you can see, a rule definition can do everything a function can. We can specify parameters for the rule (like the int a above), specify a return value (int c), and even declare thrown exceptions. The right half of the figure shows clearly that everything defined in the rule is translated faithfully and accurately to the corresponding location in the Java source file.
5: This optional part lets us set rule-level options. In this example, ANTLR is told not to generate the default error handling during code generation, leaving it to the user.
7: In the exception handling section, we can specify custom exception handling. Here it simply prints the error stack trace.

3 ANTLR grammar example: SensorSQL

SensorSQL is a customized, simplified SQL language. For reasons of space, the grammar definitions supported by SensorSQL are not listed in detail here; I give only a single example query. Normally, the purpose of compiling a query is to convert it into a form that the queried device can understand.
There are two common methods. The first is to write detailed grammar rules with embedded actions, as described in the previous section; after ANTLR generates the corresponding Java files, you can use the result of running them directly. There are many examples of this, the most typical being the parsing of arithmetic expressions. For an expression such as 1 + 2-3*4/5 ^ 6, you only need to write the grammar rules to obtain the result of the calculation during parsing. ANTLR first turns the expression into a prefix (Polish-notation) structure, (- (+ 1 2) (/ (* 3 4) (^ 5 6))), and the value of the expression is computed in step with the construction of the syntax tree, much like the expression calculation seen in section 2.3. The result is shown in the accompanying figure.

In many cases, however, it is not obvious that this is the right approach. The processing code you write is constrained by the code in the generated Parser/Lexer. Whenever you need to change it, you have to recompile the grammar file to generate new Java code, which is cumbersome. In addition, if the processing goes wrong, you have to debug and modify the code generated by ANTLR again and again. The generated code is not very well structured, so debugging is troublesome too. Therefore, if efficiency permits, there is no need to make ANTLR do this extra work. The second method is to let ANTLR concentrate on syntax analysis and do everything else by traversing or manipulating the syntax tree after it has been generated.

The figure shows the result of the SensorSQL syntax analysis just demonstrated. Once this result is generated, I need to pack each syntax element into a byte sequence and send it to the sensor network. To preserve operator precedence in the WHERE clause, you can follow the chapter on syntax tree construction in the ANTLR documentation to build a structure similar to mine; after that, you only need to traverse the WHERE part of the syntax tree in preorder to reach the goal. The other parts can simply be traversed in order.
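To make the first method more concrete, here is a minimal sketch in plain Java. It is not code generated by ANTLR, only a hand-written lexer and recursive-descent parser that mimic what such generated code does: split the characters into tokens, then build the prefix form and compute the value in the same pass. The class name PrefixCalc, its methods, and the operator rules (the usual precedence, with ^ treated as right-associative) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written stand-in for a generated lexer/parser pair; not ANTLR output. */
final class PrefixCalc {
    /** Outcome of parsing a sub-expression: its prefix form and its value. */
    static final class Result {
        final String prefix;
        final double value;
        Result(String prefix, double value) { this.prefix = prefix; this.value = value; }
    }

    private final List<String> tokens;
    private int pos = 0;

    private PrefixCalc(List<String> tokens) { this.tokens = tokens; }

    /** Lexer: splits the character stream into number, operator and parenthesis tokens. */
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                out.add(input.substring(start, i));
            } else if ("+-*/^()".indexOf(c) >= 0) {
                out.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        return out;
    }

    /** Parser entry point: must consume the whole token stream. */
    static Result parse(String input) {
        PrefixCalc p = new PrefixCalc(tokenize(input));
        Result r = p.expr();
        if (p.pos != p.tokens.size()) throw new IllegalArgumentException("Unexpected token: " + p.peek());
        return r;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }

    private boolean nextIs(String a, String b) { return peek().equals(a) || peek().equals(b); }

    // expr : term (('+' | '-') term)* ;      left-associative
    private Result expr() {
        Result left = term();
        while (nextIs("+", "-")) left = combine(tokens.get(pos++), left, term());
        return left;
    }

    // term : power (('*' | '/') power)* ;    left-associative
    private Result term() {
        Result left = power();
        while (nextIs("*", "/")) left = combine(tokens.get(pos++), left, power());
        return left;
    }

    // power : atom ('^' power)? ;            right-associative
    private Result power() {
        Result base = atom();
        if (peek().equals("^")) { pos++; return combine("^", base, power()); }
        return base;
    }

    // atom : NUMBER | '(' expr ')' ;
    private Result atom() {
        String t = peek();
        if (t.equals("(")) {
            pos++;
            Result inner = expr();
            if (!peek().equals(")")) throw new IllegalArgumentException("Expected ')'");
            pos++;
            return inner;
        }
        if (t.isEmpty() || !Character.isDigit(t.charAt(0))) {
            throw new IllegalArgumentException("Expected a number but found '" + t + "'");
        }
        pos++;
        return new Result(t, Double.parseDouble(t));
    }

    /** The operator becomes the root; the value is computed while the tree text is built. */
    private static Result combine(String op, Result l, Result r) {
        double v;
        switch (op) {
            case "+": v = l.value + r.value; break;
            case "-": v = l.value - r.value; break;
            case "*": v = l.value * r.value; break;
            case "/": v = l.value / r.value; break;
            default:  v = Math.pow(l.value, r.value);
        }
        return new Result("(" + op + " " + l.prefix + " " + r.prefix + ")", v);
    }

    public static void main(String[] args) {
        Result r = parse("1 + 2-3*4/5 ^ 6");
        System.out.println(r.prefix); // prints (- (+ 1 2) (/ (* 3 4) (^ 5 6)))
        System.out.println(r.value);  // roughly 2.999232
    }
}
```

Each parsing method corresponds to one grammar rule, which is also how a generated parser is organized. The drawback discussed above is visible here as well: the calculation is tangled into the parsing code, so changing it means touching the parser itself.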


4 ANTLR Studio

With the foundation above, we can get to work. It is perfectly possible to use Notepad, or EditPlus plus the command line, or simply to write an Ant script. But in the era of integrated IDEs this always feels too primitive to me. Fortunately, Placid Systems provides an Eclipse plug-in that gives us a way out of the Stone Age: http://www.placidsystems.com/. The latest version is 1.1.0. The only pity is that, excellent as the plug-in is, it is not free; without a license there is only an 11-day trial.

4.1 Installing the ANTLR Studio plug-in

Install it as you would any other Eclipse plug-in. Note that you should extract the plug-in from the Placid package and put it in the ECLIPSE_DIR/plugins/antlrstudio_x.x.x directory (where x.x.x is the version number, for example 1.1.0). After a successful installation, a lexer/parser navigation button appears on the Eclipse toolbar. When you right-click your project, you will find a switch that controls whether ANTLR Studio is used for it. After opening a grammar file, you see the following interface: the outline window on the right lists all Parser and Lexer elements, with protected tokens (such as Number) displayed differently from ordinary tokens, and on the left the different sections are highlighted with different color blocks.

4.2 Features

ANTLR Studio provides detailed documentation in Eclipse Help, so here I only introduce some of the new functions in version 1.1.0.

- Fully supports ANTLR 2.7.6 and automatically upgrades earlier projects to version 1.1.0.
- The Syntax Diagram view lets you conveniently view the structure of the grammar you have entered.
- The Debug function has been improved so that large grammar files can be debugged. Previously, ANTLR Studio threw an exception if a grammar file was large.
- Supports automatic code completion and provides comprehensive hints for an ANTLR grammar (as shown below).
4.2.1 Syntax Diagram view

Select this view under Window -> Show View -> Other to display it. With this handy view you can easily see the syntactic structure of the grammar you have defined. For example, if my SELECT statement is defined as follows, I only need to place the cursor anywhere in the selectStatement rule, and the Syntax Diagram view clearly displays its complete structure. If you then place the cursor on a caret (^), the SubRule it belongs to is highlighted in pink; if the cursor is on a Token, the highlight becomes pale blue. Very cool. (Note: the caret indicates that, when the syntax tree is generated, the element it marks becomes the root node of the tree or subtree built for the SubRule in which it appears.)
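The effect of the caret, and the reason a preorder walk is enough to preserve the precedence of a WHERE clause, can be sketched in plain Java. This is only an illustration under stated assumptions: the classes below are not the ANTLR AST classes, the helper root() merely imitates what marking an operator with a caret does (the operator becomes the root and its operands become children), and the column names temp, light and humidity are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative tree for a WHERE clause; a stand-in, not ANTLR's own AST classes. */
final class WhereTree {
    /** One tree node: a token text plus its children (none for a leaf). */
    static final class Node {
        final String text;
        final List<Node> children;
        Node(String text, Node... children) {
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    /** Imitates the caret: the operator token becomes the root, the operands its children. */
    static Node root(String operator, Node left, Node right) {
        return new Node(operator, left, right);
    }

    static Node leaf(String text) {
        return new Node(text);
    }

    /** Preorder walk: emit the node itself first, then its children from left to right. */
    static void preorder(Node node, List<String> out) {
        out.add(node.text);
        for (Node child : node.children) {
            preorder(child, out);
        }
    }

    /** Convenience wrapper that returns the flattened token sequence. */
    static List<String> flatten(Node tree) {
        List<String> out = new ArrayList<>();
        preorder(tree, out);
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical clause: WHERE temp > 30 AND (light < 5 OR humidity > 80)
        Node where = root("AND",
                root(">", leaf("temp"), leaf("30")),
                root("OR",
                        root("<", leaf("light"), leaf("5")),
                        root(">", leaf("humidity"), leaf("80"))));
        System.out.println(String.join(" ", flatten(where)));
        // prints: AND > temp 30 OR < light 5 > humidity 80
    }
}
```

Because every operator precedes its operands in the flattened sequence, the parentheses of the original clause are no longer needed: a receiver that knows each operator takes two operands can rebuild the grouping unambiguously.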


4.2.2 Enhanced Debug function

To enable or disable the Debug function of ANTLR Studio, follow these steps:

- Right-click the project for which you want to enable or disable ANTLR Studio debugging, and open the ANTLR Studio tab in Properties.
- Select or clear "Enable debugging in grammar files".

After that, the Debug function is available. Just as when debugging ordinary Java files, we can set breakpoints anywhere in the grammar file. When the program reaches a breakpoint, we can use the usual Java debugging operations such as "step over" and "continue", which is very convenient.


5 Conclusion

In this article I have not covered the many uses or the language details of ANTLR. There are plenty of Chinese and English documents on the Internet for reference; what I have focused on is the general direction and the core content. For readers who do not want an in-depth understanding of how ANTLR is implemented or to study its code, but who want to put it to work in their own projects as soon as possible, this should be enough. I wish you as pleasant a journey with ANTLR as mine has been :).
