85-syntax analysis for virtual machines

Source: Internet
Author: User


Bison is a general-purpose parser generator. It converts the description of an LALR(1) context-free grammar into a C program that parses that grammar. It can be used to build interpreters, compilers, protocol implementations, and many other programs. Bison is upward compatible with Yacc: any correctly written Yacc grammar should work in Bison without modification. Beyond that compatibility, Bison also offers many features that Yacc lacks.

The parser file that Bison generates is C code defining a function named yyparse, which implements the grammar. This function alone is not a complete C program that handles every parsing task. We must also supply additional functions, such as the lexical analyzer and the error-reporting function that the parser calls when it detects an error. A complete C program must also start from a function named main, so to build an executable and run the parser we need a main function that calls yyparse, directly or indirectly; otherwise the parser never runs.

First, look at a Bison example: a reverse Polish notation calculator.

%{
#define YYSTYPE double
#include <stdio.h>
#include <math.h>
#include <ctype.h>
int yylex (void);
void yyerror (char const *);
%}

%token NUM

%%

input:    /* empty */
        | input line
;

line:     '\n'
        | exp '\n'      { printf ("\t%.10g\n", $1); }
;

exp:      NUM           { $$ = $1;           }
        | exp exp '+'   { $$ = $1 + $2;      }
        | exp exp '-'   { $$ = $1 - $2;      }
        | exp exp '*'   { $$ = $1 * $2;      }
        | exp exp '/'   { $$ = $1 / $2;      }
        /* exponentiation */
        | exp exp '^'   { $$ = pow ($1, $2); }
        /* unary minus */
        | exp 'n'       { $$ = -$1;          }
;

%%

int yylex (void)
{
    int c;

    /* Skip white space. */
    while ((c = getchar ()) == ' ' || c == '\t')
        ;
    /* Process numbers. */
    if (c == '.' || isdigit (c)) {
        ungetc (c, stdin);
        scanf ("%lf", &yylval);
        return NUM;
    }
    /* Return end-of-input. */
    if (c == EOF)
        return 0;
    /* Return a single char. */
    return c;
}

void yyerror (char const *s)
{
    fprintf (stderr, "%s\n", s);
}

int main (void)
{
    return yyparse ();
}

Now build and run it:

bison demo.y
gcc -o test demo.tab.c -lm
chmod +x test
./test

The gcc command needs the -lm option. A header file only describes the interface; it does not supply the definitions needed to resolve the symbols it declares. You therefore have to tell the toolchain which library provides them. On a gcc command line, the -l option names a library to link against, and the library name follows it directly. Here the name is m, the math library, whose file name is libm.so.

This is a reverse Polish notation calculator: type 3 7 + at the prompt, press Enter, and it prints 10.
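To make clear what the grammar computes, here is a minimal hand-written sketch of the same stack discipline in plain C. It is not what Bison generates. The names rpn_eval and RPN_STACK_MAX are hypothetical, the input is a single line held in a string rather than standard input, and exponentiation is left out so that the sketch does not need the math library.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define RPN_STACK_MAX 64   /* hypothetical limit chosen for this sketch */

/* Evaluate one reverse Polish expression such as "3 7 +".
   Returns 0 and stores the result in *out, or returns -1 on error. */
int rpn_eval(const char *s, double *out)
{
    double stack[RPN_STACK_MAX];
    int top = 0;                                /* values currently stacked */

    assert(s != NULL && out != NULL);
    while (*s != '\0') {
        if (*s == ' ' || *s == '\t') {          /* skip white space */
            s++;
        } else if (*s == '.' || isdigit((unsigned char)*s)) {
            char *end;
            if (top == RPN_STACK_MAX) return -1;
            stack[top++] = strtod(s, &end);     /* corresponds to exp: NUM */
            if (end == s) return -1;            /* e.g. a lone '.' */
            s = end;
        } else if (*s == 'n') {                 /* unary minus: exp 'n' */
            if (top < 1) return -1;
            stack[top - 1] = -stack[top - 1];
            s++;
        } else {                                /* binary: exp exp op */
            double a, b;
            if (top < 2) return -1;
            b = stack[--top];
            a = stack[--top];
            switch (*s++) {
            case '+': stack[top++] = a + b; break;
            case '-': stack[top++] = a - b; break;
            case '*': stack[top++] = a * b; break;
            case '/': stack[top++] = a / b; break;
            default:  return -1;                /* unknown character */
            }
        }
    }
    if (top != 1) return -1;                    /* leftover or missing operands */
    *out = stack[0];
    return 0;
}
```

Calling rpn_eval("3 7 +", &r) stores 10 in r. In the Bison version the stack is implicit: the generated parser keeps the semantic values $1 and $2 on its own stack and runs each action when the matching rule is reduced.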

In general, designing a language with Bison, from the grammar description through to a working compiler or interpreter, takes three steps:

    • Formally describe the grammar in a format that Bison recognizes. For each grammar rule, give the action to run when the rule is recognized, written as a sequence of C statements. This is the content between the two %% markers in the example.
    • Write a lexical analyzer that processes the input and passes tokens to the parser (that is, a yylex function must exist). The lexical analyzer can be hand-written C code or generated by Lex; later we will look at how to use re2c with Bison. The example above uses a hand-written C lexical analyzer that reads what is typed at the terminal.
    • Write a control function that invokes the Bison-generated parser; in the example, the main function calls it directly. Also write the error-reporting function (that is, the yyerror function).

Turning this source code into an executable program requires the following steps:

    • Run Bison on the grammar file to generate the parser. In the example, this is the command bison demo.y
    • Compile the code that Bison outputs like any other source code, then link the object files to produce the final program. In the example, this is the command gcc -o test demo.tab.c -lm

We can divide a Bison grammar file into four parts, separated by the '%%', '%{' and '%}' delimiters. In general, a Bison grammar file is structured as follows:

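%{
Prologue: define the types and variables used in the actions, define macros with preprocessor directives, or pull in the files you need with #include. In the example we defined YYSTYPE, included header files such as math.h, and declared the lexical analyzer yylex and the error-printing routine yyerror.
%}

Bison declarations: declare the terminal and nonterminal symbols, the operator precedence, and the data types of the semantic values of the various symbols, such as %token NUM in the example. The PHP source code contains many more type and symbol declarations, for instance uses of %left and %right.

%%
Grammar rules: define how each nonterminal is built from its parts.
%%

Epilogue: additional content goes here. This part is unconstrained; you can put any code you like in it. Functions declared at the top, such as yylex, are often implemented here, and our example does exactly that.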

We described earlier that PHP uses re2c to generate its lexical analyzer, so how does PHP integrate re2c with Bison? We illustrate the whole process with an example stripped out of the PHP source code. It works like the example in the previous section, which identifies the types of the strings in its input argument, and adds a parsing stage on top of it. First, look at the grammar file for this example, demo.y:

%{
#include <stdio.h>
#include "demo_scanner.h"
extern int yylex(znode *zendlval);
void yyerror(char const *);
#define YYSTYPE znode   /* key point 1: znode is defined in demo_scanner.h */
%}

%pure_parser    /* key point 2 */

%token T_BEGIN
%token T_NUMBER
%token T_LOWER_CHAR
%token T_UPPER_CHAR
%token T_EXIT
%token T_UNKNOWN
%token T_INPUT_ERROR
%token T_END
%token T_WHITESPACE

%%

begin: T_BEGIN { printf("begin:\ntoken=%d\n", $1.op_type); }
     | begin variable {
        printf("token=%d ", $2.op_type);
        if ($2.constant.value.str.len > 0) {
            printf("text=%s", $2.constant.value.str.val);
        }
        printf("\n");
}
;

variable: T_NUMBER {$$ = $1;}
    | T_LOWER_CHAR {$$ = $1;}
    | T_UPPER_CHAR {$$ = $1;}
    | T_EXIT {$$ = $1;}
    | T_UNKNOWN {$$ = $1;}
    | T_INPUT_ERROR {$$ = $1;}
    | T_END {$$ = $1;}
    | T_WHITESPACE {$$ = $1;}
;

%%

void yyerror(char const *s) {
    printf("%s\n", s);
}

This grammar file has two key points:

1. znode is modeled on the znode in the PHP source code, but here we keep only two fields. Its structure is as follows:

typedef union _zvalue_value {
    long lval;                  /* long value */
    double dval;                /* double value */
    struct {
        char *val;
        int len;
    } str;
} zvalue_value;

typedef struct _zval_struct {
    /* Variable information */
    zvalue_value value;     /* value */
    int type;               /* active type */
} zval;

typedef struct _znode {
    int op_type;
    zval constant;
} znode;

We have also copied PHP's zval structure here, keeping only the integer, floating-point, and string members. op_type records the operation type, and constant holds the data obtained during analysis. In a simple program it is usually enough to use one data type for the semantic values of all language constructs. For example, the reverse Polish notation calculator in the previous section uses only double. By default, Bison uses int for all semantic values. To specify another type, define YYSTYPE as a macro, as in our example:

#define YYSTYPE znode
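As an illustration of how such a semantic value might be filled in and read back, here is a small sketch built on the structures above. The helper names znode_set_string and znode_text_len and the DEMO_* type tags are hypothetical; the source does not show how the type field is assigned.

```c
#include <assert.h>
#include <stddef.h>

typedef union _zvalue_value {
    long lval;
    double dval;
    struct { char *val; int len; } str;
} zvalue_value;

typedef struct _zval_struct {
    zvalue_value value;
    int type;
} zval;

typedef struct _znode {
    int op_type;
    zval constant;
} znode;

/* Hypothetical type tags for this sketch; the source does not list them. */
enum { DEMO_LONG = 1, DEMO_DOUBLE = 2, DEMO_STRING = 3 };

/* Fill a znode the way a lexer action might: remember which token was
   seen and attach the matched text. The text is borrowed, not copied. */
void znode_set_string(znode *node, int token, char *text, int len)
{
    assert(node != NULL && text != NULL && len >= 0);
    node->op_type = token;
    node->constant.type = DEMO_STRING;
    node->constant.value.str.val = text;
    node->constant.value.str.len = len;
}

/* Return the text length if the node carries a string, otherwise 0.
   A type-checked variant of the str.len > 0 test in the grammar action. */
int znode_text_len(const znode *node)
{
    assert(node != NULL);
    if (node->constant.type != DEMO_STRING)
        return 0;
    return node->constant.value.str.len;
}
```

Because value is a union, only the member selected by type is meaningful at any time, which is why the second helper checks the tag before reading str.len.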

2. The %pure_parser declaration tells Bison to generate a reentrant parser. By default, the lexical analyzer function that the parser calls is named yylex and takes no arguments (void). If YYLEX_PARAM is defined, it is passed to yylex as an argument; in the .c file that Bison generates, this is implemented with #ifdef.

If %pure_parser is declared, the communication variables yylval and yylloc become local variables of the yyparse function, and so does the variable yynerrs; the way yyparse itself is called does not change. In our example we declare a reentrant parser and pass a znode pointer as the first parameter of yylex, and the generated .c file shows how the type of yylval changes as a result.
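The calling convention can be sketched in plain C without Bison. In this hand-written illustration, demo_sum stands in for yyparse and demo_lex for yylex; both names, the demo_scanner struct, and the token codes are hypothetical. The point is only that the semantic value is a local variable of the caller and reaches the lexer through a pointer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum { TOK_END = 0, TOK_NUMBER = 258 };   /* hypothetical token codes */

/* All scanner state lives in a struct owned by the caller. */
typedef struct {
    const char *cursor;
} demo_scanner;

/* Pure-style lexer: the semantic value goes out through lval
   instead of through a global yylval. */
int demo_lex(long *lval, demo_scanner *sc)
{
    while (*sc->cursor == ' ')
        sc->cursor++;
    if (*sc->cursor == '\0')
        return TOK_END;
    if (isdigit((unsigned char)*sc->cursor)) {
        long v = 0;
        while (isdigit((unsigned char)*sc->cursor))
            v = v * 10 + (*sc->cursor++ - '0');
        *lval = v;
        return TOK_NUMBER;
    }
    return (unsigned char)*sc->cursor++;  /* any other char is its own token */
}

/* Stand-in for yyparse: lval is local here, which is the arrangement
   described above for yylval. Returns the sum of all numbers seen. */
long demo_sum(const char *input)
{
    demo_scanner sc;
    long lval = 0;                        /* local, not global */
    long sum = 0;
    int tok;

    assert(input != NULL);
    sc.cursor = input;
    while ((tok = demo_lex(&lval, &sc)) != TOK_END) {
        if (tok == TOK_NUMBER)
            sum += lval;
    }
    return sum;
}
```

Since nothing is stored in global variables, two calls to demo_sum on different inputs cannot disturb each other.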

A reentrant program is one that does not alter itself during execution; in other words, it consists entirely of pure (read-only) code. Reentrancy matters whenever asynchronous execution is possible. For example, calling a non-reentrant program from a signal handler may be unsafe. In a system with multiple threads of control, a non-reentrant program must be called only within interlocks.
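A small contrast may help here. The sketch below shows two hypothetical character readers: next_char_static keeps its position in a static variable and is therefore not reentrant, while next_char_r leaves the position with the caller. Neither function comes from Bison or PHP.

```c
#include <assert.h>
#include <stddef.h>

/* Non-reentrant: the position is hidden in a static variable, so two
   interleaved scans overwrite each other's state. Pass a string to start
   a scan and NULL to continue the most recent one. */
static const char *saved_pos;

int next_char_static(const char *start)
{
    if (start != NULL)
        saved_pos = start;
    assert(saved_pos != NULL);            /* a scan must have been started */
    return *saved_pos ? (unsigned char)*saved_pos++ : 0;
}

/* Reentrant: the caller owns the position, so any number of scans
   can be in progress at once. */
int next_char_r(const char **pos)
{
    assert(pos != NULL && *pos != NULL);
    return **pos ? (unsigned char)*(*pos)++ : 0;
}
```

Starting a second scan with next_char_static silently discards the first one, which is the kind of interference that the interlocks mentioned above are meant to prevent.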

By declaring a reentrant parser and passing a znode parameter, we can record both the values obtained during analysis and the tokens produced by lexical analysis. yyparse calls yylex as it runs, and in this example yylex is generated by re2c. The lexical rules are defined in the demo_scanner.l file. Most of the rules are borrowed from the example in the previous section; on top of them we added a new yylex function that uses zendlval as the communication variable, through which the strings and tokens found during lexical analysis are passed back. The related additions are:

SCNG(yy_text) = YYCURSOR;   // record where the current string starts

/*!re2c
  <!*> {yyleng = YYCURSOR - SCNG(yy_text);}   // record the length of the string
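The same bookkeeping can be shown without re2c. The following hand-written sketch scans one run of lowercase letters and records the token start and length in the same way. The demo_globals struct and the scan_lower function are assumptions made for illustration; only the field names yy_text and yyleng follow the excerpt above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical scanner state for this sketch. */
typedef struct {
    const char *cursor;    /* plays the role of YYCURSOR */
    const char *yy_text;   /* start of the current token */
    int yyleng;            /* length of the current token */
} demo_globals;

/* Scan one run of lowercase letters. Returns 1 if a token was matched. */
int scan_lower(demo_globals *g)
{
    assert(g != NULL && g->cursor != NULL);
    g->yy_text = g->cursor;                      /* record the token start */
    while (islower((unsigned char)*g->cursor))
        g->cursor++;
    g->yyleng = (int)(g->cursor - g->yy_text);   /* record the token length */
    return g->yyleng > 0;
}
```

With the cursor at the start of "tipi2011", one call leaves yy_text pointing at "tipi2011", sets yyleng to 4, and stops the cursor on the '2'. The token text is then the yyleng characters starting at yy_text, with no copy needed until the parser asks for one.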

The main function has also changed:

int main(int argc, char* argv[])
{
    BEGIN(INITIAL);     /* global initialization; must come before the scan call */
    scanner_globals.yy_cursor = argv[1];    /* use the first argument as the string to parse */
    yyparse();
    return 0;
}

The new main function adds a call to yyparse, which automatically calls yylex as it runs. To build and run the program, execute the following commands:

re2c -o demo_scanner.c -c -t demo_scanner_def.h demo_scanner.l
bison -d demo.y
gcc -o t demo.tab.c demo_scanner.c
chmod +x t
./t "<?php tipi2011"

The small example above and the sample stripped from the PHP source code give a brief introduction to Bison and to combining Bison with re2c. When we debug PHP's execution with GDB, the call stack during compilation of PHP code looks like this:

#0  lex_scan (zendlval=0xbfffccbc) at Zend/zend_language_scanner.c:841
#1  0x082bab51 in zendlex (zendlval=0xbfffccb8)
    at /home/martin/project/c/phpsrc/Zend/zend_compile.c:4930
#2  0x082a43be in zendparse ()
    at /home/martin/project/c/phpsrc/Zend/zend_language_parser.c:3280
#3  0x082b040f in compile_file (file_handle=0xbffff2b0, type=8)
    at Zend/zend_language_scanner.l:343
#4  0x08186d15 in phar_compile_file (file_handle=0xbffff2b0, type=8)
    at /home/martin/project/c/phpsrc/ext/phar/phar.c:3390
#5  0x082d234f in zend_execute_scripts (type=8, retval=0x0, file_count=3)
    at /home/martin/project/c/phpsrc/Zend/zend.c:1186
#6  0x08281b70 in php_execute_script (primary_file=0xbffff2b0)
    at /home/martin/project/c/phpsrc/main/main.c:2225
#7  0x08351b97 in main (argc=4, argv=0xbffff424)
    at /home/martin/project/c/phpsrc/sapi/cli/php_cli.c:1190

In the PHP source code, the lexical analyzer that is ultimately called is lex_scan, the function generated from the re2c rules, while the function handed to Bison is zendlex. Likewise, yyparse is replaced by zendparse.
