PHP Parser: re2c && Bison Summary

Source: Internet
Author: User
Some time ago I started a project that would automatically turn our PHP code into a .so extension and compile it into PHP. I called it phptoc.

However, the project was suspended for various reasons.

I am writing this article for two reasons: first, there is very little information on the topic; second, I want to summarize what I learned for future reference. If you understand how PHP's syntax analysis works, studying the PHP source becomes much easier ^.^ ...

I have tried to keep this as simple as possible.

The idea came from HipHop, Facebook's open-source project.

Honestly, I am skeptical of the 50%-60% performance gain claimed for that project. Fundamentally, if PHP is used together with the APC cache, is its performance really that poor?

I have not benchmarked HipHop myself, so I dare not make a definite claim.

With phptoc I simply wanted to free up C programmers: the goal was to let PHP developers write, in PHP, an extension whose performance comes close to that of a hand-written PHP extension.

The workflow is: read the PHP file, scan and parse the PHP code, generate the corresponding Zend API calls, and compile the result into an extension.

Getting to the point

The hardest part here is the parser. As you probably know, PHP has its own parser; current versions are built with re2c and Bison.

So I naturally used the same combination.

Reusing PHP's own parser is not very realistic, because it would require modifying zend_language_parser.y and zend_language_scanner.l and recompiling PHP. That is hard to do, and it may also affect PHP itself.

So I decided to write my own set of grammar rules. This amounts to rewriting the PHP parser, although some rarely used features are of course dropped.

re2c and Yacc/Bison take the rule files we write and translate them into *.c files, which GCC then compiles into our own program.

So, fundamentally, they are not parsers themselves; they only generate standalone C source code from our rules.

That generated C file is the parser we actually need, which is why I prefer to call these tools grammar generators. For example:

Note: a.c in the figure is the final code generated by the scanner.

re2c generates the scanner. Suppose we write a scanner rule file called scanner.l: the generated scanner reads the content of our PHP file, scans it, and, according to the rules we wrote, produces different tokens and passes them to the parser.

We write the Yacc/Bison grammar rules in a file called, say, parse.y.

Yacc/Bison compiles it into parse.tab.h and parse.tab.c, and the parser performs different actions depending on the tokens it receives.

For example, suppose our PHP code is "echo 1;".

The scanner has a rule like this:

"echo" {
    return T_ECHO;
}
The scan function receives the string "echo 1;" and loops over it; when it finds the string echo, it returns the keyword token T_ECHO.
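To make the scan step concrete, here is a minimal hand-written C sketch of what a scanner does with the echo rule plus a [0-9]+ rule for the number. It is not re2c output (re2c emits a goto-based state machine), and the function name scan_token, the token values and the T_UNKNOWN fallback are hypothetical; `*cursor` stands in for YYCURSOR.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes; in a real build Bison generates them in
   parse.tab.h, numbering named tokens from 258. */
enum { T_END = 0, T_ECHO = 259, T_LNUMBER = 260, T_UNKNOWN = 300 };

/* Hand-written stand-in for the scanner re2c would generate from
       "echo"  { return T_ECHO; }
       [0-9]+  { return T_LNUMBER; }
   *cursor plays the role of YYCURSOR: on entry it points at the start of
   the next token, on return it points just past the matched text. */
static int scan_token(const char **cursor)
{
    const char *p = *cursor;
    assert(p != NULL);

    while (*p == ' ' || *p == '\t' || *p == '\n')  /* skip whitespace */
        p++;

    if (*p == '\0') {                    /* end of input */
        *cursor = p;
        return T_END;
    }
    if (strncmp(p, "echo", 4) == 0) {    /* keyword: echo */
        *cursor = p + 4;
        return T_ECHO;
    }
    if (isdigit((unsigned char)*p)) {    /* number: [0-9]+ */
        while (isdigit((unsigned char)*p))
            p++;
        *cursor = p;
        return T_LNUMBER;
    }
    *cursor = p + 1;                     /* no rule matches: skip one char */
    return T_UNKNOWN;
}
```

For "echo 1;", successive calls return T_ECHO, T_LNUMBER, T_UNKNOWN for the semicolon (no rule covers it in this sketch), and finally T_END.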

parse.y and scanner.l are turned into two C files, parse.tab.c and scanner.c, which are then compiled together with GCC.

The re2c documentation explains this in detail.

If you are interested, take a look. I have also translated it into Chinese,

and I will post the translation once it is finished.

re2c expects us to define a number of macros as its interface. I made a simple translation of their descriptions; my English is not good and there may be mistakes, so if you need the original, see the address above.

Interface code:
Unlike other scanner generators, re2c does not generate a complete scanner: the user must supply some interface code. The user must define the following macros or provide other suitable configuration.
YYCONDTYPE
In -c mode you can use the -t option to generate a file that contains an enumeration of the conditions. Each value refers to a condition of a rule set.
YYCTYPE
The type used to hold an input symbol. Usually char or unsigned char.
YYCTXMARKER
An expression of type YYCTYPE*; the generated code saves trailing-context backtracking information in YYCTXMARKER. The user only needs to define this macro if a scanner rule uses trailing context in one or more of its regular expressions.
YYCURSOR
An expression of type YYCTYPE* that points to the current input symbol. The generated code advances YYCURSOR as symbols are matched. On entry, YYCURSOR is assumed to point to the first character of the current token; on exit, it points to the first character of the next token.
YYDEBUG(state, current)
Only needed if the -d option is specified. It makes the generated code easy to debug by calling a user-defined function for every state.
The function should have the signature void YYDEBUG(int state, char current). The first parameter receives the state, or -1; the second receives the input at the current position.
YYFILL(n)
The generated code calls YYFILL(n) when the buffer needs (re)filling: at least n more characters must be provided. YYFILL(n) should adjust YYCURSOR, YYLIMIT, YYMARKER and YYCTXMARKER as needed. Note that for typical programming languages n is the length of the longest keyword plus one. The user can place /*!max:re2c*/ once in the file to define YYMAXFILL as that maximum length. If the -1 option is used, YYMAXFILL can be emitted only once, after the last /*!re2c*/ block.
YYGETCONDITION()
Used in -c mode to obtain the current condition before the scanner code is entered. The value must be one of the values of the enumeration type YYCONDTYPE.
YYGETSTATE()
The user only needs to define this macro if the -f option is specified. In that case the generated code calls YYGETSTATE() at the very beginning of the scanner to obtain the saved state. YYGETSTATE() must return a signed integer: -1 tells the scanner that it is being entered for the first time; otherwise the value equals a state previously saved with YYSETSTATE(s), and the scanner resumes operation right after the YYFILL(n) call where it stopped.
YYLIMIT
An expression of type YYCTYPE* that marks the end of the buffer (YYLIMIT[-1] is the last character in the buffer). The generated code repeatedly compares YYCURSOR with YYLIMIT to determine when the buffer needs (re)filling.
YYSETCONDITION(c)
Used to set the condition in transition rules. It is only used when -c mode is active and transition rules are used.
YYSETSTATE(s)
The user only needs to define this macro if the -f option is specified. In that case the generated code calls YYSETSTATE(s) just before calling YYFILL(n). The parameter to YYSETSTATE is a signed integer that uniquely identifies the particular instance of YYFILL(n) about to be called.
YYMARKER
An expression of type YYCTYPE*; the generated code saves backtracking information in YYMARKER. Some simple scanners may not need it.
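As a rough illustration of how these macros fit together, the sketch below defines YYCTYPE, YYCURSOR, YYMARKER, YYLIMIT and YYFILL for input that is already entirely in memory, then hand-writes the kind of backtracking logic re2c produces for two hypothetical rules, "e" and "echo". The scanner_t struct, the token names and scan_e_or_echo are assumptions for illustration only; real re2c output looks different (it uses goto-based states).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical scanner state for a string that is fully in memory. */
typedef struct {
    const char *cursor;   /* YYCURSOR: current input symbol          */
    const char *marker;   /* YYMARKER: last accepting position       */
    const char *limit;    /* YYLIMIT: one past the last character    */
} scanner_t;

#define YYCTYPE   char
#define YYCURSOR  s->cursor
#define YYMARKER  s->marker
#define YYLIMIT   s->limit
#define YYFILL(n) ((void)0)   /* in-memory input: nothing to refill */

enum { TOK_END = 0, TOK_E = 1, TOK_ECHO = 2, TOK_OTHER = 3 };

/* Imitates generated code for the rules "e" and "echo": after "e" matches,
   the position is saved in YYMARKER; if "echo" cannot be completed, the
   scanner backtracks to YYMARKER and returns the shorter match. */
static int scan_e_or_echo(scanner_t *s)
{
    YYCTYPE yych;

    if (YYLIMIT - YYCURSOR < 4) YYFILL(4);   /* longest rule is 4 chars */
    if (YYCURSOR >= YYLIMIT) return TOK_END;

    yych = *YYCURSOR++;
    if (yych != 'e') return TOK_OTHER;

    YYMARKER = YYCURSOR;                     /* "e" already matches */
    if (YYCURSOR + 2 < YYLIMIT && YYCURSOR[0] == 'c' &&
        YYCURSOR[1] == 'h' && YYCURSOR[2] == 'o') {
        YYCURSOR += 3;
        return TOK_ECHO;
    }
    YYCURSOR = YYMARKER;                     /* backtrack to the "e" match */
    return TOK_E;
}
```

On the input "ech echo", the first call gets as far as "ech", fails to complete "echo", and rolls YYCURSOR back to the YYMARKER saved after "e"; a later call matches the full "echo".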
The scanner, as its name implies, scans the file to find the key tokens.

Scanner file structure:

/* #include files */
/* macro definitions */
Scan function:
int scan(char *p) {
    /* scanner rule area */
}
yylex runs scan and returns the tokens to Yacc/Bison:
int yylex(void) {
    int token;
    char *p = YYCURSOR; // YYCURSOR points to our PHP source text
    while ((token = scan(p))) { // scan checks the text at p against the rules defined above
        return token;
    }
    return 0; // 0 tells the parser that the input has ended
}
int main(int argc, char **argv) {
    if (argc < 2) return 1; // the PHP code is expected as the first argument
    BEGIN(INITIAL);
    YYCURSOR = argv[1]; // YYCURSOR points to our PHP source text
    yyparse();
    return 0;
}
BEGIN is a macro, defined as follows:

#define YYCTYPE char                        // type of an input symbol
#define STATE(name) yyc##name
#define BEGIN(state) YYSETCONDITION(STATE(state))
#define LANG_SCNG(v) (sc_globals.v)
#define SCNG LANG_SCNG
#define YYGETCONDITION() SCNG(yy_state)
#define YYSETCONDITION(s) SCNG(yy_state) = s
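These BEGIN/condition macros can be exercised outside re2c with a small sketch. It assumes a condition enumeration modeled on what re2c -c -t generates (re2c prefixes condition names with yyc by default, which is what STATE() relies on) and a hypothetical sc_globals struct holding yy_state. The function scan_with_condition and the token values are made up, and only the open-tag rule is implemented.

```c
#include <assert.h>
#include <string.h>

/* Condition enumeration modeled on re2c -c -t output (prefix "yyc"). */
enum YYCONDTYPE { yycINITIAL, yycST_IN_SCRIPTING };

/* Hypothetical scanner globals holding the current condition. */
struct scanner_globals { enum YYCONDTYPE yy_state; };
static struct scanner_globals sc_globals;

#define STATE(name)       yyc##name
#define BEGIN(state)      YYSETCONDITION(STATE(state))
#define LANG_SCNG(v)      (sc_globals.v)
#define SCNG              LANG_SCNG
#define YYGETCONDITION()  SCNG(yy_state)
#define YYSETCONDITION(s) SCNG(yy_state) = s

/* Hypothetical token codes. */
enum { T_END = 0, T_INLINE_HTML = 1, T_OPEN_TAG = 2, T_OTHER = 3 };

/* In INITIAL, text before "<?php" is inline HTML; the open tag switches
   the condition with BEGIN(). Scripting rules are omitted here. */
static int scan_with_condition(const char **cursor)
{
    const char *p = *cursor;

    if (*p == '\0')
        return T_END;
    if (YYGETCONDITION() == yycINITIAL) {
        if (strncmp(p, "<?php", 5) == 0) {
            BEGIN(ST_IN_SCRIPTING);
            *cursor = p + 5;
            return T_OPEN_TAG;
        }
        *cursor = p + 1;
        return T_INLINE_HTML;
    }
    *cursor = p + 1;                 /* ST_IN_SCRIPTING: placeholder rule */
    return T_OTHER;
}
```

BEGIN(ST_IN_SCRIPTING) expands to (sc_globals.yy_state) = yycST_IN_SCRIPTING, so the next call sees the new condition through YYGETCONDITION().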
The yyparse function is defined by Yacc.

It relies on a key macro, YYLEX:

#define YYLEX yylex()

which calls the yylex of the scanner.

This may be a little confusing, so let me go through it once more:

In scanner.l, main calls yyparse, the parser function generated from parse.y. yyparse in turn calls yylex in scanner.l to get the next token.

yylex returns the token from the scanner to parse.y, and the parser runs different code depending on the token.
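The call chain can be sketched in a few lines of C: the parser never reads the PHP text itself, it only pulls tokens one at a time by calling yylex. In this sketch a hypothetical fixed token array stands in for scan(), and yyparse_sketch is a made-up stand-in for the Bison-generated yyparse that simply counts the tokens it receives.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token stream standing in for scan(); a real scanner would
   produce these by matching the PHP source. 0 means end of input. */
static const int token_stream[] = { 258, 259, 260, 0 };
static size_t next_token = 0;

/* yylex(): called by the parser each time it needs one more token. */
static int yylex(void)
{
    int tok = token_stream[next_token];
    if (tok != 0)
        next_token++;            /* stay on the 0 once input has ended */
    return tok;
}

/* Stand-in for yyparse(): it pulls tokens from yylex() until it gets 0.
   A real parser would shift/reduce on each token; this one counts them. */
static int yyparse_sketch(int *ntokens)
{
    int tok;
    *ntokens = 0;
    while ((tok = yylex()) != 0)
        (*ntokens)++;
    return 0;                    /* 0 = success, as with Bison's yyparse */
}
```

This "pull" design is why yylex must return 0 at the end of input: that is the only way the parser learns there is nothing more to read.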

Example:

scanner.l
#include "scanner.h"
#include "parse.tab.h"
int scan(char *p) {
/*!re2c
<INITIAL>"<?php" {
    BEGIN(ST_IN_SCRIPTING);
    return T_OPEN_TAG;
}
<ST_IN_SCRIPTING>"echo" {
    return T_ECHO;
}
<ST_IN_SCRIPTING>[0-9]+ {
    return T_LNUMBER;
}
*/
}
int yylex(void) {
    int token;
    char *p = YYCURSOR;
    while ((token = scan(p))) {
        return token;
    }
    return 0; // end of input
}

int main(int argc, char **argv) {
    if (argc < 2) return 1;  // the PHP code is expected as the first argument
    BEGIN(INITIAL);          // initialize the scanner condition
    YYCURSOR = argv[1];      // the user-supplied string becomes the input
    yyparse();               // yyparse() -> yylex() -> back to yyparse()
    return 0;
}
That completes a simple scanner.

What about the parser?

For the parser I use Yacc/Bison.

The structure of a Yacc/Bison grammar file:

%{
/*
 C code in this section is copied verbatim into the C source file
 that Bison generates. You can define global variables, arrays,
 functions, etc. here.
*/
#include <stdio.h>
#include "scanner.h"
extern int yylex();  // defined in scanner.l
void yyerror(const char *);
/* PHP's own parser defines these as tsrm_ls for thread-safe (ZTS) builds;
   this standalone example does not need them:
# define YYPARSE_PARAM tsrm_ls
# define YYLEX_PARAM tsrm_ls
*/
%}
/* Declarations section: the tokens are defined here.
   These are the tokens the generated parser switches on. */
%token T_OPEN_TAG
%token T_ECHO
%token T_LNUMBER
%%
/* Rules section */
start:
      T_OPEN_TAG      { printf("start\n"); }
    | start statement
    ;
statement:
      T_ECHO expr     { printf("ECHO:%d\n", $2); }
    ;
expr:
      T_LNUMBER       { $$ = $1; }  /* assumes the scanner stores the number in yylval */
    ;
%%
/* User code section */
void yyerror(const char *msg) {
    printf("error:%s\n", msg);
}
In the rules section, start is the start symbol. If scan recognizes the PHP open tag, it returns T_OPEN_TAG, and the code in the braces runs, printing start.

In scanner.l, scan is called in a while loop, so scanning continues until the end of the PHP code.

yyparse switches on the token returned by scan and jumps to the corresponding code. For example, when yyparse finds that the current token is T_OPEN_TAG,

it maps it to the T_OPEN_TAG rule at line 21 of parse.y; the generated C code points back to that line with a #line directive.
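To show which token sequences the grammar accepts and when its actions fire, here is a hand-written (non-Bison) equivalent. It is only an illustration: Bison does not work this way internally, it builds LALR parse tables. The token_t struct, the token values and parse_tokens are hypothetical, and the actions write to a buffer instead of stdout so the result can be checked.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token codes and a token with its semantic value. */
enum { END = 0, T_OPEN_TAG = 258, T_ECHO = 259, T_LNUMBER = 260 };
typedef struct { int type; int value; } token_t;

/* Append text to out without overflowing it. */
static void append(char *out, size_t outsz, const char *text)
{
    size_t used = strlen(out);
    if (used + 1 < outsz)
        snprintf(out + used, outsz - used, "%s", text);
}

/* Hand-written equivalent of
       start:     T_OPEN_TAG | start statement ;
       statement: T_ECHO expr ;
       expr:      T_LNUMBER ;
   The left-recursive start rule accepts one T_OPEN_TAG followed by zero
   or more statements. toks must end with END. Returns 0 on success,
   1 on a syntax error (where Bison would call yyerror). */
static int parse_tokens(const token_t *toks, char *out, size_t outsz)
{
    size_t i = 0;
    char line[32];

    out[0] = '\0';
    if (toks[i].type != T_OPEN_TAG)
        return 1;
    i++;
    append(out, outsz, "start\n");             /* action of start: T_OPEN_TAG */

    while (toks[i].type == T_ECHO) {
        if (toks[i + 1].type != T_LNUMBER)     /* expr must be a number */
            return 1;
        snprintf(line, sizeof line, "ECHO:%d\n", toks[i + 1].value);
        append(out, outsz, line);              /* action of statement */
        i += 2;
    }
    return toks[i].type == END ? 0 : 1;
}
```

For the tokens of "<?php echo 1" this produces "start" followed by "ECHO:1", the same order in which the grammar's actions fire.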


So what happens to a token after it is returned to yyparse?

To make this concrete, I traced it with GDB:

At this point yychar is 258. What is 258?

258 is a value from the enumeration that Bison generates automatically for the tokens; Bison numbers named tokens starting at 258.

Continuing:

The YYTRANSLATE macro takes yychar and returns the corresponding internal symbol number:

#define YYTRANSLATE(YYX) \
  ((unsigned int) (YYX) <= YYMAXUTOK ? yytranslate[YYX] : YYUNDEFTOK)

/* YYTRANSLATE[YYLEX] -- Bison symbol number corresponding to YYLEX.  */
static const yytype_uint8 yytranslate[] =
{
0, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 27, 2,
22, 23, 2, 2, 28, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 21,
2, 26, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 24, 2, 25, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 1, 2, 3, 4,
5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19, 20
};
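The translation step can be reproduced in miniature. The sketch keeps the shape of Bison's YYTRANSLATE macro, but the table is filled at run time and sized for a hypothetical grammar whose largest token code is 260; init_translate is made up. The layout matches the table above: 0 stays 0, 256 (error) maps to 1, undefined codes map to 2, and named tokens from 258 onward map to 3, 4, 5, ... This sketch declares no character-literal tokens, so every other code below 256 is undefined.

```c
#include <assert.h>

#define YYMAXUTOK  260          /* largest token code in this sketch */
#define YYUNDEFTOK 2            /* internal number of "$undefined"   */
#define YYTRANSLATE(YYX) \
  ((unsigned int) (YYX) <= YYMAXUTOK ? yytranslate[YYX] : YYUNDEFTOK)

static unsigned char yytranslate[YYMAXUTOK + 1];

/* Fill the table with the same layout Bison uses. */
static void init_translate(void)
{
    for (int i = 0; i <= YYMAXUTOK; i++)
        yytranslate[i] = YYUNDEFTOK;          /* undefined by default */
    yytranslate[0] = 0;                       /* end of input -> $end */
    yytranslate[256] = 1;                     /* error token          */
    for (int tok = 258; tok <= YYMAXUTOK; tok++)
        yytranslate[tok] = (unsigned char)(tok - 255);  /* 258 -> 3 ... */
}
```

The unsigned cast in the macro also sends negative or oversized codes to YYUNDEFTOK without indexing the table.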
yyparse takes this value and keeps translating it:

Bison generates many tables for these mappings and stores the final result, the number of the rule to reduce by, in yyn,

which is how Bison finds the code that corresponds to the token.

switch (yyn)
  {
      case 2:

/* Line 1455 of yacc.c  */
#line 21 "parse.y"
    { printf("start\n"); ;}
    break;
In this way the corresponding ops are generated and saved in a hash table. That is not the focus of this article: the process keeps looping, producing tokens and then parsing them into calls to the corresponding Zend functions.
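The switch (yyn) excerpt is where a rule's action runs once the parser reduces by that rule. Below is a minimal sketch of that dispatch for the example grammar, assuming the generated switch numbers user rules in order starting from case 2 (the excerpt shows case 2 for start: T_OPEN_TAG). The reduce function, the rhs array and the out buffer are hypothetical; like Bison, it sets $$ to $1 before the switch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Output buffer standing in for stdout so the actions can be checked. */
static char out[128];

/* Hypothetical reduce-action dispatcher for the example grammar:
     2: start     -> T_OPEN_TAG
     3: start     -> start statement
     4: statement -> T_ECHO expr
     5: expr      -> T_LNUMBER
   rhs[k] holds the semantic value $(k+1); the return value is $$. */
static int reduce(int yyn, const int *rhs)
{
    int yyval = rhs[0];                  /* like Bison: $$ defaults to $1 */
    size_t used = strlen(out);

    switch (yyn) {
    case 2:
        snprintf(out + used, sizeof out - used, "start\n");
        break;
    case 4:
        snprintf(out + used, sizeof out - used, "ECHO:%d\n", rhs[1]);
        break;
    case 5:
        yyval = rhs[0];                  /* $$ = $1 */
        break;
    default:                             /* rule 3 keeps the default */
        break;
    }
    return yyval;
}
```

Only the action code differs between cases; finding the right yyn is entirely the job of the tables described above.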
