Source: www. phppan. com201109php-lexical-re2c Author: fat re2c is a scanner production tool that can create very fast and flexible scanners. It can generate efficient code. Based on the C language, it can support CC code. Unlike scanners of other classes, it focuses on generating efficient code for regular expressions (
Source: http://www.phppan.com/2011/09/php-lexical-re2c/ Author: fat re2c is a scanner production tool that can create very fast and flexible scanners. It can generate efficient code. Based on the C language, it can support C/C code. Unlike scanners of other classes, it focuses on generating efficient code for regular expressions (
Source: http://www.phppan.com/2011/09/php-lexical-re2c/
Author: fat
Re2c is a scanner creation tool that allows you to create extremely fast and flexible scanners. It can generate efficient code. Based on the C language, it can support C/C ++ code. Unlike other similar scanners, it focuses on generating efficient code for regular expressions (the same as its name ). Therefore, this is more widely used than traditional lexical analyzer. You can obtain the source code in sourceforge.net.
PHP used flex in the first lexical parser, and later PHP used re2c. The Zend/zend_language_scanner.l file in the source code directory is the re2c rule file. to modify the rule file, you must install re2c to re-compile it.
Re2c call method:
re2c [-bdefFghisuvVw1] [-o output] [-c [-t header]] file
Let's look at re2c through a simple example. The following is a simple scanner that determines whether the given string is a number, lowercase letter, or letter of size. Of course, there are no input error judgment and other exception operations. Example:
#include
char *scan(char *p){#define YYCTYPE char#define YYCURSOR p#define YYLIMIT p#define YYMARKER q#define YYFILL(n) /*!re2c [0-9]+ {return "number";} [a-z]+ {return "lower";} [A-Z]+ {return "upper";} [^] {return "unkown";} */}int main(int argc, char* argv[]){ printf("%s\n", scan(argv[1])); return 0;}
If you are in ubuntu, you can run the following command to generate an executable file.
re2c -o a.c a.lgcc a.c -o achmod +x a./a 1000
In this case, the program outputs the number.
Let's explain the macros of the re2c conventions we use.
- YYCTYPE is used to save the type of the input symbol, which is usually char or unsigned char.
- YYCURSOR points to the current input tag.-When it starts, it points to the first character of the current tag. when it ends, it points to the first character of the next tag.
- YYFILL (n) when the generated code needs to reload the cached tag, YYFILL (n) is called ).
- For the last character cached by YYLIMIT, the generated code compares YYCURSOR and YYLIMIT repeatedly to determine whether to re-fill the buffer.
Refer to the descriptions of the above several identifiers to clearly understand the generated. in the c file, of course, re2c won't only show the mark shown in the above Code. This is just a simple example. For more information about the mark and help information, see re2c help documentation: http://re2c.org/manual.html.
More Compiler-related Algorithms: Compiler Algorithms