This article covers Converter 3: a handwritten PHP-to-Python compiler, lexical analysis part.
With a bit of free time on my hands, I naturally wanted to go big and convert an entire PHP program to Python. Unlike templates, where you can be lazy and get by with regular-expression matching, this time there is no way around writing a PHP compiler.
A search online showed that most Python-to-xxx transpilers work directly on the AST: they skip the most important parts, the tokenizer and the parser, and just write a visitor. The rest are built on a generator such as ANTLR, which produces a pile of code that is annoying to read.
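To illustrate the AST-only approach described above: Python's standard `ast` module already does the tokenizing and parsing, so such a transpiler only has to supply a visitor. The following is a toy sketch, not taken from any of those projects; `PrefixEmitter` and `to_prefix` are hypothetical names, and the prefix-notation output is just a stand-in target language.

```python
import ast


class PrefixEmitter(ast.NodeVisitor):
    """Toy visitor: turns a Python arithmetic expression into prefix notation."""

    # Map AST operator node types to the symbol we emit.
    OPS = {ast.Add: '+', ast.Sub: '-', ast.Mult: '*', ast.Div: '/'}

    def visit_Expression(self, node):
        # Root node produced by ast.parse(..., mode='eval').
        return self.visit(node.body)

    def visit_BinOp(self, node):
        op = self.OPS[type(node.op)]
        return '(%s %s %s)' % (op, self.visit(node.left), self.visit(node.right))

    def visit_Constant(self, node):
        return repr(node.value)

    def visit_Name(self, node):
        return node.id


def to_prefix(expr):
    # ast.parse handles lexing and parsing; only the visitor is hand-written.
    return PrefixEmitter().visit(ast.parse(expr, mode='eval'))
```

For example, `to_prefix('a + 2 * b')` returns `'(+ a (* 2 b))'`. This is convenient when the source language is Python itself, but there is no such ready-made parser to lean on when the source language is PHP.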
Since I don't want to just haul other people's code around, I'll try writing a PHP compiler by hand. It is split into three parts: a tokenizer, a parser, and a visitor.
With the "Dragon Book" and the "Tiger Book" as references, I carefully went back over PHP. You don't know until you study it: PHP turns out to have a great many features, and writing a compiler for it is really tiring.
The lexical part is very simple: it is an automaton. I designed a structure to store the automaton and then drove it in a simple, brute-force way in the program, without worrying about performance, and that was that.
Writing it went quickly. Debugging was not so smooth, but I won't go into that, ha.
The automaton is not complicated. I'm posting it for everyone to see; corrections are welcome.
self.statemachine = {
    'current': {'state': 'default', 'content': '', 'line': 0},
    # outside PHP tags
    'default': [
        {'name': 'open', 'next': 'php', 'extra': 0, 'start': 0, 'end': 0, 'cache': '', 'token': r'<\?'},
        {'name': 'open', 'next': 'php', 'extra': 0, 'start': 0, 'end': 0, 'cache': '', 'token': r'<\?php'},
    ],
    # inside PHP code
    'php': [
        {'name': 'close', 'next': 'default', 'extra': 0, 'token': r'\?>', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'lnum', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '', 'token': r'[0-9]+'},
        {'name': 'dnum', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '', 'token': r'([0-9]*\.[0-9]+)|([0-9]+\.[0-9]*)'},
        {'name': 'exponent', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '', 'token': r'(([0-9]+|([0-9]*\.[0-9]+)|([0-9]+\.[0-9]*))[eE][+-]?[0-9]+)'},
        {'name': 'hnum', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '', 'token': r'0x[0-9a-fA-F]+'},
        {'name': 'bnum', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '', 'token': r'0b[01]+'},
        {'name': 'label', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '', 'token': r'[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*'},
        {'name': 'comment', 'next': 'commentline', 'extra': 1, 'token': r'//', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'comment', 'next': 'commentline', 'extra': 1, 'token': r'#', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'comment', 'next': 'comment', 'extra': 1, 'token': r'/\*', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'string', 'next': 'string1', 'extra': 1, 'token': r'\'', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'string', 'next': 'string2', 'extra': 1, 'token': r'"', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'symbol', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '', 'token': r'[\\\{\};:,\.\[\]\(\)\|\^&\+-/\*=%!~$<>\?@]'},
    ],
    # single-quoted string
    'string1': [
        {'name': 'string', 'next': 'php', 'extra': 0, 'token': r'\'', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'string', 'next': 'escape1', 'extra': 1, 'token': r'\\', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'string', 'next': '', 'extra': 1, 'token': r'', 'start': 0, 'end': 0, 'cache': ''},
    ],
    'escape1': [
        {'name': 'string', 'next': 'string1', 'extra': 1, 'token': r'.', 'start': 0, 'end': 0, 'cache': ''},
    ],
    # double-quoted string
    'string2': [
        {'name': 'string', 'next': 'php', 'extra': 0, 'token': r'"', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'string', 'next': 'escape2', 'extra': 1, 'token': r'\\', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'string', 'next': '', 'extra': 1, 'token': r'', 'start': 0, 'end': 0, 'cache': ''},
    ],
    'escape2': [
        {'name': 'string', 'next': 'string2', 'extra': 1, 'token': r'.', 'start': 0, 'end': 0, 'cache': ''},
    ],
    # single-line comment (// or #)
    'commentline': [
        {'name': 'comment', 'next': 'php', 'extra': 0, 'token': r'(\r|\n|\r\n)', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'comment', 'next': 'php', 'extra': 0, 'token': r'', 'start': 0, 'end': 0, 'cache': ''},
    ],
    # block comment (/* ... */)
    'comment': [
        {'name': 'comment', 'next': 'php', 'extra': 0, 'token': r'\*/', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'comment', 'next': '', 'extra': 1, 'token': r'', 'start': 0, 'end': 0, 'cache': ''},
    ],
}
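The code that drives this table is not shown here, so the following is only a rough sketch of how a table of this shape could be walked, under several assumptions of mine: the rule whose regex matches the longest text at the current position wins (earlier rules win ties), an empty `token` is a fallback that consumes one character, `extra` set to 1 means "keep accumulating" while 0 means "emit a token now", and a non-empty `next` switches state. The names `RULES` and `tokenize` are hypothetical, the table is cut down to a few rules with a simplified ASCII-only `label`, and the `start`, `end`, and `cache` fields are left out because their role is not visible from the table alone.

```python
import re

# Reduced, hypothetical table in the same shape as the structure above.
RULES = {
    'default': [
        {'name': 'open', 'next': 'php', 'extra': 0, 'token': r'<\?php'},
        {'name': 'open', 'next': 'php', 'extra': 0, 'token': r'<\?'},
    ],
    'php': [
        {'name': 'close', 'next': 'default', 'extra': 0, 'token': r'\?>'},
        {'name': 'lnum', 'next': '', 'extra': 0, 'token': r'[0-9]+'},
        {'name': 'hnum', 'next': '', 'extra': 0, 'token': r'0x[0-9a-fA-F]+'},
        {'name': 'label', 'next': '', 'extra': 0, 'token': r'[a-zA-Z_][a-zA-Z0-9_]*'},
        {'name': 'string', 'next': 'string1', 'extra': 1, 'token': r"'"},
        {'name': 'symbol', 'next': '', 'extra': 0, 'token': r'[=;$]'},
    ],
    'string1': [
        {'name': 'string', 'next': 'php', 'extra': 0, 'token': r"'"},
        {'name': 'string', 'next': 'escape1', 'extra': 1, 'token': r'\\'},
        {'name': 'string', 'next': '', 'extra': 1, 'token': r''},  # fallback
    ],
    'escape1': [
        {'name': 'string', 'next': 'string1', 'extra': 1, 'token': r'.'},
    ],
}


def tokenize(source, rules=RULES):
    """Return a list of (name, text) tokens; unmatched characters are skipped."""
    state, pos, pending, tokens = 'default', 0, '', []
    while pos < len(source):
        best_rule, best_text = None, ''
        for rule in rules[state]:
            if rule['token'] == '':
                text = source[pos]  # assumed meaning: any single character
            else:
                m = re.compile(rule['token'], re.DOTALL).match(source, pos)
                if m is None or m.end() == pos:
                    continue  # no match, or an empty match that would stall
                text = m.group(0)
            # Longest match wins; on a tie the earlier rule is kept.
            if best_rule is None or len(text) > len(best_text):
                best_rule, best_text = rule, text
        if best_rule is None:
            pos += 1  # whitespace or inline HTML: skip one character
            continue
        pos += len(best_text)
        if best_rule['extra']:
            pending += best_text  # token continues in the next step
        else:
            tokens.append((best_rule['name'], pending + best_text))
            pending = ''
        if best_rule['next']:
            state = best_rule['next']
    return tokens
```

With this sketch, tokenizing `<?php $a = 'it\'s'; $b = 0x1F; ?>` yields twelve tokens, including `('string', "'it\\'s'")` as a single token and `('hnum', '0x1F')` rather than an `lnum` for the leading `0`. Recompiling each regex on every step is wasteful, which matches the brute-force spirit of the approach; precompiling the patterns once would be the obvious first optimization.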