Parsing converter 3: the lexical part of a handwritten PHP-to-Python compiler

Source: Internet
Author: User
Tags php compiler

This time the goal is to turn an entire PHP program into Python. Unlike templates, where regular expression matching lets you be lazy, this time there is no way around writing a PHP compiler.

I searched the internet and found that most transpilers from Python to some other language work directly on the AST: they skip the most important parts, the Tokenizer and the Parser, and just write a Visitor. The rest are built on generators such as ANTLR and carry a large amount of generated code, which is tiresome to read.

Since nobody wants to do this part, I will try writing a PHP compiler by hand. It is implemented in three parts: Tokenizer, Parser, and Visitor.

I read the Dragon Book and the Tiger Book for reference. Studying PHP closely, I found I had not really known it: PHP has so many features that writing a compiler for it is really tiring.

The lexical part is very simple: it is an automaton (a finite state machine). I designed a structure to store the automaton and then programmed against it in a simple, brute-force way, with no regard for performance. It is a one-off job.
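To make the idea concrete, here is a minimal sketch of a table-driven tokenizer. It is not the author's driver, which is not shown in this article. The two-state table, the `tokenize` function, and the first-match policy are all assumptions chosen for illustration; only the rule keys `name`, `token`, and `next` mirror the structure used later.

```python
import re

# Hypothetical two-state table: 'text' outside quotes, 'quoted' inside them.
# An empty 'next' means "stay in the current state".
TABLE = {
    'text': [
        {'name': 'word', 'token': r'[A-Za-z]+', 'next': ''},
        {'name': 'space', 'token': r'\s+', 'next': ''},
        {'name': 'quote', 'token': r'"', 'next': 'quoted'},
    ],
    'quoted': [
        {'name': 'quote', 'token': r'"', 'next': 'text'},
        {'name': 'chars', 'token': r'[^"]+', 'next': ''},
    ],
}


def tokenize(source, table=TABLE, start='text'):
    """Walk the source, trying the rules of the current state in order."""
    state, pos, tokens = start, 0, []
    while pos < len(source):
        for rule in table[state]:
            match = re.compile(rule['token']).match(source, pos)
            if match and match.end() > pos:
                tokens.append((rule['name'], match.group()))
                pos = match.end()
                state = rule['next'] or state  # follow the transition, if any
                break
        else:
            # No rule of the current state matched at this position.
            raise SyntaxError(
                f'unexpected character {source[pos]!r} at offset {pos}')
    return tokens


print(tokenize('say "hi there" now'))
```

The point of the design is that all lexical knowledge lives in the table; the loop itself knows nothing about the language being scanned.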

Writing it was fast; debugging was less smooth, but I won't go into that, haha.

The automaton is not complex. Take a look, and please correct me where I am wrong.


self.statemachine = {
    # Runtime cursor: current state, accumulated content, line number.
    'current': {
        'state': 'default', 'content': '', 'line': 0},
    # Outside PHP tags.
    'default': [
        {'name': 'open', 'next': 'php', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
         'token': r'<\?'},
        {'name': 'open', 'next': 'php', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
         'token': r'<\?php'}],
    # Inside PHP tags.
    'php': [
        {'name': 'close', 'next': 'default', 'extra': 0,
         'token': r'\?>', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'lnum', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
         'token': r'[0-9]+'},
        {'name': 'dnum', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
         'token': r'([0-9]*\.[0-9]+)|([0-9]+\.[0-9]*)'},
        {'name': 'exponent', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
         'token': r'(([0-9]+|([0-9]*\.[0-9]+)|([0-9]+\.[0-9]*))[eE][+-]?[0-9]+)'},
        {'name': 'hnum', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
         'token': r'0x[0-9a-fA-F]+'},
        {'name': 'bnum', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
         'token': r'0b[01]+'},
        {'name': 'label', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
         'token': r'[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*'},
        {'name': 'comment', 'next': 'commentline', 'extra': 1,
         'token': r'//', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'comment', 'next': 'commentline', 'extra': 1,
         'token': r'#', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'comment', 'next': 'comment', 'extra': 1,
         'token': r'/\*', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'string', 'next': 'string1', 'extra': 1,
         'token': r'\'', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'string', 'next': 'string2', 'extra': 1,
         'token': r'"', 'start': 0, 'end': 0, 'cache': ''},
        # The hyphen is escaped so it is a literal '-', not a character range.
        {'name': 'symbol', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
         'token': r'[\\\{\};:,\.\[\]\(\)\|\^&\+\-/\*=%!~$<>\?@]'}],
    # Single-quoted string.
    'string1': [
        {'name': 'string', 'next': 'php', 'extra': 0,
         'token': r'\'', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'string', 'next': 'escape1', 'extra': 1,
         'token': r'\\', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'string', 'next': '', 'extra': 1,
         'token': r'', 'start': 0, 'end': 0, 'cache': ''}],
    'escape1': [
        {'name': 'string', 'next': 'string1', 'extra': 1,
         'token': r'.', 'start': 0, 'end': 0, 'cache': ''}],
    # Double-quoted string; it is closed by a double quote.
    'string2': [
        {'name': 'string', 'next': 'php', 'extra': 0,
         'token': r'"', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'string', 'next': 'escape2', 'extra': 1,
         'token': r'\\', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'string', 'next': '', 'extra': 1,
         'token': r'', 'start': 0, 'end': 0, 'cache': ''}],
    'escape2': [
        {'name': 'string', 'next': 'string2', 'extra': 1,
         'token': r'.', 'start': 0, 'end': 0, 'cache': ''}],
    # Line comment (// or #).
    'commentline': [
        {'name': 'comment', 'next': 'php', 'extra': 0,
         'token': r'(\r|\n|\r\n)', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'comment', 'next': 'php', 'extra': 0,
         'token': r'', 'start': 0, 'end': 0, 'cache': ''}],
    # Block comment (/* ... */).
    'comment': [
        {'name': 'comment', 'next': 'php', 'extra': 0,
         'token': r'\*/', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'comment', 'next': '', 'extra': 1,
         'token': r'', 'start': 0, 'end': 0, 'cache': ''}]}
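One thing worth checking in a table like this is how overlapping rules are resolved. The numeric patterns of the 'php' state share prefixes, so the result depends on whether the driver takes the first rule that matches or the longest match. The driver loop is not shown in this article, so the sketch below makes no claim about which policy the author uses. The helpers `first_match` and `longest_match` are hypothetical names; only the patterns are copied from the table.

```python
import re

# Numeric rules copied, in order, from the 'php' state of the table.
NUMERIC_RULES = [
    ('lnum', r'[0-9]+'),
    ('dnum', r'([0-9]*\.[0-9]+)|([0-9]+\.[0-9]*)'),
    ('exponent', r'(([0-9]+|([0-9]*\.[0-9]+)|([0-9]+\.[0-9]*))[eE][+-]?[0-9]+)'),
    ('hnum', r'0x[0-9a-fA-F]+'),
    ('bnum', r'0b[01]+'),
]


def first_match(text):
    """Return the first rule that matches at the start of text."""
    for name, pattern in NUMERIC_RULES:
        match = re.match(pattern, text)
        if match:
            return name, match.group()
    return None


def longest_match(text):
    """Return the rule with the longest match; earlier rules win ties."""
    best = None
    for name, pattern in NUMERIC_RULES:
        match = re.match(pattern, text)
        if match and (best is None or len(match.group()) > len(best[1])):
            best = (name, match.group())
    return best


for sample in ('0x1F', '3.14', '1e10', '0b101'):
    print(sample, first_match(sample), longest_match(sample))
```

With first-match, every sample is cut short by the `lnum` rule, for example `0x1F` yields only `0`. With longest-match, the same table yields `hnum`, `dnum`, `exponent`, and `bnum` respectively. The two `open` rules in the 'default' state (`<\?` and `<\?php`) overlap in the same way.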
