Using PHP-Parser to generate an abstract syntax tree (AST)

0. Preface

The workflow for my recent project is becoming clearer, but I have not yet mastered many of the key techniques it needs and can only explore them one step at a time.

To do static code analysis based on data-flow analysis, front-end work such as lexical analysis and syntax analysis is necessary. I ruled out Yacc and Lex and, after a day of reading, found two more suitable options: ANTLR, which is Java-based, and PHP-Parser, which is dedicated to generating ASTs for PHP.

ANTLR is a relatively well-known tool in the compiler field and is more practical than Yacc and Lex. However, I could find only one PHP grammar file for it. After half a day of generating and tuning a parser from it, I found it unsuitable: for "$a = 1" the generated tokens were, unexpectedly, [$, a, =, 1], so the assignment could not be recognized. The result was too coarse and very disappointing.

In contrast, PHP-Parser is more specialized; after all, it focuses solely on lexical and syntax analysis of PHP.

1. Introduction

PHP-Parser's project homepage is https://github.com/nikic/PHP-Parser. It can parse multiple versions of PHP and produce an abstract syntax tree.

For lexical analysis, PHP has a built-in function, token_get_all(), that returns the tokens of a source string, which can then serve as input to the parser. PHP-Parser also uses the token stream produced by token_get_all().
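
To make the token stream concrete, here is a small sketch that uses only PHP's built-in tokenizer extension (no PHP-Parser required). The sample input and the variable names are illustrative choices.

```php
<?php
// Tokenize a small snippet with the built-in token_get_all().
$code = '<?php $a = 1;';

$names = [];
foreach (token_get_all($code) as $token) {
    if (is_array($token)) {
        // Array tokens have the form [token id, source text, line number].
        $names[] = token_name($token[0]);
    } else {
        // Single-character tokens such as "=" and ";" are plain strings.
        $names[] = $token;
    }
}

// Prints: T_OPEN_TAG T_VARIABLE T_WHITESPACE = T_WHITESPACE T_LNUMBER ;
echo implode(' ', $names), PHP_EOL;
```

Note that "$a" arrives as a single T_VARIABLE token rather than as separate "$" and "a" tokens, which is what makes the later recognition of the assignment straightforward.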

2. Installation

Installation is simple. Here I add it with Composer, PHP's package manager, by running the following command in the project directory:

php composer.phar require nikic/php-parser

If you have not downloaded Composer yet, first run the following command:

curl -s http://getcomposer.org/installer | php

3. Generate AST

After adding PHP-Parser with Composer, it is easy to use.

Let's start by introducing some of the node types defined in PHP-Parser:

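(1) PhpParser\Node\Stmt is a statement node: a language construct that does not return a value and cannot occur inside an expression, such as a class definition. (An assignment such as "$a = $b" returns a value, so it is represented by an expression node, PhpParser\Node\Expr\Assign.)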

(2) PhpParser\Node\Expr is an expression node: a language construct that returns a value, such as $var and func().

(3) PhpParser\Node\Scalar is a scalar node, which represents a constant value such as 'string', 0, or a magic constant like __FILE__.

(4) Some nodes do not belong to any of these groups, such as argument nodes (PhpParser\Node\Arg).

The names of some node classes end with an underscore to avoid conflicts with PHP keywords.

PHP-Parser's "hello world" program is as follows; the snippet generates an AST:
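
A minimal sketch of such a program, reconstructed from the output shown in this section. The input string, the autoloader path, and the way the parser is constructed are assumptions: the factory method used here should exist in PHP-Parser 4.18 and later, while the 1.x releases that print a structure like the one below built the parser with new PhpParser\Parser(new PhpParser\Lexer\Emulative) instead.

```php
<?php
// Assumption: Composer's autoloader is available at this path.
require 'vendor/autoload.php';

use PhpParser\Error;
use PhpParser\ParserFactory;

// Assumed input: an echo statement with two string literals.
$code = "<?php echo '1+2', 'Chongrui';";

// Assumption: PHP-Parser 4.18+ or 5.x; older releases construct the
// parser differently.
$parser = (new ParserFactory())->createForNewestSupportedVersion();

try {
    // parse() returns an array of statement nodes.
    $stmts = $parser->parse($code);
    print_r($stmts);
} catch (Error $e) {
    // Thrown when the source contains a syntax error.
    echo 'Parse error: ', $e->getMessage(), PHP_EOL;
}
```

The details of the printed structure differ between releases; for example, newer versions name the string class String_ rather than String.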

The output is:


Array
(
    [0] => PhpParser\Node\Stmt\Echo_ Object
        (
            [subNodes:protected] => Array
                (
                    [exprs] => Array
                        (
                            [0] => PhpParser\Node\Scalar\String Object
                                (
                                    [subNodes:protected] => Array
                                        (
                                            [value] => 1+2
                                        )
                                    [attributes:protected] => Array
                                        (
                                            [startLine] => 1
                                            [endLine] => 1
                                        )
                                )
                            [1] => PhpParser\Node\Scalar\String Object
                                (
                                    [subNodes:protected] => Array
                                        (
                                            [value] => Chongrui
                                        )
                                    [attributes:protected] => Array
                                        (
                                            [startLine] => 1
                                            [endLine] => 1
                                        )
                                )
                        )
                )
            [attributes:protected] => Array
                (
                    [startLine] => 1
                    [endLine] => 1
                )
        )
)

As you can see, this AST has only one node, Echo_. The node has a subnode, exprs, which can be accessed using $stmts[0]->exprs.

The attributes of a node store startLine and endLine as well as comments. They can be accessed with the getAttributes(), getAttribute('startLine'), setAttribute(), and hasAttribute() methods.

The start line number can be read and written through getLine()/setLine() (equivalent to using the 'startLine' attribute). Doc comments can be obtained with getDocComment().

To access a value on a node, for example the value "Chongrui", use $stmts[0]->exprs[1]->value;.
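
The accessors described above can be tried out with a short sketch like the following. It assumes PHP-Parser 4.18 or later and the same assumed input string as before; the variable names are illustrative.

```php
<?php
// Assumption: Composer's autoloader is available at this path.
require 'vendor/autoload.php';

use PhpParser\ParserFactory;

$code = "<?php echo '1+2', 'Chongrui';";
$stmts = (new ParserFactory())->createForNewestSupportedVersion()->parse($code);

$echo = $stmts[0];                              // the Echo_ statement node
$value = $echo->exprs[1]->value;                // subnode access: "Chongrui"
$startLine = $echo->getAttribute('startLine');  // attribute access: 1
$hasEndLine = $echo->hasAttribute('endLine');   // true with the default lexer
$docComment = $echo->getDocComment();           // null, no doc comment here

echo $echo->getType(), PHP_EOL;   // node type as a string: Stmt_Echo
echo $value, ' on line ', $startLine, PHP_EOL;
```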

4. Node traversal

Traversing the abstract syntax tree is convenient with the PhpParser\NodeTraverser class, which also supports custom visitor objects. This matters because in practice, when analyzing PHP source code, the structure of the AST is usually not known in advance, so the type of each node has to be determined dynamically.

These checks are written in one place, MyNodeVisitor, which extends the parent class NodeVisitorAbstract. That class has several methods:

(1) The beforeTraverse() method is called once before the traversal starts and is typically used to reset values.

(2) The afterTraverse() method is like (1); the only difference is that it is called after the traversal finishes.

(3) The enterNode() and leaveNode() methods are called for each node that is visited.

enterNode() is called when a node is entered, that is, before its child nodes are visited. It can return NodeTraverser::DONT_TRAVERSE_CHILDREN to skip the node's children.

leaveNode() is called after the node's children have been traversed. It can return NodeTraverser::REMOVE_NODE, in which case the current node is removed. If an array of nodes is returned, those nodes are merged into the parent array: for example, if node B in array(A, B, C) is replaced by array(X, Y, Z), the result is array(A, X, Y, Z, C).

The following code fragment parses $code, generates an AST, and prints every string node it finds while traversing.
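
A minimal sketch of such a fragment, assuming PHP-Parser 4.18 or later. The input string is an assumption chosen so that the two string literals are 1 and 2; the $found property is an illustrative addition that collects the values as well as printing them.

```php
<?php
// Assumption: Composer's autoloader is available at this path.
require 'vendor/autoload.php';

use PhpParser\Node;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;

class MyNodeVisitor extends NodeVisitorAbstract
{
    /** @var string[] values of the string literals seen so far */
    public $found = [];

    public function beforeTraverse(array $nodes)
    {
        // Reset state so the visitor can be reused for several traversals.
        $this->found = [];
    }

    public function enterNode(Node $node)
    {
        // Decide dynamically, by node class, what to do with each node.
        if ($node instanceof Node\Scalar\String_) {
            $this->found[] = $node->value;
            echo $node->value, PHP_EOL;
        }
    }
}

// Assumed input: two string literals, '1' and '2'.
$code = "<?php echo '1', '2';";
$stmts = (new ParserFactory())->createForNewestSupportedVersion()->parse($code);

$visitor = new MyNodeVisitor();
$traverser = new NodeTraverser();
$traverser->addVisitor($visitor);
$traverser->traverse($stmts);
```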

The output is 1, 2.
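
The merging behaviour of leaveNode() described earlier can be sketched in the same way. The visitor below is hypothetical (SplitEchoVisitor is not part of PHP-Parser): it replaces each echo statement that has several arguments with one echo statement per argument. PHP-Parser 4.18 or later is assumed.

```php
<?php
// Assumption: Composer's autoloader is available at this path.
require 'vendor/autoload.php';

use PhpParser\Node;
use PhpParser\Node\Stmt\Echo_;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter\Standard;

// Hypothetical visitor, written for this illustration only.
class SplitEchoVisitor extends NodeVisitorAbstract
{
    public function leaveNode(Node $node)
    {
        if ($node instanceof Echo_ && count($node->exprs) > 1) {
            $replacement = [];
            foreach ($node->exprs as $expr) {
                $replacement[] = new Echo_([$expr]);
            }
            // Returning an array merges these nodes into the parent list.
            return $replacement;
        }
        return null; // leave every other node unchanged
    }
}

$code = '<?php $x = 1; echo "a", "b"; $y = 2;';
$stmts = (new ParserFactory())->createForNewestSupportedVersion()->parse($code);

$traverser = new NodeTraverser();
$traverser->addVisitor(new SplitEchoVisitor());
$stmts = $traverser->traverse($stmts);

echo count($stmts), PHP_EOL; // 4 statements instead of the original 3
echo (new Standard())->prettyPrintFile($stmts), PHP_EOL;
```

Returning NodeTraverser::REMOVE_NODE from the same method would instead drop the echo statement from the list.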

5. Other AST representations

Sometimes the AST needs to be persisted in a textual format; PHP-Parser supports this as well.

(1) Simple serialization

You can persist the AST by serializing it with serialize() and restoring it with unserialize().
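
A small round-trip sketch, assuming PHP-Parser 4.18 or later; the input string and variable names are illustrative. In a real project the serialized string would be written to a file or cache rather than kept in a variable.

```php
<?php
// Assumption: Composer's autoloader is available at this path.
require 'vendor/autoload.php';

use PhpParser\ParserFactory;

$code = "<?php echo 'hi';";
$stmts = (new ParserFactory())->createForNewestSupportedVersion()->parse($code);

// Serialize the node array to a string that can be stored.
$blob = serialize($stmts);

// Later: restore the node objects from the stored string.
$restored = unserialize($blob);

// Loose comparison checks that both trees have the same classes and values.
var_dump($restored == $stmts);
```

Only call unserialize() on data that your own code produced, since unserializing untrusted input is unsafe in PHP.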

(2) Human-readable forms

These are pretty-printed dumps and XML serialization. They are not covered in detail here; consult the project's documentation when you need them:

https://github.com/nikic/PHP-Parser/blob/master/doc/3_Other_node_tree_representations.markdown
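
As a brief illustration of the pretty-printed dump, the sketch below uses the NodeDumper class, assuming PHP-Parser 4.18 or later. XML serialization is left out because, to my knowledge, it was dropped from later major releases.

```php
<?php
// Assumption: Composer's autoloader is available at this path.
require 'vendor/autoload.php';

use PhpParser\NodeDumper;
use PhpParser\ParserFactory;

$code = "<?php echo 'hi';";
$stmts = (new ParserFactory())->createForNewestSupportedVersion()->parse($code);

// dump() returns an indented, human-readable description of the tree.
$text = (new NodeDumper())->dump($stmts);
echo $text, PHP_EOL;
```

The dump is meant for reading and debugging; it names each node by type (for example Stmt_Echo) and lists its subnodes.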

6. Summary

At least for static analysis of PHP, PHP-Parser is much better than ANTLR in terms of functionality. When building an automated PHP auditing system, PHP-Parser will certainly play a role :)
