The direction of my recent project is becoming clearer, but I have not yet mastered many of the key technologies and can only explore them step by step.
Because the static code analysis is based on data-flow analysis, front-end work such as lexical analysis and syntax analysis is essential. I ruled out Yacc and Lex, and after a day of research found two better candidates: ANTLR, from the Java ecosystem, and PHP-Parser, which specializes in generating ASTs for PHP.
ANTLR is a well-known tool in the compiler field and more practical than Yacc and Lex. But I found only one PHP grammar file for it, and after half a day of fiddling to get it to build and run, it turned out to be a poor fit: for "$a = 1" it produced the tokens [$, a, =, 1] and could not recognize the assignment. The grammar is too rough, which was very disappointing.
By contrast, PHP-Parser is more professional; after all, it focuses on lexical and syntax analysis of PHP.
PHP-Parser's project homepage is https://github.com/nikic/PHP-Parser. It can parse multiple versions of PHP and produce an abstract syntax tree (AST).
For lexical analysis, PHP has a built-in function, token_get_all(), which returns the tokens that serve as input to parsing. PHP-Parser also uses the token stream generated by token_get_all().
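To see what that token stream looks like, here is a small sketch that uses only PHP's built-in token_get_all() and token_name(). It tokenizes the same kind of assignment that tripped up the ANTLR grammar; here the variable $a comes back as a single T_VARIABLE token.

```php
<?php
// Tokenize a one-line script with PHP's built-in lexer.
$tokens = token_get_all('<?php $a = 1;');

foreach ($tokens as $token) {
    if (is_array($token)) {
        // Array tokens have the form [token id, source text, line number].
        echo token_name($token[0]), ' ', var_export($token[1], true), PHP_EOL;
    } else {
        // Single-character tokens such as "=" and ";" are plain strings.
        echo 'CHAR ', var_export($token, true), PHP_EOL;
    }
}
```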
Installation is also very simple. I added it with Composer, PHP's package management tool, by running the following command in the project directory:
php composer.phar require nikic/php-parser
If you have not downloaded Composer yet, run the following command first:
curl -s https://getcomposer.org/installer | php
3. Generating the AST
After adding PHP-Parser with Composer, it is easy to use.
First, some of the node types defined in PHP-Parser:
(1) PhpParser\Node\Stmt is a statement node: a language construct that returns no value and cannot appear inside an expression, such as a class definition or an echo statement. An assignment like "$a = $b" is not a statement node; it is an expression (PhpParser\Node\Expr\Assign).
(2) PhpParser\Node\Expr is an expression node: a language construct that returns a value, such as $var and func().
(3) PhpParser\Node\Scalar is a scalar node that represents a literal value, such as 'string', 0, and magic constants like __FILE__.
(4) Some nodes belong to none of these groups, such as argument nodes (PhpParser\Node\Arg).
Some node class names end with an underscore (for example Echo_) to avoid conflicts with PHP keywords.
The PHP-Parser hello-world program parses a short script and generates an AST.
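A minimal sketch of such a program follows. It assumes a recent PHP-Parser release, where the parser is obtained from ParserFactory (older releases construct it differently), and that Composer's autoloader lives at vendor/autoload.php. The input string is my choice, picked to match the output shown below. Newer releases print different class names (for example String_) and more attributes than that listing.

```php
<?php
require 'vendor/autoload.php';   // assumed Composer autoloader path

use PhpParser\Error;
use PhpParser\ParserFactory;

// Input chosen to match the two string literals in the dumped AST.
$code = '<?php echo "1+2", "chongrui";';

$parser = (new ParserFactory())->createForNewestSupportedVersion();

try {
    // parse() returns an array of statement nodes.
    $stmts = $parser->parse($code);
    print_r($stmts);
} catch (Error $e) {
    // Syntax errors in the input are reported as PhpParser\Error.
    echo 'Parse error: ', $e->getMessage(), PHP_EOL;
}
```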
The output is:
Array
(
    [0] => PhpParser\Node\Stmt\Echo_ Object
        (
            [subNodes:protected] => Array
                (
                    [exprs] => Array
                        (
                            [0] => PhpParser\Node\Scalar\String Object
                                (
                                    [subNodes:protected] => Array
                                        (
                                            [value] => 1+2
                                        )
                                    [attributes:protected] => Array
                                        (
                                            [startLine] => 1
                                            [endLine] => 1
                                        )
                                )
                            [1] => PhpParser\Node\Scalar\String Object
                                (
                                    [subNodes:protected] => Array
                                        (
                                            [value] => chongrui
                                        )
                                    [attributes:protected] => Array
                                        (
                                            [startLine] => 1
                                            [endLine] => 1
                                        )
                                )
                        )
                )
            [attributes:protected] => Array
                (
                    [startLine] => 1
                    [endLine] => 1
                )
        )
)
As you can see, this AST has only one node, Echo_, and that node has a child node exprs, which can be accessed with $stmts[0]->exprs.
The attributes of a node store startLine and endLine as well as comments. They can be accessed with the getAttributes(), getAttribute('startLine'), setAttribute(), and hasAttribute() methods.
The start line number startLine can also be accessed through getLine()/setLine() (or getAttribute('startLine')). Doc comments can be obtained with getDocComment().
To access a value on a node, for example the value "chongrui", use $stmts[0]->exprs[1]->value.
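The accessors above can be sketched as follows, again assuming a recent PHP-Parser release with ParserFactory and Composer's autoloader at vendor/autoload.php. The doc comment in the input is my own addition so that getDocComment() has something to return.

```php
<?php
require 'vendor/autoload.php';   // assumed Composer autoloader path

use PhpParser\ParserFactory;

// Hypothetical input: the echo statement sits on line 3, below a doc comment.
$code = <<<'CODE'
<?php
/** Prints two strings. */
echo "1+2", "chongrui";
CODE;

$parser = (new ParserFactory())->createForNewestSupportedVersion();
$stmts = $parser->parse($code);

$echo = $stmts[0];                                   // the Echo_ statement node
echo $echo->exprs[1]->value, PHP_EOL;                // value of the second string
echo $echo->getAttribute('startLine'), PHP_EOL;      // start line via attribute
echo $echo->getLine(), PHP_EOL;                      // same line via getLine()
var_dump($echo->hasAttribute('startLine'));          // attribute presence check
echo $echo->getDocComment()->getText(), PHP_EOL;     // the attached doc comment
```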
4. Node traversal
Traversing the abstract syntax tree is convenient with the PhpParser\NodeTraverser class, which also supports custom visitor objects. In practice, when analyzing PHP source code, the specific structure of the AST is usually not known in advance, so the type of each node has to be determined dynamically during traversal.
These checks are collected in MyNodeVisitor, which extends the parent class NodeVisitorAbstract. That class provides the following methods:
(1) beforeTraverse() is called once before traversal starts, and is usually used to reset values before traversing.
(2) afterTraverse() is like (1); the only difference is that it is triggered after traversal.
(3) enterNode() and leaveNode() are triggered for each node visited.
enterNode() is triggered when a node is entered, that is, before its child nodes are visited. It can return NodeTraverser::DONT_TRAVERSE_CHILDREN to skip the node's children.
leaveNode() is triggered once a node has been fully traversed. It can return NodeTraverser::REMOVE_NODE, in which case the current node is removed. If an array of nodes is returned, those nodes are merged into the parent's array: for example, in array(A, B, C), replacing B with array(X, Y, Z) yields array(A, X, Y, Z, C).
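The two return values can be tried out with a sketch like the one below. StripEchoVisitor and the input are hypothetical names of mine, and the code assumes that the NodeTraverser constants named above are available in the installed release and that the parser comes from ParserFactory. The visitor removes echo statements but does not descend into function bodies, so the echo inside f() survives.

```php
<?php
require 'vendor/autoload.php';   // assumed Composer autoloader path

use PhpParser\Node;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;

// Hypothetical visitor: drops echo statements, but leaves function bodies alone.
class StripEchoVisitor extends NodeVisitorAbstract
{
    public function enterNode(Node $node)
    {
        if ($node instanceof Node\Stmt\Function_) {
            // Do not visit the children of function declarations.
            return NodeTraverser::DONT_TRAVERSE_CHILDREN;
        }
        return null;   // null keeps the node unchanged
    }

    public function leaveNode(Node $node)
    {
        if ($node instanceof Node\Stmt\Echo_) {
            // Remove this statement from its parent array.
            return NodeTraverser::REMOVE_NODE;
        }
        return null;
    }
}

$code = '<?php echo "a"; function f() { echo "b"; } $x = 1;';

$parser = (new ParserFactory())->createForNewestSupportedVersion();
$traverser = new NodeTraverser();
$traverser->addVisitor(new StripEchoVisitor());

// traverse() returns the (possibly modified) array of statements.
$stmts = array_values($traverser->traverse($parser->parse($code)));

foreach ($stmts as $stmt) {
    echo get_class($stmt), PHP_EOL;
}
```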
The following code fragment parses $code, generates an AST, and prints each node of string type that it meets during traversal.
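A sketch of such a fragment, under stated assumptions: the value of $code is hypothetical, chosen so that two string literals are visited, the string node class is String_ as in recent releases, and the parser comes from ParserFactory.

```php
<?php
require 'vendor/autoload.php';   // assumed Composer autoloader path

use PhpParser\Node;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;

// Prints the value of every string literal node it leaves.
class MyNodeVisitor extends NodeVisitorAbstract
{
    /** @var string[] values seen so far, kept for later inspection */
    public array $seen = [];

    public function leaveNode(Node $node)
    {
        if ($node instanceof Node\Scalar\String_) {
            $this->seen[] = $node->value;
            echo $node->value, PHP_EOL;
        }
        return null;   // leave the node unchanged
    }
}

$code = "<?php echo '1', '2';";   // hypothetical input

$parser = (new ParserFactory())->createForNewestSupportedVersion();
$visitor = new MyNodeVisitor();
$traverser = new NodeTraverser();
$traverser->addVisitor($visitor);
$traverser->traverse($parser->parse($code));
```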
The output is 1 and 2.
5. Other AST representations
Sometimes the AST needs to be persisted as text, and PHP-Parser supports this.
(1) Simple serialization
The AST can be persisted by serializing it with serialize() and restoring it with unserialize().
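A minimal round-trip sketch, assuming a recent PHP-Parser release with ParserFactory and an autoloader that can load the node classes when the data is restored. The file name ast.ser is hypothetical.

```php
<?php
require 'vendor/autoload.php';   // assumed Composer autoloader path

use PhpParser\ParserFactory;

$parser = (new ParserFactory())->createForNewestSupportedVersion();
$stmts = $parser->parse('<?php echo "1+2", "chongrui";');

// Persist the AST as a string; it could equally be written to a file,
// for example with file_put_contents('ast.ser', $blob).
$blob = serialize($stmts);

// Restore it later; the node classes must be autoloadable at this point.
$restored = unserialize($blob);

// Loose comparison checks that both trees have the same classes and values.
var_dump($restored == $stmts);
```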
(2) Human-readable forms
These are pretty printing and XML persistence. They are not covered in detail here; see the project's documentation when needed.
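As a brief illustration of the readable forms, the sketch below covers only pretty printing and node dumping, using the PrettyPrinter\Standard and NodeDumper classes. It assumes a recent PHP-Parser release with ParserFactory. The XML form is not shown.

```php
<?php
require 'vendor/autoload.php';   // assumed Composer autoloader path

use PhpParser\NodeDumper;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter\Standard;

$parser = (new ParserFactory())->createForNewestSupportedVersion();
$stmts = $parser->parse('<?php echo   "1+2",   "chongrui";');

// Turn the AST back into formatted PHP source, including the opening tag.
$source = (new Standard())->prettyPrintFile($stmts);
echo $source, PHP_EOL;

// Produce an indented, human-readable dump of the node tree.
$dump = (new NodeDumper())->dump($stmts);
echo $dump, PHP_EOL;
```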
At least for static analysis of PHP, PHP-Parser is far more capable than ANTLR. When it comes to building an automated PHP audit system, PHP-Parser will certainly play a very important role.