0x00
I have had nothing to update the blog with lately, so I am writing up what I have been working on, mainly the techniques used in my project. There are already quite a few automated PHP audit tools: RIPS and Pixy are open source, and Fortify is commercial. RIPS has only had its first version released, and because it does not support analysis of object-oriented PHP, its results are currently not ideal. Pixy is based on data flow analysis but supports only PHP 4. Fortify is a commercial product, which limits how far it can be studied, so I will not discuss it here. In China, research on automated PHP auditing is mostly done inside companies. Most current tools use simple token-stream analysis or, more crudely, regular-expression matching, and the results are mediocre.
0x01
Today's topic is an approach to implementing automated PHP auditing based on static analysis, which is also the approach of my project. To analyze variables and taint more effectively, and to cope with the many flexible syntax forms in PHP scripts, regular expressions are certainly not good enough. The approach I am introducing audits code using static code analysis and data flow analysis.
First, I think an effective audit tool includes at least the following modules:
1. Compiler front-end module
The compiler front-end module mainly uses abstract syntax tree (AST) and control flow graph (CFG) construction from compiler technology to turn source files into a form suitable for back-end static analysis.
2. Global information collection module
This module collects information about the source files under analysis in one place, for example how many classes the audited project defines, and each class's method names, parameters, and the start and end line numbers of each method's code block, to speed up subsequent static analysis.
3. Data flow analysis module
This module differs from the data flow analysis algorithms of compiler technology in that it pays more attention to handling the features of the PHP language itself. When a call to a sensitive function is found during intraprocedural or interprocedural analysis, the sensitive parameters of that function are analyzed, that is, the concrete changes to the variable are tracked, in preparation for the subsequent taint analysis.
4. Vulnerability code analysis module
This module performs taint analysis using the global variables, assignment statements and other information collected by the data flow analysis module. It focuses on the risky parameters of sensitive sinks, such as the first parameter of mysql_query, and obtains the corresponding data flow information by backtracking. If backtracking shows that the parameter is user-controllable, this is recorded. If the risky parameter has been encoded or sanitized, that is recorded as well. Taint analysis is completed by tracking and analyzing the data behind these risky parameters.
0x02
With these modules in place, how do we organize them into an effective automated audit? The analysis system roughly works as follows:
1. Framework initialization
First, the analysis framework is initialized. This mainly means collecting information about all user-defined classes in the source project to be analyzed, including class names, class properties, class method names, and the file path of each class.
These records are stored in a global context class, which is designed as a singleton and stays in memory for use by the subsequent analysis.
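The singleton context described above might look roughly like the following. This is a minimal sketch, not the project's actual code: the names GlobalContext and ClassInfo, their fields, and the sample class are all hypothetical.

```php
<?php
// Hypothetical sketch: GlobalContext and ClassInfo are illustrative names,
// not taken from the project described in this post.

final class ClassInfo
{
    public $name;
    public $filePath;
    public $properties; // list of property names
    public $methods;    // method name => ['start' => int, 'end' => int]

    public function __construct(string $name, string $filePath, array $properties, array $methods)
    {
        $this->name = $name;
        $this->filePath = $filePath;
        $this->properties = $properties;
        $this->methods = $methods;
    }
}

final class GlobalContext
{
    private static $instance = null;
    private $classes = [];

    // Private constructor: the only way to get an instance is getInstance().
    private function __construct() {}

    public static function getInstance(): GlobalContext
    {
        if (self::$instance === null) {
            self::$instance = new GlobalContext();
        }
        return self::$instance;
    }

    public function addClass(ClassInfo $info): void
    {
        // PHP class names are case-insensitive, so normalize the key.
        $this->classes[strtolower($info->name)] = $info;
    }

    public function findClass(string $name): ?ClassInfo
    {
        return $this->classes[strtolower($name)] ?? null;
    }
}

$ctx = GlobalContext::getInstance();
$ctx->addClass(new ClassInfo('UserModel', 'app/models/UserModel.php', ['db'], [
    'findById' => ['start' => 12, 'end' => 20],
]));

// Any later analysis stage can reach the same data without passing it around.
$found = GlobalContext::getInstance()->findClass('usermodel');
echo $found->filePath, "\n";
```

Storing the start and end lines per method is what later lets interprocedural analysis jump straight to a method body instead of re-scanning the file.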
2. Identifying main files
Next, determine whether each PHP file is a main file. PHP has no main function. Most PHP files in a web application fall into two kinds: definition files and calling files. Definition files define business classes, utility classes, utility functions and so on; users do not access them directly, and they exist to be used by calling files. Calling files, such as a global index.php, are the ones that actually handle user requests. Static analysis mainly targets these calling files that handle user requests, that is, the main files. The criterion is:
After AST parsing is complete, check whether the lines of code taken up by class and method definitions in a PHP file exceed a certain proportion of all lines in the file. If they do, the file is treated as a definition file; otherwise it is a main file and is added to the list of files to be analyzed.
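As a minimal sketch of this criterion: the function name isMainFile, the input shape, and the 0.6 threshold below are assumptions for illustration, since the post does not give the actual proportion. The line ranges are expected to come from the AST and to be non-overlapping top-level definitions.

```php
<?php
// Hypothetical helper; the 0.6 threshold is an assumed value, not from the post.

/**
 * @param array $definitionRanges list of [startLine, endLine] pairs for the
 *                                top-level class and function definitions
 * @param int   $totalLines       total number of lines in the file
 */
function isMainFile(array $definitionRanges, int $totalLines, float $threshold = 0.6): bool
{
    if ($totalLines <= 0) {
        return false; // empty file: nothing to analyze
    }
    $definitionLines = 0;
    foreach ($definitionRanges as [$start, $end]) {
        $definitionLines += $end - $start + 1;
    }
    // Mostly definitions => definition file; otherwise treat it as a main file.
    return ($definitionLines / $totalLines) < $threshold;
}

// A 40-line index.php with one 5-line helper function: 5/40 of the file.
var_dump(isMainFile([[3, 7]], 40));    // bool(true)

// A 100-line file that is almost entirely one class: 96/100 of the file.
var_dump(isMainFile([[3, 98]], 100));  // bool(false)
```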
3. AST (abstract syntax tree) construction
This project is itself written in PHP. For AST construction we rely on PHP Parser, currently one of the better AST implementations for PHP.
That open-source project is also written in PHP and can parse most PHP constructs, such as if, while, switch, array declarations, method calls, and global variables. It covers the compiler front-end work of this project well.
4. CFG (control flow graph) construction
The CFG is built by the Cfgbuilder method in the Cfggenerator class.
The idea is to build the CFG recursively. The input is the set of nodes obtained by traversing the AST. Each element (node) of the set is examined for its type, such as branch, jump, or end statement, and the CFG is built according to the node type.
The jump conditions of branch statements and loop statements are stored on the edges of the CFG to facilitate data flow analysis.
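The recursive idea can be illustrated with a small self-contained sketch. It is not the Cfgbuilder method itself: the class CfgSketch, the array-based node format, and the string conditions are all hypothetical stand-ins for real AST nodes, and only plain statements, if/else and return are handled (loops are left out).

```php
<?php
// Hypothetical sketch of recursive CFG construction over a simplified AST.
// Node format (assumed): ['type' => 'stmt'|'return', 'code' => string] or
// ['type' => 'if', 'cond' => string, 'then' => [...], 'else' => [...]].

final class CfgSketch
{
    public $blocks = []; // block id => list of statements
    public $edges = [];  // list of ['from' => int, 'to' => int, 'cond' => ?string]

    public function newBlock(): int
    {
        $this->blocks[] = [];
        return count($this->blocks) - 1;
    }

    public function addEdge(int $from, int $to, ?string $cond = null): void
    {
        $this->edges[] = ['from' => $from, 'to' => $to, 'cond' => $cond];
    }

    // Returns the block where control continues, or null if the path ended.
    public function build(array $nodes, int $current): ?int
    {
        foreach ($nodes as $node) {
            switch ($node['type']) {
                case 'if':
                    $join = $this->newBlock();
                    $then = $this->newBlock();
                    // The branch condition is stored on the edge.
                    $this->addEdge($current, $then, $node['cond']);
                    $thenEnd = $this->build($node['then'], $then);
                    if ($thenEnd !== null) {
                        $this->addEdge($thenEnd, $join);
                    }
                    $negated = '!(' . $node['cond'] . ')';
                    if (!empty($node['else'])) {
                        $else = $this->newBlock();
                        $this->addEdge($current, $else, $negated);
                        $elseEnd = $this->build($node['else'], $else);
                        if ($elseEnd !== null) {
                            $this->addEdge($elseEnd, $join);
                        }
                    } else {
                        $this->addEdge($current, $join, $negated);
                    }
                    $current = $join;
                    break;
                case 'return':
                    $this->blocks[$current][] = $node['code'];
                    return null; // later statements on this path are unreachable
                default:
                    $this->blocks[$current][] = $node['code'];
            }
        }
        return $current;
    }
}

$ast = [
    ['type' => 'stmt', 'code' => '$id = $_GET["id"];'],
    ['type' => 'if', 'cond' => 'is_numeric($id)',
        'then' => [['type' => 'stmt', 'code' => '$id = (int)$id;']],
        'else' => [['type' => 'return', 'code' => 'return;']]],
    ['type' => 'stmt', 'code' => 'mysql_query($sql . $id);'],
];

$cfg = new CfgSketch();
$entry = $cfg->newBlock();
$exit = $cfg->build($ast, $entry);
print_r($cfg->edges);
```

Keeping the condition on the edge means a later pass can tell that the query block is only reached on the is_numeric($id) path, since the else branch returns.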
5. Collecting data flow information
For a block of code, the information most worth collecting is assignment statements, function calls, constants (const, define), and variable registration (extract, parse_str).
Assignment statements are what later variable tracking follows. In the implementation I use a structure to record the value and location of each assignment. The other data is identified and obtained from the AST. For example, for a function call, determine whether the variable has been escaped, encoded and so on, or whether the called function is a sink (such as mysql_query).
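A minimal sketch of such a per-block summary follows. The statement format, the field names, and the SINKS and REGISTERING tables are assumptions made for illustration; in a real tool these records would be filled in while walking AST nodes.

```php
<?php
// Hypothetical tables: sink name => index of its risky argument.
const SINKS = ['mysql_query' => 0, 'exec' => 0];
// Functions that can create variables from data at run time.
const REGISTERING = ['extract', 'parse_str'];

// Collects the data flow facts of one code block from simplified statements.
function summarizeBlock(array $statements): array
{
    $summary = [
        'assignments' => [],
        'sinkCalls' => [],
        'constants' => [],
        'registersVariables' => false,
    ];
    foreach ($statements as $s) {
        switch ($s['kind']) {
            case 'assign':
                // Value (source variables, wrapping call) plus location.
                $summary['assignments'][] = [
                    'target' => $s['target'],
                    'sources' => $s['sources'],
                    'call' => $s['call'] ?? null,
                    'line' => $s['line'],
                ];
                break;
            case 'define':
                $summary['constants'][$s['name']] = $s['value'];
                break;
            case 'call':
                $name = strtolower($s['name']);
                if (in_array($name, REGISTERING, true)) {
                    $summary['registersVariables'] = true;
                } elseif (array_key_exists($name, SINKS)) {
                    $summary['sinkCalls'][] = [
                        'name' => $name,
                        'riskyArg' => $s['args'][SINKS[$name]] ?? null,
                        'line' => $s['line'],
                    ];
                }
                break;
        }
    }
    return $summary;
}

$summary = summarizeBlock([
    ['kind' => 'define', 'name' => 'TABLE', 'value' => 'users'],
    ['kind' => 'assign', 'target' => '$id', 'sources' => ['$_GET'], 'line' => 3],
    ['kind' => 'assign', 'target' => '$sql', 'sources' => ['$id'], 'line' => 4],
    ['kind' => 'call', 'name' => 'mysql_query', 'args' => ['$sql'], 'line' => 5],
]);
print_r($summary['sinkCalls']);
```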
6. Handling variable sanitization and encoding information
$clearsql = addslashes($sql);
In an assignment statement, when the right-hand side is a filter function (user-defined or built-in), the return value of the call is considered sanitized. In the example above, addslashes is added to the sanitization labels of $clearsql.
When a function call is found, check whether the function name is a security function listed in the configuration file.
If it is, the sanitization label is added to the symbol at that location.
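The labeling step can be sketched as follows. The configuration array, the function names tagAssignment and isSanitizedFor, and the vulnerability class names are hypothetical. As a simplification, labels already attached to the right-hand-side variable are not propagated.

```php
<?php
// Assumed configuration: sanitizer name => vulnerability classes it covers.
$securityFunctions = [
    'addslashes'       => ['sqli'],
    'htmlspecialchars' => ['xss'],
    'intval'           => ['sqli', 'xss'],
];

// Records the sanitization labels for the variable on the left-hand side.
function tagAssignment(array $symbols, string $target, ?string $calledFunction, array $config): array
{
    $labels = [];
    if ($calledFunction !== null) {
        // PHP function names are case-insensitive, so compare in lower case.
        $name = strtolower($calledFunction);
        if (isset($config[$name])) {
            $labels[] = $name;
        }
    }
    // An assignment overwrites the variable, so its old labels are replaced.
    $symbols[$target] = ['sanitizedBy' => $labels];
    return $symbols;
}

// True if one of the variable's labels covers the given vulnerability class.
function isSanitizedFor(array $symbols, string $var, string $category, array $config): bool
{
    foreach ($symbols[$var]['sanitizedBy'] ?? [] as $name) {
        if (in_array($category, $config[$name], true)) {
            return true;
        }
    }
    return false;
}

$symbols = tagAssignment([], '$clearsql', 'addslashes', $securityFunctions);
var_dump(isSanitizedFor($symbols, '$clearsql', 'sqli', $securityFunctions)); // bool(true)
var_dump(isSanitizedFor($symbols, '$clearsql', 'xss', $securityFunctions));  // bool(false)
```

Keeping the label per vulnerability class matters: addslashes says nothing about whether the same value is safe to echo into HTML.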
7. Interprocedural analysis
If a call to a user-defined function is found during the audit, interprocedural analysis is required: locate the code block of the specific method in the project under analysis, and carry the variables into it for analysis.
The difficulties are how to backtrack variables, how to handle methods with the same name in different files, how to support call analysis of class methods, how to save user-defined sinks (for example, if myexec calls exec without effective sanitization, then myexec itself is treated as a dangerous function), and how to categorize user-defined sinks (SQLi, XSS, XPath and so on).
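For the user-defined sink case only, one possible shape is sketched below. This is an assumption-laden illustration rather than the project's actual flow: the function summarizeUserFunction, the statement format, and both tables are hypothetical, the body is treated as straight-line code, and at most one sink is recorded per function.

```php
<?php
// Hypothetical tables: built-in sinks and the classes each sanitizer covers.
$sinks = [
    'exec'        => ['arg' => 0, 'category' => 'command'],
    'mysql_query' => ['arg' => 0, 'category' => 'sqli'],
];
$sanitizers = [
    'escapeshellarg' => ['command'],
    'addslashes'     => ['sqli'],
];

// Returns a sink entry for $fn if one of its parameters reaches a known
// sink without a matching sanitizer, otherwise an empty array.
function summarizeUserFunction(array $fn, array $sinks, array $sanitizers): array
{
    // variable => which parameter it came from and which calls it went through
    $state = [];
    foreach ($fn['params'] as $index => $param) {
        $state[$param] = ['param' => $index, 'calls' => []];
    }
    $found = [];
    foreach ($fn['body'] as $stmt) {
        if ($stmt['kind'] === 'assign') {
            if (isset($state[$stmt['source']])) {
                $info = $state[$stmt['source']];
                if ($stmt['call'] !== null) {
                    $info['calls'][] = strtolower($stmt['call']);
                }
                $state[$stmt['target']] = $info;
            } else {
                unset($state[$stmt['target']]); // no longer parameter-derived
            }
        } elseif ($stmt['kind'] === 'call') {
            $name = strtolower($stmt['name']);
            if (!isset($sinks[$name])) {
                continue;
            }
            $arg = $stmt['args'][$sinks[$name]['arg']] ?? null;
            if ($arg === null || !isset($state[$arg])) {
                continue;
            }
            $category = $sinks[$name]['category'];
            $protected = false;
            foreach ($state[$arg]['calls'] as $call) {
                if (in_array($category, $sanitizers[$call] ?? [], true)) {
                    $protected = true;
                }
            }
            if (!$protected) {
                $found[strtolower($fn['name'])] = [
                    'arg' => $state[$arg]['param'],
                    'category' => $category,
                ];
            }
        }
    }
    return $found;
}

// function myexec($cmd) { $c = trim($cmd); exec($c); }
$myexec = ['name' => 'myexec', 'params' => ['$cmd'], 'body' => [
    ['kind' => 'assign', 'target' => '$c', 'source' => '$cmd', 'call' => 'trim'],
    ['kind' => 'call', 'name' => 'exec', 'args' => ['$c']],
]];

// Merge the result so that later call sites treat myexec as a sink too.
$sinks = array_merge($sinks, summarizeUserFunction($myexec, $sinks, $sanitizers));
print_r($sinks['myexec']);
```

Because the result has the same shape as a built-in sink entry and carries a category, a function that wraps myexec can be summarized the same way on a later pass.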
8. Taint analysis
After the steps above, the last thing to do is taint analysis, mainly against the risky functions built into the system, such as echo, which can cause XSS. This means analyzing the risky parameters of dangerous functions, for example whether they have been effectively sanitized (escaping, regular expression matching and so on), and designing an algorithm that backtracks the earlier assignments or other transformations of the variable. This is undoubtedly a test of a security researcher's engineering ability, and it is the most important stage of an automated audit.
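Backtracking can be sketched over a flat list of assignment records. Everything here is a simplified assumption: the record format, the names backtrack and isVulnerable, the USER_SOURCES list, and the fact that only a single straight-line block is considered, with no branches or function calls.

```php
<?php
// Assumed list of user-controlled sources.
const USER_SOURCES = ['$_GET', '$_POST', '$_COOKIE', '$_REQUEST'];

// Walks backwards from $var to user-controlled sources. Returns one entry
// per path found, with the function calls the value passed through.
function backtrack(string $var, int $beforeLine, array $assignments, array $calls = []): array
{
    if (in_array($var, USER_SOURCES, true)) {
        return [['source' => $var, 'calls' => $calls]];
    }
    // Find the latest assignment to $var before the given line.
    $latest = null;
    foreach ($assignments as $a) {
        if ($a['target'] === $var && $a['line'] < $beforeLine
            && ($latest === null || $a['line'] > $latest['line'])) {
            $latest = $a;
        }
    }
    if ($latest === null) {
        return []; // no reaching assignment in this block
    }
    if ($latest['call'] !== null) {
        $calls[] = strtolower($latest['call']);
    }
    $paths = [];
    foreach ($latest['sources'] as $source) {
        // The line bound strictly decreases, so the recursion terminates.
        $paths = array_merge($paths, backtrack($source, $latest['line'], $assignments, $calls));
    }
    return $paths;
}

// A sink argument is reported if any path reaches it without a sanitizer
// that covers the given vulnerability class.
function isVulnerable(array $paths, string $category, array $sanitizers): bool
{
    foreach ($paths as $path) {
        $protected = false;
        foreach ($path['calls'] as $call) {
            if (in_array($category, $sanitizers[$call] ?? [], true)) {
                $protected = true;
                break;
            }
        }
        if (!$protected) {
            return true;
        }
    }
    return false;
}

$sanitizers = ['addslashes' => ['sqli']];

// $id = $_GET['id']; $sql = "..." . $id; mysql_query($sql) on line 3
$assignments = [
    ['target' => '$id',  'sources' => ['$_GET'], 'call' => null, 'line' => 1],
    ['target' => '$sql', 'sources' => ['$id'],   'call' => null, 'line' => 2],
];
$paths = backtrack('$sql', 3, $assignments);
var_dump(isVulnerable($paths, 'sqli', $sanitizers)); // bool(true)
```

If line 1 were instead recorded with 'call' => 'addslashes', the same query would come back as not vulnerable for the sqli class, while still being flagged for a class that addslashes does not cover.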
0x03
As the above shows, there are many pitfalls in implementing your own automated audit tool. I ran into a lot of difficulties in my own attempt, and static analysis does have inherent limitations. For example, the result of a string transformation is easy to obtain with dynamic analysis but hard to obtain statically. This is not something a technical breakthrough can fix; it is a limitation of static analysis itself. So if pure static analysis is to keep both false positives and false negatives low, it ultimately has to borrow some dynamic ideas, such as simulating the code passed to eval and handling string transformation functions and regular expressions. There are also applications built on MVC frameworks such as the CI framework, where the code is highly dispersed; for example, the data sanitization code lives in extensions of the input class. For PHP applications like these I think a general audit framework is hard to achieve, and they should be handled separately.
The above is only a rough outline of my current attempt (not yet fully implemented). I am a university student, not a professional, so I hope this serves as a starting point that draws more security researchers to this area.