On PHP Automated Code Auditing Techniques

Source: Internet
Author: User

0x00

Since I have nothing else to update the blog with, I will summarize what I am currently working on as a post, mainly covering some of the techniques used in my project. There are already many automated PHP audit tools: RIPS and Pixy are open source, and Fortify is commercial. RIPS currently exists only in its first version, and because it does not support analysis of object-oriented PHP, its results are not ideal. Pixy is based on data flow analysis but supports only PHP 4. Fortify is a commercial product, which makes studying it impractical. In China, research on automated PHP auditing is mostly done inside companies, and most of the resulting tools rely on simple token stream analysis or, more crudely, on regular expression matching, so their results are mediocre.

0x01

What I want to discuss today is an approach to implementing automated PHP auditing based on static analysis, which is also the approach taken in my project. Regular expressions are certainly not good enough if you want to perform variable-based taint analysis efficiently and cope with the many flexible syntax forms found in PHP scripts. The approach I introduce here is instead built on static code analysis and data flow analysis.

First, I think an effective audit tool contains at least the following modules:

1. Compiler front-end module

The compiler front-end module applies abstract syntax tree (AST) construction and control flow graph (CFG) construction from compiler technology to turn source files into a form suitable for back-end static analysis.

2. Global information collection module

This module collects information across all source files in one pass, such as how many classes the audited project defines, the method names and parameters in each class, and the start and end line numbers of each method's code block. This information speeds up the subsequent static analysis.

3. Data flow analysis module

This module differs from the data flow analysis algorithms of compiler technology in that it pays more attention to the characteristics of the PHP language itself. When a call to a sensitive function is found during inter-procedural or intra-procedural analysis, data flow analysis is performed on the sensitive parameters of that function, tracing how the variable changes in preparation for the subsequent taint analysis.

4. Vulnerability code analysis module

This module performs taint analysis based on the global variable and assignment statement information collected by the data flow analysis module. It focuses on the dangerous parameters of sensitive sinks, such as the first parameter of the mysql_query function, and backtracks to obtain the corresponding data flow information. If backtracking shows that the parameter is user-controlled, this is recorded. If the dangerous parameter has been encoded or sanitized, that is recorded as well. Taint analysis is completed by tracking and analyzing the data of the dangerous parameters in this way.
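To make the notion of a "dangerous parameter in a sensitive sink" concrete, here is a minimal sketch of how a sink table might be expressed. The constant name SINKS, the category labels, and the helper dangerousArgs are assumptions made for illustration; the original post does not show its configuration format.

```php
<?php
// Hypothetical sink table: function name => vulnerability category and the
// 0-based positions of the parameters that must not carry tainted data.
const SINKS = [
    'mysql_query' => ['category' => 'SQLI', 'params' => [0]],
    'exec'        => ['category' => 'EXEC', 'params' => [0]],
    'echo'        => ['category' => 'XSS',  'params' => [0]],
];

// Return the arguments of a call that sit in a dangerous position.
// PHP function names are case-insensitive, so the lookup is lowercased.
function dangerousArgs(string $function, array $args): array
{
    $sink = SINKS[strtolower($function)] ?? null;
    if ($sink === null) {
        return []; // not a known sink
    }
    $picked = [];
    foreach ($sink['params'] as $index) {
        if (array_key_exists($index, $args)) {
            $picked[] = $args[$index];
        }
    }
    return $picked;
}

// Only the first argument of mysql_query is selected for backtracking.
print_r(dangerousArgs('Mysql_query', ['$sql', '$link']));
```

Keeping the parameter positions in data rather than in code means new sinks can be added without touching the analysis itself.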

0x02

With these modules in place, how do they fit together into an effective automated audit? The approximate flow of my analysis system is as follows:

1. Framework initialization

First, the analysis framework initializes itself by collecting information about all user-defined classes in the source project, including each class name, its attributes, its method names, and the path of the file in which the class resides.

These records are stored in the global context class, Context, which is designed as a singleton and stays in memory for use by the subsequent analysis.
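A minimal sketch of such a singleton context follows. The method names addClass and findClass and the layout of the stored summary are assumptions for illustration, not the project's actual interface.

```php
<?php
// Hypothetical global context: a singleton that stores class summaries.
final class Context
{
    private static ?Context $instance = null;

    /** @var array<string, array> lowercased class name => summary */
    private array $classes = [];

    private function __construct() {} // block direct instantiation

    public static function getInstance(): Context
    {
        if (self::$instance === null) {
            self::$instance = new Context();
        }
        return self::$instance;
    }

    public function addClass(string $name, array $properties, array $methods, string $path): void
    {
        // Class names are case-insensitive in PHP, so the key is lowercased.
        $this->classes[strtolower($name)] = [
            'name'       => $name,
            'properties' => $properties,
            'methods'    => $methods,
            'path'       => $path,
        ];
    }

    public function findClass(string $name): ?array
    {
        return $this->classes[strtolower($name)] ?? null;
    }
}

Context::getInstance()->addClass('UserModel', ['db'], ['findById'], 'models/UserModel.php');
$summary = Context::getInstance()->findClass('usermodel');
echo $summary['path'], PHP_EOL;
```

Every later analysis step can call Context::getInstance() and see the same records without passing the object around.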

2. Identifying main files

Second, each PHP file is checked to see whether it is a main file. PHP has no main function as such. In most web applications, PHP files fall into two types: invocation and definition. Definition-type files define business classes, utility classes, utility functions, and so on; they are not accessed by users directly but are called from invocation-type files. The files that actually handle user requests are the invocation-type files, such as the global index.php. Static analysis mainly targets these request-handling, invocation-type files, that is, the main files. The judgment is made as follows:

After AST parsing is complete, check whether the lines of code inside class and method definitions exceed a certain proportion of all lines of code in the file. If they do, the file is treated as a definition-type file; otherwise it is a main file and is added to the list of file names to be analyzed.
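The proportion check can be sketched as follows. The function name isMainFile, the input format (line ranges of definitions, as would be read from AST node positions), and the 0.5 threshold are all assumptions; the post does not state which threshold it uses.

```php
<?php
// Decide whether a file is a "main file" from the share of its lines that
// sit inside class or function definitions. The 0.5 threshold is an assumption.
function isMainFile(int $totalLines, array $definitionRanges, float $threshold = 0.5): bool
{
    if ($totalLines <= 0) {
        return false; // an empty file has nothing to analyze
    }
    $covered = [];
    foreach ($definitionRanges as [$start, $end]) {
        for ($line = $start; $line <= $end; $line++) {
            $covered[$line] = true; // a set, so nested ranges are not counted twice
        }
    }
    return count($covered) / $totalLines <= $threshold;
}

// index.php: 40 lines, one small helper function on lines 5-10.
var_dump(isMainFile(40, [[5, 10]]));
// UserModel.php: 100 lines, a class on lines 3-98 containing a method on lines 10-30.
var_dump(isMainFile(100, [[3, 98], [10, 30]]));
```

The first file is classified as a main file and the second as a definition file, because 96 of its 100 lines belong to a class definition.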

3. AST (abstract syntax tree) construction

This project is itself developed in PHP. For AST construction, we rely on an excellent existing PHP AST implementation: PHP Parser.

This open source project is also written in PHP and can parse most PHP syntax structures, such as if, while, switch, array declarations, method invocations, and global variables. It handles the compiler front-end part of this project very well.

4. CFG (control flow graph) construction

The CFG is built by the cfgbuilder method in the Cfggenerator class.

The basic idea is as follows. The input is the collection of nodes obtained by traversing the AST. The method iterates over the elements (nodes) of this collection and builds the CFG from them.

For branch statements and loop statements, the jump conditions are stored on the edges of the CFG, which makes the later data flow analysis easier.
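The idea of walking a node collection and storing branch conditions on edges can be sketched on a toy statement format. This is not the project's cfgbuilder and does not use PHP Parser node classes; the Cfg class, the buildCfg function, and the array-based node format are assumptions, and only if/else is handled (loops are left out).

```php
<?php
// Toy CFG builder over a simplified statement list (not PHP Parser nodes).
// Blocks hold straight-line statements; edges carry the branch condition.
final class Cfg
{
    public array $blocks = []; // block id => list of statement strings
    public array $edges = [];  // list of [from, to, condition or null]

    public function newBlock(): int
    {
        $this->blocks[] = [];
        return count($this->blocks) - 1;
    }

    public function addEdge(int $from, int $to, ?string $condition = null): void
    {
        $this->edges[] = [$from, $to, $condition];
    }
}

// Append $nodes to the graph starting in block $current and return the
// block in which control continues after the last node.
function buildCfg(Cfg $cfg, array $nodes, int $current): int
{
    foreach ($nodes as $node) {
        if ($node['type'] === 'if') {
            $then = $cfg->newBlock();
            $else = $cfg->newBlock();
            $join = $cfg->newBlock();
            // The jump condition is stored on the outgoing edges.
            $cfg->addEdge($current, $then, $node['cond']);
            $cfg->addEdge($current, $else, '!(' . $node['cond'] . ')');
            $cfg->addEdge(buildCfg($cfg, $node['then'], $then), $join);
            $cfg->addEdge(buildCfg($cfg, $node['else'] ?? [], $else), $join);
            $current = $join;
        } else {
            $cfg->blocks[$current][] = $node['code'];
        }
    }
    return $current;
}

$nodes = [
    ['type' => 'stmt', 'code' => '$id = $_GET["id"];'],
    ['type' => 'if', 'cond' => 'is_numeric($id)',
        'then' => [['type' => 'stmt', 'code' => '$id = intval($id);']],
        'else' => [['type' => 'stmt', 'code' => '$id = 0;']]],
    ['type' => 'stmt', 'code' => 'mysql_query("SELECT * FROM t WHERE id=" . $id);'],
];
$cfg = new Cfg();
$exit = buildCfg($cfg, $nodes, $cfg->newBlock());
foreach ($cfg->edges as [$from, $to, $condition]) {
    echo "$from -> $to", $condition === null ? '' : " [$condition]", PHP_EOL;
}
```

Because each edge records the condition under which it is taken, a later pass can tell that the then-branch is only reached when is_numeric($id) holds.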

5. Collection of data flow information

Data flow information is collected from assignment statements and from constructs that define or change values (const, define, extract, parse_str).

Assignment statements are collected so that variables can be traced later. In my implementation, I use a structure that records the assigned value and its location. Other data flow information is identified and obtained from the AST. For example, for a function call, the analysis determines whether the variable has been escaped or encoded, or whether the called function is a sink (such as mysql_query).
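A structure that records the assigned value and its location might look like the sketch below. The class names AssignRecord and DataFlowTable and their fields are assumptions, and the lookup ignores branches: it simply returns the latest earlier assignment by line number.

```php
<?php
// Hypothetical record for one assignment: the variable written, the
// expression assigned to it, and the location of the statement.
final class AssignRecord
{
    public string $target;
    public string $value;
    public string $file;
    public int $line;

    public function __construct(string $target, string $value, string $file, int $line)
    {
        $this->target = $target;
        $this->value = $value;
        $this->file = $file;
        $this->line = $line;
    }
}

final class DataFlowTable
{
    /** @var AssignRecord[] */
    private array $records = [];

    public function add(AssignRecord $record): void
    {
        $this->records[] = $record;
    }

    // Latest assignment to $variable strictly before $line, or null.
    public function lastDefinition(string $variable, int $line): ?AssignRecord
    {
        $best = null;
        foreach ($this->records as $record) {
            if ($record->target === $variable && $record->line < $line
                && ($best === null || $record->line > $best->line)) {
                $best = $record;
            }
        }
        return $best;
    }
}

$table = new DataFlowTable();
$table->add(new AssignRecord('$id', '$_GET["id"]', 'index.php', 3));
$table->add(new AssignRecord('$sql', '"SELECT * FROM users WHERE id = " . $id', 'index.php', 5));
$table->add(new AssignRecord('$id', 'intval($id)', 'index.php', 8));

$def = $table->lastDefinition('$sql', 7);
echo $def->value, ' (', $def->file, ':', $def->line, ')', PHP_EOL;
```

Storing the line with each record is what allows backtracking to pick the definition that is actually in effect at the point of a sink call.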

6. Processing variable sanitization and encoding information

$clearsql = addslashes($sql);

In an assignment statement whose right-hand side is a filter function (user-defined or built-in), the return value of the call is treated as sanitized. Here, the addslashes sanitization tag is added to $clearsql.

When a function call is found, the analysis checks whether the function name is one of the security functions listed in the configuration file.

If it is, the sanitization tag is added to the symbol at that location.
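The tagging step can be sketched as follows. The SANITIZERS list stands in for the configuration file, and the Symbol class and tagIfSanitized function are hypothetical names introduced for this example.

```php
<?php
// Hypothetical list of security functions, as it might appear in a config file.
const SANITIZERS = ['addslashes', 'mysql_real_escape_string', 'htmlspecialchars', 'intval'];

final class Symbol
{
    public string $name;
    public array $sanitizers = []; // sanitization tags attached so far

    public function __construct(string $name)
    {
        $this->name = $name;
    }
}

// Handle `$target = $function(...)`: attach a tag to the target symbol when
// $function is a configured security function. Returns true if it was one.
function tagIfSanitized(Symbol $target, string $function): bool
{
    $function = strtolower($function); // PHP function names are case-insensitive
    if (!in_array($function, SANITIZERS, true)) {
        return false;
    }
    if (!in_array($function, $target->sanitizers, true)) {
        $target->sanitizers[] = $function;
    }
    return true;
}

// $clearsql = addslashes($sql);
$clearsql = new Symbol('$clearsql');
tagIfSanitized($clearsql, 'Addslashes');
print_r($clearsql->sanitizers);
```

Recording which function sanitized the value, rather than a plain yes/no flag, lets the later taint analysis decide whether that particular sanitizer is adequate for the sink in question.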

7. Inter-procedural analysis

If a call to a user-defined function is found during the audit, inter-procedural analysis is required: the analysis locates the code block of the called method within the project and continues the analysis there, carrying the variables along.

The difficulties are how to backtrack variables, how to deal with identically named methods in different files, how to support call analysis for class methods, how to record user-defined sinks (for example, if myexec calls the exec function without effective sanitization, then myexec is also considered a dangerous function), and how to categorize user-defined sinks (SQLi, XSS, XPath, and so on).
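One way to record user-defined sinks such as myexec is to promote any user function whose parameter reaches a dangerous parameter of a known sink without sanitization. The sketch below works on hand-written function summaries; the summary format, the function promoteUserSinks, and the names safeexec and runjob are assumptions for illustration. A function that wraps sinks of several categories would need a richer record than the single category kept here.

```php
<?php
// Built-in sinks: function name => category and dangerous 0-based parameter positions.
$sinks = [
    'exec'        => ['category' => 'EXEC', 'params' => [0]],
    'mysql_query' => ['category' => 'SQLI', 'params' => [0]],
];

// Hypothetical summaries of user functions. For each call made inside the
// function, 'args' maps an argument position to the index of the user
// function's own parameter that flows into it, and whether it was sanitized.
$userFunctions = [
    'myexec' => ['calls' => [
        ['callee' => 'exec', 'args' => [0 => ['param' => 0, 'sanitized' => false]]],
    ]],
    'safeexec' => ['calls' => [
        ['callee' => 'exec', 'args' => [0 => ['param' => 0, 'sanitized' => true]]],
    ]],
    'runjob' => ['calls' => [
        ['callee' => 'myexec', 'args' => [0 => ['param' => 1, 'sanitized' => false]]],
    ]],
];

// Promote user functions to sinks until nothing changes (a fixed point), so
// wrappers of wrappers such as runjob -> myexec -> exec are caught too.
function promoteUserSinks(array $sinks, array $userFunctions): array
{
    do {
        $changed = false;
        foreach ($userFunctions as $name => $summary) {
            foreach ($summary['calls'] as $call) {
                $sink = $sinks[$call['callee']] ?? null;
                if ($sink === null) {
                    continue; // callee is not a sink (yet)
                }
                foreach ($sink['params'] as $position) {
                    $arg = $call['args'][$position] ?? null;
                    if ($arg === null || $arg['param'] === null || $arg['sanitized']) {
                        continue; // no parameter flows in, or it was sanitized
                    }
                    $known = $sinks[$name]['params'] ?? [];
                    if (!in_array($arg['param'], $known, true)) {
                        $sinks[$name]['category'] = $sink['category'];
                        $sinks[$name]['params'][] = $arg['param'];
                        $changed = true;
                    }
                }
            }
        }
    } while ($changed);
    return $sinks;
}

$all = promoteUserSinks($sinks, $userFunctions);
print_r(array_keys($all));
```

Here myexec and runjob inherit the EXEC category from exec, while safeexec is not promoted because its argument is sanitized before reaching exec.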

[Figure: processing flow of the inter-procedural analysis]

8. Taint analysis

With the steps above in place, the last thing to do is taint analysis. It focuses on the dangerous functions built into the system, such as echo, which can lead to XSS, and it must analyze the dangerous parameters of these functions effectively. This includes determining whether effective sanitization (such as escaping or regular expression matching) has been performed, and designing an algorithm to backtrack through the earlier assignments and other transformations of the variable. This is undoubtedly a test of a security researcher's engineering ability, and it is also the most important stage of automated auditing.
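A minimal sketch of such a backtracking algorithm is shown below, assuming straight-line code (no branches) and a hand-written assignment list. The source list, the per-category table of effective sanitizers, and the function backtrack are assumptions for illustration, not the project's actual algorithm.

```php
<?php
// User-controlled sources and, per vulnerability category, the sanitizers
// considered effective. Both lists are illustrative assumptions.
const SOURCES = ['$_GET', '$_POST', '$_COOKIE', '$_REQUEST'];
const EFFECTIVE = [
    'SQLI' => ['addslashes', 'mysql_real_escape_string', 'intval'],
    'XSS'  => ['htmlspecialchars', 'intval'],
];

// Each assignment: line, target variable, the variables it reads, and the
// function wrapped around the value (or null).
$assignments = [
    ['line' => 1, 'target' => '$name', 'reads' => ['$_GET'], 'via' => null],
    ['line' => 2, 'target' => '$safe', 'reads' => ['$name'], 'via' => 'addslashes'],
    ['line' => 3, 'target' => '$sql',  'reads' => ['$safe'], 'via' => null],
    ['line' => 4, 'target' => '$html', 'reads' => ['$safe'], 'via' => null],
];

// Walk backwards from $variable as used at $line. Returns the chain of
// variables from the sink argument to a user-controlled source, or null if
// every path is sanitized for $category or never reaches a source.
function backtrack(array $assignments, string $variable, int $line, string $category): ?array
{
    if (in_array($variable, SOURCES, true)) {
        return [$variable];
    }
    $def = null; // latest assignment to $variable before $line
    foreach ($assignments as $a) {
        if ($a['target'] === $variable && $a['line'] < $line
            && ($def === null || $a['line'] > $def['line'])) {
            $def = $a;
        }
    }
    if ($def === null) {
        return null; // no definition found: treated as untainted here
    }
    if ($def['via'] !== null && in_array($def['via'], EFFECTIVE[$category] ?? [], true)) {
        return null; // effectively sanitized for this kind of sink
    }
    foreach ($def['reads'] as $read) {
        $path = backtrack($assignments, $read, $def['line'], $category);
        if ($path !== null) {
            return array_merge([$variable], $path);
        }
    }
    return null;
}

// mysql_query($sql) on line 5: addslashes counts as effective for SQLI here.
var_dump(backtrack($assignments, '$sql', 5, 'SQLI'));
// echo $html on line 6: addslashes does not protect against XSS.
print_r(backtrack($assignments, '$html', 6, 'XSS'));
```

The same sanitized variable is safe for one sink and tainted for another, which is why the sanitization tag should name the function rather than just mark the value as clean. A fuller version would follow CFG edges instead of line order.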

0x03

As the introduction above shows, there are many pitfalls in implementing an automated audit tool of your own. My own attempt ran into many difficulties, and static analysis does have limitations. For example, the result of a string transformation, which is easy to obtain in dynamic analysis, is hard to obtain in static analysis. This is not a technical hurdle that can be overcome but a limitation of static analysis itself. So if pure static analysis is to achieve low false positive and false negative rates, it ultimately has to borrow some dynamic ideas, for example for the code inside eval. In addition, in applications built on MVC frameworks such as CI, the code is very scattered; for example, the data sanitization code is placed in the input handling. For this kind of PHP application, I think it is hard to build a general-purpose audit framework, and such applications should be treated separately.

The above is only a rough account of my current attempt, which is not yet fully implemented. After all, I am just a college student, not a professional. I hope it can serve as a catalyst, so that more security researchers pay attention to this area.

