A Brief Discussion of Automated PHP Code Auditing Techniques

Source: Exploit

0x00

I have had nothing new to post on the blog lately, so I will summarize what I am currently working on and talk mainly about the techniques used in my project. There are already many automated audit tools available: RIPS and Pixy are open source, and Fortify is commercial. RIPS currently exists only in its first version and does not support analysis of object-oriented PHP, so its results are not ideal. Pixy is based on data flow analysis but supports only PHP 4. Fortify is a commercial product, so studying its internals is out of the question. In China, research on automated PHP auditing is mostly done inside companies; some of the resulting tools rely on simple token stream analysis or, more crudely, on regular expression matching, and their results are mediocre.

0x01

The technique discussed here is an approach to automated PHP auditing based on static analysis, and it is also the approach my project takes. Regular expressions are clearly not ideal. To perform variable tracing and taint analysis more effectively, and to cope with the many flexible syntax forms found in PHP scripts, the approach I describe combines static code analysis with data flow analysis.

First, I think an effective audit tool needs at least the following modules:

1. Compiler front-end module
This module applies abstract syntax tree (AST) construction and control flow graph (CFG) construction from compiler technology to turn source files into a form suitable for static analysis by the back end.

2. Global information collection module
This module gathers information across all source files in one pass: which classes the audited project defines, the method names and parameters in each class, and the start and end line numbers of each method's code block. This information speeds up the later static analysis.

3. Data flow analysis module
This module differs from the data flow analysis algorithms of compiler technology in that it focuses on handling the characteristics of the PHP language itself. When a call to a sensitive function is found during interprocedural or intraprocedural analysis, data flow analysis is performed on the sensitive parameters of that function; that is, the changes to each variable are traced in preparation for the subsequent taint analysis.

4. Vulnerability code analysis module
This module performs taint analysis using the global variable and assignment statement information collected by the data flow analysis module. It focuses on the dangerous parameters of sensitive sinks, such as the first parameter of the mysql_query function, and obtains the corresponding data flow information by backtracking. If backtracking shows that the parameter is user-controllable, this is recorded. If the dangerous parameter has been encoded or sanitized, that is recorded as well. Taint analysis is completed by tracking and analyzing the data of these dangerous parameters.

0x02

With the modules in place, how do they fit together into an effective automated audit? The analysis system I use works roughly as follows:

1. Framework initialization

The first task of the analysis framework is to collect information about all user-defined classes in the source project: class names, class properties, method names, and the path of the file in which each class resides.
These records are stored in a global Context class, which is implemented as a singleton and stays in memory for use by later analysis.
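
As an illustration of the singleton context described above, here is a minimal sketch in Python (the project itself is written in PHP). The names `ClassInfo`, `register` and `lookup`, and the sample class and path, are assumptions made for this example and are not taken from the project.

```python
from dataclasses import dataclass, field


@dataclass
class ClassInfo:
    """Summary of one user-defined class (field names are illustrative)."""
    name: str
    file_path: str
    properties: list = field(default_factory=list)
    methods: list = field(default_factory=list)


class Context:
    """Process-wide registry of class summaries, filled once at start-up."""
    _instance = None

    def __new__(cls):
        # Singleton: every Context() call returns the same object.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.classes = {}
        return cls._instance

    def register(self, info):
        # PHP class names are case-insensitive, so keys are lower-cased.
        self.classes[info.name.lower()] = info

    def lookup(self, name):
        return self.classes.get(name.lower())


# Hypothetical class found while scanning the project.
Context().register(ClassInfo("UserModel", "app/models/User.php",
                             properties=["id", "name"], methods=["findById"]))
found = Context().lookup("usermodel")
print(found.file_path)
```

Because every `Context()` call returns the same object, later analysis stages can look up class information without passing the registry around explicitly.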

2. Determine the main files

Next, determine whether each PHP file is a main file. PHP has no main function. Most PHP files in a web application fall into two types: calling files and definition files. Definition files define business classes, utility classes, utility functions and so on; they are not accessed directly by users but are used by calling files. User requests are actually handled by calling files, such as a top-level index.php. Static analysis targets these calling files that handle user requests, which are the main files. The test is as follows:
After AST parsing is complete, check whether the lines of code inside class and method definitions exceed a certain proportion of the file's total lines. If so, the file is treated as a definition file; otherwise it is a main file and is added to the list of file names to analyze.
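
The proportion test can be sketched as follows. This is only an illustration in Python: the function name `is_main_file`, the input format, and the 0.6 threshold are assumptions, since the article does not state the actual ratio.

```python
def is_main_file(total_lines, definition_ranges, threshold=0.6):
    """Return True when a file looks like a request entry point.

    definition_ranges holds inclusive (start_line, end_line) pairs for every
    class and function definition found in the file's AST. The threshold is
    an assumed value, not one given in the article.
    """
    if total_lines == 0:
        return False
    covered = set()
    for start, end in definition_ranges:
        # A set avoids double-counting methods nested inside a class range.
        covered.update(range(start, end + 1))
    return len(covered) / total_lines <= threshold


# index.php: 40 lines with one 5-line helper function.
print(is_main_file(40, [(10, 14)]))
# Db.php: 120 lines, a class spanning lines 3..118 that contains a method.
print(is_main_file(120, [(3, 118), (10, 30)]))
```

The first call reports a main file (12.5% of lines are definitions) and the second a definition file (about 97%).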

3. AST (abstract syntax tree) construction

The project is written in PHP itself. For AST construction we rely on an excellent existing PHP AST implementation: PHP Parser.
This open source project is also written in PHP and can parse most PHP constructs, such as if, while, switch, array declarations, method calls, global variables, and so on. It handles the compiler front-end work of this project very well.

4. CFG construction

CFG construction uses the Cfgbuilder method of the Cfggenerator class. The method is defined as follows:

The idea is to build the CFG recursively. The input is the collection of nodes obtained by traversing the AST. The method iterates over the elements (nodes) in the collection, checks the type of each one (branch, jump, end, and so on), and builds the CFG according to the node type.
For branch statements and loop statements, the jump conditions are stored on the edges of the CFG, which makes data flow analysis easier.
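
To make the recursive idea concrete, here is a minimal sketch in Python. It is not the project's Cfgbuilder: the tuple-based toy AST, the `Block` and `CFGSketch` names, and the way conditions are written as strings are all assumptions for illustration. Conditions are attached to edges, as described above.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(eq=False)
class Block:
    id: int
    stmts: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (target Block, condition or None)


class CFGSketch:
    def __init__(self):
        self._ids = count()
        self.blocks = []

    def new_block(self):
        block = Block(next(self._ids))
        self.blocks.append(block)
        return block

    def build(self, nodes, current):
        """Add nodes starting at `current`; return the block where control continues."""
        for node in nodes:
            kind = node[0]
            if kind == "if":
                _, cond, then_nodes, else_nodes = node
                then_b, else_b, join = self.new_block(), self.new_block(), self.new_block()
                current.edges.append((then_b, cond))
                current.edges.append((else_b, f"!({cond})"))
                # Recurse into each branch, then link its last block to the join.
                self.build(then_nodes, then_b).edges.append((join, None))
                self.build(else_nodes, else_b).edges.append((join, None))
                current = join
            elif kind == "while":
                _, cond, body_nodes = node
                header, body, after = self.new_block(), self.new_block(), self.new_block()
                current.edges.append((header, None))
                header.edges.append((body, cond))
                header.edges.append((after, f"!({cond})"))
                self.build(body_nodes, body).edges.append((header, None))  # back edge
                current = after
            elif kind == "return":
                current.stmts.append(node[1])
                current = self.new_block()  # code after a return is unreachable
            else:  # plain statement
                current.stmts.append(node[1])
        return current


ast = [
    ("stmt", "$id = $_GET['id']"),
    ("if", "is_numeric($id)",
        [("stmt", "$sql = 'SELECT ... WHERE id=' . $id")],
        [("stmt", "$sql = 'SELECT ... WHERE id=0'")]),
    ("stmt", "mysql_query($sql)"),
]
cfg = CFGSketch()
entry = cfg.new_block()
exit_block = cfg.build(ast, entry)
for b in cfg.blocks:
    print(b.id, b.stmts, [(t.id, c) for t, c in b.edges])
```

Keeping the condition on each edge means a later data flow pass can tell under which condition a block is reached, for example that the first assignment to `$sql` only runs when `is_numeric($id)` holds.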

5. Collecting data flow information

For a block of code, the information most worth collecting is assignment statements, function calls, constants (const, define), and variable registration (extract, parse_str).
Assignment statements are what make later variable tracing possible. In my implementation, a structure records the value and location of each assignment. The other data is identified and extracted from the AST. For a function call, for example, the analysis determines whether a variable is escaped or encoded, and whether the called function is a sink (such as mysql_query).
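
One possible shape for such records is sketched below in Python. The class and field names (`Location`, `Assignment`, `BlockSummary`, `last_assignment`) are hypothetical; the article only says that a structure stores the value and location of each assignment.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Location:
    file: str
    line: int


@dataclass
class Assignment:
    target: str                  # variable being written, e.g. "$sql"
    sources: list                # variables read on the right-hand side
    location: Location
    call: Optional[str] = None   # function applied on the right-hand side, if any


@dataclass
class BlockSummary:
    assignments: list = field(default_factory=list)
    calls: list = field(default_factory=list)      # (function, argument variables, Location)
    constants: dict = field(default_factory=dict)  # name -> value from const/define
    registers_vars: bool = False                   # True if extract()/parse_str() was seen

    def last_assignment(self, var, before_line):
        """Most recent assignment to `var` above `before_line`, or None."""
        hits = [a for a in self.assignments
                if a.target == var and a.location.line < before_line]
        return max(hits, key=lambda a: a.location.line, default=None)


summary = BlockSummary()
summary.assignments.append(Assignment("$id", ["$_GET"], Location("index.php", 3)))
summary.assignments.append(Assignment("$sql", ["$id"], Location("index.php", 5)))
summary.assignments.append(
    Assignment("$sql", ["$sql"], Location("index.php", 9), call="addslashes"))
summary.calls.append(("mysql_query", ["$sql"], Location("index.php", 7)))

# Which assignment of $sql is visible at the mysql_query call on line 7?
hit = summary.last_assignment("$sql", before_line=7)
print(hit.location.line, hit.sources)
```

Recording the location with each assignment is what lets the lookup ignore the sanitizing assignment on line 9, which comes after the sink call.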

6. Variable sanitization and encoding information

$clearsql = addslashes($sql);
In an assignment statement whose right-hand side is a filter function (user-defined or built-in), the return value of the call is sanitized; here, addslashes is added to the sanitization labels of $clearsql.
When a function call is found, check whether the function name is one of the safe functions listed in the configuration file.
If it is, add the sanitization label to the symbol at that location.
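
The labeling step might look like the following Python sketch. The `SANITIZERS` table stands in for the configuration file mentioned above; its contents, the vulnerability class names, and the rule that an unknown function drops all labels are assumptions chosen for this example.

```python
# Assumed configuration: sanitizer name -> vulnerability classes it neutralizes.
SANITIZERS = {
    "addslashes": {"sqli"},
    "mysql_real_escape_string": {"sqli"},
    "htmlspecialchars": {"xss"},
    "intval": {"sqli", "xss"},
}


def apply_assignment(labels, target, call, sources):
    """Update `labels` (variable -> set of sanitizer names) for `target = call(sources)`.

    `call` is None for a plain copy or concatenation.
    """
    inherited = [labels.get(s, set()) for s in sources]
    # A label survives only if every source variable carries it.
    carried = set.intersection(*inherited) if inherited else set()
    if call is None:
        labels[target] = carried
    elif call.lower() in SANITIZERS:  # PHP function names are case-insensitive
        labels[target] = carried | {call.lower()}
    else:
        labels[target] = set()  # unknown function: conservatively drop labels
    return labels


def is_safe_for(labels, var, vuln_class):
    return any(vuln_class in SANITIZERS[name] for name in labels.get(var, ()))


labels = {}
apply_assignment(labels, "$sql", None, ["$_GET"])              # $sql = $_GET['q'];
apply_assignment(labels, "$clearsql", "addslashes", ["$sql"])  # $clearsql = addslashes($sql);
apply_assignment(labels, "$out", None, ["$clearsql"])          # $out = $clearsql;
print(labels["$clearsql"])
print(is_safe_for(labels, "$out", "sqli"), is_safe_for(labels, "$out", "xss"))
```

Storing the sanitizer name rather than a single "clean" flag lets the later taint analysis decide per sink type: a value passed through addslashes is treated as safe for SQL here, but not for HTML output.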

7. Interprocedural analysis

If a call to a user-defined function is found during the audit, interprocedural analysis is required: locate the code block of the specific method in the project and analyze it with the variables carried in.
The difficulties are how to backtrack variables, how to handle identical names in different files, how to support call analysis of class methods, how to record user-defined sinks (for example, if myexec calls the exec function without effective sanitization, then myexec is itself considered a dangerous function), and how to classify user-defined sinks (SQLi, XSS, XPath, and so on).
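
One way to record and classify user-defined sinks is to compute a small summary per function, sketched below in Python. The statement format, the tables `BUILTIN_SINKS` and `SANITIZERS`, the class name "cmdi", and the functions `safeexec` and `run` are assumptions for illustration; only the myexec/exec case comes from the text above. Name clashes across files and class methods are not handled.

```python
# Assumed tables: sink name -> {argument position: vulnerability class}.
BUILTIN_SINKS = {"exec": {0: "cmdi"}, "mysql_query": {0: "sqli"}}
SANITIZERS = {"escapeshellarg": "cmdi", "addslashes": "sqli"}


def summarize(params, body, sinks):
    """Return {param index: vuln class} for parameters reaching a sink unsanitized.

    body is a list of ("assign", target, call_or_None, source) and
    ("call", function, [argument variables]) statements in program order.
    """
    origin = {p: i for i, p in enumerate(params)}  # variable -> parameter it derives from
    cleaned = {p: set() for p in params}           # variable -> classes already sanitized
    result = {}
    for stmt in body:
        if stmt[0] == "assign":
            _, target, call, source = stmt
            if source in origin:
                idx, done = origin[source], set(cleaned[source])
                if call in SANITIZERS:
                    done.add(SANITIZERS[call])
                origin[target], cleaned[target] = idx, done
            else:
                # Overwritten with data that does not come from a parameter.
                origin.pop(target, None)
                cleaned.pop(target, None)
        else:
            _, func, args = stmt
            for pos, vuln_class in sinks.get(func, {}).items():
                if pos < len(args) and args[pos] in origin \
                        and vuln_class not in cleaned[args[pos]]:
                    result[origin[args[pos]]] = vuln_class
    return result


def find_user_sinks(functions, builtin_sinks):
    sinks = {name: dict(args) for name, args in builtin_sinks.items()}
    changed = True
    while changed:  # repeat so that wrappers of wrappers are found too
        changed = False
        for name, (params, body) in functions.items():
            found = summarize(params, body, sinks)
            if found and sinks.get(name) != found:
                sinks[name] = found
                changed = True
    return {n: s for n, s in sinks.items() if n in functions}


functions = {
    "myexec":   (["$cmd"], [("call", "exec", ["$cmd"])]),
    "safeexec": (["$cmd"], [("assign", "$c", "escapeshellarg", "$cmd"),
                            ("call", "exec", ["$c"])]),
    "run":      (["$label", "$cmd"], [("call", "myexec", ["$cmd"])]),
}
user_sinks = find_user_sinks(functions, BUILTIN_SINKS)
print(user_sinks)
```

Here `myexec` is recorded as a command-execution sink on its first parameter, `run` inherits that classification on its second parameter, and `safeexec` is not recorded because its argument is sanitized before reaching `exec`.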

The processing flow is as follows:

8. Taint analysis

With the steps above in place, the last task is taint analysis. It focuses on dangerous functions built into the language, such as echo, which can lead to XSS, and on effective analysis of the dangerous parameters of those functions. This includes determining whether effective sanitization (escaping, regular expression matching, and so on) has been applied, and designing an algorithm that backtracks through the assignments and other transformations of the variable. This is undoubtedly a test of a security researcher's engineering ability, and it is the most important stage of automated auditing.
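
A simplified version of the backtracking step is sketched below in Python. It handles only straight-line code with no branches or loops, and the tables `SOURCES`, `SANITIZERS` and `SINKS`, the tuple format, and the function names are assumptions rather than the project's actual design.

```python
SOURCES = {"$_GET", "$_POST", "$_COOKIE", "$_REQUEST"}
SANITIZERS = {"addslashes": "sqli", "intval": "sqli", "htmlspecialchars": "xss"}
SINKS = {"mysql_query": "sqli", "echo": "xss"}


def trace(var, before_line, assignments, vuln_class):
    """Backtrack `var` from `before_line`; return the path from a user-controlled
    source to `var`, or None if no source is reached or the value is sanitized."""
    if var in SOURCES:
        return [var]
    prior = [a for a in assignments if a[1] == var and a[0] < before_line]
    if not prior:
        return None
    line, _, call, reads = max(prior, key=lambda a: a[0])  # nearest earlier assignment
    if SANITIZERS.get(call) == vuln_class:
        return None  # effectively sanitized for this vulnerability class
    for src in reads:
        path = trace(src, line, assignments, vuln_class)
        if path:
            return path + [f"{var}@{line}"]
    return None


def audit(assignments, sink_calls):
    findings = []
    for line, func, arg in sink_calls:
        path = trace(arg, line, assignments, SINKS[func])
        if path:
            findings.append((line, func, SINKS[func], path))
    return findings


# Each assignment: (line, target, function on the right-hand side or None, variables read)
assignments = [
    (1, "$id",   None,         ["$_GET"]),   # $id = $_GET['id'];
    (2, "$name", None,         ["$_POST"]),  # $name = $_POST['name'];
    (3, "$safe", "intval",     ["$id"]),     # $safe = intval($id);
    (4, "$sql",  None,         ["$safe"]),   # $sql = "... WHERE id=" . $safe;
    (5, "$html", "addslashes", ["$name"]),   # $html = addslashes($name);
]
sink_calls = [(6, "mysql_query", "$sql"), (7, "echo", "$html")]
findings = audit(assignments, sink_calls)
for finding in findings:
    print(finding)
```

Only the echo on line 7 is reported. The query is sanitized by intval, whereas addslashes does not count as effective sanitization for XSS, so the path from `$_POST` through `$name` to `$html` remains tainted.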

0x03

As the introduction above shows, implementing an automated audit tool of your own involves many pitfalls. I ran into many difficulties in my own attempt, and static analysis does have inherent limitations. For example, string transformations that are easy to observe in dynamic analysis are hard to resolve in static analysis. This is not a technical hurdle that can be overcome; it is a limitation of static analysis itself. So if pure static analysis is to achieve low false positive and false negative rates, it ultimately has to borrow some dynamic ideas, for example for code passed to eval, string transformation functions, and regular expression processing. There are also applications built on MVC frameworks such as the CI framework, where the code is very scattered; data cleansing code, for instance, lives in an extension of the input class. For PHP applications like these, I think a general-purpose audit framework is hard to achieve, and they should be handled case by case.

This is only a rough account of my current attempt, which is not yet fully implemented. I am a university student rather than a professional, and I hope this can serve as a catalyst that draws more security researchers to this area.
