PHP Automated White-Box Audit Technology and Implementation

Source: Internet
Author: User
Tags: class definition, function definition
0x00 Preface

Publicly available material on automated PHP auditing is scarce in China, while several strong automated audit tools have already appeared abroad. RIPS, for example, performs a series of code analyses based on the token stream. Traditional static analysis techniques such as data flow analysis and taint propagation analysis are rarely applied to dynamic scripting languages, yet they are the key technical points in implementing automated white-box auditing. In this article the author introduces recent research and results, in the hope that more security researchers in China will devote their energy to the meaningful field of automated PHP auditing.

0x01 Basic Knowledge

There are many ways to implement an automated audit. The simplest is to match code against a rule base of regular expressions, but this approach also has the lowest accuracy. The most reliable approach is to design the tool around domain knowledge from static analysis. The workflow of a typical static analysis security tool has three stages: model the source code, execute the analysis, and produce a verdict.

The first task of static analysis is to model the source code. Put simply, the source code string is converted into an intermediate representation that makes later vulnerability analysis easier, that is, a set of data structures representing the code. Modeling usually borrows methods from compiler technology, such as lexical analysis to generate tokens, abstract syntax tree generation, and control flow graph construction. The quality of the model directly affects the later taint propagation analysis and data flow analysis.
Execution analysis applies security knowledge to the loaded code to analyze it for vulnerabilities. Finally, the static analysis tool produces its verdict, which ends this phase of work.

0x02 Implementation Approach

After a period of effort, the author and teammates have largely implemented an automated static analysis tool. The implementation is based on static analysis techniques; for more on the underlying ideas, see the previous article.
In the tool, the automated audit process is as follows:

First, the tool loads all PHP files in the project directory that the user wants to scan and classifies them. If a file is a main file, that is, a PHP file that actually handles user requests, it is analyzed for vulnerabilities. If it is not a main file, for example a file of class definitions or utility function definitions, its analysis is skipped.

Second, global data is collected. The focus is on the class definitions in the project: the file path of each class, its properties, and its methods with their parameters. At the same time a file summary is generated for each file. It records each assignment statement together with the sanitization and encoding information of the variables involved.

After global initialization, the compiler front-end module runs. It uses the open-source tool PHP-Parser to parse the PHP code and build an abstract syntax tree (AST). From the AST, a control flow graph (CFG) is built with a CFG construction algorithm, and summary information for each basic block is generated on the fly.

While the front end is running, if a call to a sensitive function is found, the tool pauses and starts taint propagation analysis, performing inter-procedural and intra-procedural analysis to find the corresponding tainted data. Then, based on the information collected during data flow analysis, it evaluates the sanitization and encoding information to decide whether the code is vulnerable.
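To make the idea of a basic block summary more concrete, here is a minimal sketch. The class names `AssignmentSummary` and `BasicBlock` and all of their fields are assumptions made for illustration; they are not the tool's actual data structures. Each block records its assignments with the sanitizers and encodings seen on the right-hand side, and keeps its entry edges so that later stages can trace backward.

```php
<?php
// Hypothetical data structures; names and fields are illustrative only.

// One assignment recorded in a basic block summary.
final class AssignmentSummary
{
    public string $target;    // variable being assigned, e.g. "a"
    public array $sources;    // names read on the right-hand side
    public array $sanitizers; // sanitizing functions applied, e.g. ["intval"]
    public array $encodings;  // encoding functions applied

    public function __construct(string $target, array $sources, array $sanitizers = [], array $encodings = [])
    {
        $this->target = $target;
        $this->sources = $sources;
        $this->sanitizers = $sanitizers;
        $this->encodings = $encodings;
    }
}

// A CFG node: a summary of its assignments plus edges to predecessor blocks.
final class BasicBlock
{
    public string $label;
    /** @var AssignmentSummary[] */
    public array $summary = [];
    /** @var BasicBlock[] */
    public array $entryEdges = [];

    public function __construct(string $label)
    {
        $this->label = $label;
    }

    public function addAssignment(AssignmentSummary $assignment): void
    {
        $this->summary[] = $assignment;
    }

    public function addPredecessor(BasicBlock $block): void
    {
        $this->entryEdges[] = $block;
    }

    // Return the most recent assignment to $name in this block, or null.
    public function lastAssignmentTo(string $name): ?AssignmentSummary
    {
        for ($i = count($this->summary) - 1; $i >= 0; $i--) {
            if ($this->summary[$i]->target === $name) {
                return $this->summary[$i];
            }
        }
        return null;
    }
}

// Example: "$a = intval($_GET['a']);" in an entry block, followed by a second block.
$entry = new BasicBlock('entry');
$entry->addAssignment(new AssignmentSummary('a', ['_GET'], ['intval']));
$next = new BasicBlock('next');
$next->addPredecessor($entry);
```

Keeping the summary per block, rather than per statement, means a backward trace only has to consult one small record for each block it visits.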
If the previous step finds vulnerable code, the vulnerability reporting module collects the vulnerable code snippet. The module maintains a singleton result-set context object, and each vulnerability record that is generated is added to the result set. When the scan of the whole project finishes, Smarty renders the result set to the front end, which visualizes the scan results.

0x03 Initialization Work

In a real PHP audit, when we meet a call to a sensitive function such as mysql_query, we instinctively analyze its first argument by hand to see whether it is controllable. In practice, many CMSs wrap their database query methods in a class, such as a mysqldb class, to make calls convenient and the program logic clearer. When auditing such code we no longer search for the mysql_query keyword but look for calls such as db->getone.
So the question is: when the automated program analyzes the code, how does it know that db->getone is a method of the database access class?
This requires collecting all classes and their methods across the whole project at the start of the automated analysis, so that the program can follow a call into its definition during analysis.
Class and method information should be collected as part of framework initialization and stored in a singleton context.
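A singleton context of this kind might look like the following sketch. The class name `AuditContext`, its methods, and the file path in the example are hypothetical and chosen only for illustration. Lookups are normalized to lower case because PHP class and method names are case-insensitive.

```php
<?php
// Hypothetical singleton context; class and method names are assumptions.
final class AuditContext
{
    private static ?AuditContext $instance = null;

    // class name (lower case) => ['file' => string, 'methods' => [method => [param names]]]
    private array $classes = [];

    private function __construct()
    {
    }

    public static function getInstance(): AuditContext
    {
        if (self::$instance === null) {
            self::$instance = new self();
        }
        return self::$instance;
    }

    // Record where a class is defined and which methods (with parameter names) it has.
    public function registerClass(string $name, string $file, array $methods): void
    {
        $this->classes[strtolower($name)] = [
            'file' => $file,
            'methods' => array_change_key_case($methods, CASE_LOWER),
        ];
    }

    // Look up a method so the analyzer can follow a call such as $db->getone(...).
    public function findMethod(string $class, string $method): ?array
    {
        $entry = $this->classes[strtolower($class)] ?? null;
        if ($entry === null) {
            return null;
        }
        $params = $entry['methods'][strtolower($method)] ?? null;
        if ($params === null) {
            return null;
        }
        return ['file' => $entry['file'], 'params' => $params];
    }
}

// Example with a made-up file path.
AuditContext::getInstance()->registerClass('mysqldb', 'include/mysqldb.class.php', ['getone' => ['sql']]);
$hit = AuditContext::getInstance()->findMethod('MysqlDb', 'getOne');
```

Because every analysis stage asks the same instance, class information collected once during initialization is available wherever a method call has to be resolved.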

At the same time, the tool must identify which PHP files actually handle user requests. In some CMSs, wrapper classes such as database or file operation classes are typically written in separate files. Taint analysis of these files is meaningless, so they must be identified during framework initialization. The principle is simple: compute the proportion of call statements to definition statements in the file and compare it against a threshold. The error rate of this method is very small.
Finally, a summary is generated for each file. Its purpose is to make the file's information available when require, include, and similar statements are encountered in later analysis. The summary mainly records variable assignments, variable encoding, and variable sanitization information.

0x04 User Function Processing
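The ratio test can be sketched with PHP's built-in `token_get_all` tokenizer. The function name `classifyPhpSource`, the 0.5 threshold, and the exact counting rules below are assumptions for illustration; the article does not state the values the tool uses.

```php
<?php
// Heuristic sketch: the threshold and the counting rules are assumptions.
function classifyPhpSource(string $code, float $threshold = 0.5): string
{
    $definitions = 0;
    $calls = 0;
    $prev = null; // id (or literal character) of the previous significant token

    $tokens = token_get_all($code);
    $count = count($tokens);
    for ($i = 0; $i < $count; $i++) {
        $token = $tokens[$i];
        $id = is_array($token) ? $token[0] : $token;
        if ($id === T_WHITESPACE || $id === T_COMMENT || $id === T_DOC_COMMENT) {
            continue;
        }
        if (in_array($id, [T_FUNCTION, T_INTERFACE, T_TRAIT], true)
            || ($id === T_CLASS && $prev !== T_DOUBLE_COLON)) {
            // function, class, interface or trait definition (Foo::class is excluded)
            $definitions++;
        } elseif ($id === T_STRING && $prev !== T_FUNCTION) {
            // A name directly followed by "(" is counted as a call.
            $j = $i + 1;
            while ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
                $j++;
            }
            if ($j < $count && $tokens[$j] === '(') {
                $calls++;
            }
        }
        $prev = $id;
    }

    $total = $definitions + $calls;
    if ($total === 0) {
        return 'main'; // no evidence either way, so err on the side of analyzing
    }
    return ($calls / $total) >= $threshold ? 'main' : 'definition';
}

$library = '<?php class mysqldb { function getone($sql) { return mysql_query($sql); } function close() {} }';
$page = '<?php require "db.php"; $id = intval($_GET["id"]); $row = getrow($id); echo htmlspecialchars($row);';

$libraryKind = classifyPhpSource($library);
$pageKind = classifyPhpSource($page);
```

In this example the class file has three definitions and one call, so it falls below the threshold, while the request-handling page consists only of calls.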

Common web vulnerabilities are generally caused by dangerous parameters that the user can control. These are called taint-style vulnerabilities; common examples are SQL injection (SQLi) and XSS.
Some of PHP's built-in functions are inherently dangerous. echo, for example, may cause reflected XSS. In real code, however, built-in functions are often not called directly but wrapped again in a custom function, such as:

function myexec($cmd)
{
    exec($cmd);
}

In the implementation, the process is:
1. Use the context information gathered during initialization to locate the code of the corresponding method.
2. Analyze this code fragment and look for dangerous functions (here, exec).
3. Locate the dangerous parameter of the dangerous function (here, cmd).
4. If no sanitization is encountered during the analysis, the parameter can propagate taint, so map it to the first parameter, cmd, of the user function myexec, and store this user-defined function in the context as a dangerous function.
5. Return from the recursion and start the taint analysis process.
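The steps above can be sketched as follows. To stay self-contained, the sketch works on hand-built function summaries rather than parsed code, and it repeats a simple pass until nothing changes instead of recursing. The function name `discoverUserSinks` and the summary layout (`params`, `sanitized`, `calls`) are assumptions for illustration.

```php
<?php
// Sketch under assumptions: functions are given as hand-built summaries, not parsed code.
// $sinks maps a dangerous function name to the positions of its dangerous parameters.
function discoverUserSinks(array $functions, array $sinks): array
{
    do {
        $changed = false;
        foreach ($functions as $name => $fn) {
            foreach ($fn['calls'] as $call) {
                $riskyPositions = $sinks[$call['callee']] ?? [];
                foreach ($riskyPositions as $pos) {
                    // Variables that flow into the dangerous argument of the callee.
                    foreach ($call['args'][$pos] ?? [] as $var) {
                        if (in_array($var, $fn['sanitized'], true)) {
                            continue; // sanitized before use, so taint does not propagate
                        }
                        $paramPos = array_search($var, $fn['params'], true);
                        if ($paramPos === false) {
                            continue; // not a parameter of the enclosing function
                        }
                        if (!in_array($paramPos, $sinks[$name] ?? [], true)) {
                            $sinks[$name][] = $paramPos; // record a user-defined sink
                            $changed = true;
                        }
                    }
                }
            }
        }
    } while ($changed);

    return $sinks;
}

$functions = [
    // function myexec($cmd) { exec($cmd); }
    'myexec' => ['params' => ['cmd'], 'sanitized' => [],
                 'calls' => [['callee' => 'exec', 'args' => [['cmd']]]]],
    // a wrapper around the wrapper: its second parameter reaches myexec
    'runtask' => ['params' => ['name', 'arg'], 'sanitized' => [],
                  'calls' => [['callee' => 'myexec', 'args' => [['arg']]]]],
    // the parameter is sanitized before it reaches exec
    'safeexec' => ['params' => ['cmd'], 'sanitized' => ['cmd'],
                   'calls' => [['callee' => 'exec', 'args' => [['cmd']]]]],
];

$sinks = discoverUserSinks($functions, ['exec' => [0]]);
```

Here myexec is recorded with dangerous parameter position 0 and runtask with position 1, while safeexec is left out because its parameter is sanitized first.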

In short, we follow the corresponding class methods, static methods, and functions, and check these code fragments for calls to dangerous functions with dangerous parameters. The dangerous PHP built-in functions and their parameter positions are kept in a configuration file. If such a function and parameter are found and the dangerous parameter is not filtered, the user-defined function is recorded as a user-defined dangerous function. Whenever one of these functions is encountered in later analysis, taint analysis starts immediately.

0x05 Sanitization and Encoding of Variables

In a real audit, once a dangerous parameter is found to be controllable, we immediately check whether the programmer has filtered or encoded the variable effectively, in order to decide whether a vulnerability exists.
Automated auditing follows the same idea. In the implementation, PHP's security functions are first catalogued and configured. During program analysis, the necessary sanitization and encoding information is collected by backtracking for each piece of data flow information, for example:

$a = $_GET['a'];
$a = intval($a);
echo $a;
$a = htmlspecialchars($a);
mysql_query($a);

The snippet above looks odd, but it serves only as a demonstration. Variable a passes through two sanitization steps, intval and htmlspecialchars, and this information is collected according to the configuration file. A backward pass then merges the sanitization and encoding information from the current line upward.
For example, at the third line the sanitization information for variable a contains only intval, but at the last line it must be merged into the list [intval, htmlspecialchars]. This is done by collecting the data flow information of all preceding code and backtracking through it.
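A minimal sketch of this merging is shown below. It also shows one possible way to handle paired encode and decode calls, a detail discussed right after this example. The two constant lists stand in for the tool's configuration file, and the function name `collectUpTo` and the record layout are assumptions.

```php
<?php
// Sketch: these lists stand in for the tool's configuration file (assumed contents).
const SANITIZERS = ['intval', 'htmlspecialchars', 'addslashes'];
// Maps each encoder to the decoder that cancels it.
const ENCODER_PAIRS = ['base64_encode' => 'base64_decode', 'urlencode' => 'urldecode'];

// $calls: functions applied to one variable, sorted by line number.
// Returns the information in effect for that variable at line $line.
function collectUpTo(array $calls, int $line): array
{
    $sanitizers = [];
    $encodings = [];        // stack of encoders that are still applied
    $unmatchedDecodes = []; // decodes with no matching encode directly before them
    foreach ($calls as $call) {
        if ($call['line'] > $line) {
            break;
        }
        $fn = $call['func'];
        if (in_array($fn, SANITIZERS, true)) {
            if (!in_array($fn, $sanitizers, true)) {
                $sanitizers[] = $fn;
            }
        } elseif (array_key_exists($fn, ENCODER_PAIRS)) {
            $encodings[] = $fn;
        } elseif (in_array($fn, ENCODER_PAIRS, true)) {
            $encoder = array_search($fn, ENCODER_PAIRS, true);
            if ($encodings !== [] && end($encodings) === $encoder) {
                array_pop($encodings); // the encode/decode pair cancels out
            } else {
                $unmatchedDecodes[] = $fn; // wrong order or decode only
            }
        }
    }
    return [
        'sanitizers' => $sanitizers,
        'encodings' => $encodings,
        'unmatchedDecodes' => $unmatchedDecodes,
    ];
}

// The calls applied to $a in the snippet above.
$callsOnA = [
    ['line' => 2, 'func' => 'intval'],
    ['line' => 4, 'func' => 'htmlspecialchars'],
];
$atEcho = collectUpTo($callsOnA, 3);  // state at "echo $a;"
$atQuery = collectUpTo($callsOnA, 5); // state at "mysql_query($a);"

$roundTrip = collectUpTo([
    ['line' => 1, 'func' => 'base64_encode'],
    ['line' => 2, 'func' => 'base64_decode'],
], 2);
$decodeOnly = collectUpTo([['line' => 1, 'func' => 'base64_decode']], 1);
```

At the echo line only intval is in effect, and at the mysql_query line both sanitizers are. The round trip leaves no encoding behind, whereas a lone decode is kept as a warning sign.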

One detail: when the same variable is passed through both base64_encode and base64_decode, the two cancel out and the base64 encoding of the variable is removed. The same applies to escaping followed by unescaping. But if the calls are in the wrong order, or only the decode is applied, the result is quite dangerous.

0x06 Variable Backtracking and Taint Analysis

1. Variable backtracking

To find the origin of every parameter of a dangerous sink (Tracesymbol), all basic blocks connected to the current block are traced backward. The process is: loop over all entry edges of the current basic block and look up the name of Tracesymbol in the dataflow attribute of each basic block reached. If it is found, replace Tracesymbol with the symbol it maps to and copy all of that symbol's sanitization and encoding information. The trace then continues along all entry edges. Finally, the results along the different CFG paths are returned.
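The backward trace can be sketched as a small recursive function. Blocks are plain arrays here, and the field names (`dataflow`, `preds`, `from`, `kind`) and the function name `traceBack` are assumptions, not the tool's format. As a simplification, each block maps a name to at most one assignment, and a visited set guards against looping forever on cyclic graphs.

```php
<?php
// Sketch: blocks are plain arrays; the field names are assumptions, not the tool's format.
// Returns one result per CFG path that was followed backward.
function traceBack(array $blocks, string $blockId, string $symbol, array $sanitizers = [], array $visited = []): array
{
    if (isset($visited[$blockId])) {
        return []; // block already on this path: stop to avoid infinite recursion
    }
    $visited[$blockId] = true;
    $block = $blocks[$blockId];

    if (isset($block['dataflow'][$symbol])) {
        $flow = $block['dataflow'][$symbol];
        // Copy the sanitization information attached to this assignment.
        $sanitizers = array_values(array_unique(array_merge($sanitizers, $flow['sanitizers'])));
        if ($flow['kind'] !== 'var') {
            // Literal or superglobal: the trace ends here.
            return [['origin' => $flow['from'], 'kind' => $flow['kind'], 'sanitizers' => $sanitizers]];
        }
        $symbol = $flow['from']; // replace with the mapped symbol and keep tracing
    }

    if ($block['preds'] === []) {
        return [['origin' => $symbol, 'kind' => 'unresolved', 'sanitizers' => $sanitizers]];
    }

    $results = [];
    foreach ($block['preds'] as $pred) {
        $results = array_merge($results, traceBack($blocks, $pred, $symbol, $sanitizers, $visited));
    }
    return $results;
}

// $id = $_GET['id']; if (...) { $id = intval($id); } else { } $sql = $id; mysql_query($sql);
$blocks = [
    'entry' => ['preds' => [],
                'dataflow' => ['id' => ['from' => '_GET', 'kind' => 'source', 'sanitizers' => []]]],
    'then'  => ['preds' => ['entry'],
                'dataflow' => ['id' => ['from' => 'id', 'kind' => 'var', 'sanitizers' => ['intval']]]],
    'else'  => ['preds' => ['entry'], 'dataflow' => []],
    'sink'  => ['preds' => ['then', 'else'],
                'dataflow' => ['sql' => ['from' => 'id', 'kind' => 'var', 'sanitizers' => []]]],
];

$paths = traceBack($blocks, 'sink', 'sql');
```

The trace returns two paths that both end at the superglobal: the one through the then branch carries intval, and the one through the else branch carries no sanitizer at all.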

The algorithm stops when Tracesymbol maps to a static string, a number, or another static object, or when the current basic block has no entry edge. If Tracesymbol is a variable or an array, check whether it is a superglobal array.

2. Taint analysis

Taint analysis starts while procedures are being analyzed and built-in and user-defined functions are being processed. If a sensitive function call is encountered during program analysis, the dangerous parameter node is obtained by backtracking or from the context, and taint analysis begins. In plain terms, this determines whether the dangerous parameter can lead to a vulnerability. Taint analysis is implemented in the Taintanalyser code. After the dangerous parameter is obtained, the steps are as follows:

First, find the assignment of the dangerous parameter in the current basic block and check whether the right-hand node of the dataflow contains a user input source, such as the superglobal arrays $_GET and $_POST. Plug-in classes for the different vulnerability types then decide whether these nodes are safe.

If no source is found in the current basic block, multi-block analysis within the file begins. All predecessor blocks of the current basic block are obtained; they may form a parallel structure (if / else if / else) or a non-parallel structure (ordinary statements). The dangerous variable is analyzed in each of them, and the algorithm ends when the basic block being processed has no predecessor.

If no vulnerability is found in the basic block analysis, multi-file analysis is performed last. The summaries of the files included before the current basic block are loaded and traversed to make the decision.

If any of the steps above finds a vulnerability, the vulnerability reporting module is entered. Otherwise, the system continues analyzing the code that follows.
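The final decision step can be sketched as follows, taking already-traced paths as input. Which sanitizers count as effective for which vulnerability type is an assumption that stands in for the tool's plug-in classes and configuration, and the names `isTainted` and `analyseSink` are hypothetical.

```php
<?php
// Sketch: the source list and the per-type sanitizer lists are assumptions
// standing in for the tool's plug-in classes and configuration.
const USER_SOURCES = ['_GET', '_POST', '_COOKIE', '_REQUEST'];
const EFFECTIVE_SANITIZERS = [
    'sqli' => ['intval', 'addslashes', 'mysql_real_escape_string'],
    'xss'  => ['intval', 'htmlspecialchars'],
];

// A path is tainted if it starts at a user source and carries no sanitizer
// that is effective for the given vulnerability type.
function isTainted(array $path, string $vulnType): bool
{
    if (!in_array($path['origin'], USER_SOURCES, true)) {
        return false;
    }
    return array_intersect($path['sanitizers'], EFFECTIVE_SANITIZERS[$vulnType]) === [];
}

// Check every traced path for one sink and collect a report per tainted path.
function analyseSink(string $sink, string $vulnType, array $paths): array
{
    $reports = [];
    foreach ($paths as $path) {
        if (isTainted($path, $vulnType)) {
            $reports[] = ['sink' => $sink, 'type' => $vulnType, 'source' => '$' . $path['origin']];
        }
    }
    return $reports;
}

// echo with a value that went through intval: treated as safe for XSS.
$echoReports = analyseSink('echo', 'xss', [
    ['origin' => '_GET', 'sanitizers' => ['intval']],
]);

// mysql_query reached on two paths: one has only htmlspecialchars, one is a constant.
$queryReports = analyseSink('mysql_query', 'sqli', [
    ['origin' => '_POST', 'sanitizers' => ['htmlspecialchars']],
    ['origin' => 'static string', 'sanitizers' => []],
]);
```

Separating the per-type sanitizer lists from the generic path check is what lets one tracing result be judged differently for SQL injection and for XSS.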

0x07 Current Results


We ran a test scan of simple-log_v1.3.12, with the following result:
total:76 xss:3 sqli:62 include:5 file:3 fileaffect:1
The test code contains fairly obvious vulnerabilities and does not use an MVC framework. Techniques such as character truncation or swallowing the escape character are not really supported yet, although some of these cases can still be detected. Testing exposed many bugs, mainly because the early implementation did not consider many syntactic structures and test cases. The algorithm is also almost entirely recursive, so it easily falls into infinite recursion and crashes Apache.
So the current code is really only experimental. Making it robust would take countless rounds of refactoring and a great deal of testing, and the author does not have much time to maintain it.

0x08 Summary

In the field of static analysis, many security researchers work on C/C++, disassembly, and similar targets. The scripting language field also urgently needs technical investment, because this is very meaningful work.
Back to the pitfalls: a major problem with the implementation by the author and teammates is that it does not support MVC frameworks. Frameworks such as the CI framework encapsulate so much that data flows are hard to capture in a uniform way, so different frameworks will probably need different analysis methods.
At present the tool can identify some simple vulnerabilities, but the code is not robust and has many bugs.
Finally, talk is cheap, show me the code: the implementation can be found on GitHub. The code is shared for research and discussion with security researchers who are interested in or already working in this field. It is not yet capable of finding vulnerabilities in an arbitrary CMS, so please do not have any illusions.
