PHP automated white-box audit technology and implementation

Few PHP automated auditing resources are publicly available in China, whereas several excellent implementations exist abroad. RIPS, for example, performs a series of code analyses based on the token stream. Traditional static analysis techniques, such as data flow analysis and taint propagation analysis, are rarely applied to dynamic scripting languages like PHP, yet they are key to automated white-box auditing. Here I introduce my recent research and results, in the hope that more security researchers in China will devote effort to the meaningful field of PHP automated auditing.

0x01 Basic knowledge

There are many ways to implement automated auditing. The easiest is to match code locations against a repository of regular expression rules, but this approach has the lowest accuracy. The most reliable approach is to build on static analysis techniques. A static analysis security tool generally works in three stages: modeling the source code, performing the analysis, and producing results.


The first task in static analysis is to model the source code. Put simply, the source code string is converted into an intermediate representation, a set of data structures representing the code, that makes subsequent vulnerability analysis easier. Modeling generally borrows methods from compiler technology, such as lexical analysis to generate tokens, construction of an abstract syntax tree, and construction of a control flow graph. The quality of the model directly affects the results of the subsequent taint propagation analysis and data flow analysis.
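As a small illustration of the lexical analysis step, the sketch below turns a PHP source string into a readable token list using PHP's built-in tokenizer functions token_get_all and token_name. The helper name tokenNames is hypothetical and is not part of the tool described here.

```php
<?php
// Hypothetical helper: list the tokens of a PHP source string in readable form.
function tokenNames(string $source): array
{
    $names = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Skip whitespace so the stream is easier to read.
            if ($token[0] === T_WHITESPACE) {
                continue;
            }
            // Array tokens carry [id, text, line]; show the name and the text.
            $names[] = token_name($token[0]) . ' ' . $token[1];
        } else {
            // Single-character tokens such as ";" or "[" are plain strings.
            $names[] = $token;
        }
    }
    return $names;
}

$tokens = tokenNames('<?php echo $_GET["a"];');
```

A token stream like this is the lowest-level model; an AST and a CFG add the structure that the later analysis stages rely on.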

The analysis stage applies security knowledge to detect and process vulnerabilities in the modeled code. Finally, the static analysis tool generates its findings, which ends this stage of work.

0x02 Implementation ideas

After a period of effort, my friends and I have roughly implemented an automated static analysis tool. The implementation is based on static analysis techniques; for more background on the idea, see the previous article. In the tool, the automated audit process is as follows:


First, the tool loads all PHP files in the user-specified project directory and classifies them. If a scanned PHP file is a Main file, that is, a PHP file that actually processes user requests, vulnerability analysis is performed on it. If it is not a Main file, for example a class definition file or a utility function definition file, it is skipped and not analyzed.

Second, global data is collected, in particular the class definition information of the project being scanned: each class's file path, attributes, methods, and method parameters. At the same time, a file summary is generated for each file. The summary records every assignment statement, along with the sanitization and encoding information of the variables involved in it.

After global initialization, the compiler front-end module runs. It uses the open-source PHP-Parser tool to build an abstract syntax tree (AST) of the PHP code under analysis. From the AST, a control flow graph (CFG) is constructed, and summary information for each basic block is generated on the fly.

During front-end processing, when a call to a sensitive function is encountered, the tool pauses and runs taint propagation analysis, using interprocedural and intraprocedural analysis to find the corresponding taint source. Then, based on the information collected during data flow analysis, it evaluates the sanitization and encoding information to decide whether the code is vulnerable.

If the code is judged vulnerable in the previous step, control passes to the vulnerability report module, which collects the vulnerable code segment. The module maintains a singleton result set context object in the system environment, and each vulnerability record generated is added to this result set. After the whole project has been scanned, Smarty renders the result set to the front end, which visualizes the scan results.
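The singleton result set can be sketched as follows. This is only a minimal illustration: the class name ResultContext, its methods, and the record fields are assumptions, not the tool's actual code, and the Smarty rendering step is left out.

```php
<?php
// Hypothetical result-set context; class and field names are illustrative.
final class ResultContext
{
    private static ?ResultContext $instance = null;

    /** @var array<int, array{type: string, file: string, line: int}> */
    private array $records = [];

    // A private constructor forces callers to go through getInstance().
    private function __construct() {}

    public static function getInstance(): ResultContext
    {
        if (self::$instance === null) {
            self::$instance = new ResultContext();
        }
        return self::$instance;
    }

    public function addRecord(string $type, string $file, int $line): void
    {
        $this->records[] = ['type' => $type, 'file' => $file, 'line' => $line];
    }

    /** Count records per vulnerability type, e.g. for a summary line. */
    public function countByType(): array
    {
        $counts = [];
        foreach ($this->records as $record) {
            $counts[$record['type']] = ($counts[$record['type']] ?? 0) + 1;
        }
        return $counts;
    }
}

// Any analysis module can report into the same shared result set.
ResultContext::getInstance()->addRecord('SQLI', 'index.php', 12);
ResultContext::getInstance()->addRecord('XSS', 'index.php', 30);
ResultContext::getInstance()->addRecord('SQLI', 'admin.php', 7);
```

Because every module receives the same instance, the report stage can read the full result set once scanning ends, without passing it through each analysis call.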

0x03 Initialization

In a real PHP audit, when we encounter a call to a sensitive function such as mysql_query, we manually analyze its first parameter to check whether it is controllable. In practice, many CMSs wrap database queries in their own methods to make calls convenient and the program logic clear, for example in a class named MysqlDB. In that case, an auditor does not search for the mysql_query keyword but instead looks for calls such as db->getOne.

So the question is: during automated analysis, how can the program know that db->getOne is a method of a database access class?

This requires collecting all classes and their methods across the entire project at an early stage of the analysis, so that the program can locate the method body to follow during analysis.

Class and method information should be collected as part of framework initialization and stored in a singleton context.
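A minimal sketch of the collection step, assuming a token-based scan rather than the PHP-Parser AST the tool uses. The function collectClassMethods is hypothetical; it ignores namespaces, anonymous classes, and methods named after reserved words.

```php
<?php
// Hypothetical collector: maps class name => list of method names.
function collectClassMethods(string $source): array
{
    $classes = [];
    $currentClass = null;   // class whose body we are inside, if any
    $classDepth = 0;        // brace depth at which that class was declared
    $depth = 0;
    $pending = null;        // T_CLASS or T_FUNCTION waiting for its name

    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            if ($token === '{') {
                $depth++;
            } elseif ($token === '}') {
                $depth--;
                if ($currentClass !== null && $depth === $classDepth) {
                    $currentClass = null;       // left the class body
                }
            }
            $pending = null;
            continue;
        }
        [$id, $text] = $token;
        if ($id === T_CURLY_OPEN || $id === T_DOLLAR_OPEN_CURLY_BRACES) {
            $depth++;                           // "{$x}" inside a string
        } elseif ($id === T_CLASS || $id === T_FUNCTION) {
            $pending = $id;
        } elseif ($id === T_STRING && $pending === T_CLASS) {
            $currentClass = $text;
            $classDepth = $depth;
            $classes[$text] = [];
            $pending = null;
        } elseif ($id === T_STRING && $pending === T_FUNCTION) {
            if ($currentClass !== null) {
                $classes[$currentClass][] = $text;
            }
            $pending = null;
        } elseif ($id !== T_WHITESPACE) {
            $pending = null;                    // closures, Foo::class, etc.
        }
    }
    return $classes;
}

$source = <<<'PHP'
<?php
class MysqlDB {
    public function getOne($sql) { return mysql_query($sql); }
    public function getAll($sql) { return mysql_query($sql); }
}
function helper() {}
PHP;

$methods = collectClassMethods($source);
```

With such a map stored in the shared context, a call like db->getOne can be resolved to the MysqlDB method body that needs to be followed.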


At the same time, the tool must identify whether a PHP file under analysis actually processes user requests. In some CMSs, wrapper classes such as the database or file operation class are written in separate files, and running taint propagation analysis on these files is meaningless. The framework therefore classifies files during initialization. The principle is simple: it compares the proportion of call-type statements to definition-type statements against a threshold, and the error rate is small.
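The text does not give the exact counting rule or threshold, so the following is only a sketch under stated assumptions: top-level statements ending in ";" count as call-type statements, top-level class and function declarations count as definition-type statements, and the 0.6 threshold is arbitrary. The function isMainFile is hypothetical.

```php
<?php
// Hypothetical heuristic; the counting rule and 0.6 threshold are assumptions.
function isMainFile(string $source, float $threshold = 0.6): bool
{
    $depth = 0;
    $definitions = 0;   // top-level class/function definitions
    $statements = 0;    // top-level statements ending in ";"

    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            $id = $token[0];
            if ($id === T_CURLY_OPEN || $id === T_DOLLAR_OPEN_CURLY_BRACES) {
                $depth++;                       // "{$x}" inside a string
            } elseif ($depth === 0 && ($id === T_CLASS || $id === T_FUNCTION)) {
                $definitions++;
            }
        } elseif ($token === '{') {
            $depth++;
        } elseif ($token === '}') {
            $depth--;
        } elseif ($token === ';' && $depth === 0) {
            $statements++;
        }
    }

    $total = $definitions + $statements;
    // A file counts as "Main" when most top-level items are statements.
    return $total > 0 && $statements / $total >= $threshold;
}

$main = '<?php $id = $_GET["id"]; $row = $db->getOne($id); echo $row;';
$lib  = '<?php class MysqlDB { function getOne($sql) { return mysql_query($sql); } } '
      . 'function helper() { return 1; }';
```

Here isMainFile($main) is true and isMainFile($lib) is false. Statements nested inside top-level if or loop blocks are not counted, so a real classifier would need a finer rule.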

Finally, a summary is built for each file. Its purpose is to let later analysis use the summary of a file when it encounters a require or include statement. The summary mainly collects variable assignment, variable encoding, and variable sanitization information.

0x04 User Function Processing

Common web vulnerabilities are generally caused by controllable dangerous parameters. These are known as taint-style vulnerabilities, such as the common SQLI and XSS. Some PHP built-in functions are inherently dangerous; echo, for example, may cause reflected XSS. In real code, however, developers often do not call these built-in functions directly but wrap them in user-defined functions, such as:

function myexec($cmd)
{
    exec($cmd);
}

In the implementation, our processing steps are:

Locate the corresponding function body using the context information collected during initialization.

Analyze the code segment and find the dangerous function (exec here).

Locate the dangerous parameter of the dangerous function (cmd here).

If no sanitization is encountered during the analysis, the parameter is taintable. It is then mapped to the first parameter, cmd, of the user function myexec, and the user-defined function is stored in the context structure as a dangerous function. The recursion then returns, and taint analysis starts.

To sum up, we follow the corresponding class methods, static methods, and functions, and look in their code for calls to dangerous functions and for dangerous parameters. The built-in PHP dangerous functions and their parameter positions are all defined in a configuration file. If such a function and parameter are found and the dangerous parameter is not filtered, the user-defined function is treated as a user-defined dangerous function. Whenever a call to such a function is found in subsequent analysis, taint analysis starts immediately.
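The mapping from a built-in dangerous function to a user-defined one can be sketched as below. The function body is given as a simplified statement list rather than a real AST, and the constants USER_FUNC_SINKS and USER_FUNC_SANITIZERS, the function dangerousParamPositions, and the configuration contents are all assumptions made for illustration.

```php
<?php
// Assumed configuration: dangerous built-in => positions of dangerous arguments.
const USER_FUNC_SINKS = ['exec' => [0], 'system' => [0], 'mysql_query' => [0]];
// Assumed list of functions treated as sanitizers.
const USER_FUNC_SANITIZERS = ['escapeshellarg', 'intval', 'addslashes'];

/**
 * Returns the positions of $params that reach a sink unsanitized.
 * $body is a simplified statement list, not a real AST:
 *   ['call' => name, 'args' => [variables], 'assign' => variable (optional)]
 */
function dangerousParamPositions(array $params, array $body): array
{
    $clean = [];        // variables overwritten by a sanitizer result
    $positions = [];
    foreach ($body as $stmt) {
        $name = $stmt['call'];
        if (in_array($name, USER_FUNC_SANITIZERS, true) && isset($stmt['assign'])) {
            $clean[$stmt['assign']] = true;
            continue;
        }
        foreach (USER_FUNC_SINKS[$name] ?? [] as $argIndex) {
            $var = $stmt['args'][$argIndex] ?? null;
            $pos = array_search($var, $params, true);
            if ($pos !== false && !isset($clean[$var])) {
                $positions[] = $pos;    // this parameter flows into a sink
            }
        }
    }
    return array_values(array_unique($positions));
}

$userSinks = [];
// function myexec($cmd) { exec($cmd); }
$found = dangerousParamPositions(['$cmd'], [['call' => 'exec', 'args' => ['$cmd']]]);
if ($found !== []) {
    $userSinks['myexec'] = $found;      // myexec is now treated as a sink itself
}
// function safeexec($cmd) { $cmd = escapeshellarg($cmd); exec($cmd); }
$safe = dangerousParamPositions(['$cmd'], [
    ['call' => 'escapeshellarg', 'args' => ['$cmd'], 'assign' => '$cmd'],
    ['call' => 'exec', 'args' => ['$cmd']],
]);
```

After this step $userSinks records that parameter 0 of myexec is dangerous, while the sanitized wrapper produces no entry. Merging $userSinks back into the sink table would let wrappers of wrappers be handled the same way.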

0x05 Variable sanitization and encoding

In a real audit, once a dangerous parameter is found to be controllable, the next step is to check whether the programmer has effectively filtered or encoded the variable, which determines whether a vulnerability exists. Automated auditing follows the same idea. In the implementation, we first catalog and configure every security function in PHP. During program analysis, each piece of data flow information is traced back to collect the necessary sanitization and encoding information, for example:

$a = $_GET['a'];
$a = intval($a);
echo $a;
$a = htmlspecialchars($a);
mysql_query($a);

The code snippet above looks a little odd, but it serves only as a demonstration. It shows that variable a is sanitized by intval and by htmlspecialchars, and the configuration file lets us recognize both. We then backtrack from the current line of code and merge the sanitization and encoding information. For example, at the third statement (echo $a) the sanitization information of variable a contains only intval, but at the fifth statement (mysql_query($a)) it must be merged into the list intval and htmlspecialchars. This is done by collecting the information of all preceding data flows and tracing back through them.


One detail: when a pair of inverse functions, such as base64_encode and base64_decode, is called on the same variable, the base64 encoding of that variable is cancelled out. Likewise, escaping followed by unescaping must cancel out. However, if the calls are in the wrong order, or only decode is performed, the result is quite dangerous.
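The merging and cancellation rules can be sketched as follows, assuming the functions applied to one variable have already been collected in source order. The configuration lists and the function summariseCalls are hypothetical.

```php
<?php
// Assumed configuration; the tool described here reads such lists from a config file.
const SANITIZE_FUNCS = ['intval', 'htmlspecialchars', 'addslashes'];
// Maps each decode function to the encode function it reverses.
const DECODE_OF = ['base64_decode' => 'base64_encode', 'urldecode' => 'urlencode'];

/**
 * Summarise the functions applied to one variable, in source order.
 * Returns ['sanitizers' => [...], 'encodings' => [...], 'unsafeDecode' => bool].
 */
function summariseCalls(array $calls): array
{
    $sanitizers = [];
    $encodings = [];            // stack of encodings still in effect
    $unsafeDecode = false;
    foreach ($calls as $func) {
        if (in_array($func, SANITIZE_FUNCS, true)) {
            $sanitizers[] = $func;
        } elseif (in_array($func, DECODE_OF, true)) {
            $encodings[] = $func;               // an encode call
        } elseif (array_key_exists($func, DECODE_OF)) {
            if (end($encodings) === DECODE_OF[$func]) {
                array_pop($encodings);          // encode then decode cancel out
            } else {
                $unsafeDecode = true;           // decode with no matching encode
            }
        }
    }
    return [
        'sanitizers' => $sanitizers,
        'encodings' => $encodings,
        'unsafeDecode' => $unsafeDecode,
    ];
}

// $a = intval($a); ... $a = htmlspecialchars($a);
$merged = summariseCalls(['intval', 'htmlspecialchars']);
$cancelled = summariseCalls(['base64_encode', 'base64_decode']);
$decodeOnly = summariseCalls(['base64_decode']);
```

For the demonstration snippet, $merged lists both intval and htmlspecialchars; an encode followed by its decode leaves no encoding in effect, and a decode with no preceding encode is flagged.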

0x06 Variable backtracking and taint analysis

1. Variable backtracking

To trace every parameter of a dangerous sink (traceSymbol), we walk backward through all basic blocks connected to the current block. The process is as follows:

Loop over all entry edges of the current basic block and, for each unsanitized traceSymbol, look up its name in the DataFlow attribute of the basic block.

If it is found, replace it with the mapped symbol and copy all sanitization and encoding information of that symbol. Tracing then continues along all entry edges.

Finally, the results for the different paths through the CFG are returned.

The algorithm stops when traceSymbol maps to a static value such as a string or number literal, or when the current basic block has no entry edge. If traceSymbol is a variable or an array, check whether it is a superglobal array.
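The backtracking process can be sketched on a toy CFG. The array layout, the function traceSymbol, and the result fields are assumptions for illustration; the sketch handles at most one assignment per variable per block and assumes an acyclic CFG (a loop would need a visited set to avoid infinite recursion).

```php
<?php
// Hypothetical CFG: block id => ['preds' => [...], 'flow' => [lhs => [rhs, [sanitizers]]]].
function traceSymbol(array $cfg, int $blockId, string $symbol, array $sanitizers = []): array
{
    $block = $cfg[$blockId];
    if (isset($block['flow'][$symbol])) {
        [$rhs, $applied] = $block['flow'][$symbol];
        $sanitizers = array_merge($applied, $sanitizers);   // copy sanitization info
        $symbol = $rhs;                                     // follow the mapped symbol
    }
    // Stop at literals (no "$" prefix), at superglobals, and at entry blocks.
    $isVariable = strncmp($symbol, '$', 1) === 0;
    $isSuperGlobal = preg_match('/^\$_(GET|POST|COOKIE|REQUEST|FILES|SERVER)\b/', $symbol) === 1;
    if (!$isVariable || $isSuperGlobal || $block['preds'] === []) {
        return [[
            'symbol' => $symbol,
            'sanitizers' => $sanitizers,
            'tainted' => $isSuperGlobal,
        ]];
    }
    $results = [];
    foreach ($block['preds'] as $pred) {                    // one result per CFG path
        $results = array_merge($results, traceSymbol($cfg, $pred, $symbol, $sanitizers));
    }
    return $results;
}

// Block 0: $id = $_GET["id"];
// Block 1 (if branch): $id = intval($id);
// Block 2 (else branch): no assignment
// Block 3: $sql = $id; mysql_query($sql);
$cfg = [
    0 => ['preds' => [], 'flow' => ['$id' => ['$_GET["id"]', []]]],
    1 => ['preds' => [0], 'flow' => ['$id' => ['$id', ['intval']]]],
    2 => ['preds' => [0], 'flow' => []],
    3 => ['preds' => [1, 2], 'flow' => ['$sql' => ['$id', []]]],
];
$paths = traceSymbol($cfg, 3, '$sql');
```

Tracing $sql from block 3 yields two path results: one through the if branch carrying the intval sanitizer, and one through the else branch with no sanitizer, both ending at the $_GET source.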

2. Taint analysis

Taint analysis starts during interprocedural analysis, when built-in or user-defined functions are processed. If the program encounters a sensitive function call during analysis, it obtains the dangerous parameter node by backtracking or from the context and starts taint analysis. Put simply, this determines whether the dangerous parameter can lead to a vulnerability. The TaintAnalyser code performs the taint analysis. After obtaining the dangerous parameter, it takes the following steps:

First, find the assignment of the dangerous parameter in the current basic block, and check whether the right-hand node of the DataFlow contains a user input source, such as the superglobal arrays $_GET and $_POST. Plug-in classes specific to each vulnerability type then determine whether these nodes are safe.

If no source is found in the current basic block, multi-block analysis begins. First, obtain all predecessor blocks of the current basic block; the predecessors may form a parallel structure (if-else) or a non-parallel structure (ordinary statements). Then analyze the dangerous variable in each of them. If the basic block being processed has no predecessor, the algorithm ends.

If no vulnerability is found in the analysis across basic blocks, inter-file analysis is performed last. The summaries of the files included before the current basic block are loaded and traversed to make the judgment.

If a vulnerability is found in any of the preceding steps, control passes to the vulnerability report module. Otherwise, code analysis continues.
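The per-type safety decision in the first step can be sketched as below. The sanitizer lists in SAFE_FOR, the trace shape, and the function isVulnerable are assumptions; this is not the tool's TaintAnalyser, only an illustration of why the same trace can be safe for one vulnerability type and unsafe for another.

```php
<?php
// Assumed per-type sanitizer lists; a real tool would keep these in configuration.
const SAFE_FOR = [
    'SQLI' => ['intval', 'addslashes', 'mysql_real_escape_string'],
    'XSS'  => ['intval', 'htmlspecialchars', 'htmlentities'],
];

/**
 * Decide whether a traced value is vulnerable for the given type.
 * $trace has the hypothetical shape ['source' => string, 'sanitizers' => string[]].
 */
function isVulnerable(array $trace, string $type): bool
{
    $fromUser = preg_match('/^\$_(GET|POST|COOKIE|REQUEST)\b/', $trace['source']) === 1;
    if (!$fromUser) {
        return false;   // not a user input source
    }
    // Safe only if at least one applied sanitizer is effective for this type.
    return array_intersect($trace['sanitizers'], SAFE_FOR[$type] ?? []) === [];
}

// $a = htmlspecialchars($_GET["a"]); mysql_query($a); echo $a;
$trace = ['source' => '$_GET["a"]', 'sanitizers' => ['htmlspecialchars']];
$sqliVulnerable = isVulnerable($trace, 'SQLI');
$xssVulnerable = isVulnerable($trace, 'XSS');
```

With these assumed lists, the value is reported for SQLI but not for XSS, because htmlspecialchars is listed as effective only for the latter.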


0x07 Current results


We ran a test scan of simple-log_v1.3.12, and the result was:

Total: 76 XSS: 3 SQLI: 62 INCLUDE: 5 FILE: 3 FILEAFFECT: 1

The test target has some obvious vulnerabilities and does not use an MVC framework. The tool does not yet detect issues involving character truncation or the consumption of escape characters, but it is still usable. Testing revealed a steady stream of bugs, mainly because many syntax structures and test cases were not considered in the initial implementation. In addition, almost all the algorithms are recursive, so infinite recursion can easily occur and crash Apache.

The current code can therefore only be regarded as an experimental product. Making it robust would require extensive refactoring and a large amount of testing, and I do not have much time to maintain it.

0x08 Summary

In the field of static analysis, many security researchers focus on C/C++, decompilation, and assembly. Scripting languages urgently need more technical investment, because this work is very meaningful.

As for the pitfalls, the main problem in our implementation is that it does not support MVC frameworks. Because frameworks such as the CI framework are highly encapsulated, it is difficult to capture data flows in one place. Different analysis methods are therefore required for different frameworks.

At present, the tool can identify some simple vulnerabilities, but the code is not robust enough and has many bugs.
