PHP Automated White-Box Audit Technology and Implementation

0x00 Preface

Publicly available material in China on automated PHP auditing is relatively scarce. Abroad, by contrast, there are fairly mature automated audit tools such as RIPS, which performs a series of code analyses built on the PHP token stream. Traditional static analysis techniques such as data-flow analysis and taint analysis are rarely applied to dynamic scripting languages, yet they are the key technical points for automating white-box auditing. Today I mainly introduce the results of my recent research and implementation work, and I hope more security researchers in China will devote their energy to automated PHP auditing, a meaningful area.

0x01 Basic Knowledge

There are many ways to implement automated auditing. The simplest is to match code against a library of regular-expression rules, but it also has the lowest accuracy. The most reliable approach is to design the tool around techniques from the field of static analysis. The workflow of a typical static-analysis security tool consists of two stages, modeling and execution analysis:

The first thing static analysis must do is model the source code. In plain terms, this means turning the source text into an intermediate representation, a set of data structures representing the code, that is convenient for the subsequent vulnerability analysis. Modeling uses methods from compiler construction, such as lexical analysis to produce tokens, building an abstract syntax tree, and building a control-flow graph. The quality of the model directly determines how well the later taint analysis and data-flow analysis work.
Execution analysis then brings in security knowledge: the modeled code is loaded into the vulnerability analysis and processed. Finally, the static analysis tool produces its verdict, which ends this phase of work.
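As a rough illustration of the modeling step, the sketch below (in Python) applies the classic leader rule to split a function body into basic blocks and link them into a control-flow graph. The tuple-based linear IR is an assumption made for this example, not php-parser's AST format; a real tool would first lower the AST into something similar.

```python
from dataclasses import dataclass, field

# Hypothetical linear IR for a lowered PHP function body:
#   ("stmt", text)        ordinary statement
#   ("label", name)       jump target
#   ("goto", name)        unconditional jump
#   ("if", cond, name)    jump to `name` when `cond` holds, else fall through


@dataclass
class BasicBlock:
    index: int
    instrs: list
    succs: list = field(default_factory=list)  # indexes of successor blocks


def build_cfg(instrs):
    """Split instructions into basic blocks (leader rule) and link them."""
    if not instrs:
        return []
    leaders = {0}
    for i, ins in enumerate(instrs):
        if ins[0] == "label":
            leaders.add(i)  # every jump target starts a block
        elif ins[0] in ("goto", "if") and i + 1 < len(instrs):
            leaders.add(i + 1)  # the instruction after a jump starts a block
    starts = sorted(leaders)
    ends = starts[1:] + [len(instrs)]
    blocks = [BasicBlock(n, instrs[s:e]) for n, (s, e) in enumerate(zip(starts, ends))]
    label_block = {b.instrs[0][1]: b.index for b in blocks if b.instrs[0][0] == "label"}
    for b in blocks:
        last = b.instrs[-1]
        if last[0] in ("goto", "if"):
            b.succs.append(label_block[last[-1]])  # edge to the jump target
        if last[0] != "goto" and b.index + 1 < len(blocks):
            b.succs.append(b.index + 1)  # fall-through edge
    return blocks


# An if/else over a user-controlled variable.
program = [
    ("stmt", "$a = $_GET['a']"),
    ("if", "$a == ''", "else"),
    ("stmt", "$a = intval($a)"),
    ("goto", "end"),
    ("label", "else"),
    ("stmt", "$a = 'default'"),
    ("label", "end"),
    ("stmt", "echo $a"),
]
cfg = build_cfg(program)
```

For this program the four blocks get successor lists [[2, 1], [3], [3], []]: the condition block branches to both arms, and both arms join at the block containing the echo sink.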

0x02 Implementation Approach

After a period of effort, the author and some friends have roughly implemented an automated static analysis tool. The implementation is built on static analysis techniques; readers who want to understand the underlying ideas can read my previous article.
In the tool, the automated audit proceeds as follows:

  • First, load every PHP file under the project directory supplied by the user and classify each one. If a file is a main file, that is, a PHP file that actually handles user requests, it is analyzed for vulnerabilities. Other files, such as class definitions or utility-function definitions in the project, are skipped.
  • Second, collect global data, focusing on the class definitions in the project being scanned: each class's file path, attributes, methods, and method parameters. At the same time, generate a summary for each file that records every assignment statement together with the sanitization and encoding information of the variables involved.
  • After global initialization, the compiler front-end module takes over. It uses the open-source tool php-parser to build an abstract syntax tree (AST) from the PHP code, then builds a control-flow graph (CFG) from the AST with a CFG construction algorithm, generating summary information for each basic block as it goes.
  • If the front-end encounters a call to a sensitive function, it pauses to run taint analysis, performing interprocedural and intraprocedural analysis to find the corresponding tainted data. Based on the information collected during data-flow analysis, it then evaluates the sanitization and encoding information to decide whether the code is vulnerable.
  • If the previous step flags vulnerable code, the vulnerable snippet is handed to the vulnerability reporting module for collection. This module maintains a singleton result-set context object in the system environment; each new vulnerability record is added to the result set. When the whole project has been scanned, Smarty renders the result set for the front end, which visualizes the scan results.
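A minimal Python sketch of a singleton result-set context like the one described for the reporting module. The class name, record fields, and file paths are hypothetical, and the Smarty rendering step is not shown.

```python
from collections import Counter


class ResultContext:
    """Process-wide result set for vulnerability records (singleton sketch)."""

    _instance = None

    def __new__(cls):
        # Every ResultContext() call returns the same shared object.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.records = []
        return cls._instance

    def add(self, vuln_type, file, line, snippet):
        self.records.append(
            {"type": vuln_type, "file": file, "line": line, "snippet": snippet}
        )

    def summary(self):
        """Count records per vulnerability type, e.g. for a report header."""
        return Counter(r["type"] for r in self.records)


# Any module can report without passing the context around explicitly.
ResultContext().add("SQLI", "hypothetical/index.php", 12, "mysql_query($sql);")
ResultContext().add("XSS", "hypothetical/index.php", 30, "echo $name;")
```

Using a singleton means the analyzers deep inside the recursive analysis can append findings without threading a result object through every call.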

0x03 Initialization

In a manual PHP audit, when we see a call to a sensitive function such as mysql_query, we instinctively analyze its first parameter to see whether it is controllable. In practice, however, many CMSs wrap database queries in their own methods, for example in a class named mysqldb, to make calls convenient and the program logic clear. When auditing such code we do not search for the mysql_query keyword; we look for calls such as $db->getOne instead.
So the question is: when an automated program analyzes the code, how does it know that $db->getOne is a method of a database access class?
This requires collecting all classes and their methods across the whole project at the start of the automated analysis, so that the analysis can look up the method body it needs to follow.
Class and method information should be collected as part of framework initialization and stored in a singleton context.
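The class-information store might look roughly like the following Python sketch. All class, field, and path names are hypothetical; resolving which class a variable such as $db holds is assumed to happen elsewhere. Lookups are lower-cased because PHP class and method names are case-insensitive, so $db->getOne and $db->getone refer to the same method.

```python
from dataclasses import dataclass, field


@dataclass
class MethodInfo:
    name: str
    params: list
    location: tuple  # hypothetical (file, first_line, last_line) of the body


@dataclass
class ClassInfo:
    name: str
    file: str
    attributes: list = field(default_factory=list)
    methods: dict = field(default_factory=dict)

    def add_method(self, method: MethodInfo) -> None:
        # PHP method names are case-insensitive, so key them in lower case.
        self.methods[method.name.lower()] = method


class ClassRegistry:
    """Singleton store filled once during framework initialization."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.classes = {}
        return cls._instance

    def register(self, info: ClassInfo) -> None:
        self.classes[info.name.lower()] = info  # class names are case-insensitive too

    def find_method(self, class_name: str, method_name: str):
        cls_info = self.classes.get(class_name.lower())
        if cls_info is None:
            return None
        return cls_info.methods.get(method_name.lower())


db = ClassInfo("MysqlDb", "hypothetical/include/db.php", ["link"])
db.add_method(MethodInfo("getOne", ["sql"], ("hypothetical/include/db.php", 40, 52)))
ClassRegistry().register(db)
```

When the analysis later meets a call on an object of class MysqlDb, find_method returns the parameters and body location it needs to follow.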

It is also necessary to decide whether each parsed PHP file really handles user requests. In many CMSs, wrapper classes such as a database operation class or a file operation class are written to separate files. Taint analysis of such files is meaningless, so they must be identified during framework initialization. The principle is simple: count the call-type statements and the definition-type statements in the file and compare the result against a threshold. The error rate is very low.
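One plausible reading of the threshold test, sketched in Python: the share of non-definition (call-type) top-level statements is compared against a threshold. The statement-kind labels and the 0.5 threshold are assumptions for illustration, not the tool's actual values.

```python
# Top-level statement kinds that only define things (hypothetical labels).
DEFINITION_KINDS = {"class", "interface", "trait", "function", "const"}


def is_main_file(stmt_kinds, threshold=0.5):
    """Return True when call-type statements make up at least `threshold`
    of a file's top-level statements, i.e. it likely handles requests.
    Everything that is not a definition counts as call-type here."""
    if not stmt_kinds:
        return False
    definitions = sum(1 for kind in stmt_kinds if kind in DEFINITION_KINDS)
    call_ratio = (len(stmt_kinds) - definitions) / len(stmt_kinds)
    return call_ratio >= threshold
```

A file holding only a class definition is skipped, while a script that mostly assigns variables and calls functions is treated as a main file.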
Finally, a summary is built for each file, so that later analysis can perform inter-file analysis when it encounters require, include, and similar statements. The summary mainly records variable assignments, variable encodings, and variable sanitization information.
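A Python sketch of what a per-file summary and an include-time lookup could look like. The record layout and the "most recent include wins" rule are assumptions made for illustration; the sketch returns only the closest assignment and does not chain further back through earlier includes.

```python
from dataclasses import dataclass, field


@dataclass
class AssignRecord:
    target: str                       # e.g. "$id"
    source: str                       # right-hand side origin, e.g. "$_GET['id']"
    sanitizers: list = field(default_factory=list)
    encodings: list = field(default_factory=list)


@dataclass
class FileSummary:
    path: str
    assigns: list = field(default_factory=list)  # in source order

    def last_assignment(self, var):
        for rec in reversed(self.assigns):
            if rec.target == var:
                return rec
        return None


def lookup_in_includes(var, included_paths, summaries):
    """Resolve `var` through summaries of files included before the sink,
    checking the most recently included file first."""
    for path in reversed(included_paths):
        summary = summaries.get(path)
        rec = summary.last_assignment(var) if summary is not None else None
        if rec is not None:
            return rec
    return None


summaries = {
    "config.php": FileSummary("config.php", [AssignRecord("$id", "$_GET['id']")]),
    "filter.php": FileSummary("filter.php", [AssignRecord("$id", "$id", ["intval"])]),
}
```

Here, if a main file includes config.php and then filter.php before using $id, the lookup finds the intval-sanitized assignment from filter.php first.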

0x04 User-Defined Function Handling

Common web vulnerabilities are generally caused by user-controllable dangerous parameters; these are known as taint-style vulnerabilities, such as the common SQLi and XSS.
Some PHP built-in functions and language constructs are inherently dangerous; echo, for example, can cause reflected XSS. In real code, however, developers often do not call these built-ins directly but wrap them in custom functions, for example:

function myexec($cmd) {
    exec($cmd);
}

In the implementation, our processing flow is:

  • Locate the corresponding method code snippet using the context information gathered during initialization.
  • Parse this snippet to find the dangerous function (here, exec).
  • Locate the dangerous parameter of the dangerous function (here, $cmd).
  • If no sanitization is encountered during the analysis, the parameter can carry taint through. Map it to the first parameter of the user function myexec ($cmd) and store the user-defined function in the context structure as a dangerous function.
  • Return recursively and start the taint analysis process.

To summarize: we follow the relevant class methods, static methods, and functions, and check whether their bodies call a dangerous function with a dangerous parameter. The PHP built-in dangerous functions and their dangerous parameters are listed in a configuration file. If such a call is found and the dangerous parameter is not filtered, the user-defined function is registered as a user-defined dangerous function. Whenever subsequent analysis encounters one of these functions, taint analysis starts immediately.
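The promotion of a wrapper such as myexec to a user-defined sink can be sketched in Python as follows. The sink and sanitizer tables are a hypothetical configuration, and expressions are modeled as either a variable name or a (function, inner expression) tuple, which is a simplification of a real AST.

```python
# Hypothetical configuration: sink name -> indexes of dangerous arguments,
# plus the functions treated as sanitizers.
SINKS = {"exec": [0], "system": [0], "mysql_query": [0]}
SANITIZERS = {"escapeshellarg", "intval"}


def unsanitized_param(expr, params):
    """Expressions are a variable name (str) or a call tuple (func, inner).
    Return the parameter that reaches `expr` without a sanitizer, else None."""
    if isinstance(expr, str):
        return expr if expr in params else None
    func, inner = expr
    if func in SANITIZERS:
        return None
    return unsanitized_param(inner, params)


def promote_user_function(name, params, calls, sinks):
    """Register `name` as a user-defined sink if any of its parameters flows
    unfiltered into a known sink. `calls` lists (callee, [arg exprs])."""
    dangerous = set()
    for callee, args in calls:
        for idx in sinks.get(callee, []):
            if idx < len(args):
                param = unsanitized_param(args[idx], params)
                if param is not None:
                    dangerous.add(params.index(param))
    if dangerous:
        sinks[name] = sorted(dangerous)  # store in the shared context
    return bool(dangerous)


sinks = dict(SINKS)
promote_user_function("myexec", ["$cmd"], [("exec", ["$cmd"])], sinks)
```

After this call, sinks["myexec"] is [0]. Because promoted functions go into the same table, a further wrapper that calls myexec(trim($x)) is promoted as well, which mirrors the recursive return described above.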

0x05 Handling Variable Sanitization and Encoding

In a real audit, once we find that a dangerous parameter is controllable, we immediately check whether the programmer filtered or encoded the variable effectively, which decides whether a vulnerability exists.
Automated auditing follows the same idea. In the implementation, PHP's security functions are first catalogued in a configuration file. During program analysis, for each piece of data-flow information, the necessary sanitization and encoding information is collected by backtracking, for example:

$a = $_GET['a'];
$a = intval($a);
echo $a;
$a = htmlspecialchars($a);
mysql_query($a);

The snippet above looks a bit odd, but it is only a demo. The variable $a passes through two sanitizers, intval and htmlspecialchars, and the configuration file lets us collect both. Backtracking is then performed to merge the sanitization and encoding information from the code preceding the current line.
For example, at the echo statement the only sanitization of $a is intval, whereas at the mysql_query call the sanitization information of $a must be merged into the list [intval, htmlspecialchars]. The method is to collect the data-flow information of all preceding code and backtrack through it.
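The backtracking merge can be sketched in Python over a simplified statement list that mirrors the demo snippet. The statement encoding and the SECURITY_FUNCS table are assumptions for illustration; `upto` is the index of the sink statement.

```python
# Hypothetical configuration of PHP security functions.
SECURITY_FUNCS = {"intval", "htmlspecialchars", "addslashes", "escapeshellarg"}


def collect_sanitizers(stmts, upto, var):
    """Backtrack through stmts[:upto] and merge the security functions
    applied to `var`. Each stmt is (target, [funcs in application order],
    source) for an assignment, or None for any other statement.
    Returns (sanitizers in application order, origin of the value)."""
    found = []
    for stmt in reversed(stmts[:upto]):
        if stmt is None:
            continue
        target, funcs, source = stmt
        if target != var:
            continue
        found.extend(f for f in reversed(funcs) if f in SECURITY_FUNCS)
        var = source  # keep following the value to where it came from
    return list(reversed(found)), var


# The demo snippet, one entry per statement.
stmts = [
    ("$a", [], "$_GET['a']"),            # $a = $_GET['a'];
    ("$a", ["intval"], "$a"),            # $a = intval($a);
    None,                                # echo $a;
    ("$a", ["htmlspecialchars"], "$a"),  # $a = htmlspecialchars($a);
    None,                                # mysql_query($a);
]
```

At the echo (index 2) this yields (["intval"], "$_GET['a']"); at the mysql_query call (index 4) it yields (["intval", "htmlspecialchars"], "$_GET['a']").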

A further detail: when the same variable is passed through both base64_encode and base64_decode, its Base64 encoding cancels out. Likewise, an escaping function followed by its inverse cancels out. But if the calls are in the wrong order, or only the decode is applied, the result is, as you can imagine, quite dangerous.
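A small Python sketch of the cancellation rule using a stack: a decoder cancels the most recent encoding only when it is the matching encoder, and an unmatched or out-of-order decode is flagged. The decoder/encoder pairs listed are an assumed configuration, and every non-decoder in the input is treated as an encoding.

```python
# Hypothetical decoder -> encoder pairs (decode/unescape cancels encode/escape).
INVERSE = {
    "base64_decode": "base64_encode",
    "urldecode": "urlencode",
    "stripslashes": "addslashes",
}


def net_encodings(calls):
    """Replay encoding-related calls in order. A decoder cancels the most
    recent encoding only if it is the matching encoder; otherwise the
    decode is recorded as risky. Returns (remaining encodings, risky decodes)."""
    stack, risky = [], []
    for fn in calls:
        if fn in INVERSE:
            if stack and stack[-1] == INVERSE[fn]:
                stack.pop()
            else:
                risky.append(fn)
        else:
            stack.append(fn)
    return stack, risky
```

An encode followed by its decode leaves nothing behind, while a lone base64_decode, or a stripslashes issued while base64_encode is still on top of the stack, ends up in the risky list.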

0x06 Variable Backtracking and Taint Analysis

1. Variable Backtracking

To find the origin of every dangerous sink parameter (TraceSymbol), all basic blocks connected to the current block are traced backwards, as follows:

  • Loop over all entry edges of the current basic block and, for each TraceSymbol that has not been sanitized, look up its name in the dataflow attribute of the connected basic block.
  • If it is found, replace it with the mapped symbol and copy over all of that symbol's sanitization and encoding information. Then continue tracing along all entry edges.
  • Finally, return the results for the different paths through the CFG.

The algorithm stops when the TraceSymbol maps to a static string or a numeric literal, or when the current basic block has no entry edge. If the TraceSymbol is a variable or an array element, check whether it comes from a superglobal array.
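A Python sketch of the backtracking procedure over a CFG, under assumed data structures: each block's dataflow maps a symbol to (mapped symbol, sanitizers), and preds lists the blocks on its entry edges. The `seen` set of block names on the current path keeps loops in the CFG from causing unbounded recursion.

```python
from dataclasses import dataclass, field

SUPERGLOBALS = ("$_GET", "$_POST", "$_COOKIE", "$_REQUEST", "$_SERVER", "$_FILES")


@dataclass
class Block:
    name: str
    # TraceSymbol -> (mapped symbol, sanitizers applied in this block)
    dataflow: dict = field(default_factory=dict)
    preds: list = field(default_factory=list)  # blocks on the entry edges


def is_literal(sym):
    return isinstance(sym, (int, float)) or (isinstance(sym, str) and sym[:1] in ("'", '"'))


def is_superglobal(sym):
    return isinstance(sym, str) and sym.startswith(SUPERGLOBALS)


def backtrace(block, symbol, sanitizers=(), seen=frozenset()):
    """Trace `symbol` backwards from `block`; return one (origin, sanitizers)
    pair per CFG path. Sanitizers are listed from the sink backwards."""
    if symbol in block.dataflow:
        mapped, extra = block.dataflow[symbol]
        symbol, sanitizers = mapped, tuple(sanitizers) + tuple(extra)
    if is_literal(symbol) or is_superglobal(symbol) or not block.preds:
        return [(symbol, tuple(sanitizers))]
    seen = seen | {block.name}
    results = []
    for pred in block.preds:
        if pred.name not in seen:  # skip blocks already on this path (loops)
            results.extend(backtrace(pred, symbol, sanitizers, seen))
    return results


# if/else diamond: only the "then" arm sanitizes $id.
entry = Block("entry", {"$id": ("$_GET['id']", ())})
then_b = Block("then", {"$id": ("$id", ("intval",))}, [entry])
else_b = Block("else", {}, [entry])
sink = Block("sink", {}, [then_b, else_b])
```

backtrace(sink, "$id") returns one result per path: [("$_GET['id']", ("intval",)), ("$_GET['id']", ())], so the unsanitized else-path stays visible to the taint check.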

2. Taint Analysis

Taint analysis starts during the procedure analysis of both built-in and user-defined functions. When the program parses a sensitive function call, it obtains the dangerous parameter node by backtracking or from the context and begins taint analysis; in plain terms, it decides whether the dangerous parameter can lead to a vulnerability. Taint analysis is implemented in TaintAnalyser, which, after obtaining the dangerous parameters, proceeds as follows:

  • First, look up the value of the dangerous parameter in the current basic block and check whether the right-hand side of its dataflow entry contains a user input source, such as the superglobal arrays $_GET and $_POST. Plugin classes for the different vulnerability types then decide whether these nodes are safe.
  • If no source is found in the current basic block, move on to inter-block analysis within the file. First obtain all predecessor blocks of the current basic block; the predecessors may form a parallel structure (if / else if / else) or a non-parallel structure (ordinary sequential statements). Analyze the dangerous variables in them; when a block in the traversal has no predecessor, the algorithm ends.
  • If inter-block analysis finds no vulnerability, inter-file analysis is performed last: load the summaries of the files included before the current basic block and iterate over them to make a judgment.
  • If any of the preceding steps finds a vulnerability, the vulnerability reporting module takes over. Otherwise, code analysis continues.
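The staged decision can be sketched in Python as below. The per-vulnerability SAFE_FOR tables stand in for the plugin classes and are hypothetical, and the (origin, sanitizers) paths for each stage are assumed to have been computed already, whereas a real implementation would compute each stage lazily.

```python
SUPERGLOBALS = ("$_GET", "$_POST", "$_COOKIE", "$_REQUEST", "$_SERVER", "$_FILES")

# Hypothetical per-vulnerability plugin rules: sanitizers that make a path safe.
SAFE_FOR = {
    "XSS": {"htmlspecialchars", "htmlentities", "intval"},
    "SQLI": {"intval", "mysql_real_escape_string"},
    "EXEC": {"escapeshellarg", "intval"},
}
STAGES = ("current block", "predecessor blocks", "included files")


def vulnerable_paths(vuln_type, paths):
    """Keep (origin, sanitizers) paths that start at user input and pass
    through none of the sanitizers the plugin accepts for `vuln_type`."""
    safe = SAFE_FOR[vuln_type]
    return [
        (origin, sans)
        for origin, sans in paths
        if isinstance(origin, str)
        and origin.startswith(SUPERGLOBALS)
        and not safe.intersection(sans)
    ]


def taint_analysis(vuln_type, candidates):
    """`candidates` maps a stage name to the paths found by that stage.
    Stages run in order and stop at the first one that finds a vulnerability."""
    for stage in STAGES:
        hits = vulnerable_paths(vuln_type, candidates.get(stage, []))
        if hits:
            return stage, hits  # hand off to the reporting module
    return None  # nothing found; normal code analysis continues
```

For an XSS sink whose current block only sees a static string, but whose predecessor blocks contribute one intval-sanitized path and one raw $_GET['q'] path, the analysis stops at "predecessor blocks" and reports only the raw path.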

0x07 Current Results

We ran a test scan against simple-log_v1.3.12; the results were:
Total: 76 | XSS: 3 | SQLI: 62 | INCLUDE: 5 | FILE: 3 | FILEAFFECT: 1
The test code contains only a few obvious vulnerabilities and does not use an MVC framework. Tricks such as character truncation that swallows escape characters are not supported by the current technique, but the tool still finds some issues. Bugs kept surfacing during testing, mainly because the early implementation did not account for many syntactic structures and test cases, and because almost every algorithm is recursive, which easily leads to infinite recursion that brings Apache down.
So the current code is really only a test. Making it robust would take countless refactorings and a great deal of testing, and the author does not have much time to maintain it.

0x08 Summary

In static analysis, many security researchers work on C/C++, assembly, and similar directions. The scripting-language field also urgently needs technical investment, because this is very meaningful work.
Back to our own work: a major problem with the implementation by the author and friends is that it does not support MVC frameworks. Frameworks such as CodeIgniter (CI) are highly encapsulated, which makes them hard to handle in a uniform way, so different frameworks will probably need different analysis methods.
At present the tool can identify some simple vulnerabilities, but the code has many bugs and is not robust enough.
Finally: talk is cheap, show me the code. The implementation is available on GitHub.
The code is shared so that security researchers who are interested in, or already working in, this field can study and discuss it. At present it will not produce good results on an arbitrary CMS, so please do not have any illusions about it.
