Discussion on PHP-based Automatic Webshell Detection
For website operations staff, perhaps the biggest headache is a site that has been compromised and left with a backdoor, or even a server on which the attacker has escalated privileges. Attackers usually hide the backdoor somewhere hard to find, and checking files one by one is impractical. This article therefore discusses feasible approaches to automatic Webshell detection in a PHP environment.
0x01 Keyword Detection
Use regular expressions to scan files for common Webshell keywords and decide from the matches whether a file is a Webshell. This is the simplest and most traditional detection method, and also the crudest. It obviously produces a high false positive rate, and it misses webshells that have been encrypted or otherwise transformed. Every file it flags must therefore be verified manually by the operations staff. The guiding principle is: better to flag a hundred files by mistake than to let a single one slip through.
For example, the following code, obtained from a website, is said to be the actual source of an online Webshell scanner:
class scan {
    private $directory = '.';
    private $extension = array('php');
    private $_files = array();
    private $filelimit = 5000;
    private $scan_hidden = true;
    private $_self = '';
    // Signature pattern: dangerous function names plus strings typical of known webshells.
    // Some alternatives are translated keyword strings; adjust them to the text you need to match.
    private $_regex = '(preg_replace.*\/e|`.*?\$.*?`|\bcreate_function\b|\bpassthru\b|\bshell_exec\b|\bexec\b|\bbase64_decode\b|\bedoced_46esab\b|\beval\b|\bsystem\b|\bproc_open\b|\bpopen\b|\bcurl_exec\b|\bcurl_multi_exec\b|\bparse_ini_file\b|\bshow_source\b|cmd\.exe|KAdot@ngs\.ru|group|dedicated|Elevation of Privilege|Trojan|PHP\s?Rebound|shell\s?Enhanced version|WScript\.shell|PHP\s?Shell|Eval\sPHP\sCode|Udp1-fsockopen|xxddos|Send\sFlow|fsockopen\(\'(udp|tcp)|SYN\sFlood)';
    private $_shellcode = '';
    private $_shellcode_line = array();
    private $_log_array = array();
    private $_log_count = 0;
    private $action = '';
    private $taskid = 0;
    private $_tmp = '';
    // (the rest of the class is not shown)
}
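The core idea of keyword detection can be sketched in a few lines of C, the language used for the extension later in this article. This is not the scanner's actual code: the function name `find_keyword`, the shortened keyword list, and the whole-word matching rule (a stand-in for the `\b...\b` alternatives in the pattern) are assumptions made for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative subset of the function names that appear in the pattern. */
static const char *const KEYWORDS[] = {
    "eval", "system", "exec", "shell_exec", "passthru",
    "base64_decode", "create_function", "proc_open", "popen"
};

/* A "word" character in the sense of the regex \b boundary. */
static int is_word_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/*
 * Returns the first keyword (in list order) that occurs in `source`
 * as a whole word, or NULL when none is found.
 * Hypothetical helper, not part of the scanner quoted in the article.
 */
const char *find_keyword(const char *source)
{
    size_t count = sizeof(KEYWORDS) / sizeof(KEYWORDS[0]);
    for (size_t k = 0; k < count; k++) {
        size_t len = strlen(KEYWORDS[k]);
        const char *hit = source;
        while ((hit = strstr(hit, KEYWORDS[k])) != NULL) {
            int left_ok = (hit == source) || !is_word_char(hit[-1]);
            int right_ok = !is_word_char(hit[len]);
            if (left_ok && right_ok)
                return KEYWORDS[k];
            hit++; /* partial-word match: keep searching further on */
        }
    }
    return NULL;
}
```

A call such as `find_keyword(file_contents)` returns a non-NULL name for a file that should go to manual review. A name like `medieval_history` is not reported, because `eval` is not a whole word there. Even so, a perfectly legitimate call to `exec` is still flagged, which is the false positive problem described above.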
0x02 Detecting Obfuscated Webshells
1. Information Entropy
Any discussion of detecting encrypted webshells has to start with information theory. The basic idea of Shannon's information theory is to represent an information source by random variables or random vectors and to study information with the tools of probability theory and stochastic processes. An encoded Webshell file contains a large amount of random-looking content or special information. Such a file uses a wider range of ASCII codes, so the entropy computed over its ASCII codes rises. Entropy therefore measures how uncertain a Webshell is compared with ordinary files.
Formula:
$$\mathrm{Info}(A) = -\sum_{\substack{n=1 \\ n \neq 127}}^{255} p_n \log_2 p_n, \qquad p_n = \frac{X_n}{S}$$
Here n is an ASCII code, X_n is the number of times the character with ASCII code n appears in the current file, and S is the total number of characters in the current script file. The character with ASCII code 127 is excluded because it carries no significance for the determination.
The higher the entropy value Info(A), the more likely the file is a Webshell.
For more information about information entropy, see: https://en.wikipedia.org/wiki/Entropy_%28information_theory%29
2. Index of Coincidence (IC)
Here we use another method. Let X be a ciphertext string of length n, represented as the set {X1, X2, ..., Xn}. The index of coincidence of X is the probability that two elements drawn from it at random are identical.
Let N_i be the number of times character i appears in the ciphertext. The number of ways to draw two characters from the n ciphertext characters is:
$$\binom{n}{2} = \frac{n(n-1)}{2}$$
Of these, the number of ways to draw a pair from the N_i occurrences of i is:
$$\binom{N_i}{2} = \frac{N_i(N_i-1)}{2}$$
The ratio of the two is therefore the probability that both characters drawn from X are i:
$$\frac{N_i(N_i-1)}{n(n-1)}$$
Summing over all characters i gives the index of coincidence:
$$IC(X) = \sum_i \frac{N_i(N_i-1)}{n(n-1)}$$
As the ciphertext of an encrypted file becomes more random, its index of coincidence decreases. An encoded Webshell resembles a random file, while a plain-text Webshell tends to contain random-looking strings, such as those used for privilege escalation, or binary and hexadecimal sequences. The extended ASCII set is therefore taken as the object of study, and the index of coincidence is computed over 254 characters (excluding ASCII 127). For script files, the lower the index of coincidence, the more likely the file is a Webshell.
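The index of coincidence can be sketched in C as follows. The function name `index_of_coincidence` is hypothetical. Skipping code 0 along with code 127 is an assumption, chosen because it yields the 254-character alphabet mentioned above.

```c
#include <stddef.h>

/*
 * Index of coincidence of a byte buffer: the probability that two
 * characters drawn at random without replacement are identical.
 * Codes 0 and 127 are ignored (assumption of this sketch).
 */
double index_of_coincidence(const unsigned char *data, size_t len)
{
    size_t counts[256] = {0};
    size_t n = 0; /* number of characters actually counted */

    for (size_t i = 0; i < len; i++) {
        unsigned char c = data[i];
        if (c == 0 || c == 127)
            continue;
        counts[c]++;
        n++;
    }
    if (n < 2)
        return 0.0; /* no pair can be drawn */

    double matching_pairs = 0.0;
    for (int i = 1; i < 256; i++) {
        if (counts[i] > 1)
            matching_pairs += (double)counts[i] * (double)(counts[i] - 1);
    }
    return matching_pairs / ((double)n * (double)(n - 1));
}
```

A buffer made of one repeated character gives 1.0, and a buffer with no repeated character gives 0.0. Ordinary source code falls in between. Uniformly random bytes approach 1/254 under this alphabet, which is why a low value points toward encoded content.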
In addition to the two algorithms described above, you can also base64-encode the script files to be checked. For encrypted Webshell files, base64 encoding eliminates non-ASCII characters, so the characters of the base64-encoded content are drawn from a smaller set and are unevenly distributed, which means the file's compression ratio increases. This method flags as webshells the files whose compression ratio is higher than that of other files.
For obfuscated webshells, each of these algorithms produces a concrete numeric value. The value is compared against a preset threshold to decide whether the file is a Webshell. The threshold has to be determined by testing, because it differs from one website to another.
0x03 Real-Time Dynamic Webshell Detection Based on a PHP Extension
This approach is currently popular. It hooks the PHP engine's handling of dangerous function calls to detect webshells dynamically, which makes detection comparatively real-time and fast. To some extent it makes up for the shortcomings of traditional static Webshell detection, and it is more convenient.
PHP mainly runs under a web server in one of three ways: as a loaded module, through CGI, or through FastCGI. All three go through five phases: module initialization, request initialization, code execution, request shutdown, and module shutdown. During code execution, the PHP source is first turned into tokens by lexical analysis, then converted into meaningful expressions by syntax analysis, and finally compiled into intermediate bytecode (opcodes). The opcodes are executed on the Zend virtual machine, which produces the output.
We use zend_set_user_opcode_handler, a general-purpose interface provided by the PHP kernel, to replace the handler for a given opcode, which achieves the effect of hooking the PHP kernel. Function prototype:
int zend_set_user_opcode_handler(zend_uchar opcode, user_opcode_handler_t handler)
The first argument is the opcode to hook, and the second is the handler function that runs in its place.
In extensions, zend_set_user_opcode_handler is typically used to hook the handlers for ZEND_INCLUDE_OR_EVAL, ZEND_DO_FCALL, ZEND_DO_FCALL_BY_NAME, and similar opcodes (see the example code below).
Attackers exploit arbitrary file upload vulnerabilities to upload webshells into a directory. When an uploaded file is accessed, the extension can check the file's path against a blacklist or whitelist. If the path violates the blacklist or whitelist rules, the request is treated as an attack and blocked immediately.
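The path check could look roughly like the following C sketch. The directory names and the function `is_execution_allowed` are hypothetical. The sketch also assumes the path has already been canonicalized (for example with `realpath()`), so that `..` segments cannot sidestep the prefix comparison.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical rules; a real deployment would load its own lists. */
static const char *const BLACKLIST[] = {
    "/var/www/html/upload/",
    "/var/www/html/tmp/"
};
static const char *const WHITELIST[] = {
    "/var/www/html/"
};

static int has_prefix(const char *path, const char *prefix)
{
    return strncmp(path, prefix, strlen(prefix)) == 0;
}

/*
 * Returns 1 when a script at `path` may run and 0 when it should be
 * blocked. A blacklisted directory always wins; otherwise the path
 * must lie under a whitelisted directory.
 */
int is_execution_allowed(const char *path)
{
    size_t i;
    for (i = 0; i < sizeof(BLACKLIST) / sizeof(BLACKLIST[0]); i++) {
        if (has_prefix(path, BLACKLIST[i]))
            return 0;
    }
    for (i = 0; i < sizeof(WHITELIST) / sizeof(WHITELIST[0]); i++) {
        if (has_prefix(path, WHITELIST[i]))
            return 1;
    }
    return 0; /* default deny: unknown locations are treated as attacks */
}
```

Checking the blacklist first lets an upload directory sit inside an otherwise trusted document root. Matching whole directory prefixes is also stricter than searching the path for a substring, which would block any file whose name merely contains the word.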
Potentially risky functions that can be hooked:
Command Execution class: passthru, system, popen, exec, shell_exec, etc.
File System class: fopen, opendir, dirname, pathinfo, etc.
Database Operations: mysql_query, mysqli_query, etc.
Callback functions: array_filter, array_reduce, usort, and uksort
Reflection function: ReflectionFunction
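Inside a hook, a list like the one above would typically become a lookup table. The C sketch below shows one possible shape. The type `risk_class`, the table, and the function `classify_function` are all hypothetical, and only the names listed above are included. The comparison ignores case because PHP function names are case-insensitive.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum {
    RISK_NONE, RISK_COMMAND, RISK_FILESYSTEM,
    RISK_DATABASE, RISK_CALLBACK, RISK_REFLECTION
} risk_class;

struct risk_entry {
    const char *name;
    risk_class cls;
};

/* Entries taken from the list above; extend as needed. */
static const struct risk_entry RISK_TABLE[] = {
    {"passthru", RISK_COMMAND}, {"system", RISK_COMMAND},
    {"popen", RISK_COMMAND}, {"exec", RISK_COMMAND},
    {"shell_exec", RISK_COMMAND},
    {"fopen", RISK_FILESYSTEM}, {"opendir", RISK_FILESYSTEM},
    {"dirname", RISK_FILESYSTEM}, {"pathinfo", RISK_FILESYSTEM},
    {"mysql_query", RISK_DATABASE}, {"mysqli_query", RISK_DATABASE},
    {"array_filter", RISK_CALLBACK}, {"array_reduce", RISK_CALLBACK},
    {"usort", RISK_CALLBACK}, {"uksort", RISK_CALLBACK},
    {"ReflectionFunction", RISK_REFLECTION}
};

/* Case-insensitive string equality. */
static int equals_ignore_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Returns the risk class of `name`, or RISK_NONE if it is not listed. */
risk_class classify_function(const char *name)
{
    size_t count = sizeof(RISK_TABLE) / sizeof(RISK_TABLE[0]);
    for (size_t i = 0; i < count; i++) {
        if (equals_ignore_case(name, RISK_TABLE[i].name))
            return RISK_TABLE[i].cls;
    }
    return RISK_NONE;
}
```

A handler can then apply a different policy per class, for example blocking command execution outright while only logging database calls.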
The PHP extension is written in pure C. The main code is provided below for reference:
#include "config.h"
#include "php.h"
#include "php_ini.h"
#include "ext/standard/info.h"
#include "php_waf.h"

/* Written against the PHP 5 Zend API (TSRMLS macros, ZEND_DO_FCALL_BY_NAME). */
static int le_waf;

/* Forward declaration so that PHP_MINIT_FUNCTION can register the handler. */
int manage(ZEND_OPCODE_HANDLER_ARGS);

const zend_function_entry waf_functions[] = {
    PHP_FE(confirm_waf_compiled, NULL)
    PHP_FE_END
};

zend_module_entry waf_module_entry = {
#if ZEND_MODULE_API_NO >= 20010901
    STANDARD_MODULE_HEADER,
#endif
    "waf",
    waf_functions,
    PHP_MINIT(waf),
    PHP_MSHUTDOWN(waf),
    PHP_RINIT(waf),
    PHP_RSHUTDOWN(waf),
    PHP_MINFO(waf),
#if ZEND_MODULE_API_NO >= 20010901
    PHP_WAF_VERSION,
#endif
    STANDARD_MODULE_PROPERTIES
};

#ifdef COMPILE_DL_WAF
ZEND_GET_MODULE(waf);
#endif

/* confirm_waf_compiled and the MSHUTDOWN, RINIT, RSHUTDOWN and MINFO
   functions are not shown in this excerpt. */

PHP_MINIT_FUNCTION(waf)
{
    zend_set_user_opcode_handler(ZEND_INCLUDE_OR_EVAL, manage);   /* hook eval, include, etc. */
    zend_set_user_opcode_handler(ZEND_DO_FCALL_BY_NAME, manage);  /* hook variable function calls */
    zend_set_user_opcode_handler(ZEND_DO_FCALL, manage);          /* hook command execution */
    return SUCCESS;
}

/* Hook handler: runs in place of the default handler for the opcodes above. */
int manage(ZEND_OPCODE_HANDLER_ARGS)
{
    const char *filepath = zend_get_executed_filename(TSRMLS_C);

    /* Block the opcode when "upload" is a substring of the script path. */
    if (strstr(filepath, "upload")) {
        php_printf("Please do not execute malicious code\nExecution file path: %s", filepath);
        return ZEND_USER_OPCODE_RETURN;
    }
    /* Otherwise hand the opcode back to the default handler. */
    return ZEND_USER_OPCODE_DISPATCH;
}
0x04 Summary
In addition to the detection methods discussed above, there are also network-based detection methods.
For example, current research focuses on deploying an IDS or WAF at the network entrance to detect webshells. FireEye [28] proposes configuring Snort signature rules to detect one-line webshells, and also configuring the ModSecurity Core Rule Set to detect Webshell upload behavior.
Both methods analyze whether the HTTP request contains special keywords in order to determine whether the attacker is uploading HTML or script files. This approach is ineffective against webshells that are already on the server.