Summary of Webshell Detection Methods

Source: Internet
Author: User
Keywords: webshell, webshell definition, webshell detection
Background
A webshell is a command execution environment that takes the form of a web script file such as asp, php, jsp, or cgi. It is also called a web backdoor. After compromising a website, an attacker usually places the asp or php backdoor file among the normal page files in the web server's web directory. The attacker can then request the backdoor from a browser and obtain a command execution environment, which is enough to control the web server.

 

Webshell detection model
A webshell runs along this path: hacker -> HTTP protocol -> web server -> CGI. Put simply, the attacker uses a browser to request a CGI file on the web server over HTTP. The difficulty is that webshell traffic is a legitimate TCP connection: below the application layer of TCP/IP it has no distinguishing features (this is not absolute), so detection has to happen at the application layer.

Whether the attacker uploads a new file or modifies an existing one, some file on the server must contain the webshell code, so the natural first step is to inspect file contents. This is static feature detection. Once the webshell is running, the browser and server exchange data over HTTP, and traces can be found in the HTTP requests and responses. This is dynamic feature detection.

 

Static detection
Static detection finds webshells by matching signatures, characteristic values, and dangerous function calls. It can only find known webshells, and its false positive rate is relatively high. Tightening the rules lowers the false positive rate, but the false negative rate then rises.

Advantages: it is fast, it is accurate for known webshells, and it is easy to deploy, since a single script is enough. Disadvantages: high false negative and false positive rates, no ability to find 0-day webshells, and easy to bypass.

For a single site, static detection is still very useful: combined with manual review, it locates a webshell quickly. For a large enterprise with tens of thousands of sites, however, the manual workload becomes enormous. One way to cope is to split signatures into strong and weak features. A hit on a strong feature is treated as a confirmed webshell, and a hit on a weak feature is passed to manual review. Strong features are taken from popular webshells and monitored closely, so that a match can be confirmed and responded to immediately.

Reducing false positives and false negatives further requires looking beyond the code. The file system offers another angle: file attributes can be combined with code features. For example, if Apache runs as nobody, a webshell written through it will also be owned by nobody. If a file owned by nobody appears in the web directory for no known reason, something is wrong. Ideally, a system and process make one publishing entry point the only way into the web directory. With that entry point controlled, any web file that arrives by another route is naturally exposed.
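The strong and weak feature idea can be sketched in a few lines of Python. This is only an illustration: the patterns in STRONG_FEATURES and WEAK_FEATURES and the classify function are hypothetical examples invented here, not rules taken from any real tool, and a usable rule set would be far larger.

```python
import re

# Hypothetical strong features: request data passed straight into a code
# execution function. A hit is treated as a confirmed webshell.
STRONG_FEATURES = [
    re.compile(r"eval\s*\(\s*\$_(POST|GET|REQUEST|COOKIE)\b", re.I),
    re.compile(r"assert\s*\(\s*\$_(POST|GET|REQUEST|COOKIE)\b", re.I),
]

# Hypothetical weak features: functions that are dangerous but also appear
# in legitimate code. A hit only queues the file for manual review.
WEAK_FEATURES = [
    re.compile(r"\b(eval|assert|system|exec|passthru|shell_exec)\s*\(", re.I),
    re.compile(r"\bbase64_decode\s*\(", re.I),
]


def classify(source: str) -> str:
    """Return 'webshell', 'review', or 'clean' for one file's source text."""
    if any(pattern.search(source) for pattern in STRONG_FEATURES):
        return "webshell"
    if any(pattern.search(source) for pattern in WEAK_FEATURES):
        return "review"
    return "clean"


print(classify('<?php @eval($_POST["cmd"]); ?>'))  # webshell
print(classify('<?php system("ls"); ?>'))          # review
print(classify('<?php echo "hi"; ?>'))             # clean
```

Checking strong features first keeps the manual review queue limited to files that only match weak features.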

The author's static detection tool for webshells: https://github.com/he1m4n6a/findWebshell

 

Dynamic detection
Once a webshell has been uploaded to the server, the attacker still has to execute it. The characteristics a webshell shows during execution are called dynamic features. As noted earlier, a webshell communicates over HTTP, so the HTTP requests and responses specific to webshells can be collected into a feature library and loaded into an IDS that inspects all HTTP traffic.

If the webshell executes a system command, a process is created: on Linux the nobody user starts bash, and on Windows the IIS user starts cmd. These are dynamic features as well. A reverse connection is even easier to detect, because both a host agent and the IDS can capture the traffic.

A webshell always produces HTTP requests. If HTTP is monitored at the network layer and someone requests a file that has never been requested before and receives a 200 response, the webshell is easy to locate. This is HTTP anomaly model detection. It is similar to detecting file changes: if a new file was not added by an administrator, the site has been compromised. The shortcomings are also obvious: an attacker who plants the code in an existing file bypasses the check easily, and deployment is costly, because a frequently updated site needs its rules updated continuously.

Another idea is function hijacking. Recall how the odd encryption schemes of a web trojan are unpacked during debugging: simply change eval to alert. In the same way, some functions can be overridden globally in the CGI environment (for example in the ASP.NET global.asax file), so that a webshell's call to them is noticed. Take JavaScript as an example. The idea is the same in php, asp, and other languages: save the original function, redefine it, and call the saved original from the new definition. The following overrides eval and pops up a warning, which may also scare off an inexperienced attacker.

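<script type="text/javascript">
<!--
// Save the original eval, then replace it with a wrapper that asks first.
var _eval = eval;
eval = function(s) {
    // eval.caller is non-standard and only available in non-strict code.
    if (confirm("eval is called\n\ncall function\n" + eval.caller + "\n\ncall parameter\n" + s)) {
        // Calling through _eval is an indirect eval: s runs in the global scope.
        return _eval(s);
    }
};
//-->
</script>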
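The process-based dynamic feature (a web server account starting a shell) can be sketched as a filter over a process snapshot. The ProcessRecord type, the account names in WEB_USERS, the shell names in SHELL_NAMES, and the sample snapshot are all assumptions made for this illustration. A real agent would read the process table from the operating system.

```python
from dataclasses import dataclass

# Assumed account names that web servers commonly run under.
WEB_USERS = {"nobody", "www-data", "apache", "IUSR"}
# Assumed process names that indicate an interactive shell.
SHELL_NAMES = {"sh", "bash", "dash", "cmd.exe", "powershell.exe"}


@dataclass
class ProcessRecord:
    """One row of a hypothetical process snapshot."""
    pid: int
    user: str
    name: str


def suspicious_processes(records):
    """Return the shell processes that are owned by a web server account."""
    return [
        record for record in records
        if record.user in WEB_USERS and record.name.lower() in SHELL_NAMES
    ]


snapshot = [
    ProcessRecord(100, "nobody", "httpd"),
    ProcessRecord(214, "nobody", "bash"),  # shell started by the web account
    ProcessRecord(300, "root", "bash"),    # administrator shell, ignored
]

for record in suspicious_processes(snapshot):
    print(record.pid, record.user, record.name)  # 214 nobody bash
```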
 

Log detection
Using a webshell generally leaves no record in the system log, but the web log of the site records the accesses to the webshell page and the data submitted to it. Log analysis builds a request model from a large volume of log files and uses it to detect abnormal files. This is called HTTP abnormal request model detection. Examples of abnormal patterns: a page that normally receives only GET requests suddenly receives a POST request that returns 200, or the visitor IPs and access times of a page follow a regular pattern.

Webshell access features (main features)

Only a small number of IPs access the page
The total number of visits is low
The page is an orphan page
Not every orphan page is a webshell. The following situations also produce orphan pages:
(1) Normal pages that are hidden on purpose, such as the management backend
(2) Scanner activity: common vulnerability scanning, PoC scanning, and webshell scanning (requests for common webshell paths with a one-line payload appear often in logs). This is the main source of interference and has to be filtered out.
For case (1), use a whitelist. For case (2), use scanner identification.
(P.S. Crawler technology, fingerprinting, and scanner identification (which can be generalized to human-machine identification) can be called the troika of web security technology, and there is no getting around them.)

Advantages: because it relies on data analysis, this method gives results with high reference value once a site's traffic reaches a certain level.

Disadvantages: there are some false positives, and with a large volume of access logs the detection tool's throughput and efficiency are relatively low.
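The access features listed in this section (few IPs, few visits, orphan page, whitelist for hidden pages) can be combined into a small filter. The sketch below assumes each log entry has already been parsed into a tuple of (ip, path, status, referer). That record layout, the find_suspect_pages function, the thresholds, and the sample data are hypothetical. Here a page counts as an orphan when no successful request to it carried a referer, and requests that did not return 200, such as scanner probes for missing files, are ignored.

```python
from collections import defaultdict


def find_suspect_pages(requests, whitelist=(), max_ips=2, max_hits=10):
    """Return paths that few IPs visit, that get few hits, and that nothing links to."""
    ips = defaultdict(set)   # path -> distinct client IPs
    hits = defaultdict(int)  # path -> number of successful requests
    referred = set()         # paths reached at least once through a referer
    for ip, path, status, referer in requests:
        if status != 200:
            continue  # probes for files that do not exist are not evidence
        ips[path].add(ip)
        hits[path] += 1
        if referer:
            referred.add(path)
    return sorted(
        path for path in hits
        if path not in whitelist
        and path not in referred
        and len(ips[path]) <= max_ips
        and hits[path] <= max_hits
    )


requests = [
    ("10.0.0.1", "/index.php", 200, "https://example.com/"),
    ("10.0.0.2", "/index.php", 200, "https://example.com/news"),
    ("10.0.0.3", "/index.php", 200, ""),
    ("10.0.0.9", "/upload/x.php", 200, ""),
    ("10.0.0.9", "/upload/x.php", 200, ""),
    ("10.0.0.4", "/admin/login.php", 200, ""),
    ("10.0.0.7", "/shell.php", 404, ""),
]

print(find_suspect_pages(requests, whitelist={"/admin/login.php"}))
# ['/upload/x.php']
```

The result is a list of candidates for review, not a verdict, since hidden pages missing from the whitelist would be reported as well.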

 

Grammar detection
Grammar detection is based on syntactic and semantic analysis. It builds on scanning and compiling the php language: it separates code from comments and analyzes variables, functions, strings, and language constructs in order to catch the key dangerous functions. This largely addresses the false negative problem, but false positives remain.

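public function startLexing($code)
{
    // Files encoded with Zend Optimizer cannot be tokenized, so stop early.
    if (preg_match('/<\?(php)?\s*@Zend;[\r\n|\n]+\d+;/', $code)) {
        $this->errMsg = 'Encrypt with Zend optimizer.';
        return false;
    }
    $this->resetErrors();
    // token_get_all() splits the source into PHP language tokens.
    $this->tokens = token_get_all($code);
    $this->code = $code;
    // Start before the first token, on line 1.
    $this->pos = -1;
    $this->line = 1;
    return $this->checkError();
}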
One cause of false positives is the question of whether the scanned file is syntactically valid php. The token_get_all function does not verify that its input is valid php. It only scans and tokenizes it.

Cloud-side judgment on the server works from fingerprints of malicious code strings. Drawing on a large collection of backdoor samples, it performs syntactic and semantic analysis and business logic analysis to understand what a piece of code is for and to report where the malicious code is. Other users can then get a direct answer on whether a code fragment is malicious.

Pecker Scanner, a grammar-based detection tool, starts from grammar analysis: it separates tokens, comments, strings, variables, and language constructs, then applies php grammar checks to extract malicious code, which addresses the false negative problem. It also supports cloud-side judgment to keep false positives down.
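The benefit of working on tokens rather than raw text can be shown with a small analogue. The sketch below is not Pecker's code and does not parse php. As a stand-in, it uses Python's own tokenize module on Python source, and the DANGEROUS_CALLS set and find_dangerous_calls function are names invented here. The point it illustrates is the same: once the source is split into tokens, a dangerous name inside a comment or a string no longer produces a hit.

```python
import io
import tokenize

# Hypothetical list of call names to report.
DANGEROUS_CALLS = {"eval", "exec", "compile", "__import__"}

# Token types that carry no code and are skipped before matching.
IGNORED = (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
           tokenize.INDENT, tokenize.DEDENT)


def find_dangerous_calls(source):
    """Return (line, name) for each dangerous name directly followed by '('."""
    tokens = [
        tok for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type not in IGNORED
    ]
    hits = []
    for current, following in zip(tokens, tokens[1:]):
        if (current.type == tokenize.NAME
                and current.string in DANGEROUS_CALLS
                and following.type == tokenize.OP
                and following.string == "("):
            hits.append((current.start[0], current.string))
    return hits


sample = '''# eval( in a comment is ignored
text = "eval(in a string is ignored too)"
result = eval(user_input)
'''

print(find_dangerous_calls(sample))  # [(3, 'eval')]
```

A plain regular expression for eval( would report all three lines of the sample, while the token scan reports only the real call.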

 

Statistical detection
Because webshells are often encoded or encrypted, they show distinctive statistical characteristics, and files can be classified based on those characteristics.
Typical representative: NeoPI - https://github.com/Neohapsis/NeoPI

NeoPI uses the following five detection methods:

Entropy: measures how uncertain (random) the characters of a file are, using the ASCII table;
Longest Word: an unusually long string may be encoded or obfuscated content;
Index of Coincidence: a low index of coincidence suggests that the file's code may be encrypted or obfuscated;
Signature: searches the file for known malicious code fragments;
Compression: compares the compression ratios of files.
This detection method also has obvious weaknesses. NeoPI focuses on recognizing obfuscated code, and it often performs well on obfuscated code and obfuscated trojans. Unobfuscated code, however, passes through NeoPI's checks largely unnoticed. If the malicious code is embedded in other scripts on the system, NeoPI may not flag such a "normal" file.
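Four of the five measures can be computed with the Python standard library. The sketch below is not NeoPI's code. The function names are chosen here for illustration, the formulas are the textbook definitions of Shannon entropy and index of coincidence, and zlib stands in as the compressor. The signature search is omitted because it is ordinary pattern matching.

```python
import base64
import math
import re
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: higher means a more random-looking file."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())


def index_of_coincidence(data: bytes) -> float:
    """Chance that two bytes picked at random from the data are equal."""
    total = len(data)
    if total < 2:
        return 0.0
    return (sum(n * (n - 1) for n in Counter(data).values())
            / (total * (total - 1)))


def longest_word(data: bytes) -> int:
    """Length of the longest run of non-whitespace bytes."""
    return max((len(word) for word in re.findall(rb"\S+", data)), default=0)


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size."""
    if not data:
        return 0.0
    return len(zlib.compress(data)) / len(data)


# Compare ordinary source text with a base64 blob of the kind often
# found inside encoded webshells.
plain = b"<?php echo 'hello world'; ?>\n" * 20
encoded = base64.b64encode(bytes(range(256)) * 4)

for label, data in (("plain", plain), ("encoded", encoded)):
    print(label,
          round(shannon_entropy(data), 2),
          round(index_of_coincidence(data), 4),
          longest_word(data),
          round(compression_ratio(data), 2))
```

In practice the values are compared across all files of a site, so that outliers stand out. No single threshold is implied here.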

 

Obfuscated and data-stealing webshell detection
An obfuscated webshell can be detected with the statistical NeoPI tool mentioned above, or it can be detected dynamically. A normal programmer who uses eval or system does not deliberately disguise the call. If a function is observed being executed but its name cannot be found in the file's code, the behavior is abnormal. Obfuscated and encrypted webshells can be found this way: if the log shows a file executing system or a similar command, but no such call appears in the file's source, the file is a backdoor.
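The comparison between executed functions and the file's source can be sketched as follows. The list of executed function names is assumed to come from some runtime hook or audit log. That input, the hidden_calls function, and the sample php snippet are all hypothetical.

```python
import re


def hidden_calls(executed_functions, source):
    """Return executed function names that never appear as a call in the source."""
    missing = []
    for name in sorted(set(executed_functions)):
        # Look for the literal call form, e.g. "system(" or "system (".
        if not re.search(r"\b" + re.escape(name) + r"\s*\(", source):
            missing.append(name)
    return missing


# The function name is assembled at runtime, so "system(" is never written out.
source = '<?php $f = "sys"."tem"; $f($_GET["c"]); echo strlen("x"); ?>'
executed = ["system", "strlen"]

print(hidden_calls(executed, source))  # ['system']
```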

A data-stealing webshell must be able to operate on the database, and this leads to another detection method: analyzing how the database operations of normal web scripts differ from those of a data-stealing webshell. Under normal circumstances, the database operations of a web site are repetitive and fairly complex queries. Such queries are usually precise, and statements like "select * from" do not appear in them. A normal web script also does not query across databases. When either pattern appears, the script can largely be judged to be behaving abnormally.
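The two signals named above, "select * from" statements and cross-database queries, can be checked with simple pattern matching over captured SQL. The query_anomalies function, its regular expressions, and the application database name shop are assumptions for this sketch. The patterns are deliberately naive and would need a real SQL parser to be reliable.

```python
import re

SELECT_STAR = re.compile(r"\bselect\s+\*\s+from\b", re.I)
# Matches "from db.table" or "join db.table" and captures the database name.
QUALIFIED_TABLE = re.compile(r"\b(?:from|join)\s+`?(\w+)`?\.`?\w+`?", re.I)


def query_anomalies(query, app_database):
    """Return the reasons, if any, why a query looks unlike normal site traffic."""
    reasons = []
    if SELECT_STAR.search(query):
        reasons.append("select-star")
    for database in QUALIFIED_TABLE.findall(query):
        if database.lower() != app_database.lower():
            reasons.append("cross-database:" + database)
    return reasons


print(query_anomalies("SELECT * FROM mysql.user", "shop"))
# ['select-star', 'cross-database:mysql']
print(query_anomalies("SELECT name, price FROM products WHERE id = 7", "shop"))
# []
```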

