Summary: This article describes two general classes of HTTP-based IDS evasion techniques: old-style HTTP evasions and new-style HTTP evasions. The techniques are of different types, but all of them reside in the HTTP request URI and use the standard HTTP/1.0 and HTTP/1.1 protocols. Request URI evasions usually involve URL encoding, and Apache and IIS each accept multiple valid URL encoding methods. This article explains each encoding method and gives a specific example.
This article also demonstrates IDS evasions that exploit properties of the HTTP protocol itself. After reading it, you should understand the principles behind HTTP IDS evasion and be able to apply these principles and examples to build evasions that suit your needs.
Index terms: computer security, Hypertext Transfer Protocol, intrusion detection, network scanning
I. Introduction
Since Rain Forest Puppy (RFP) first released the network scanner whisker [1], HTTP IDS evasion has become increasingly popular. Many of the early HTTP IDS evasion techniques came from the first version of whisker, ranging from simple directory obfuscation with multiple "/" characters to the more complex trick of inserting "HTTP/1.0" into the URL to evade IDS algorithms that search for the URL.
In addition to the evasions in whisker, other types of HTTP obfuscation exist. One of them confuses URL detection by mixing absolute and relative URIs [2]. These methods are interesting, but they are less common than the ones used in whisker scans.
The next popular evasion method was also released by RFP and uses the Microsoft Internet Information Server (IIS) UTF-8 Unicode decoding vulnerability [3]. Besides being a serious vulnerability in IIS, it provides a URL encoding method that IDSs had never implemented. So far, most IDSs still focus only on whisker's ASCII encoding and directory traversal evasions and have no corresponding protection for UTF-8 Unicode encoding. Eric Hacker wrote a very thorough article on this type of HTTP IDS evasion [4]. This article analyzes and explains some of Hacker's ideas, then builds on them to examine what these encodings actually mean and how they can be used to create even stranger encodings.
This article also describes other types of HTTP IDS evasion that exploit properties of the HTTP protocol. These include request pipelining and a technique that uses the content-encoding header to place the parameters of an HTTP request in the request payload.
II. IDS HTTP Protocol Analysis
To identify URL attacks, an IDS must inspect the HTTP URL field for malicious content. The two most popular IDS detection methods, pattern matching and protocol analysis, both need to search the URL for malicious content.
The difference between the two methods lies in the scope of the search: protocol analysis searches for malicious content only in the URL field of the HTTP stream, while pattern matching searches the entire packet.
The two methods behave similarly until they have to deal with encoded malicious URLs. At that point, protocol analysis only needs to apply the appropriate decoding algorithm to the URL field, because it already has a built-in HTTP decoding engine. Pattern matching, however, does not know which part of the packet needs to be normalized, so it must be combined with some form of protocol analysis to locate the URL field and apply the appropriate decoding algorithm. Once some form of HTTP protocol analysis is added to pattern matching, the two methods behave similarly again.
Because these IDS methods are so similar, the HTTP IDS evasions discussed in this article apply to both types of IDS.
The first general way to evade an IDS is invalid protocol parsing. For example, if the IDS does not locate the HTTP URL correctly, the malicious URL goes undetected, because the IDS cannot decode a URL it has not found.
Even if the URL is located correctly, the IDS must know the correct decoding algorithm; otherwise it still cannot recover the real URL. This is the second type of IDS evasion: invalid protocol field decoding.
A. Invalid Protocol Parsing
Evasions based on invalid protocol parsing are well documented. Many examples are provided in RFP's whisker [1] and Bob Graham's SideStep [5]. The difference between the two programs is that whisker exploits flawed IDS protocol parsing to avoid detection, while SideStep uses normal network-layer protocol behavior to evade the IDS protocol decoders.
For HTTP, invalid protocol parsing evasions are very effective against two HTTP fields: the URL and the URL parameters.
For example, if the IDS's HTTP decoder assumes that each request packet contains only one URL, then when a packet contains two URLs the IDS will not parse the second one correctly. This technique comes up again in the discussion of request pipelining evasions.
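The one-URL-per-packet assumption can be illustrated with a minimal Python sketch. The payload, the host name `example.test`, and both helper functions are hypothetical and exist only for illustration; the parser assumes requests without bodies and is not meant to describe how any particular IDS works.

```python
# Hypothetical payload: two requests sent back to back in a single packet.
PACKET = (
    b"GET /index.html HTTP/1.1\r\nHost: example.test\r\n\r\n"
    b"GET /cgi-bin/phf HTTP/1.1\r\nHost: example.test\r\n\r\n"
)

def first_uri_only(data: bytes) -> str:
    """Naive decoder: assumes one request, and so one URL, per packet."""
    request_line = data.split(b"\r\n", 1)[0]
    return request_line.split(b" ")[1].decode("ascii", "replace")

def all_uris(data: bytes) -> list:
    """Walk every request in the buffer (assumes requests carry no body)."""
    uris = []
    for request in data.split(b"\r\n\r\n"):
        parts = request.split(b"\r\n", 1)[0].split(b" ")
        if len(parts) == 3:  # METHOD SP URI SP VERSION
            uris.append(parts[1].decode("ascii", "replace"))
    return uris

print(first_uri_only(PACKET))  # the second URL is never inspected
print(all_uris(PACKET))
```

The naive decoder reports only the harmless first URL, so a signature for the second URL would never be evaluated.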
B. Invalid Protocol Field Decoding
Invalid protocol field decoding tests whether an IDS can handle the various types of encoding allowed in a specific protocol field.
For HTTP, the main target is the URL field. The IDS must be tested for compliance with the encoding standards in the HTTP RFCs and for support of the encoding types specific to particular web servers (such as IIS). If the IDS cannot correctly decode a URL, an attacker can use that encoding to slip a malicious URL past detection.
Another form of invalid HTTP protocol field decoding relies on directory obfuscation, that is, manipulation of directory properties. For example, for /cgi-bin/phf, you can use multiple "/" characters instead of one to change the appearance of the directory, or use directory traversal to obscure the directory path. Note that this hides a malicious URL only when the IDS looks for both the directory and the file. For "/cgi-bin/phf", if the IDS looks for the "phf" file in the "cgi-bin" directory, these attacks will work; if the IDS looks only for the "phf" file, directory obfuscation is useless.
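The effect of directory obfuscation on a signature can be sketched in Python as follows. The `normalize_path` helper and the `subdir` directory name are assumptions made for this illustration; real web servers and IDS products apply their own normalization rules.

```python
def normalize_path(path: str) -> str:
    """Collapse repeated "/" and resolve "." and ".." segments."""
    segments = []
    for segment in path.split("/"):
        if segment in ("", "."):
            continue  # empty segment (from "//") or self reference
        if segment == "..":
            if segments:
                segments.pop()  # step back one directory
            continue
        segments.append(segment)
    return "/" + "/".join(segments)

SIGNATURE = "/cgi-bin/phf"
OBFUSCATED = [
    "/cgi-bin///phf",          # multiple slashes
    "/cgi-bin/./phf",          # self-referencing directory
    "/cgi-bin/subdir/../phf",  # directory traversal (hypothetical subdir)
]

for raw in OBFUSCATED:
    print(
        raw,
        "| raw match:", SIGNATURE in raw,
        "| normalized match:", normalize_path(raw) == SIGNATURE,
        "| file-only match:", "phf" in raw,
    )
```

In this sketch the directory-plus-file signature misses every raw variant and matches only after normalization, while the file-only signature matches all of them.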
III. Invalid Protocol Field Decoding
URL obfuscation is built on the various encoding methods that HTTP servers accept. Most of these encoding methods are specific to IIS, but for completeness each encoding type was tested against each HTTP server.
The idea behind using URL encoding to obfuscate web attacks is that most IDSs lack sufficient knowledge of the encodings accepted by different types of web servers. The problem affects both pattern-matching and protocol-analysis IDSs.
For request URIs, only two encoding methods comply with RFC standards: hex encoding and UTF-8 Unicode encoding. Both use "%" to introduce an encoded byte. Apache supports only these two URL encoding types.
Most of the other encoding types studied here are server-specific and do not comply with RFC standards. Microsoft's IIS web server falls into this category. This section also covers URL obfuscation.
A. Hex Encoding
Hex encoding is the simplest RFC-compliant way to encode a URL. It places a "%" before the hexadecimal byte value of each encoded character. For example, hex encoding an uppercase A (ASCII value 0x41) gives:
• %41 = 'A'
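As a small illustration, the following Python sketch hex-encodes ASCII text and decodes it again with the standard library. The `hex_encode` helper is a name chosen for this example; it encodes every character, whereas an attacker would more likely encode only selected ones.

```python
from urllib.parse import unquote

def hex_encode(text: str) -> str:
    """Percent-encode every character of an ASCII string."""
    return "".join(f"%{byte:02X}" for byte in text.encode("ascii"))

print(hex_encode("A"))                 # %41
print(unquote("%41"))                  # A
print(unquote("/cgi-bin/%70%68%66"))   # /cgi-bin/phf
```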
B. Double Percent Hex Encoding
Double percent hex encoding is based on normal hex encoding. The percent sign itself is encoded, followed by the hexadecimal value of the character. Encoding an uppercase A gives:
• %2541 = 'A'
As you can see, the percent sign is encoded as %25 (which decodes to "%"), so the first decoding pass yields %41, which a second pass decodes to 'A'. This encoding is supported by Microsoft IIS.
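The two decoding passes can be traced with a short Python sketch. The `double_percent_encode` helper is a hypothetical name, and calling `urllib.parse.unquote` twice only stands in for a server that decodes a URL twice.

```python
from urllib.parse import unquote

def double_percent_encode(text: str) -> str:
    """Hex-encode each character, then encode every '%' as '%25'."""
    hex_encoded = "".join(f"%{byte:02X}" for byte in text.encode("ascii"))
    return hex_encoded.replace("%", "%25")

encoded = double_percent_encode("A")
first_pass = unquote(encoded)       # what a single-pass decoder sees
second_pass = unquote(first_pass)   # what a double-decoding server sees

print(encoded, "->", first_pass, "->", second_pass)
```

An IDS that decodes only once is left with the still-encoded intermediate value, while a server that decodes twice sees the real character.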
C. Double Nibble Hex Encoding
Double nibble hex encoding is also based on standard hex encoding. Each hexadecimal nibble (four-bit digit) is itself encoded with standard hex encoding. For example, encoding an uppercase A gives:
• %%34%31 = 'A'
The normal hex encoding of A is %41. Double nibble hex encoding encodes each nibble separately: the first nibble, 4, is encoded as %34 (the ASCII value of the digit 4), and the second nibble, 1, is encoded as %31 (the ASCII value of the digit 1).
The first URL decoding pass turns the nibble values into the digits 4 and 1. Because the 4 and 1 are preceded by a %, the second pass sees %41 and decodes it to an uppercase A.
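The sketch below, in Python, builds a double nibble encoding and decodes it in two passes. Both `decode_once` and `double_nibble_encode` are hypothetical helpers; `decode_once` is a simplified decoder that replaces every well-formed %XX escape and leaves any other "%" untouched, which is an assumption about how a lenient server might behave.

```python
import re

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")

def decode_once(url: str) -> str:
    """One decoding pass: replace each well-formed %XX, keep stray '%'."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), url)

def double_nibble_encode(text: str) -> str:
    """Encode both hex digits of every byte as their own %XX escape."""
    out = []
    for byte in text.encode("ascii"):
        high, low = f"{byte:02X}"  # e.g. "4" and "1" for 'A'
        out.append(f"%%{ord(high):02X}%{ord(low):02X}")
    return "".join(out)

encoded = double_nibble_encode("A")
first_pass = decode_once(encoded)
second_pass = decode_once(first_pass)
print(encoded, "->", first_pass, "->", second_pass)
```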
D. First Nibble Hex Encoding
First nibble hex encoding is similar to double nibble hex encoding, except that only the first nibble is encoded. So where the double nibble hex encoding of an uppercase A is %%34%31, the first nibble hex encoding is:
• %%341 = 'A'
As before, the first URL decoding pass decodes %34 to the digit 4, so the second pass sees %41, and the final result is again an uppercase A.
E. Second Nibble Hex Encoding
Second nibble hex encoding works exactly like first nibble hex encoding, except that only the second nibble is encoded. The encoding of an uppercase A is therefore:
• %4%31 = 'A'
The first decoding pass decodes %31 to the digit 1, the second pass sees %41, and the final result is 'A'.
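One way a detector might cope with these layered encodings is to keep decoding until the URL stops changing. The Python sketch below illustrates that idea; `first_nibble_encode`, `second_nibble_encode`, `decode_once`, and `normalize` are all hypothetical names, and the lenient single-pass decoder is an assumption rather than a description of any real server or IDS.

```python
import re

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")

def decode_once(url: str) -> str:
    """One decoding pass: replace each well-formed %XX, keep stray '%'."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), url)

def first_nibble_encode(text: str) -> str:
    """Encode only the first hex digit of each byte."""
    out = []
    for byte in text.encode("ascii"):
        high, low = f"{byte:02X}"
        out.append(f"%%{ord(high):02X}{low}")
    return "".join(out)

def second_nibble_encode(text: str) -> str:
    """Encode only the second hex digit of each byte."""
    out = []
    for byte in text.encode("ascii"):
        high, low = f"{byte:02X}"
        out.append(f"%{high}%{ord(low):02X}")
    return "".join(out)

def normalize(url: str, max_passes: int = 5) -> str:
    """Decode repeatedly until the URL no longer changes."""
    for _ in range(max_passes):
        decoded = decode_once(url)
        if decoded == url:
            break
        url = decoded
    return url

for encoded in (first_nibble_encode("A"), second_nibble_encode("A"), "%2541"):
    print(encoded, "->", normalize(encoded))
```

Decoding to a fixed point is a deliberately aggressive choice: it can decode more times than the target server does, so a detector built this way trades possible false positives for coverage of every layered variant.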
F. UTF-8 Encoding
1) Introduction to UTF-8
UTF-8 encoding allows values larger than a single byte (0-255) to be represented as a byte stream. HTTP servers use UTF-8 encoding to represent Unicode code points outside the ASCII range (0-127).
In UTF-8, the high bits of each byte have a special meaning. Two-byte and three-byte UTF-8 sequences are laid out as follows:
110xxxxx 10xxxxxx (two-byte sequence)
1110xxxx 10xxxxxx 10xxxxxx (three-byte sequence)
The first byte of a UTF-8 sequence is the most important, because it tells you how many bytes the sequence contains: count the high bits set to 1 before the first 0. In the two-byte example, there are two 1 bits before the first 0. The bits after that 0 in the first byte contribute to the final value. Every following byte has the same format: the highest bit is 1 and the next-highest bit is 0, which together mark it as a UTF-8 continuation byte, and the remaining 6 bits contribute to the final value.
To use UTF-8 in a URL, each UTF-8 byte is escaped with a percent sign. Example: %C0%AF = '/'.
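The bit layout can be worked through with a minimal Python sketch. `lenient_utf8_decode` is a hypothetical helper that applies only the masks described above and performs no further validation, which is an assumption about how a permissive decoder might behave; Python's own UTF-8 codec is strict and rejects the %C0%AF sequence.

```python
from urllib.parse import unquote_to_bytes

def lenient_utf8_decode(seq: bytes) -> int:
    """Decode one two- or three-byte sequence using only the bit masks."""
    first = seq[0]
    if (first & 0b11100000) == 0b11000000 and len(seq) == 2:
        value = first & 0b00011111   # 110xxxxx: keep 5 payload bits
    elif (first & 0b11110000) == 0b11100000 and len(seq) == 3:
        value = first & 0b00001111   # 1110xxxx: keep 4 payload bits
    else:
        raise ValueError("only two- and three-byte sequences are handled")
    for byte in seq[1:]:
        if (byte & 0b11000000) != 0b10000000:
            raise ValueError("continuation byte must look like 10xxxxxx")
        value = (value << 6) | (byte & 0b00111111)  # append 6 payload bits
    return value

raw = unquote_to_bytes("%C0%AF")         # percent escapes -> raw bytes
print(chr(lenient_utf8_decode(raw)))     # '/' for a permissive decoder

try:
    raw.decode("utf-8")                  # a strict decoder refuses it
except UnicodeDecodeError as exc:
    print("strict decoder rejected the sequence:", exc.reason)
```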
2) Unicode code points
UTF-8 encoding is used to encode Unicode code points. Code points usually range from 0 to 65535, and any code point greater than 127 in an HTTP URL is encoded in UTF-8.
Unicode code points 0-127 map one-to-one to ASCII values. That leaves 65408 values to represent characters in other languages (such as Hungarian or Japanese). These languages generally have their own Unicode code pages, and Unicode code point values are drawn from a code page. Each code page assigns its own characters to the values, so if the Unicode code page changes, the same code point value represents a different character. This concept is very important for the URL encodings in the next section.
3) Putting the Evasions Together
An IDS has difficulty handling Unicode code point values in UTF-8 encoding for three reasons:
The first reason is that UTF-8 can represent a single code point or ASCII value in more than one way. Recent versions of the Unicode standard have corrected this, but the behavior is still common in web servers (including Apache).
For example, an uppercase A can be encoded as a two-byte UTF-8 sequence:
• %C1%81 (11000001 10000001 = 1000001 = 'A')
Similarly, an uppercase A can also be encoded as a three-byte UTF-8 sequence:
• %E0%81%81 (11100000 1