First, set up an experiment: create the following PHP file in a local environment.
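<?php
// Declare a GBK-family charset so the browser interprets the raw bytes as Chinese text.
header("Content-Type: text/html; charset=gb2312");
// Print the parameter as received, then the addslashes() result, for comparison.
echo $_GET["str"];
echo "<br/>";
echo addslashes($_GET["str"]);
?>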
Here, magic_quotes_gpc is enabled in my PHP environment, and the code also escapes the data passed in through GET with addslashes. Normally this takes care of sensitive characters, so it looks safe.
However, if you look at the figure below...
Have you spotted the problem? The input starts with %d5, and the sensitive character that follows it ends up effectively unescaped! An injection point has been created!
The problem centers on %d5. What is going on? It turns out that PHP's underlying escaping function, php_escape_shell_cmd, has a weakness. PHP turns the characters that are special on a shell command line, such as ", ', #, & and ;, into \", \', \#, \& and \; in order to avoid command injection. PHP assumes that once these characters have been escaped, the arguments are safe to pass to functions such as system.
In fact, though, in the GBK encoding (GB2312 can be regarded as a subset of it; GBK extends it with support for traditional Chinese) a Chinese character is stored as two bytes:
The first byte range is 0x81-0xFE.
The second (trail) byte range is 0x40-0xFE (excluding 0x7F).
Note that the "\" (0x5C) used for escaping falls exactly inside the trail byte range! When we submit a byte in the range 0x81-0xFE (0xD5 in this example) and deliberately follow it with a sensitive character such as a single quote, PHP escapes that character by inserting a "\" (0x5C) in front of it. That 0x5C then combines with 0xD5 to form one complete double-byte character, so the escape character has effectively "defected": it no longer escapes anything.
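The byte arithmetic described above can be checked with a short sketch. It is written in Python only to inspect the bytes; addslashes_bytes is a hypothetical stand-in that models the byte-level behaviour of PHP's addslashes and is not PHP's actual implementation.

```python
# Hypothetical model of a byte-level escaper such as PHP's addslashes():
# it looks at single bytes and knows nothing about GBK.
def addslashes_bytes(data: bytes) -> bytes:
    out = bytearray()
    for b in data:
        if b == 0x00:
            out += b"\\0"                 # NUL becomes backslash + "0"
        elif b in (0x27, 0x22, 0x5C):     # ' " \
            out += bytes([0x5C, b])       # prefix with a backslash
        else:
            out.append(b)
    return bytes(out)


submitted = b"\xd5'"                      # what %d5%27 decodes to
escaped = addslashes_bytes(submitted)     # bytes after escaping
decoded = escaped.decode("gbk")           # how a GBK-aware consumer reads them

print(escaped.hex())    # d55c27
print(len(decoded))     # 2: one double-byte character, then a bare quote
print(decoded[1])       # '
```

The escaper did insert 0x5C, but a consumer that reads the result as GBK pairs it with 0xD5, so the quote that follows is no longer preceded by an escape character.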
Likewise, this vulnerability also applies to POST and Cookie data. Personally, I think a better solution is to add a separate routine that handles these illegal characters.
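One possible shape for such a routine, sketched here in Python rather than PHP, is to reject input that is not valid GBK and to escape whole characters instead of single bytes. The name escape_gbk and the set of escaped characters are assumptions made for illustration; this is not an existing PHP function.

```python
# Hypothetical charset-aware escaper: decode first, escape characters, re-encode.
ESCAPES = {"\\": "\\\\", "'": "\\'", '"': '\\"', "\0": "\\0"}


def escape_gbk(raw: bytes) -> bytes:
    text = raw.decode("gbk")   # raises UnicodeDecodeError on malformed GBK
    escaped = "".join(ESCAPES.get(ch, ch) for ch in text)
    return escaped.encode("gbk")


def try_escape(raw: bytes):
    """Return the escaped bytes, or None when the input is not valid GBK."""
    try:
        return escape_gbk(raw)
    except UnicodeDecodeError:
        return None


# 0xD5 followed by a quote is not a valid GBK sequence, so it is rejected.
print(try_escape(b"\xd5'"))                 # None

# A valid double-byte character whose trail byte is 0x5C, followed by a quote:
# only the quote is escaped, and the character itself is left intact.
print(try_escape(b"\xd5\x5c'").hex())       # d55c5c27
```

Because escaping happens after decoding, a 0x5C that is the trail byte of a character is never mistaken for a backslash, and a dangling lead byte never reaches the escaping step.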
At this point some readers will say: I usually use UTF-8, so surely I don't have this problem?
Fortunately, this problem does not exist in UTF-8. If you are not interested in the reason, you can skip the rest :)
The main feature of UTF-8 is that it is a variable-length encoding. It uses 1 to 4 bytes to represent a symbol, and the number of bytes varies from symbol to symbol.
The UTF-8 encoding rules are simple; there are only two:
1) For a single-byte symbol, the first bit of the byte is set to 0 and the remaining seven bits hold the Unicode code of the symbol. For English letters, therefore, the UTF-8 encoding is identical to ASCII.
2) For an n-byte symbol (n > 1), the first n bits of the first byte are set to 1 and bit n + 1 is set to 0; the first two bits of every following byte are set to 10. All remaining bits hold the Unicode code of the symbol.
The following table summarizes the encoding rules. The letter x indicates the available encoding bits.
Unicode symbol range | UTF-8 encoding
(hexadecimal)        | (binary)
---------------------+-------------------------------------
0000 0000-0000 007F  | 0xxxxxxx
0000 0080-0000 07FF  | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF  | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF  | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
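The two rules and the table above can be written out as a small hand-rolled encoder. This is a minimal sketch for illustration: utf8_encode is a made-up name, and in practice Python's built-in str.encode does the same job (the sketch also ignores the surrogate range, which real encoders reject).

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point following the table row by row."""
    if code_point < 0x80:                       # 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:                      # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:                    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0x10FFFF:                  # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("code point out of range")


# One sample per table row, compared against Python's built-in encoder.
samples = [0x41, 0xE9, 0x4E2D, 0x1F600]
matches = [utf8_encode(cp) == chr(cp).encode("utf-8") for cp in samples]

print(matches)                       # [True, True, True, True]
print(utf8_encode(0x4E2D).hex())     # e4b8ad (a three-byte Chinese character)
```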
If you are interested, look up a Unicode table of Chinese characters and compare it with the table above: you will find that almost all Chinese characters need three bytes, and that each of those three bytes is at least 1000 0000, that is, 0x80. The "\" (0x5C) is therefore never among them.
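This claim is easy to check by brute force. The sketch below scans the CJK Unified Ideographs block (U+4E00 to U+9FFF, chosen here as a representative range, not the full set of Chinese characters) and records the encoded lengths and the smallest byte value seen.

```python
lengths = set()
smallest_byte = 0xFF
contains_backslash = False

for code_point in range(0x4E00, 0xA000):
    encoded = chr(code_point).encode("utf-8")
    lengths.add(len(encoded))
    smallest_byte = min(smallest_byte, min(encoded))
    if 0x5C in encoded:
        contains_backslash = True

print(lengths)               # {3}
print(hex(smallest_byte))    # 0x80
print(contains_backslash)    # False
```

Since every byte of a multi-byte UTF-8 sequence has its high bit set, a byte-level escaper cannot have its inserted 0x5C absorbed into a neighbouring character the way it can in GBK.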
So it seems that UTF-8 users can rest easy. In fact, though, PHP's Unicode transcoding does have a vulnerability of its own; it is just very hard to exploit. If you are interested, you can look it up yourself XD