Analysis of Character Encoding and SQL Injection in White-Box Auditing

Source: Internet
Author: User

Although everyone is now urged to use Unicode, and websites are expected to use UTF-8 as a unified international standard, many CMSs, both domestic and foreign (especially in non-English-speaking countries), still use their own national encoding, such as GBK, as the default. Some CMSs ship in both GBK and UTF-8 versions for the sake of existing users. We start with GBK as the demonstration encoding. GBK is a multi-byte encoding; look up its precise definition yourself (for example on Baidu). Note:
Generally, a GBK-encoded Chinese character occupies 2 bytes, while a UTF-8 encoded Chinese character occupies 3 bytes. In PHP we can test this with
echo strlen("和");
It outputs 2 when the page is saved as GBK and 3 when it is saved as UTF-8.
Besides GBK, the other ANSI encodings are generally also two bytes per character. "ANSI" is only a label: it stands for different encodings on different computers. For example, on a Simplified Chinese system ANSI means GBK.
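The same check can be reproduced outside PHP. The following is a minimal Python sketch (Python is used here only to look at the bytes; the choice of the character "和" is an assumption, any common Chinese character should behave the same way):

```python
# Byte length of one Chinese character under GBK and under UTF-8.
ch = "和"  # assumption: sample character chosen for illustration

gbk_len = len(ch.encode("gbk"))      # number of bytes in GBK
utf8_len = len(ch.encode("utf-8"))   # number of bytes in UTF-8

print(gbk_len, utf8_len)
```

This mirrors what strlen reports in PHP, because strlen counts bytes rather than characters.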
That covers the basics of multi-byte encodings. Only with enough knowledge of how they are composed can we analyze the problems they cause.
With the preliminaries out of the way, let's look at the problems that character encoding causes in SQL injection.
0x01 MySQL wide-byte injection
This is an old topic that has been exploited countless times. But as the prelude to this article it is also the foundation, so it has to be covered.
First we build a test environment, which for now we will call phithon Content Management System v1.0. Create a new database and import the SQL file from the following archive:
Test code and Database: http://pan.baidu.com/s/1eQmUArw extraction password: 75tu
The phithon content management system will be improved step by step, but this data table is used throughout.
The source code is very simple (turn off magic_quotes_gpc in your PHP environment first):
<?php
// Connect to the database (note the gbk encoding); fill in your own database credentials.
$conn = mysql_connect('localhost', 'root', 'toor!@#$') or die('bad!');
mysql_query("SET NAMES 'gbk'");
mysql_select_db('test', $conn) or die("database connection failed, the database you entered is not found");
// Execute the SQL statement
$id = isset($_GET['id']) ? addslashes($_GET['id']) : 1;
$sql = "SELECT * FROM news WHERE tid='{$id}'";
$result = mysql_query($sql, $conn) or die(mysql_error()); // print SQL errors so they are easy to observe
?>
<!DOCTYPE html>
<html>
<head>
<meta charset="gbk" />
<title>News</title>
</head>
<body>
<?php
$row = mysql_fetch_array($result, MYSQL_ASSOC);
// [The echo statement that prints the fetched row is truncated in the source.]
mysql_free_result($result);
?>
</body>
</html>
The SQL statement is SELECT * FROM news WHERE tid='{$id}', which fetches an article from the news table by its article id.
Before this SQL statement runs, addslashes escapes the value of $id. This is the usual way a CMS defends against SQL injection: as long as the input parameter sits inside single quotes and we cannot break out of those quotes, we cannot inject. For example, a single quote sent in id reaches the query as \', so it stays inside the string.

So how do we get past addslashes? As we all know, addslashes turns ' into \', so the quote is no longer a string delimiter. The usual bypasses all work on the \ in front of the ':
1. Put another \ (or any odd number of them) in front of the \, so that the \ itself is escaped and the ' breaks free.
2. Make the \ disappear. Wide-byte injection does this by exploiting a MySQL feature: when MySQL uses GBK, it treats two bytes as a single Chinese character (the first byte must be greater than 128 to fall in the Chinese character range). What happens if we send %df' as the id?

An error is reported, which means the SQL statement has been broken. An SQL error like this generally indicates that injection is possible.
Why does putting %df in front of ' (that is, %27) cause an error? From the error message we can see that the single quote caused it, and that the backslash in front of the single quote has disappeared.
This is the MySQL feature at work. Because GBK is a multi-byte encoding, MySQL takes two bytes to be one Chinese character, so %df and the \ that follows it (%5c) become the single character "運", and the single quote escapes.
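To see the byte swallowing without a database, here is a small Python sketch. The addslashes function below is a hypothetical byte-wise model of the PHP function, written only for this illustration, and Python's gbk codec stands in for a GBK-aware reader of the query:

```python
def addslashes(data: bytes) -> bytes:
    """Byte-wise model of PHP addslashes (hypothetical helper, not PHP itself)."""
    out = bytearray()
    for b in data:
        if b == 0x00:
            out += b"\\0"            # NUL becomes backslash followed by "0"
            continue
        if b in (0x27, 0x22, 0x5C):  # single quote, double quote, backslash
            out.append(0x5C)
        out.append(b)
    return bytes(out)


payload = b"\xdf'"               # the bytes that id=%df%27 decodes to
escaped = addslashes(payload)    # df 5c 27
as_gbk = escaped.decode("gbk")   # group the bytes the way a GBK reader would

print(escaped.hex())
print(len(as_gbk), as_gbk.endswith("'"), "\\" in as_gbk)
```

After decoding, only two characters remain: one wide character built from df 5c, and a bare single quote with no backslash in front of it.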
Since two bytes make up one Chinese character, we can try "%df%df%27":

No error this time. %df%df is one Chinese character, and %5c%27 is not a Chinese character, so it is still \'.
So how does MySQL decide whether something is a Chinese character? Under GBK, a first byte greater than 128 is basically enough. For example, we can use %a1 instead of %df:

%a1%5c may not be an actual Chinese character, but MySQL certainly treats it as one wide character, so the %27 that follows breaks free.
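The three payloads tried so far can be compared with a simplified scanner. This is only a sketch: the lead-byte and trail-byte ranges below are assumptions about how a GBK-aware parser groups bytes, and first_free_quote is a hypothetical helper, not MySQL's actual lexer. Inputs are the bytes after addslashes has run.

```python
def is_gbk_head(b: int) -> bool:
    # Assumed lead-byte range for GBK
    return 0x81 <= b <= 0xFE


def is_gbk_tail(b: int) -> bool:
    # Assumed trail-byte range for GBK; note that it includes 0x5c
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFE


def first_free_quote(data: bytes) -> int:
    """Index of the first single quote that is not escaped, or -1."""
    i = 0
    while i < len(data):
        b = data[i]
        if is_gbk_head(b) and i + 1 < len(data) and is_gbk_tail(data[i + 1]):
            i += 2        # lead + trail byte consumed as one wide character
        elif b == 0x5C:
            i += 2        # backslash escapes the byte that follows it
        elif b == 0x27:
            return i      # unescaped quote found
        else:
            i += 1
    return -1


samples = {
    "%df%27": bytes.fromhex("df5c27"),
    "%df%df%27": bytes.fromhex("dfdf5c27"),
    "%a1%27": bytes.fromhex("a15c27"),
}
results = {name: first_free_quote(raw) for name, raw in samples.items()}
print(results)
```

Under these assumptions %df%27 and %a1%27 leave a free quote at index 2, while %df%df%27 does not, because the two %df bytes pair up and the backslash keeps its job.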
So I can construct an exploit that queries the administrator account and password:

0x02 The difference between GB2312 and GBK
One question bothered me for a long time.
GB2312 and GBK both belong to the wide-byte family. But let's run a small experiment and change set names in the phithon content management system to gb2312:

The result is that injection no longer works:

If you are not convinced, change the database encoding to gb2312 as well; the injection still fails.
The reason lies in the value range of GB2312. Its high byte ranges from 0xA1 to 0xF7 and its low byte from 0xA1 to 0xFE. \ is 0x5c, which is outside the low byte range. So 0x5c can never be part of a GB2312 character, and it is never swallowed.
We can extend this idea to every multi-byte encoding in the world: as long as the low byte range of an encoding includes 0x5c, wide-byte injection is possible.
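The rule of thumb above can be written as a simple range check. In this Python sketch the GB2312 range comes from the text; the trail-byte ranges for GBK, Big5 and Shift_JIS are not from the article and are filled in from commonly documented values, so treat them as assumptions:

```python
BACKSLASH = 0x5C

# Low (trail) byte ranges per encoding.
# GB2312 is taken from the text; the other entries are assumptions.
TRAIL_RANGES = {
    "GB2312": [(0xA1, 0xFE)],
    "GBK": [(0x40, 0x7E), (0x80, 0xFE)],
    "Big5": [(0x40, 0x7E), (0xA1, 0xFE)],
    "Shift_JIS": [(0x40, 0x7E), (0x80, 0xFC)],
}


def backslash_can_be_trail(encoding: str) -> bool:
    """True if 0x5c falls inside any trail-byte range of the encoding."""
    return any(lo <= BACKSLASH <= hi for lo, hi in TRAIL_RANGES[encoding])


results = {name: backslash_can_be_trail(name) for name in TRAIL_RANGES}
print(results)
```

Only GB2312 comes out False here, which matches the failed experiment in this section.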
0x03 How does mysql_real_escape_string solve the problem?
Some CMSs are aware of wide-byte injection and have looked for a fix. In the PHP documentation you will find a function, mysql_real_escape_string, which the documentation says takes the character set of the current connection into account.

So some CMSs replace addslashes with mysql_real_escape_string to defend against wide-byte injection. Let's continue the experiment with phithon Content Management System v1.2, which filters input with mysql_real_escape_string:

Let's try to see if it can be injected:

It injects without any difficulty. Why can't mysql_real_escape_string stop wide-byte injection even though I am clearly using it?
The reason is that the character set of the PHP connection to MySQL has not been specified. Before executing the SQL statement we need to call mysql_set_charset to set the character set of the current connection to gbk.

With that in place, the problem is avoided.
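Why the connection character set matters can be sketched with a toy escaper. Everything below is a simplified model built on assumptions (the GBK byte ranges, the rule of escaping a dangling lead byte, and the tiny scanner); it is not the client library's code, and the real function also escapes further characters such as newlines:

```python
def is_gbk_head(b: int) -> bool:
    return 0x81 <= b <= 0xFE          # assumed GBK lead-byte range


def is_gbk_tail(b: int) -> bool:
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFE  # assumed trail-byte range


def real_escape(data: bytes, multibyte: bool) -> bytes:
    """Toy escaper: multibyte=True models a connection known to be gbk."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if multibyte and is_gbk_head(b):
            if i + 1 < len(data) and is_gbk_tail(data[i + 1]):
                out += data[i:i + 2]  # complete wide character, copied as is
                i += 2
                continue
            out += bytes([0x5C, b])   # dangling lead byte gets escaped itself
            i += 1
            continue
        if b in (0x27, 0x22, 0x5C):
            out.append(0x5C)
        out.append(b)
        i += 1
    return bytes(out)


def first_free_quote(data: bytes) -> int:
    """Scan as a GBK-aware parser would; return index of a free quote or -1."""
    i = 0
    while i < len(data):
        b = data[i]
        if is_gbk_head(b) and i + 1 < len(data) and is_gbk_tail(data[i + 1]):
            i += 2
        elif b == 0x5C:
            i += 2
        elif b == 0x27:
            return i
        else:
            i += 1
    return -1


payload = b"\xdf'"
unaware = real_escape(payload, multibyte=False)  # charset never declared
aware = real_escape(payload, multibyte=True)     # charset declared as gbk
print(unaware.hex(), first_free_quote(unaware))
print(aware.hex(), first_free_quote(aware))
```

In this model the unaware escaper produces df 5c 27, exactly what addslashes produced, while the aware one produces 5c df 5c 27, in which the quote stays escaped.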

0x04 Fixing wide-byte injection
In section 0x03 we mentioned one solution: first call mysql_set_charset to set the connection character set to gbk, then call mysql_real_escape_string to filter user input.
This works, but some older CMSs call addslashes in many places to filter strings, and we cannot realistically change every addslashes to mysql_real_escape_string one by one. So the second solution is to set character_set_client to binary.
Just declare the client character set as binary before any SQL statement is executed:
mysql_query("SET character_set_connection=gbk, character_set_results=gbk, character_set_client=binary", $conn);
What do these variables mean?
When MySQL receives data from the client, it assumes the data is encoded in character_set_client and converts it to character_set_connection. When the data is then written to a specific table and column, it is converted again to the encoding of that column.
When a query result is produced, it is converted from the table and column encoding to character_set_results and returned to the client.
So if we set character_set_client to binary, the wide-byte (multi-byte) problem does not arise: the data is received as raw bytes, which effectively prevents wide-byte injection.
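The effect of character_set_client can be illustrated with the same escaped payload read in two ways. This is a sketch under assumptions: first_free_quote is a hypothetical scanner, the GBK byte ranges are assumed, and "binary" is modeled simply as reading one byte at a time:

```python
def first_free_quote(data: bytes, client_charset: str) -> int:
    """Return the index of the first unescaped single quote, or -1.

    client_charset="gbk" pairs lead and trail bytes into wide characters;
    client_charset="binary" reads every byte on its own (assumed model).
    """
    multibyte = client_charset == "gbk"
    i = 0
    while i < len(data):
        b = data[i]
        is_pair = (
            multibyte
            and 0x81 <= b <= 0xFE
            and i + 1 < len(data)
            and (0x40 <= data[i + 1] <= 0x7E or 0x80 <= data[i + 1] <= 0xFE)
        )
        if is_pair:
            i += 2        # wide character
        elif b == 0x5C:
            i += 2        # backslash escapes the next byte
        elif b == 0x27:
            return i
        else:
            i += 1
    return -1


escaped = bytes.fromhex("df5c27")   # addslashes output for id=%df%27
as_gbk = first_free_quote(escaped, "gbk")
as_binary = first_free_quote(escaped, "binary")
print(as_gbk, as_binary)
```

Read as gbk, the quote at index 2 is free; read as binary, the backslash is never absorbed into a wide character and the quote stays escaped.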
For example, our phithon content management system v2.0 is updated as follows:

Cannot be injected anymore:

In the code I have audited, most CMSs avoid wide-byte injection this way. The method is effective, but if developers then add something superfluous on top, all the earlier effort is wasted.
0x05 The fatal consequences of iconv
Many CMSs (more than one; I will not name them) are injectable in their GBK versions because of character encoding. Yet some readers say they have tried wide-byte injection against these CMSs without success. Are they doing something wrong?
Of course not. In fact this section is no longer about wide-byte injection as such, because the problem lies not in MySQL but in PHP.
Many CMSs (really a lot; search the Internet if you do not believe it) take the data they receive and convert its encoding with a call like this:
iconv('utf-8', 'gbk', $_GET['word']);
The aim is to avoid garbled characters, especially in search boxes.
Take our phithon Content Management System v3.0, for example:

We can see that it sets character_set_client to binary before executing the SQL statement, which prevents wide-byte injection. But afterwards it calls iconv to convert the already filtered parameter $id.
Let's see whether it can be injected now:

An error is reported, so it is injectable. All I entered was "錦'". Why?
Let's analyze it. The UTF-8 encoding of "錦" is 0xe98ca6, and its GBK encoding is 0xe55c.
Some readers may already see it. The ASCII code of \ is exactly 5c. When iconv converts our 錦 from UTF-8 to GBK it becomes %e5%5c, while the ' after it has been turned into %5c%27 by addslashes. Together they form %e5%5c%5c%27, and the two %5c are \\: the backslash is itself escaped, so the ' breaks out of the single quotes and injection succeeds.
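The byte arithmetic above can be replayed in Python. In this sketch addslashes and quote_is_free are hypothetical helpers, and decode/encode stands in for iconv('utf-8', 'gbk', ...); the quote check reads the bytes one at a time, as a server with character_set_client=binary is assumed to do:

```python
def addslashes(data: bytes) -> bytes:
    """Byte-wise model of PHP addslashes for quotes and backslashes only."""
    out = bytearray()
    for b in data:
        if b in (0x27, 0x22, 0x5C):
            out.append(0x5C)
        out.append(b)
    return bytes(out)


def quote_is_free(data: bytes) -> bool:
    """True if the first single quote has an even number of backslashes before it."""
    pos = data.index(b"'")
    backslashes = 0
    while pos - backslashes - 1 >= 0 and data[pos - backslashes - 1] == 0x5C:
        backslashes += 1
    return backslashes % 2 == 0


raw = "錦'".encode("utf-8")                          # e9 8c a6 27
escaped = addslashes(raw)                            # e9 8c a6 5c 27
converted = escaped.decode("utf-8").encode("gbk")    # model of the iconv call

print(escaped.hex(), quote_is_free(escaped))
print(converted.hex(), quote_is_free(converted))
```

Before the conversion the quote has one backslash in front of it; after the conversion it has two, so the quote is free again.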
This uses the first of the two addslashes bypasses described above: escaping the \ itself.
So what happens if iconv is used to convert GBK to UTF-8 instead?

Let's try:

It succeeds again. This time it really is wide-byte injection, but the problem lies in PHP rather than MySQL. We know that a GBK Chinese character is 2 bytes and a UTF-8 Chinese character is 3 bytes. When converting GBK to UTF-8, PHP converts two bytes at a time. So if the number of bytes in front of \' is odd, the \ is bound to be swallowed, and the ' breaks free.
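The GBK to UTF-8 direction can be replayed the same way. Again this is only a sketch: addslashes is a hypothetical byte-wise helper and decode/encode stands in for iconv('gbk', 'utf-8', ...):

```python
def addslashes(data: bytes) -> bytes:
    """Byte-wise model of PHP addslashes for quotes and backslashes only."""
    out = bytearray()
    for b in data:
        if b in (0x27, 0x22, 0x5C):
            out.append(0x5C)
        out.append(b)
    return bytes(out)


payload = b"\xdf'"                                  # id=%df%27
escaped = addslashes(payload)                       # df 5c 27
converted = escaped.decode("gbk").encode("utf-8")   # model of the iconv call

print(escaped.hex())
print(converted.hex())
print(b"\\" in converted, converted.endswith(b"'"))
```

The converter pairs df with 5c, emits one three-byte UTF-8 character, and leaves the quote with no backslash at all.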
So why can't this technique be used when UTF-8 is converted to GBK?
This comes down to the rules of UTF-8. The encoding rules are simple; there are only two:
1) For a single-byte symbol, the first bit is 0 and the remaining 7 bits are the Unicode code point of the symbol. So for English letters, UTF-8 and ASCII are identical.
2) For an n-byte symbol (n > 1), the first n bits of the first byte are 1 and bit n+1 is 0, and the first two bits of every following byte are 10. All remaining bits hold the Unicode code point of the symbol.
From rule 2 we can see that in a multi-byte symbol, bytes 2, 3 and 4 all begin with 10, so \ (0x5c, binary 01011100) can never appear inside a UTF-8 multi-byte sequence. So when converting UTF-8 to GBK, if a high byte is followed by \, PHP reports an error.
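The two rules can be checked mechanically. The Python sketch below uses a hypothetical helper, follows_utf8_rules, and a few sample characters chosen for illustration; Python's strict UTF-8 decoder stands in for the converter that rejects the malformed input:

```python
def follows_utf8_rules(ch: str) -> bool:
    """Check one character's UTF-8 bytes against the two rules above."""
    data = ch.encode("utf-8")
    n = len(data)
    if n == 1:
        return data[0] >> 7 == 0                          # rule 1: leading bit 0
    lead_ok = data[0] >> (7 - n) == (1 << (n + 1)) - 2    # n ones, then a zero
    tail_ok = all(b >> 6 == 0b10 for b in data[1:])       # following bytes: 10xxxxxx
    return lead_ok and tail_ok


samples = ["A", "é", "錦", "😀"]      # 1, 2, 3 and 4 byte characters
all_ok = all(follows_utf8_rules(ch) for ch in samples)

# 0x5c is 01011100: it starts with 01, so it cannot be a continuation byte.
backslash_is_continuation = (0x5C >> 6) == 0b10

# A lead byte followed by a backslash is therefore malformed UTF-8.
try:
    b"\xe5\x5c".decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True

print(all_ok, backslash_is_continuation, rejected)
```

Because the malformed sequence is rejected before any conversion happens, the trick from the GBK to UTF-8 direction has nothing to swallow here.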





However, because GBK encoding contains \, it can still be exploited in other ways.
All in all, once MySQL wide-byte injection has been dealt with, do not assume the job is done. Be careful when calling iconv, or it will cause unnecessary trouble.
0x06 Summary
UTF-8 is now the general trend. From a security standpoint, I also think that using UTF-8 avoids many multi-byte problems.
This is not only about GBK; I just habitually used GBK as the typical example in this article. There are many multi-byte encodings in the world, and CMSs from South Korea, Japan and other non-English-speaking countries in particular may have security problems caused by character encoding. Keep an open mind and extend the idea.
To summarize the security problems caused by character encoding mentioned in this article, and their solutions:
1. Wide-byte injection caused by GBK encoding: fixed by setting character_set_client=binary.
2. A misunderstanding about mysql_real_escape_string: executing set names gbk and calling mysql_real_escape_string is not enough to prevent wide-byte injection. You also have to call mysql_set_charset to set the character set.
3. Be cautious with iconv for converting string encodings; it is prone to problems. As long as all front-end HTML/JS/CSS is encoded in GBK and MySQL/PHP are set to GBK as well, there will be no garbled text, and there is no need to call iconv and invite unnecessary trouble.
This article is a small summary of my experience in white-box auditing, and I certainly have many shortcomings. The techniques described here will inevitably contain flaws and errors, and I hope readers with the same interests will make progress together with me.
Unlike the previous XSS article, this article does not provide many 0-day examples to demonstrate the harm caused by wide bytes. There are two reasons:
1. Wide-byte issues are not as common as rich-text XSS, and the proportion of GBK-encoded CMSs is small. I also have to blame my own limited skill: I could not find a matching real-world example for every section.
2. Injection is far more harmful than XSS, and releasing these as 0-days would have a very bad impact. Still, I did find encoding problems in many CMSs while writing this article and in earlier audits, so I wrote small PHP files as experiments to serve as examples. I hope the lack of real-world examples does not hurt your learning.
The example PHP and SQL files can be downloaded as a package:
Link: http://pan.baidu.com/s/1eQmUArw extraction password: 75tu
PDF version: Link: http://pan.baidu.com/s/1eprLs password: yoyw
Blog by: http://www.leavesongs.com FROM XDSEC
